Dr. Gustavo Barra, Master in Molecular Pharmacology and Coordinator of the Genomics Department at Sabin Medicina Diagnóstica, gave an introduction to Next-Generation Sequencing (NGS) and discussed the NGS technologies and platforms available on the market, the clinical applications of NGS in molecular diagnostics, and other relevant aspects of the topic.
Questions & Answers
Below are the questions that were not answered during the Online Meeting.
Enrichment, in the context of next-generation sequencing (NGS), refers to techniques used to select specific regions of the genome prior to sequencing. This is done to increase the coverage and read depth of these regions of interest while reducing the cost and computational complexity associated with whole-genome sequencing.
Typically, the term enrichment is used for the Hybridization Capture strategy, which employs biotinylated oligonucleotide probes that are complementary to the target regions of interest in the genome. These probes hybridize with the sequences of interest, which are then isolated using streptavidin-coated beads. This technique allows for the capture of flanking regions but may also isolate unwanted regions, reducing coverage in the areas of interest.
The enrichment process generates a metric that can be evaluated during sequencing: the proportion of sequences corresponding to the target regions relative to the total number of sequences generated. This value typically ranges from 70% to 85%. For example, an enrichment of 80% indicates that 80% of the generated sequences correspond to the regions intended for capture, while the remaining 20% are from off-target regions.
Since the process is not perfect, there will always be a percentage of sequences that are off-target. The greater the enrichment, the more optimized the sequencing becomes, as it maximizes the coverage of the regions of interest and minimizes the generation of irrelevant data. This results in a more efficient and cost-effective analysis, allowing for more accurate detection of genetic variants in the studied regions.
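The enrichment metric described above can be illustrated with a minimal sketch. It assumes that reads have already been aligned and reduced to simple (chromosome, start, end) intervals, and it counts a read as on-target if it overlaps a target region by at least one base. The function names and the toy coordinates are hypothetical and chosen only for illustration; production pipelines usually obtain this figure from dedicated QC tools, and the exact definition of "on-target" varies between tools.

```python
from typing import List, Tuple

# (chromosome, start, end), 0-based and half-open; an assumed toy representation
Interval = Tuple[str, int, int]


def overlaps(read: Interval, target: Interval) -> bool:
    """Return True if the read shares at least one base with the target region."""
    r_chrom, r_start, r_end = read
    t_chrom, t_start, t_end = target
    return r_chrom == t_chrom and r_start < t_end and t_start < r_end


def on_target_rate(reads: List[Interval], targets: List[Interval]) -> float:
    """Fraction of reads overlapping any target region (0.0 when there are no reads)."""
    if not reads:
        return 0.0
    on_target = sum(
        1 for read in reads if any(overlaps(read, target) for target in targets)
    )
    return on_target / len(reads)


# Hypothetical toy data: two target regions and five aligned reads.
targets = [("chr1", 100, 200), ("chr2", 500, 600)]
reads = [
    ("chr1", 90, 140),   # overlaps the first target
    ("chr1", 150, 200),  # inside the first target
    ("chr2", 550, 610),  # overlaps the second target
    ("chr2", 590, 640),  # overlaps the second target
    ("chr3", 10, 60),    # off-target
]

rate = on_target_rate(reads, targets)
print(f"On-target rate: {rate:.0%}")  # prints "On-target rate: 80%"
```

In this toy case four of the five reads overlap a target, which corresponds to the 80% enrichment used as the example in the answer.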
References (Question 1):
– Guidelines for Validation of Next-Generation Sequencing-Based Oncology Panels: A Joint Consensus Recommendation of the Association for Molecular Pathology and College of American Pathologists. Jennings LJ, Arcila ME, Corless C, et al. The Journal of Molecular Diagnostics. 2017;19(3):341-365. doi:10.1016/j.jmoldx.2017.01.011.
– Comparison of Three Targeted Enrichment Strategies on the SOLiD Sequencing Platform. Hedges DJ, Guettouche T, Yang S, et al. PLoS One. 2011;6(4). doi:10.1371/journal.pone.0018595.
Not necessarily. Metagenomics involves the massive, untargeted sequencing of the genetic material present in a sample, allowing for the identification and analysis of all microorganisms or genetic components present, without the need for cultivation or prior knowledge. Although it is often associated with environmental samples, metagenomics is also widely applied in clinical samples, such as plasma and cerebrospinal fluid (CSF), for the detection of unknown viruses or other pathogens.
Examples of metagenomics in clinical samples:
– Metagenomics of plasma or cerebrospinal fluid: Used to identify unknown viruses, bacteria, or other pathogens in patients with infections of indeterminate origin.
– Human microbiome analyses: Studies investigating the composition of microbial communities in the gut, skin, mouth, and other parts of the body, relating them to diseases or health conditions.
Metagenomics can be considered premise-free sequencing, where the analysis is not directed toward a specific organism or gene. Instead, the goal is to obtain a comprehensive view of the genetic material present in the sample, allowing the data to reveal which organisms or sequences are present.
On the other hand, sequencing a specific gene using NGS is typically targeted sequencing or amplicon sequencing, which focuses on a particular region of the genome. This method requires prior knowledge about the gene or organism of interest and is not considered metagenomics, as it does not provide information about other organisms or sequences present in the sample.
Therefore, to be considered a metagenomic analysis, the sequencing must be comprehensive and untargeted, allowing for the identification of multiple organisms or genetic components without prior assumptions. Sequencing only a specific gene does not characterize a metagenomic approach.
References (Question 2):
– Advances in Metagenomics and Its Application in Environmental Microorganisms. Zhang L, Chen F, Zeng Z, et al. Frontiers in Microbiology. 2021;12:766364. doi:10.3389/fmicb.2021.766364.
– From Genomics to Metagenomics in the Era of Recent Sequencing Technologies. Benz S, Mitra S. Methods in Molecular Biology. 2023;2649:1-20. doi:10.1007/978-1-0716-3072-3_1.
– STROBE-metagenomics: A STROBE Extension Statement to Guide the Reporting of Metagenomics Studies. Bharucha T, Oeser C, Balloux F, et al. The Lancet Infectious Diseases. 2020;20(10). doi:10.1016/S1473-3099(20)30199-7.
– Recent Advances in Metagenomic Approaches, Applications, and Challenges. Lema NK, Gemeda MT, Woldesemayat AA. Current Microbiology. 2023;80(11):347. doi:10.1007/s00284-023-03451-5.
– Metagenomic Data Assembly – The Way of Decoding Unknown Microorganisms. Lapidus AL, Korobeynikov AI. Frontiers in Microbiology. 2021;12:613791. doi:10.3389/fmicb.2021.613791.
– What Is Metagenomics Teaching Us, and What Is Missed? New FN, Brito IL. Annual Review of Microbiology. 2020;74:117-135. doi:10.1146/annurev-micro-012520-072314.
Regions of low coverage in next-generation sequencing (NGS) results can be caused by various factors, as evidenced in the medical literature:
Sequence composition, especially GC content (Guanine/Cytosine):
GC-rich regions are notoriously difficult to sequence and often result in low coverage, lower base quality, and mapping difficulties, in addition to high strand bias, compromising the sensitivity in detecting variants.
Repetitive elements and segmental duplications are also associated with regions of low coverage.
Library preparation methodology:
The way the sequencing library is prepared can significantly influence coverage.
Certain library preparation kits are associated with GC content-dependent coverage bias, especially in bacterial species with low GC content.
DNA fragmentation methods, such as sonication, can introduce non-random biases that affect coverage uniformity.
Mapping limitations of short reads:
Difficulty in mapping short reads to specific regions of the genome can result in non-uniform coverage.
This can be a significant issue in whole-exome sequencing (WES) and whole-genome sequencing (WGS), where the complexity of the genome can hinder the precise alignment of reads.
Quality and quantity of the initial nucleic acid:
Low-quality nucleic acids or insufficient quantity can lead to inadequate coverage.
Efficiency of capture and amplification methods:
Amplification bias is a known limitation in PCR-based amplicon approaches, which can result in inferior performance of certain amplicons.
Thus, low coverage in NGS can be attributed to a combination of factors related to sequence composition, library preparation methods, read mapping, nucleic acid quality, and the efficiency of capture and amplification methods.
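As a rough illustration of how two of these factors can be inspected together, the sketch below computes the GC fraction of each region and flags regions whose mean depth falls below a threshold. It assumes that per-base depths and reference sequences are already available as plain Python values. The region names, sequences, depths, and the threshold of 20 are hypothetical toy values, not recommended cut-offs.

```python
from typing import Dict, List, Tuple


def gc_fraction(sequence: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def mean_depth(depths: List[int]) -> float:
    """Mean per-base depth of a region (0.0 for an empty region)."""
    return sum(depths) / len(depths) if depths else 0.0


def flag_low_coverage(
    regions: Dict[str, Tuple[str, List[int]]], min_mean_depth: float
) -> List[str]:
    """Return the names of regions whose mean depth is below the threshold."""
    return [
        name
        for name, (_, depths) in regions.items()
        if mean_depth(depths) < min_mean_depth
    ]


# Hypothetical toy data: region name -> (reference sequence, per-base depth).
regions = {
    "region_A": ("ATGCATTAGC", [40, 42, 45, 41, 39, 44, 43, 40, 42, 44]),
    "region_B": ("GCGCGGCCGC", [8, 6, 5, 7, 9, 4, 6, 5, 7, 3]),
}

for name, (sequence, depths) in regions.items():
    print(f"{name}: GC={gc_fraction(sequence):.0%}, mean depth={mean_depth(depths):.1f}")

low = flag_low_coverage(regions, min_mean_depth=20)  # assumed threshold
print("Low-coverage regions:", low)
```

With these toy values only the GC-rich region_B is flagged. On real data, a table of this kind helps separate sequence-composition effects from other causes such as mapping difficulty or poor input material.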
References (Question 3):
– Standards and Guidelines for Validating Next-Generation Sequencing Bioinformatics Pipelines: A Joint Recommendation of the Association for Molecular Pathology and the College of American Pathologists. Roy S, Coldren C, Karunamurthy A, et al. The Journal of Molecular Diagnostics. 2018;20(1):4-27. doi:10.1016/j.jmoldx.2017.11.003.
– Coverage Analysis in a Targeted Amplicon-Based Next-Generation Sequencing Panel for Myeloid Neoplasms. Yan B, Hu Y, Ng C, et al. Journal of Clinical Pathology. 2016;69(9):801-804. doi:10.1136/jclinpath-2015-203580.
– The Efficiency of Tagmentation Depends on G and C Bases in the Binding Motif Leading to Uneven Coverage in Bacterial Species With Low and Neutral GC-content. Segerman B, Ástvaldsson Á, Mustafa L, Skarin J, Skarin H. Frontiers in Microbiology. 2022;13:944770. doi:10.3389/fmicb.2022.944770.
– Novel Metrics to Measure Coverage in Whole Exome Sequencing Datasets Reveal Local and Global Non-Uniformity. Wang Q, Shashikant CS, Jensen M, Altman NS, Girirajan S. Scientific Reports. 2017;7(1):885. doi:10.1038/s41598-017-01005-x.
– Non-Random DNA Fragmentation in Next-Generation Sequencing. Poptsova MS, Il’icheva IA, Nechipurenko DY, et al. Scientific Reports. 2014;4:4532. doi:10.1038/srep04532.
– Systematic Dissection of Biases in Whole-Exome and Whole-Genome Sequencing Reveals Major Determinants of Coding Sequence Coverage. Barbitoff YA, Polev DE, Glotov AS, et al. Scientific Reports. 2020;10(1):2057. doi:10.1038/s41598-020-59026-y.
As promised, here are the main references addressing validation and verification protocols in NGS:
– Validation and Benchmarking of Targeted Panel Sequencing for Cancer Genomic Profiling. Wang D, Wang S, Zhang Y, et al. American Journal of Clinical Pathology. 2023;160(5):507-523. doi:10.1093/ajcp/aqad078.
– Standards and Guidelines for Validating Next-Generation Sequencing Bioinformatics Pipelines: A Joint Recommendation of the Association for Molecular Pathology and the College of American Pathologists. Roy S, Coldren C, Karunamurthy A, et al. The Journal of Molecular Diagnostics. 2018;20(1):4-27. doi:10.1016/j.jmoldx.2017.11.003.
– Assembling and Validating Bioinformatic Pipelines for Next-Generation Sequencing Clinical Assays. SoRelle JA, Wachsmann M, Cantarel BL. Archives of Pathology & Laboratory Medicine. 2020;144(9):1118-1130. doi:10.5858/arpa.2019-0476-RA.
– Comprehensive Analysis to Improve the Validation Rate for Single Nucleotide Variants Detected by Next-Generation Sequencing. Park MH, Rhee H, Park JH, et al. PLoS One. 2014;9(1). doi:10.1371/journal.pone.0086664.
– Human Papillomavirus Detection by Whole-Genome Next-Generation Sequencing: Importance of Validation and Quality Assurance Procedures. Mühr LSA, Guerendiain D, Cuschieri K, Sundström K. Viruses. 2021;13(7):1323. doi:10.3390/v13071323.
– Minimal Requirements for ISO15189 Validation and Accreditation of Three Next Generation Sequencing Procedures for SARS-CoV-2 Surveillance in Clinical Setting. Maschietto C, Otto G, Rouzé P, et al. Scientific Reports. 2023;13(1):6934. doi:10.1038/s41598-023-3408.