Next-Generation Sequencing Technologies and Their Impact on Genome Analysis

The discovery of the double-helix structure of DNA, built from four bases (A, T, C, and G), by Watson and Crick in 1953 revolutionized our understanding of genetic sequences and the composition of DNA in living organisms. This breakthrough paved the way for decoding genomic sequences and deciphering the genetic code underlying all forms of life. DNA sequencing has since provided invaluable insights into the diagnosis, treatment, and prevention of genetic disorders.

For decades, the most commonly used sequencing technologies in biology were Sanger and Maxam-Gilbert sequencing. In 2005, Roche’s 454 technology emerged, heralding a new era of sequencing that offered higher throughput and lower costs than the earlier methods.

These newer technologies, commonly referred to as Next-Generation Sequencing (NGS) or high-throughput sequencing, have opened up new avenues for exploring and analyzing genomes. They enable the analysis of multiple samples in a massively parallel manner, with high throughput and at a significantly reduced cost.

Next-Generation Sequencing (NGS) Technologies

The first NGS technologies, including pyrosequencing (such as 454), sequencing by ligation (such as SOLiD), and sequencing by synthesis (such as Illumina and Helicos), made it possible to analyze multiple samples in a massively parallel manner with high throughput. Since the advent of NGS, the cost of genome sequencing has decreased significantly. For example, sequencing the first human genome cost approximately $500 million using capillary sequencing, and by 2010 the cost had dropped to around $10 million using the same method. The introduction of NGS has further lowered the cost to approximately $30,000 per genome.

A Brief Overview of Genome Analysis

The initial step in analyzing NGS data is to map and align the short reads generated during sequencing to a reference sequence. This task presents several computational challenges, such as managing the large volume of reads, handling reads that map to more than one location, and accounting for variations in base quality. To overcome these challenges, efficient algorithms and programs have been developed that accurately map NGS reads to the reference genome.
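The mapping step can be illustrated with a minimal seed-and-verify sketch. It is not the method of any particular published mapper: the function names, the toy reference, the seed length `k`, and the mismatch threshold are all assumptions chosen for illustration, and base qualities are ignored.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read, reference, index, k, max_mismatches=1):
    """Return sorted (start, mismatches) placements of one read.

    Every k-mer of the read is used as a seed; each candidate start is then
    verified by counting mismatches over the full read length.
    """
    candidates = set()
    for offset in range(len(read) - k + 1):
        for hit in index.get(read[offset:offset + k], []):
            start = hit - offset
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)

    placements = []
    for start in sorted(candidates):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            placements.append((start, mismatches))
    return placements


# Toy reference (hypothetical) containing the repeat "ACGTACG" twice.
reference = "ACGTACGTTAGCCGATACGTACGA"
index = build_kmer_index(reference, k=4)

unique_hits = map_read("TAGCCGAT", reference, index, k=4)    # one placement
mismatch_hits = map_read("TAGCCGTT", reference, index, k=4)  # one substitution
repeat_hits = map_read("ACGTACG", reference, index, k=4)     # non-unique mapping
```

A read that returns more than one placement, as the last one does, is an instance of the non-unique mapping problem described above. Production mappers additionally weigh base qualities and report a mapping quality score for each read.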

When short-read technologies are used for de novo genome assembly, the resulting assemblies tend to be highly fragmented, because assembly quality decreases as read length decreases. In contrast, NGS technologies have proven highly effective for genome resequencing, which allows genome variation and diversity to be characterized. This approach has the potential to be a powerful tool for informing clinical practice.

Complete genome sequencing involves fragmenting the entire genomic DNA into smaller pieces, sequencing each fragment individually, and then assembling the sequenced fragments to generate the whole genome sequence. The ability of NGS technologies to identify disease-causing genes effectively has been validated.
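The fragment-then-assemble idea can be sketched with a simple greedy overlap merge. This is an illustrative toy, not the algorithm used by production assemblers; the function names, the toy genome, and the minimum overlap length are assumptions.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that is also a prefix of b.

    Returns 0 if no overlap of at least min_len bases exists.
    """
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the longest overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates
    while True:
        # Discard fragments fully contained in another fragment.
        contigs = [f for f in contigs
                   if not any(f != g and f in g for g in contigs)]
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [f for idx, f in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


# Toy genome (hypothetical) and three overlapping fragments of it.
genome = "ATGGCGTACCTGAAGTTCA"
fragments = [genome[10:19], genome[0:8], genome[5:13]]
assembled = greedy_assemble(fragments)

# Without the middle fragment there is no overlap to bridge the gap,
# so the result stays split into two contigs.
gapped = greedy_assemble([genome[0:8], genome[10:19]])
```

The second call shows in miniature why assemblies fragment: when reads do not overlap enough to bridge a region, the contigs on either side cannot be joined. Repeats longer than the read length cause the same problem in real genomes.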

Impact of NGS Technologies on Genome Analysis

Technological advancements have often played a key role in the discovery of new disease genes. Early studies relied on families where the disease was present to identify the genetic causes of the phenotype. These studies, known as linkage analyses, were particularly successful for highly penetrant, monogenic diseases like cystic fibrosis. Standard parametric linkage studies were also effective for some complex traits, particularly when analyzing families with extreme phenotypic distributions. For instance, analyzing families with early-onset Alzheimer’s disease led to the discovery of multiple genes that significantly contribute to the disease phenotype and provided insight into the underlying biological mechanisms, such as plaque formation during disease progression.

The human genome consists of around 3 billion nucleotides, of which only about 1% code for proteins. Mutations in protein-coding genes can lead to loss or gain of protein function, which can disrupt homeostasis and cause cancer. Driver mutations give cells a growth or survival advantage, while passenger mutations are randomly dispersed throughout the genome and do not have an immediate effect on the phenotype. In 80% of cancer cases, the disease is multifactorial and not Mendelian, with somatic mutations found in associated genes. In the remaining 20%, germline mutations are identified. NGS has been crucial in identifying numerous cancer-associated gene candidates. However, bioinformatic tools can only prioritize novel mutations and genes for functional testing and should be regarded as predictors rather than validators. Experimental validation is necessary to confirm the role of mutations as drivers of tumorigenesis.

NGS has enabled the development of personalized, or precision, medicine (PM), which is based on an individual’s genomic profile. This approach allows for a more accurate and effective treatment strategy tailored to the individual.

NGS technologies have had a significant impact on rare genetic diseases, as demonstrated by the increasing number of entries in the Online Mendelian Inheritance in Man (OMIM) database. Since 2007, the number of inherited phenotypes with a known molecular basis has almost doubled, and the number of genes associated with rare diseases has grown substantially. However, for many disorders, there is still much to learn about the underlying molecular and pathological mechanisms. Further research will be necessary to fully understand the relationship between genotype and phenotype.

NGS technologies have not only enabled the study of inherited variation but also the investigation of mutational processes that occur in humans from one generation to the next, at the resolution of individual base pairs. Family-based whole-genome sequencing studies have shown that each individual’s genome contains approximately 74 germline de novo mutations (DNMs).
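How a family-based study flags candidate de novo mutations can be sketched as a Mendelian consistency check on trio genotypes. The genotype encoding, the site coordinates, and the function names below are hypothetical; real pipelines work from variant call files and must also filter out sequencing and genotyping errors, which this sketch does not attempt.

```python
def is_mendelian_consistent(child, mother, father):
    """True if the child's two alleles can be drawn one from each parent."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def candidate_de_novo_sites(trio_genotypes):
    """Return the sites where the child's genotype cannot be inherited."""
    return sorted(
        site
        for site, (child, mother, father) in trio_genotypes.items()
        if not is_mendelian_consistent(child, mother, father)
    )


# Hypothetical genotypes: site -> (child, mother, father), each a pair of alleles.
trio_genotypes = {
    ("chr1", 1050): (("A", "A"), ("A", "A"), ("A", "G")),  # inherited
    ("chr1", 2310): (("C", "T"), ("C", "C"), ("C", "C")),  # T is in neither parent
    ("chr2", 877): (("G", "T"), ("G", "G"), ("T", "T")),   # inherited
}

candidates = candidate_de_novo_sites(trio_genotypes)
```

Sites flagged this way are only candidates: an apparent Mendelian inconsistency can also arise from a miscalled genotype in any of the three samples, so each one needs independent validation.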

NGS has also been applied to cancer detection: gene-specific assays of DNA from Papanicolaou (Pap) test samples can be sequenced to detect ovarian or endometrial cancer.

Advances in NGS have made genome-wide single-cell analysis possible, although the yield per cell, which currently stands at 30% to 70% of all RNA or DNA present, could still be improved. Single-cell analysis has revealed that individual cells exhibit significant genomic and transcriptomic heterogeneity in both normal development and disease.

Future Directions in NGS

The growing accessibility of NGS and the eagerness to develop new targeted therapies have given rise to multi-arm, biomarker-driven basket and umbrella trials that aim to identify efficacy signals for several biomarkers simultaneously. Examples include the National Lung Matrix trial for non-small cell lung cancer (NSCLC) and the FOCUS-4 trial for colorectal cancer.

NGS systems have two distinct characteristics, a significant decrease in the time required for analysis and a substantial increase in accuracy, that have made them valuable tools in diagnostics, prognostics, and the prediction of variations in the human genome. As a result, NGS methods are now used extensively for these purposes.

Conclusion

One of the major hurdles faced by NGS approaches is the absence of uniform procedures for managing quality, sequencing workflows, handling sequencing data, and analyzing it. The lack of standardization can make it difficult to compare results between different studies and can lead to inconsistencies in data interpretation. This issue can be particularly challenging when attempting to use NGS for clinical applications, where accuracy and reliability are critical. Therefore, the establishment of standardized protocols and guidelines is necessary to address these challenges and ensure the consistent and reliable use of NGS approaches.

References

Watson JD, Crick FH (1953). Molecular structure of nucleic acids: A structure for deoxyribose nucleic acid. Nature 171: 737-738.
Le Tourneau, et al. (2015). Pan-cancer integrative molecular portrait towards a new paradigm in precision medicine. Springer.
Qiang-long Z, Shi L, Peng G, Fei-shi L (2014). High-throughput sequencing technology and its application. Journal of Northeast Agricultural University 21: 84-96.
Mardis ER (2011). A decade’s perspective on DNA sequencing technology. Nature 470: 198-203.
Margulies M, Egholm M, et al. (2005). Genome sequencing in microfabricated high-density picolitre reactors. Nature 437: 376-380.
Valouev A, et al. (2008). A high-resolution, nucleosome position map of C. elegans reveals a lack of universal sequence-dictated positioning. Genome Res 18: 1051-1063.
Bentley DR, et al. (2006). Whole-genome re-sequencing. Curr Opin Genet Dev 16: 545-552.
Li H, Ruan J, Durbin R (2008). Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res 18: 1851-1858.
Whiteford N, Haslam N, Weber G, et al. (2005). An analysis of the feasibility of short read sequencing. Nucleic Acids Res 33: e171.
Forde BM, et al. (2013). Next-generation sequencing technologies and their impact on microbial genomics. Briefings in Functional Genomics 12(5): 440-453.
Ranjan K (2016). Application of Molecular and Serological Diagnostics in Veterinary Parasitology. The Journal of Advances in Parasitology 2(4): 80-99.
Roach JC, Glusman G, et al. (2010). Analysis of Genetic Inheritance in a Family Quartet by Whole-Genome Sequencing. Science 328: 636-639.
Goate A, Chartier-Harlin, et al. (1991). Segregation of a missense mutation in the amyloid precursor protein gene with familial Alzheimer’s disease. Nature 349: 704-706.
Pertea M, Shumate, et al. (2018). A new human gene catalog curated from thousands of large-scale RNA sequencing experiments reveals extensive transcriptional noise. Genome Biol 19: 208.
Shin SH, Bode AM, Dong Z (2017). Precision medicine: The foundation of future cancer therapeutics. npj Precis Oncol 1: 12.
McKusick VA (2007). Mendelian Inheritance in Man and its online version, OMIM. Am J Hum Genet 80: 588-604.
Conrad DF, et al.; 1000 Genomes Project (2011). Variation in genome-wide mutation rates within and between human families. Nat Genet 43: 712-714.
Kinde I, Bettegowda, et al. (2013). Evaluation of DNA from the Papanicolaou test to detect ovarian and endometrial cancers. Sci Transl Med 5: 167ra164.
Bernards R (2014). Finding effective cancer therapies through loss of function genetic screens. Curr Opin Genet Dev 24: 23-29.
Editorial (2009). Metagenomics versus Moore’s law. Nat Methods 6: 623.
Next-generation sequencing technologies and their impact on microbial genomics – Scientific Figure on ResearchGate. Available from: https://www.researchgate.net/figure/The-two-main-assay-based-applications-of-NGS-technologies-A-ChIP-seq-ChIP-is-combined_fig3_234123197
Goswami RS, Luthra R, Singh RR, et al. (2016). Identification of factors affecting the success of next-generation sequencing testing in solid tumors. American Journal of Clinical Pathology 145(2): 222-237.