Overview of Sequencing Technologies

Genome sequencing is essential for linking genotypes with phenotypes, and sequencing technologies have numerous applications in the life sciences, including functional genomics, oncology, and evolutionary biology. Sequencing technologies are commonly divided into three generations, with Sanger sequencing being the first and providing the basis for those that followed. Next-generation sequencing (NGS) methods have overcome some of the limitations of Sanger sequencing but still have drawbacks, such as lower raw accuracy and shorter read lengths. Whatever the platform, sequencing a genome follows a series of steps, from sample preparation to the finished sequence.

Selection of Parameters for Genome Sequencing

When selecting a sequencing method, the important parameters to consider include read length, cost per read, raw accuracy, and the availability of mate-paired reads. Longer reads facilitate assembly and reduce assembly errors, while a lower cost per read makes it economical to generate more sequence. Raw accuracy is important for quality control and results in better assemblies and higher-quality finished sequences. Mate-paired reads, which are separated by a known distance in the genome of origin, are crucial for de novo genome assembly and for resolving repeat regions.
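The trade-off between read length and cost per read can be made concrete with a little arithmetic. The sketch below is only illustrative: the function names and all the numbers (genome size, read count, price) are assumptions chosen for the example, not figures for any real platform.

```python
def coverage(num_reads, read_length, genome_size):
    """Average depth of coverage: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size


def cost_per_base(cost_per_read, read_length):
    """Cost of one sequenced base, given the cost of one read."""
    return cost_per_read / read_length


# Hypothetical project: a 5 Mb genome sequenced with one million 100-base reads.
depth = coverage(num_reads=1_000_000, read_length=100, genome_size=5_000_000)
print(f"average coverage: {depth:.0f}x")

# Hypothetical pricing: the same cost per read buys cheaper bases when reads are longer.
print(cost_per_base(cost_per_read=0.5, read_length=100))
print(cost_per_base(cost_per_read=0.5, read_length=400))
```

At a fixed cost per read, quadrupling the read length cuts the cost per base to a quarter, which is why the two parameters are usually weighed together.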

Preprocessing and Post-processing Protocols

Preprocessing involves preparing the sample, constructing the library, eliminating low-quality and contaminant sequences, and removing unwanted features. Post-processing uses software and online tools to assemble short-read sequencing data, eliminate contig chimeras, and produce finished sequences. Both sets of protocols are essential for ensuring the accuracy and quality of the sequencing data.
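As an illustration of the low-quality filtering step, the sketch below drops reads whose mean base quality falls under a threshold. It assumes quality strings in the Phred+33 encoding used by FASTQ files; the threshold of 20 and the function names are assumptions made for this example, not a recommended pipeline.

```python
def phred_scores(quality_line, offset=33):
    """Convert a FASTQ quality string to numeric Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def mean_quality(quality_line):
    """Mean Phred score of one read; an empty read counts as quality 0."""
    scores = phred_scores(quality_line)
    return sum(scores) / len(scores) if scores else 0.0


def filter_reads(records, min_mean_quality=20):
    """Keep (name, sequence, quality) records whose mean quality meets the threshold."""
    return [rec for rec in records if mean_quality(rec[2]) >= min_mean_quality]


# Two toy reads: 'I' encodes Phred 40 (high quality), '!' encodes Phred 0.
reads = [
    ("read1", "ACGT", "IIII"),
    ("read2", "ACGT", "!!!!"),
]
print(filter_reads(reads))  # only read1 survives the filter
```

Real preprocessing tools typically also trim low-quality ends and screen for contaminants; this sketch covers only the whole-read quality filter.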

Sequencing at Different Generations

First-Generation Technologies

Despite the development of newer sequencing techniques, first-generation methods such as Maxam-Gilbert sequencing and Sanger sequencing are still used in studies where accuracy is of utmost importance, such as verifying synthetic oligonucleotides and individual gene targets. These older methods may be slower and more expensive, but they remain reliable and trusted in certain applications. Maxam-Gilbert sequencing involves cleaving DNA at specific nucleotides, while Sanger sequencing is based on the synthesis of DNA strands.

The Maxam-Gilbert technique, developed in 1977, uses chemical reagents to cleave existing DNA molecules at specific bases. Double-stranded DNA is end-labeled with radioactive phosphorus, and single-stranded DNA is then obtained through denaturation or restriction digestion. The labeled DNA is divided into four samples, each treated with a different chemical reagent, and the sequence is determined by reading the pattern of bands on a gel. However, the technique is unsuitable for large-scale DNA sequencing because it relies on hazardous chemicals and suffers from incomplete reactions.
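How a sequence is read off the four lanes can be sketched in a few lines. The example assumes the classic set of four cleavage reactions (G, A+G, C+T, and C), which the text above does not name, and it idealizes the gel: every position is cleaved, and even a zero-length labeled fragment is treated as visible.

```python
# Bases cleaved by each of the four chemical reactions (assumed classic set).
REACTIONS = {"G": "G", "A+G": "AG", "C+T": "CT", "C": "C"}


def cleave(sequence):
    """Return, for each lane, the set of end-labeled fragment lengths it contains.

    Cleaving at index i of a 5'-labeled strand leaves a labeled fragment of length i.
    """
    lanes = {name: set() for name in REACTIONS}
    for index, base in enumerate(sequence):
        for name, targets in REACTIONS.items():
            if base in targets:
                lanes[name].add(index)
    return lanes


def read_gel(lanes, length):
    """Read bands from shortest to longest fragment and call one base per band."""
    calls = []
    for size in range(length):
        if size in lanes["G"]:
            calls.append("G")      # band in both G and A+G lanes
        elif size in lanes["A+G"]:
            calls.append("A")      # band in A+G lane only
        elif size in lanes["C"]:
            calls.append("C")      # band in both C and C+T lanes
        elif size in lanes["C+T"]:
            calls.append("T")      # band in C+T lane only
        else:
            calls.append("N")      # no band: base cannot be called
    return "".join(calls)


print(read_gel(cleave("GATTACA"), 7))  # GATTACA
```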

Sanger sequencing, also known as chain-termination sequencing, was introduced by Frederick Sanger and colleagues in 1977, building on their plus-and-minus method of 1975. It uses dideoxynucleoside triphosphates (ddNTPs), which lack a 3′ hydroxyl group, to terminate the growing chain. The resulting fragments of different lengths are separated on a polyacrylamide gel. Later modifications, such as dye-terminator sequencing and capillary sequencers, have increased its accuracy and speed.
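The logic of chain termination can be shown with a toy simulation. This is a minimal sketch under idealized assumptions: exactly one terminated fragment is produced for every position of the newly synthesized strand, and the function names are invented for the example.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def synthesized_strand(template):
    """Strand built by the polymerase, written 5'->3' (reverse complement of the template)."""
    return "".join(COMPLEMENT[base] for base in reversed(template))


def terminated_fragments(template):
    """One (length, terminal base) pair per position where a ddNTP stopped synthesis."""
    strand = synthesized_strand(template)
    return [(index + 1, base) for index, base in enumerate(strand)]


def read_fragments(fragments):
    """Sort fragments by length, as a gel or capillary does, and read the terminal bases."""
    return "".join(base for _, base in sorted(fragments))


fragments = terminated_fragments("AAGC")
print(fragments)                  # [(1, 'G'), (2, 'C'), (3, 'T'), (4, 'T')]
print(read_fragments(fragments))  # GCTT
```

Sorting by fragment length is the computational counterpart of electrophoresis: the shortest fragment ends at the first incorporated base, the next shortest at the second, and so on.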

Next-Generation Sequencing Technologies

Next-generation sequencing (NGS) has largely replaced first-generation techniques because it is faster and more cost-effective. NGS techniques can be grouped into categories such as cyclic-array sequencing, sequencing by hybridization, microelectrophoretic methods, and real-time sequencing of single molecules. NGS platforms generate hundreds of millions of sequencing reads in parallel, eliminate cumbersome and time-consuming steps, and dramatically decrease the effective reagent volume per reaction. Combined with bioinformatics and computational biology tools, NGS has brought a massive increase in data generation rates, producing far more data at lower cost and in less time than conventional sequencing.

Second-Generation Sequencing Techniques

Second-generation sequencing includes Roche 454 pyrosequencing, Solexa technology by Illumina, and sequencing by ligation on the ABI SOLiD platform. Pyrosequencing, used in 454, detects the pyrophosphate released each time a nucleotide is incorporated during DNA synthesis and converts it into a light signal. Duplex DNA is sheared into fragments, and adapters that serve as primer-binding sites are ligated to both ends. Each fragment is bound to a microbead, and emulsion PCR produces many copies of that DNA molecule per bead. The beads are deposited in the wells of a fiber-optic plate, one amplified bead per well, together with immobilized enzymes, and the four dNTPs are then supplied one at a time. When a nucleotide is incorporated, the released pyrophosphate is converted to ATP, which generates a light signal proportional to the amount of ATP produced. A CCD camera captures the signal intensity, and the enzyme apyrase degrades the remaining nucleotides and ATP before the next nucleotide is added. 454’s read length of about 400 bases and its ability to generate more than 1,000,000 individual reads per run make it suitable for de novo assembly and metagenomics.
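The one-nucleotide-at-a-time flow and the proportional light signal can be sketched as a simple flowgram simulation. The flow order "TACG", the `max_flows` cap, and the noise-free integer signals are assumptions made for illustration; real signals are noisy, which is what makes long homopolymers hard to call.

```python
def flowgram(strand, flow_order="TACG", max_flows=100):
    """Simulate nucleotide flows over the strand being synthesized.

    Each flow yields (nucleotide, signal), where the idealized signal equals the
    number of consecutive bases incorporated during that flow.
    """
    signals = []
    position = 0
    flow = 0
    while position < len(strand) and flow < max_flows:
        nucleotide = flow_order[flow % len(flow_order)]
        run = 0
        # A homopolymer run is incorporated within a single flow.
        while position < len(strand) and strand[position] == nucleotide:
            run += 1
            position += 1
        signals.append((nucleotide, run))
        flow += 1
    return signals


def call_bases(signals):
    """Turn idealized flow signals back into a base sequence."""
    return "".join(nucleotide * count for nucleotide, count in signals)


signals = flowgram("TTACGG")
print(signals)              # [('T', 2), ('A', 1), ('C', 1), ('G', 2)]
print(call_bases(signals))  # TTACGG
```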

The Illumina (Solexa) Genome Analyzer

The Solexa platform, developed by British chemists Shankar Balasubramanian and David Klenerman, uses sequencing by synthesis with cyclic reversible termination (CRT). It can generate 50-60 million reads with an average length of 40-50 bases in a single run. Genomic DNA is fragmented, and adapters are ligated to both ends. The ligated fragments are loaded onto a glass slide (flow cell), where they hybridize to complementary oligos. Bridge amplification then generates dense clusters of duplex DNA, and the slide is ready for sequencing. Illumina’s system is known for producing a larger quantity of data in less time and at a lower cost than Sanger sequencing and other NGS systems.
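The read counts and read lengths quoted above translate directly into a per-run yield. The short calculation below uses only those figures; the function name is an assumption for the example.

```python
def run_yield_bases(num_reads, mean_read_length):
    """Total bases produced in one run: number of reads times mean read length."""
    return num_reads * mean_read_length


low = run_yield_bases(50_000_000, 40)    # fewest reads, shortest reads
high = run_yield_bases(60_000_000, 50)   # most reads, longest reads
print(f"{low / 1e9:.1f} to {high / 1e9:.1f} gigabases per run")
# prints: 2.0 to 3.0 gigabases per run
```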

The ABI SOLiD platform, developed in 2005, uses hybridization-ligation steps with di-base probes carrying fluorescent dyes. Genomic DNA is sheared and ligated with adaptors, then clonally amplified onto beads using emulsion PCR. The beads are attached to a glass slide and provided with universal primers, ligase, and di-base probes. Sequencing proceeds in repeated rounds of ligation (five in total), each started from a primer offset by one base from the previous round, with signals for specific nucleotide positions detected in each round. The first round detects signals for nucleotides 1-2, 6-7, 11-12, etc. The second round detects signals for nucleotides 0-1, 5-6, 10-11, etc. The third round detects signals for nucleotides (-1)-0, 4-5, 9-10, etc., and so on. Because the rounds overlap, each base is detected twice, which gives the platform very high accuracy among NGS systems.
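Di-base probing means that each signal reports on a pair of adjacent bases rather than a single base. A minimal sketch of such a two-base ("color space") code follows; it assumes the commonly described four-color scheme, which can be written as an XOR of 2-bit base codes. The numeric color labels and function names are choices made for this example.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def encode_colors(sequence):
    """One color (0-3) per adjacent base pair, so each inner base affects two colors."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(sequence, sequence[1:])]


def decode_colors(first_base, colors):
    """Recover the bases from a known first base and the list of colors."""
    bases = [first_base]
    for color in colors:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ color])
    return "".join(bases)


colors = encode_colors("ATGGC")
print(colors)                      # [3, 1, 0, 3]
print(decode_colors("A", colors))  # ATGGC
```

Because every base contributes to two neighboring colors, a true single-base variant changes two adjacent colors, whereas an isolated measurement error changes only one; this is the property that lets two-base encoding separate errors from real differences.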

The Polonator G.007 is an affordable, high-performance DNA sequencer developed by Dover and the Church Lab at Harvard Medical School. It uses sequencing by ligation, in which single-base probes tagged with fluorophores identify each base. Genomic DNA is sheared, and emulsion PCR is used to amplify the templates into polonies. Enriched polonies are loaded into the flow cell and read by cyclic-array sequencing, allowing millions of short reads to be generated in parallel. The Polonator’s open-source software and protocols make it a popular choice for researchers seeking cost-effective sequencing.

Second-generation DNA sequencing technologies have several disadvantages, including shorter read lengths and lower raw accuracy than Sanger sequencing. Some have difficulty resolving homopolymer-containing DNA segments, and most rely on cumbersome emulsion PCR. These limitations are expected to ease as the technologies continue to be refined.

Third-Generation Technologies

Ion Torrent sequencing, introduced by Life Technologies in 2010, is a high-throughput technology that detects the hydrogen ions released when DNA polymerase forms a covalent bond during nucleotide incorporation. Microwells on a semiconductor chip, each holding template DNA and DNA polymerase, are flooded with one type of unmodified deoxynucleoside triphosphate (dNTP) at a time. If the nucleotide is incorporated into the growing chain, hydrogen ions are released. The resulting decrease in pH is detected by an ion-sensitive field-effect transistor (ISFET) positioned beneath each microwell, which registers the change as a potential difference and records the incorporation event. Unbound dNTP molecules are washed out before the next cycle begins, and the process is repeated.

DNA Nanoball Sequencing – In 2010, Complete Genomics Inc introduced combinatorial probe-anchor ligation (cPAL), a high-throughput third-generation sequencing approach based on unchained ligation. It uses rolling circle amplification to copy small target DNA sequences into clonal “nanoballs”. The DNA nanoballs are attached at random to a dense array, and anchor and detection probes are used for sequencing. Each base call is unchained, that is, independent of the calls before it, which improves sequence quality.

This approach reduces errors in reading repeat regions and eliminates the need for extensive computation. Its introduction dramatically reduced the cost of sequencing a genome, from about one million dollars in 2008 to $4,400 in 2010.

SMRT sequencing by Pacific Biosciences is a single-molecule real-time method in which each type of nucleotide carries a distinct fluorescent tag on its terminal phosphate. The reaction takes place in a nanophotonic visualization chamber called a zero-mode waveguide (ZMW). DNA polymerase and a single-stranded DNA template sit in the detection zone at the bottom of each ZMW. When the polymerase incorporates a nucleotide, the tag emits a light pulse that is recorded by the instrument, and the tag is then cleaved off.

Nanopore Sequencing – Nanopores are nanometer-sized channels that can be biological, solid-state, or hybrid. Oxford Nanopore Technologies licensed the technology in 2008. Nanopore DNA sequencing does not require labeling of nucleotides; instead, it measures the modulation of the ionic current as a DNA molecule passes through the pore. Different nucleotides block the current to different degrees, so measuring the magnitude and duration of each blockage can reveal the sequence of the molecule. The technique has the potential for rapid DNA sequencing.
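The idea of reading bases from current blockages can be illustrated with a deliberately simplified base caller. Everything numeric here is hypothetical: the reference current levels are invented for the example, the trace is assumed to be already split into one event per base, and each base is assumed to produce its own level, whereas in real pores several neighboring bases shape the signal together.

```python
from statistics import mean

# Hypothetical mean current level (in picoamperes) for each base; not measured values.
REFERENCE_LEVELS = {"A": 50.0, "C": 60.0, "G": 70.0, "T": 80.0}


def call_base(samples):
    """Call the base whose reference level is closest to the event's mean current."""
    level = mean(samples)
    return min(REFERENCE_LEVELS, key=lambda base: abs(REFERENCE_LEVELS[base] - level))


def call_sequence(events):
    """Call one base per pre-segmented event (a list of current samples)."""
    return "".join(call_base(samples) for samples in events)


# Three toy events, each a short run of noisy current samples.
events = [
    [49.5, 50.5],
    [79.0, 81.0, 80.5],
    [61.0, 59.0],
]
print(call_sequence(events))  # ATC
```

Production base callers work on raw, unsegmented traces with statistical or neural models; the nearest-level rule is used here only to show how current magnitude maps to sequence.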

Fourth-Generation Sequencing

In situ sequencing has been developed to allow direct reading of nucleic acid composition in fixed cells and tissues using second-generation NGS chemistry. The first in situ sequencing of mRNA was demonstrated with a targeted method for sequencing short nucleotide sequences in breast cancer tissue sections. cDNA was generated in situ, padlock probes were used to capture a short target sequence of four to six bases, and the circularized probe was clonally amplified thousands of times by rolling circle amplification (RCA). Finally, the target region was read using sequencing-by-ligation chemistry, which was developed by Drmanac and colleagues.
