{"id":8741,"date":"2023-04-17T14:22:26","date_gmt":"2023-04-17T14:22:26","guid":{"rendered":"https:\/\/www.mybiosource.com\/learn\/?page_id=8741"},"modified":"2023-04-17T14:25:41","modified_gmt":"2023-04-17T14:25:41","slug":"from-dna-to-data-the-process-of-genome-sequencing","status":"publish","type":"page","link":"https:\/\/www.mybiosource.com\/learn\/from-dna-to-data-the-process-of-genome-sequencing\/","title":{"rendered":"From DNA to Data: The Process of Genome Sequencing"},"content":{"rendered":"<p>Bioinformatics now extends well beyond the analysis of genome sequence data. It is used to study <span id=\"urn:enhancement-6d183c43-d28f-497a-8696-583db331b9bd\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/gene\">gene<\/span> variation and <span id=\"urn:enhancement-a3be6061-788a-47d5-90f3-55c268cbdccc\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/expression\">expression<\/span>; predict and analyze <span id=\"urn:enhancement-e8027110-e319-4bae-af2b-11ed1e4a34bb\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/gene\">gene<\/span> and protein structure and function; infer <span id=\"urn:enhancement-33411935-dd2a-4982-95b9-a5e9194481ab\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/gene\">gene<\/span> regulatory networks; build simulation environments for whole-cell modeling; model complex <span id=\"urn:enhancement-6822f262-2b58-482c-8f92-a71b2b058703\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/gene\">gene<\/span> regulatory dynamics and networks; and visualize and analyze molecular pathways to gain insight into <span id=\"urn:enhancement-59ce04e2-9bfa-4a3e-85cc-b75018851df5\" 
class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/gene\">gene<\/span>-disease interactions.<\/p>\n<h4><strong><u>Methods Used to Create a Library (DNA to Data)<\/u><\/strong><\/h4>\n<p><strong>Sample Collection and DNA Extraction:<\/strong> Genomic DNA isolation is a fundamental step in many <span id=\"urn:enhancement-50c4d0d6-a484-4e08-858d-bc2763a2562b\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/molecular-biology\">molecular biology<\/span> experiments, including genome sequencing, <span id=\"urn:enhancement-5372fb07-42d2-4bef-85b7-510f5965cbeb\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/gene\">gene<\/span> <span id=\"urn:enhancement-645a2a71-151d-41c1-ac70-11c1c96890ed\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/expression\">expression<\/span> analysis, and <span id=\"urn:enhancement-bf63cd6e-43a0-4445-bbcf-ad37af5bec62\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/genetic-engineering\">genetic engineering<\/span>. The quality and quantity of the extracted DNA can strongly affect the success, accuracy, and reliability of downstream applications. 
Several methods are available for <span id=\"urn:enhancement-4f8d9330-d7aa-4a59-9bf6-d790b642d140\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/dna-extraction\">DNA extraction<\/span>; the choice depends on the DNA source, the downstream application, and the required quality and quantity of DNA.<\/p>\n<p>There are two primary approaches to DNA isolation:<\/p>\n<p>1) Solution-based methods, such as extraction with organic solvents (for example, phenol-chloroform), and<\/p>\n<p>2) Solid-phase methods, in which DNA binds to a solid support such as a silica membrane or magnetic beads.<\/p>\n<p><strong>DNA Fragmentation:<\/strong> The genomic DNA is randomly fragmented into smaller pieces, typically 100 to 1,000 base pairs long. Three common methods are used to fragment DNA for library preparation: physical (shearing the DNA with sound or pressure, as in sonication), enzymatic (cutting the DNA with enzymes such as nucleases or transposases), and chemical (heating the DNA in the presence of divalent metal cations).<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-8742\" src=\"https:\/\/www.mybiosource.com\/learn\/wp-content\/uploads\/2023\/04\/5.1-DNA-extraction-methods.jpg\" alt=\"\" width=\"2117\" height=\"1013\" srcset=\"https:\/\/www.mybiosource.com\/learn\/wp-content\/uploads\/2023\/04\/5.1-DNA-extraction-methods.jpg 2117w, https:\/\/www.mybiosource.com\/learn\/wp-content\/uploads\/2023\/04\/5.1-DNA-extraction-methods-1280x612.jpg 1280w, https:\/\/www.mybiosource.com\/learn\/wp-content\/uploads\/2023\/04\/5.1-DNA-extraction-methods-980x469.jpg 980w, https:\/\/www.mybiosource.com\/learn\/wp-content\/uploads\/2023\/04\/5.1-DNA-extraction-methods-480x230.jpg 480w\" sizes=\"(min-width: 0px) and (max-width: 480px) 480px, (min-width: 481px) and (max-width: 980px) 980px, (min-width: 981px) and (max-width: 1280px) 1280px, (min-width: 1281px) 2117px, 100vw\" \/><\/p>\n<p><strong>Library Preparation<\/strong>: In the process of preparing DNA for
sequencing, the fragmented DNA is first end-repaired so that every fragment has blunt ends. Short oligonucleotide sequences called adapters are then ligated to both ends of each fragment. These adapters carry the sequences required for sequencing, such as primer-binding sites and optional sample barcodes, and allow the fragments to bind to the sequencing platform (for example, an Illumina flow cell).<\/p>\n<p>The Nextera DNA Sample Prep Kit from <span id=\"urn:enhancement-bdf2675b-20e4-4447-8107-b948588da864\" class=\"textannotation disambiguated wl-organization\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/illumina\">Illumina<\/span> prepares genomic DNA libraries with a transposase <span id=\"urn:enhancement-b7866a34-69bb-4eee-a9ef-f5cb7493e475\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/enzyme\">enzyme<\/span> that fragments the DNA and tags the fragments with adapter sequences in a single-tube reaction known as &#8220;tagmentation.&#8221; Combining fragmentation and tagging in one step simplifies and shortens library preparation.<\/p>\n<p>Three strategies can improve genome assembly when preparing DNA libraries:<\/p>\n<ul>\n<li>Creating libraries with long inserts of approximately 1 kilobase.<\/li>\n<li>
Avoiding <span id=\"urn:enhancement-a2943e1b-fb27-4f1e-be1d-4a6dd496adf0\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/pcr\">PCR<\/span> amplification during library preparation.<\/li>\n<li>Creating mate-pair libraries in which the two reads of each pair come from sites 5&#8211;20 kilobases apart.<\/li>\n<\/ul>\n<p>Mate-pair libraries are difficult to construct without <span id=\"urn:enhancement-0618e1d9-2237-465e-bf3f-44eb0e268a05\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/pcr\">PCR<\/span> amplification, but long-insert libraries can be made without <span id=\"urn:enhancement-9f6e5646-1730-4111-b00c-27ddf3988247\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/pcr\">PCR<\/span> if enough DNA is available, provided the genomic DNA is carefully fragmented.<\/p>\n<p><strong>Sequencing:<\/strong> The prepared DNA library is loaded onto the sequencing instrument, and the fragments are sequenced using one of several available technologies (e.g., <span id=\"urn:enhancement-fc51f280-1d09-4df0-9784-aa72abfe4724\" class=\"textannotation disambiguated wl-organization\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/illumina\">Illumina<\/span>, PacBio, Oxford Nanopore).<\/p>\n<p><strong><span id=\"urn:enhancement-e5fc418f-cc3d-42d1-8ea6-f0e320f822cd\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/next-generation-sequencing\">Next-generation sequencing<\/span><\/strong> (NGS) workflows can be divided into four main stages: sample preprocessing, library preparation, sequencing, and bioinformatic analysis. 
Regardless of the platform, all modern sequencing technologies require a dedicated sample preparation step that produces a sequencing library suitable for loading onto the instrument. The library consists of DNA fragments within a defined length range, tagged with oligonucleotide adapters that can also carry barcodes for multiplexing samples, which are then sequenced. Once sequencing is complete, the resulting data are analyzed with bioinformatics tools.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-8743\" src=\"https:\/\/www.mybiosource.com\/learn\/wp-content\/uploads\/2023\/04\/5.2-library-preparation.jpg\" alt=\"\" width=\"1241\" height=\"430\" srcset=\"https:\/\/www.mybiosource.com\/learn\/wp-content\/uploads\/2023\/04\/5.2-library-preparation.jpg 1241w, https:\/\/www.mybiosource.com\/learn\/wp-content\/uploads\/2023\/04\/5.2-library-preparation-980x340.jpg 980w, https:\/\/www.mybiosource.com\/learn\/wp-content\/uploads\/2023\/04\/5.2-library-preparation-480x166.jpg 480w\" sizes=\"(min-width: 0px) and (max-width: 480px) 480px, (min-width: 481px) and (max-width: 980px) 980px, (min-width: 981px) 1241px, 100vw\" \/><\/p>\n<p>During library preparation, size selection and clean-up ensure that the DNA fragments fall within the desired length range. These steps can be performed with magnetic beads, spin columns, or gels. 
Solid-<span id=\"urn:enhancement-381b6aa1-5975-4169-8c7e-0f8633efe913\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/phase\">phase<\/span> reversible immobilization (SPRI) beads are the most common choice for size selection and clean-up, but some companies offer spin-column clean-up kits, and the PacBio SMRTbell Express Template Preparation Kit recommends gel-based size selection with the BluePippin System.<\/p>\n<p>Polymerase chain reaction (<span id=\"urn:enhancement-e8a81619-4a50-4250-8506-fa502e612404\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/pcr\">PCR<\/span>) is typically used during library preparation, both to attach sequencing adapters and to increase the amount of DNA. Amplification-free methods avoid PCR, but because no amplification step enriches for fully adapted fragments, they can leave some fragments with incompletely attached adapters.<\/p>\n<p><strong><span id=\"urn:enhancement-e6236b6e-b305-4b25-a7de-4af041370273\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/pcr\">PCR<\/span> amplification<\/strong> can introduce bias: loci with extreme base composition, such as very GC-rich or AT-rich regions, can be under-represented or missing entirely. Several approaches aim to reduce this bias. One is a specialized <span id=\"urn:enhancement-227e3b24-8a99-4bb5-bf56-4ce808e4c15b\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/pcr\">PCR<\/span> workflow using unique molecular identifiers (UMIs), which label each original molecule before amplification. Reads derived from the same molecule can then be identified, and amplification bias can be accounted for during analysis. 
Such workflows typically consist of a two- to four-cycle UMI-<span id=\"urn:enhancement-0fcd6861-45f5-416e-a48c-0e2537f8fd69\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/pcr\">PCR<\/span>, followed by a primer exchange and a second <span id=\"urn:enhancement-f4de16a6-68d5-4ae5-8618-e27d0d566aef\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/pcr\">PCR<\/span> that amplifies the library and adds the sequencing adapters. These multi-step <span id=\"urn:enhancement-f179d5de-f629-4ccc-b047-90c8684a4618\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/pcr\">PCR<\/span> workflows should be carefully automated to prevent contamination during processing.<\/p>\n<p><strong>Data Processing<\/strong>: The raw sequencing data are processed to remove low-quality reads and adapter sequences, leaving high-quality sequence data for downstream analysis.<\/p>\n<p><strong>Base-calling<\/strong>: Base-calling is the process of determining the identity of <span id=\"urn:enhancement-f1f3e57a-990c-4ba3-a594-d71df8606a34\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/nucleotides\">nucleotides<\/span> (A, C, G, T) from the signals produced by a sequencing instrument. Depending on the platform, these signals may be fluorescence intensities or fluctuations in electrical current as the <span id=\"urn:enhancement-68e0a7f9-a04c-461f-b1c9-dd780d081acc\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/nucleotides\">nucleotides<\/span> of a DNA strand pass through a nanopore. 
Base-calling software translates these signals into a <span id=\"urn:enhancement-45446906-08e4-4962-a73b-d2ea57e1c814\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/dna-sequence\">DNA sequence<\/span>, usually assigning each base a quality score that reflects the estimated probability that the call is incorrect.<\/p>\n<p><strong>Quality control checks<\/strong> are then performed to confirm that the sequencing data are of sufficient quality and quantity. Next comes <strong>read mapping<\/strong>: the reads are aligned to an existing reference genome or, when no suitable reference is available, assembled from scratch (de novo) into a preliminary genome assembly. This is followed by <strong>variant calling<\/strong>, the process of detecting and describing differences between the genome of the sequenced sample and a known <span id=\"urn:enhancement-5f515cd6-9ad2-4fa4-b2e6-751e2ffa85b5\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/reference-genome\">reference genome<\/span>. These differences include, but are not limited to, SNPs, insertions, deletions, and structural variants. The identified variants are then annotated to assess their potential functional effects on genes and other genomic elements.<\/p>\n<p>These variants are then validated and analyzed further using bioinformatics tools.<\/p>\n<p>Many different bioinformatics tools are used in genome sequencing, and the choice depends on the specific analysis being performed.<\/p>\n<p>Here are some commonly used tools:<\/p>\n<p><strong>Alignment and Assembly Tools:<\/strong> These tools align sequencing reads to a reference genome or assemble reads into a de novo assembly. Examples include the aligners Bowtie2 and BWA and the assembler SPAdes.<\/p>\n<p><strong>Variant Calling Tools:<\/strong> These tools identify variants between a sample and a reference genome. 
Examples include GATK, SAMtools/BCFtools, and FreeBayes.<\/p>\n<p><strong>Genome Annotation Tools<\/strong>: These tools and resources are used to annotate genes, variants, and other genomic elements, such as regulatory regions and repeat sequences. Examples include the variant annotation tool ANNOVAR and the annotation resources Ensembl and NCBI&#8217;s RefSeq.<\/p>\n<p><strong>Functional Analysis Tools<\/strong>: These tools assess the functional impact of genetic variants on genes and other genomic features. Examples include SIFT, PolyPhen, and VEP.<\/p>\n<p><strong>Visualization Tools<\/strong>: These tools display genomic data to aid interpretation. Examples include IGV, the UCSC Genome Browser, and Circos.<\/p>\n<p><strong>Pathway and Network Analysis Tools<\/strong>: These tools and databases are used to analyze interactions between genes and proteins within biological pathways or networks. Examples include Reactome, KEGG, and STRING.<\/p>\n<p><strong>Machine Learning Tools:<\/strong> Machine learning methods are used to build predictive models from genomic data. Common approaches include random forests, support vector machines, and deep learning.<\/p>\n<h4><span style=\"text-decoration: underline;\"><strong>Conclusion<\/strong><\/span><\/h4>\n<p>Turning DNA into data through genome sequencing involves several steps, including sample preparation, DNA sequencing, data analysis, and interpretation. The advent of high-throughput sequencing technologies has revolutionized genomics by enabling rapid and cost-effective sequencing of entire genomes. This has led to many discoveries in medicine, agriculture, and environmental science. However, challenges remain, such as improving the accuracy and completeness of genome sequencing, reducing sequencing costs, and developing better tools for data analysis and interpretation. 
Despite these challenges, genome sequencing is a powerful tool that has the potential to transform many fields and improve our understanding of the genetic basis of life.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Bioinformatics has expanded beyond just analyzing genome sequence data and is now used for various important tasks such as studying gene variation and expression, predicting and analyzing gene and protein structure and function, detecting gene regulation networks, creating simulation environments for whole-cell modeling, complex modeling of gene regulatory dynamics and networks, and presenting and analyzing 
[&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_et_pb_use_builder":"","_et_pb_old_content":"","_et_gb_content_width":"","footnotes":""},"class_list":["post-8741","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/www.mybiosource.com\/learn\/wp-json\/wp\/v2\/pages\/8741","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.mybiosource.com\/learn\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.mybiosource.com\/learn\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.mybiosource.com\/learn\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.mybiosource.com\/learn\/wp-json\/wp\/v2\/comments?post=8741"}],"version-history":[{"count":0,"href":"https:\/\/www.mybiosource.com\/learn\/wp-json\/wp\/v2\/pages\/8741\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.mybiosource.com\/learn\/wp-json\/wp\/v2\/media?parent=8741"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}