{"id":1746,"date":"2018-03-01T12:32:49","date_gmt":"2018-03-01T12:32:49","guid":{"rendered":"https:\/\/www.mybiosource.com\/learn\/?page_id=1746"},"modified":"2023-03-02T10:18:02","modified_gmt":"2023-03-02T10:18:02","slug":"exome-sequencing","status":"publish","type":"page","link":"https:\/\/www.mybiosource.com\/learn\/testing-procedures\/exome-sequencing\/","title":{"rendered":"Exome Sequencing"},"content":{"rendered":"<h3><strong>Introduction<\/strong><\/h3>\n<p>An exome is the complete set of exons (<span id=\"urn:enhancement-57357619-da63-4491-9ebf-00601f2a0881\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/coding-sequences\">coding regions<\/span> for <span id=\"urn:enhancement-6737ffad-78a6-41c6-a789-1eb7596b82fe\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/proteins\">proteins<\/span>) in a genome and covers about 1% to 2% of the genome, depending on the species. Genome-wide association studies suggest that common genetic variants explain only a small fraction of heritable risk for common diseases, raising the question of whether rare variants account for a significant fraction of unexplained heritability. While <span id=\"urn:enhancement-08bc0df3-ca5d-462c-bd75-a7144560408b\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/dna-strand\">DNA<\/span> sequencing costs have fallen dramatically, they remain far from what is necessary for rare and novel variants to be routinely identified at a genome-wide scale in large cohorts. 
Second-generation methods have been developed for targeted sequencing of all protein-<span id=\"urn:enhancement-57c85140-6422-4e1b-b5df-fa1b25279056\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/coding-sequences\">coding regions<\/span> (&#8216;exomes&#8217;) to reduce costs while enriching for the discovery of highly penetrant variants. Here we report on the targeted capture and massively parallel sequencing of the exomes of twelve humans. These include eight HapMap individuals representing three populations, and four unrelated individuals with a rare, dominantly inherited disorder, Freeman-Sheldon<span id=\"urn:enhancement-9687efad-e2c8-426c-8d1d-b62c31069296\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/syndrome\"> syndrom<\/span>e (FSS). We demonstrate the sensitive and specific identification of rare and common variants in over 300 megabases (Mb) of<span id=\"urn:enhancement-c2a8ab5e-aa2c-4d48-b61b-c23a36118f5e\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/coding-sequences\"> coding sequenc<\/span>e. Using FSS as a proof of concept, we show that candidate genes for monogenic disorders can be identified by exome sequencing of a small number of unrelated, affected individuals. 
This strategy may be extendable to diseases with more complex genetics through larger sample sizes and appropriate weighting of nonsynonymous variants by their predicted functional impact.<\/p>\n<h3><strong>Steps involved in exome sequencing<\/strong><\/h3>\n<h4><strong><em>DNA samples, targeted capture, and massively parallel sequencing<\/em><\/strong><\/h4>\n<p><span class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/dna-samples\"><span class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/dna-samples\">DNA samples were obtained from Coriell Cell Repositories (HapMap) or by M.B. (FSS). Each shotgun library was hybridized to two Agilent 244K microarrays for<span id=\"urn:enhancement-b0bd52ba-041a-4739-a91b-2beaecb4168e\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/target\"> targe<\/span>t enrichment, followed by washing, elution, and additional amplification. The first array targeted CCDS (2007), while the second was designed against targets poorly captured by the first array plus updates to CCDS in 2008. All sequencing was performed on the Genome Analyzer.<\/span><\/span><\/p>\n<h4><strong><em>Read mapping and variant analysis<\/em><\/strong><\/h4>\n<p>Reads were mapped to the reference human genome (UCSC hg18), initially with ELAND software for quality recalibration, and then again with Maq. Sequence calls were also performed by Maq, and filtered to coordinates with \u2265 8\u00d7 coverage and a Phred-like consensus quality \u2265 30. Sequence calls for HapMap individuals were compared against Illumina Human1M-Duo genotypes. NA18507 SNPs from whole-genome data were also obtained for comparison. 
Annotations of cSNPs were based on NCBI and UCSC databases, supplemented with PolyPhen Grid Gateway predictions for nonsynonymous SNPs.<\/p>\n<p><strong>Identification of coding indels involves:<\/strong><\/p>\n<ol>\n<li>a) Gapped alignment of unmapped reads to the genome to generate a set of candidate indels using cross_match;<\/li>\n<li>b) Ungapped alignment of all reads to the reference and alternative alleles for all candidate indels using Maq;<\/li>\n<li>c) Filtering by coverage and allelic ratio.<\/li>\n<\/ol>\n<h3><strong>Methods<\/strong><\/h3>\n<h4><strong>Genomic DNA Samples<\/strong><\/h4>\n<ol>\n<li>Targeted capture was performed on genomic<span id=\"urn:enhancement-96cdecf3-99ea-4584-832d-da28c3c9e988\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/dna-strand\"> DN<\/span>A from 8 HapMap individuals (4 Yoruba (NA18507, NA18517, NA19129, NA19240), 2 East Asians (NA18555, NA18956), 2 European-Americans (NA12156, NA12878)) and 4 European-American individuals affected by Freeman-Sheldon<span id=\"urn:enhancement-8859a1c0-be7c-4d90-abe8-616bed405816\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/syndrome\"> syndrom<\/span>e (FSS10066, FSS10208, FSS22194, FSS24895).<\/li>\n<li>Genomic<span id=\"urn:enhancement-81844dd9-d4f8-49e3-9db1-32bbeb11a63e\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/dna-strand\"> DN<\/span>A for HapMap individuals was obtained from Coriell Cell Repositories.<\/li>\n<li>Genomic<span id=\"urn:enhancement-e1e94a2b-9985-4532-96af-e4a0237d16ca\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/dna-strand\"> DN<\/span>A for Freeman-Sheldon<span id=\"urn:enhancement-e1c70b28-6fd5-40be-9923-70868988e272\" class=\"textannotation disambiguated wl-thing\" 
itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/syndrome\"> syndrom<\/span>e individuals was obtained by M.B.<\/li>\n<\/ol>\n<h4><strong>Oligonucleotides and adaptors<\/strong><\/h4>\n<p>All oligonucleotides were synthesized by Integrated<span id=\"urn:enhancement-8045937f-c4d6-4835-b59d-bea1d9a8c262\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/dna-strand\"> DN<\/span>A Technologies (IDT) and resuspended in nuclease-free water to a stock concentration of 100 \u03bcM. Double-stranded library adaptors SLXA_1 and SLXA_2 were prepared to a final concentration of 50 \u03bcM by incubating equimolar amounts of SLXA_1_HI and SLXA_1_LO together and SLXA_2_HI and SLXA_2_LO together at 95\u00b0C for 3 min and then leaving the adaptors to cool to room temperature in the heat block.<\/p>\n<h4><strong>Shotgun library construction<\/strong><\/h4>\n<p>Shotgun libraries were generated from 10 \u03bcg of genomic<span id=\"urn:enhancement-7fc3282c-3295-4b7e-b079-6e5cd2356b16\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/dna-strand\"> DN<\/span>A (gDNA) using protocols modified from the standard protocol. Each library provided sufficient material for hybridization to two microarrays. For each sample, gDNA in 300 \u03bcl 1\u00d7 Tris-EDTA was first sonicated for 30 min, then end-repaired for 45 min in a 100 \u03bcl reaction volume using 1\u00d7 End-It Buffer, 10 \u03bcl dNTP mix, and 10 \u03bcl<span id=\"urn:enhancement-55a9c3cf-7f95-4d15-a197-df6fa266ddaf\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/atp\"> AT<\/span>P as supplied in the End-It<span id=\"urn:enhancement-0c9be771-4c85-4b8c-9e39-c76776d2c9e9\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/dna-strand\"> DN<\/span>A End-Repair Kit. 
The fragments were then A-tailed for 20 min at 70\u00b0C in a 100 \u03bcl reaction volume with 1\u00d7<span id=\"urn:enhancement-8bb10713-bb67-4423-bf62-671caa7d7e8b\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/pcr\"> PC<\/span>R buffer, 1.5 mM MgCl2, 1 mM dATP and 5 U AmpliTaq<span id=\"urn:enhancement-76db7bca-02d9-4b51-a941-6feb792af355\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/dna-strand\"> DN<\/span>A<span id=\"urn:enhancement-58e6303b-43ba-4143-909f-a0d55ff7027e\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/polymerase\"> polymeras<\/span>e. Next, library adaptors SLXA_1 and SLXA_2 were ligated to the A-tailed sample in a 90 \u03bcl reaction volume with 1\u00d7 Quick Ligation Buffer, 5 \u03bcl Quick<span id=\"urn:enhancement-64c0cc34-7a3f-47a2-8740-e98f50ea33d9\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/t4\"> T<\/span>4<span id=\"urn:enhancement-af5fca87-fb86-41e8-b526-f13f03cff93b\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/dna-ligase\"> DNA Ligas<\/span>e and each adaptor in 10\u00d7 molar excess of the sample. 
Samples were column-purified after each of these four steps, and<span id=\"urn:enhancement-8ee4d8d0-7261-45f6-9535-6ca6af54774a\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/dna-strand\"> DN<\/span>A concentration was determined on a Nanodrop-1000 when necessary.<\/p>\n<p>Each sample was subsequently size-selected for fragments of 150\u2013250 bp using<span id=\"urn:enhancement-e5d2cbf9-b9fd-46f3-9631-c4caf81d7cf5\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/gel-electrophoresis\"> gel electrophoresi<\/span>s on a 6% TBE-polyacrylamide gel. A gel slice containing the fragments of interest was then excised and transferred to a siliconized 0.5 ml microfuge tube with a 20G needle-punched hole in the bottom. This tube was placed in a 1.5 ml siliconized microfuge tube and centrifuged at 13,200 rpm for 5 min to create a gel slurry that was then resuspended in 200 \u03bcl 1\u00d7 Tris-EDTA and incubated at 65\u00b0C for 2 h, with periodic vortexing. 
This allowed for passive elution of<span id=\"urn:enhancement-6273c472-4b22-4ba0-a8be-3b25028650e7\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/dna-strand\"> DN<\/span>A. The aqueous phase was then separated from gel fragments by centrifugation through 0.2 \u03bcm ultrafiltration columns, and the<span id=\"urn:enhancement-0f4d2e13-1b24-42dd-8098-90f952c8a9f5\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/dna-strand\"> DN<\/span>A was recovered using a standard ethanol precipitation.<\/p>\n<p>Recovered<span id=\"urn:enhancement-657faaa1-fc52-4dd6-8748-2dca9e6a091a\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/dna-strand\"> DN<\/span>A was resuspended in EB buffer (10 mM Tris-Cl, pH 8.5) and the entire volume used in a 1 ml bulk<span id=\"urn:enhancement-ba7880b0-9bab-4962-b354-e326e6644d35\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/pcr\"> PC<\/span>R reaction volume with 1\u00d7 Master Mix and 0.5 \u03bcM each of primers SLXA_FOR_AMP and SLXA_REV_AMP under the following conditions: 98\u00b0C for 30 s; 20 cycles at 98\u00b0C for 30 s, 65\u00b0C for 10 s and 72\u00b0C for 30 s; and finally 72\u00b0C for 5 min.<span id=\"urn:enhancement-509326e0-9b8c-417d-bb98-4b3dbcc01f02\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/pcr\"> PC<\/span>R products were purified across 4 ultrafiltration columns and all the eluates were pooled.<\/p>\n<h4><strong>Design of exome capture arrays<\/strong><\/h4>\n<p>All well-annotated protein<span id=\"urn:enhancement-3ebdd692-af56-47cf-87cb-442e65053621\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/coding-sequences\"> coding region<\/span>s as defined by the 
CCDS were the<span id=\"urn:enhancement-c5a8e567-82df-4e61-82e6-d79aebe0f6c5\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/target\"> targe<\/span>t. Coordinates were extracted from entries with \u201cpublic\u201d status, and regions with overlapping coordinates were merged. This resulted in a<span id=\"urn:enhancement-02ec1081-aefd-4bff-a53f-950478200b21\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/target\"> targe<\/span>t with 164,007 discontiguous regions summing to 27,931,548 bp. By comparison,<span id=\"urn:enhancement-3b364a08-5566-4e5a-a4a4-0659d13f64db\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/coding-sequences\"> coding sequenc<\/span>e defined by all of Reference Sequence comprises 31.9 Mb (14% larger). Hybridization probes against the<span id=\"urn:enhancement-bd8ca3a0-e6f4-4044-add7-190448efe96a\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/target\"> targe<\/span>t were designed primarily such that they were evenly spaced across each region. 
Probes were also constrained a) to be relatively unique, such that the average occurrence of each 15-mer in the probe sequence is less than 100, b) to be 20\u201360 bases in length, with preference for longer probes, and c) to have a calculated melting temperature <span id=\"urn:enhancement-b02173a8-eb0c-4e7e-b775-8fb11acacc57\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/tm\">(T<\/span>m) \u2264 69\u00b0C, with preference for higher Tms.<span id=\"urn:enhancement-1a7c6865-c0b3-4fe4-a9c1-d15b13faa0dd\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/tm\"> T<\/span>m was calculated as 64.9 + 41 \u00d7 (number of G+Cs \u2212 16.4) \/ length of probe.<\/p>\n<p>Two arrays were designed and used per individual. The first array was common to all individuals and contained 241,071 probes designed mainly against the subset of the<span id=\"urn:enhancement-652071d9-afa8-41fb-8a20-f09b4dce4eb2\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/target\"> targe<\/span>t that was also found in a previous version of the CCDS (CCDS20070227). For most exomes, the second array was custom-designed specifically for<span id=\"urn:enhancement-568514ff-19fc-49d7-a60f-cb89efae6198\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/target\"> targe<\/span>t regions that had not been adequately represented after capture on the first array and subsequent sequencing. For two individuals (FSS10066, FSS10208), the second array was instead matched to a different individual&#8217;s first-array data. However, this did not appear to significantly impact performance, likely because features capturing poorly on the first array largely did so consistently. 
Additionally, all of the second arrays also targeted sequences found in CCDS20080902 that were not in CCDS20070227 and hence not targeted by the first array. A subset of arrays used lacked control grids.<\/p>\n<h4><strong>Targeted capture by hybridization to DNA microarrays<\/strong><\/h4>\n<p>Hybridizations to the arrays were performed per the manufacturer&#8217;s instructions, with modifications. For each enrichment, a 520 \u03bcl hybridization solution containing 20 \u03bcg of the bulk amplified gDNA library, 1\u00d7 aCGH Hybridization Buffer, 1\u00d7 Blocking Agent, 50 \u03bcg Human Cot-1<span id=\"urn:enhancement-ef89bf85-fb10-48d7-93e8-647a6a55f648\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/dna-strand\"> DN<\/span>A and 0.92 nmol each of the blocking oligos SLXA_FOR_AMP, SLXA_REV_AMP, SLXA_FOR_AMP_rev and SLXA_REV_AMP_rev was incubated at 95\u00b0C for 3 min and then at 37\u00b0C for at least 30 min. The hybridization solution was then loaded and the hybridization chamber assembled as per the manufacturer&#8217;s instructions. Incubation was done at 65\u00b0C for at least 66 h with rotation at 20 rpm in a hybridization oven.<\/p>\n<p>After hybridization, the slide-gasket sandwich was removed from the chamber and placed in a 50 ml conical tube filled with aCGH Wash Buffer 1. The slide was separated from the gasket while in the buffer and then washed, first with fresh aCGH Wash Buffer 1 at room temperature for 10 min on an orbital shaker set on low speed, and then in pre-warmed aCGH Wash Buffer 2 at 37\u00b0C for 5 min. Both washes were also done in 50 ml conical tubes.<\/p>\n<p>A Secure-Seal was then affixed firmly over the active area of the washed slide and heated briefly according to the manufacturer&#8217;s instructions. One port was sealed with a seal tab and the seal chamber completely filled with approximately 1 ml of hot EB (95\u00b0C). 
The other port was sealed and the slide incubated at 95\u00b0C on a heat block. After 5 min, one port was unsealed and the solution recovered.<span id=\"urn:enhancement-f3b7afec-03bd-423c-a7eb-7ae69bf799ca\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/dna-strand\"> DN<\/span>A was purified from the solution using a standard ethanol precipitation.<\/p>\n<p>Precipitated<span id=\"urn:enhancement-499b8436-bea2-441f-8a8d-54be2d4841d3\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/dna-strand\"> DN<\/span>A was resuspended in EB and the entire volume used in a 50 \u03bcl<span id=\"urn:enhancement-c2d5ec20-211d-4547-938a-ab4817516bba\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/pcr\"> PC<\/span>R volume comprising 1\u00d7 iTaq SYBR Green Supermix with passive dye and 0.2 \u03bcM each of primers SLXA_FOR_AMP and SLXA_REV_AMP. Thermal cycling was done in a real-time<span id=\"urn:enhancement-dcc07998-52c5-4487-b95d-1cc25498a633\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/pcr\"> PC<\/span>R system with the following program: 95\u00b0C for 5 min, then 30 cycles of 95\u00b0C for 30 s, 55\u00b0C for 2 min, and 72\u00b0C for 2 min. Each sample was monitored and extracted from the<span id=\"urn:enhancement-ae0203ec-146b-4453-b5b0-2b07684a7f1d\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/pcr\"> PC<\/span>R machine when fluorescence began to plateau. 
Samples were then purified and sequenced.<\/p>\n<h4><strong>Sequencing<\/strong><\/h4>\n<p>All sequencing of post-enrichment shotgun libraries was carried out on a Genome Analyzer as single-end 76 bp reads, following the manufacturer&#8217;s protocols and using the standard sequencing primer. Image analysis and base-calling were performed by the Genome Analyzer with default parameters but no pre-filtering of reads by quality. Quality values were recalibrated by alignment to the reference human genome with the ELAND module.<\/p>\n<h4><strong>Read mapping<\/strong><\/h4>\n<p>The reference human genome used in these analyses was UCSC assembly hg18 (NCBI build 36.1), including unordered sequence (chrN_random.fa) but not including alternate haplotypes. For each lane, reads with calibrated qualities were extracted from the ELAND export output. Base qualities were rescaled and reads mapped to the human reference genome using Maq. Unmapped reads were dumped using the -u option and subsequently used for indel mapping. 
Mapped reads that overlapped<span id=\"urn:enhancement-79a99b38-3c22-4908-81be-4ae204bf4b9d\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/target\"> targe<\/span>t regions (<span id=\"urn:enhancement-13e3a192-5c88-4cc4-a0c8-ea6fa098715f\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/target\">\u201ctarge<\/span>t reads\u201d) were used for all other analyses.<\/p>\n<h4><strong>Target masking<\/strong><\/h4>\n<p>All possible 76-bp reads that overlapped the aggregate<span id=\"urn:enhancement-f7b92234-75e3-4d22-9124-e365b314ed5c\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/target\"> targe<\/span>t were simulated, mapped using Maq, and consensus-called using maq assemble with parameters -q 1 -r 0.2 -t 0.9.<span id=\"urn:enhancement-dee194a8-b076-4218-b47a-33d25e7c3754\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/target\"> Targe<\/span>t coordinates that had read depth &lt; 76 (i.e. half of the expected depth), reflecting poor mappability, were removed from consideration for downstream analyses, leaving a 26,553,795 bp<span id=\"urn:enhancement-639333a1-fcef-42e9-90fd-7dd36838ae39\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/target\"> targe<\/span>t.<\/p>\n<h4><strong>Variant calling<\/strong><\/h4>\n<p>All reads with a map score &gt; 0 from each individual were merged and filtered for duplicates such that only the read with the highest aggregate base quality at any given start position and orientation was retained. 
Sequence calls were obtained using maq assemble with parameters -r 0.2 -t 0.9, and only coordinates with at least 8\u00d7 coverage and an estimated Phred-like consensus quality value of at least 30 were used for downstream variant analyses.<\/p>\n<h4><strong>Comparison of sequence calls to array genotypes, dbSNP, and whole genome sequencing<\/strong><\/h4>\n<p>For the 8 HapMap individuals, sequence calls were compared to array-based genotyping data. We excluded from consideration genotyping assays where all 8 individuals were called by the arrays as homozygous non-reference, as well as the MHC locus at chr6:32500001\u201333300000, as both sets are likely to be error-enriched in the genotyping data. Approximately 14.2 million non-redundant coordinates were defined by this file-set. For comparison of NA18507 cSNPs to whole genome data, variant lists were obtained from Illumina, Inc.<\/p>\n<h4><strong>Identification of coding indels<\/strong><\/h4>\n<p>Reads for which Maq was unsuccessful in identifying an ungapped alignment were converted to FASTA format and mapped to the human reference genome with cross_match, using parameters -gap_ext -1 -bandwidth 10 -minmatch 20 -maxmatch 24. 
Output options -tags -discrep_lists -alignments -score_hist were also set.<\/p>\n<p>Alignments with an indel were then filtered for those that:<\/p>\n<ol>\n<li>a) had a score at least 40 more than the next best alignment,<\/li>\n<li>b) mapped at least 75 bases of the read,<\/li>\n<li>c) had no substitutions in addition to the indel, and<\/li>\n<li>d) overlapped a<span id=\"urn:enhancement-48a86e04-cad0-4877-b7da-324abdb667fb\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/target\"> targe<\/span>t region.<\/li>\n<\/ol>\n<p>Reads from filtered alignments that mapped to the negative strand were then reverse complemented and, together with the rest of the filtered reads, re-mapped with cross_match using the same parameters. This was to reduce ambiguity in called indel positions due to different read orientations. After the second mapping, alignments were re-filtered using the same criteria a) through d). For each sample, a putative indel event was called if at least 2 filtered reads covered the same event. A FASTA file containing the sequences of all called events \u00b1 75 bp, as well as the reference sequence at the same positions, was then generated for each individual. All the reads from each individual were then mapped to that individual&#8217;s \u201cindel reference\u201d with Maq using default parameters. Reads that mapped multiple times (map score 0) or had redundant start sites were removed, after which the number of reads mapping to either the reference or the non-reference allele was counted for each individual and indel. An indel was called if there were at least 8 non-reference allele reads making up at least 30% of all reads at that genomic position. 
Indels were called as heterozygous if non-reference alleles made up 30\u201370% of reads at that position, and homozygous non-reference if &gt;70%.<\/p>\n<h4><strong>Variant annotation<\/strong><\/h4>\n<p>For cSNP annotation, we constructed a local server that integrates data from NCBI (including dbSNP and Consensus CDS files) and from UCSC Genome Bioinformatics. We also generated PolyPhen predictions for all cSNPs identified here, using the PolyPhen Grid Gateway and Perl scripts. The server reads files with<span id=\"urn:enhancement-89b8e14a-fe2b-4041-b470-1db13a77fc1b\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/snp\"> SN<\/span>P locations and alleles and produces annotation files available for download. Annotation includes dbSNP rs IDs, overlapping<span id=\"urn:enhancement-ed8254b4-6dbb-43dd-9340-ec921701f1ab\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/gene\">-gen<\/span>e accession numbers,<span id=\"urn:enhancement-7294af11-d78a-4811-9b01-ba1fcf115f01\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/snp\"> SN<\/span>P function (e.g. whether coding missense), conservation scores, HapMap minor<span id=\"urn:enhancement-d4aa9371-e131-4d2f-8712-07cb8177cc8c\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/allele-frequencies\">-allele frequencie<\/span>s, and various protein annotations (sequence, position,<span id=\"urn:enhancement-d0749315-9e36-4971-a4b7-d4111f5c38bd\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/amino-acid\"> amino aci<\/span>d changes with physicochemical properties, and PolyPhen classification). 
Indels were considered annotated by dbSNP if an entry was found with the same allele (or its reverse complement) within 1 bp of the variant position. This was to allow for ambiguities in calling the indel position.<\/p>\n<h4><strong>Calculation of genome-wide estimates<\/strong><\/h4>\n<p>Extrapolated estimates for the genome-wide number of cSNPs of various classes were calculated based on the number of cSNP calls in that individual, the estimated sensitivity for making a variant call in that individual at any given position within the aggregate<span id=\"urn:enhancement-a550e455-e8ce-491e-91b6-f47b7331247a\" class=\"textannotation disambiguated wl-thing\" itemid=\"https:\/\/data.wordlift.io\/wl1503301\/entity\/target\"> targe<\/span>t (based on the fraction of array-based genotypes of that class that were successfully called; calculated separately for heterozygous and homozygous non-reference variants), and extrapolation to an estimated exome size of exactly 30 Mb (i.e. multiplying by 30\/26.6 = 1.13). A similar approach was taken to estimate the genome-wide number of uncommon cSNPs introducing nonsense codons, starting with the number observed in each individual and extrapolating based on estimated sensitivity for heterozygote detection and an estimated exome size of exactly 30 Mb.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction An exome is the complete set of exons (coding regions for proteins) in a genome and covers about 1% to 2% of the genome, depending on the species. 
Genome-wide association studies suggest that common genetic variants explain only a small fraction of heritable risk for common diseases, raising the question of [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"parent":401,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_et_pb_use_builder":"","_et_pb_old_content":"","_et_gb_content_width":"","footnotes":""},"class_list":["post-1746","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/www.mybiosource.com\/learn\/wp-json\/wp\/v2\/pages\/1746","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.mybiosource.com\/learn\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.mybiosource.com\/learn\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.mybiosource.com\/learn\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.mybiosource.com\/learn\/wp-json\/wp\/v2\/comments?post=1746"}],"version-history":[{"count":0,"href":"https:\/\/www.mybiosource.com\/learn\/wp-json\/wp\/v2\/pages\/1746\/revisions"}],"up":[{"embeddable":true,"href":"https:\/\/www.mybiosource.com\/learn\/wp-json\/wp\/v2\/pages\/401"}],"wp:attachment":[{"href":"https:\/\/www.mybiosource.com\/learn\/wp-json\/wp\/v2\/media?parent=1746"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}