Covalent “Click Chemistry”-Based Attachment of DNA onto Solid Phase Enables Iterative Molecular Analysis


Letter

Covalent ‘click chemistry’-based attachment of DNA onto solid phase enables iterative molecular analysis Billy T. Lau, and Hanlee P. Ji Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.8b05139 • Publication Date (Web): 17 Jan 2019 Downloaded from http://pubs.acs.org on January 18, 2019



Analytical Chemistry

Billy T. Lau¹ and Hanlee P. Ji¹,²,*

¹Stanford Genome Technology Center, Stanford University, Palo Alto CA, 94304
²Division of Oncology, Stanford School of Medicine, Stanford CA, 94305
*Corresponding author. Email: [email protected]. Phone: 650-721-1503.

ABSTRACT: Molecular analysis of DNA samples with limited quantities can be challenging. Repeatedly sequencing the original DNA molecules from a given sample would overcome many issues related to accurate genetic analysis and mitigate issues with processing small amounts of DNA analyte. Moreover, an iterative, replicated analysis of the same DNA molecule has the potential to improve genetic characterization. Herein, we demonstrate that the use of ‘click’-based attachment of DNA sequencing libraries onto an agarose bead support enables repetitive primer extension assays for specific genomic DNA targets such as gene exons. We validated the performance of this assay for evaluating specific genetic alterations in both normal and cancer reference standard DNA samples. We demonstrate the stability of conjugated DNA libraries and related sequencing results over the course of independent serial assays spanning several months from the same set of samples. Finally, we applied this method to DNA derived from a tumor sample and demonstrated improved mutation detection accuracy.

Next generation sequencing (NGS) is increasingly used to analyze genomic DNA from a variety of tissue samples, including clinical biopsies¹,². One of the key challenges for NGS analysis is the scarce quantity of nucleic acid material available from specific samples; examples include biopsies of diseased tissue and circulating DNA isolated from blood plasma. Limited amounts of tissue frequently yield enough DNA or RNA for only a single assay³, limiting the breadth of analyses that can be performed. Because the majority of sequencing assays employ a DNA polymerase to generate copies of a template molecule, we developed a process whereby the original material is preserved for use in subsequent polymerase reactions. In this study, we explore the application of biocompatible ‘click chemistry’ reactions on a solid support as a novel method to preserve DNA across multiple molecular assays.

In targeted sequencing, NGS is used to analyze specific segments of genomic DNA. Also referred to as deep sequencing, this approach enables sensitive detection of genetic variation even when these DNA alterations occur in a very low fraction of the available genomic DNA molecules. Oftentimes, these targeted sequencing assays and their various enzymatic steps deplete the available genomic DNA material from a tissue sample. Some samples, such as clinical biopsies, provide a limited amount of cellular material and nucleic acid. This limitation eliminates the possibility of performing subsequent replicate or alternative molecular assays.

‘Click chemistry’ provides an attractive method for conjugation of various biomolecules. Recent developments of this method have led to simple and rapid reactions between paired reactive species. Specifically, the inverse electron demand Diels-Alder cycloaddition (iEDDA) between tetrazines (Tz) and trans-cyclooctenes (TCO)⁴ is well suited for the bioconjugation of molecular species at dilute concentrations in the sub-micromolar range. Frequently, DNA extracted from low cellularity samples such as plasma falls into this concentration range. For this study, we describe a method utilizing iEDDA to covalently tether genomic DNA to a solid-phase support. This reaction generates a reusable DNA substrate for iterative polymerase-based enzymatic reactions. Referred to as APEX (attachment-based primer extension), this molecular sequencing assay provides DNA fragments for next-generation sequencing (Figure 1a). As a proof-of-concept demonstration, we validated APEX by performing highly multiplexed primer extension assays targeting the exons of 185 genes of interest. We show its robustness across reaction conditions and iterative experiments. We observed that covalently attached genomic DNA is stable on this substrate, and molecular assays can be performed over the course of months with negligible detection of degradation. Finally, to demonstrate its clinical applicability, we applied this technique on a patient-derived matched tumor sample.

Figure 1. Overview of APEX. A DNA sample library is covalently conjugated to functionalized agarose beads. Interrogation of genomic regions by primers and DNA polymerase creates copies of the conjugated template molecules. These copied fragments can then be eluted and sequenced. Because the DNA is covalently conjugated, the process can be repeated.

APEX utilizes iEDDA ‘click chemistry’ to conjugate genomic library fragments tailed with TCO-modified nucleotides to a corresponding Tz-functionalized crosslinked agarose substrate (Figure 1b). This variant of click chemistry is several orders of magnitude faster than the conventional copper-catalyzed version, obviates the need for copper ions, and is robust at the micromolar concentrations commonly observed in biomolecular samples.

The wide variety of buffer conditions in biomolecular assays required an initial validation of iEDDA. First, we characterized the Tz-TCO ligation reaction and verified its applicability for conjugating nucleic acids. Using a TCO-functionalized fluorescent Cy5 dye and dUTP nucleotide, we measured the conjugation efficiency on a corresponding Tz-functionalized crosslinked agarose support (Supporting Information, Experimental Section). Using fluorescence and spectrophotometric measurements, respectively, we observed that over 99% of the TCO-functionalized molecules were conjugated after an overnight incubation. Extensive washing of the columns did not yield any fluorescent signal, indicating negligible non-specific adsorption.

We validated the iEDDA conjugation performance on DNA from a control human DNA sample (NA12878 DNA) (Supporting Information, Experimental Section).
First, we enzymatically ligated only one of the two required Illumina sequencing adapters (“P5-Read 1”) onto DNA that was sheared to approximately 500 bp. Both adapters (“P5-Read 1” and “P7-Read 2”) are required for a complete assayable molecule for Illumina NGS. We subsequently PCR-amplified the ligated fragments to 1 microgram of DNA. Second, to this modified partial DNA library, we used terminal transferase to add multiple TCO-functionalized dUTPs to the 3’-end of each DNA molecule in solution (Figure 2a). After reaction cleanup, the functionalized DNA was added to the Tz-functionalized crosslinked agarose in a spin column format. The spin column format additionally enabled streamlined operation without extensive technological infrastructure. The spin column was then end-capped and the mixture incubated overnight at room temperature. Full details are available in the Experimental Section (Supporting Information).

We determined the amount of non-conjugated DNA that could be eluted from the agarose substrate after the conjugation reaction based on the amount of DNA in the eluent. By Nanodrop spectrophotometry, we calculated that the conjugation efficiency of sequencing libraries was 30.1 ± 1.6% (N=4). As enzymatic tailing of TCO-dUTP nucleotides to DNA library molecules results in the incorporation of multiple modified nucleotides, steric hindrance and sequence context may contribute to variations in conjugation efficiency. This phenomenon was previously observed in other ‘click’-based biomolecular assays⁵,⁶. Nevertheless, the amount of DNA library (hundreds of nanograms) conjugated onto agarose beads was sufficient for downstream assays.

Figure 2. Preparation of DNA libraries and performance on control DNA. (A) DNA is fragmented and ligated with a single adapter. Terminal transferase adds TCO-functionalized nucleotides. The DNA library is then added to a spin column loaded with Tz-functionalized crosslinked agarose beads. The subsequent covalent reaction conjugates the DNA library to the beads. (B) Primer extension on NA12878 control genome libraries targeting 185 genes. The boxplot shows the depth of each target exon region for every target gene. Each replicate is a separate column containing NA12878 genomic DNA. (C) Cumulative distribution function of target exon coverage. Each line represents the cumulative fraction of the number of target exons with a given coverage yield.

We validated the compatibility of conjugated DNA with NGS-based assays. An oligonucleotide primer containing the second “P7-Read 2” sequencing adapter is required to generate a complete library molecule compatible with sequencing; this second adapter is incorporated via a primer extension reaction. We performed highly multiplexed primer extension targeting genomic regions of interest on conjugated DNA libraries inside the spin column. Approximately 12.4k unique oligonucleotides were generated by microarray synthesis (Supporting Information, Experimental Section, Figure S-1, and Table S-2); these DNA primers hybridize to specific sequences flanking the exons of 185 genes, many of which play a role in cancer (Supporting Information, Table S-3). We designed these primers using a previously described strategy for enriching genomic targets⁷⁻⁹. To expand this primer pool for multiplexed primer extension assays, we amplified the oligonucleotides with common flanking primers and subsequently digested them with a type-IIS restriction enzyme and lambda exonuclease to yield oligonucleotides with a common 5’-adapter region and a 40 bp 3’ priming sequence. Full experimental details are outlined in the Experimental Section (Supporting Information). We confirmed the primer pool preparation by gel electrophoresis, and tested for single-strandedness by exonuclease I digestion (Supporting Information, Figure S-1).

To perform the primer extension reaction, we incubated four columns, each containing a conjugated NA12878 library, with the oligonucleotide primer pool, followed by two wash steps and then a primer extension reaction (Supporting Information, Experimental Section). We sequenced the eluted fragments (Supporting Information, Table S-4) and performed a series of sequence read alignment and processing procedures to evaluate the assay performance (Supporting Information, Experimental Section). We observed excellent coverage of target exons (Figure 2b), with 98.0 ± 2.0% (N=4) of regions being covered by at least one read (Figure 2c). Overall, the representation of individual primers in the sequence data is relatively even, with over 95% of target regions being within an order of magnitude of the median primer yield (Supporting Information, Figure S-2). However, the spread in exonic coverage will require between-sample normalization in order to accurately measure somatic mutations and copy number changes. We also measured the correlation in target yield between the four replicate control libraries (Supporting Information, Figure S-3), and observed that performance was highly consistent across experiments. Overall, these results demonstrated that the conjugation and molecular assay process is reproducible from experiment to experiment.

We determined the limit of detecting genetic variation with the APEX process. These experiments relied on a set of admixtures between two reference lymphocyte-derived DNA samples, NA12878 and NA24385; the two DNA samples were mixed together in different concentration ratios, covering a range from 0 to 50%. Detection of the minor DNA component (lower fraction) indicates the sensitivity of detection. The admixture DNA samples underwent the initial step of library processing, were conjugated in the spin column format, and were then subjected to targeted sequencing using the primer mixture previously described. We used the sequencing data to determine the different admixture ratios composed of the two reference DNAs. To quantify these ratios, we relied on the fractional representation of genetic variations specific to the NA24385 DNA.
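As a rough illustration of how the fractional representation of a sample-specific variant might be compared against an expected admixture ratio, the following Python sketch computes a per-site read fraction and a Pearson correlation between expected and observed values. The function names and all read counts are hypothetical; they are not taken from the study's analysis pipeline or its data.

```python
from math import sqrt


def allele_fraction(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical (variant reads, total reads) pairs, one per admixture level.
# These numbers are made up for illustration only.
read_counts = [(5, 100), (12, 100), (24, 100), (51, 100)]
expected = [0.05, 0.125, 0.25, 0.5]

observed = [allele_fraction(alt, total) for alt, total in read_counts]
r = pearson(expected, observed)
print(f"Pearson correlation between expected and observed fractions: {r:.3f}")
```

In practice the read counts would come from aligned sequencing data at known variant positions; that step is outside the scope of this sketch.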
The NA24385-specific variants used here are heterozygous single nucleotide variants (SNVs). The number of sequence reads containing a specific SNV relative to the total number of sequence reads from a given target is a direct measurement of the DNA admixture ratio between the two samples. From the sequencing data representing the target genes, we identified the fractional representation of those SNVs unique to NA24385, meaning that they are not present in the NA12878 reference DNA. For the various admixtures, the sequencing libraries had similar performance metrics and sequencing coverage distributions (Supporting Information, Figure S-4 and Table S-4). We observed that the mean read fraction corresponding to the SNVs unique to NA24385 correlated with the expected input admixture ratio (Figure 3a). Below a ~15% SNV fraction, we observed a decline in correlation. At this lower admixture ratio, this variance may be due to errors introduced by the molecular assay and the sequencing process. In particular, the substantial error rate of Bst polymerase¹⁰,¹¹ could be a major contributor to the reduction in performance at low minor allelic fractions.

We performed additional validation experiments using admixtures of a reference DNA sample from a cancer cell line and the NA12878 reference DNA. This cancer reference standard is derived from cancer cell lines; the cancer mutations have been verified and their fractions were confirmed with digital PCR. We performed several dilutions of the cancer reference DNA into the NA12878 control DNA, and generated DNA libraries for conjugation onto agarose beads (Supporting Information, Experimental Section). Conducting a similar analysis as previously described, we observed the correlations of measured versus expected SNV allele fractions to be 0.787, 0.879, and 0.941 for the 10%, 50%, and 100% cancer reference standard spike-in, respectively (Figure 3b).

Figure 3. Genetic variant analysis of admixtures in control and cancer genomes. (A) Admixtures of the NA24385 and NA12878 DNA samples for APEX processing and sequencing. The presence of known genetic variants (i.e. alleles) unique to NA24385 was measured and compared to the expected admixture ratio. Dashed line indicates perfect concordance. Pearson correlation: 0.878. (B) Admixtures of a cancer reference standard were spiked into NA12878 at varying ratios. The presence of known variants unique to the cancer standard was measured and compared to the expected allele fraction. Dashed line indicates perfect concordance. Pearson correlation analysis: 10% spike-in: 0.787, 50% spike-in: 0.879, 100% spike-in: 0.941. Inset: Boxplot of observed allelic fractions at 0% spike-in.

DNA conjugated with our APEX platform is stable across multiple molecular assays. To assess the stability of DNA fragments conjugated to the crosslinked agarose beads, we sought to measure the overall complexity of conjugated DNA molecules during the NGS assay. Higher complexity of DNA fragments across iterative assays indicates an overall retention of molecules. We used molecular barcodes in the form of random DNA tags that are ligated onto genomic DNA during library preparation¹². Molecular barcodes are indicators of single molecules independent of the actual identity of the DNA insert sequence. The identities and abundances of these tags are thus a marker of the stability of the conjugation: maintenance of tag complexity would indicate stable conjugation, while a reduction in complexity would indicate substantial losses (Figure 4a). We conjugated libraries from NA12878 genomic DNA that contained molecular barcodes (Supporting Information, Experimental Section). We performed a series of iterative primer extension reactions on the same substrate over the course of several months. To assess performance, we measured the overall depth of enriched target reactions normalized by sequencing depth (Figure 4b). This indicated that performance is maintained over repeated experiments. We also counted the overall molecular barcode diversity (Supporting Information, Experimental Section). Here, we measured the number of unique molecular barcodes found in each target region normalized by the total number of sequenced reads (Figure 4c). We did not observe an overall decline in molecular barcode diversity, which indicates stable retention of DNA on the beads. We also measured the duplication rate of observed molecular barcodes at each exon; except for iterations 3 and 4, we did not measure a substantial change in the duplication rate (Supporting Information, Figure S-5).

Figure 4. Measuring substrate stability with molecular barcodes. (A) Conjugated DNA libraries are ligated with an adapter containing a molecular barcode. Primer extension copies this molecular barcode sequence, which can be measured across repeated assays. The maintenance of target yield and molecular barcode diversity indicates stable conjugation, while reductions in these metrics indicate loss of DNA material from the beads. (B) The depth per target exon normalized by the total number of reads is shown per repeated assay. Iteration number represents the number of repeat assays performed on the same column. (C) The number of unique molecular barcodes per exon normalized by the total number of reads is shown per repeated assay. Iteration number represents the number of repeat assays performed on the same column.

Lastly, we applied the APEX process for storing and analyzing patient-derived DNA samples. Clinical biopsies frequently provide limited amounts of DNA, making them a natural application for APEX. We conjugated DNA libraries derived from a colorectal tumor sample and a matched normal sample from the same patient to crosslinked agarose beads (Supporting Information, Experimental Section). Multiplexed primer extension yielded similar performance to control samples, with excellent sequencing coverage uniformity and relative coverage of individual primers (Supporting Information, Figure S-6). We determined the presence of cancer-specific gene copy number changes. This analysis involved comparing the relative yields of each primer between the paired tumor and normal sample. Here, we observed evidence of increased copy number of several genes (Figure 5a), as indicated by the increased target sequence read counts found in the tumor DNA compared to the matched normal DNA sample. Notably, we observed increased read counts for multiple exons in NOTCH1, whose activation in the Wnt pathway results in increased proliferation in colorectal cancer¹³,¹⁴. We also measured the dependence of candidate copy number variant regions on the sequenced depth in the normal sample (Supporting Information, Figure S-7). We did not observe a dropout in sequencing coverage for potential amplifications and vice versa for deletions, indicating that exon-to-exon variability in sequencing coverage did not affect our results.

Iterative assays on the same sample provide an opportunity to improve the accuracy of DNA sequencing assays. We performed a cancer mutation analysis of the conjugated patient-derived sample across two iterative primer extension assays. When filtering for somatic variants observed in both assays, we observed concordance in a number of mutations (Figure 5b, Supporting Information, Table S-5). Remarkably, these mutations correlated strongly in allele frequency, implying that such assays can be effectively used to confirm somatic mutation calls. Discordant variant calls may be due to a combination of experimental and bioinformatic factors; in particular, the inherent error-prone nature of Bst polymerase may be a major contributing factor and an area for future optimization.
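The filtering for somatic variants seen in both iterative assays can be sketched as a simple set comparison. The Python example below is a minimal illustration under stated assumptions: each assay's calls are represented as a dictionary mapping a variant identifier to its allele frequency, and the identifiers and frequencies shown are hypothetical placeholders, not calls from the patient sample.

```python
def compare_iterations(first, second):
    """Split variant calls from two assays into concordant and discordant sets.

    Each argument maps a variant identifier to its observed allele frequency.
    Returns (concordant, discordant), where concordant maps each shared
    variant to its pair of allele frequencies and discordant lists the
    variants seen in only one of the two assays.
    """
    shared = set(first) & set(second)
    concordant = {v: (first[v], second[v]) for v in sorted(shared)}
    discordant = sorted(set(first) ^ set(second))
    return concordant, discordant


# Hypothetical calls from two iterative assays on the same conjugated sample.
first_assay = {
    "GENE_A:c.100A>T": 0.31,
    "GENE_B:c.250G>A": 0.12,
    "GENE_C:c.77C>T": 0.04,
}
second_assay = {
    "GENE_A:c.100A>T": 0.29,
    "GENE_B:c.250G>A": 0.15,
    "GENE_D:c.5T>G": 0.03,
}

concordant, discordant = compare_iterations(first_assay, second_assay)
for variant, (af1, af2) in concordant.items():
    print(f"{variant}: assay 1 = {af1:.2f}, assay 2 = {af2:.2f}")
print("Seen in only one assay:", discordant)
```

Requiring a variant to appear in both assays trades some sensitivity for specificity, which suits the stated goal of confirming calls rather than discovering new ones.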

Figure 5. Analysis of patient-derived samples. (A) Analysis of copy number variation by primer extension yield. The yield of target DNA from a matched tumor and normal sample was used to derive a heuristic copy number profile. Each point represents a separate exon target. Colors represent different chromosomes. Exon targets with a tumor-to-normal ratio of over 3 or less than 0.5 are labeled with their gene target. Only targets with a normalized depth of greater than 0.1 in the normalized sample are considered as candidates. (B) Analysis of somatic mutations with iterative analysis. The allele frequency of somatic mutations is plotted for each iterative assay. Red points indicate somatic variants that were not detected in an iterative assay. Black points indicate variants found across both assays. Each of these points is labelled with its corresponding gene target.
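The heuristic thresholds given in the Figure 5a caption can be illustrated with a short Python sketch. Two points are assumptions rather than details from the text: depth is normalized here by dividing each exon's read count by the sample median, and the 0.1 depth filter is applied to the normal sample. The exon names and read counts are hypothetical.

```python
from statistics import median


def normalize(depths):
    """Scale per-exon read counts by the sample median (assumed normalization)."""
    med = median(depths.values())
    if med <= 0:
        raise ValueError("median depth must be positive")
    return {exon: depth / med for exon, depth in depths.items()}


def flag_copy_number_candidates(tumor, normal, gain=3.0, loss=0.5, min_normal=0.1):
    """Flag exons whose tumor-to-normal ratio exceeds `gain` or is below `loss`.

    Exons whose normalized depth in the normal sample is not above
    `min_normal` are skipped, since their ratios would be unreliable.
    """
    tumor_norm = normalize(tumor)
    normal_norm = normalize(normal)
    calls = {}
    for exon, normal_depth in normal_norm.items():
        if exon not in tumor_norm or normal_depth <= min_normal:
            continue
        ratio = tumor_norm[exon] / normal_depth
        if ratio > gain:
            calls[exon] = ("gain", ratio)
        elif ratio < loss:
            calls[exon] = ("loss", ratio)
    return calls


# Hypothetical per-exon read counts for a matched tumor and normal pair.
tumor_depths = {"exonA": 100, "exonB": 400, "exonC": 20, "exonD": 100, "exonE": 5}
normal_depths = {"exonA": 100, "exonB": 100, "exonC": 100, "exonD": 100, "exonE": 5}

calls = flag_copy_number_candidates(tumor_depths, normal_depths)
for exon, (label, ratio) in sorted(calls.items()):
    print(f"{exon}: {label} (tumor/normal ratio {ratio:.2f})")
```

Filtering on depth in the normal sample keeps low-coverage exons, such as the hypothetical exonE, from producing spurious ratios.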

Overall, our results demonstrate the potential of APEX in analyzing patient-derived clinical samples. Despite the wide range in DNA quantities across biopsy types (e.g., from cell-free DNA to tissue biopsies), covalent attachment of such fragments with APEX will enable multiple replicate or orthogonal assays.

ASSOCIATED CONTENT

Supporting Information
Figures S-1-5, Tables S-1-5, and experimental section are detailed in Supporting Information as a PDF. The Supporting Information is available free of charge online. Sequence data is available on NCBI’s SRA under the accession number SRP167034.

AUTHOR INFORMATION

Corresponding Author
*Email: [email protected]. Phone: 650-721-1503.

Author Contributions
B.T.L. performed the experiments and analyzed the data. B.T.L. and H.P.J. designed the experiments. B.T.L. and H.P.J. wrote the manuscript. All authors have given approval to the final version of this manuscript.

Notes
The authors declare no competing financial interests.

ACKNOWLEDGMENT

This work was supported by US National Institutes of Health grants NHGRI P01HG000205 (to B.T.L. and H.P.J.), NCI R33CA174575 (to H.P.J.) and NHGRI R01HG006137 (to H.P.J.). The American Cancer Society provided support to H.P.J. (Research Scholar grant, RSG-13-297-01-TBG). H.P.J. also received support from the Doris Duke Charitable Foundation, the Clayville Foundation, the Seiler Foundation and the Howard Hughes Medical Institute.

REFERENCES

(1) Garraway, L. A. Journal of clinical oncology : official journal of the American Society of Clinical Oncology 2013, 31, 1806-1814.
(2) Hyman, D. M.; Taylor, B. S.; Baselga, J. Cell 2017, 168, 584-599.
(3) Meyerson, M.; Gabriel, S.; Getz, G. Nature reviews. Genetics 2010, 11, 685-696.
(4) Karver, M. R.; Weissleder, R.; Hilderbrand, S. A. Bioconjugate chemistry 2011, 22, 2263-2270.
(5) Khatwani, S. L.; Mullen, D. G.; Hast, M. A.; Beese, L. S.; Distefano, M. D.; Taton, T. A. Bioorganic & medicinal chemistry 2012, 20, 4532-4539.
(6) van Buggenum, J. A. G. L.; Gerlach, J. P.; Eising, S.; Schoonen, L.; van Eijl, R. A. P. M.; Tanis, S. E. J.; Hogeweg, M.; Hubner, N. C.; van Hest, J. C.; Bonger, K. M.; Mulder, K. W. Scientific Reports 2016, 6, 22675.
(7) Shin, G.; Grimes, S. M.; Lee, H.; Lau, B. T.; Xia, L. C.; Ji, H. P. Nature communications 2017, 8, 14291.
(8) Hopmans, E. S.; Natsoulis, G.; Bell, J. M.; Grimes, S. M.; Sieh, W.; Ji, H. P. Nucleic Acids Research 2014, 42, e88.
(9) Myllykangas, S.; Buenrostro, J. D.; Natsoulis, G.; Bell, J. M.; Ji, H. P. Nat Biotechnol 2011, 29, 1024-1027.
(10) de Bourcy, C. F.; De Vlaminck, I.; Kanbar, J. N.; Wang, J.; Gawad, C.; Quake, S. R. PloS one 2014, 9, e105585.
(11) Potapov, V.; Fu, X.; Dai, N.; Correa, I. R., Jr.; Tanner, N. A.; Ong, J. L. Nucleic Acids Res 2018, 46, 5753-5763.
(12) Kivioja, T.; Vaharautio, A.; Karlsson, K.; Bonke, M.; Enge, M.; Linnarsson, S.; Taipale, J. Nature methods 2011, 9, 72-74.
(13) Ishiguro, H.; Okubo, T.; Kuwabara, Y.; Kimura, M.; Mitsui, A.; Sugito, N.; Ogawa, R.; Katada, T.; Tanaka, T.; Shiozaki, M.; Mizoguchi, K.; Samoto, Y.; Matsuo, Y.; Takahashi, H.; Takiguchi, S. Oncotarget 2017, 8, 60378-60389.
(14) Zhang, Y.; Li, B.; Ji, Z. Z.; Zheng, P. S. Cancer 2010, 116, 5207-5218.