www.acsnano.org
Quantum Point Contact Single-Nucleotide Conductance for DNA and RNA Sequence Identification
Sepideh Afsari,†,‡ Lee E. Korshoj,†,‡ Gary R. Abel, Jr.,†,‡ Sajida Khan,†,‡ Anushree Chatterjee,†,§ and Prashant Nagpal*,†,‡,§,⊥
†Department of Chemical and Biological Engineering, ‡Renewable and Sustainable Energy Institute (RASEI), §BioFrontiers Institute, and ⊥Materials Science and Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States
ABSTRACT: Several nanoscale electronic methods have been proposed for high-throughput single-molecule nucleic acid sequence identification. While many studies display a large ensemble of measurements as “electronic fingerprints” with some promise for distinguishing the DNA and RNA nucleobases (adenine, guanine, cytosine, thymine, and uracil), important metrics such as accuracy and confidence of base calling fall well below the current genomic methods. Issues such as unreliable metal−molecule junction formation, variation of nucleotide conformations, insufficient differences between the molecular orbitals responsible for single-nucleotide conduction, and lack of rigorous base calling algorithms lead to overlapping nanoelectronic measurements and poor nucleotide discrimination, especially at low coverage on single molecules. Here, we demonstrate a technique for reproducible conductance measurements on conformation-constrained single nucleotides and an advanced algorithmic approach for distinguishing the nucleobases. Our quantum point contact single-nucleotide conductance sequencing (QPICS) method uses combed and electrostatically bound single DNA and RNA nucleotides on a self-assembled monolayer of cysteamine molecules. We demonstrate that by varying the applied bias and pH conditions, molecular conductance can be switched ON and OFF, leading to reversible nucleotide perturbation for electronic recognition (NPER). We utilize NPER as a method to achieve >99.7% accuracy for DNA and RNA base calling at low molecular coverage (∼12×) using unbiased single measurements on DNA/RNA nucleotides, which represents a significant advance compared to existing sequencing methods. These results demonstrate the potential for utilizing simple surface modifications and existing biochemical moieties in individual nucleobases for a reliable, direct, single-molecule, nanoelectronic DNA and RNA nucleotide identification method for sequencing.
KEYWORDS: biophysics, STM break junction, single-molecule conductance, single-molecule DNA sequencing, RNA sequencing
Received: August 2, 2017 Accepted: October 2, 2017 Published: October 2, 2017
DOI: 10.1021/acsnano.7b05500 ACS Nano 2017, 11, 11169−11181
© 2017 American Chemical Society
Next-generation single-molecule DNA sequencing will require unambiguous identification of DNA nucleotides in an enzyme-free, low-cost, and highly accurate method that does not involve synthetic amplification or complex sample preparation.1−8 Furthermore, RNA sequencing is currently accomplished by reverse transcription into cDNA, followed by DNA sequencing. Since the reverse transcription process is prone to errors, the problem of determining correct RNA sequences in the absence of a widely used direct RNA sequencing method is accentuated.9 Recent developments in direct electronic identification of DNA nucleotides rely on techniques to measure structure-dependent charge transfer through single nucleotides in molecular junctions formed between two nanoelectrodes.10−16 While these designs have shown promising “electronic fingerprints” for detecting DNA nucleotides using a large number of ensemble measurements, successful techniques that can identify individual nucleotides within a single DNA molecule using single measurements or low molecular coverage (repeat measurements on the same nucleotide) still remain elusive.17,18 This can be seen in metrics such as base calling confidence, false-positive and false-negative calls, and confusion matrix analyses, which fall well short of currently used genome sequencing methods. These shortcomings occur in part due to unreliable metal−molecule junctions (point of contact on a DNA nucleobase), variable nucleotide conformations, and sample noise, all of which lead to large variance in single-molecule conductance and hinder practical applications.1,19−21 Additionally, insufficient differences between nucleobase molecular orbitals contribute to overlapping conductance measurements and require robust algorithms for base calling.8 Recent studies have used functionalized gap junctions to reduce nucleotide entropy and utilize recognition tunneling to identify individual DNA nucleotides.3,5,6 Strategies for exploiting surface modifications and existing chemical moieties in DNA nucleotides can further improve resolution in conductance signatures and facilitate base calling.
Here we show a quantum point contact single-nucleotide conductance sequencing (QPICS) method that forms reliable quantum point contact junctions with existing chemical moieties on single nucleotides within DNA and RNA macromolecules. This claim is validated using a theoretical formalism of Landauer transmission coefficients calculated for single-nucleotide measurements. We further verify the reliable quantized metal−molecule junctions by reversibly modifying the anchoring groups to turn quantized conductance ON and OFF. Single or low-coverage measurements were analyzed with two recognition algorithms rooted in machine learning to identify specific charge conduction signatures in individual measurements for sequencing applications: a Gaussian base calling algorithm (GABA) and peak correlation for nucleotides (PECAN) algorithm. Not only do these results prove successful at enhancing nucleobase recognition, but they are further significant since such molecular conductance measurements can be accomplished using a number of techniques such as mechanical break junctions, scanning tunneling break junctions, conductive atomic force microscopy, nanopores, fixed nanoscale electrodes, electromigration or electrochemical deposition, or other nanostructure-based contact junction methods.
Figure 1. DNA surface immobilization and transmission through single-nucleotide junctions using a cysteamine SAM. (A) Schematic for the formation of single-nucleobase junctions using the STM-BJ technique. (Inset) Molecular structure of a single-adenosine junction using a cysteamine SAM. (B) Tapping mode AFM image of a sample prepared using overnight adsorption of poly(dA)50 at a concentration of 1.0 nM, in which individual DNA strands are visible on the cysteamine SAM. (C) UHV-STM image of poly(dC)100 on a cysteamine SAM, obtained at a temperature of 12 K and a pressure of 2 × 10^−10 Torr, Vbias = 1.5 V, and It = 200 pA. (D) Conductance histograms from STM-BJ measurements of ethanedithiol (orange) and cysteamine (purple) at −0.10 V, composed of 1526 and 1439 individual current−distance curves, respectively (2509 and 4784 individual current−distance curves prior to filtering, respectively). Structures of both molecules are shown. (E) Conductance histogram from STM-BJ measurements of ethanedithiol (orange) and cysteamine (purple) at −0.50 V, comprising 2422 and 3034 individual current−distance curves, respectively. (F) Conductance histogram comparing single adenine nucleotides in DNA vs RNA (dATP vs ATP) at −0.50 V, composed of 285 and 585 individual current−distance curves, respectively. The conductance features are nearly identical, indicating that the additional hydroxyl group in ATP does not significantly perturb conductance signatures, and this technique is directly applicable to RNA.
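The conductance histograms referred to in Figure 1 pool many individual current−distance curves into a single distribution. As a minimal sketch of that kind of pooling (not the authors' processing pipeline), the Python below converts current to conductance via G = I/V, expresses it in units of the conductance quantum G0 = 2e²/h, and bins the result. The function names, the log-scale binning, and the bin range are assumptions made for illustration; the curve filtering mentioned in the caption is not reproduced.

```python
import numpy as np

# Conductance quantum G0 = 2e^2/h in siemens (exact SI values of e and h).
E_CHARGE = 1.602176634e-19   # C
PLANCK_H = 6.62607015e-34    # J s
G0 = 2 * E_CHARGE**2 / PLANCK_H  # about 7.748e-5 S


def conductance_in_g0(current_amps, bias_volts):
    """Convert measured current to conductance in units of G0 (G = I / V)."""
    return np.abs(np.asarray(current_amps, dtype=float) / bias_volts) / G0


def log_conductance_histogram(traces, bias_volts, bins=100, log_range=(-6.0, 1.0)):
    """Pool all traces into one histogram of log10(G/G0).

    traces: iterable of 1-D arrays of current (A), one per current-distance curve.
    bins and log_range are illustrative defaults, not values from the paper.
    Returns (counts, bin_edges) as produced by numpy.histogram.
    """
    pooled = np.concatenate([conductance_in_g0(t, bias_volts) for t in traces])
    pooled = pooled[pooled > 0]  # log10 is undefined at zero current
    return np.histogram(np.log10(pooled), bins=bins, range=log_range)
```

Peaks in such a histogram mark conductance values that recur across many junctions, which is what makes a pooled distribution usable as a signature.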
RESULTS AND DISCUSSION
Measurements of Single Nucleotides on a Cysteamine Surface. Charge transport through single-molecule
adsorption of DNA on the cysteamine SAM keeps the nucleotides immobilized as the STM-BJ forms single-nucleotide junctions. This resulted in single-molecule junctions with a reproducible geometry where the STM tip attaches to an anchoring group on the nucleobases whenever it is brought into contact with the DNA or RNA adlayer on the surface (Figure 1A). Thus, we adopted this approach for preparing samples for single-molecule quantized conductance measurements in order to test our theoretical formalism with experimental measurements. In the STM-BJ experiments for QPICS measurements of DNA and RNA nucleotides, the STM gold tip was repeatedly brought in and out of contact with the single-crystal Au(111) substrate. A single current−distance measurement, or spectrum, consists of 304 data points collected in […] 99% identification of G in five measurements, the second step can reach the same level of identification of A in five measurements, and the last step can discriminate C and T to the same degree in six measurements. Therefore, the designed sequencing scheme only requires ∼12× coverage ((5 + 10 + 16 + 16)/4) to achieve >99.8% accuracy for DNA base calling, as seen in trace plots (Figure 6B) and confusion matrices (Figure 6C). For comparison, the commercially available Oxford Nanopore sequencers are often run at 30−60× coverage, but with some assembly algorithms, 16× coverage has been shown to detect variants at 99% accuracy.41,42 Additionally, the Illumina sequencing by synthesis platform often requires >100× coverage depending on the application.43 Our results present a potential sequencing method capable of exceeding the performance of the most commonly used and state-of-the-art sequencing platforms currently available.
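The paragraph above reports that a handful of repeated spectra is enough to reach a high level of identification at each step. One generic way to see why repeats help is to pool per-spectrum probabilities under an independence assumption (a naive Bayes product). The Python sketch below does only that; it is not the GABA or PECAN algorithm, and the function name, the probability floor, and the uniform treatment of bases are assumptions for illustration.

```python
import math

BASES = ("A", "C", "G", "T")


def pool_calls(per_spectrum_probs):
    """Combine per-spectrum base probabilities, assuming independent spectra.

    per_spectrum_probs: list of dicts mapping base -> probability for one spectrum.
    Returns a dict of normalized pooled probabilities over BASES.
    """
    log_sum = {b: 0.0 for b in BASES}
    for probs in per_spectrum_probs:
        for b in BASES:
            # Floor at a tiny value so a single zero does not veto a base outright.
            log_sum[b] += math.log(max(probs.get(b, 0.0), 1e-12))
    # Subtract the largest log-sum before exponentiating for numerical stability.
    peak = max(log_sum.values())
    weights = {b: math.exp(v - peak) for b, v in log_sum.items()}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}
```

With hypothetical inputs in which every spectrum only mildly favors one base, the pooled probability for that base rises quickly as spectra are added, which is the qualitative behavior the coverage argument relies on.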
Figure 7. QPICS-NPER sequence identification method and metrics for RNA. (A) Conductance signatures for uracil (U) at −0.50 V for pH 4−5 and pH 3. No attenuation of peaks is observed at lower pH, the same as seen for thymine (T) in DNA. (B) The third step in the sequence identification scheme applied to RNA: cytosine (C) and U nucleobases are determined at −0.50 V and pH 3 since C junctions are turned OFF at pH 3 while U junctions remain ON. The first two steps are the same as in DNA since nucleobase structure remains the same. (C) Probability values (obtained from the base calling algorithm), confidence of base calling, and accuracy (X indicates incorrect calls) for the proposed sequencing scheme leading to >99.7% accuracy (5 spectra at −0.30 V pH 4−5, 5 spectra at −0.20 V pH 4−5, and 6 spectra at −0.50 V pH 3). (D) Confusion matrix resulting from the proposed sequencing scheme for RNA.
In order to extend our method to RNA, the same QPICS-NPER scheme was used with U in place of T. We have already verified that the conductance signatures for A, G, and C are unaltered between DNA and RNA. Due to the difference in structure between T and U nucleobases, we measured a separate conductance signature for U at −0.50 V bias (as seen in Figure 2A). As expected, the U signature closely resembles that of T (three conductance peaks), and the signature remains ON at both pH 4−5 and pH 3 (Figure 7A). The QPICS scheme (calling T/U) is repeated for RNA and shown in Figure 7B. For RNA, the designed sequencing scheme with ∼12× coverage ((5 + 10 + 16 + 16)/4) achieved >99.7% accuracy for base calling, as seen in trace plots (Figure 7C) and confusion matrices (Figure 7D). Overall, we have demonstrated that QPICS-NPER can achieve very high accuracy (>99.8% and >99.7%) with low coverage (∼12×) for single-nucleotide identification, using simple surface modifications and existing biochemical moieties in well-designed, high-throughput measurements.
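The ∼12× figure follows from the arithmetic (5 + 10 + 16 + 16)/4 quoted above. A short Python sketch makes the bookkeeping explicit, using the spectra counts from the Figure 7 caption. Two things are assumed rather than stated in the text: that a base resolved at a given step has consumed all spectra taken up to and including that step, and that the first and second steps map to −0.30 V and −0.20 V in the order the caption lists them. The names `STEPS`, `spectra_per_base`, and `mean_coverage` are hypothetical.

```python
# Each step: (condition label, spectra taken at this step, bases resolved at this step).
# Counts are from the Figure 7 caption; the step-to-condition order is an assumption.
STEPS = [
    ("-0.30 V, pH 4-5", 5, ["G"]),
    ("-0.20 V, pH 4-5", 5, ["A"]),
    ("-0.50 V, pH 3", 6, ["C", "U"]),
]


def spectra_per_base(steps):
    """Cumulative number of spectra taken by the time each base is resolved."""
    taken, cost = 0, {}
    for _label, n_spectra, resolved in steps:
        taken += n_spectra
        for base in resolved:
            cost[base] = taken
    return cost


def mean_coverage(steps):
    """Average spectra per base, weighting the four bases equally."""
    cost = spectra_per_base(steps)
    return sum(cost.values()) / len(cost)
```

Under these assumptions the per-base costs are 5, 10, 16, and 16 spectra, and their mean is 11.75, which the text rounds to ∼12×. The equal weighting matches the division by 4 in the quoted formula; a sequence with skewed base composition would give a different average.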
CONCLUSION
Identifying individual nucleotides in a single nucleic acid strand is the first step in designing a technique that can accurately read the base sequence of single DNA or RNA molecules. In this study, we presented a strategy for fabricating a nucleic acid adlayer on an electrode surface that enables formation of junctions with individual nucleotides in long DNA and RNA strands with reproducible geometry between two gold electrodes. These reproducible, single-nucleotide junctions result in conductance histograms with well-defined signature peaks that can be employed for statistical identification of nucleotides in DNA and RNA from their measured conductivity. Our base calling calculations using experimental data provide a robust demonstration of using the QPICS technique, with an unmodified tip, for reading single-molecule DNA and RNA sequences with quantum conductance measurements. We demonstrated that increasing coverage (using multiple STM-BJ spectra measurements) can be used to achieve highly accurate base calls while improving confidence and signal-to-noise level. Furthermore, we showed that varying the bias and pH conditions can help selectively identify nucleotides with improved accuracy and confidence by turning conductance signatures ON and OFF (>99.7% for DNA/RNA with ∼12× coverage), which exceeds comparable metrics demonstrated by current second- and next-generation sequencing technologies. Overall, this study helps to lay the groundwork for a single-molecule, label-free DNA and RNA sequencing platform that is accurate and inexpensive using nanoelectronic measurements with unmodified nucleic acids and a metal tip.
METHODS
Chemicals. Cysteamine (≥98% titration, Sigma Life Science), sulfuric acid (99.8%, anhydrous, Sigma-Aldrich), and single-stranded homo-oligomers of DNA and RNA poly(dA)100, poly(dG)15, poly(dC)100, poly(dT)100, and poly(dU)15 (Invitrogen, USA) were used. Cysteamine solutions (2 mM) for normal SAM preparation were made in dilute sulfuric acid (pH 4−5) in DNase- and RNase-free water. Some cysteamine solutions (also 2 mM) for pH studies were made in concentrated sulfuric acid (pH 1) in DNase- and RNase-free water. Nucleic acid oligomers were dissolved in dilute sulfuric acid solutions (pH 4−5) in DNase- and RNase-free water at a concentration of 1 × 10^−9 M and stored at −20 °C until used.
Electrodes and Cleaning. The Au(111) electrode for STM imaging and break junction experiments was a single crystal disc purchased from Princeton Scientific Corp. STM tip electrodes used for break junction experiments were prepared by mechanically cutting a gold wire (99.998%, 0.25 mm diameter, Alfa Aesar). The STM tip used for imaging was electrochemically etched Pt/Ir (80/20, 0.25 mm diameter, Keysight Technologies CA, USA). The gold crystal disc, presenting well-ordered Au(111) single crystal facets on which wide (∼100 nm) terraces could be easily found, was used as the substrate. Before all experiments, the substrate, the Teflon cell, and the O-ring (Viton) were cleaned by immersion in hot piranha solution, 1:3 H2O2 (J. T. Baker, CMOS)/H2SO4 (96%, J. T. Baker, CMOS), for 1 h and then rinsed and heated in ultrapure deionized (DI) water obtained from a Barnstead Thermolyne NANOpure Diamond purification system equipped with a UV lamp, water resistivity >18 MΩ cm. (Caution! The piranha solution is a very strong oxidizing reagent and can be dangerous; protective equipment including gloves, goggles, and a lab coat should be used at all times.) A hydrogen flame was used to anneal the crystal, followed by quenching in ultrapure DI water. The gold disc, Teflon cell, and O-ring were then dried under a stream of nitrogen gas. The cell was quickly set up, a cysteamine solution was added to cover the electrode, and the cell was installed in the microscope.
Cysteamine SAM. We used an electrochemical adsorption method for preparing the cysteamine SAM. Electrochemical adsorption was performed with a PicoScan STM system (Keysight). The PicoPlus bipotentiostat (Keysight) was used to control the surface and tip potential independently. The Au(111) crystal disc was used as the working electrode. The cysteamine solution was used as the electrolyte under potential for 1 h (2 mM cysteamine in dilute sulfuric acid pH 4−5 for normal setup and concentrated sulfuric acid pH 1 for pH studies, as previously described). A silver wire and a platinum wire were used as quasi-reference electrode and counter electrode, respectively, for dilute sulfuric acid solutions.
For concentrated sulfuric acid solutions, a platinum wire was used as the quasi-reference electrode instead of silver.
Nucleic Acid Adlayer on the Cysteamine SAM. For preparing the DNA adlayer, the 1 nM solutions of DNA in dilute sulfuric acid (pH 4−5) were added on top of the electrochemically adsorbed cysteamine SAM and left to adsorb overnight. The crystal was then rinsed with 500 μL of dilute sulfuric acid solution (pH 4−5) and dried under a stream of nitrogen gas.
AFM Imaging. Imaging was carried out using a 5500 model atomic force microscope with a multipurpose scanner, manufactured by Keysight. Images were acquired while operating the AFM in semicontact (tapping) mode under an aqueous sodium acetate imaging buffer (acetic acid/sodium acetate, 10 mM total, pH ∼5). Imaging was performed using silicon tips mounted on silicon nitride cantilevers with a nominal spring constant of ∼0.2−0.4 N/m and a resonant frequency of approximately 15−30 kHz in liquid (model SNL-10, manufactured by Bruker, USA).
STM Imaging. STM images were collected with a RHK PanScan Freedom microscope using electrochemically etched 0.25 mm diameter Pt/Ir (80/20) tips (Keysight). The STM image shown in Figure 1C was obtained at 12 K, 2 × 10^−10 Torr, Vbias = 1.5 V, and It = 200 pA.
STM-BJ Experiments. The STM-BJ experiments were carried out with a Keysight microscope in ambient conditions (in air). A 1 nA/V preamplifier was used for all single-molecule conductance measurements. STM software (PicoView 1.14) drove the gold tip to approach the gold surface using the applied bias voltage. The tip was then retracted at a sweep rate of 39 nm/s to break the contact. During the retracting process, the current−distance traces were recorded (304 data points collected in