Article pubs.acs.org/JACS

Cite This: J. Am. Chem. Soc. 2017, 139, 15420-15428

Conformational Smear Characterization and Binning of Single-Molecule Conductance Measurements for Enhanced Molecular Recognition

Lee E. Korshoj,†,‡ Sepideh Afsari,†,‡ Anushree Chatterjee,†,§ and Prashant Nagpal*,†,‡,§,∥

†Department of Chemical and Biological Engineering, ‡Renewable and Sustainable Energy Institute (RASEI), §BioFrontiers Institute, and ∥Materials Science and Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States

ABSTRACT: Electronic conduction or charge transport through single molecules depends primarily on molecular structure and anchoring groups and forms the basis for a wide range of studies from molecular electronics to DNA sequencing. Several high-throughput nanoelectronic methods such as mechanical break junctions, nanopores, conductive atomic force microscopy, scanning tunneling break junctions, and static nanoscale electrodes are often used for measuring single-molecule conductance. In these measurements, “smearing” due to conformational changes and other entropic factors leads to large variances in the observed molecular conductance, especially in individual measurements. Here, we show a method for characterizing smear in single-molecule conductance measurements and demonstrate how binning measurements according to smear can significantly enhance the use of individual conductance measurements for molecular recognition. Using quantum point contact measurements on single nucleotides within DNA macromolecules, we demonstrate that the distance over which molecular junctions are maintained is a measure of smear, and the resulting variance in unbiased single measurements depends on this smear parameter. Our ability to identify individual DNA nucleotides at 20× coverage increases from 81.3% accuracy without smear analysis to 93.9% with smear characterization and binning (SCRIB). Furthermore, merely 7 conductance measurements (7× coverage) are needed to achieve 97.8% accuracy for DNA nucleotide recognition when only low molecular smear measurements are used, which represents a significant improvement over contemporary sequencing methods. These results have important implications in a broad range of molecular electronics applications from designing robust molecular switches to nanoelectronic DNA sequencing.



Received: August 8, 2017
Published: October 10, 2017
DOI: 10.1021/jacs.7b08246
© 2017 American Chemical Society

INTRODUCTION

Charge conduction through single molecules forms the basis for a large number of studies from single-molecule circuits in molecular electronics1−4 to molecular recognition and identification.5−13 Since charge transport in single-molecule circuits relies on intrinsic properties like molecular structure, the resulting electronic orbitals, local environment, and anchoring groups,14−19 charge conduction can be used as a probe for recognizing chemically distinct molecules or be manipulated by appropriately designing molecules.20−22 While such approaches can lead to single-molecule recognition and the fabrication of molecular building blocks for nanoscale electronics, progress has been hindered in part because of molecular conformations and entropy.23−29 These factors result in large variances in the observed conductance from unbiased individual measurements, likely due to molecular orbital perturbations from the conformational changes and interaction with the substrate.30−35 This molecular conformational “smear” diminishes the reliability of single-molecule circuits and accuracy of molecular recognition.

Conventionally, single-molecule conductance measurements are collected as current-distance spectra, where steps or plateaus in current are observed as charges tunnel through a molecular bridge attached to two retracting electrodes. Spectra are then combined to produce a histogram where a characteristic molecular conductance can be determined. Recent studies have analyzed the shape of plateaus in individual current-distance measurements and suggested that classification of diverse plateau shapes may provide insights into the junction dynamics, such as molecular binding conformation and conformational changes, although these questions remain unanswered.36,37

Here, we show a method of characterizing molecular smear in individual measurements based on the plateau distance or the distance over which molecular junctions are maintained. We hypothesized that the variance in this distance is a result of molecular conformations and hence can be used as a smear parameter. This is validated by observing a direct correlation with variance in individual conductance measurements and charge conduction parameters. We further demonstrate how this smear characterization improves the recognition of four DNA nucleobases (adenine (A), guanine (G), cytosine (C), thymine (T)) within a DNA macromolecule from single-molecule scanning tunneling microscopy break junction (STM-BJ) measurements. Reliable electronic junctions were formed during quantum point contact measurements on single nucleotides, with conductance histogram signatures for each nucleobase shown in Figure 1A. By analyzing the measurements using the proposed molecular smear parameter and binning them accordingly in a process we denote as smear characterization and binning (SCRIB), we can achieve higher resolution between signatures for the different nucleobases. We also developed two recognition algorithms to identify specific charge conduction signatures in individual measurements for sequencing applications: a Gaussian base calling algorithm (GABA) and a peak correlation for nucleotides (PECAN) algorithm. These algorithms perform a machine learning classification of single measurements as A, G, C, and T, and we used them to compare base calling with and without SCRIB analysis. The SCRIB process better characterizes the structure-dependent molecular orbitals (including their interaction with the substrate) from single-molecule conduction measurements and substantially improves the accuracy for identifying nucleobases. Therefore, the success of smear characterization through SCRIB demonstrates that the proposed method can evaluate molecular conformation and entropy in single nanoelectronic conductance measurements.

Figure 1. Smear characterization and binning. (A) Conductance histogram signatures for all nucleobases (A, G, C, T) at −0.50 V bias, from 250 to 700 individual current-distance measurements on homologous oligomers. (B) Smear is a measure of the molecular conformation and entropy with respect to the surface substrate and/or surface molecules. Here, C nucleotides electrostatically immobilized on a cysteamine SAM are in the low smear state when oriented vertically, perpendicular to the surface, and have minimal to no mixing of molecular orbital states with the surface and/or surface molecules. The high smear state occurs when, at large angles with respect to vertical, there is greater molecular orbital interaction with the surface and/or surface molecules. (C) Observing the impact of smear on conductance signatures for C. As smear parameter SP increases (increasing distance over which the molecular junction exists), the variance in the signature histogram peaks also increases. Measurements can be binned according to the SP. (D) Quantifying the increase in variance (related to fwhm, denoted smear factor SF) with SP for C. (E) Distributions of SP for each nucleotide. These distributions determined how measurements were binned.



RESULTS AND DISCUSSION

STM-BJ Measurements. In our single-molecule conductance measurements, we used a positively charged self-assembled monolayer (SAM) of cysteamine molecules on a single-crystal Au(111) substrate to electrostatically bind negatively charged DNA molecules (Figure 1B, detailed information in the Experimental Section). The cysteamine SAM was used to confine DNA molecules in a conformation with the phosphate backbone downward and individual nucleotides pointing upward away from the surface, facilitating reliable junction formation and precise quantum point contact measurements on single nucleotides. We attempted to further reduce entropy using molecular combing.38,39 In the STM-BJ technique, current-distance traces are recorded as the STM tip is repeatedly brought in and out of contact with the surface substrate. The displacement of the STM tip in relation to the substrate (electrode-electrode distance) gives rise to quasi-exponentially decaying traces without molecules, and current plateaus or steps in the current-distance traces when a molecular junction forms (Figure S1). Histograms made from STM-BJ measurements on the cysteamine SAM by itself (no DNA) showed a double conductance peak17 at −0.10 V bias, which was also seen for ethanedithiol. However, the peaks were not observed in the measurement window at −0.50 V bias, which we then selected as the fixed bias for subsequent molecular conductance measurements (Figure S2) to ensure any signal originated from the molecular adlayer on top of cysteamine. The surface coverage of DNA increased with concentration up to ∼5 nM, where it appeared to saturate around 2000 molecules·μm⁻² (Figure S3). In order to increase the coverage, we extended the adsorption time to overnight, allowing the DNA in solution to bind and the solvent to evaporate before rinsing off excess unbound DNA. With DNA on the cysteamine SAM, the nitrogen groups in DNA nucleobases can chemisorb and act as anchoring groups with a gold STM tip. In our STM-BJ measurements on homologous DNA oligomers (poly(dA)100, poly(dG)15, poly(dC)100, poly(dT)100) on the cysteamine SAM at the selected −0.50 V bias, unique conductance features (in the form of peaks in the histograms) were observed for the four nucleotides (Figure 1A), comprised of 250−700 individual current-distance curves per nucleobase (more details in the Experimental Section).

Smear Characterization. In individual current-distance traces comprising the conductance histograms shown in Figure 1A, we observed a range of distances over which the molecular junction is maintained, corresponding to the length of the current plateau in STM-BJ current-distance measurements. This led us to hypothesize that molecular conformation and entropy can be evaluated from the plateau distance, defined as our smear parameter SP (Figure 1B). This SP does not quantify entropy, but gives a qualitative characterization. In vertically oriented molecules, the smear is low, and molecular orbital states (especially HOMO orbitals for molecules between the Au(111) substrate and gold tip) are not mixed with the substrate, resulting in quick breaking of the charge conducting molecular junction. On the contrary, for molecules with larger angles in close proximity to the surface, smear is high, molecular orbital states can be perturbed, and junctions remain intact for longer periods of time as the molecule can be pulled to vertical prior to breaking.

Before quantifying this molecular smear, we first characterized the instrument response function (IRF) of our STM-BJ system by collecting measurements on the bare Au(111) substrate. We treated gold atoms as “hard spheres” and characterized the IRF to measure “zero smear” using a quality factor or percentage variance in conduction (Figure S4, ΔG0,fwhm/G0, where ΔG0,fwhm is the full width at half-maximum, fwhm, for the quantum conductance peak in the gold histogram, and G0 is the quantum conductance 2e²/h = 7.75 × 10⁻⁵ S).15−19 On measurements with immobilized DNA, we observed that while all DNA nucleotides have their characteristic conduction peaks, the variance in molecular conduction, or peak broadening, increases with SP. This observation is clearly illustrated in Figure 1C for C nucleotides (and in Figure S5 for A, G, and T). When individual current-distance STM-BJ spectra used to generate the histogram signature for C shown in Figure 1A are first separated based on the smear parameter SP (as detailed in Figure S6), a direct correlation is seen between variance and SP. This evidence provides a validation for our hypothesis that molecules at larger angles and in close proximity with the substrate likely show higher intermixing of molecular states and in turn larger variance, or molecular smear, in conduction. To quantify this molecular smear using a smear factor SF, we calculated the fwhm normalized by peak conductance minus the IRF (SF = ΔGfwhm/G − IRF, Figure 1D). Note that simple stretching of molecular bonds vertically, without any change in conformation, is unlikely to induce such a pronounced effect on conductance, especially due to applicability of the Landauer transmission formalism and prior studies on charge conduction.16,18,30−34,40 Furthermore, we observed that the average distances over which molecular junctions are maintained (our smear parameter SP) were smaller for purines (A, G) than pyrimidines (C, T), which suggests that this distance does not correlate with the physical length of the molecule in the junction (Figure 1E). Therefore, both of these observations of increased conductance smear with SP, and inverse correlation of molecular size with SP, validate the proposed hypothesis of using smear to qualitatively characterize molecular conformation and entropy.

Figure 2. Verifying smear with transmission model calculations. (A) Electronic charge conduction pathways available for purines (A used as example) versus pyrimidines (C used as example). The various nitrogen anchoring groups and conduction pathways within the different nucleobases give rise to the unique signatures and Landauer transmission model (histogram peaks corresponding to the unique anchoring groups are labeled in bold). (B) Observing the impact of smear on transmission parameter values for A. As the smear bin increases (increasing smear parameter SP), the variance in the calculated purine parameter (transmission coefficient for conjugated ring hopping, T5 in the schematics) also increases. (C) As with A, the variance in calculated pyrimidine parameter (ratio of conductance peaks) increases with smear bin. (D and E) Quantifying the increase in variance (based on the fwhm, denoted smear factor SF) with smear bin for (D) purine A and (E) pyrimidine C.

To further test the applicability of using the distance over which a molecular junction is maintained as a smear parameter to evaluate molecular conformation, we calculated charge conduction through the four DNA nucleotides and analyzed the Landauer transmission coefficients40,41 of identical bonds in purines and pyrimidines (Figure 2A) within individual smear bins. Detailed solutions and validation of the transmission model are provided in the Supporting Information and Figures S7−S9. Here, we present evidence from the model supporting the idea of molecular smear characterization. Using A as an example for purines, the charge conduction in single deoxyadenosine nucleotides corresponding to observed peaks A4, A3, and A2 (from high to low conductance) can be expressed as

$$\frac{G_{A4}}{G_0} = T_1 \cdot T_2 \cdot T_3 \tag{1}$$

$$\frac{G_{A3}}{G_0} = T_1 \cdot T_2 \cdot T_4 \cdot T_3 \tag{2}$$

$$\frac{G_{A2}}{G_0} = T_1 \cdot T_2 \cdot T_5 \cdot T_3 \tag{3}$$
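As a short worked step (added here for illustration, not taken from the original text), dividing eq 2 by eq 1 and eq 3 by eq 2 cancels the factors shared by all three peaks. This is why ratios of peak conductances isolate individual in-ring transmission coefficients in this model. The symbols are exactly those of eqs 1−3.

```latex
\[
\frac{G_{A3}}{G_{A4}} = \frac{T_1 T_2 T_4 T_3}{T_1 T_2 T_3} = T_4,
\qquad
\frac{G_{A2}}{G_{A3}} = \frac{T_1 T_2 T_5 T_3}{T_1 T_2 T_4 T_3} = \frac{T_5}{T_4}
\;\Longrightarrow\;
T_5 = \frac{G_{A2}}{G_{A3}}\, T_4 = \frac{G_{A2}}{G_{A4}} .
\]
```

Under the model's assumption that the same factors appear in all three peaks, neither the lumped contact term nor G0 needs to be known to obtain T4 and T5 from measured peak positions.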

T1 is a lumped transmission coefficient for the electrode-molecule connections, cysteamine, phosphate backbone, and deoxyribose sugar, while T2−T5 are the transmission coefficients for bonds within the purine ring and conjugated ring hopping (Figure 2A). Using C as an example for pyrimidines, the charge conduction expression for peak C2 in deoxycytidine nucleotides is

$$\frac{G_{C2}}{G_0} = T_1 \cdot T_6 \cdot T_7 \tag{4}$$

T6 and T7 are transmission coefficients for bonds within the pyrimidine ring (Figure 2A). To further illustrate and prove the concept of molecular smear, we used single STM-BJ current-distance measurements to calculate a distribution of transmission values for the range of smear parameter SP bins (schematic for this process shown in Figure S8). To use individual A measurements to extract Landauer transmissions, we created a histogram from a single current-distance measurement and identified conductance values from spikes greater than one count in the histogram within ±fwhm of the known signature peak positions for A (A2, A3, and A4). Then, we calculated a mean peak location weighted by the counts for each spike, e.g., GA4 = ∑CA4,i·GA4,i/∑CA4,i. From the weighted mean peak locations, we then obtained a mean transmission coefficient value (e.g., T4 = GA3/GA4 and T5 = (GA2/GA3)·T4, with expressions from eqs 1−3) and also a transmission coefficient value for individual spikes (e.g., T4i = GA3,i/GA4 with count CA3,i and T5i = (GA2,i/GA3)·T4 with count CA2,i). Using T5


Figure 3. Smear signatures and base calling algorithm with smear. Conductance signatures for all nucleobases (A, G, C, T) with (A) low smear, (B) high smear, and (C) all combined smear. (D) Deconvolution of conductance signatures and algorithm for using single spectra to make a base call. (Left) Each peak within the signature for A is fitted with a Gaussian to extract peak centers and fwhm. From this, thresholds and Pearson correlation coefficients are generated (shown in the Supporting Information). (Right) (1) A single unknown current-distance measurement is collected; (2) the smear parameter SP is calculated; (3) the measurement is binned based on its SP; (4) a histogram is generated from the single measurement; and (5) the histogram is compared to the known signatures using the GABA+PECAN algorithm. Steps 2 and 3 are only used with SCRIB, and they are skipped without SCRIB analysis.
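Steps 2−4 of the right-hand panel can be sketched in Python. This is only an illustration under stated assumptions: the actual SP determination is detailed in Figure S6 of the Supporting Information, which is not reproduced here, so the window-based plateau definition, the function names, the bin edges, and every numerical value below are hypothetical. Step 5, the comparison against reference signatures, is the GABA+PECAN probability of eq 5 and is not covered by this sketch.

```python
from bisect import bisect_right


def smear_parameter(distance_nm, conductance, g_lo, g_hi):
    """Step 2 (assumed definition): displacement spanned by the samples whose
    conductance lies inside the plateau window [g_lo, g_hi]."""
    inside = [d for d, g in zip(distance_nm, conductance) if g_lo <= g <= g_hi]
    return max(inside) - min(inside) if inside else 0.0


def smear_bin(sp_nm, bin_edges_nm):
    """Step 3: bin index for an SP value; bin 0 is the lowest-smear bin."""
    return bisect_right(bin_edges_nm, sp_nm)


def conductance_histogram(conductance, g_min, g_max, n_bins):
    """Step 4: fixed-width histogram built from a single trace."""
    counts = [0] * n_bins
    width = (g_max - g_min) / n_bins
    for g in conductance:
        if g_min <= g < g_max:
            # Clamp guards against floating-point rounding at the top edge.
            index = min(int((g - g_min) / width), n_bins - 1)
            counts[index] += 1
    return counts


# Hypothetical trace: tip displacement (nm) and conductance (units of G0).
distance = [0.00, 0.05, 0.10, 0.15, 0.20, 0.25]
trace = [1.0, 0.022, 0.021, 0.019, 0.001, 0.0005]

sp = smear_parameter(distance, trace, g_lo=0.015, g_hi=0.025)
bin_index = smear_bin(sp, bin_edges_nm=[0.08, 0.16])
histogram = conductance_histogram(trace, g_min=0.0, g_max=0.03, n_bins=3)
```

With these invented numbers the plateau spans about 0.10 nm, the trace falls in the middle of three smear bins, and the single-trace histogram has counts [2, 1, 2]. In a SCRIB-style analysis the bin index would select which set of reference signatures the histogram is compared against.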

(transmission for conjugated ring hopping) calculations from all A nucleotide measurements, we observed that variance in this transmission parameter increased with the smear parameter SP bin (Figure 2B,D). For C nucleotides, no transmission values can be directly solved, so we take the ratio of peaks C2 and C1, which correspond well with the double conduction peaks in cysteamine and ethanedithiol (Figure S2),17 as the transmission parameter. Similar to A, we observed an increase in parameter variance with SP bin (Figure 2C,E). Here, the smear factor SF for transmission parameters was defined analogously to that for conductance (ΔTfwhm/T). These observations further validate this smear characterization for analyzing molecular conformation and entropy.

Enhancing Molecular Recognition with SCRIB. In order to test the efficacy of the SCRIB method for enhancing molecular identification using single conductance measurements, we analyzed our ability to recognize nucleobases with and without smear characterization. To do this, features in conductance histograms from an ensemble of STM-BJ measurements on known nucleotides must be used as signatures to accurately identify (base call) unknown nucleobases from a single measurement or small sets of measurements. Our signatures were compiled from hundreds of STM-BJ measurements on individual nucleotides in homologous DNA oligomers (as previously described, and in the Experimental Section). For SCRIB analysis, all individual STM-BJ current-distance traces were binned on distance over which the molecular junction was maintained (smear parameter SP), resulting in a separate signature for each bin, as previously seen in Figures 1C and S5−S6. Low smear and high smear bin signatures for all nucleobases A, G, C, and T are shown in Figure 3A,B, respectively. For the control analysis (no SCRIB), all individual current-distance traces were combined into a single histogram and hence a single signature (Figure 3C, same as Figure 1A).

We developed two algorithms, GABA and PECAN, which we used together to match a minimal number of single current-distance measurements with the known conductance histogram signatures. We first deconvoluted signatures by fitting each peak to a Gaussian and extracting the center location and fwhm (Figure 3D, example shown for A at the left). Then, we calculated Pearson correlation coefficients and threshold count values to understand how individual spectra contribute to the multitude of peaks in the overall histogram signature (values compiled in Figure S10). In GABA, individual current-distance spectra are compared to the Gaussian fittings for each peak in the signature for all nucleobases. Specifically, a probability is computed from the number of counts seen in a histogram from the single measurement within the fwhm region of known reference signatures for A, G, C, and T. While this algorithm allows statistical mapping of single measurements to the signature library, single-molecule conductance measurements have further salient features that are not captured by this algorithmic probability alone. Since not all peaks seen in an ensemble of measurements can be observed in each individual measurement, the calculated Pearson correlation coefficients for every combination of peaks are used by the PECAN algorithm to weight single measurements according to conductance values where the largest number of counts are observed. Steps 1, 4, and 5 in Figure 3D illustrate this process when not using SCRIB: For single current-distance measurements, a histogram is generated and compared to the known signature Gaussians. For SCRIB analysis, steps 1−5 in Figure 3D are followed: The smear parameter is measured for each single spectrum, which is then binned accordingly before being compared to reference signatures.

Overall we combined GABA+PECAN for a base calling algorithm that uses fitted Gaussians to signatures, threshold count values, and correlation coefficients to determine how closely individual STM-BJ measurements relate to the known signatures, in the form of a probability value, P, for each nucleobase i:

$$P_i = \frac{1}{n}\sum_{j=1}^{n}\frac{p_{i,j}}{F_{i,j}} + \sum_{j=1}^{n}\sum_{k=1}^{n}\frac{c_{i,j,k}\,(p_{i,j}-T_{i,j})(p_{i,k}-T_{i,k})}{2\,F_{i,j}\,F_{i,k}} \tag{5}$$

where n is the number of peaks seen in the reference signature for nucleobase i, j and k are the specific peak numbers (i.e., 1, 2, 3, etc.), p is the number of points (or counts) in the fwhm region of each peak, F is the fwhm, c is the correlation coefficient for two peaks, and T is the threshold value for each peak. In this calculation for Pi, the first term (GABA portion) is normalized by the number of peaks and fwhm to prevent overcalling nucleotides for which the reference signature shows more peaks (such as A) or has a larger fwhm. The second term (PECAN portion) is specifically designed to give both positive and negative values, leading to a larger resolution, or separation, in probability values. These two algorithms together (GABA+PECAN) provide a better probability match for identifying unknown molecules (single nucleotides here) from single conductance measurements.

Figure 4. Enhanced molecular recognition with SCRIB. Increasing levels of nucleobase identification accuracy with coverage for (A) no SCRIB, (B) SCRIB, and (C) low smear measurements. (D) Comparing base calling accuracies at specific coverages for the cases of no SCRIB, SCRIB, and low smear measurements.

When we evaluated the accuracy of identifying the four DNA nucleotides using a single or a small number of repeat measurements on DNA oligomers, it verified our hypothesis that SCRIB can be a very useful tool for recognition using single conductance measurements (details on calculations given in the Supporting Information). Using the GABA+PECAN algorithm on molecular conductance measurements without


Figure 5. DNA base calling with SCRIB. Probability values (obtained from the base calling algorithm), confidence of base calling, and accuracy (X indicates incorrect calls) using 20× coverage for (A) no SCRIB and (B) SCRIB and (C) 7× coverage for low smear. SCRIB analysis significantly enhances both accuracy and confidence of base calling. Confusion matrices detailing base calling tendencies are shown to the right for each case.
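The probability and confidence values plotted in Figure 5 can be illustrated with a minimal Python sketch. It assumes one reading of eq 5 (first term averaged over the n peaks, double sum not divided by n) and uses the confidence defined in the text as Ci = (Pi − Pj)/Pi, with Pj the second-largest probability. The function names and all input numbers are hypothetical; the real thresholds, fwhm values, and correlation coefficients are compiled in Figure S10.

```python
def base_call_probability(p, F, T, c):
    """GABA+PECAN probability for one nucleobase (one reading of eq 5).

    p, F, T are per-peak lists: counts inside the fwhm window, fwhm, and
    threshold count. c[j][k] is the correlation coefficient of peaks j and k.
    """
    n = len(p)
    gaba = sum(p[j] / F[j] for j in range(n)) / n
    pecan = sum(
        c[j][k] * (p[j] - T[j]) * (p[k] - T[k]) / (2 * F[j] * F[k])
        for j in range(n)
        for k in range(n)
    )
    return gaba + pecan


def call_base(probabilities):
    """Return (called base, confidence) using C_i = (P_i - P_j) / P_i.

    Assumes the largest probability is positive, otherwise the ratio is
    not meaningful.
    """
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    (best, p_best), (_, p_second) = ranked[0], ranked[1]
    return best, (p_best - p_second) / p_best


# Hypothetical two-peak reference signature and single-trace counts.
P_A = base_call_probability(
    p=[4, 2], F=[2.0, 1.0], T=[1, 1], c=[[1.0, 0.5], [0.5, 1.0]]
)
# Hypothetical probabilities for the other three bases.
called, confidence = call_base({"A": P_A, "G": 1.75, "C": 0.5, "T": -0.25})
```

For these made-up inputs the GABA term is 2.0 and the PECAN term is 2.375, so P_A = 4.375, the call is A, and the confidence is 0.6. Because the PECAN term can be negative when counts fall below threshold, some probabilities can drop below zero, which is consistent with the text's remark that the second term widens the separation between candidates.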

increase in the ability to identify all four DNA nucleotides (Figure 4D). These calculated accuracies at modest to low coverages meet or exceed the performance of the most widely used sequencing technologies. For example, Oxford Nanopore sequencers are typically run at 30−60× coverage, and the Illumina sequencing platforms often requires >100× coverage for comparable accuracies.42−44 We further evaluated the recognition capabilities using trace plots and confusion matrices which show whether the molecule was correctly identified as one of the four DNA nucleotides, with 20× coverage for measurements without SCRIB and with SCRIB and 7× coverage for low smear (Figure 5A−C). In addition to accuracy, confidence is an important metric for assessing molecular recognition (specifically for sequencing applications) and is shown in the trace plots of Figure 5A−C. The confidence in calling a particular base can be calculated using the probability values from the base calling algorithm Ci = (Pi − Pj)/Pi, where Ci is the confidence for calling base i, Pi is the probability value associated with the called base, and Pj is the second largest probability (for the second most probable

SCRIB, we observed an overall base calling accuracy of 81.2% with 20 measurements, or 20× coverage (Figure 4A). However, as hypothesized, by binning individual measurements on smear with SCRIB, we achieved much greater accuracy (93.9%) with the same 20× coverage (Figure 4B). For this SCRIB analysis, the binning process carries a second probability calculated from the distribution of smear parameter SP values in Figure 1E, along with the GABA+PECAN probabilities for the specific bin. To significantly illustrate the full potential of smear characterization, we also performed base calling with only low smear measurements. For this low smear analysis, we achieved 97.8% base calling accuracy with only 7× coverage (Figure 4C). Although this result clearly outlines the effectiveness of our smear characterization and how smear binning combines molecules with similar conformation and entropic smear, it slightly exaggerates the resolution and capability of molecular identification since real unbiased single-molecule measurements consist of a mixture of smear values (Figure 1E). Analyzing the accuracy of molecular identification using these three methods (no SCRIB, SCRIB, and low smear), we observed a substantial 15426

DOI: 10.1021/jacs.7b08246 J. Am. Chem. Soc. 2017, 139, 15420−15428

Article

Journal of the American Chemical Society

terraces could be easily found, was used as the substrate. Before all experiments, the substrate, the Teflon cell, and the O-ring (Viton) were cleaned by immersion in hot piranha solution 1:3 H2O2 (J. T. Baker, CMOS):H2SO4 (96%, J. T. Baker, CMOS) for 1 h and then rinsed and heated in ultrapure deionized (DI) water obtained from a Barnstead Thermolyne NANOpure Diamond purification system equipped with a UV lamp, water resistivity >18 MΩ·cm (Caution! The piranha solution is a very strong oxidizing reagent and can be dangerous. Protective equipment including gloves, goggles, and a lab coat should be used at all times). A hydrogen flame was used to anneal the crystal, followed by quenching in ultrapure DI water. The gold disc, Teflon cell, and O-ring were then dried under nitrogen gas. The cell was quickly set up, cysteamine solution was added to cover the electrode, and the cell was installed in the microscope. Cysteamine SAM. We used an electrochemical adsorption method for preparing the cysteamine SAM. Electrochemical adsorption was performed with a PicoScan STM system (Keysight). The PicoPlus bipotentiostat (Keysight) controlled the surface and tip potential independently. The Au(111) crystal disc was the working electrode. The cysteamine solution was the electrolyte under potential for 1 h (2 mM cysteamine in dilute sulfuric acid pH ∼ 4−5). A silver wire and a platinum wire were used as quasi-reference electrode and counter electrode, respectively. DNA Adlayer on the Cysteamine SAM. For preparing the DNA adlayer, 1 nM solutions of DNA in dilute sulfuric acid (pH ∼ 4−5) were added on top of the electrochemically adsorbed cysteamine SAM and allowed to sit overnight. The crystal was then rinsed with 500 μL of dilute sulfuric acid solution (pH ∼ 4−5) and dried under a stream of nitrogen gas. STM-BJ Measurements. The STM-BJ experiments were carried out with a Keysight microscope in ambient conditions (in air). 
A 1 nA/V preamplifier was used for all single-molecule conductance measurements. STM software (PicoView 1.14) drove the gold tip to approach the gold surface using the applied bias voltage. The tip was then retracted at a sweep rate of 39 nm/s to break the contact, during which current-distance traces were recorded over a total of 4.5 nm. The process of forming and breaking junctions was repeated many times, and a large number of current-distance traces were recorded, typically 2000−4000 traces for statistical analysis.

base). Analyzing the confidence, or percentage difference between the highest and second highest probability values, and finally all probabilities for repeat measurements on each nucleotide, we observed the SCRIB method increased both the confidence and base calling accuracy for molecular identification. While the trace plots in Figure 5A−C only show a range of 30 base calls at the particular coverage, the full range (all 800 base calls) can be seen in Figures S11−S13, with the correct sequence of calls in Figure S14. Rigorous metrics on molecular recognition, specifically indication of true positive/ false positive and true negative/false negative base calls, can be seen in the confusion matrix analysis (Figure 5A−C, right panels), where we clearly observed significantly improved base calling using SCRIB.



CONCLUSION We define an important method for characterizing conformations and entropy in single-molecule conductance measurements, in the form of a parameter called molecular smear. We prove that this molecular smear is measurable, quantifiable, and can significantly improve molecular recognition by conductance, as specifically demonstrated for the four nucleotides in DNA. The proposed smear characterization showed a direct correlation between the variance, or spread in single-molecule conductance, and distance over which molecular junctions were maintained. Since such variance results in broadening of singlemolecule conductance measurements and makes molecular recognition difficult, we used measurements on four DNA nucleotides as a test case for demonstrating the efficacy of the proposed smear characterization and binning (SCRIB) method. Using unbiased single-nucleotide measurements and no SCRIB analysis, 20 measurements (or 20× coverage) resulted in only 81.3% accuracy for identifying nucleobases, with low confidence of base calling. With smear binning and use of low smear measurements, merely 7 measurements (or 7× coverage) resulted in 97.8% accuracy for identifying nucleobases, with high confidence. Overall, this study lays the groundwork for a robust molecular smear characterization and binning method for utilizing single-molecule conductance for molecular electronic circuits or chemical identification. It also paves the way for a robust single-molecule DNA sequencing platform capable of producing high accuracies with a minimal number of repeat measurements, or low coverage, on individual nucleotides far surpassing existing sequencing platforms. Therefore, these results can have important implications for the broader fields of molecular electronics and nanoelectronic single-molecule identification.





ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/jacs.7b08246. STM-BJ general information, data processing for conductance histograms, characterization of DNA adlayers on cysteamine, determination of IRF, calculating smear parameter and binning, Landauer transmission model, details on base calling and molecular recognition calculations, and additional figures as described in text (PDF)



EXPERIMENTAL SECTION

Reagents. Cysteamine (≥98% titration, Sigma Life Science), sulfuric acid (99.8%, anhydrous, Sigma-Aldrich), and single-stranded homologous oligomers of DNA: poly(dA)₁₀₀, poly(dG)₁₅, poly(dC)₁₀₀, poly(dT)₁₀₀ (Invitrogen, USA). Cysteamine solutions (2 mM) for normal SAM preparation were made in dilute sulfuric acid (pH ∼ 4−5) in DNase- and RNase-free water. DNA oligomers were dissolved in dilute sulfuric acid solutions (pH ∼ 4−5) in DNase- and RNase-free water at a concentration of 1 × 10⁻⁹ M and stored at −20 °C until used.

Electrodes and Cleaning. The Au(111) electrode for STM-BJ experiments was a single-crystal disc purchased from Princeton Scientific Corp. STM tip electrodes used were prepared by mechanically cutting a gold wire (99.998%, 0.25 mm diameter, Alfa Aesar). For STM-BJ experiments, the gold crystal disc, presenting well-ordered Au(111) single crystal facets on which wide (∼100 nm)

AUTHOR INFORMATION

Corresponding Author

*[email protected]

ORCID

Anushree Chatterjee: 0000-0002-8389-9917
Prashant Nagpal: 0000-0002-7966-2554

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

The authors acknowledge funding for this work from W. M. Keck Foundation and partial support through National Science Foundation Soft Materials MRSEC at the University of Colorado through NSF Award DMR 1420736. L.E.K. acknowledges financial support from National Science Foundation Graduate Research Fellowship Program under grant no. DGE 1144083.