Universal Readers Based on Hydrogen Bonding or π−π Stacking for Identification of DNA Nucleotides in Electron Tunnel Junctions
Sovan Biswas,†,‡,¶ Suman Sen,†,‡,¶ JongOne Im,†,§,¶ Sudipta Biswas,†,‡ Predrag Krstic,∥ Brian Ashcroft,† Chad Borges,†,‡ Yanan Zhao,†,§ Stuart Lindsay,*,†,‡,§ and Peiming Zhang*,†
†Biodesign Institute, ‡School of Molecular Sciences and §Department of Physics, Arizona State University, Tempe, Arizona 85287, United States
∥Institute for Advanced Computational Science, Stony Brook University, Stony Brook, New York 11794-5250, United States
Supporting Information
ABSTRACT: Sequencing DNA by recognition tunneling (RT) requires a reader molecule that recognizes all of the naturally occurring nucleobases in an electron tunnel junction, referred to as a universal reader. In the present study, we have designed a series of heterocyclic carboxamides based on hydrogen bonding and a large-sized pyrene ring based on a π−π stacking interaction as universal reader candidates. Each of these compounds was synthesized to bear a thiolated linker for attachment to metal electrodes and examined for its interactions with naturally occurring DNA nucleosides and nucleotides by 1H NMR, ESI-MS, computational calculations, and surface plasmon resonance. RT measurements were carried out in a scanning tunneling microscope. All of these molecules generated electrical signals with DNA nucleotides in tunneling junctions under physiological conditions (phosphate buffered aqueous solution, pH 7.4). Using a support vector machine as a tool for data analysis, we found that these candidates distinguished among naturally occurring DNA nucleotides, with accuracy in the order pyrene (by π−π stacking interactions) > azole carboxamides (by hydrogen-bonding interactions). In addition, the pyrene reader operated efficiently in a larger tunnel junction. However, the azole carboxamide could read abasic (AP) monophosphate, a product of spontaneous base hydrolysis or an intermediate of base excision repair. Thus, we envision that sequencing DNA using both π−π stacking and hydrogen-bonding-based universal readers in parallel should generate more comprehensive genome sequences than sequencing based on either reader molecule alone.
KEYWORDS: DNA sequencing, recognition tunneling, universal reader, hydrogen bonding, π−π stacking, abasic site
An electronic readout of nucleobases based on their physical properties provides a direct way to sequence DNA, a process of determining the precise order of nucleobases in a DNA strand. It is exemplified by nanopore DNA sequencing. The nanopore is an orifice with a nanometer-scale diameter that passes an ionic current under a voltage bias in electrolyte solution. When a single-stranded DNA polyanion is electrophoretically translocated through the nanopore, it blocks the ionic current by an amount that depends on the size of the nucleotides in the narrowest part of the pore channel, from which a DNA sequence can be deduced. Since there is no theoretical limit on the length of DNA for the translocation, nanopore sequencing can overcome the short read limitation of next generation DNA sequencing (NGS). The commercially
© 2016 American Chemical Society
Received: September 25, 2016 Accepted: November 17, 2016 Published: November 17, 2016
DOI: 10.1021/acsnano.6b06466 ACS Nano 2016, 10, 11304−11316
Article
www.acsnano.org
Figure 1. Schematic illustration of (A) a tunneling device embedded in a nanopore to read DNA bases as they sequentially translocate through the nanopore and (B) recognition interactions in the nanogap, where reader molecules (universal readers) attached to the electrodes catch a DNA base by forming a hydrogen-bonded complex, causing electronic spikes as nucleotides bind and unbind.
Figure 2. Universal reader candidates derived from the imidazole-2-carboxamide structure.
been working on development of an electron tunneling-based DNA sequencing technology. Given that the tunneling current is highly sensitive to changes in distance (∼ an order of magnitude per Å), tunneling measurements may be able to achieve high spatial resolution for sequencing DNA since the distance between two adjacent bases in a single-stranded DNA is >3.4 Å. Figure 1A illustrates an approach to reading DNA sequences by a tunnel junction embedded in a solid-state nanopore. As a single-stranded DNA is translocated by electrophoresis to pass the tunnel junction, the individual nucleobases will induce fluctuations of tunneling current. It has been demonstrated that nucleoside monophosphates and oligonucleotides can generate tunneling currents in a small nanogap composed of bare metal electrodes (2 nm, a size that will allow single-stranded DNA to pass. Preliminary data show that Iz recognizes all of the DNA bases with ∼80% accuracy on average for a single read.16 This encouraged us to search for new chemical structures for the RT readout, as we also work on improvements of the physical measurement and data analysis. In this manuscript, we report on our efforts in increasing the calling accuracy of RT from fine-tuning the chemical structure of Iz to exploring the new interaction forces.
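The distance sensitivity quoted above (roughly one order of magnitude in current per ångström) can be made concrete with a minimal sketch. It assumes a simple exponential model, I(d) = I0 · 10^(−d/λ), with λ = 1 Å per decade; the model, the constant, and the function names are illustrative assumptions rather than the authors' analysis.

```python
import math

# Assumed decay length: one decade of tunneling current per angstrom,
# taken from the order-of-magnitude statement in the text.
DECADE_LENGTH_ANGSTROM = 1.0


def current_ratio(delta_d_angstrom, decade_length=DECADE_LENGTH_ANGSTROM):
    """Factor by which the current changes when the gap widens by delta_d."""
    return 10.0 ** (-delta_d_angstrom / decade_length)


def gap_change_for_ratio(ratio, decade_length=DECADE_LENGTH_ANGSTROM):
    """Gap change (angstrom) that produces the current ratio I_new / I_old."""
    return -decade_length * math.log10(ratio)


if __name__ == "__main__":
    # Current change over one base-to-base step of 3.4 angstrom.
    print(current_ratio(3.4))
    # Gap change corresponding to a halving of the current.
    print(gap_change_for_ratio(0.5))
```

Under this idealized model, a displacement comparable to the base spacing changes the current by more than three orders of magnitude, which is the basis of the spatial-resolution argument.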
RESULTS AND DISCUSSION
Molecular Design. A universal reader should be flexible enough to effectively interact with each of the nucleobases. Composed of a recognition moiety (imidazole-2-carboxamide) and an attachment moiety (thiolated two-carbon alkyl chain), Iz was designed to carry multiple hydrogen-bonding sites and a
Table 1. Association Constants with Standard Deviations Determined from Curve Fitting of NMR Titration Data in Chloroform^a

substrate  titrator  log β11        log β12        log β11′       log β12′
Iz-S^b     dA        1.08 ± 0.09    -              -              -
Iz-S^b     dC        3.10 ± 0.10    5.04 ± 0.09    -              -
Iz-S^b     dG        3.31 ± 0.02    5.49 ± 0.01    -              -
Iz-S^b     dT        1.46 ± 0.05    -              -              -
Bz-S       dA        1.53 ± 0.04    2.79 ± 0.03    1.33 ± 0.07    2.68 ± 0.10
Bz-S       dC        2.54 ± 0.10    3.83 ± 0.08    2.44 ± 0.07    4.01 ± 0.05
Bz-S       dG        2.70 ± 0.04    5.18 ± 0.01    2.66 ± 0.01    5.15 ± 0.05
Bz-S       dT        1.81 ± 0.01    2.79 ± 0.02    1.62 ± 0.01    2.65 ± 0.04
Tz-S       dA        1.15 ± 0.01    -              -              -
Tz-S       dC        1.89 ± 0.04    -              -              -
Tz-S       dG        2.12 ± 0.03    -              -              -
Tz-S       dT        1.20 ± 0.02    -              -              -
Pr-S       dA^c      0.18 ± 0.01    -              -              -
Pr-S       dC^c      0.73 ± 0.03    -              -              -
Pr-S       dG        2.25 ± 0.08    -              -              -
Pr-S       dT^c      0.32 ± 0.01    -              -              -
dA         dT        1.44 ± 0.07    -              -              -

^a Each number is an average of two individual experiments. ^b Adopted from ref 14. ^c The interactions were too weak to be accurately determined.
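To relate the logarithmic association constants in Table 1 to binding free energies, the following minimal sketch converts log β into ΔG° = −RT ln β and derives the stepwise constant for binding a second nucleoside, log K2 = log β12 − log β11. It assumes a 1 M standard state and a temperature of 298.15 K (the Methods give 25 °C); the function names are illustrative and not part of the original analysis.

```python
import math

R_J_PER_MOL_K = 8.314462618  # molar gas constant
J_PER_KCAL = 4184.0          # joules per kilocalorie


def delta_g_kcal(log_beta, temperature_k=298.15):
    """Standard free energy of association (kcal/mol) from log10(beta).

    Assumes beta is referenced to a 1 M standard state.
    """
    rt_ln10 = R_J_PER_MOL_K * temperature_k * math.log(10.0)
    return -rt_ln10 * log_beta / J_PER_KCAL


def stepwise_log_k2(log_beta11, log_beta12):
    """log10 of the stepwise constant for adding the second nucleoside."""
    return log_beta12 - log_beta11


if __name__ == "__main__":
    # dA titrated with dT (control base pair in Table 1).
    print(delta_g_kcal(1.44))
    # Bz-S with dG: second binding step from log beta11 and log beta12.
    print(stepwise_log_k2(2.70, 5.18))
```

For Bz-S with dG, the stepwise constant for the second nucleoside is close to that of the first, consistent with the statement that the 1:2 complexes are markedly more stable than the 1:1 complexes.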
Therefore, the π−π stacking could be a hydrogen-bonding substitute for reading nucleobases with the possibility of enhanced electron transfer. Computer modeling shows that these candidate molecules can form complexes with both purine and pyrimidine nucleobases, in which two sulfur atoms are separated by a distance of 2.2 nm (Figure 3). We have used palladium as tunneling electrodes due to its complementary metal−oxide−semiconductor (CMOS) compatibility and better tunneling conductance compared to gold electrodes.23 Given a ∼0.24 nm length for the Pd−S bond,24 RT would be able to occur in a junction with a size of ∼2.7 nm (the 2.2 nm S−S distance plus two Pd−S bonds). The large-sized junction would provide a manufacturing advantage over narrower ones (≤1 nm). Computer simulation also indicates that pyrene can form a sandwiched complex with a purine (guanine) ring, which is more conductive than that with the pyrimidine (uracil) in a nanogap when they are optimally stacked without steric hindrance (Figure S1 and Section 1 in Supporting Information, hereinafter referred to as SI).
Structural Features of Universal Reader Candidates. We have developed facile routes to synthesizing these new universal reader candidates (SI, section 2). The amide group in each of the hydrogen-bonding-based universal readers is connected to the heterocyclic ring via a σ bond so it can freely rotate. Thus, Pr would be expected to adopt two preferential conformations, designated as syn and anti (Figure 4A).25 In contrast, the azole rings can exist in tautomeric forms due to interconversion of the N−H proton between the ring nitrogen atoms. Our previous study indicated that the tautomeric proton of Iz would preferably take a configuration with the NH2 of the amide at a trans position to the ring NH,14 which is stabilized by an intramolecular O···N−H hydrogen bond. This type of hydrogen bonding can also exist in Bz and Tz (Figure 4A). 1H NMR data show that the imidazole has two tautomers distributed with a ratio of 1:0.95 (Figure 4B).
Similarly, both Bz and Tz show tautomerization in their NMR spectra, but their distributions are less uniform, with ratios of 1:0.72 and 1:0.60:0.24, respectively (Figure 4B). We also notice that the ring protons have different chemical shifts in the order triazole (Tz) > benzimidazole (Bz) > imidazole (Iz), which implies that as hydrogen-bonding donors, these N−H protons form hydrogen bonds with different strengths to the same acceptors. We have examined the hydrogen-bonding interactions of these universal reader candidates with DNA bases by NMR titration in an aprotic solvent. To do so, lipophilic groups were attached to these molecules (designated as Iz-S, Bz-S, Tz-S, and Pr-S; see Section 3 of SI for their structures and synthesis). Because of their limited solubility in chloroform, these compounds were only used as substrates for the NMR titration, where they were held at a concentration with minimum self-association. Meanwhile, four naturally occurring DNA nucleosides (designated as dA, dC, dG, and dT) were protected on their hydroxyls with tert-butyldimethylsilyl (TBDMS) groups, making them soluble in chloroform. Previously, we determined the association constants of Iz-S with DNA bases by monitoring changes in chemical shifts of the amide protons (see Table 1).14 In the same manner, we first carried out a control study by titrating dA with dT in chloroform, from which the association constant of dA base pairing to dT was determined as log β11 = 1.44 (Table 1), close to a value reported in the literature.26 In turn, we determined association constants of Bz-S, Tz-S, and Pr-S interacting with the DNA nucleosides. As examples, Figure 5A,B shows changes in chemical shift of Tz-S’s amide and Bz-S’s ring N−H proton
Figure 5. NMR titration spectra of Tz-S’s amide proton (A) and Bz-S’s ring N−H proton (B) in different concentrations at 297 K.
Table 2. Calculated Physical Properties of DNA Bases and Reader Molecules^a

                     A        C        G        T        Iz       Py       Bn
dipole moment (D)    2.58     6.21     7.12     4.76     1.46     1.41     1.57
surface area (Å²)    164.76   146.37   174.37   161.57   159.26   276.24   175.7
log P                −1.07    −0.76    −1.36    −0.36    −0.96    1.40     2.57

^a The calculations were carried out using DFT of B3LYP/6-311+G(2df, 2p) in software Spartan’14; dipole moments of DNA bases were calculated with R = CH3, and the surface areas were calculated based on the CPK model; red arrows indicate directions of dipole moments.
Table 3. Characteristic MS Peaks of 1:1 and 2:1 Complexes of Bzc with Nucleobases and Their MS/MS Products
more stable complexes with both dA and dT, which indicates that there was a multiplexed hydrogen-bonding interaction between them. The NMR titration with Bz-S shows that both tautomeric protons can interact with DNA nucleosides. In addition, Bz-S and Iz-S can form 1:2 triplets with DNA nucleosides, which are significantly more stable than the 1:1 complexes. Due to insufficient solubility of these substrates in chloroform, however, we could not perform reversed NMR titration to determine if 2:1 complexes can form between these universal reader candidates and DNA nucleosides. For the stacking-based reader, we calculated size, dipole moment, and hydrophobicity (log P) of Py and related molecules by means of density functional theory (DFT), as listed in Table 2. 2-Phenylethane-1-thiol (Bn) was used as a control for the stacking interaction. It has been shown that the pyrene ring can strongly stack on DNA bases with a free energy (ΔG) of ∼ −3.4 kcal/mol,27 compared to a benzene ring (ΔG = ∼ −1.0 kcal/mol).28 In general, the stacking interaction depends on surface area, polarizability, and hydrophobicity of the aromatic system.27,29 Based on our calculations, these properties are significantly different among the nucleobases,
when titrated with dT, respectively. In chloroform, the two tautomeric protons of Bz-S clearly show up in the 1H NMR spectra. The chemical shifts of both the amide and tautomeric N−H protons moved to lower fields with increasing concentration of the titrant, implying that Bz-S associates with dT through hydrogen bonding. The titration data were analyzed using the HypNMR 2008 program (a commercial product, http://www.hyperquad.co.uk/), from which association constants (β) were deduced, expressed as log β11 for a 1:1 complex and log β12 for a 1:2 complex of substrate to titrator (Table 1). At first glance, Iz-S and Bz-S form more stable 1:1 complexes with DNA nucleosides than Tz-S and Pr-S. The interactions of Pr-S with DNA nucleosides (except dG) were so weak that their association constants could not be determined accurately by NMR titration. These differences could be explained by a greater loss of entropy when Tz-S and Pr-S interact with DNA nucleosides than when Iz-S and Bz-S do, because Tz-S has more tautomeric forms and the amide of Pr-S has more freedom to rotate. Also, these molecules form hydrogen-bonded complexes with stabilities in the order dG > dC > dT > dA. Compared to the natural dA-dT base pair, Bz-S forms
Figure 6. RT current−time recordings generated with (i) Bz, (ii) Iz, (iii) Pr, and (iv) Tz functionalized probes and substrates at a set point of 4 pA and 0.5 V.
solution by ESI-MS and found that these candidates formed both 1:1 and 2:1 complexes with DNA nucleotides with yields in the same range as those just mentioned above (SI, Tables S1−S6). By comparing the data from NMR titration with those from ESI-MS, we noticed that the hydrogen-bonding interactions of the pyrrole-carboxamide (Pr-S) with dA, dC, and dT in chloroform were too weak to be determined by NMR titration. Accordingly, this type of hydrogen-bonding interaction would not be expected to occur in bulk aqueous solution, but we observed the complexes of Pr with the DNA nucleotides by ESI-MS (Table S6). One explanation is that the hydrogen bonding may be enhanced in a confined environment. In the ESI chamber, the solutes are encapsulated in water droplets, which undergo a fission process from micrometer to nanometer sizes and finally to gas-phase ions. The droplet evolution can not only stabilize the complexes from solution but also result in formation of new complexes in the gas phase.31 In addition, we did observe the 2:1 complexes of the reader molecules with DNA nucleotides, but their intensities were much weaker than those of the 1:1 complexes. Based on the NMR data, the triplets would be much more stable than the 1:1 complexes if they are formed. This can also be explained by the enhanced interaction in a droplet. It has been reported that hydrogen-bonded complexes can survive transfer into the gas phase better than those bound largely by the hydrophobic effect.32 Thus, the complexes of universal reader candidates with nucleotides measured by ESI-MS in the gas phase may largely be attributed to hydrogen bonding and to some degree the stacking interaction
indicating that RT may be able to read them out through the stacking interactions.
Interactions of Universal Reader Candidates with Nucleobases in an Aqueous Environment. We have also examined if these complexes could exist in aqueous solution by means of ESI-MS. Electrospray (ES) is a soft ionization technique, preserving noncovalent complexes during the ion transmission from solution to the gas phase.30 Because Py is not soluble in water, we only tested the hydrogen-bonding readers. First, we carried out a model study on the interaction of 1H-benzimidazole-2-carboxamide (Bzc) with 1-methylcytosine, a methylated pyrimidine base, and with 9-methylguanine, a methylated purine base, respectively (see Table 3 for their structures). When an aqueous solution of Bzc and 1-methylcytosine mixed in a 2:1 molar ratio was injected into the ESI-MS, we observed some characteristic ion peaks: m/z 287.13 (H form) and 309.10 (sodium form), corresponding to a 1:1 complex with an intensity of ∼3%, and m/z 470.18, corresponding to a 2:1 complex with an intensity of 0.01%, which were further confirmed by MS/MS (SI, Figure S2, and Table 3). We also observed that Bzc formed 1:1 and 2:1 complexes with 9-methylguanine with yields of ∼1.4 and 0.1%, respectively, referenced to the base peak (SI, Figure S3, and Table 3). This study demonstrates that benzimidazole-2-carboxamide can form noncovalent complexes with DNA bases in aqueous solution. In the same manner, we examined mixtures of the universal reader candidates (Iz, Bz, Tz, and Pr) with DNA nucleoside monophosphates (referred to as dAMP, dCMP, dGMP, and dTMP) in aqueous
Figure 7. Histograms of (A) averaged amplitude and (B) peak width with fitting curves (in colors) of current spikes from interactions of DNA nucleotides with different universal reader candidates at the set point of 0.5 V and 4 pA.
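As an illustration of how spike amplitudes can be summarized with a log-normal distribution, the sketch below estimates the log-normal parameters by maximum likelihood and derives the mode (the maximum of the fitted curve) and the mean. This is an assumption-laden stand-in: the authors fitted curves to histograms, whereas this sketch fits the raw values directly, and the amplitudes are synthetic numbers generated only for demonstration.

```python
import math
import random
import statistics


def fit_lognormal(values):
    """Maximum-likelihood log-normal fit: mean and std of the log-values."""
    logs = [math.log(v) for v in values]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs)  # population std is the ML estimate
    return mu, sigma


def lognormal_mode(mu, sigma):
    """Position of the maximum of the log-normal density."""
    return math.exp(mu - sigma ** 2)


def lognormal_mean(mu, sigma):
    """Mean of the log-normal distribution."""
    return math.exp(mu + 0.5 * sigma ** 2)


def simulate_amplitudes(n, mu, sigma, seed=0):
    """Synthetic amplitudes (pA) for demonstration only, not measured data."""
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


if __name__ == "__main__":
    amplitudes = simulate_amplitudes(20000, math.log(4.5), 0.3)
    mu, sigma = fit_lognormal(amplitudes)
    print(lognormal_mode(mu, sigma), lognormal_mean(mu, sigma))
```

Because a log-normal distribution is right-skewed, its mode always lies below its mean, which is worth keeping in mind when comparing fitted peak positions with averaged values.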
involved.33 This leads us to believe that interactions of a universal reader with nucleobases in the nanodroplets may resemble those in the tunnel junction, where they may be forced to form stable complexes with nucleobases albeit surrounded by water. RT with Hydrogen-Bonding Universal Reader Candidates. A scanning tunneling microscope (STM) was employed to quickly create a tunnel junction for the RT study. In a typical RT experiment, the tunneling current was set at 4 pA with a voltage bias of 0.5 V, which corresponded to a ∼2.4 nm nanogap.34 The RT measurement followed a process of mounting a palladium (Pd) probe and a Pd substrate, both functionalized with the same reader molecules (see SI, section 4 for formation and characterization of reader molecule monolayers on palladium substrates), to a PicoSPM scanning tunneling microscope, stabilizing the tunnel junction in a phosphate buffer (1.0 mM, pH 7.4) until a clean baseline was achieved (∼2 h), introducing a solution of dAMP, dCMP, dGMP, or dTMP (typically 100 μM in a 1.0 mM phosphate buffer, pH 7.4) to the liquid cell, and collecting current recordings for ∼20 min. For each analyte, four separate measurements were carried out with freshly made probes, substrates, and samples. When changing the set-point to 2 pA and 0.5 V, corresponding to a larger gap size, it still read these analytes, but less frequently (see SI, Table S10). Figure 6 shows typical RT spectra of DNA nucleotides acquired with different reader molecules. Each current−time recording shows a
stochastic train of spikes with time, which reflects thermal fluctuations of a molecule trapped in the tunnel junction. We first analyzed these spikes by their averaged peak amplitude (in picoamps, pA), an average of all individual current points constituting a spike, fitting the distributions to a log-normal function (Figure 7A). The amplitude distributions had mean values of 4.86 ± 0.05 pA for Pr, 4.82 ± 0.06 pA for Bz, 4.72 ± 0.10 pA for Tz, and 4.58 ± 0.06 pA for Iz, averaged over the four DNA nucleotides (SI, Table S11). In the same manner, we determined the peak widths (in milliseconds, ms; Figure 7B), with mean values of 0.449 ± 0.003 ms for Pr, 0.449 ± 0.003 ms for Tz, 0.447 ± 0.004 ms for Bz, and 0.419 ± 0.003 ms for Iz on average (SI, Table S12). Both amplitude and peak width vary negligibly among different readers as well as among different DNA nucleotides. This suggests that these nucleotides form similar structures with different reader molecules in the tunnel junction, resulting in similar tunneling pathways. Apparently, these individual parameters in the time domain alone cannot be used to identify individual DNA nucleotides. Indeed, an RT spectrum bears rich information on the trapped molecules beyond the above-mentioned parameters. In order to call these DNA nucleotides, the tunneling current data were subjected to Fourier transform followed by cepstrum conversion (see the Methods section for details), which produced a list of features (SI, Table S13) for each of the spikes and clusters of spikes. Again, when those individual features
selected features. Plots of accuracies vs number of signal features for reading DNA nucleotides with the universal reader candidates are also shown in Figure 9. Clearly, the accuracy fluctuates with different combinations of features. Such a method of training on a subset of all sixteen data sets (collected with sixteen microscopically different tunnel junctions) may set an upper limit on accuracy (called “optimistic” accuracy) for each of the reader molecules. Table 4 lists the highest accuracies
were utilized to identify DNA nucleotides, we found that none of them alone can effectively distinguish any two DNA nucleotides from one another (see SI, Figure S10 for examples). Interestingly, a combination of two individual features gives a large improvement in the calling rate. As shown in Figure 8, a two-dimensional (2D) plot, showing the
frequency with which pairs of parameter values occur together, can separate two DNA nucleotides, dAMP (red dots) and dGMP (green dots), from each other with a probability of 0.68 for Bz, 0.7 for Iz, 0.62 for Pr, and 0.79 for Tz (random would be 0.5). The 2D plot demonstrates a multiparameter approach to calling DNA nucleotides from RT data. We have adapted a support vector machine (SVM), a machine-learning algorithm, to carry out the multidimensional analysis, which was previously tested in analysis of RT data generated with Iz.16 Since then, we have optimized experimental conditions for the RT measurements, for example, replacing gold with Pd for electrodes,23 and refined the SVM process (with varied frequency window sizes instead of a uniform size to generate more features, see the Methods section) to increase the calling accuracy. In the present study, we collected 16 sets of RT data for each universal reader candidate with four DNA nucleotides (four sets for each one) and then randomly took 10% from each data set to train the SVM. The training process iteratively reduced the 264 available signal features (SI, Table S13) to a range of smaller numbers that maintains the training data with 100% separation (see the SVM training curve in Figure 9). There were 159 features left for Bz, 55 for Iz, 28 for Pr, and 67 for Tz (SI, Tables S14−S17), which were used to identify DNA nucleotides from the remaining 90% of RT data. With SVM, we first identified and removed those spikes that were common to all analytes owing to contamination, capture events that were insensitive to chemical variation, and noise spikes generated by the STM electronics and servo control, and then assigned each remaining signal to the individual DNA nucleotides using those

Figure 8. 2D histograms of different readers’ features, where the brightness of each point represents the frequency value of the pair of features for dAMP (red) and dGMP (green). The accuracy (P) with which data can be assigned increases compared to the 1D plots. Overlapping points appear yellow.

Table 4. Highest Accuracy (%) Determined by SVM with RT Data

reader  accuracy     dAMP    dCMP    dGMP    dTMP    mean ± σ
Bz      optimistic   98.5    98.8    98.7    98.9    98.7 ± 0.2
Bz      predictive   49.9    37.5    98.7    97.0    70.8 ± 31.2
Iz      optimistic   96.5    97.4    96.4    98.1    97.1 ± 0.8
Iz      predictive   94.6    75.3    40.3    44.1    63.6 ± 26.0
Pr      optimistic   90.1    89.8    89.2    88.2    89.3 ± 0.8
Pr      predictive   NF^a    66.7    80.3    41.3    40.1 ± 35.3
Tz      optimistic   94.3    95.5    96.5    99.0    96.3 ± 2.0
Tz      predictive   68.6    63.1    97.5    64.7    73.5 ± 16.2

^a NF: cannot identify by a trained SVM.
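To put per-read accuracies such as those in Table 4 into a genome-scale context, the sketch below converts an accuracy into an expected number of false calls. It assumes a genome of 3 × 10^9 bases, which is the size implied by the figure of about 30,000,000 false calls at a 1 in 100 error rate, and it assumes a single read per base with independent errors; the names are illustrative.

```python
# Assumed genome size (bases); implied by ~30,000,000 false calls at 1% error.
HUMAN_GENOME_BASES = 3_000_000_000


def expected_false_calls(accuracy_percent, n_bases=HUMAN_GENOME_BASES):
    """Expected number of miscalled bases for a single pass over n_bases."""
    error_rate = 1.0 - accuracy_percent / 100.0
    return error_rate * n_bases


if __name__ == "__main__":
    # Mean accuracies taken from Table 4 (Bz optimistic, Tz predictive).
    for label, accuracy in [("1 in 100 errors", 99.0),
                            ("Bz optimistic", 98.7),
                            ("Tz predictive", 73.5)]:
        print(label, round(expected_false_calls(accuracy)))
```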
each reader achieved in identification of DNA nucleotides. As a result, these candidates read DNA nucleotides with accuracy in the order Bz > Iz > Tz > Pr. With the highest optimistic accuracy and the smallest standard deviation, Bz reads the DNA nucleotides uniformly, functioning more like a universal reader than the others. This may be explained by a rigid structure that provides an entropic advantage for the binding. We rule out Pr as a universal reader because of its low accuracy on average compared with Bz and Iz. Although there are a number of predictive accuracies ≥95%, they are overall lower than the optimistic ones and widely spread. As far as accuracy is concerned, we should point out that even small changes would make a great difference in sequencing a large genome such as the human genome, especially for finding mutations. Misidentification of nucleotides at a rate of 1 in 100 would result in about 30,000,000 false calls over
Figure 9. Plot of nucleotide calling accuracy vs number of signal features used with different readers.
Figure 10. Examples of RT spectra generated with (i) Py functionalized tip and substrate at a set point of 2 pA and 0.5 V; (ii) Iz functionalized tip and substrate at a set point of 4 pA and 0.5 V; (iii) Bn functionalized tip and substrate at a set point of 2 pA and 0.5 V; (iv) Bn functionalized tip and Py substrate at a set point of 2 pA and 0.5 V.
Figure 11. Histograms of (A) averaged amplitude and (B) peak width and their fitting curves of current spikes generated from Py interacting with nucleotides at the set point of 0.5 V and 2 pA. Mode is defined as the maximum point of each fitting curve.
a human genome, which would overwhelm the identification of real mutations. As a whole, the optimistic accuracy implies an RT method that requires internal calibrations for each tunnel junction, whereas the predictive accuracy uses the existing data as references to analyze data from a new measurement. Although the predictive accuracy is relatively low, it can be improved by increasing the number of training data sets. For example, the SVM analysis indicates that the predictive accuracy of Bz determined with three training data sets is higher than those with one and two training data sets on average (see Table S18, SI).
RT with the Stacking Reader Py. We first confirmed that Py and the control Bn could form monolayers on the Pd substrates by contact angle measurement, ellipsometry, FTIR, and XPS, as we did for the hydrogen-bonding readers (SI, Section 4), and then determined that a DNA nucleotide such as
dGMP could be adsorbed on the Py monolayer with an affinity (Kd) of ∼2.46 mM in aqueous solution by surface plasmon resonance (SPR) (SI, section 5, Figure S11, and Table S19). The RT measurements were carried out under the same conditions as described above, except that the set point was 0.5 V and 2 pA, at which the Py tunnel junction presented a cleaner baseline. Two additional analytes (abasic 5′-monophosphate, designated as AP, and D-glucose) were included in this part of the study. As shown in Figure 10i, Py produced RT signals with all the DNA nucleotides, but not with AP and D-glucose. In contrast, Iz produced RT signals with all of the analytes (Figure 10ii). The results may be best explained by Py forming sandwiched structures with nucleobases through stacking interactions,35 and Iz interacting with analytes through hydrogen bonding. Under the same conditions, Bn did not generate any tunneling signals with these nucleotides (Figure
CONCLUSIONS
To gain a structure that can effectively function as a universal reader for the RT sequencing, we first studied a series of nitrogen heterocycles derived from 4(5)-(2-mercaptoethyl)-1H-imidazole-2-carboxamide (Iz). These reader molecules are designed to interact with edges of nucleobases through hydrogen bonding. Although the hydrogen-bonding interaction can be compromised in bulk aqueous solutions, it may be enhanced in a confined tunneling junction. By means of an STM setup, we examined these universal reader candidates for identification of DNA nucleotides. The data indicate that in addition to the amide group, the azole ring is essential for a universal reader. Among the azoles, the benzimidazole-carboxamide (Bz) provides better optimistic accuracy and less discrimination in recognition of DNA nucleotides than imidazole-carboxamide (Iz) and triazole-carboxamide (Tz), functioning as a universal reader. As far as the predictive accuracy is concerned, Tz is slightly better than Bz, but both of them function much better than Iz. Furthermore, we explored the paradigm of π−π stacking interaction as a mechanism of reading DNA nucleotides in the tunnel junction. We found that the stacking reader Py read the nucleobases more specifically and accurately than the hydrogen-bonding readers. Interestingly, our data show that Iz can recognize the abasic site, so that it may prove useful in identification of apurinic-apyrimidinic (AP) sites in a genome when a comparison is run with data generated by Py. Preliminary analysis shows that the RT data of AP can be separated from those of DNA nucleotides generated by Py with optimistic accuracy of ∼97.8% (SI, section 6, Figure S12, and Table S22), but not from those generated by Iz.
Thus, the RT sequencing with both Py and Iz would give more comprehensive information on genomic sequences, including damage of DNA bases, which is lost in NGS due to the use of polymerases that can incorporate dAMP opposite an abasic site (“A rule”)37 or may cause a frameshift.38
10iii). Interestingly, when the tip was functionalized with Bn and the substrate with Py, the junction generated tunneling signals with the purine nucleotides dAMP and dGMP, but not with the pyrimidine nucleotides dCMP and dTMP (Figure 10iv). This indicates that the stacking interactions in the tunneling junction are driven by the size of reader molecules, given that Bn is smaller than Py in size, but more polar and hydrophobic (see Table 2 for their physical properties). These RT data show that Py interacts more specifically with nucleobases than Iz does. Each spike in the RT spectra of Py was characterized by the averaged amplitude (Figure 11A) and peak width (Figure 11B). We found that Py had an averaged amplitude of 4.61 pA on average when interacting with DNA nucleotides (SI, Table S20), about the same as the current responses of the hydrogen-bonding readers even though it was measured at a smaller set point (2 pA vs 4 pA). Thus, the electron transport is clearly enhanced relative to the hydrogen-bonding readers. Py had a peak width of 0.40 ms on average (SI, Table S20), slightly smaller than those with hydrogen-bonding readers, which indicates that the stacking reader may hold DNA nucleotides less tightly than the hydrogen-bonding readers in the tunneling junction. By comparison, Py can generate tunneling spikes as frequently as Iz with dAMP, dCMP, and dGMP and much more frequently with dTMP than Iz (SI, Table S20), even at a larger tunnel gap size. This suggests that the π−π stacking is as efficient as the hydrogen bonding for capturing DNA nucleotides. Most importantly, the Py reader resolves the four DNA nucleotides with distinguishable peaks (defined by modes, Figure 11), although significant overlaps remain, better than the hydrogen-bonding readers (comparing Figure 11 with Figure 7). The accuracies of Py in identifying individual DNA nucleotides resulting from SVM analysis (SI, Table S21) are given in Table 5. Py shows that optimistic accuracy ranges from

Table 5. Accuracy (%) of Py Achieved in Identification of Individual DNA Nucleotides by RT

reader  accuracy     dAMP    dCMP    dGMP    dTMP    mean ± σ
Py      optimistic   98.8    99.4    97.1    96.7    98.0 ± 1.3
Py      predictive   76.3    89.0    93.4    83.6    85.6 ± 7.4
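The mean ± σ column of Table 5 can be reproduced from the four per-nucleotide accuracies. The short sketch below does so under the assumption that σ is the sample standard deviation (n − 1 in the denominator), which appears consistent with the tabulated values; the dictionary and function names are illustrative.

```python
import statistics

# Per-nucleotide accuracies (%) of Py from Table 5, in the order
# dAMP, dCMP, dGMP, dTMP.
TABLE5 = {
    "optimistic": [98.8, 99.4, 97.1, 96.7],
    "predictive": [76.3, 89.0, 93.4, 83.6],
}


def summarize(values):
    """Return (mean, sample standard deviation) of a list of accuracies."""
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    for kind, values in TABLE5.items():
        mean, sigma = summarize(values)
        print(f"{kind}: {mean:.1f} ± {sigma:.1f}")
```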
METHODS
Computational Modeling. DFT calculations were performed using Spartan’14 for Windows, a commercially available software from Wavefunction, Inc. Molecules were drawn in ChemDraw Ultra 12.0 and imported to Spartan’14 to generate corresponding 3D structures and hydrogen-bonding complexes. Each structure was subjected to energy minimization using the built-in MMFF molecular mechanics prior to optimization calculation. The DFT calculations for hydrogen-bonding interactions and structure optimizations of reader molecules and N-methylated nucleosides were performed at their ground-state equilibrium geometry conformation using the B3LYP functional with the 6-31+G* basis set in vacuum.
1H NMR. 1H NMR measurements and titration experiments were carried out using a Varian INOVA 500 spectrometer operating at 500 MHz at 25 °C. Chemical shifts are reported in ppm and referenced to the CDCl3 residual peak (δH = 7.26 ppm). Deuterated chloroform (CDCl3) was purchased from Spectrum Chemical MFG CORP (99.8 atom % D). It was stored over activated molecular sieves (4 Å) in a glovebox (0.5 ppm moisture and 0.05 ppm oxygen). Chemicals were dried at 40 °C under vacuum for 2 days and stored over Drierite. Solutions were prepared in the glovebox before each NMR titration. A gastight syringe was used for the addition of nucleoside solution into the NMR tube. The addition was done as quickly as possible to minimize moisture entering the system. Typically, 0.6 mL of ∼5 mM Bz-S and Pr-S and ∼1.0 mM Tz-S were used in an NMR tube for the titration. The guest solutions (modified nucleosides) were prepared to a concentration of around 800 mM (0.2 mL) for Bz-S and Pr-S and around 500 mM for Tz-S. CDCl3 was used as solvent to
99.4% to 96.7%, with an average of 98.0%, which is close to the value for Bz. However, Py can read the four DNA nucleotides with an average predictive accuracy of ∼86%, which is much higher than that of any one of the hydrogen-bonding readers. Compared to NGS, the accuracy of Py is modest, but we believe that it can be improved by altering the aromatic structure as we did for the hydrogen-bonding readers. Notably, the hydrogen-bonding readers read some DNA nucleotides with high predictive accuracy (>90%). For example, Bz can read dTMP with ∼97% accuracy, and Iz can read dAMP with ∼95% accuracy (Table 4). Combining hydrogen-bonding readers with the stacking reader may provide another way to increase the sequencing accuracy. We have developed an RT nanopore device using semiconductor technology,36 which would allow us to produce an array of RT nanopores on a silicon wafer. Thus, high accuracy can be achieved by sequencing in parallel to increase the depth of coverage. With our current setup, each group of RT nanopores can be functionalized with a different chemical using an arraying technology such as piezo dispensing.
DOI: 10.1021/acsnano.6b06466 ACS Nano 2016, 10, 11304−11316
Figure 12. A workflow for feature extraction in different domains: (A) primary parameters in the time domain; (B) secondary features in the frequency domain; and (C) secondary features in the cepstrum domain.
Data Collection. RT measurements were carried out in a PicoSPM instrument (Agilent Technologies), interfaced with a customized LabVIEW program. The sampling rate for the tunnel current was 50 kHz. Prior to an experiment, the STM Teflon cell was cleaned with a piranha solution (caution: Piranha solution is a strong oxidant and should be handled with extreme caution!), followed by vigorous rinsing with Milli-Q water and ethanol. After adding ∼150 μL of phosphate buffer (1.0 mM, pH 7.4) to the STM cell, a palladium probe (with leakage current 0.99.
Extraction of Features from RT Data. We define RT events by two types of signals: spikes and clusters. A spike is an individual RT peak, and a cluster is a subset of closely spaced spikes. Clusters were determined by applying a Gaussian window (4096 data points and one-unit height) to the center of each spike. Spikes that lie within a region where the sum of the Gaussian windows continuously exceeds 0.1 were identified as belonging to a cluster. Although each independent tunneling spike was identified by having an amplitude above 15 pA, the cluster includes all the spikes within the defined region. Table S12, SI, lists the features used to describe an individual spike. Following the workflow shown in Figure 12, spikes and clusters of an RT spectrum in the time domain (Figure 12A) were Fourier transformed into a 25 kHz window, which is the Nyquist frequency of the amplifier, and then the whole frequency range was downsampled into small windows. As shown in Figure 12B, peakFFT and clusterFFT denote features that have the same sampling window size (marked in red), and peakFFT_Whole and clusterFFT_Whole denote features with varied window sizes (marked in green), of which those at lower frequencies have smaller sampling window sizes.
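The cluster rule described above (a one-unit-high Gaussian window of 4096 points on each spike, with spikes grouped wherever the summed windows stay above 0.1) can be sketched as follows. This is a minimal illustration, not the authors' LabVIEW or Matlab code. The Gaussian width `sigma` is not given in the text and is an assumed value, and the function names are hypothetical. At the stated 50 kHz sampling rate, 4096 points span about 82 ms.

```python
import math


def gaussian_window(length=4096, sigma=512.0):
    """One-unit-high Gaussian window; sigma (in samples) is an assumed value."""
    center = (length - 1) / 2.0
    return [math.exp(-0.5 * ((i - center) / sigma) ** 2) for i in range(length)]


def group_spikes(spike_positions, n_samples, length=4096, sigma=512.0, threshold=0.1):
    """Group spike indices whose summed Gaussian windows stay above threshold."""
    window = gaussian_window(length, sigma)
    half = length // 2
    total = [0.0] * n_samples
    # Place one window at the center of every spike and accumulate the sum.
    for pos in spike_positions:
        for i, w in enumerate(window):
            j = pos - half + i
            if 0 <= j < n_samples:
                total[j] += w
    groups, current = [], []
    for pos in sorted(spike_positions):
        # A spike joins the current group only if the summed window never drops
        # to the threshold or below between the previous spike and this one.
        if current and all(total[k] > threshold for k in range(current[-1], pos + 1)):
            current.append(pos)
        else:
            if current:
                groups.append(current)
            current = [pos]
    if current:
        groups.append(current)
    return groups


# Hypothetical spike positions (sample indices) in a 12000-point trace.
print(group_spikes([1000, 2000, 9000], n_samples=12000))
```

Here a group containing a single index corresponds to an isolated spike, while a group with several indices corresponds to a cluster in the sense used above.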
Furthermore, the logarithm of the Fourier transformed spectrum was subjected to the inverse Fourier transform, which generated a new spectrum, referred to as the cepstrum (Figure 12C). The cepstrum was also downsampled into evenly sized windows. Once all the features were determined, they were normalized and scaled to a standard deviation of 1. In this
prepare these solutions. In a typical titration experiment, the titrating solution (guest) was added in 15−25 separate portions to the host solution, and a 1H NMR spectrum of the resulting mixture was recorded after each addition. For example, a modified dA nucleoside solution with a concentration of 800 mM was added to the Bz-S solution (5.15 mM) in increasing aliquots of 2.5, 5.0, and 10.0 μL up to a total volume of 190 μL of dA, and chemical shifts of the ring protons were recorded after each addition. Similarly, a modified dT solution with a concentration of 500 mM was added to a 1.0 mM solution of Tz-S in fixed increments up to a total volume of 180 μL of dT, and chemical shifts of the amide protons were recorded after each addition. Titration data were analyzed by nonlinear regression analysis using the HypNMR2008 program.
ESI-MS Measurements. Individual readers and nucleoside monophosphates were prepared as 200 μM and 100 μM solutions, respectively, in water (specific resistance: ∼18 MΩ·cm; total organic carbon: ∼4 ppb) and sparged with argon. These solutions were used to record their individual MS data (Tables S1 and S2, SI). For preparing mixtures, a reader molecule solution (200 μM) was mixed with a 100 μM analyte solution to maintain a 2:1 mixing ratio. The sample solution was infused into a Bruker maXis 4G electrospray ionization quadrupole time-of-flight (ESI-Q-TOF) mass spectrometer at a flow rate of 3 μL/min via a syringe pump. The ESI source was equipped with a microflow nebulizer needle operated in positive ion mode. The spray needle was held at ground, and the inlet capillary was set to −4500 V. The end plate offset was set to −500 V. The nebulizer gas and dry gas (N2) were set to 1.2 bar and 1.5 L/min, respectively, and the dry gas was heated to 220 °C. In TOF-only mode the quadrupole ion energy was set to 4 eV, and the collision energy was set to 1 eV. Collision gas (N2) was set to a flow rate of 20%.
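For the NMR titration described earlier in this section, the host and guest concentrations in the tube change with every aliquot because the total volume grows. The sketch below works through that arithmetic for the Bz-S example (0.6 mL of 5.15 mM host, 800 mM guest stock, 190 μL added in total). It assumes the volumes are additive and that no solvent evaporates. The function name is hypothetical and is not part of HypNMR2008.

```python
def concentrations_after_addition(host_mM, host_volume_uL, guest_stock_mM, guest_added_uL):
    """Return (host, guest) concentrations in mM after adding guest stock.

    Assumes ideal mixing with additive volumes.
    """
    total_uL = host_volume_uL + guest_added_uL
    host = host_mM * host_volume_uL / total_uL
    guest = guest_stock_mM * guest_added_uL / total_uL
    return host, guest


# End point of the Bz-S titration example from the text.
host, guest = concentrations_after_addition(5.15, 600.0, 800.0, 190.0)
print(f"host = {host:.2f} mM, guest = {guest:.1f} mM, guest/host = {guest / host:.1f}")
```

Under these assumptions the host is diluted to roughly 3.9 mM at the end of the titration while the guest reaches roughly 190 mM, so the guest is in about a 50-fold excess at the final point.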
In most cases MS/MS experiments were conducted with a precursor ion isolation width of 2 m/z units. However, if other ions were present in this range, the precursor ion isolation width was set to 1 m/z unit. Collision energy was set to 5−20 eV, which was sufficient to fragment noncovalent complexes. Each spectrum was recorded over a time period of 0.5−1 min. Typically, a spectrum acquired for 1 min is an accumulation of 60 separately recorded mass spectra averaged across the 1 min time period. A signal-to-noise ratio >3 (S/N > 3) was used to define the limit of detection. Due to the lack of an acid modifier in the infused solutions, most analytes and molecular complexes were observed as adducts with single or multiple sodium ions, [M + nNa − (n − 1)H]+, rather than as protonated molecules [M + H]+. Average mass accuracy was within 0.025 Da.
STM Measurements of RT. DNA Nucleoside Monophosphate Solutions. All analytes were purchased from Sigma-Aldrich with a purity of ≥98% except AP (≥95%). Ultrapure water with a specific resistance of ∼18 MΩ·cm and organic carbon of ∼4 ppb from a Milli-Q system was used for preparation of solutions. Each solution was prepared to have an analyte concentration of 100 μM in a 1.0 mM phosphate buffer, pH 7.4. Functionalization of Palladium Substrates and Probes. See section 4 of SI for details.
Nanopore Sequencing Reads of Natural DNA. Nat. Biotechnol. 2014, 32, 829−833.
(3) Iqbal, S. M.; Akin, D.; Bashir, R. Solid-State Nanopore Channels with DNA Selectivity. Nat. Nanotechnol. 2007, 2, 243−248.
(4) Venta, K.; Shemer, G.; Puster, M.; Rodríguez-Manzo, J. A.; Balan, A.; Rosenstein, J. K.; Shepard, K.; Drndic, M. Differentiation of Short, Single-Stranded DNA Homopolymers in Solid-State Nanopores. ACS Nano 2013, 7, 4629−4636.
(5) Plesa, C.; van Loo, N.; Ketterer, P.; Dietz, H.; Dekker, C. Velocity of DNA during Translocation through a Solid-State Nanopore. Nano Lett. 2015, 15, 732−737.
(6) Heerema, S. J.; Dekker, C. Graphene Nanodevices for DNA Sequencing. Nat. Nanotechnol. 2016, 11, 127−136.
(7) Feng, J.; Liu, K.; Bulushev, R. D.; Khlybov, S.; Dumcenco, D.; Kis, A.; Radenovic, A. Identification of Single Nucleotides in MoS2 Nanopores. Nat. Nanotechnol. 2015, 10, 1070−1076.
(8) Drndic, M. Sequencing with Graphene Pores. Nat. Nanotechnol. 2014, 9, 743.
(9) Lindsay, S. The Promises and Challenges of Solid-State Sequencing. Nat. Nanotechnol. 2016, 11, 109−111.
(10) Lagerqvist, J.; Zwolak, M.; Di Ventra, M. Fast DNA Sequencing via Transverse Electronic Transport. Nano Lett. 2006, 6, 779−782.
(11) Smolyanitsky, A.; Yakobson, B. I.; Wassenaar, T. A.; Paulechka, E.; Kroenlein, K. A MoS2-Based Capacitive Displacement Sensor for DNA Sequencing. ACS Nano 2016, 10, 9009−9016.
(12) Tsutsui, M.; Taniguchi, M.; Yokota, K.; Kawai, T. Identifying Single Nucleotides by Tunnelling Current. Nat. Nanotechnol. 2010, 5, 286−290.
(13) Tsutsui, M.; Rahong, S.; Iizumi, Y.; Okazaki, T.; Taniguchi, M.; Kawai, T. Single-Molecule Sensing Electrode Embedded in-Plane Nanopore. Sci. Rep. 2011, 1, 46.
(14) Liang, F.; Li, S.; Lindsay, S.; Zhang, P. Synthesis, Physicochemical Properties, and Hydrogen Bonding of 4(5)-Substituted 1-H-Imidazole-2-Carboxamide, a Potential Universal Reader for DNA Sequencing by Recognition Tunneling. Chem. Eur. J. 2012, 18, 5998−6007.
(15) Lindsay, S.; He, J.; Sankey, O.; Hapala, P.; Jelinek, P.; Zhang, P.; Chang, S.; Huang, S. Recognition Tunneling. Nanotechnology 2010, 21, 262001.
(16) Chang, S.; Huang, S.; Liu, H.; Zhang, P.; Liang, F.; Akahori, R.; Li, S.; Gyarfas, B.; Shumway, J.; Ashcroft, B.; He, J.; Lindsay, S. Chemical Recognition and Binding Kinetics in a Functionalized Tunnel Junction. Nanotechnology 2012, 23, 235101.
(17) Huang, S.; He, J.; Chang, S.; Zhang, P.; Liang, F.; Li, S.; Tuchband, M.; Fuhrmann, A.; Ros, R.; Lindsay, S. Identifying Single Bases in a DNA Oligomer with Electron Tunnelling. Nat. Nanotechnol. 2010, 5, 868−873.
(18) Petersheim, M.; Turner, D. H. Base-Stacking and Base-Pairing Contributions to Helix Stability: Thermodynamics of Double-Helix Formation with CCGG, CCGGp, CCGGAp, ACCGGp, CCGGUp, and ACCGGUp. Biochemistry 1983, 22, 256−263.
(19) Yakovchuk, P.; Protozanova, E.; Frank-Kamenetskii, M. D. Base-Stacking and Base-Pairing Contributions into Thermal Stability of the DNA Double Helix. Nucleic Acids Res. 2006, 34, 564−574.
(20) Riley, K. E.; Hobza, P. On the Importance and Origin of Aromatic Interactions in Chemistry and Biodisciplines. Acc. Chem. Res. 2013, 46, 927−936.
(21) Kelley, S. O.; Barton, J. K. Electron Transfer Between Bases in Double Helical DNA. Science 1999, 283, 375−381.
(22) Xu, B.; Zhang, P.; Li, X.; Tao, N. Direct Conductance Measurement of Single DNA Molecules in Aqueous Solution. Nano Lett. 2004, 4, 1105−1108.
(23) Chang, S.; Sen, S.; Zhang, P.; Gyarfas, B.; Ashcroft, B.; Lefkowitz, S.; Peng, H.; Lindsay, S. Palladium Electrodes for Molecular Tunnel Junctions. Nanotechnology 2012, 23, 425202.
(24) Chen, X. Palladium as Electrode in DNA Sequencing. Appl. Phys. Lett. 2013, 103, 063306.
(25) Zhang, P.; Johnson, W. T.; Klewer, D.; Natasha Paul, G. H.; Davisson, V. J.; Bergstrom, D. E. Exploratory Studies on Azole
way, we prevent features with large numeric values from dominating those with small numeric values.
Feature Selection. We first used a randomly selected 10% of the data to construct support vectors (hyperplanes that separate the analyte data points) to train the SVM and then tested the remaining 90% of the data to determine the calling accuracy for each DNA nucleotide. There are 264 features in total in Table S12, SI. Some of them are strongly correlated with one another, so a normalized correlation was calculated between feature pairs, and features with a correlation coefficient larger than 0.7 were removed. The variation of a feature between repeated experiments and between different analytes was calculated by comparing histograms of the feature in a single measurement with those of the accumulated measurements. The difference between the histogram of a repeated run and the accumulated histogram of an analyte is assigned as the ‘in-group’ fluctuation (variation of the repeats). The difference in a feature between the normalized histograms of a pair of analytes is the ‘out-group’ fluctuation (variation of the analytes). The features were ranked by the ratio between the in-group fluctuation and the out-group fluctuation, and the low-ranked features were dropped. The surviving features were further optimized to obtain the maximum true positive accuracy.
SVM Analysis. We used the kernel-mode SVM available from https://github.com/vjethava/svm-theta. The SVM running parameters C and γ were optimized through cross-validation of a randomly selected subset of the data. Full details of the SVM (written in Matlab) can be found at the Web site https://github.com/ochensati/SVM_DNA_TunnelVision.
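The scaling and correlation-pruning steps described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' Matlab code. The text does not say which member of a correlated pair was removed, so the sketch keeps the first feature encountered. It uses the population standard deviation for scaling. The feature names in the usage example are hypothetical and are not taken from Table S12.

```python
from statistics import mean, pstdev


def standardize(column):
    """Shift a feature to zero mean and scale it to unit standard deviation."""
    mu, sd = mean(column), pstdev(column)  # a constant feature (sd == 0) would fail here
    return [(x - mu) / sd for x in column]


def correlation(a, b):
    """Pearson correlation of two already standardized columns."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def prune_correlated(features, limit=0.7):
    """Return the feature names kept after dropping strongly correlated ones.

    features maps a feature name to its list of values (one per event).
    Assumption: of each correlated pair, the feature seen first is kept.
    """
    scaled = {name: standardize(col) for name, col in features.items()}
    kept = []
    for name, col in scaled.items():
        if all(abs(correlation(col, scaled[k])) <= limit for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy features: the second is the first in different units.
toy = {
    "amplitude_pA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "amplitude_fA": [1000.0, 2000.0, 3000.0, 4000.0, 5000.0],
    "width_ms": [2.0, 1.0, 2.0, 1.0, 2.0],
}
print(prune_correlated(toy))
```

Because both steps operate on standardized columns, a feature recorded in different units is recognized as redundant regardless of its numeric magnitude, which is the purpose of the scaling described in the text.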
ASSOCIATED CONTENT
Supporting Information
The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acsnano.6b06466. Computer modeling, synthesis and characterization of chemical compounds, functionalization and characterization of palladium probes and substrates with universal reader candidates, additional tables and figures about analysis of mass data, SPR, and RT data (PDF)
AUTHOR INFORMATION
Corresponding Authors
*E-mail: [email protected].
*E-mail: [email protected].
ORCID
Yanan Zhao: 0000-0002-9909-5980
Peiming Zhang: 0000-0003-2831-2308
Author Contributions
¶These authors contributed equally to this work.
Notes
The authors declare the following competing financial interest(s): S.B., S.S., S.L., and P.Z. are named as inventors for patent applications.
ACKNOWLEDGMENTS
This work was supported by grant HG006323 from the National Human Genome Research Institute.
REFERENCES
(1) Manrao, E. A.; Derrington, I. M.; Laszlo, A. H.; Langford, K. W.; Hopper, M. K.; Gillgren, N.; Pavlenok, M.; Niederweis, M.; Gundlach, J. H. Reading DNA at Single-Nucleotide Resolution with a Mutant MspA Nanopore and Phi29 DNA Polymerase. Nat. Biotechnol. 2012, 30, 349−353.
(2) Laszlo, A. H.; Derrington, I. M.; Ross, B. C.; Brinkerhoff, H.; Adey, A.; Nova, I. C.; Craig, J. M.; Langford, K. W.; Samson, J. M.; Daza, R.; Doering, K.; Shendure, J.; Gundlach, J. H. Decoding Long
Carboxamides as Nucleobase Analogs: Thermal Denaturation Studies on Oligodeoxyribonucleotide Duplexes Containing Pyrrole-3-Carboxamide. Nucleic Acids Res. 1998, 26, 2208−2215.
(26) Sartorius, J.; Schneider, H.-J. A General Scheme Based on Empirical Increments for the Prediction of Hydrogen-Bond Associations of Nucleobases and of Synthetic Host-Guest Complexes. Chem. Eur. J. 1996, 2, 1446−1452.
(27) Guckian, K. M.; Schweitzer, B. A.; Ren, R. X.-F.; Sheils, C. J.; Tahmassebi, D. C.; Kool, E. T. Factors Contributing to Aromatic Stacking in Water: Evaluation in the Context of DNA. J. Am. Chem. Soc. 2000, 122, 2213−2222.
(28) Lai, J. S.; Qu, J.; Kool, E. T. Fluorinated DNA Bases as Probes of Electrostatic Effects in DNA Base Stacking. Angew. Chem., Int. Ed. 2003, 42, 5973−5977.
(29) Swart, M.; van der Wijst, T.; Fonseca Guerra, C.; Bickelhaupt, F. M. π-π Stacking Tackled with Density Functional Theory. J. Mol. Model. 2007, 13, 1245−1257.
(30) Erba, E. B.; Zenobi, R. Mass Spectrometric Studies of Dissociation Constants of Noncovalent Complexes. Annu. Rep. Prog. Chem., Sect. C: Phys. Chem. 2011, 107, 199.
(31) Kitova, E. N.; El-Hawiet, A.; Schnier, P. D.; Klassen, J. S. Reliable Determinations of Protein-Ligand Interactions by Direct ESI-MS Measurements. Are We There Yet? J. Am. Soc. Mass Spectrom. 2012, 23, 431−441.
(32) Bich, C.; Baer, S.; Jecklin, M. C.; Zenobi, R. Probing the Hydrophobic Effect of Noncovalent Complexes by Mass Spectrometry. J. Am. Soc. Mass Spectrom. 2010, 21, 286−289.
(33) Li, Y.; Liu, J.; Wang, Y.; Chan, H. W.; Wang, L.; Chan, W. Mass Spectrometric and Spectrophotometric Analyses Reveal an Alternative Structure and a New Formation Mechanism for Melanin. Anal. Chem. 2015, 87, 7958−7963.
(34) Chang, S.; He, J.; Zhang, P.; Gyarfas, B.; Lindsay, S. Gap Distance and Interactions in a Molecular Tunnel Junction. J. Am. Chem. Soc. 2011, 133, 14267−14269.
(35) Grimme, S. Do Special Noncovalent π-π Stacking Interactions Really Exist? Angew. Chem., Int. Ed. 2008, 47, 3430−3434.
(36) Pang, P.; Ashcroft, B. A.; Song, W.; Zhang, P.; Biswas, S.; Qing, Q.; Yang, J.; Nemanich, R. J.; Bai, J.; Smith, J. T.; Reuter, K.; Balagurusamy, V. S. K.; Astier, Y.; Stolovitzky, G.; Lindsay, S. Fixed-Gap Tunnel Junction for Reading DNA Nucleotides. ACS Nano 2014, 8, 11994−12003.
(37) Strauss, B. S. The “A” Rule Revisited: Polymerases as Determinants of Mutational Specificity. DNA Repair 2002, 1, 125−135.
(38) Patra, A.; Zhang, Q.; Lei, L.; Su, Y.; Egli, M.; Guengerich, F. P. Structural and Kinetic Analysis of Nucleoside Triphosphate Incorporation Opposite an Abasic Site by Human Translesion DNA Polymerase η. J. Biol. Chem. 2015, 290, 8028−8038.