Comment on Accurate Data Process for Nanopore Analysis

Sep 28, 2015 - context and identifies a few flaws with the methods proposed in ref 7 and suggests techniques in the literature that could remedy these...
0 downloads 0 Views 206KB Size
Comment pubs.acs.org/ac

Comment on Accurate Data Process for Nanopore Analysis William B. Dunbar* Department of Computer Engineering, University of California, Santa Cruz, California 95064, United States

Anal. Chem. 2015, 87, 907−913. DOI: 10.1021/ac5028758

T

problem they consider is focused specifically on recovering information in events that are attenuated by the filter, and with such events too fast to have hit full amplitude depth, the model is an appropriate choice. As an aside, it is not clear whether this model is appropriate for nanopore signal analysis in general. When a DNA or protein transiently interacts with the pore, for example, the signal can display transitions between amplitudes that are more ramplike and slower than the filter response; in such cases, a pulse-train model would not make sense. Given that the pulse-train model is appropriate in ref 7, my first issue with their work is that they do not compare their method with any of the other relevant idealization methods in the literature10−12 (the cited QUB software is free and relatively easy to use) or with the other nanopore-derived methods they cite.1,3 Instead, they compare their method to what they call “the conventional method”, which is based on a subset of the samples flagged by their own method, as detailed below. As a result, comparing their method with this “conventional method” is actually comparing to a handicapped version of their own method, resulting in expectedly good performance benefits. I now discuss in detail the issues I have with the methods proposed in ref 7. In a first step, a local threshold method and tracking-back routine are used to approximate the start and end of each event. This step finds a start and an end point by locating where the signal crosses a threshold at or near the open channel baseline and is essentially the same method used in other related papers.1−3 The novelty of what is proposed in ref 7 begins with the second step, which refines the flagged start and end points of the event. The second step is referred to as a second-order-differential-based calibration (termed “DBC”) that first performs a curve fitting and then calculates the refined start and end boundaries based upon properties of the fitted signal. In the fitting step, the authors fit the time series data with a truncated Fourier series. A Fourier series decomposes any periodic signal into the sum of a (possibly infinite) set of sines and cosines, with the frequencies being an integer multiple of a common frequency that matches the periodic signals slowest frequency. Although there is no periodic signal in this problem, the fitting performance is quite good. Notably, the method of parameter fitting was not provided in the paper, and fitted parameters values also were not reported. The authors do compare R2 values with fitting using other basis functions (polynomials, Gaussian), and these perform reasonably well but not quite as well as the sines and cosines. They state this is “probably because the blockade shape is similar to the simple oscillating functions of sines and cosines.” In fact, the sines and

he development of tools for automated analysis of nanopore signals is a topic of growing interest,1−5 particularly for sequencing with biological nanopores.6 A challenge with nanopores is that many analytes pass through the pore too fast to resolve features in the signal that could have otherwise been used to detect and (ideally) identify the analyte. The recent paper by Gu and coauthors7 proposes a method that is focused on recovering information from such fast events recorded with a biological nanopore. This comment provides context and identifies a few flaws with the methods proposed in ref 7 and suggests techniques in the literature that could remedy these flaws. When considering methods for analysis of nanopore signals, it is important to first note that all nanopore experiments use the same circuitry that is employed in single channel recording experiments. Namely, the voltage-clamp amplifier regulates the voltage between a pair of electrodes by feedback of the measured current, such that the current remains proportional to the net ion path resistance between the electrodes. With a feedback resistor about the size of the net path resistance, one can measure even 1 pA changes in current that reflect the opening and closing of single protein channels; a brilliant use of feedback and engineering. The field of single channel recording is very mature, with a surplus of methods for automated analysis and modeling,8 including techniques that extract transitions in the signal too fast to be directly measured.9 To what extent are these existing methods applicable for nanopore signal analysis? To answer this, and to put7 in context, we must consider what the underlying assumptions are that these methods are built upon. While nanopores measure the entry, presence, and exit of individual polymers and proteins through an open channel, single channel recording is concerned with the ion flux and gating kinetics of one or more channels in a cellular membrane. In that venue, constant amplitude periods correspond to fluxstate times and gating transitions between flux states happen instantaneously relative to the measurement bandwidth of the system. Thus, an appropriate model signal of the flux/gating kinetics is a current amplitude “pulse-train” with constant amplitude periods connected by instantaneous transitions. Mathematically, the idealized signal is a piecewise-constant function of time. Computational techniques are designed to first generate the idealized pulse train and then fit kinetic models to the resulting distributions.10−12 The pulse train model makes sense because, as stated, transitions happen instantaneously relative to the measurement bandwidth. Put another way, the filter is the rate-limiting component in the measurement response. As in other nanopore studies,1,3 the method in ref 7 presumes the underlying signal is a pulse train. Since the © XXXX American Chemical Society

A

DOI: 10.1021/acs.analchem.5b02281 Anal. Chem. XXXX, XXX, XXX−XXX

Comment

Analytical Chemistry cosines fit the temporal response well because the amplifier and filter are a stable linear system, and the output response of any stable linear system can be represented as the sum of a finite number of exponentially decaying terms, including constants, sines and cosines. The biological nanopore system was shown to behave as a stable linear system using tools from system identification.13 For a voltage-clamp that stays in the linear regime (i.e., the current output does not saturate), the output is the convolution of the amplifier response with the filter response. When the ionic current change is step-like, as presumed in the pulse-train model, and the op-amp is fast compared to the filter, the measured output is essentially the step response of the linear filter. With a 3 pole Bessel filter at 3 kHz bandwidth as used in ref 7, the step response of the filter is fully described by the three poles −25.7 and −20.4 ± 19.4j kHz. With this in mind, a comparison between their fitted frequency value ω and the known frequency content (19.4 kHz) of the filter at 3 kHz bandwidth would be valuable. The simulation module they used for generating the ideal events (Figure S1) are likely contributing other linear modes that make up the total response. As an aside, other methods that fit a single exponential function of time to the transition between amplitudes are essentially fitting to the dominant (slowest) mode in the linear response,4,5 corresponding to the filter poles closest to the imaginary (j ω) axis in the complex plane. Since the fitted curve used in ref 7 is an analytic function, it has an infinite number of smooth and bounded derivatives. The authors use this feature to compute the minimizers of the second derivative Ps3 and Pe3 and use these as the start and end times of events, respectively. This is an interesting approach. The conventional method to which they compare is generated by keeping the same start time estimate Ps3 but use the minimizing sample Pe4 as the end time. This “conventional estimate” is not conventional at all, since it uses the Ps3 value generated from the second derivative calculation. It also has the effect of taking their reasonable duration estimate (Pe3 − Ps3) and cutting it in half; naturally, this halved-value does not do as well as the reasonable value. As an alternative, a very simple and accurate duration estimate is computed with the full-width at half-maximum method, to which the authors should compare as done in ref 1. Their duration estimation method may perform well, but we can not know until it is compared to other methods in the literature that do perform well. The presented method for evaluating the current amplitude has other issues. The method computes the area of the filtered event and then appears to use the known unfiltered pulse duration value (td in eqs 4 and 5) to recover the equivalent pulse amplitude that generated the attenuated event. The problem is that the true pulse value td is unknown in practice, so I do not see how this method could ever be implemented. Since it is not stated in the paper, I will assume instead that the authors use their duration estimation method (to estimate td) in concert with the area calculation to estimate the amplitude. The price for this approach is that it compounds estimation errors and performs poorly, particularly for pulses that do not hit full amplitude depth, which I show here by example. We are assuming that the nanopore signals are well modeled by a pulse-train that is filtered by a multiple-pole low pass filter. Consider the two ideal pulses with equal area shown in Figure 1, with one of the pulses being half as tall and twice as wide as the other. The pulses are then passed through an analog 4-pole low pass Bessel filter (this was done numerically in Matlab using the lsim.m function, the continuous time transfer function

Figure 1. Different pulses with equal area can produce signals that are increasingly indistinguishable by lowering the filter bandwidth. The signals shown compare the affect of applying a low pass 4-pole Bessel filter to two ideal pulses and at two bandwidths (10 kHz, 55 kHz) commonly used in nanopore experiments. The pulses have equal area, with one twice as long and half as tall. After filtering, the faster pulse (dashed) and shallower pulse (solid) are visually distinguishable at 55 kHz but not at 10 kHz. When a pulse is faster than twice the rise time of the filter, attenuation is pronounced and recovering an equivalent idealized pulse shape is challenging. Only the wider pulse is slow enough at the higher bandwidth to hit full depth.

for the Bessel filter, and a simulation sample rate of 100 MHz to avoid discretization effects). The authors in ref 7 use the fact that an unfiltered pulse and it’s filtered version have exactly equal areas; for reference, a proof of this is provided at the end of this comment. As shown in Figure 1, when the filter response is slower than both pulse widths, the pulses are visually hard to distinguish. A low-pass filter has a temporal step response that can be quantitated by the 10%-to-90% rise-time (denoted tr). The higher the bandwidth, the smaller the tr and the faster the response. It is well-known for Bessel filters that pulses that are faster in duration than 2tr will not hit full amplitude. This is shown for both the 10 and 20 μs pulses at the lower 10 kHz bandwidth, since 2 tr = 70 μs, while only the 10 μs pulse does not hit full amplitude at the 55 kHz (2 tr = 12 μs). In the absence of measurement noise, the areas of the two filtered pulses are equal to the unfiltered pulses, and thus the areas are equal to each other. However, with greater attenuation, the duration estimates become harder to distinguish. For example, the full-width at half-maximum estimate is 32 μs for the taller pulse and 36 μs for the wider pulse after 10 kHz filtering. Using these duration estimates and the area of the filtered pulses, the inferred unfiltered pulse amplitudes estimates are closer in value than they should be, and both are smaller than their true pulse amplitudes. Since the areas are exactly equal, the proposed area-based calculation will produce amplitude estimates that are indistinguishable whenever the duration estimates are indistinguishable. The only remedy I can see is to abandon the proposed areabased method, which compounds errors by requiring the duration estimate to compute the amplitude estimate. Instead, an independent method of estimating amplitude should be identified. Since the fitting method provides a signal that can be differentiated many times, one possibility is to explore the use of higher order derivatives of the fitted curve. This would be an extension of the method in ref 1, which computes a local estimate of the first derivative by directly fitting to the slope of the transition regions of the signal. It is possible that there is B

DOI: 10.1021/acs.analchem.5b02281 Anal. Chem. XXXX, XXX, XXX−XXX

Analytical Chemistry



more information in higher order derivatives of curves that fit the data, provided the fit performance is still preserved after differentiation. On a final note, the shape of the filtered signals shown in Figures 4 and S6A do not appear to match the verified model for rapid nanopore signals (i.e., a pulse train passing through a low-pass Bessel filter). In Figure 4, the pulse itself (250 μs) lasts longer than twice the rise time of the 3-pole 3 kHz filter (220 μs), so the filtered pulse should hit the 40 pA depth and stay there for ∼30 μs before the pulse ends. Instead the pulse only appears to get to 30 pA below the baseline before the pulse ends. The same is true for the 100 pA/100 μs pulse filtered at 10 kHz shown in Figure S6A, which should have fit depth after 80 μs. I suspect that the circuit used to generate the idealized filtered signals (Figure S1) modifies the shape of the signal in some way. As a result, the training set generated by this circuit produces signals that are not the same as what a nanopore will produce. Lastly, in numerically filtering the 100 pA/100 μs pulse at the three bandwidths shown in Figure S6A, all filtered pulses matched the 10 pA-ms area calculated using the Matlab function trapz.m, suggesting differences in the area calculations shown in Figure S6A are an artifact of the other elements in the circuit that distort the signal shape and perhaps also due to the influence of digitizing the signal. In summary, there are tools available for the automated analysis of nanopore signals, but more tools are still needed. Gu and coauthors7 method of curve fitting may prove useful for the problem of extracting information from events that are otherwise too fast for robust quantification, provided some of the issued raised in this comment can be addressed.

*E-mail: [email protected]. Notes

The authors declare no competing financial interest.

■ ■

ACKNOWLEDGMENTS Thanks to Hongyun Wang for simplifying my proof of area equivalence. REFERENCES

(1) Pedone, D.; Firnkes, M.; Rant, U. Anal. Chem. 2009, 81, 9689− 9694. (2) Plesa, C.; Dekker, C. Nanotechnology 2015, 26, 1−8. (3) Raillon, C.; Granjon, P.; Graf, M.; Steinbock, L. J.; Radenovic, A. Nanoscale 2012, 4, 4916−4924. (4) Baaken, G.; Ankri, N.; Schuler, A.-K.; Rühe, J.; Behrends, J. C. ACS Nano 2011, 5, 8080−8088. (5) Balijepalli, A.; Ettedgui, J.; Cornio, A. T.; Robertson, J. W. F.; Cheung, K. P.; Kasianowicz, J. J.; Vaz, C. ACS Nano 2014, 8, 1547− 1553. (6) Watson, M.; Thomson, M.; Risse, J.; Talbot, R.; Santoyo-Lopez, J.; Gharbi, K.; Blaxter, M. Bioinformatics 2015, 31, 114−115. (7) Gu, Z.; Ying, Y.-L.; Cao, C.; He, P.; Long, Y.-T. Anal. Chem. 2015, 87, 907−913. (8) Neher, E., Sakmann, B., Eds. Single-Channel Recording, 2nd ed.; Springer: New York, 2009. (9) Lape, R.; Colquhoun, D.; Sivilotti, L. G. Nature 2008, 454, 722− 727. (10) Qin, F.; Auerbach, A.; Sachs, F. Biophys. J. 1996, 70, 264−280. (11) Qin, F.; Auerbach, A.; Sachs, F. Biophys. J. 2000, 79, 1928−1944. (12) Venkataramanan, L.; Sigworth, F. J. Biophys. J. 2002, 82, 1930− 1942. (13) Garalde, D. R.; O’Donnell, C. R.; Maitra, R. D.; Wiberg, D. M.; Wang, G.; Dunbar, W. B. IEEE Trans. Contr. Syst. Technol. 2013, 21, 2038−2051.

APPENDIX: AREA EQUIVALENCE BETWEEN UNFILTERED AND FILTERED PULSES Area equivalence is established here in continuous time, with the understanding that the results will hold practically within a quantifiable numerical precision for the signals after discretization and without aliasing. Let f(t) be the unfiltered signal that is zero everywhere except on the finite interval [0,T]. A piece-wise constant “pulse train” that has a finite (but possibly large) number of steps, but is finite in total duration, is an example f(t). With low-pass filter impulse response h(t), the filtered signal y(t) is

∫0

AUTHOR INFORMATION

Corresponding Author



y(t ) =

Comment

+∞

f (t − s)h(s) ds

(1)

The impulse response of a unity-gain low pass filter satisfies ∫ +∞ 0 h(s) ds = 1. The filtered signal y(t) is zero for t < 0 and nonzero for t ≥ 0, with y(t) → 0 as t → + ∞. Area equivalence +∞ is established if ∫ +∞ −∞ y(t) dt = ∫ −∞ f(t) dt. This can be shown by using (1) and changing the order of integration, as follows: +∞

∫−∞ = =

+∞

y(t ) dt =

∫0 ∫0

+∞

{∫ {∫

∫−∞ {∫0

+∞

+∞

}

f (t − s) dt h(s) ds

−∞

+∞

}

f (t − s)h(s) ds dt

+∞

}

f (τ ) dτ h(s) ds =

−∞

+∞

∫−∞

f (τ ) d τ

This concludes the proof. C

DOI: 10.1021/acs.analchem.5b02281 Anal. Chem. XXXX, XXX, XXX−XXX