Article Cite This: Anal. Chem. XXXX, XXX, XXX−XXX

pubs.acs.org/ac

Submultiple Data Collection to Explore Spectroscopic Instrument Instabilities Shows that Much of the “Noise” is not Stochastic

Curtis W. Meuse,*,† James J. Filliben,*,‡ and Kenneth A. Rubinson*,§,∥

† Institute for Bioscience and Biotechnology Research of the University of Maryland and the Biomolecular Measurement Division, National Institute of Standards and Technology, Rockville, Maryland 20850, United States
‡ Statistical Engineering Division, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
§ NIST Center for Neutron Research, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
∥ Department of Biochemistry and Molecular Biology, Wright State University, Dayton, Ohio 45435, United States

ABSTRACT: As has long been understood, the noise on a spectrometric signal can be reduced by averaging over time, and the averaged noise is expected to decrease as t^(−1/2), that is, inversely with the square root of the data collection time t. However, with the contemporary capability for fast data collection and storage, we can retain and access a great deal more information about a signal train than just its average over time. During the same collection time, we can record the signal averaged over much shorter, equal, fixed periods. This gives the set of signals over submultiples of the total collection time. With a sufficiently large set of submultiples, the distribution of the signal’s fluctuations over the submultiple periods of the data stream can be acquired at each wavelength (or frequency). From the autocorrelations of submultiple sets, we find that only some fraction of these fluctuations consists of stochastic noise. Some of the fluctuations are what we call “fast drift”, defined as drift over a time shorter than the complete measurement period of the average spectrum. In effect, what is usually assumed to be stochastic noise has a significant component of fast drift due to changes of conditions in the spectroscopic system. In addition, we show that, without an inconveniently long measurement time, the extreme values of the signal fluctuations are usually not balanced (equal magnitudes, equal probabilities) on either side of the mean or median; the data is almost inevitably biased. In other words, the unbalanced fluctuations are sampled in an unbalanced manner around the mean, and so the median provides a better measure of the true spectrum. As shown here, by using the medians of these distributions, the signal-to-noise ratio of the spectrum can be increased and sampling bias reduced. The effect of this submultiple median data treatment is demonstrated for infrared, circular dichroism, and Raman spectrometry.
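The abstract contrasts stochastic noise, whose average shrinks as t^(−1/2), with fast drift, which does not average away so readily. The short Python simulation below is a minimal sketch of that contrast under a hypothetical model that is not taken from the article: fast drift is mimicked by a first-order autoregressive (AR(1)) process with coefficient phi, so that each value stays near the previous one, and the spread of the averaged signal is compared for 100 and 400 submultiples (a fourfold longer collection time). The helper names ar1_series and spread_of_means are illustrative only.

```python
import random
import statistics


def ar1_series(n, phi, sigma, rng):
    """Return n samples of the zero-mean AR(1) process x[k] = phi * x[k-1] + e[k].

    phi = 0 gives independent (white) Gaussian noise. phi close to 1 gives a
    "sticky" signal whose next value stays near the current one; this is a
    hypothetical stand-in for fast drift, not a model taken from the article.
    """
    x = 0.0
    out = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, sigma)
        out.append(x)
    return out


def spread_of_means(n_sub, phi, trials=400, seed=1):
    """Standard deviation, over repeated measurements, of the average of n_sub submultiples."""
    rng = random.Random(seed)
    means = [statistics.fmean(ar1_series(n_sub, phi, 1.0, rng)) for _ in range(trials)]
    return statistics.stdev(means)


for phi in (0.0, 0.99):
    ratio = spread_of_means(100, phi) / spread_of_means(400, phi)
    # White noise: the ratio should be close to sqrt(4) = 2.
    # Sticky signal: the ratio stays well below 2, so averaging 4x longer helps much less.
    print(f"phi = {phi}: spread(100 submultiples) / spread(400 submultiples) = {ratio:.2f}")
```

With phi = 0 the spread roughly halves, as the square-root law predicts; with phi = 0.99 the correlation extends over roughly 100 submultiples, so quadrupling the collection time barely narrows the spread. In a toy setting, this mirrors the limited S/N improvement with prolonged averaging that the article attributes to fast drift.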

Received: November 28, 2017 Accepted: March 20, 2018

© XXXX American Chemical Society

DOI: 10.1021/acs.analchem.7b04940 Anal. Chem. XXXX, XXX, XXX−XXX

The presence of noise in every spectrometric measurement is expected. Also, the stochastic (Gaussian or Markovian) noise on a spectrometric signal can be reduced by averaging over time, where the averaged noise is expected to decrease as t^(−1/2), with t the total data collection time. A question left unanswered is the behavior of noise that cannot be characterized as stochastic. Prior to the advent of digital data acquisition, the noisy signal went through a high-frequency cutoff filter that reduced the noise compared to the unfiltered signal. A similar result could be obtained for higher frequency signals by phase-sensitive detection (PSD), in which the signal is modulated and then detected by demodulation to zero frequency, followed by low-pass filtering. When digitization and computing tools became available, signal averaging, either in place of or in addition to these two filtering practices, became the method of choice.

By collecting the data at shorter time intervals, not only can signal averaging be done, but the fluctuations of the signals from various spectrometers can also be characterized. We have used the capabilities of contemporary fast analog-to-digital converters and terabyte storage capacities to retain as much information as we can about the signal train’s amplitude in time. To do so, the data is collected for many sequential, short (here as short as 6.25 μs) submultiple periods of the total data collection time. (In mathematics, a submultiple is a number that divides another number an integral number of times without a remainder. Applied to the total time for data collection, this definition means that the data is collected for a number of equal periods, the submultiple time periods.) The signal fluctuations over the sets of short periods are, as expected, larger than those in the signal averaged over the sum of all the submultiple periods, that is, over the total collection time. Data acquired in this way reveals previously unseen characteristics of the spectra and also, in some cases, allows a better signal-to-noise ratio (S/N) to be achieved. This latter capability comes from using medians instead of averages, which takes advantage of the median’s well-known ability to reject contributions from outliers. For example, we will show that the extreme values of the fluctuations in amplitude at a given wavelength are usually not balanced. By not balanced we mean that, on either side of the mean or median, the extremes in amplitude neither have equal magnitudes nor occur with equal probabilities. This will be illustrated simply by plots of the submultiple data ordered by magnitude, which show that the distribution appears to differ at every spectroscopic wavelength (or frequency or interferometer position). Submultiple data collection also allows us to calculate autocorrelation times of a set of submultiples. From the characteristics of such autocorrelations, we show that calculating the medians of the submultiples, which we term the submultiple median (SMM) method, reduces the magnitude of the fluctuations in the spectrum because only part of these fluctuations results from true stochastic (i.e., Gaussian, Markovian) noise. The rest of these fluctuations can be characterized as what we call “fast drift”, which is due to changes of intrinsic and environmental conditions of the spectroscopic system. In other words, fast drift labels the drift that occurs over times equal to or less than the sum of the submultiple times, that is, the acquisition time of the spectrum. Although a purely random process could in theory jump from one value to any distant value in the signal distribution, the signal being measured cannot do so instantaneously.

Consequently, a data point at the high end of the amplitude distribution, such as that illustrated by the probability density function (PDF) in Figure 1, will not be followed immediately by a data point near the lowest value. The changes in conditions in the spectrometer have what can be labeled a memory, momentum, persistence, lag time, characteristic time, frequency limit, or stickiness: given one value of the signal, the next one is more likely to remain nearby in magnitude. This characteristic amplitude memory also means that it is highly unlikely that the signal’s range of variation can be fully sampled without bias, that is, without an unequal sum of values on opposite sides of the median or mean. This overall bias is straightforward to detect with submultiple sampling, the reasons for it can be investigated, and SMM can be applied to reduce the resulting measurement bias. This is all represented visually in graphs throughout the text.

Figure 1. The amplitude data of the instrument is digitized in time at the black points, and the probability density of the amplitudes from a complete data set is shown as the probability density curve at the top. For the time period shown here, more points lie to the right than to the left of the distribution.

Using a median in place of an average to estimate the true signal in a noisy experiment is not new, although the method has mostly been used for image processing.1 Bhargava and Levin2 did so to improve infrared spectral images when only a few repeated runs could be collected, because it is well-known that the median is a more robust measure of central tendency in the presence of outlier values. Note, however, that SMM is not related to median filtering of images.1 Image median filtering involves calculating medians of adjacent pixels of two-dimensional images so as to effect a trade-off between noise reduction and resolution. The authors are unaware of work that intentionally creates more short data acquisitions to characterize the data stream and allow the creation of SMM spectra. Like signal averaging, SMM treatment reduces fluctuations in the instrumental output by calculating in the time domain. But unlike averaging, SMM can remove fast drift without the trade-off between fluctuation reduction and line-width broadening that results from other time-domain methods such as Savitzky-Golay smoothing.3 The insights offered by collecting and analyzing the submultiple amplitude data and by using the SMM data treatment are demonstrated for infrared, circular dichroism (CD), and Raman spectrometries.

EXPERIMENTAL METHODS

Infrared Spectrometer, Data Collection, and Data Treatment. The transmission interferograms and absorbance spectra were obtained using a Bruker Vertex 80 spectrometer (Billerica, MA). To test whether the rate of data collection altered the results, the interferograms were obtained both in step-scan mode and in single-sided, single-direction, normal scan mode. The step-scan mode used the 160 kHz analog-to-digital converter (ADC) with a data collection time of 6.25 μs. A photovoltaic mercury cadmium telluride detector was used. The normal, continuous scan had a 10 kHz sampling and storage rate along the interferogram but a rate of ∼0.5 Hz between submultiple acquisitions (scans), and it used a deuterated triglycine sulfate detector. Data was registered at every laser crossing. The amplification was set so that the burst magnitude filled the full range of the ADC. SMM analysis was performed only on the interferograms because infrared absorbance spectra require the acquisition of data during both a sample and a reference time period. Since both the noise and the fast drift of the instrument differ between the two time periods, comparisons of absorbance spectral data become unnecessarily complex. Data manipulation and export were carried out with Bruker’s Opus 7.5 software. Display graphs were produced with Igor (WaveMetrics, Portland, OR) from the data-point-table files from the instrument. In step-scan mode, the mirror is stationary during the time of data collection, which makes it necessary to record the data with the detector DC-coupled rather than AC-coupled. Since the DC signal is large and the changes of interest are relatively small, using step-scan mode required resetting the zero balance of the detector amplifier. Adjusting the zero balance allows the variable signal to be amplified to cover more


of the ADC range in a manner similar to a time-resolved step-scan protocol. No optional filters were used in the light path.

CD Spectrometer, Data Collection, and Data Treatment. The absorbance and CD spectra were obtained on an Applied Photophysics Chirascan V100 spectrometer (Leatherhead, Surrey, U.K.). The spectra were collected in two different modes. To obtain the full spectra displayed in the CD section, the data was collected in scans in which each recorded wavelength’s dwell time was either 50 ms or 100 ms, and 200 separate, sequentially recorded spectra were obtained. The data was put into a spreadsheet file with the Chirascan v.4.5 software. For the full spectrum, Excel spreadsheet functions were used to calculate the average and median values at each recorded wavelength. As with the infrared, we wanted to compare measurements acquired at a submultiple rate of ∼0.5 Hz with those acquired much more quickly, here at 66.7 kHz. For the measurements used to determine the autocorrelations of the data, the instrument was set to one of a chosen set of wavelengths, and 4000 data points were collected at 15 μs per point (total time 60 ms), with a second set at 25 μs per point (total time 100 ms). For these single-wavelength time-series measurements, the data was analyzed with Igor Pro and TableCurve 2D (Systat Software, San Jose, CA) to find the autocorrelation properties and the Fourier power spectrum, [(FT of real part)^2], of each autocorrelation. The data collection electronics consisted of a DC channel with a 16-bit, 200 kHz ADC and an AC channel with an 18-bit, 1 MHz ADC. The DC channel measures an average amplitude of the signal, that is, the total absorbance. The AC channel monitors the signal variation between left-handed and right-handed circularly polarized light. The photoelastic modulator’s optimum drive frequency is wavelength dependent, and the detector electronics are synchronized to that frequency. The samples were 0.25 mM L-histidine (Sigma) self-buffered at pH 6 in a 2.0 mm pathlength cell; 12 mg/mL human serum albumin (Sigma) in phosphate buffered saline (PBS), cell pathlength 4.50 mm, submultiple period 50 ms, 200 scans; and 3 mg/mL lysozyme (Sigma) in PBS, submultiple period 100 ms, 200 scans. All samples were held at 22.4 °C.

Raman Spectrometer, Data Collection, and Data Treatment. The Raman spectra were obtained on a liquid CCl4 sample in a square capillary using a Horiba Evolution HR spectrometer (Edison, NJ) equipped with an open-electrode, thermoelectrically cooled CCD detector (1024 × 256 pixels), with a 532 nm laser at 100% power, a 600 lines/mm grating, and a 100× objective. Spectral calibration was based on the 1332 cm−1 line of diamond. The instrument was run in SWIFT mode (normally used for rapid imaging), in which a spectrum from 112 cm−1 through 1808 cm−1 was acquired every 50 ms with a storage time of approximately 2 ms, for a total of 1000 spectra over a period of less than 1 min. A second, similar run was made with 100 ms per spectrum. The sequential Raman spectra were imported into an Excel spreadsheet, where medians and averages at each wavelength could be calculated for the full set of 1000 spectra or any subset of them.

Construction of the Horn Diagrams. A standard spreadsheet operation can find the medians for the entire spectrum, but to illustrate the distribution of the data values at each collection point, the magnitudes comprising the data set at each chosen, representative mirror position (IR) or wavelength (CD and Raman) were plotted in rank order from the lowest value to the highest value. Horn diagrams, named after the resulting shapes, are constructed from this ordered data by folding the magnitude-versus-index plot about a vertical axis passing through the median value. With N the total number of data points collected, the leftmost points lie at a position N/2 away from the median. On the horn plots, pairwise sums of the signal magnitudes are also plotted; these sums are of the upper and lower points at the same ith horizontal position away from the median point. Each of the N/2 sums is plotted at the same ith position and lies near the projection of the median value, to provide a visual illustration of the lack of balance in the values of the measured amplitudes about the median combined with the imperfect sampling of the range. Writing Ā_i for the amplitude of the ith pair expressed as its mean (half of the pairwise sum, which is what places it near the median), the difference between the mean and the median for the set of N points is

$$\text{mean} - \text{median} = \frac{2}{N}\sum_{i=1}^{N/2}\left(\bar{A}_i - \text{median}\right)$$

This protocol allows us to quantify the combined result of the biases in the fluctuations (especially from the low-probability extreme values) and in the sampling of those fluctuations for a particular measurement.

Autocorrelation and Other Exploratory Data Analysis Methods. Autocorrelation plots and other exploratory data analyses were first made with Dataplot, a NIST suite of publicly available software (http://www.itl.nist.gov/div898/software/dataplot/homepage.htm and http://www.itl.nist.gov/div898/software/dataplot/summary.htm). Autocorrelation functions were also obtained within Igor with removal of the mean and scaling to unity at zero time. Power spectra of the autocorrelation functions were calculated using the full, symmetric function (both positive and negative times) with zero filling to reach 2^n points. The frequencies were assigned to the transformed frequency points from the inverse of the total time of the zero-filled autocorrelations.
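The horn-diagram pairing and the mean − median relation given above can be checked numerically. This minimal Python sketch (standard library only) assumes an even number of submultiples and represents each red-curve point as the mean of its lower and upper partner (half the pairwise sum); the function names horn_diagram and mean_minus_median are hypothetical, and this is not the spreadsheet procedure the authors used.

```python
import statistics


def horn_diagram(values):
    """Return (median, pair_means) for an even-length set of submultiple amplitudes.

    The values are rank-ordered, and the ith-lowest and ith-highest points are
    paired outward from the median (i = 1 .. N/2). Each pair is averaged so that
    it lies near the median, like the red curve of a horn diagram.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n % 2:
        raise ValueError("this sketch assumes an even number of submultiples")
    med = statistics.median(ordered)
    half = n // 2
    # ordered[half - 1 - i] and ordered[half + i] sit i positions below/above the median
    pair_means = [(ordered[half - 1 - i] + ordered[half + i]) / 2 for i in range(half)]
    return med, pair_means


def mean_minus_median(values):
    """Mean minus median from the horn pairs: (2/N) * sum(pair_mean - median)."""
    med, pair_means = horn_diagram(values)
    return 2 / len(values) * sum(p - med for p in pair_means)


# Small made-up example: two large high-side values pull the mean above the median.
data = [1.0, 2.0, 3.0, 4.0, 10.0, 20.0]
print(mean_minus_median(data))                            # via the horn pairs, about 3.17
print(statistics.fmean(data) - statistics.median(data))   # direct definition, about 3.17
```

A positive result corresponds to a red curve lying above the median line (bias to the high side, as in Figure 5). A negative result, such as the avg − median of −0.03634 mdeg reported for the L-histidine CD data in Table 1, corresponds to a red curve below it, as in Figure 9.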



RESULTS AND DISCUSSION

Infrared Spectra. Instability in Repetitive Scans. As is well-known to spectroscopists, longer time-averaging of the signal often does not improve the S/N of the spectrum.4,5 In other words, the expected characteristics of random noise are not observed. As can be seen from Figure 2, where sequential single-scan total internal reflection spectra are compared, the data sets clearly show more than stochastic fluctuations about some fixed spectrum. We seek to understand the origins of this behavior, to characterize the fluctuations, and to obtain an output closer to the true underlying spectrum. We begin in the next subsection by looking at the infrared interferograms of repeated scans.

Infrared Interferogram Submultiple Data Analysis Shows Instrument Instability. Figure 2 illustrates the instability found by sequentially recording 11 individual spectra, with the first of them serving as the background. Figure 3 shows the same section of three scans of the interferometer, each in the same direction and each acquired over a two-second period. These three interferograms were selected to look at shorter-term variations in the instrument. The chosen scans were the 10th, 20th, and 30th of a longer train, and as a result, they are separated in time by 20 s. As described in the caption, the line was drawn through the open points, the amplitudes of the 10th run. Although the data points from the following scans lie at the same x-value, they do not overlie each other. On the other hand, for the smaller oscillations on the right-hand side, the low slopes cause the points acquired later to appear off the drawn line. Averaging the data points at each mirror crossing will distort the interferogram and the subsequent Fourier-transform infrared (FTIR) spectrum. Similarly, instrument jitter in Raman and other narrow-line spectra will broaden the spectrum compared to a single submultiple scan. In the next two subsections, we show that the time-series fluctuations at each mirror position clearly differ, which shows why the SMM method improves on instrument averages.

Figure 2. Illustrating the IR instability: ten repeated total-internal-reflection FTIR scans. Each curve is a single double-sided, forward-backward scan. The background, the first of 11 scans, was subtracted from each. The individual scans began 3.37 s apart. The inset shows an expanded view of the 1200 to 1700 cm−1 range.

Figure 3. A section of three infrared interferogram scans, each lasting 2 s and separated by 20 s, showing the instability of the data collection. Hollow circles show the first of the three scans, and the line of the graph is for this submultiple. The solid red and solid green data points are from the 10th and 20th scans following it, i.e., 20 s and 40 s later.

Properties of the FTIR Interferogram Submultiple Data Ordered by Amplitude. The plots of submultiple data in this and the next subsection demonstrate visually how the data sets can be used to probe instrumental stochastic noise and fast drift, the properties illustrated in Figure 1. As can be seen in the graphs of Figure 4, when the collected time-series submultiple amplitudes for selected interferometer positions are ordered from lowest to highest, each plot is steep at the ends, where few submultiple samples are found, and flatter in the center, where the values are more probable and where the median is located. These graphs resemble, but are not congruent with, the integral of a Gaussian (the error function) with its axes interchanged. Note that the shapes of the right and left ends of each individual curve differ from each other. Further, the shapes of the four low termini of the individual curves are not congruent, and the shapes of the high termini are also dissimilar. These variations show clearly that the extremes of the fluctuations of the signal are not symmetric. These dissimilar values contribute directly to each average but have less effect on the median, as shown in the next subsection.

Figure 4. Experimental time series of 10,000 points at 8 μs sampling times taken at four different, representative, fixed mirror positions of a step-scanned FTIR. The curvatures of the ends differ both between the opposite ends of each individual curve and between the same ends of different curves. For example, note that the left sides of the two middle curves nearly intersect, while the curves themselves are well separated.

Explaining Why Averages and Medians Differ in FTIR: The Horn Diagram. To illustrate how unbalanced sampling and asymmetric amplitude magnitudes cause the differences between a spectrum from the medians and one from the averages, we have devised what we call horn diagrams, one of which is illustrated in Figure 5. To make a horn diagram, the higher-valued half of the ordered points, such as those shown in Figure 4, is folded back about the median point, forming the blue horn shape. In the figure, the black line shows the median value’s projection along the x-axis. The two experimental values lying vertically at the same index position are summed to give the points making up the red line. The sum of all the deviations of the red line from the black line, divided by the number of points in the sum, is the measured difference between the mean and the median. Such graphs add no statistical meaning of their own, but they do illustrate why the SMM is superior to averaging the signal amplitude. Here, the red line lies above the median value, indicating that some combination of having more samples on the right half of Figure 1 than on the left half, together with a skew of the measured signal toward high values, creates an average value that is biased high compared to the median. As should be clear, it is highly improbable that a distribution of fluctuations is equally sampled on both sides of the fluctuations’ eventual, final PDF. This bias will be proportionally smaller the longer the data collection period, but it will not disappear except by chance. The horn diagram illustrates the origin of the effects of one contribution to the fast drift, that is, instrument drift that occurs over times equal to or less than the acquisition time of the spectrum. Note also, however, that the sampling bias caused by instrumental stickiness also shifts the position of the median by half the number of points in the imbalance. As a result, the effects of fast drift are not fully eliminated. The fast drift may arise from many different mechanisms, and the conventional division into signal-intensity-dependent (modulation, scintillation, 1/f) noise, signal-intensity-independent (detector-limited) noise, and noise proportional to the square root of the signal intensity (source-limited)2 seems less clear. As an example for the infrared spectrometer, we can expect time-dependent refractive index variations inside the instrument to affect the optical path.6 Further, source radiance instability is known to contribute to variations that depend on spectral resolution.7 Such changes also modulate the level of intensity-dependent noise of a solid-state detector.8

Figure 5. A horn diagram of representative infrared intensities such as those illustrated in Figure 4. Here, the imbalance (bias) is to the high side.

Autocorrelation Properties of the FTIR Submultiple Data. The autocorrelation plot is a tool for characterizing sequential dependencies in a time series and, in part, for determining whether a time series is generated by a random process. The values of an autocorrelation vary from +1, for certainty of change in the same direction, to −1, for certainty of change in the opposite direction. A value of zero is usually interpreted as random. The general assumption of Gaussian (random) noise comes from an analogy with picking numbers randomly from a population, as described by Pearson.9 Nothing in this theory prevents picking, say, the lowest value present followed immediately by the highest one. However, as noted earlier, spectrometers are unlikely to exhibit large variations in output values instantaneously. (Later, we shall see an autocorrelation function that is characteristic of random noise for the Raman spectra.) In contrast, as seen in Figure 6A, a clear, nonrandom, quasiperiodic pattern appears in the time series at this single, representative mirror position in the FTIR instrument.

Figure 6. (A) Data taken for a single mirror position in a step-scan IR data collection. The 4000 submultiples are each measured for 6.25 μs and plotted on the time axis. (B) The autocorrelation function for the data of (A). The major peaks occur at ∼8.3 ms (∼120 Hz) intervals. (C) Power spectrum, [(real FT)^2], of the autocorrelation shown in (B).
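The autocorrelation and power-spectrum procedure described in the Methods (mean removal, scaling to unity at zero lag, use of the full symmetric function zero-filled to 2^n points, and frequencies assigned from the inverse of the zero-filled duration) can be sketched in Python. This is a minimal, dependency-free illustration, not the Igor or Dataplot code used by the authors. The synthetic signal (a 120 Hz sinusoid plus Gaussian noise), the 25 μs period, and the 1000-point length are hypothetical choices that span the same 25 ms as the 4000 × 6.25 μs record of Figure 6 while keeping the pure-Python runtime short.

```python
import cmath
import math
import random


def autocorrelation(x):
    """Mean-removed autocorrelation for lags 0..N-1, scaled to 1 at zero lag."""
    n = len(x)
    m = sum(x) / n
    d = [v - m for v in x]
    c0 = sum(v * v for v in d)
    return [sum(d[i] * d[i + k] for i in range(n - k)) / c0 for k in range(n)]


def fft(a):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    tw = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + tw[k] for k in range(n // 2)] + [even[k] - tw[k] for k in range(n // 2)]


def acf_power_spectrum(acf, dt):
    """Power spectrum [(real FT)^2] of the full symmetric ACF, zero-filled to 2**n points.

    Returns (frequencies in Hz, power) for the non-negative frequencies.
    """
    full_len = 2 * len(acf) - 1               # lags -(N-1) .. +(N-1)
    m = 1 << (full_len - 1).bit_length()      # next power of two >= full_len
    buf = [0.0] * m
    buf[0] = acf[0]
    for k in range(1, len(acf)):              # negative lags wrap to the end, so the
        buf[k] = acf[k]                       # even function has a real transform
        buf[m - k] = acf[k]
    spec = fft(buf)
    total_time = m * dt                       # frequency step = 1 / zero-filled duration
    freqs = [k / total_time for k in range(m // 2)]
    power = [spec[k].real ** 2 for k in range(m // 2)]
    return freqs, power


dt = 25e-6                                    # hypothetical 25 us submultiple period
rng = random.Random(0)
signal = [math.sin(2 * math.pi * 120.0 * k * dt) + rng.gauss(0.0, 0.5) for k in range(1000)]
acf = autocorrelation(signal)
start = next(k for k, r in enumerate(acf) if r < 0)           # skip the zero-lag peak
peak_lag = max(range(start, len(acf)), key=lambda k: acf[k])
freqs, power = acf_power_spectrum(acf, dt)
peak_f = max(range(1, len(freqs)), key=lambda k: power[k])    # ignore the DC bin
print(f"first ACF peak at {peak_lag * dt * 1e3:.2f} ms; spectrum peak near {freqs[peak_f]:.0f} Hz")
```

Here the 1999-point symmetric function is zero-filled to 2048 points, so the frequency step is 1/(2048 × 25 μs), about 19.5 Hz, and the 120 Hz component appears in the bin nearest to it. The first autocorrelation peak sits slightly below 8.3 ms because the finite record tapers the estimated autocorrelation at longer lags.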

This pattern is shown in Figure 6B to have a periodic autocorrelation, with the three largest peaks occurring approximately 8.3 ms apart, equivalent to about 120 Hz, twice the U.S. line frequency. The power spectrum of the autocorrelation, seen in Figure 6C, shows the 120 Hz feature and a few other, smaller contributions to the fast drift.

CD Spectra. CD Submultiple Time-Series Data. The CD submultiple time-series amplitudes plotted in Figure 7 are data collected at 240 nm over 60 ms, consisting of 4000 points with 15 μs submultiples, for a solution of 0.25 mM L-histidine in H2O. This data is similar to measurements obtained at both 15 μs and 25 μs at other wavelengths. The data plot is relatively smooth over the 60 ms of its collection compared with the IR data shown in Figure 6A, which behave more as expected over about the same duration. This smoothness occurs because the output is filtered by a circuit with a 0.3 ms time constant. The autocorrelation and its power spectrum reveal more, as shown below.

Figure 7. Submultiple time series of 4000 points at 15 μs each over a total time of 60 ms for the CD data collection at wavelength 240 nm for L-histidine. The ordinate has a linear arbitrary scale.

Properties of the CD Submultiple Data Ordered by Amplitude. The ordered submultiple data for the 240 nm wavelength of the CD instrument is graphed in Figure 8. The blue curve’s points (lower of the two at the right side) were obtained at 25 μs intervals and the red at 15 μs during a separate run. Unlike the relatively smooth curves of the infrared measurements, these curves are far from smooth and even show breaks in the slope. The origins of this roughness are not understood, but it does not arise from the steps of the ADC (18-bit; 262,144 levels for CD), even if the experimental amplitudes spanned only a fraction of the full range. Again we see that the curves for the submultiple CD time series are not the same, even for two series collected at the same wavelength about 20 min apart. For the CD submultiple data, when the distributions of the fluctuations of the signal are not symmetric, especially at the highest and lowest values, this asymmetry results in a bias of the measured value compared to that of the true, underlying spectrum. This is shown quantitatively in the next subsection, where a representative curve is converted to a horn diagram.

Figure 8. Ordered submultiple data for 4000 time-series points collected at 240 nm for separate runs with 15 μs and 25 μs submultiples for the CD of 0.25 mM L-histidine. The red curve (upper curve on the right side) is the ordered data from Figure 7.

Quantitative Differences between Averages and Medians at Each CD Wavelength. Straightforward calculations involving the data shown on the horn diagram of Figure 9 for the 25 μs submultiple data collection for the CD at 240 nm illustrate how the fast drift contributes to imbalances. As before, the black, horizontal, straight line is the projection of the median, and the wavy red line plots the sums of the vertically aligned points. As can be seen, the imbalances lie below the median, unlike those for the infrared shown in Figure 5, which lie above the median. The quantitative relationships between the median and the average found from the submultiple points are shown in Table 1.

Figure 9. Horn diagram for the 25 μs CD data of L-histidine shown in Figure 8. Here, the imbalance is to the low side.

Table 1. CD Numerical Relationships
sum of all data points/4000 = −0.17375 mdeg
median = −0.13741 mdeg
avg − median = −0.03634 mdeg

Here, the fast drift has caused the sum to be unbalanced toward the negative side of the distribution. This analysis shows that the median rejects part of the fluctuation bias that is retained in the average.

Autocorrelation Properties of the CD Submultiple Data. Figure 10 shows the autocorrelation function of the data presented in Figure 7. There are clearly harmonic components in these autocorrelations, the origin of which is uncertain. Also interesting is that we see no evidence of a correlation time that is separable from the harmonic components; that is, the initial drop does not appear to be exponential. Figure 11 shows the Fourier components of the Figure 10 autocorrelations. They indicate that different frequencies appear in the power spectra at each wavelength position. The largest peak, which is for the 260 nm data, reflects the regular, continuing oscillation of the autocorrelation plot from which it is derived. The origins of these components of fast drift in the CD instrument are unclear.

Figure 10. Autocorrelations of the 25 μs time-series data at three selected CD wavelengths, 200, 240, and 260 nm.

Figure 11. Power spectra (in the same colors) of the CD autocorrelations shown in Figure 10. The inset enlarges the smaller-peak region.

Differences between Average and Median CD Spectra. In Figure 12 and its inset, the CD spectrum of human serum albumin shows little difference between the average values and the medians except for the few lowest-wavelength points collected. On the other hand, for the absorbance spectra seen in Figure 13, which result from 200 submultiple spectra of 3 mg/mL lysozyme, the SMM spectrum lies above that of the averages over the entire spectral range. The cause of this apparently absorbance-independent bias is not clear. Comparison of data collected at different rates for Figures 8 and 12 indicates that similar submultiple properties are found regardless of the collection rate (data not shown).

Figure 12. CD spectrum of 200 scans of human serum albumin showing the average values as circles and the SMM spectrum as triangles. Only the two shortest wavelengths show visually clear differences between the median and average values. The inset shows the full spectrum collected at the same time.

Figure 13. Absorption spectrum of 3 mg/mL lysozyme, collected simultaneously with the CD data, showing the average values (circles) and the SMM values (triangles); 200 scans were collected. Note that the lowest value of the graphed absorbance is not zero.

Figure 14. Ordered submultiple amplitudes at three Raman wavelengths, 1603 cm−1, 1333 cm−1, and 1437 cm−1. These represent a sharp peak, a baseline region, and a broad peak, respectively. The inset shows plots of the 1000-point submultiple time series for two representative wavelengths.

Raman Spectra. Raman Submultiple Time-Series Data. Figure 14 shows the ordered submultiple data from three representative wavelengths of the Raman spectrum of CCl4, along with two representative time series. The two notable characteristics of the ordered amplitude curves are, first, their bumpiness, especially near the extremes, and second, the differences in the curves at the endmost points. With the relatively long 50 ms submultiple period, it is perhaps not surprising that the autocorrelation of the 1000 submultiple data points has properties approximating white noise. This is illustrated in Figure S1 in the Supporting Information. A white-noise autocorrelation has a necessary peak at zero lag (since an observation is 100% correlated with itself), with the remaining correlations being low and fixed. Fourier

DOI: 10.1021/acs.analchem.7b04940 Anal. Chem. XXXX, XXX, XXX−XXX

Article

Analytical Chemistry

Figure 15. Raman spectra of the medians of the submultiple data in two different regions of the Raman spectrum. The red lines connect the points of the medians, and the blue points show the values of the averages with both from the same data set. The left plot shows the average and median spectra of 100 submultiples in its wavelength range, and the right figure shows a similar graph for both at a different part of the spectrum for the full 1000 submultiples.

indicate that the highly variable extreme values would be rejected by the median calculation. However, unlike Raman and CD measurements, the large and ubiquitous background absorbance of an IR spectrum, the single-beam instrument requires two spectra to be acquired at different times. However, at these different times, the instrumental response differs in terms of both stochastic noise and fast drift, which makes the promise of SMM analysis difficult to realize in this type of relative measurement. Collecting submultiple data and carrying out the SMM data treatment is expected to benefit other spectrometries that do not register single-photon or single-event data. These benefits include the ability to track down sources of instrument instability as well as producing spectra that more accurately reflect the properties of the materials being measured as well as having better S/N. Collecting submultiple data and applying SMM data treatment provides a new window to observe the properties of spectrometric instruments and to separate fast drift from Gaussian noise and quantitate both as well as giving context to explain the limited improvement in S/N with prolonged data collection times.

transforms of a number of Raman autocorrelation functions appear with numerous peaks. (The Fourier transform of the autocorrelation is shown in Figure S2.) But the peaks that appear for such power functions do not appear with particular amplitudes at particular positions. Together, these show that under the experimental conditions the instrument does not have correlated fluctuations. Comparison between the Raman Spectra of Averages and of Medians. As can be seen from the Raman spectra in Figure 15, where the spectrum of the medians is plotted as a line and the spectrum of the averages is shown as points, the median does seem to be doing its job of reducing the spectral fluctuations by, in effect, rejecting the outliers. The points of the averages tend to have greater positive and more negative values than do the equivalent median points. Both spectra were calculated from the same stored data set, and the spectral medians appear to have a modestly better S/N ratio then that of the averaged submultiple data.



CONCLUSIONS The results described for infrared, CD, and Raman spectrometries found from collecting, storing, and exploring submultiple data show the promise of utilizing such data to benchmark and improve the performance of instrument hardware and software. The representative horn diagrams indicate the ubiquitous contributions that the momentum of the instrument’s components and environment make to fluctuations of the output data. This fast drift can be rejected at least partially by the SMM data treatment reported here. The Raman spectra show us that even with data that appear to exhibit typical white noise, that the spectral S/N can be improved with the use of SMM data treatment. The CD results indicate that the instrument used in this study has a significant harmonic component to its fluctuations nevertheless has only minor differences in the short wavelength range between an average and a median treatment. However, the median of the absorption spectrum, collected on the same instrument, appears to show the ability of SMM to locate the presence of bias in the averaged data. The infrared interferograms have been shown to have jitter in their measurement, which contributes to line broadening in the average. In both the CD and infrared data, we show that the rate of the submultiple collection does not compromise the benefits of SMM analysis. In addition, like Raman and CD, the ordered sets of infrared submultiples
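The SMM treatment summarized above comes down to taking, at each spectral point, the median rather than the mean of the stored submultiple values. The following minimal NumPy sketch, which is not the authors' code, compares the two estimates on synthetic data and also produces the ordered (sorted) submultiple curves. The array layout (rows are submultiple periods, columns are spectral points), the function names, and the data (a single band with Gaussian noise, plus 50 of 1000 submultiple spectra shifted upward as a hypothetical stand-in for one-sided fast drift) are all assumptions made for illustration.

```python
import numpy as np


def average_and_median_spectra(submultiples):
    """Average and median spectra from stored submultiple data.

    submultiples: 2-D array, one row per submultiple period and one column
    per spectral point (wavelength or wavenumber).
    """
    data = np.asarray(submultiples, dtype=float)
    return data.mean(axis=0), np.median(data, axis=0)


def ordered_submultiples(submultiples):
    """Sort each column, giving the ordered amplitude curve at every spectral point."""
    return np.sort(np.asarray(submultiples, dtype=float), axis=0)


# Hypothetical data: a single band sampled at 50 points, 1000 submultiples
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 50)
true_spectrum = np.exp(-(x / 0.3) ** 2)
data = true_spectrum + rng.normal(0.0, 0.05, size=(1000, x.size))  # stochastic noise
# Unbalanced excursions: 50 whole submultiple spectra shifted upward by 1.0,
# a stand-in for one-sided fast drift (no excursions of the opposite sign)
spiked = rng.choice(1000, size=50, replace=False)
data[spiked] += 1.0

avg, med = average_and_median_spectra(data)
ordered = ordered_submultiples(data)

print(np.round((avg - true_spectrum).mean(), 2))   # 0.05: the average is biased upward
print(np.abs(med - true_spectrum).max() < 0.02)    # True: the median largely rejects the excursions
```

Because the excursions all lie on one side, the average absorbs their full contribution (50/1000 × 1.0 = 0.05 at every point), while the median moves only slightly; the top rows of `ordered` hold the excursions, mirroring the unbalanced extremes seen in the ordered curves. If the fluctuations were balanced about the true value, the two estimates would agree closely.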



ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.analchem.7b04940. Figures of a representative Raman autocorrelation and the Fourier transform of that autocorrelation (PDF).



AUTHOR INFORMATION

ORCID

Curtis W. Meuse: 0000-0002-0847-6594

Notes

The authors declare no competing financial interest.

Note: Certain commercial equipment, instruments, chemicals, and software are identified in this paper to specify the experimental procedures adequately. Such identification is not intended to imply recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that the materials or equipment identified are necessarily the best available for the purpose.



ACKNOWLEDGMENTS

Thanks to Professor Faye Rubinson, Georgetown University, for acquisition of the Raman spectra. Support is acknowledged from the National Science Foundation MRI Program (NSF CHE-1429079) for purchase of the Raman instrument. We also wish to thank Dave Krile, Lee Richter, Veronika Szalai, and Jeff Hudgens for their critical reading of the typescript. We are grateful to Sergey Shilov of Bruker Optics and Lindsey Graham of Applied Photophysics for discussions about the inner workings of their respective instruments.



