Anal. Chem. 1991, 63, 357-360
Noise Reduction of Gas Chromatography/Mass Spectrometry Data Using Principal Component Analysis
Terrence A. Lee, Lisa M. Headley, and James K. Hardy*
Department of Chemistry, The University of Akron, Akron, Ohio 44325
Principal component analysis has been evaluated as a digital filter to improve the overall quality of gas chromatography/mass spectrometry (GC/MS) data sets. Data are initially read into a matrix, scaled, and then processed by using the NIPALS algorithm, which is used to separate signal from the matrix. A new matrix is then reconstructed as the difference between the original and residual matrices; it is then rescaled and a new data file is created. By use of a six-component solvent mixture with samples containing from 0.5 to 150 pg of each component, significant improvements in mass spectral quality and spectral matches were observed. Signal to noise was improved by a factor of 2 to 100 due to improved integration. Linearity and precision of chromatographic data were also improved.
INTRODUCTION

Mass spectrometers are relatively noisy in comparison to other gas chromatographic detectors. While to a high degree this is due to the inherent noise associated with the ion multiplier, quality can also be degraded as a result of changes in chromatographic conditions, such as carrier flow rate and column bleed, during an analysis. A number of investigators have reported filtering systems to improve the signal to noise ratio (S/N) (1-10). Hieftje summarized several instrumental methods for enhancing S/N (1,2), and Doerfler and Campbell reported the use of an analog delay device for on-the-fly reduction of ion multiplier noise (3).

The ability to postprocess gas chromatography/mass spectrometry (GC/MS) data sets allows for the use of software noise filters to improve signal to noise, peak shape, and spectral quality. Summing of several measurements is commonly used by GC/MS software to provide an initial smoothing of data (4,5), though the actual approach taken will vary based on the analyzer type. The Hewlett-Packard approach for its mass selective detector (MSD) is to sum from 2 to 128 measurements at 0.1-amu intervals, whereas Finnigan's approach with its ion trap system is to add entire scans. Postprocessing for both systems is limited to either a moving average or Savitzky-Golay smoothing (6) of chromatographic data and simple averaging for mass spectra. The use of polynomial smoothing, measurement of noise variance, estimation of peak shape, and cross-correlation (7-10) has also been reported for enhancement of S/N for integrated signals.

Multivariate data analysis based on pattern recognition has been applied to a number of spectral and chromatographic methods. The typical goal of pattern recognition is to classify a new sample by comparing it to a reference set of predetermined ones. Derde and Massart reviewed a number of techniques and their application (11).
Thomas and Haaland compared several least-squares methods and principal component analysis (PCA) for use in quantitative spectral analysis (12). With PCA, a data matrix can be decomposed into linear combinations of orthogonal vectors by diagonalization of the covariance matrix (13). This, however, requires that all components be determined simultaneously. The NIPALS algorithm can be used to determine a single component accounting for the greatest amount of variance in a data set (14). The matrix is decomposed into a loading vector, a vector of scores (the principal component, or PC), and a residual matrix. Variables showing the highest degree of correlation are combined in the first PC, and their effect is subtracted from the data to leave the residual. Subsequent PCs, each determined with an additional call to the routine, account for other sources of variance in decreasing order.

Shah and Gemperline used PCA to classify near-infrared reflectance spectra (15), and its application to mass spectra of mixtures has also been reported (16). The approach has also been applied to chromatographic data to classify beverages, cheese, and soy sauce (17-20). Recently, PCA has been employed as a digital filtering method for reduction of artifacts in two-dimensional Fourier transform (2D FT) NMR data sets (21). Commonly, a plot of the first and second PC is used to determine whether or not underlying trends in the set exist. However, when applied to three-dimensional data sets such as 2D NMR or GC/MS data, the method can be used to separate noise and artifacts from signal.
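The decomposition described above can be illustrated with a minimal sketch. It is not the authors' program: the function names `nipals_component` and `pca_filter`, the starting vector, the convergence test, the choice of column autoscaling as the scaling step, and the synthetic two-peak data set are all assumptions made for illustration. Each call to the routine extracts one component and returns the residual; the filtered matrix is then taken as the scaled data minus the final residual.

```python
import numpy as np


def nipals_component(X, max_iter=500, tol=1e-10):
    """Extract one principal component from X by NIPALS iteration.

    Returns (scores t, loadings p, residual E) with X = outer(t, p) + E.
    """
    # Assumed starting guess: the column with the largest variance.
    t = X[:, np.argmax(X.var(axis=0))].copy()
    for _ in range(max_iter):
        p = X.T @ t / (t @ t)      # regress the columns of X on the scores
        p /= np.linalg.norm(p)     # normalise the loading vector
        t_new = X @ p              # project the rows of X on the loadings
        converged = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
        t = t_new
        if converged:
            break
    return t, p, X - np.outer(t, p)


def pca_filter(data, n_components):
    """Rebuild `data` from its first `n_components` NIPALS components."""
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    std[std == 0] = 1.0            # leave constant columns unscaled
    scaled = (data - mean) / std   # assumed scaling: column autoscaling
    residual = scaled.copy()
    for _ in range(n_components):  # one call to the routine per component
        _, _, residual = nipals_component(residual)
    signal = scaled - residual     # original minus residual matrix
    return signal * std + mean     # undo the scaling


# Synthetic stand-in for a GC/MS run: 200 scans x 121 mass channels,
# two Gaussian elution profiles with random spectra plus white noise.
rng = np.random.default_rng(0)
scans = np.arange(200)
profiles = np.stack(
    [np.exp(-0.5 * ((scans - c) / 5.0) ** 2) for c in (60, 130)], axis=1
)
spectra = rng.random((2, 121))
clean = 100.0 * profiles @ spectra
noisy = clean + rng.normal(scale=2.0, size=clean.shape)

filtered = pca_filter(noisy, n_components=2)
rms_before = np.sqrt(np.mean((noisy - clean) ** 2))
rms_after = np.sqrt(np.mean((filtered - clean) ** 2))
print(f"RMS error before: {rms_before:.3f}, after: {rms_after:.3f}")
```

The number of components to retain is a free parameter in this sketch. Here it is set to two because the synthetic data contain two compounds; for real data it would have to be chosen by some other criterion.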
EXPERIMENTAL SECTION

Instrumentation. A Hewlett-Packard capillary column gas chromatograph (Model 5890) interfaced by direct capillary inlet to a Hewlett-Packard MSD (Model 5970) was used for the production of all GC/MS data sets. Samples were introduced by using a Hewlett-Packard autoinjector (Model 7673A). The chromatographic conditions were sample size, 1 µL; sample split, 100:1; column, 25 m × 0.2 i.d. HP-1; and temperature program, initial temperature 30 °C held for 3.2 min, then ramped at 40 °C/min to 140 °C. The temperatures of the injection port and transfer line were set to 200 and 250 °C, respectively. The MSD was operated in scan mode averaging 4 samples per scan, the HP default, with a mass range of 30-150 amu. Instrument control and data collection were accomplished by using a Hewlett-Packard Unix ChemStation (Model 59940) running version A.01.03 of the MSD software. After each analysis, data files were transferred to a Sun SPARCstation 1 workstation for postprocessing.

Samples. To evaluate the method, a mixture of low molecular weight solvents was used. This was done to assure that a significant portion of each substance's mass spectrum would occur in the low mass region,