Anal. Chem. 1996, 68, 2913-2915
Appearance of Discontinuities in Spectra Transformed by the Piecewise Direct Instrument Standardization Procedure
Paul J. Gemperline* and JungHwan Cho
Department of Chemistry, East Carolina University, Greenville, North Carolina 27858
Paul K. Aldridge and S. Sonja Sekulic
Pfizer Central Research, Eastern Point Road, Groton, Connecticut 06340
Several years ago, we noted that spectra transformed by the piecewise direct standardization (PDS) method may contain discontinuities. Having noticed that the problem was a recurring one, we studied it and recently diagnosed its source. Our investigations suggest that the same problem also occurs in applications of window factor analysis, evolving factor analysis, and any other procedure that uses piecewise principal component models. In this work, we report the source of the problem and illustrate it with one example. We present a procedure that eliminates the problem and is effective in PDS pattern recognition applications. Further work is needed to develop modified algorithms suitable for calibration applications.
The piecewise direct standardization (PDS) method for transferring multivariate calibration models between different instruments has been described in a series of recent publications since its first introduction in this Journal (refs 1-4). The method can compensate for mismatches between spectroscopic instruments caused by small differences in optical alignment, gratings, light sources, detectors, and so on. It works well in many near-infrared assays in which principal component regression (PCR) or partial least-squares (PLS) calibration models with a small number of factors are used. We recently attempted to use the PDS method to transfer pattern recognition libraries, where the data analysis method employed is an analysis of residual variance (SIMCA) (refs 5-7). Inspection of the residual spectra revealed discontinuities caused by the PDS transfer. Subsequent investigations showed that these discontinuities can appear in any PDS application. Optimizing the PDS window size and number of factors is not sufficient to guarantee that the discontinuities will be eliminated. The discontinuities are data specific; for example, different training sets produce discontinuities in different spectral regions, and it is impossible to predict where they will appear in a spectrum.
(1) Wang, Y.; Veltkamp, D. J.; Kowalski, B. R. Anal. Chem. 1991, 63, 2750.
(2) Wang, Y.; Kowalski, B. R. Appl. Spectrosc. 1992, 46, 764.
(3) Wang, Y.; Lysaght, J. M.; Kowalski, B. R. Anal. Chem. 1992, 64, 562.
(4) Wang, Z.; Dean, T.; Kowalski, B. R. Anal. Chem. 1995, 67, 2379.
(5) Wold, S.; Sjöström, M. In Chemometrics: Theory and Application; Kowalski, B. R., Ed.; American Chemical Society: Washington, DC, 1977; pp 242-282.
(6) Gemperline, P. J.; Webber, L. D.; Cox, F. O. Anal. Chem. 1989, 61, 138.
(7) Shah, N. K.; Gemperline, P. J. Anal. Chem. 1990, 62, 465.
© 1996 American Chemical Society
In the PDS algorithm, a subset of calibration transfer samples is measured on a master instrument and on a slave instrument, producing spectral response matrices $\mathbf{R}_1$ and $\mathbf{R}_2$ (one row per sample, one column per wavelength). A transformation matrix $\mathbf{F}$ is used to map spectra measured on the slave instrument so that they match the spectra measured on the master instrument (ref 1):
$$\mathbf{R}_1 = \mathbf{R}_2\,\mathbf{F} \qquad (1)$$
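At each wavelength $i$, a least-squares regression vector $\mathbf{b}_i$ is computed from a narrow window of wavelengths $j$ that brackets the point of interest:
$$\mathbf{r}_{1,i} = \mathbf{R}_{2,j}\,\mathbf{b}_i \qquad (2)$$
where $\mathbf{r}_{1,i}$ is column $i$ of $\mathbf{R}_1$ and $\mathbf{R}_{2,j}$ contains the columns of $\mathbf{R}_2$ that fall in window $j$.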
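Equation 2 is solved separately at every wavelength. As a minimal sketch only, assuming NumPy, PCR computed from the singular value decomposition, no mean-centering inside the window, and windows clipped at the ends of the spectrum (none of which the note specifies), the reduced-rank solution for one wavelength might look like this. `window_columns` and `pcr_vector` are hypothetical helper names, and the data are random stand-ins.

```python
import numpy as np


def window_columns(i, p, half_width):
    """Column indices j of the window bracketing wavelength index i.

    The window is clipped at the ends of the spectrum (an assumption;
    edge handling is not described in the note).
    """
    return np.arange(max(0, i - half_width), min(p, i + half_width + 1))


def pcr_vector(X, y, n_factors):
    """Reduced-rank (PCR) least-squares solution of y ~ X b (eq 2).

    X : (n_samples, window) slave responses, R2[:, j]
    y : (n_samples,) master responses at wavelength i, R1[:, i]
    Returns b and the non-zero eigenvalues of X^T X (squared singular
    values, largest first) so near-ties between the last retained and
    the first discarded factor can be inspected.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(n_factors, len(s))
    # b = V_k diag(1/s_k) U_k^T y; assumes the k retained s values are non-zero
    b = Vt[:k].T @ ((U[:, :k].T @ y) / s[:k])
    return b, s**2


# Usage with random stand-in data: 5 transfer samples, 100 wavelengths
rng = np.random.default_rng(0)
R1 = rng.normal(size=(5, 100))
R2 = R1 + 0.01 * rng.normal(size=(5, 100))
j = window_columns(50, R2.shape[1], half_width=15)   # 31-point window
b_50, eigvals = pcr_vector(R2[:, j], R1[:, 50], n_factors=3)
# A ratio close to 1 means factors 3 and 4 are nearly tied in this window
tie_ratio = eigvals[3] / eigvals[2]
```

Returning the eigenvalues alongside $\mathbf{b}_i$ makes it easy to see when the last retained and first discarded factors are nearly tied, which is the situation in which the swapping discussed in this note can occur.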
These regression vectors are then assembled to form a banded diagonal transformation matrix $\mathbf{F}$, where $p$ is the number of response values to be converted:
$$\mathbf{F} = \operatorname{diag}(\mathbf{b}_1^{\mathsf{T}}, \mathbf{b}_2^{\mathsf{T}}, \ldots, \mathbf{b}_i^{\mathsf{T}}, \ldots, \mathbf{b}_p^{\mathsf{T}}) \qquad (3)$$
Either PLS or PCR may be used to compute each $\mathbf{b}_i$ at less than full rank by discarding principal components with very small eigenvalues.
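Assembling eq 3 amounts to writing each $\mathbf{b}_i$ into rows $j$ of column $i$ of $\mathbf{F}$, so that eq 1 holds column by column. The sketch below, again assuming NumPy and un-centered data, builds $\mathbf{F}$ either with a separate PCR model for every window or, for comparison, with one global principal component model of the slave spectra whose loadings are windowed. The global variant is only one plausible reading of the modification proposed later in this note; the authors' exact computation is not given. `pds_transform` and its `basis` parameter are hypothetical.

```python
import numpy as np


def pds_transform(R1, R2, half_width=15, n_factors=3, basis="piecewise"):
    """Banded PDS transformation matrix F with R1 ~ R2 @ F (eqs 1-3).

    R1, R2 : (n_samples, p) master and slave transfer spectra.
    basis="piecewise": a separate PCR model (SVD) for every window, as in
        eqs 2 and 3; the retained factors may differ between windows.
    basis="global": one SVD of the full slave spectra R2; every window
        reuses the matching rows of the same n_factors global loadings
        (a hypothetical reading of the modification, not the authors' code).
    """
    p = R2.shape[1]
    F = np.zeros((p, p))
    if basis == "global":
        _, _, Vt_global = np.linalg.svd(R2, full_matrices=False)
        V_global = Vt_global[:n_factors].T          # (p, k) global loadings
    elif basis != "piecewise":
        raise ValueError(f"unknown basis: {basis!r}")
    for i in range(p):
        # window j of slave wavelengths bracketing wavelength i (clipped at the ends)
        j = np.arange(max(0, i - half_width), min(p, i + half_width + 1))
        X, y = R2[:, j], R1[:, i]
        if basis == "piecewise":
            U, s, Vt = np.linalg.svd(X, full_matrices=False)
            k = min(n_factors, len(s))
            b = Vt[:k].T @ ((U[:, :k].T @ y) / s[:k])
        else:
            Vj = V_global[j]                        # window rows of the global loadings
            T = X @ Vj                              # (n, k) scores in that fixed basis
            c, *_ = np.linalg.lstsq(T, y, rcond=None)
            b = Vj @ c
        F[j, i] = b                                 # b_i fills rows j of column i
    return F


# Usage with stand-in data: 5 transfer spectra, 80 wavelengths
rng = np.random.default_rng(1)
R1 = rng.normal(size=(5, 80)).cumsum(axis=1)       # smooth-ish master spectra
R2 = 1.02 * R1 + 0.05                               # slave: gain and offset change
F_piecewise = pds_transform(R1, R2, basis="piecewise")
F_global = pds_transform(R1, R2, basis="global")
R1_hat = R2 @ F_global                              # slave spectra mapped to the master
```

With the global basis, adjacent windows share the same loading vectors shifted by one row, so the set of retained factors cannot change between $\mathbf{b}_{i-1}$ and $\mathbf{b}_i$; with the piecewise basis, each window ranks its own factors by eigenvalue.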
Discontinuities arise when “swapping” (described below) occurs between discarded and retained eigenvalues in the solutions $\mathbf{b}_{i-1}$ and $\mathbf{b}_i$ at adjacent wavelengths.
ILLUSTRATION OF DISCONTINUITIES
The following data set is used to illustrate how discontinuities are introduced by eigenvalue swapping. Near-IR reflectance spectra of five lots of a raw drug substance were acquired on two different NIRSystems Model 6500 spectrophotometers using a standard powder sample cup. The spectra were acquired over the range 1100-2498 nm and digitized at 2 nm intervals. A three-factor PDS transfer matrix was calculated using 31-point windows and eigenvectors from principal component analysis. The transformation matrix $\mathbf{F}$ was then used to map the five calibration transfer spectra from the slave instrument to the master instrument. The spectra of 20 additional lots not used to compute the transformation matrix were also mapped from the slave instrument to the master instrument. The five transformed calibration transfer spectra are plotted in Figure 1. A discontinuity typical of the problems we have experienced can be observed between 1232 and 1234 nm. Although not shown here, the 20 validation spectra show discontinuities of similar magnitude in the same location. Discontinuities appear in six other regions of this data set, although they are slightly smaller in magnitude than the one between 1232 and 1234 nm. Changing the window size and the number of factors causes discontinuities to appear elsewhere. For example, when two and four factors are used, discontinuities appear in three and seven different regions, respectively. A window size of 25 points with three factors gives discontinuities in four different regions.
Figure 1. PDS-transformed near-IR spectra of a raw drug substance. Five calibration spectra were used to compute a three-factor PDS transformation matrix using 31-point windows. Discontinuities appear between 1232 and 1234 nm. Smaller discontinuities also appear at 1448/1450 nm, 1460/1462 nm, 1542/1544 nm, 1546/1548 nm, 1728/1730 nm, and 2172/2174 nm.
EIGENVECTOR SWAPPING
Starting at 1200 nm, a series of eigenvectors and their corresponding eigenvalues were computed from sequential windows 31 points wide using the above calibration transfer data set. Patterns that characterize the spectral absorption bands can be observed in these eigenvectors (see Figure 2). Notice that sliding the 31-point window to the next sequential position and recomputing the principal component model tends to produce eigenvalues of similar magnitude and eigenvectors with similar characteristic patterns. Because eigenvalues 3 and 4 are close in magnitude at 1232 and 1234 nm, the small change in the window contents has caused their order to be swapped (see Table 1). The regression vector $\mathbf{b}_{1232}$, constructed from eigenvector patterns 1, 2, and 3, therefore has a different shape than $\mathbf{b}_{1234}$, which was constructed from eigenvector patterns 1, 2, and 4 (see Figure 3). It is this abrupt change in the shape of the regression vectors that causes discontinuities in the transformed spectra. Whenever piecewise solutions of less than full rank are used to compute regression vectors in this fashion, swapping of this type can occur. Note that the swapping may not adversely affect calibration performance in PCR and PLS near-IR assays. It becomes problematic, however, when residual spectra are examined. Because the SIMCA pattern recognition method relies on residual spectra, these discontinuities cause problems in our pattern recognition work when transformed libraries are used.
Table 1. Eigenvalues Computed by Constructing Piecewise Principal Component Models from 31-Point Windows(a)
(a) Five calibration transfer spectra were used. Swapping of eigenvector patterns 3 and 4 occurs between 1232 and 1234 nm.
Figure 2. (A) Eigenvector 2 and (B) eigenvector 3 computed by constructing sequential piecewise principal component models in 31-point windows. Five calibration transfer spectra were used (see text). Swapping of eigenvector patterns 3 and 4 occurs between 1232 and 1234 nm because of the swapping of eigenvalues shown in Table 1.
Figure 3. PDS regression vectors computed from piecewise principal component models with 31-point windows. Five calibration transfer spectra were used. Swapping of eigenvector patterns (see Figure 2) between 1232 and 1234 nm causes a large change in the shape of the resulting regression vector.
GLOBAL PRINCIPAL COMPONENT MODELS
A simple modification to the PDS procedure can eliminate these discontinuities. Instead of computing the principal components piecewise, a “global” principal component model is computed for all five calibration transfer spectra (i.e., one principal component model is computed for the full wavelength range of these calibration transfer spectra). As before, the three eigenvectors having the largest eigenvalues are retained. Piecewise regression vectors are then computed from the appropriate windows of the global eigenvectors; hence, there is no opportunity for swapping to occur. For comparison, the transformed spectra shown in Figure 1 were recomputed using our modified PDS procedure; they are shown in Figure 4. Inspection of these plots reveals a smooth, continuous transition over the range 1230-1270 nm instead of the abrupt changes that can be observed in Figure 1. Although not shown here, discontinuities are also absent from the 20 transformed validation spectra when our modified PDS procedure is used. Computing residual spectra of these 20 samples with a mean-centered, three-factor PC model revealed a slight increase in the magnitude of the residuals compared with the residuals for the untransformed spectra.
Figure 4. Modified PDS-transformed near-IR spectra of a raw drug substance. Use of a global principal component model to compute the PDS transformation matrix eliminated the discontinuities between 1232 and 1234 nm.
In conclusion, we have reported a simple modification of the PDS algorithm that eliminates small discontinuities caused by eigenvalue swapping in the piecewise computation of regression vectors. Spectra transferred with the modified routine do not exhibit the discontinuities and may give more reliable pattern recognition performance. The original PDS algorithm can produce a transformation matrix $\mathbf{F}$ of very high rank, whereas the modified algorithm reported here cannot. The modification may therefore not be suitable for calibration applications that require an $\mathbf{F}$ matrix of high rank. We are currently working on other strategies for removing discontinuities that will benefit such calibration applications by computing an $\mathbf{F}$ matrix of high rank.
Received for review April 25, 1996. Accepted May 23, 1996.
AC9604191
Abstract published in Advance ACS Abstracts, July 1, 1996.
Analytical Chemistry, Vol. 68, No. 17, September 1, 1996