Anal. Chem. 1996, 68, 997-1002
Near-IR Detection of Polymorphism and Process-Related Substances
P. K. Aldridge, C. L. Evans, H. W. Ward, II, and S. T. Colgan
Analytical Research and Development, Pfizer Central Research, Eastern Point Road, Groton, Connecticut 06340
N. Boyer and P. J. Gemperline*
Department of Chemistry, East Carolina University, Greenville, North Carolina 27858
This paper reports a fast, sensitive pattern recognition method for determining the polymorphic quality of a solid drug substance, polymorph A. The pattern recognition method employed can discriminate between the desired polymorphic form of the drug substance and another undesired polymorph. In addition, it can reliably detect samples containing minor levels of the undesired polymorph. The method can also discriminate between the desired polymorph and other crystalline forms. Most significantly, this sensitive method has been successfully transferred to six other near-IR instruments without resorting to sophisticated multivariate calibration transfer strategies.

Near-infrared spectroscopy has found widespread use in agricultural, food science, and chemical processing applications.1-3 Most of these applications employ multivariate calibration to quantitatively determine the composition of samples or a physical property that is dependent on a sample’s chemical composition. In recent years, near-IR spectroscopy has found increasing use in pharmaceutical applications.4,5 Many pharmaceutical applications employ pattern recognition analysis to train a computer to recognize spectra of acceptable samples and reject unacceptable samples by multivariate statistical analyses. This approach has been used to identify excipients and raw materials,6-8 intact tablets and blister packed tablets,9-11 and polymorphic forms.12

(1) Osborne, O. B.; Fearn, T.; Hindle, P. H. Practical Near Infrared Spectroscopy with Applications in Food and Beverage Analysis; Longman Scientific & Technical: Essex, England, 1993.
(2) Near-infrared Technology in the Agricultural and Food Industries; Williams, P., Norris, K., Eds.; American Association of Cereal Chemists: St. Paul, MN, 1990.
(3) Handbook of Near-Infrared Analysis; Burns, D. A., Ciurczak, E. W., Eds.; Marcel Dekker: New York, 1992.
(4) Lodder, R. A.; Hieftje, G. M. Appl. Spectrosc. 1988, 42, 556-558.
(5) Rostaing, B.; Delaquis, P.; Guy, D.; Roché, Y. S.T.P. Pharma 1988, 4, 509-515.
(6) Gemperline, P. J.; Webber, L. D.; Cox, F. O. Anal. Chem. 1989, 61, 138-144.
(7) Shah, N. K.; Gemperline, P. J. Anal. Chem. 1990, 62, 465-470.
(8) Corti, P.; Dreassi, E.; Savini, L.; Petriconi, S.; Genga, R.; Montecch, L.; Lonardi, S. Process Control Qual. 1992, 2, 131-142.
(9) Dempster, M. A.; Jones, J. A.; Last, I. R.; MacDonald, B. F.; Prebble, K. A. J. Pharm. Biomed. Anal. 1993, 11, 1087-1092.
(10) Aldridge, P. K.; Mushinsky, R. F.; Andino, M. M.; Evans, C. L. Appl. Spectrosc. 1994, 48, 1272-1276.
(11) Dempster, M. A.; Meagher, N. E.; MacDonald, B. F.; Gemperline, P. J.; Boyer, N. R. Anal. Chim. Acta, in press.
(12) Ciurczak, E. W. Appl. Spectrosc. Rev. 1987, 23, 147-163.

© 1996 American Chemical Society
Polymorphism exists whenever an element or molecule has the ability to crystallize in more than one distinct crystalline species. Studies have shown that polymorphism can be quite common in drug molecules, with as many as 67% of steroids exhibiting multiple forms.13 As these multiple forms are differing crystalline entities, their physical properties can also vary widely. Solubility, melting point, chemical reactivity, and even bioavailability can vary with polymorphic form.14 Because polymorphs are differing arrangements of molecules, molecular phenomena that give rise to IR spectra may also be affected. For example, hydrogen bonding and polarization may be very different due to different arrangements of functional groups. Numerous authors have shown IR spectroscopy to be useful in the study of polymorphism,15 with some studies showing utility in quantitation.16 Most recently, near-IR spectroscopy has been shown to be a powerful tool in the study of polymorphism.12 Due to the elimination of sample preparation difficulties, near-IR has the additional advantage of reducing the likelihood that sample handling will induce polymorphic conversion.17 In addition, near-IR may prove useful for investigating polymorphs where mid-IR spectra appear identical.18

In this paper, a pattern recognition method is reported for the identification and determination of the quality of a drug substance, which will be referred to as polymorph A throughout this work (see Figure 1). Additional details can be found in ref 19. One of the main goals of this project was to develop a rapid, inexpensive test that could easily discriminate between the two polymorphic forms of the drug substance, only one of which is desired. It was also necessary to discriminate between three crystalline forms of the drug substance. To determine the quality of the drug, it was important that the method be able to detect minor levels of the undesired polymorphic form.
Two computational methods were examined in this work: the Mahalanobis distance method and the SIMCA residual variance method.6,7,25 In this paper, we will show that of these two, the Mahalanobis distance pattern recognition method met all of the above requirements. Compared to some of the pattern recognition methods reported in the literature, the methods reported here may be easier to implement because they employ parametric statistical tests and probability thresholds. A decision threshold for hypothesis testing is automatically computed to account for changes in the number of training samples or input variables so that the user does not need to select and adjust arbitrary thresholds during development of pattern recognition methods.25

One significant impediment to widespread adoption of near-IR methods has been the problem of transferring a “calibration model” or “training sets” from one instrument to another instrument or multiple instruments. Slight differences in response (spectral bandwidth, wavelength calibration, etc.) between instruments or long-term drift within an instrument can cause significant calibration errors and significant pattern recognition errors, especially when full-spectrum multivariate models are employed. While full-spectrum multivariate models offer advantages, such as the ability to handle complex mixtures with better sensitivity, these advantages come at a price. The full-spectrum multivariate models typically exhibit a greater dependence on the instrument’s measurement reproducibility compared to methods that employ signals taken at a few wavelengths. Recently, multivariate instrument standardization methods have been developed to solve the calibration transfer problem.20-24 These methods employ a small subset of calibration transfer “standards”, which are used to map the response of one or more “slave instruments” to a single “master instrument”. In this paper, data are presented that demonstrate that the Mahalanobis distance pattern recognition method for polymorph A is not sensitive to variation in response between instruments. In this case, multivariate instrument standardization methods were not needed.

Figure 1. Structure of drug substance. Polymorphs A and B have identical structures but different unit cells.

(13) Kuhnert-Brandstätter, M. Pure Appl. Chem. 1965, 10, 133-144.
(14) Haleblian, J.; McCrone, W. J. Pharm. Sci. 1969, 58, 911-929.
(15) Mesley, R. J. Spectrochim. Acta 1966, 22, 889-917.
(16) Kendall, D. N. Anal. Chem. 1953, 25, 382-389.
(17) Borka, L.; Backe-Hansen, K. Acta Pharm. 1968, 5, 271-278.
(18) Herzberg, G. Infrared and Raman Spectra of Polyatomic Materials; Van Nostrand: New York, 1945; p 262.
(19) Kadin, S. B. U.S. Patent 4,556,672, 1985.

Analytical Chemistry, Vol. 68, No. 6, March 15, 1996 997

PATTERN RECOGNITION METHOD DEVELOPMENT
The basic procedure for developing a pattern recognition method consists of a training stage and a testing stage.
During the training process, one must develop a training set of acceptable samples. Choosing an appropriate training set is the single most important step, because a pattern recognition method can only recognize variability to which it is accustomed. The training set must include examples of all expected sources of spectral variability (lot-to-lot, instrument drift, etc.). After a representative training set is obtained, the computer is trained according to whichever pattern recognition technique is to be used.

(20) Wang, Y.; Veltkamp, D. J.; Kowalski, B. R. Anal. Chem. 1991, 63, 2750-2756.
(21) Wang, Y.; Lysaght, M. J.; Kowalski, B. R. Anal. Chem. 1992, 64, 562-564.
(22) Wang, Y.; Kowalski, B. R. Appl. Spectrosc. 1992, 46, 764-771.
(23) Shenk, J. S.; Westerhaus, M. O. Crop Sci. 1991, 31, 1694-1696.
(24) Bouveresse, E.; Massart, D. L.; Dardenne, P. Anal. Chim. Acta 1994, 297, 405-416.
(25) Gemperline, P. J.; Boyer, N. R. Anal. Chem. 1995, 67, 160-166.
During the testing phase, identification error rates, type I error rates, and type II error rates are assessed (vide infra). An identification error occurs when a sample is identified as a member of the wrong class. In our methods, this occurs when the sample’s probability density for the false set is larger than that for the true set. In addition to identification, a pattern recognition method can also be used to assess the quality of a sample (i.e., perform qualitative analysis) by classifying a sample as acceptable or unacceptable. This is accomplished in the pattern recognition methods described here by comparing a sample’s probability density to a threshold. For example, we often define acceptable samples as those falling inside the 99% confidence interval for the respective test statistic.

A type I error occurs when an acceptable sample is classified as unacceptable during qualitative analysis. A pattern recognition method’s type I error rate is relatively easy to assess. Acceptable samples from lots not used in the training set are tested, and the number rejected is used to estimate the type I error rate. A type II error occurs when an unacceptable sample is classified as acceptable during qualitative analysis. Testing a method’s ability to reject unacceptable samples is difficult because unacceptable samples are often not readily available (too rarely produced, too expensive to produce, etc.). One alternative is to artificially prepare unacceptable samples by adulterating them with process intermediates, degradation products, or undesirable crystalline forms. By adulterating samples in the lab, one has the advantage of knowing exactly how much adulterant will cause a sample to be rejected. On the other hand, adulterated samples may not be representative of unacceptable samples produced by the commercial production process.
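The bookkeeping behind the two error rates can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `error_rates`, the `ErrorRates` container, and the probability levels in the example are hypothetical and are not taken from the study; the 0.01 threshold corresponds to the 99% confidence level mentioned above.

```python
from dataclasses import dataclass


@dataclass
class ErrorRates:
    type_i: float   # fraction of truly acceptable samples that were rejected
    type_ii: float  # fraction of truly unacceptable samples that were accepted


def error_rates(prob_levels, truly_acceptable, threshold=0.01):
    """Estimate type I and type II error rates at a probability threshold.

    A sample is classified as acceptable when its probability level is at
    least `threshold` (0.01 corresponds to the 99% confidence level).
    """
    if len(prob_levels) != len(truly_acceptable):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(prob_levels, truly_acceptable))
    n_good = sum(1 for _, ok in pairs if ok)
    n_bad = len(pairs) - n_good
    false_rejects = sum(1 for p, ok in pairs if ok and p < threshold)
    false_accepts = sum(1 for p, ok in pairs if not ok and p >= threshold)
    return ErrorRates(
        type_i=false_rejects / n_good if n_good else float("nan"),
        type_ii=false_accepts / n_bad if n_bad else float("nan"),
    )


# Hypothetical probability levels, invented purely for illustration.
probs = [0.62, 0.005, 0.31, 0.0004, 0.048, 1e-9]
labels = [True, True, True, False, False, False]
rates = error_rates(probs, labels)
```

In this made-up example one of three acceptable samples is rejected and one of three unacceptable samples is accepted, so both estimated rates are 1/3. With real data the estimates are only as good as the test lots are representative, which is the caveat raised above for lab-adulterated samples.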
PATTERN RECOGNITION METHODS
The Mahalanobis distance and SIMCA residual variance models utilized in this paper use principal component analysis (PCA). To conduct either test, a principal component model is first computed for a representative training set using the well-known singular value decomposition. Principal components associated with small singular values are deleted so that a subset of k components remains.

$$U_k S_k V_k^{T} = X \tag{1}$$

Here the matrix $X$ represents the training set (the n rows of $X$ are spectra), $U$ represents the normalized principal component scores (i.e., eigenvectors of $XX^{T}$), $S$ is the matrix of singular values, and the rows of $V^{T}$ are eigenvectors of $X^{T}X$.

Mahalanobis Distances. For the Mahalanobis distance method, we compute the unknown sample’s normalized principal component scores according to eq 2, and its squared distance, $D^2$, from the mean, $\bar{u}$, according to eq 3,

$$u_i = x_i V_k S_k^{-1} \tag{2}$$

$$D^2 = (u_i - \bar{u})^{T}\, \Sigma^{-1} (u_i - \bar{u}) \tag{3}$$

where the training set’s variance-covariance matrix, $\Sigma$, is estimated by eq 4.

$$\Sigma = \frac{1}{n}\,(U_k - \bar{u})^{T} (U_k - \bar{u}) \tag{4}$$
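Equations 1-4 can be illustrated with a short NumPy sketch. It is not the authors' MATLAB code: the function names are hypothetical, random numbers stand in for spectra, and the scores are computed as $x_i V_k S_k^{-1}$, which is the form that follows from the decomposition in eq 1. The matrix is decomposed as written in eq 1, so no mean-centering is applied before the SVD.

```python
import numpy as np


def train_pca(X, k):
    """Truncated SVD of the training matrix X (n spectra x m points), eq 1."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def scores(x, s_k, Vt_k):
    """Normalized principal component scores of one or more spectra, eq 2."""
    return np.atleast_2d(x) @ Vt_k.T / s_k


def mahalanobis_sq(u, U_k):
    """Squared Mahalanobis distance of scores u from the training mean."""
    n = U_k.shape[0]
    u_bar = U_k.mean(axis=0)
    centered = U_k - u_bar
    sigma = centered.T @ centered / n            # eq 4
    diff = np.atleast_2d(u) - u_bar
    # Row-wise quadratic form diff_i * inv(sigma) * diff_i^T, eq 3
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(sigma), diff)


# Synthetic stand-in for a training set: 30 "spectra" at 50 points.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(30, 50))
U_k, s_k, Vt_k = train_pca(X_train, k=8)
d2_train = mahalanobis_sq(scores(X_train, s_k, Vt_k), U_k)
```

Two consistency checks follow from the definitions: projecting the training spectra reproduces the rows of $U_k$, and with the 1/n estimator of eq 4 the squared distances of the training samples sum to n times k.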
Hotelling’s $T^2$-test26 can be used to calculate the confidence level for accepting the null hypothesis, ($u_i = \bar{u}$), against the alternate hypothesis, ($u_i \neq \bar{u}$), where we seek to determine if a specific observation in the training or test set, $u_i$, is a plausible value for the population mean $\bar{u}$:

$$p_r = 1 - P\left[\frac{(n-k)}{k(n-1)}\, D^2 > F_{k,\,n-k}(\alpha)\right] \tag{5}$$

where n is the number of training samples, k is the number of principal component scores used, D is the Mahalanobis distance, $\alpha$ is the probability level for critical values of F, and $p_r$ is the confidence level probability. Using this strategy, it is possible to determine the probability of class membership, $\alpha$, even when the number of factors, k, or the number of training samples, n, is adjusted. Complete details of the method may be found in refs 6, 7, and 25.

SIMCA Residual Variance Test. To conduct the SIMCA residual variance test, we compute the matrix of n residual spectra digitized at m points for the training set, $R$, and the residual class variance, $s_0^2$:
$$R = (X - \bar{x}) - U_k S_k V_k^{T} \tag{6}$$

$$s_0^2 = \frac{1}{m(n-k-1)} \sum_{i=1}^{n} \sum_{j=1}^{m} r_{ij}^2 \tag{7}$$
To perform the test on a single sample from the test set, we compute the sample’s residual spectrum according to eq 8, its residual variance according to eq 9, and its residual variance distance according to eq 10.
$$r_i = (x_i - \bar{x}) - u_{ik} S_k V_k^{T} \tag{8}$$

$$s_i^2 = \frac{1}{m} \sum_{j=1}^{m} r_{ij}^2 \tag{9}$$

$$D^2 = \frac{s_i^2}{s_0^2} \tag{10}$$
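The residual variance calculation of eqs 6-10 can be sketched as follows. The function names are hypothetical and random numbers stand in for spectra. The sketch assumes that the principal component model is computed on mean-centered training spectra, which is what eqs 6 and 8 imply, and it uses the fact that $u_{ik} S_k V_k^{T}$ equals the projection of the centered spectrum onto the k retained loadings.

```python
import numpy as np


def simca_train(X, k):
    """Fit the class model: mean spectrum, k loadings, residual class variance."""
    n, m = X.shape
    x_bar = X.mean(axis=0)
    Xc = X - x_bar
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Vt_k = Vt[:k, :]
    R = Xc - (U[:, :k] * s[:k]) @ Vt_k              # eq 6
    s0_sq = np.sum(R ** 2) / (m * (n - k - 1))      # eq 7
    return x_bar, Vt_k, s0_sq


def simca_distance_sq(x, x_bar, Vt_k, s0_sq):
    """Residual variance distance D^2 of one or more spectra."""
    xc = np.atleast_2d(x) - x_bar
    r = xc - (xc @ Vt_k.T) @ Vt_k                   # eq 8
    s_i_sq = np.mean(r ** 2, axis=1)                # eq 9
    return s_i_sq / s0_sq                           # eq 10


# Synthetic stand-in for a training set: 30 "spectra" at 50 points.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(30, 50))
x_bar, Vt_k, s0_sq = simca_train(X_train, k=4)
d2_train = simca_distance_sq(X_train, x_bar, Vt_k, s0_sq)
```

With eq 9 applied unchanged to the training samples, their distances sum to n - k - 1 by construction, which makes a convenient sanity check. The sketch does not apply the separate degrees-of-freedom adjustment that the text prescribes when training samples themselves are tested.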
The number of degrees of freedom in the denominator of eq 9 must be changed to (m-k/n-1/n) when training samples are tested. An F-test can be used to calculate the confidence level for accepting the null hypothesis, H0 ($s_i^2 = s_0^2$), against the alternate hypothesis, H1 ($s_i^2 \neq s_0^2$):

$$p_r = 1 - P\left[D^2 > F_{1,\,n-k-1}(\alpha)\right] \tag{11}$$

where n is the number of training samples, k is the number of principal component scores used, $\alpha$ is the probability level for critical values of F, and $p_r$ is the confidence level probability. As before, it is possible to determine the probability of class membership, $\alpha$, even when the number of factors, k, or the number of training samples, n, is adjusted. Complete details of the method may be found in refs 6, 7, 25, and 27.

(26) Johnson, R. A.; Wichern, D. W. Applied Multivariate Statistical Analysis, 3rd ed.; Prentice Hall: Englewood Cliffs, NJ, 1992; p 180.

Table 1. Composition of Adulterated Polymorph A Standards

| std no. | undesired polymorph (wt %), set I | undesired polymorph (wt %), set II |
|---|---|---|
| 1 | 7 | 3 |
| 2 | 9 | 5 |
| 3 | 11 | 7 |
| 4 | 13 | 9 |
| 5 | 15 | 12 |

EXPERIMENTAL SECTION
Reflectance spectra from 34 lots of polymorph A were collected over a period of 2 days on an NIRSystems 6500 spectrophotometer (NIRSystems, Silver Spring, MD) equipped with a standard sample cup. This instrument served as a reference instrument. All pattern recognition methods were trained using only spectra acquired from this instrument. In addition, spectra of four different crystalline forms, including the undesired polymorph B (five lots), solvate (three lots), free acid (13 lots), and hydrate (five lots), were collected on the reference instrument. Polymorphs A and B have exactly the same formula but different unit cells, as determined by X-ray crystallography. The scan range for all spectral data was 400-2500 nm at 2 nm intervals; however, only the region from 1100 to 2500 nm was used for multivariate analysis. Multivariate analysis was performed with programs written in the MATLAB programming language, v. 4.2c for Macintosh, Windows, and Unix (The Mathworks Inc., South Natick, MA). Computations were performed on an IBM RS/6000 workstation and Macintosh computers. The log ratios of all spectra were computed using a reference scan collected from a block of 99% reflective Spectralon (SRS-99-010, Labsphere Inc., North Sutton, NH). All pattern recognition calculations were performed using log(1/reflectance) data.

To test the transferability of the pattern recognition method, spectra of five acceptable lots of polymorph A not used for training were measured in random order on the reference instrument and six other NIRSystems 6500 spectrophotometers at different Pfizer sites in the United States and Europe. Acceptability was previously determined by conducting approval testing according to specifications for the drug substance. Polymorphic purity of these lots had previously been confirmed by X-ray powder diffraction.
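The spectral preprocessing described in this section (ratio to the Spectralon reference scan, conversion to log(1/reflectance), and restriction to 1100-2500 nm) might be sketched as below. The function names are hypothetical, the base-10 logarithm is an assumption based on common near-IR practice rather than something stated in the text, and the flat intensity values are invented for illustration.

```python
import math


def wavelength_axis(start_nm=400, stop_nm=2500, step_nm=2):
    """Wavelength grid of the scan: 400-2500 nm at 2 nm intervals."""
    return list(range(start_nm, stop_nm + 1, step_nm))


def log_inverse_reflectance(sample, reference):
    """log10(1/R) at each wavelength, with R = sample / reference.

    Base 10 is assumed here; the text only says log(1/reflectance).
    """
    return [math.log10(ref / smp) for smp, ref in zip(sample, reference)]


def select_region(wavelengths, values, low_nm=1100, high_nm=2500):
    """Keep only the points inside the region used for multivariate analysis."""
    return [v for w, v in zip(wavelengths, values) if low_nm <= w <= high_nm]


# Invented flat intensities: the sample reflects half as much as the reference.
wl = wavelength_axis()
sample_scan = [0.5] * len(wl)
reference_scan = [1.0] * len(wl)
absorbance = log_inverse_reflectance(sample_scan, reference_scan)
region = select_region(wl, absorbance)
```

On this grid the full scan has 1051 points and the 1100-2500 nm window keeps 701 of them, so each spectrum enters the principal component model as a row of 701 values under these assumptions.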
To test the sensitivity of the method to minor levels of impurities, two sets of adulterated samples of polymorph A were prepared. These were prepared by mixing different amounts of the undesired crystalline form with an acceptable lot of polymorph A. Samples of the desired form were spiked with the undesired form by weighing ∼5 g of the desired form into an amber glass vial and adding the appropriate amount of the undesired form. The vial was then sealed, shaken, and stirred with a metal spatula for about 10 min. Differing lots of desired and undesired polymorphs were used when possible. The composition of the adulterated samples is shown in Table 1. Adulterated set I was measured on the reference instrument and remote instruments I, III, V, and VI. Adulterated set II was measured on remote instruments II and IV. As before, the scan range for these samples was 400-2500 nm, but only the region from 1100 to 2500 nm was used for multivariate analysis. As before, log ratios of these spectra were computed using a reference scan collected from a block of 99% reflective Spectralon.

(27) Wold, S.; Sjostrom, M. In Chemometrics: Theory and Application; Kowalski, B. R., Ed.; ACS Symposium Series 52; American Chemical Society: Washington, DC, 1977; pp 242-282.

RESULTS AND DISCUSSION
Figure 2 shows representative spectra of polymorph A and four other crystalline forms, including the undesired polymorphic form. Significant spectral differences can be seen between polymorph A and the other crystalline forms. Four of the original 34 training samples of polymorph A were found to be outliers by near-IR during the training phase because of their unusually large distances. Visual examination of the spectra of these samples revealed some small differences from the expected shape; consequently, these four lots were resampled. Upon resampling and inspection of the spectral data, it was determined that the spectral deviations observed for these four samples were reproducible. In these cases, near-IR, which measures bulk properties, was detecting acceptable variability in particle size, solvent content, and/or purity levels. These four samples were removed from the training sets, and all further pattern recognition calculations were performed using the “cleaned” training set of 30 spectra.

Figure 2. Representative near-IR reflectance spectra of polymorph A and four other crystalline forms: (solid line) polymorph A, (···) free acid, (-·-) undesired polymorph, (-··-) hydrate, and (- - -) solvate.

Training Set Results. A principal component model (12 factors) was constructed for the clean training set of 30 samples and used to estimate the normalized principal component scores of the remaining test samples by eq 2. Mahalanobis distances and their corresponding probability levels were calculated according to eqs 3 and 5, respectively. Eight principal components (99.999% cumulative variance) were selected for the Mahalanobis distance test. SIMCA residual variance distances and their corresponding probability levels were calculated according to eqs 10 and 11, respectively. Four principal components (99.999% cumulative variance) were selected for the SIMCA test. The 99% confidence level was selected; i.e., samples were classified as “unacceptable” if their probability level by eq 5 or 11 was less than 0.01. These samples lie outside the 99% confidence level. The number of principal components was selected to give the lowest number of training set and test set errors; however, only training set samples were used to construct the pattern recognition models.

The distances and probability levels for the training set are shown in Table 2. All of the samples lie well within the 99% confidence interval, except lot 11. Using Mahalanobis distances, lot 11 has a probability level of 0.089, indicating that it lies just inside the 91% confidence level. The SIMCA residual variance test indicates that the sample lies at the 99.5% confidence level, causing it to be rejected by this method. Visual examination of the spectrum of lot 11 revealed that it had a baseline offset somewhat different from those of the other training samples, due to a different particle size distribution than those of the other training samples.

Table 2. Training Results for Polymorph A, 30 Spectra (a)

| sample | Mahalanobis distance | Mahalanobis prob level | SIMCA residual variance distance | SIMCA residual variance prob level |
|---|---|---|---|---|
| lot 1 | 3.896 | 0.236 | 1.149 | 0.251 |
| lot 2 | 1.584 | 0.979 | 0.369 | 0.712 |
| lot 3 | 2.791 | 0.657 | 0.987 | 0.324 |
| lot 4 | 1.949 | 0.931 | 0.488 | 0.626 |
| lot 5 | 2.735 | 0.681 | 1.137 | 0.256 |
| lot 6 | 1.812 | 0.953 | 0.358 | 0.721 |
| lot 7 | 2.224 | 0.865 | 0.608 | 0.543 |
| lot 9 | 2.186 | 0.875 | 1.262 | 0.208 |
| lot 10 | 3.509 | 0.361 | 1.418 | 0.157 |
| lot 11 | 4.639 | 0.089 | 2.788 | 0.005 |
| lot 12 | 1.855 | 0.947 | 1.141 | 0.254 |
| lot 13 | 2.991 | 0.572 | 0.566 | 0.572 |
| lot 14 | 1.136 | 0.998 | 0.472 | 0.637 |
| lot 15 | 1.531 | 0.983 | 0.742 | 0.458 |
| lot 16 | 1.667 | 0.971 | 0.389 | 0.697 |
| lot 17 | 2.157 | 0.883 | 0.528 | 0.598 |
| lot 18 | 2.072 | 0.904 | 0.907 | 0.365 |
| lot 19 | 2.291 | 0.845 | 0.629 | 0.529 |
| lot 20 | 2.145 | 0.886 | 0.605 | 0.546 |
| lot 21 | 2.662 | 0.711 | 0.491 | 0.624 |
| lot 22 | 1.932 | 0.934 | 0.346 | 0.729 |
| lot 23 | 1.931 | 0.934 | 0.697 | 0.486 |
| lot 24 | 4.525 | 0.104 | 1.629 | 0.104 |
| lot 27 | 2.704 | 0.694 | 0.984 | 0.326 |
| lot 28 | 1.285 | 0.995 | 0.386 | 0.700 |
| lot 29 | 4.421 | 0.120 | 1.019 | 0.309 |
| lot 30 | 2.778 | 0.663 | 0.956 | 0.339 |
| lot 31 | 3.628 | 0.319 | 0.494 | 0.622 |
| lot 32 | 3.752 | 0.278 | 1.398 | 0.163 |
| lot 33 | 3.511 | 0.360 | 0.996 | 0.320 |

(a) Eight principal components were used for the Mahalanobis distance results. Four principal components were used for the SIMCA residual variance results. The samples’ distances and their corresponding probability levels are shown. Probability levels were estimated by numerical integration of the appropriate distribution function.

Test Set Results. Acceptable Samples. The results for acceptable test samples are summarized in Table 3. Each value in the table represents the average taken over five lots by instrument. The same five lots were measured on all seven instruments. The response characteristics of the individual instruments can be evaluated by comparing the averages to those obtained using the reference instrument. In general, it is expected that the individual instruments would have average distances approximately equal to the average distance obtained using the reference instrument. Using the pattern recognition methods described in this paper, the response characteristics of instruments I, III, IV, V, and VI seem well matched to the reference instrument. Instrument II consistently gave larger distances for the five lots used in this study, indicating that its response characteristics were not as well matched to the reference instrument as those of the other five instruments. It is anticipated that a larger fraction of acceptable samples would be rejected by this instrument.
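The Mahalanobis probability levels reported for the training set in Table 2 can be checked against eq 5 with a short calculation. The sketch below is an illustration, not the authors' program: it interprets the probability level as the upper-tail area of the F distribution beyond the scaled squared distance, an interpretation that agrees with the tabulated values, and it evaluates that area by Simpson's rule in the spirit of the numerical integration mentioned in the Table 2 footnote. The function names are hypothetical, and the integration shortcut assumes k > 2 and n - k > 2.

```python
import math


def f_upper_tail(f_value, d1, d2, steps=2000):
    """P(F > f_value) for an F distribution with (d1, d2) degrees of freedom.

    Uses P(F > f) = I_x(d2/2, d1/2) with x = d2 / (d2 + d1 * f), where I_x is
    the regularized incomplete beta function, integrated by Simpson's rule.
    """
    a, b = d2 / 2.0, d1 / 2.0
    x = d2 / (d2 + d1 * f_value)
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)

    def integrand(t):
        if t <= 0.0 or t >= 1.0:
            return 0.0  # endpoint value when a > 1 and b > 1 (assumed)
        return math.exp(log_norm + (a - 1.0) * math.log(t)
                        + (b - 1.0) * math.log(1.0 - t))

    h = x / steps  # steps must be even for Simpson's rule
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return total * h / 3.0


def mahalanobis_prob_level(distance, n, k):
    """Probability level for a Mahalanobis distance D using the scaling of eq 5."""
    f_value = (n - k) / (k * (n - 1)) * distance ** 2
    return f_upper_tail(f_value, k, n - k)


# Distances taken from Table 2 (30 training spectra, 8 principal components).
p_lot1 = mahalanobis_prob_level(3.896, n=30, k=8)
p_lot11 = mahalanobis_prob_level(4.639, n=30, k=8)
```

With n = 30 and k = 8 these evaluate to approximately 0.236 for lot 1 and 0.089 for lot 11, in line with Table 2. A sample would be flagged as unacceptable at the 99% level when the returned value falls below 0.01.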
Table 3. Average Distances and Probability Levels of Acceptable Samples Measured on Seven Different Instruments (a)

| instrum no. | Mahalanobis distance | Mahalanobis prob level | SIMCA residual variance distance | SIMCA residual variance prob level |
|---|---|---|---|---|
| ref | 3.150 | 0.507 | 1.461 | 0.145 |
| I | 3.069 | 0.562 | 1.339 | 0.192 |
| II | 4.425 | 0.126 | 1.783 | 0.081 |
| III | 2.041 | 0.874 | 0.862 | 0.393 |
| IV | 3.671 | 0.337 | 1.317 | 0.193 |
| V | 3.356 | 0.462 | 1.479 | 0.147 |
| VI | 3.125 | 0.521 | 1.459 | 0.150 |

(a) The same five acceptable samples were measured on all seven instruments. The response characteristics of instruments I, III, IV, V, and VI seem well matched to those of the reference instrument. The response characteristics of instrument II are not well matched to those of the other instruments.
Overall, the Mahalanobis distance method seemed less sensitive to variation between instruments. Five of the measured spectra of acceptable lots had SIMCA residual variance probability levels that were marginally significant; i.e., they were in the range from 0.03 to 0.09. Three of these spectra came from instrument II. Using the Mahalanobis distance method, only one spectrum had a probability level that was marginally significant.

Adulterated Samples. The results for adulterated samples are summarized in Table 4. Each value in the table is the average over N instruments. For the drug substances studied in this paper, the Mahalanobis distance method is more sensitive to adulterated samples than the SIMCA residual variance method. Large Mahalanobis distances and very small probability levels indicated that it is highly unlikely that future samples with similar levels of adulteration would be classified as acceptable. Furthermore, a linear relationship was observed between the Mahalanobis distance and the percentage of adulteration (see Figure 3), suggesting that levels as low as 2% of polymorph B may be reliably detected. The point at 12% may be unreliable because it represents an average of only two instruments. While it is possible to detect low levels of adulteration, other criteria need to be used to establish meaningful control levels for polymorph B. Furthermore, it is important to note that the number of adulterated samples employed in this study is too small to establish limits of detection.

We suspect the Mahalanobis distance method is more sensitive to adulteration in this application because eight principal components were used, compared to only four principal components for the SIMCA method. Use of eight components more fully characterizes the training set response, making small deviations from this response easier to detect. The SIMCA method can be made more sensitive by increasing the number of principal components used; however, the type I error rate (i.e., rejection of acceptable samples) becomes unacceptably high in this application. For example, increasing the number of principal components from four to five in the SIMCA method causes five out of 36 acceptable test samples to be rejected and causes an additional 16 out of 36 samples to have marginally significant probabilities (i.e., 0.01 < pr < 0.050). Increasing the number of SIMCA principal components from four to eight makes the situation even worse, causing 30 out of 36 samples to be rejected.

Table 4. Average Distances and Probability Levels for Adulterated Samples (a)

| amt B, wt % | N | Mahalanobis distance | Mahalanobis prob level | SIMCA residual variance distance | SIMCA residual variance prob level |
|---|---|---|---|---|---|
| 3 | 2 | 8.90 | 7.59 × 10⁻⁴ | 2.18 | 4.76 × 10⁻² |
| 5 | 2 | 14.28 | 6.24 × 10⁻⁸ | 3.78 | 1.68 × 10⁻⁴ |
| 7 | 7 | 18.30 | 1.29 × 10⁻⁹ | 4.25 | 6.10 × 10⁻⁵ |
| 9 | 7 | 21.68 | 6.85 × 10⁻¹⁰ | 5.35 | 1.00 × 10⁻⁵ |
| 11 | 5 | 27.29 | 2.19 × 10⁻¹³ | 7.01 | 4.85 × 10⁻¹¹ |
| 12 | 2 | 22.34 | 1.16 × 10⁻¹¹ | 5.17 | 1.33 × 10⁻⁶ |
| 13 | 4 | 35.87 | 7.88 × 10⁻¹⁵ | 9.22 | 3.32 × 10⁻¹⁴ |

(a) The column labeled N indicates the number of different instruments used for computing the averages. One sample adulterated at the 3% level was not detected by the SIMCA residual variance method.

Figure 3. Plot of average Mahalanobis distances as a function of amount (%) of undesired polymorph. (O) Outlier (see text).

Other Crystalline Forms. The distances for other crystalline forms are summarized in Table 5. Each value represents an average over N lots measured on the reference instrument. The probability levels for these samples are not reported because they are so close to zero that reliable estimates cannot be computed with double-precision floating point operations. The Mahalanobis distances, in general, are larger than the SIMCA distances, indicating that the Mahalanobis distance method may be more sensitive to minor levels of adulteration by these process-related substances.

Table 5. Average Distances for Process-Related Substances (a)

| substance | N | Mahalanobis | SIMCA residual variance |
|---|---|---|---|
| polymorph B | 4 | 349.5 | 96.1 |
| solvate | 3 | 259.0 | 133.1 |
| free acid | 13 | 207.1 | 79.2 |
| hydrate | 5 | 365.5 | 150.9 |

(a) The corresponding probability levels were too small to reliably compute. The column labeled N indicates the number of different lots used for computing the averages. All spectra were measured on the reference instrument.

CONCLUSIONS
The performances of the two pattern recognition methods are compared in Table 6. The Mahalanobis distance method gave the best overall performance for the drug substances studied in this application. No classification errors were obtained using it. The SIMCA residual variance method had one type I error (false rejection of an acceptable sample) and one type II error (false acceptance of an adulterated sample). Compared to traditional methods for determination of polymorphism, like X-ray powder diffraction, the near-IR method is fast, provides comparable reliability, and is likely to be less costly to implement and maintain. Automated computations can be easily implemented, so highly trained personnel are not needed to perform the test.

Table 6. Number of Acceptable and Unacceptable Samples Found by the Two Pattern Recognition Methods (a)

| data set | Mahalanobis distance (8 principal components), accept | Mahalanobis distance (8 principal components), reject | SIMCA residual variance (4 principal components), accept | SIMCA residual variance (4 principal components), reject |
|---|---|---|---|---|
| training set | 30 | 0 | 29 | 1 |
| test set (other instruments, acceptable samples) | 36 | 0 | 36 | 0 |
| test set (other instruments, adulterated samples) | 0 | 35 | 1 | 34 |
| test set (same instrument, foreign samples) | 0 | 25 | 0 | 25 |

(a) Overall, the Mahalanobis distance method gave the best results (zero classification errors) while the SIMCA residual variance method gave two classification errors.
Received for review October 2, 1995. Accepted January 6, 1996 (abstract published in Advance ACS Abstracts, February 15, 1996).
AC950993X