Anal. Chem. 2006, 78, 6003-6011
Articles
Characterization of Chloramphenicol Palmitate Drug Polymorphs by Raman Mapping with Multivariate Image Segmentation Using a Spatial Directed Agglomeration Clustering Method Wei-Qi Lin,† Jian-Hui Jiang,*,† Hai-Feng Yang,† Yukihiro Ozaki,‡ Guo-Li Shen,† and Ru-Qin Yu*,†
State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha 410082, China, and School of Science and Technology, Kwansei Gakuin University, Sanda 669-1337, Japan
Chemical imaging analysis holds great potential in probing the chemical heterogeneity of samples with high spatial resolution and molecular specificity. This paper demonstrates the implementation of Raman mapping for microscopic characterization of tablets containing chloramphenicol palmitate polymorphs with the aid of a new multivariate image segmentation approach based on spatial directed agglomeration clustering. This approach performs the agglomeration clustering by stepwise merging the pixels possessing both spatial closeness and spectral similarity into clusters that define the image segmentation. The incorporation of spatial closeness into the clustering process enables the approach to improve the robustness and avoid poorly defined image segmentation arising from clusters with highly separated pixels. Additionally, the stepwise merging of clusters offers an F-statistic-based procedure to automatically ascertain the number of image segments. Raman mapping analysis of tablets containing two polymorphs of chloramphenicol palmitate followed by multivariate image segmentation reveals that the proposed technique offers the identification of each polymorph and a quantitative visualization of the spatial distribution of the polymorphs identified. This technique holds promise in rapid, noninvasive, and quantitative polymorph analysis for pharmaceutical production processes. Chemical imaging has recently gained significant attention in analytical chemistry, because these imaging methods, in particular those based on Raman, infrared, and fluorescence spectroscopy, have proved their capability of rapidly probing the chemical heterogeneity of samples at high spatial resolution and with * To whom correspondence should be addressed. E-mail:
[email protected];
[email protected]. Tel: +86-731-8822577. Fax: +86-731-8822782. † Hunan University. ‡ Kwansei Gakuin University. 10.1021/ac0520902 CCC: $33.50 Published on Web 07/29/2006
© 2006 American Chemical Society
molecular specificity.1,2 The detailed knowledge of the distribution of chemical species over a surface of interest is essential for chemical manufactory processes, for instance, in the mass production of pharmaceutical preparations. For the characterization of pharmaceutical preparations such as tablets, it is very important to gain the information concerning the homogeneity of the tablets and the distribution of the compositions. The coupling of spectroscopic imaging with chemometrics could substantially enhance the information outcome from the experimental methodology.3 For this purpose, multivariate exploratory and resolution methods are adapted to image analysis. Of those methods, principal component analysis, K-means clustering algorithm, and multivariate curve resolution are used frequently in multivariate imaging.4-12 The application of most existing multivariate methods requires the reduction of multivariate images to a two-way data matrix by ignoring the (x, y) spatial positions in the image plane. Such an approach, however, might lead to the loss of the specific high-order data structure, in particular the spatial correlation of multivariate images in the (x, y) dimensions, thereby creating extra uncertainty in subsequent data analysis. For instance, in the analysis of pharmaceutical tablets, such approaches are susceptible to poorly defined image segmentation in that segments with geometric dimensions not (1) Treado, P. J.; Morris, M. D. In Spectroscopic and Microscopic Imaging of the Chemical State; Morris, M. D., Ed.; Marcel Dekker: New York, 1992; Chapter 3. (2) Schaeberle, M. D.; Karakatsanis, C. G.; Lau, C. J.; Treado, P. J. Anal. Chem. 1995, 67, 4316-4321. (3) deJuan, A.; Tauler, R.; Dyson, R.; Marcolli, C.; Rault, M.; Maeder, M. TrAC, Trends Anal. Chem. 2004, 23 (1), 70-79. (4) Kargacin, M. E.; Kowalski, B. R. Anal. Chem. 1986, 58, 2300-2306. (5) Lavine, B.; Workman, J. J., Jr. Anal. Chem. 2004, 76, 3365-3372. (6) Shafer-Peltier, K. E.; Haka, A. 
S.; Motz, J. T.; Fitzmaurice, M.; Dasari, R. R.; Feld, M. S. J. Cell. Biochem. 2002, 39, 125-137. (7) Duponchel, L.; Elmi-Rayaleh, W.; Ruckebusch, C.; Huvenne, J. P. J. Chem. Inf. Comput. Sci. 2003, 43, 2057-2067. (8) Noordam, J. C.; van den Broek, W. H. A. M. J. Chemom. 2002, 16, 1-11. (9) Artyushkova, K.; Fulghum, J. E. Surf. Interface Anal. 2002, 33, 185-195. (10) Artyushkova, K.; Fulghum, J. E. Surf. Interface Anal. 2004, 36, 1304-1313. (11) Smentkowski, V. S.; Keenan, M. R.; Ohlhausen, J. A.; Kotula, P. G. Anal. Chem. 2005, 77, 1530-1536. (12) deJuan, A.; Tauler, R. Anal. Chim. Acta 2003, 500 (1-2), 195-210.
Analytical Chemistry, Vol. 78, No. 17, September 1, 2006 6003
greater than the ingredient granule size contain pixels from different clusters. Therefore, it is highly desirable to incorporate the spatial correlation into multivariate image segmentation. Approaches constructed along this direction are expected to show improved robustness or the stability to measurement errors and poor definition of segmentations. Recently, Willse proposed multivariate methods that allow for spatial correlation of neighboring pixels based on Poisson and multinomial mixture models to segment secondary ion mass spectrometry images into chemical homogeneous regions.13 In this paper, we have developed a new approach to image segmentation based on spatial directed agglomeration clustering for identifying the chemically unique regions in multivariate images. This approach differs from most current multivariate methods in two main features: (1) It utilizes spatial closeness between neighboring pixels during clustering. This means that both the spectral differences and spatial closeness are taken as the measure of distance of pixels in the agglomeration of the pixel into a cluster. (2) An F statistic is introduced as a termination criterion for the pixel agglomeration such that the developed procedure could proceed automatically without the user’s intervention and any prior knowledge. The incorporation of spatial closeness into the clustering process provides the possibility of improving the robustness of the approach and avoiding poorly defined image segmentation arising from clusters with highly separated pixels. In the present study, the proposed approach is coupled with Raman mapping analysis for the characterization of microscopic spatial distribution of multiple drug polymorphs of chloramphenicol palmitate. Various polymorphs of the same drug are different not only in their crystal shape and structure but also in their solubility, melting point, thermodynamic stability, density, vapor pressure, and electrical properties. 
It is known that changes in polymorphic behavior may have great impact on the pharmaceuticals’ stability, suspendibility, and bioavailability as well as their mixing and milling properties.14,15 To better understand the pharmaceutical manufacturing processes and produce drugs of consistent quality, it is important to characterize the polymorphic transition and the spatial distribution in solid pharmaceuticals. Polymorphs are commonly analyzed by off-line detection tools, including X-ray powder diffraction, solid-state NMR, and differential scanning calorimetry.16 Nevertheless, it seems very difficult to couple these methods with imaging or mapping techniques for microscopic analysis. Characterization of the microscopic spatial distribution of multiple drug polymorphs continues to pose a challenge in analytical chemistry. Raman microspectroscopy as a nondestructive analytical tool offers distinct advantages for microscopic characterization of polymorphism in such pharmaceuticals’ systems. In the present study, chloramphenicol palmitate is taken as a model drug. The results obtained with the multivariate image segmentation (MIS)-aided Raman mapping analysis show that the proposed technique offers definite evidence for the identity of each polymorph and allows a quantitative visualization of the spatial distribution of these two polymorphs. (13) Willse, A.; Tyler, B. Anal. Chem. 2002, 74, 6314-6322. (14) Haleblian, J. K.; McCrone, W. J. Pharm. Sci. 1965, 58, 911-929. (15) Helmy, R.; Zhou, G. X.; Chen, Y. W.; Crocker, L.; Wang, T.; Wenslow, R. M., Jr.; Vailaya, A. Anal. Chem. 2003, 75, 605-611. (16) Threlfall, T. L. Analyst. 1995, 120, 2435-2460.
Table 1. Comparison of the Results for Polymorph Mixed Tablets Obtained by the Proposed Method with Those by K-Means Cluster Analysis

actual wt % of form I | calcd wt % by proposed method | std dev | calcd wt % by K-means cluster analysis(a) | std dev
25.3 | 27.7 | 3.81 | 29.0 | 4.90
50.1 | 52.8 | 2.87 | 58.2 | 13.7
74.9 | 76.2 | 7.86 | 78.2 | 9.03

(a) Data represent the best result obtained when the K-means algorithm was run 10 times.
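As a reading aid for Table 1, the deviation of each calculated value from the actual weight percentage can be tabulated. The short Python sketch below performs only this arithmetic on the numbers transcribed from the table; the function and variable names are illustrative and do not come from the paper.

```python
# Values transcribed from Table 1 (wt % of form I).
actual = [25.3, 50.1, 74.9]
proposed = [27.7, 52.8, 76.2]
kmeans = [29.0, 58.2, 78.2]

def abs_deviation(calculated, reference):
    """Absolute deviation (in wt %) of each calculated value from its reference."""
    return [round(abs(c - r), 1) for c, r in zip(calculated, reference)]

dev_proposed = abs_deviation(proposed, actual)
dev_kmeans = abs_deviation(kmeans, actual)
print(dev_proposed)  # [2.4, 2.7, 1.3]
print(dev_kmeans)    # [3.7, 8.1, 3.3]
```

For all three tablets, the value from the proposed method lies closer to the actual weight percentage than the best K-means result.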
EXPERIMENTAL SECTION Instrumentation. Raman spectra were recorded on a Jobin Yvon micro-Raman spectrometer (RamLab-010). It comprises an integral Olympus BX40 microscope with a 50× objective (8 mm) focusing the laser on the sample to collect the backscattered radiation, a notch filter cutting the exciting line, a holographic grating (1800 grooves mm-1) providing a spectral resolution of 2 cm-1, and a semiconductor-cooled 1024 × 256 pixels chargecoupled device detector. Radiation of 632.8 nm from a He-Ne laser with power of ∼5 mW was used as the excitation line that gave a spot size of ∼3 µm in diameter on the sample surface. The slit and pinhole were set at 300 and 1100 µm, respectively. The grating was centered at 1200 cm-1, with data collected over a range of 702-1625 cm-1, to provide the best differentiation between the polymorphs. The collection time was 3 s for each spectral record, and the average of these two records was taken as the spectrum for the pixel. An automatic XY stage with the capability of moving one step of 3 µm was utilized to perform Raman mapping. Each spectrum is composed of 990 readings, and each image cube has 30 × 30 spectra. The mapping area of ∼90 × 90 µm2 was selected arbitrarily on a monochrome image of the surface obtained by an optical axis conjugate camera and a white light source. Sample Preparation. The two pure polymorphs of chloramphenicol palmitate, form I and form II, were supplied by National Institute for the Control of Pharmaceutical and Biological Products (NICPBP, Beijing, China). Both crystal forms are white powder with purity greater than 99.5%, particle size distribution ranging from 4.0 to 9.0 µm, melting point between 86 and 89.5 °C, free chloramphenicol less than 381 ppm and free palmitate ∼0.37%. Known amounts of these two pure polymorphs were weighed separately to an accuracy of 0.1 mg and mixed together to prepare a set of polymorph mixtures. 
For instance, 25.3 mg of form I was mixed with 74.6 mg of form II to prepare a sample containing 25 wt % form I. Prolonged agitation of the mixture that would cause the transition of form II into form I should be avoided. The mixture was placed in an agate mortar and thoroughly agitated to guarantee a uniform mixture while avoiding the crystal form transition. The total weight was kept ∼100 mg for each sample. The polymorph mixtures containing 0, 25, 50, 75, and 100 wt % of form I were prepared in this manner. The exact weight percentage of these tablets is given in Table 1. Then, the mixtures were pressed into tablets with a mild pressure of ∼100 bar by a 769YP15A tablet press (Tianjin Scientific Instruments). Chemical Imaging. Images from five tablets with form II as bioactive component in different proportions (100, 75, 50, 25, and
0 wt %, respectively) were recorded. The experimental output was a data cube with three dimensions, of which the first accounted for the spectral intensities at different wavenumbers and the remaining two for the geometric coordinates x and y of the pixels on the surface. For example, if an image comprises 990 × 30 × 30 coordinates, the last two numbers denoting the spatial x and y dimensions, unfolding yields a 990 × 900 data matrix, as all the 990 × 30 matrices, one per x coordinate, are appended in sequence according to the y coordinate. A column of the unfolded data matrix represents one spectrum, that is, a pattern in cluster analysis, while the rows represent the wavenumbers, i.e., the variables of a pattern. The baselines of all spectra were corrected by a standard piecewise linear fitting procedure, and the spectra were then normalized to unit length before further treatment. Multivariate Image Segmentation Based on Spatial Directed Agglomeration Clustering Method. The rationale of the spatial directed agglomeration clustering method is to utilize spatial closeness between pixels during the construction of the cluster. This implies that both the spectral difference and the spatial closeness enter the measure of distance between pixels when a pixel is agglomerated into a cluster. Such incorporation of spatial closeness into the clustering process enables the approach to exhibit improved stability to measurement errors, in cases when the data are contaminated by noise of large variance, and to circumvent the risk of poorly defined image segmentation. Such a risk arises when segments with geometric dimensions not greater than the ingredient granule size contain pixels from different clusters.
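The unfolding and unit-length normalization described above can be illustrated with a short NumPy sketch. It assumes a cube stored as (wavenumbers, x, y) and uses random numbers in place of measured spectra; the function names `unfold_cube` and `normalize_columns` are hypothetical, and the baseline correction step is omitted.

```python
import numpy as np

def unfold_cube(cube):
    """Unfold a (n_wavenumbers, nx, ny) cube into a (n_wavenumbers, nx*ny) matrix.

    Column i*ny + j holds the spectrum recorded at pixel (i, j).
    """
    n_wav, nx, ny = cube.shape
    return cube.reshape(n_wav, nx * ny)

def normalize_columns(X):
    """Scale every spectrum (column) to unit Euclidean length."""
    norms = np.linalg.norm(X, axis=0, keepdims=True)
    return X / norms

# Synthetic stand-in for one mapping data set: 990 readings at 30 x 30 pixels.
rng = np.random.default_rng(0)
cube = rng.random((990, 30, 30)) + 0.1  # strictly positive, so no zero-norm spectra
X = normalize_columns(unfold_cube(cube))
print(X.shape)  # (990, 900)
```

Keeping one spectrum per column matches the 990 × 900 layout of the example, and the pixel index can always be recovered from the column index as (column // 30, column % 30).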
The spatial directed agglomeration clustering procedure starts from a cluster composed of the two spatially nearest pixels among those whose Raman spectra belong to the top 10% most similar ones, as measured by the norm of the projection residual vector between each pair of spectra (the top 10% threshold was determined via extensive simulations and real data analysis). The cluster then grows by stepwise merging of the next pixel, namely, the unclassified pixel that is spatially nearest to the cluster among those whose spectra belong to the top 10% most similar to the mean spectrum of the cluster. In addition, an F statistic, which assesses the significance of the difference between the spectrum at the pixel to be merged and those of the cluster formed so far, is introduced as a criterion to terminate the agglomeration of the current cluster and start the next cluster. The image segmentation is complete once all pixels have been agglomerated into clusters. Then, the first principal component (PC) of the spectra in each cluster is computed using principal component analysis (PCA). These PCs offer estimates of the pure spectra of the polymorphs, thereby giving definite evidence for the identity of each polymorph. To visualize the segmentation results, a series of correlation maps is constructed using the correlation coefficients of the spectra at all pixels versus each pure spectrum of the polymorph. These correlation maps allow a quantitative visualization of the spatial distribution of the polymorphs. The introduced F statistic, as a criterion to terminate the agglomeration of the current cluster, is analogous to the statistic in soft independent modeling of class analogy (SIMCA) classification.17 It is defined as follows:

$$F = \frac{\dfrac{1}{m-1}\,\mathbf{y}^{T}\left(\mathbf{I} - \mathbf{x}_{\mathrm{mean}}\mathbf{x}_{\mathrm{mean}}^{+}\right)\mathbf{y}}{\dfrac{1}{(m-1)(n-1)}\displaystyle\sum_{k=1}^{n}\mathbf{x}_{k}^{T}\left(\mathbf{I} - \mathbf{x}_{\mathrm{mean}}\mathbf{x}_{\mathrm{mean}}^{+}\right)\mathbf{x}_{k}} \qquad (1)$$
where m is the number of wavenumbers, y is the spectrum at the pixel newly selected for merging into the current cluster, I is the m by m identity matrix, xmean is the mean spectrum of the current cluster, the superscript + denotes the pseudoinverse, n is the number of pixels in the current cluster, and xk denotes the spectrum at the kth pixel in the current cluster. The term $\mathbf{y}^{T}(\mathbf{I} - \mathbf{x}_{\mathrm{mean}}\mathbf{x}_{\mathrm{mean}}^{+})\mathbf{y}$ is the squared norm of the residual vector of y projected on xmean, representing the variance of the object y with respect to the current cluster, while $\frac{1}{n-1}\sum_{k=1}^{n}\mathbf{x}_{k}^{T}(\mathbf{I} - \mathbf{x}_{\mathrm{mean}}\mathbf{x}_{\mathrm{mean}}^{+})\mathbf{x}_{k}$ reflects the within-cluster variance of the current cluster. This formula generates an F statistic to assess the significance of the difference between the newly selected spectrum and the spectra in the current cluster. Suppose that the three-way data cube obtained at I × J pixels and K wavenumbers from Raman mapping is unfolded into the following two-way matrix
$$\mathbf{X} = [\mathbf{x}_{11}, \mathbf{x}_{12}, \cdots, \mathbf{x}_{1J}, \mathbf{x}_{21}, \mathbf{x}_{22}, \cdots, \mathbf{x}_{(I-1),J}, \mathbf{x}_{I1}, \mathbf{x}_{I2}, \cdots, \mathbf{x}_{IJ}] \qquad (2)$$

where xij represents the spectrum recorded at the (i, j)th pixel. The implementation of the spatial directed agglomeration clustering method is outlined as follows:

Step 1. Initialize the spectra matrix corresponding to all pixels for clustering, X1 = X. The dissimilarity matrix is constructed for all spectra in X1. The dissimilarity between each pair of spectra xi and xj in X1 is computed as the projection residual of one spectrum on another according to the following formula

$$d_{ij} = \mathbf{x}_{i}^{T}\left(\mathbf{I} - \mathbf{x}_{j}\mathbf{x}_{j}^{+}\right)\mathbf{x}_{i}, \qquad i, j = 1, 2, \ldots, J,\ i \neq j \qquad (3)$$
Note that the projection residual of xi on xj equals the projection residual of xj on xi, since all the spectra have been normalized to unit norm. The use of dissimilarity instead of the traditional similarity measure enables automatic implementation of an F-statistic-based terminating criterion for the agglomeration clustering process.

Step 2. If there is more than one entry (column) in X1, the algorithm continues. Otherwise, terminate the algorithm.

Step 3. The top 10% most similar Raman spectra are searched within the dissimilarity matrix. The nearest two pixels among those corresponding to these spectra are agglomerated into a cluster (if two candidates are at equal distances, either one is agglomerated into a cluster).

(17) Wold, S.; Sjöström, M. SIMCA: A Method for Analyzing Chemical Data in Terms of Similarity and Analogy. In Chemometrics: Theory and Applications; Kowalski, B. R., Ed.; ACS Symposium Series 52; American Chemical Society: Washington, DC, 1977; pp 243-282.

Figure 1. Chemical structure of chloramphenicol palmitate.

Step 4. If there are remaining pixels not yet merged into any cluster, the algorithm continues. Otherwise, terminate the algorithm.

Step 5. The mean spectrum of the current cluster, xmean, is calculated by

$$\mathbf{x}_{\mathrm{mean}} = \frac{1}{n}\sum_{k=1}^{n}\mathbf{x}_{k} \qquad (4)$$
where xk is the kth spectrum in the cluster and n is the number of spectra in the cluster.

Step 6. The spectral dissimilarities between xmean and the spectra at the remaining pixels (i.e., the pixels that do not yet belong to any cluster) are calculated. Among the top 10% most similar Raman spectra with respect to xmean, the pixel nearest to any of the pixels in the current cluster is then selected as the candidate.

Step 7. If the number of pixels in the current cluster is smaller than 10 or the F value for the candidate pixel with respect to the current cluster is less than a predefined value Fcrit, the candidate is merged into the current cluster and the algorithm returns to step 4. Otherwise, the agglomeration of the current cluster is terminated, and the pixels in the cluster are excluded from X1 in subsequent clustering. Go to step 2.

Step 8. Compute the first PC of the spectra in each cluster as the estimate of the pure spectrum of each component. A series of correlation maps is then constructed using the correlation coefficients of the spectra at all pixels versus each pure spectrum.

Note that the condition in step 7 on the number of pixels in the current cluster being smaller than 10 serves to prevent clusters with too few pixels, since in the setting of multivariate image analysis a cluster containing too few pixels seems chemically insignificant. Such small clusters might be generated by very few spectra with high similarity. In this case, the within-cluster variance is singular and statistically unsuitable for the F test that validates the inclusion of the remaining pixels.

RESULTS AND DISCUSSION
The chemical structure of chloramphenicol palmitate is depicted in Figure 1. This drug is known to exist in four crystal forms, named form I, form II, form III, and the amorphous form. Form I and form II are thermodynamically stable, and pure crystal forms of these two polymorphs are commercially available.
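Referring back to Steps 1-7 of the clustering procedure, the following Python sketch shows one way the loop could be organized. It is not the authors' implementation. It assumes that "top 10%" means the 10% smallest dissimilarities among all remaining pairs (Step 3) or among all remaining pixels (Step 6), that spatial closeness is the Euclidean distance between pixel coordinates, and that spectra are stored one per row with unit norm. All function names (`residual`, `f_statistic`, `top_fraction`, `segment`) and the synthetic two-component test image are hypothetical.

```python
import numpy as np

def residual(y, x):
    """Squared norm of the residual of y after projection onto x."""
    u = x / np.linalg.norm(x)
    r = y - u * (u @ y)
    return float(r @ r)

def f_statistic(y, members):
    """Eq. (1): residual variance of y over the within-cluster residual variance."""
    n, m = members.shape
    x_mean = members.mean(axis=0)
    num = residual(y, x_mean) / (m - 1)
    den = sum(residual(xk, x_mean) for xk in members) / ((m - 1) * (n - 1))
    return num / den  # den is zero only if all members are collinear with x_mean

def top_fraction(values, frac):
    """Indices of the smallest `frac` share of `values` (at least one index)."""
    k = max(1, int(np.ceil(frac * len(values))))
    return np.argsort(values)[:k]

def segment(S, coords, f_crit, frac=0.10, min_size=10):
    """S: (pixels, wavenumbers) unit-norm spectra; coords: (pixels, 2) positions."""
    labels = np.full(len(S), -1)          # -1 marks pixels left unassigned
    free = list(range(len(S)))
    cluster_id = 0
    while len(free) > 1:                                  # Step 2
        idx = np.array(free)
        G = S[idx] @ S[idx].T
        a, b = np.triu_indices(len(idx), k=1)
        d = 1.0 - G[a, b] ** 2                            # Eq. (3) for unit-norm spectra
        best = top_fraction(d, frac)                      # Step 3: most similar pairs
        gaps = np.linalg.norm(coords[idx[a[best]]] - coords[idx[b[best]]], axis=1)
        p = best[np.argmin(gaps)]                         # spatially nearest of those pairs
        cluster = [int(idx[a[p]]), int(idx[b[p]])]
        free = [q for q in free if q not in cluster]
        while free:                                       # Step 4
            rem = np.array(free)
            x_mean = S[cluster].mean(axis=0)              # Step 5, Eq. (4)
            d = np.array([residual(S[q], x_mean) for q in rem])
            pool = rem[top_fraction(d, frac)]             # Step 6: spectrally similar pixels
            gap = coords[pool][:, None, :] - coords[cluster][None, :, :]
            dist = np.linalg.norm(gap, axis=2).min(axis=1)
            cand = int(pool[np.argmin(dist)])             # nearest to any cluster pixel
            # Step 7: merge while the cluster is small or the F test does not reject
            if len(cluster) < min_size or f_statistic(S[cand], S[cluster]) < f_crit:
                cluster.append(cand)
                free.remove(cand)
            else:
                break
        labels[cluster] = cluster_id
        cluster_id += 1
    return labels

# Synthetic 6 x 6 map (not measured data): left half "A", right half "B", small noise.
rng = np.random.default_rng(0)
t = np.arange(40)
pure = np.array([np.exp(-(t - 12) ** 2 / 18.0), np.exp(-(t - 28) ** 2 / 18.0)])
coords = np.array([(i, j) for i in range(6) for j in range(6)], dtype=float)
truth = (coords[:, 1] >= 3).astype(int)
S = pure[truth] + 0.01 * rng.standard_normal((36, 40))
S /= np.linalg.norm(S, axis=1, keepdims=True)
labels = segment(S, coords, f_crit=3.0)
print(len(np.unique(labels[labels >= 0])), "clusters found")
```

In this sketch a single pixel may remain unassigned at the end, because Step 2 stops the procedure once only one spectrum is left in X1. Step 8 (first PC and correlation maps) is not included here.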
Form II has higher free energy and better bioavailability than form I; thus, it is deemed the optimum crystal form for manufacturing. Form III and the amorphous form are unstable and easily transformed into form I or form II, and pure crystal samples of form III and the amorphous form are difficult to obtain commercially. Moreover, form I and form II are quite stable and will not convert into other polymorphs under general processing conditions, enabling the investigator to prepare tablets composed of only these two crystal forms in desired proportions. Additionally, these two polymorphs could be obtained commercially as microparticles with a relatively narrow size distribution, which prevents the size distribution from significantly influencing the analytical outcome. Therefore, tablets containing the two polymorphs of chloramphenicol palmitate, form I and form II, were taken in the present study as the model system for the Raman mapping studies. This experimental design enables the investigators to confirm the obtained results in a straightforward manner.

Figure 2. Raman spectra of the pure crystal forms of chloramphenicol palmitate: (a) crystal form I; (b) crystal form II.

Figure 2 shows the Raman spectra collected on the powder samples of pure crystal form I and form II of chloramphenicol palmitate. One observes that the spectrum of crystal form I differs to some degree from that of crystal form II, with the most substantial distinction appearing in the range from 740 to 1180 cm-1. Other regions might also be valuable in terms of the peak ratios. However, spectra of the different crystal forms in these regions show smaller dissimilarity than those in the selected range, and the introduction of these regions might cause extra variability in the clustering. Therefore, the spectral region from 740 to 1180 cm-1 was used in the present study. It is also observed in Figure 2 that there are appreciable variations within the spectra
Figure 3. Projections of the normalized spectra on the first two PCs for the tablet containing 25 wt % crystal form I; (•) data points in cluster 1; (+) data points in cluster 2; (4) data points in cluster 3; (*) data points in cluster 4.
of the same crystal form, especially for the crystal form I. This increases the difficulty in data analysis, since it is essential for multivariate image segmentation studies to require a dominant between-cluster variance over the within-cluster variance. It is noteworthy that there is no appreciable change in Raman spectra at the same pixel during the measurements, implying that no solidstate conversion occurs under investigation. To further investigate the variance of the spectra obtained in Raman mapping experiments, the spectra were normalized to have unit 1-norm, i.e., the sum of the spectra was 1. Provided the Raman spectra are a linear combination of the pure spectra of the crystal forms, the spectral data points are distributed closely around a linear segment. This geometric property has been illustrated in self-modeling curve resolution literature.18,19 Considering that the pure spectra of two polymorphs are very similar to each other, this linear segment is relatively short such that the conventional display of these data on the plane spanned by the first two PCs might be misled by the mean of the spectral data points. As a result, the mean of these spectral data points were subtracted, which implied a translation of the linear segment to cross the origin. Then, the spectral data could be displayed in the twodimensional plane for a visual evaluation. For instance, Figure 3 shows the projections of these spectra subjected to normalization and mean-centering on the first two PCs for the tablet containing 25 wt % crystal form I. It is clear that the projection points do not show a line-shaped configuration. The variance accounted for by the first PC is 0.5974, while the second PC represented a variance of 0.1432. The first two PCs’ cumulative ratio of variance is 0.7406. 
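The normalization, mean-centering, and projection just described can be sketched with an SVD-based PCA. The example below is illustrative only: it uses noise-free synthetic mixtures of two made-up band shapes, for which the bilinear model holds exactly, and the function name `pca_variance_ratios` is hypothetical.

```python
import numpy as np

def pca_variance_ratios(spectra):
    """spectra: (pixels, wavenumbers). Returns PC scores and explained-variance ratios."""
    X = spectra / np.abs(spectra).sum(axis=1, keepdims=True)  # unit 1-norm per spectrum
    X = X - X.mean(axis=0)                                    # subtract the mean spectrum
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    ratios = s ** 2 / np.sum(s ** 2)                          # share of variance per PC
    scores = U * s                                            # coordinates on the PCs
    return scores, ratios

# Noise-free two-component mixtures (synthetic band shapes, not measured spectra).
rng = np.random.default_rng(1)
t = np.arange(60)
a = np.exp(-(t - 20) ** 2 / 30.0)
b = np.exp(-(t - 35) ** 2 / 30.0)
c = rng.random((200, 1))
mix = c * a + (1 - c) * b
scores, ratios = pca_variance_ratios(mix)
print(ratios[:2])  # the first ratio is essentially 1 for ideal bilinear data
```

For such ideal data the points fall on a line segment and the first PC carries essentially all of the variance. The values reported above for the measured tablet, 0.5974 and 0.1432 (0.5974 + 0.1432 = 0.7406 cumulatively), are far from that limit.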
These observations indicate that the data obtained in our Raman mapping experiments exhibit substantial errors as described by the bilinear model, which disables some elegant curve resolution methods in such data analysis. Bright-field optical microscopy was also used to inspect the spatial distribution of the polymorphs on the tablet sample. An optical image of the tablet containing 25 wt % crystal form I is (18) Jiang, J.-H.; Ozaki, Y. Appl. Spectrosc. Rev. 2002, 37, 321-345. (19) Lawton, W. H.; Sylvestre, E. A. Technometrics 1971, 13, 617-633.
Figure 4. (a) Optical image of the tablet containing 25 wt % crystal form I. (b) Raman spectrum taken at pixel b, showing spectral characteristic of crystal form II. (c) Raman spectrum taken at pixel c, showing spectral characteristic of crystal form I.
shown in Figure 4a. The optical image contrast is primarily due to the surface roughness, which is not specifically correlated with the composition of the sample. As one can see, the two pixels marked as b and c are both located in the bright area that exhibits little contrast; the corresponding Raman spectra depicted in Figure 4b and c, however, show substantial differences and clearly reveal the presence of form II at pixel b and form I at pixel c. These findings indicate that bright-field optical imaging is not capable of discriminating different polymorphs of chloramphenicol palmitate and that polymorphism studies are not a simple problem that could be solved easily. Moreover, it is observed that in comparison with that for the Raman spectra obtained with the powders of pure
Figure 5. (a) Reliability curve for the tablet containing 25 wt % crystal form I. (b) The dissimilarity curve for the tablet. (c) The first PC of cluster 1 (solid line) and the pure spectrum of crystal form II (dotted line). (d) The first PC of cluster 2 (solid line) and the pure spectrum of crystal form I (dotted line). (e) The correlation map of the tablet where the gray scale represents the correlation coefficients with respect to the representative spectra of cluster 1. (f) The correlation map of the tablet with respect to the representative spectra of cluster 2.
crystal forms, the signal-to-noise ratio for the microspectra deteriorates appreciably and the difference between the pure crystal forms becomes less significant, which makes the identification of the polymorphs in the sample from Raman microspectra more difficult. Therefore, a multivariate image segmentation based on spatial directed agglomeration clustering was employed to aid the interpretation of the data obtained in the Raman mapping analysis. Selecting a proper critical value (Fcrit) plays a key role in the clustering process, as it controls the number of clusters. If the Fcrit value is too small, the difference between any two spectra would be evaluated as significant. As a result, too many clusters would be obtained. On the contrary, if the Fcrit value is too large, one would find only a single cluster in which the data structure might be masked. In principle, one could determine the Fcrit value using F statistic tables with a predefined level of significance. However, as the number of pixels in the current cluster varies during the agglomeration, the degrees of freedom of the F statistic are variable. Moreover, the dimension of the spectra is large; thus, the critical F value is generally absent from F statistic tables and would have to be calculated by numerical integration. It is therefore difficult to calculate the F value corresponding to a prescribed level of significance. To circumvent the difficulty and discover the clusters of chemical significance, the reliability curve
Figure 6. (a) Dissimilarity curve for the tablet containing 50 wt % crystal form I. (b) The correlation map of the tablet where the gray scale represents the correlation coefficients with respect to the representative spectra of cluster 1. (c) The correlation map of the tablet with respect to the representative spectra of cluster 2.
that plots the cluster number versus the Fcrit value was utilized to determine a proper Fcrit value. The proper Fcrit can be determined to be a certain value at the longest level section in the reliability curve throughout the Fcrit value range of 1.5-4.0. This range was chosen according to extensive simulation and experimental data analysis. The Fcrit value range can also be extended without substantial effect on the data analysis results. Actually, one can infer that use of Fcrit smaller than 1.5 would not give a significant plateau, since the cluster number varies appreciably with the Fcrit value in cases where it is small. By using a large Fcrit value, all clusters would merge into a big one. Therefore, even if the Fcrit range is extended, one could obtain very similar results, provided the plateau associated with one cluster is excluded due to the absence of useful information. The spatial directed agglomeration clustering was implemented for the segmentation of Raman mapping data measured on the tablet containing 25 wt % form I. To access the number of components in the sample, a reliability curve plotting the cluster number versus the varying values of Fcrit was computed. As depicted in Figure 5a, this curve includes several flat sections where the cluster number is relatively stable with respect to the values of Fcrit. Generally, such flat sections imply that the corresponding clustering is statistically significant, because the length of the flat section could serve as a measure of the width of the blank area spacing different clusters. Then, the longest flat section is associated with the clustering with the resulting clusters maximally separated. Though it was possible to interrogate all the significant clustering results, which might provide a multiresolution insight into the data structure of the multivariate images, in the present study, we would inspect merely the clustering corresponding to the longest flat section. 
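The search for the longest flat section of a reliability curve can be sketched as follows. The grid of Fcrit values and the cluster counts in the example are invented for illustration and are not the data behind Figure 5a; the function name `longest_plateau` and the choice to measure plateau length as the difference between its last and first Fcrit values are assumptions.

```python
def longest_plateau(f_values, n_clusters, exclude_single=True):
    """Return (f_start, f_end, cluster_count) of the longest flat section.

    f_values: ascending F_crit grid; n_clusters: cluster number found at each value.
    Plateaus with a single cluster are skipped when exclude_single is True.
    """
    best = None
    i = 0
    while i < len(n_clusters):
        j = i
        # extend the run while the cluster number stays the same
        while j + 1 < len(n_clusters) and n_clusters[j + 1] == n_clusters[i]:
            j += 1
        skip = exclude_single and n_clusters[i] == 1
        width = f_values[j] - f_values[i]
        if not skip and (best is None or width > best[0]):
            best = (width, f_values[i], f_values[j], n_clusters[i])
        i = j + 1
    return None if best is None else best[1:]

# Hypothetical reliability curve over part of the 1.5-4.0 range.
f_grid = [1.5, 1.7, 1.9, 2.1, 2.3, 2.5, 2.7, 2.9, 3.1, 3.3, 3.5, 3.7, 3.9]
counts = [8, 5, 5, 3, 3, 3, 3, 2, 2, 1, 1, 1, 1]
print(longest_plateau(f_grid, counts))  # (2.1, 2.7, 3)
```

In this made-up example the three-cluster plateau spans 2.1 to 2.7 and is selected, while the equally long single-cluster plateau at large Fcrit is ignored because it carries no useful information.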
It was also found that, provided the Fcrit value is taken at the longest plateau, it did not influence the results significantly. Therefore, one could choose any value at the longest plateau for the Fcrit value. For simplicity of reading the value, we would take the outset value of the plateau for Fcrit value throughout the study. With Fcrit taking the value of 1.9 that corresponds to the outset of the longest flat section, the shift of the dissimilarity values during the agglomeration clustering process is shown in Figure 5b. As can be seen from Figure 5b, there are two leaps in the dissimilarity curve, one at the 607th pixel and the other at the 805th pixel. This indicates that the agglomeration of the first cluster ends with 607
pixels, while the second cluster terminates the merging of pixels as 198 remaining pixels are allocated into it. One also observes that there are several needlelike parts along the dissimilarity curve. These parts are attributed to the merging of a pixel in a separate segment in the image into the current cluster that may induce a sharp decrease in dissimilarity. The spectra for each cluster were then subjected to PCA to extract the first PC, giving the representative spectrum of the cluster. Panels c and d in Figure 5 depict the PCs for different clusters. It is clear that the first PC of cluster 1 gives an excellent approximation to the pure spectrum of crystal form II, indicating that pixels allocated into this cluster could be identified as crystal form II. Similarly, one observed that the pure spectrum of crystal form I coincides with the representative spectrum of cluster 2, evidencing the identity of cluster 2 as crystal form I. The third cluster consists of ∼90 pixels, which shares ∼10% of the total pixels under consideration. The spectra within the third cluster differ from each other substantially and exhibit the spectral characteristics of both polymorphs. This means that the third cluster is attributed to the mixture of these two crystal forms. As a matter of fact, it can be observed from Figure 3 that the first cluster, i.e., the spectra of crystal form II, is distributed in the right side, the second cluster of the spectra of crystal form I in the left side, and the third cluster of mixture spectra located between them, which coincide with the geometry of the bilinear data with substantial model errors. The spectra among the fourth cluster are contaminated to such a degree with the measurement errors that the spectral characteristics were severely distorted, which could be considered as the outliers. 
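The extraction of a representative spectrum from each cluster and the construction of a correlation map (Step 8 of the procedure) can be sketched as below. The sketch assumes that the first PC is taken from the uncentered cluster spectra, so that it approximates the spectral shape shared by the cluster, and that the correlation coefficient is the Pearson coefficient; the paper does not spell out either detail. The function names and the synthetic 4 × 4 image are hypothetical.

```python
import numpy as np

def representative_spectrum(cluster_spectra):
    """First principal component (no centering) of a (pixels, wavenumbers) block."""
    _, _, Vt = np.linalg.svd(cluster_spectra, full_matrices=False)
    pc1 = Vt[0]
    # The SVD leaves the sign arbitrary; orient the PC like the mean spectrum.
    return pc1 if pc1 @ cluster_spectra.mean(axis=0) >= 0 else -pc1

def correlation_map(spectra, reference, shape):
    """Pearson correlation of every pixel spectrum with `reference`, as an image."""
    Xc = spectra - spectra.mean(axis=1, keepdims=True)
    rc = reference - reference.mean()
    r = (Xc @ rc) / (np.linalg.norm(Xc, axis=1) * np.linalg.norm(rc))
    return r.reshape(shape)

# Synthetic 4 x 4 image: left two columns "A", right two columns "B", small noise.
t = np.arange(50)
a = np.exp(-(t - 15) ** 2 / 20.0)
b = np.exp(-(t - 32) ** 2 / 20.0)
is_b = np.arange(16) % 4 >= 2
rng = np.random.default_rng(2)
spectra = np.where(is_b[:, None], b, a) + 0.01 * rng.standard_normal((16, 50))
ref_a = representative_spectrum(spectra[~is_b])
cmap_a = correlation_map(spectra, ref_a, (4, 4))
print(np.round(cmap_a, 2))
```

Pixels belonging to the cluster that supplied the reference spectrum show correlation coefficients close to 1, while pixels of the other component show much lower values, which is what gives the correlation maps their contrast.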
To visualize the spatial distribution of the different polymorphs over the tablets, the correlation coefficients between the spectra at all pixels and the representative spectrum of each cluster, i.e., each crystal form, were calculated and plotted. Panels e and f in Figure 5 show the correlation coefficient maps of the multivariate image with respect to the representative spectrum of each crystal form. It is observed in Figure 5e that most of the area in the correlation coefficient map with reference to the representative spectrum of crystal form II is colored black, indicating relatively high correlation. This implies that the black area displays the spatial distribution of crystal form II and that the tablet composition is dominated by crystal form II. The area colored white is less correlated with the representative spectrum of crystal form II, meaning that the white area represents the spatial distribution of
Figure 7. (a) Dissimilarity curve for the tablet containing 75 wt % crystal form I. (b) The correlation map of the tablet where the gray scale represents the correlation coefficients with respect to the representative spectra of cluster 1. (c) The correlation map of the tablet with respect to the representative spectra of cluster 2.
Figure 8. (a) Dissimilarity curve for the tablet containing 100 wt % crystal form I. (b) The correlation map of the tablet where the gray scale represents the correlation coefficients with respect to the representative spectra of cluster 1.
Figure 9. (a) Dissimilarity curve for the tablet containing 0 wt % crystal form I. (b) The correlation map of the tablet where the gray scale represents the correlation coefficients with respect to the representative spectra of cluster 1.
crystal form I. In contrast, the colors in Figure 5f are almost the inverse of those in Figure 5e because the reference spectrum is changed. Similar analysis was performed on the tablets containing 50, 75, 100, and 0 wt % crystal form I. The dissimilarity curves during the corresponding clustering processes are shown in Figure 6a, Figure 7a, Figure 8a, and Figure 9a, respectively. It is observed that the curves contain several leaps, whose number reflects the number of clusters in the image data. By inspecting the representative spectrum of each cluster, one could determine the chemical identity of the clusters. Correlation maps of the spectra
in the image with reference to the chemically significant representative spectrum of an individual cluster are also shown in Figure 6b and c, Figure 7b and c, Figure 8b, and Figure 9b. Note that the gray scales are of considerable importance in the interpretation of the correlation maps. It can be observed that the gray scales in the correlation maps of the mixture tablets span 0.86 to 1.00, while those for the pure crystal forms show only insignificant variation between 0.97 and 1.00. Only when the range of the gray scale is sufficiently large can the gray levels represent the spatial allocation of an individual component. As one sees in Figure 6b and Figure 7b, the black areas signify the
distribution of crystal form I in the imaging domain. In contrast, the gray levels in Figure 8b and Figure 9b are of little chemical significance and might depend entirely on the measurement noise. Building on the qualitative results obtained from the agglomeration clustering, one could further quantify the content of the two polymorphs in each tablet. Provided that the distribution of the different crystal forms is adequately uniform and the imaging area is much larger than the geometric dimensions of the crystal particles, the surface area occupied by each crystal form is proportional to its volume percent. In other words, the volume percent of each component in the tablet could be estimated from the number of pixels allocated to the clusters of the different polymorphs. Nevertheless, because the imaging area is very limited, the distribution of the different components may not be sufficiently uniform within it; a more reliable way to determine the composition is to repeat the Raman mapping experiments in multiple areas of the tablets and average the estimated contents. In the present study, the Raman mapping experiments were repeated in five different areas of the tablets, and each mapping area was ∼90 × 90 µm². To avoid the effect of rearrangement in tablet handling, the powders of the two crystal forms should be agitated sufficiently so that the distribution of the two components is uniform enough. Based on this method, the volume percent of crystal form I in the tablets containing 25, 50, and 75 wt % crystal form I was estimated to be 27.1, 52.0, and 75.6%, respectively. Using the densities of the polymorphs, i.e., 1.31 and 1.27 g/mL for crystal forms I and II, respectively, the weight percent of crystal form I in the corresponding tablets was then calculated, as shown in Table 1.
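The quantification steps described above (pixel counting, averaging over mapped areas, and conversion from volume percent to weight percent) can be sketched as follows. This is an illustrative calculation only: the pixel counts are hypothetical, the assumption that only pixels assigned to the two polymorph clusters are counted is ours, and the printed values are not taken from Table 1. The densities and the reported volume percents are those quoted in the text.

```python
def volume_percent_from_counts(n_form_I, n_form_II):
    """Volume percent of form I estimated from pixel counts in one area.

    Assumption: only pixels assigned to the two polymorph clusters are
    counted; mixed and outlier pixels are left out.
    """
    return 100.0 * n_form_I / (n_form_I + n_form_II)

def volume_to_weight_percent(vol_pct_I, rho_I=1.31, rho_II=1.27):
    """Convert volume percent of form I to weight percent.

    rho_I and rho_II are the densities (g/mL) quoted in the text.
    """
    mass_I = vol_pct_I * rho_I
    mass_II = (100.0 - vol_pct_I) * rho_II
    return 100.0 * mass_I / (mass_I + mass_II)

# Hypothetical (form I, form II) pixel counts for five mapped areas
# of one tablet; these numbers are invented for illustration.
counts = [(230, 640), (250, 610), (215, 655), (245, 630), (240, 625)]
vol_pcts = [volume_percent_from_counts(a, b) for a, b in counts]
mean_vol_pct = sum(vol_pcts) / len(vol_pcts)
print(f"mean volume percent over five areas: {mean_vol_pct:.1f}")

# Conversion of the volume percents reported in the text.
reported_vol_pcts = [27.1, 52.0, 75.6]
weight_pcts = [volume_to_weight_percent(v) for v in reported_vol_pcts]
for v, w in zip(reported_vol_pcts, weight_pcts):
    print(f"{v:.1f} vol % -> {w:.1f} wt %")
```

Because the two densities differ by only about 3%, the conversion shifts each estimate by less than one percentage point.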
It is clear that these estimates of weight percent are in good agreement with the actual values, demonstrating that the clustering of the multivariate image was capable of offering a quantitative determination of the contents of both polymorphs. This also provides confirmatory evidence that the proposed MIS-enhanced Raman mapping technique is a viable approach for characterizing the microscopic spatial distribution of multiple drug polymorphs. To further confirm the experimental results, the allocation of the pixels was validated using the SIMCA method (ref 20) with a training set of pure component spectra. In the validation, 20 spectra were collected for each crystal form to construct the training set. An individual soft model for each crystal form was obtained with the first principal component of these 20 spectra. Given a significance level of 5%, the F statistic value for SIMCA discrimination was 1.13. Then, the spectrum at each pixel was discriminated one by one using the SIMCA method. Such validation uses a supervised learning algorithm to identify the crystal forms and thus offers independent supporting information for the obtained results. To demonstrate the advantage of the developed MIS method, K-means clustering was also applied to the multivariate image data. For comparison, the number of clusters in K-means clustering was selected to be consistent with that used in the proposed method. Because the performance of K-means clustering is strongly dependent on the initialization of the cluster centers, the K-means algorithm was run 10 times and the best results were reported. The analysis
results of K-means clustering are also summarized in Table 1. As can be seen, the proposed spatially directed clustering method gave more accurate estimates of the contents of the different polymorphs than the K-means algorithm. In addition, the SIMCA-based validation of the K-means clustering of spectra for the tablets containing 25 wt % crystal form I revealed that 35 spectra with obvious spectral characteristics of crystal form I were allocated into the cluster associated with crystal form II, and 45 spectra characteristic of crystal form II were misclassified into the cluster corresponding to crystal form I. In contrast, for the developed MIS method, merely 12 spectra of crystal form I were misclassified as crystal form II, and the number of spectra of crystal form II incorrectly allocated to crystal form I was 17. These results suggest that the proposed method could offer superior performance to the K-means algorithm. This might be because the K-means algorithm does not take the relationship between pixels in the spatial domain into account, which seems to play a key role in the treatment of multivariate image data.
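The K-means baseline with repeated random initialization can be sketched as follows. This is a plain NumPy illustration, not the code used in the study: the function names are hypothetical, the data are synthetic, and selecting the "best" of the 10 runs by the lowest within-cluster sum of squares is an assumption, since the selection criterion is not stated in the text. Note that the pixel coordinates never enter the calculation, which is the limitation discussed above.

```python
import numpy as np

def kmeans_once(X, k, rng, n_iter=100):
    """One run of Lloyd's K-means from a random choice of k data points."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance of every spectrum to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Keep the old center if a cluster happens to become empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)
    inertia = float(d[np.arange(len(X)), labels].sum())
    return labels, centers, inertia

def kmeans_best_of(X, k, n_runs=10, seed=0):
    """Repeat K-means n_runs times and keep the run with lowest inertia
    (assumed criterion for the 'best' result)."""
    rng = np.random.default_rng(seed)
    runs = [kmeans_once(X, k, rng) for _ in range(n_runs)]
    return min(runs, key=lambda run: run[2])

# Synthetic stand-in for pixel spectra: two well-separated groups.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(0.0, 0.1, size=(30, 5)),
               data_rng.normal(3.0, 0.1, size=(20, 5))])
labels, centers, inertia = kmeans_best_of(X, k=2)
print("cluster sizes:", np.bincount(labels, minlength=2))
```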
(20) Wold, S. Pattern Recognit. 1976, 8, 127-139.
AC0520902
CONCLUSION
The present study demonstrated the application of Raman mapping analysis to tablets containing chloramphenicol palmitate polymorphs with the aid of a proposed multivariate image segmentation approach. This new approach incorporates spatial correlation into the agglomeration clustering, which provides the possibility of improving the stability to measurement errors and avoiding poorly defined image segmentation arising from clusters with highly separated pixels. The results verified that the proposed approach gave desirable image segmentation even in the case where the spectra of the two components are very similar to each other, and the resultant performance was superior to that obtained using the K-means algorithm. Furthermore, much insight into the polymorph tablets was achieved with the data-analysis-enhanced Raman mapping technique. The clustering of spectra into different groups revealed the presence of various polymorphs in the tablets. The chemical identity of the clusters was then evidenced by the representative spectra. Quantification of the content of the identified components could also be implemented by counting the pixels associated with each component. By constructing correlation maps between the pixel spectra and the representative spectra, one could visually evaluate the spatial distribution of the components in the tablets. In addition, experiments with drug tablets of varying weight percents and a supervised learning method were used to confirm these results. This approach is expected to be a promising tool for rapid microscopic characterization of samples in various chemical processes.
ACKNOWLEDGMENT
The work was financially supported by the National Natural Science Foundation of China (Grants 20375012, 20435010, 20205005) and the Ministry of Education (NCET-04-0768).
Received for review November 28, 2005. Accepted May 10, 2006.
Analytical Chemistry, Vol. 78, No. 17, September 1, 2006