Online Estimation of Diastereomer Composition Using Raman

Oct 9, 2008 - The estimation is obtained through a partial least square (PLS) regression model that utilizes online Raman spectroscopy and additional ...
4 downloads 6 Views 2MB Size
Online Estimation of Diastereomer Composition Using Raman: Differentiation in High and Low Slurry Density Partial Least Square Models Sze-Wing Wong,† Christos Georgakis,*,† Gregory D. Botsaris,† Kostas Saranteas,‡ and Roger Bakale‡

CRYSTAL GROWTH & DESIGN 2008 VOL. 8, NO. 12 4398–4408

System Research Institute for Chemical & Biological Processes and Department of Chemical & Biological Engineering, Tufts UniVersity, Medford, Massachusetts 02155, and Chemical Process Research and DeVelopment, Sepracor Inc., Marlborough, Massachusetts 01752 ReceiVed December 6, 2007; ReVised Manuscript ReceiVed February 28, 2008

ABSTRACT: This paper addresses the estimation of fractional solid composition of two diastereomers during crystallization. The estimation is obtained through a partial least square (PLS) regression model that utilizes online Raman spectroscopy and additional process information such as temperature and slurry density. Twelve PLS models were constructed with the same 95 calibration standards. The models differ from each other on whether they model all the data or that from one of two subsets and on whether they involve temperature or slurry density or both along with the spectral data. It is shown that in situ Raman spectroscopy is capable of differentiating diastereomers in a crystallization slurry, provided the changing process parameters of temperature and slurry density are also included in the estimation. Furthermore, models developed from one of the two subsets, which were classified by high and low slurry density, were more accurate than the corresponding model developed from the whole data set.

1. Introduction The production process of an active pharmaceutical ingredient (API) often necessitates the separation of enantiomers, which are chiral molecules, mirror images of each other. Since the physical properties of both enantiomers (R ) right-handed and S ) left-handed) are the same, a traditional separation method such as crystallization by seeding with only one of the two is feasible, but it becomes very sensitive to the experimental conditions.1,2 Industrial practice usually relies on the achiral synthesis of the enantiopure product or the reaction of the enantiomers with another chiral acid or base to produce two different diastereomers. Diastereomers are molecules that have two or more chiral centers but are not mirror images of each other (Figure 1). The resulting diastereomers have different physical properties such as solubility and often crystallize into different crystal structures. With these differences, crystallization can effectively separate the desired product with very high purity.3,4 The traditional analytical methods determining the diastereomer composition in both solid and liquid phase are chiral HPLC or polarimetry to determine the chirality in the solution phase. However, both of these analytical methods cannot be conducted online and require sampling and sample preparation. On the other hand, online Raman spectroscopy is suitable for this application since Raman can detect the lattice vibrations corresponding to the translational and rotational motion of the entire molecule within the lattice structure of the crystal.5 As a result, Raman spectroscopy is capable of differentiating similar molecules with different crystal lattice structures. The ability for online monitoring of the optical purity of the crystals present at a given time will greatly enhanced the understanding of crystallization and will enable one to significantly improve its operation. Several authors have demonstrated the ability of monitoring online the changing compositions of two different * Corresponding author. Tel: 617-627-2573. Fax: 617-627-3991. E-mail address: [email protected]. † Tufts University. ‡ Sepracor Inc.

Figure 1. Relationships diagram between enantiomers and diastereomers. R and S denoted the handedness of one of the two chiral centers.

crystals during a solvent-mediated polymorphic transformation with Raman.6-10,15 However, there is no previous work reported in the literature using Raman to differentiate the composition of the two diastereomers in crystallization. Furthermore, chemometric techniques can be applied to the Raman spectra to differentiate slight peak shifts and to remove noise from the signals off-line.11-13 The use of fiber optics to collect data through an immersion probe allows analysis of solid phase composition in real-time. The collected Raman signal depends on the amount of inelastic scattering from the solids that are detected by the analyzer within the detection zone. As a result, the relative Raman intensity corresponding to the diastereomers in the slurry will be impacted by a number of solid-state factors besides the primary one of particle characteristics. Several authors have suggested that the intensity of the returned Raman signal during crystallization of different polymorphs may be a function of particle size and shape.7,14 This is based on the assumption that the returned Raman signal primarily comes from the surface of the crystal. Additionally, slurry density defined as the total mass of crystals per unit volume (mg/mL) may yet be another solid-state factor since the number of crystals inside the detection zone will influence the intensity of the returned Raman signal. Furthermore, the Raman spectrum will be affected by the relative amount of the solvent and solids in its path. Consequently, slurry density should impact the returned Raman signal intensity. A

10.1021/cg701201x CCC: $40.75  2008 American Chemical Society Published on Web 10/09/2008

Online Estimation of Diastereomer Composition

Crystal Growth & Design, Vol. 8, No. 12, 2008 4399

Figure 2. Polarized-light microscopy of (left) (R,R)-AD agglomerate and (right) (S,R)-AD.

recent study15 showed that Raman spectra along with multivariate analysis were capable of predicting both the slurry density and percent composition of an anhydrous/monohydrate system. However, the authors did not take into account whether temperature has any affect on the Raman spectra in the prediction model because all the experiments were run at the same temperature. For a batch cooling crystallization process, temperature changes constantly, and it affects slurry density directly since the solubility of the solute is a function of temperature. In addition, temperature can alter Raman band areas by changing the sample density and therefore the number of analyte molecules in the sampled volume.16 Hence, in the present work, the first objective is to examine whether information provided by Raman spectroscopy is sufficient or whether it needs to be complemented by additional process measurements in order to provide an accurate estimation, through a partial least square (PLS) model, of the solid weight composition of one of the two diastereomers involved in the production of an API, denoted here as compound A. It can be observed (Figure 2) that both diastereomers have distinct particle sizes and shapes and this is an ideal system to test the above hypothesis that Raman spectroscopy can provide valuable information. The selection of process variables was based on the cooling crystallization procedure of compound A (a Sepracor proprietary compound) since the changing process parameters and percent composition of the diastereomers in the solid phase would affect the peak position and peak intensity. Temperature, slurry density, and the Raman spectrum were the variables selected in our modeling task. Partial least square regression (PLS) was the model structure used to estimate the composition of the solid diastereomeric mixture. The second objective was to examine whether dividing the data to subsets would allow us to build models that better represent data variations. Principal component analysis (PCA) was used to examine the overall data, identify two distinct clusters, and thus separate them into two different subsets that appeared to correspond to low and high slurry densities. Twelve different PLS models were developed from either the whole data set or the subsets of the training groups. They are compared for their estimation accuracy.

2. Experimental Section 2.1. Materials. HPLC grade solvents were used as received from commercial suppliers without further purification. The starting materials (racemic mixture of compound A and a pure enantiomer of a chiral acid, denoted here as D) that met the specifications were used as received from qualified suppliers without further purification.

2.2. Preparation of Pure (S,R)-AD Diastereomer. A racemic mixture of compound (R,S)-A was reacted with (R)-D (a pure enantiomer of the chiral acid) in a solvent mixture of well-defined composition, and the solution was heated and held at 5 °C above the saturated temperature to ensure complete dissolution. The solution was then slowly cooled and seeded with 2% by weight of the (S,R)-AD diastereomer at the specified seeding temperature. The seeded slurry was cooled to a target solid recovery temperature. It was then followed by a filtration wash and drying step. The end product was analyzed by a chiral HPLC method resulting in optical purity of at least 97% of the (S,R)-AD diastereomer. 2.3. Preparation of Pure (R,R)-AD Diastereomer. The preparation of pure (R,R)-AD diastereomer, needed for the calibration process, first involved the purification of the (R)-A enantiomer from the racemic mixture of compound A, followed by reaction with (R)-D to form the (R,R)-AD diastereomer. Since the crystallization kinetics of the (R,R)AD diastereomer is slow compared with the (S,R)-AD diastereomer, the purification step of the (R)-A enantiomers would ensure the optical purity of the (R,R)-AD diastereomer. For the purification step, a racemic mixture of compound A was reacted with another chiral acid, denoted as (S)-E (a pure enantiomer), in solvent, and the solution was heated and held at 5 °C above the saturated temperature to allow for complete dissolution. The solution was then slowly cooled and seeded with 2% by weight of the (R,S)-AE diastereomer at a specified seeding temperature. The seeded slurry was cooled to a target isolation temperature, followed by a filtration and drying step. The dry (R,S)AE crystals would then go through a free basing step to obtain the pure (R)-A enantiomer. Finally, the pure (R)-A enantiomer was reacted with (R)-D and crystallized to form the (R,R)-AD diastereomer. The product was analyzed by a chiral HPLC method, revealing in optical purity greater than 98%. 2.4. Raman Spectroscopy. A RamanRxn1 analyzer (Kaiser Optical System, Inc.) coupled with an immersion fiber optic probe was used for the in situ measurements. Raman spectra were recorded using NIR excitation radiation at 785 nm, and the spectroscopy incorporates the TE-cooled CCD detector technology. The Raman detector was set to have an exposure time of 5 s with five accumulation spectra. Each recorded Raman spectrum was the average of the five accumulation spectra in 1 min intervals. 2.5. Calibration Experiments. In order to obtain spectra of a known amount of solids in suspension (slurry density) and percent composition of the (S,R)-AD diastereomer in solid phase, the solution was first presaturated with respect to both diastereomers at specified temperatures (Table 1). The saturated solution was prepared by adding excess amounts of a racemic mixture of compound A (50% S and 50% R) and (R)-D in the solvent system, and it was heated until dissolution. After nucleation occurred upon cooling, the slurry was under constant stirring for 4 h to ensure it reached equilibrium with both diastereomers at the specified temperature, and the process was finished by a filtration step. The saturated solution was kept in a jacketed round-bottom flask to maintain constant temperature. Spectra of the standards were obtained for isothermal mixtures. The spectra were collected at different temperatures (from 0 to 40 °C) and with different slurry densities (from 13.3 to 80 g/L) that reflected the

4400 Crystal Growth & Design, Vol. 8, No. 12, 2008

Wong et al.

Table 1. Experimental Condition for Standards

in the calibration experiment is a compromise between ensuring a representative sampling of the slurry and adding a condenser to reduce solvent evaporation during the crystallization experiments. 2.6.1. Experiment 1: Unseeded Crystallization, Low Slurry Density. A total of 5 g of compound (R,S)-A (racemic mixture) was reacted with 1.7 g of (R)-D in 238 mL of solvent mixture. The solution was first heated to 40 °C and held for 20 min to allow for complete dissolution. It was then cooled to 0 °C at a rate of 5 °C/min, and afterward the temperature remained constant at 0 °C for 4 h. Since the crystallization kinetics of the (R,R)-AD diastereomer is much slower than the (S,R)-AD, a rapid cooling rate was applied in attempt to crash out both diastereomers simultaneously. After nucleation occurred, samples were drawn sequentially for HPLC analysis until the end of experiment. A total of six samples were drawn throughout the experiment. 2.6.2. Experiment 2: Unseeded Crystallization, High Slurry Density. A total of 10 g of compound (R,S)-A (racemic mixture) was reacted with 3.4 g of (R)-D in 238 mL of solvent. The solution was first heated to 56 °C and held for 20 min to allow for complete dissolution of the higher initial concentration. It was then cooled to 0 °C at a rate of 5 °C/min, and the temperature remained constant at 0 °C for 4 h. Samples were drawn for HPLC analysis, after nucleation occurred, sequentially until the end of experiment. A total of six samples were drawn throughout the experiment. 2.6.3. Experiment 3: Seeded Crystallization Experiment with (S,R)-AD Diastereomer. A total of 10 g of compound (R,S)-A (racemic mixture) was reacted with 3.4 g of (R)-D in 238 mL of solvent. The solution was first heated to 56 °C and held for 20 min to allow for complete dissolution. It was then cooled at a rate of 0.1 °C/min to 46 °C and then seeded with 2% by weight of the (S,R)-AD diastereomer. After seeding, the solution was cooled at 1 °C/min to 0 °C; the use of a slower cooling rate aims to minimize nucleation of the (R,R)-AD diastereomer. After secondary nucleation occurred, samples of the slurry were drawn for HPLC analysis until the solution temperature reached the final set point. A total of seven samples were drawn throughout the experiment. 2.6.4. Experiment 4: Seeded Crystallization Experiment with (R,R)-AD Diastereomer. Due to the slower crystallization kinetics of the (R,R)-AD diastereomer, the experiment was carried out in an (R)-A enriched solution environment. The (R)-A enriched solution was recovered from a (S,R)-AD seeded crystallization in which the recovered mother liquor contained around 80% of the (R,R)-AD diastereomer. The recovered mother liquor then went through a recovery process to obtained the (R)-A and (S)-A enantiomers. The recovered crystals were compound A with optical purity of 80% of the R enantiomer. Ten grams of the recovered compound A (R-enriched) was reacted with 3.4 g of (R)-D in 238 mL of solvent. The solution was first heated to 56 °C and held for 20 min to allow for complete dissolution. It was

fixed parameter fixed parameter 20 °C 40 °C 30 °C 10 °C 0 °C 0 °C 0 °C 15 °C 15 °C 15 °C 5 °C 5 °C 5 °C 5 °C 10 °C 10 °C 10 °C 10 °C

13.3 g/L 13.3 g/L 13.3 g/L 13.3 g/L 13.3 g/L 33.3 g/L 20 g/L 26.7 g/L 40 g/L 53.3 g/L 50 g/L 60 g/L 70 g/L 80 g/L 50 g/L 60 g/L 70 g/L 80 g/L

variable parameter

no. of samples

0-100% (S,R)-AD 0-100% (S,R)-AD 0-100% (S,R)-AD 0-100% (S,R)-AD 0-100% (S,R)-AD 80-95% (S,R)-AD 70-100% (S,R)-AD 50-100% (S,R)-AD 65-85% (S,R)-AD 80-100% (S,R)-AD 15-95% (S,R)-AD 15-95% (S,R)-AD 15-95% (S,R)-AD 15-95% (S,R)-AD 5-95% (S,R)-AD 5-95% (S,R)-AD 5-95% (S,R)-AD 5-95% (S,R)-AD

16 6 6 6 8 4 4 3 3 3 5 5 5 5 5 5 5 5

range of the crystallization process conditions. The Raman probe was inserted top-down into the 15 mL vial with constant volume of the saturated solution and with a known amount of solids (Figure 3a). All the spectra were collected under constant stirring with a magnetic stir bar to suspend the slurry. A total of 95 standards were used with varying conditions (Table 1) and were then divided into two groups, training and testing groups. The training group(s) was used to construct the PLS model, while the testing group(s) was used to test the accuracy of the model. The testing group(s) was randomly selected samples at different conditions and on different experiments to cover the whole experimental space. 2.6. Dynamic Crystallization Experiments. Four batch crystallization experiments were performed with different operating conditions to test the validity and the limits of the prediction model in estimating the percent of diastereomers in the solid phase. Two of the crystallization experiments were unseeded and have different initial solute concentration and hence represent different changes in the slurry density of the systems. The other two crystallization experiments were seeded and had the same initial solute concentration but were designed to cover the extreme ranges of the percent diastereomers by seeding with one of the two diastereomers. All the crystallization experiments were carried out in the same crystallizer, a 250 mL jacketed round-bottom flask that was under constant stirring with a magnetic stir bar, and the jacket was connected to a temperature-controlled chiller. The Raman probe and the thermocouple were inserted at a 45° angle at opposite sides of the reaction flask, and the spectra were collected at 1 min intervals (Figure 3b). The reason for placing the probe at a 45° angle instead of top-down as

Figure 3. Equipment set up (left) in the calibration experiment and (right) in the crystallization experiment.

Online Estimation of Diastereomer Composition

Crystal Growth & Design, Vol. 8, No. 12, 2008 4401

Figure 4. Score plot of PC2 vs PC1 (PC1 captures 91% and PC2 captures 5% of data variance). The dotted set (---) incorporated all the low slurry density data point and the solid line set (s) includes all the high slurry density data. The numbers indicate the slurry density of each data point. then cooled at a rate of 0.1 °C/min to 46 °C and seeded with 2% by weight of the (R)-D diastereomer. After seeding, the solution was cooled at 1 °C/min to 0 °C and remained constant at 0 °C for 6 h. After secondary nucleation occurred at 0 °C, samples of the slurry were drawn for HPLC analysis until the end of the experiment. A total of seven samples were drawn throughout the experiment. 2.6.5. Sample Analysis. The collected samples were first filtered, and the mother liquor was submitted for HPLC analysis for total solute concentration and percent composition of the two diastereomers in solution phase. The wet cake did not go through solvent wash to avoid dissolution of the crystals during the wash. The cake was then weighed and vaccuum-dried at 40 °C overnight. The dry cake was again weighed, and the solid contents were again submitted to a chiral HPLC analysis for percent composition of the diastereomers in solid phase. Since the evaporated solvent contained residual of both diastereomers in the solution phase, the amount of solute in the evaporated solvent was calculated with the weight difference of the cakes and multiplied by the concentration of solute in solution from HPLC analysis. The relative composition of the evaporated residual diastereomers from the solvent was obtained from the HPLC result in the mother liquor samples. The percent composition of the (S,R)-AD diastereomer was then corrected with the residual solute from the solution phase. 2.7. Sampling Error. Since the HPLC analysis for the percent composition of the diastereomers in solid phase was conducted after drying, the sampling error was due primarily to inhomogeneous mixing of the dry cake. The estimation of the sampling error was conducted by mixing a known amount of both diastereomers in a presaturated solution and following the same filtration and drying steps as described above. Six different samples were drawn from different locations of the dry cake, and the standard deviation of the sampling error was calculated. The procedure was repeated five times with different compositions of both diastereomers, and the average standard deviation was estimated to be (6%. 2.8. Data Pretreatment. Data pretreatment using the calculation of the first derivative of the spectral measurement with respect to the spectral frequencies was performed on the whole Raman spectrum to correct any possible baseline offset from the slurry. It can be observed that there is slight baseline drifting from the raw Raman spectra (Figure 5) and using first derivative was sufficient to remove the baseline offset. The first derivative of all the spectra was computed with the Savitzky-Golay method using second-degree polynomial fit and an 11-point window.14 It was found that the 11-point window was sufficient to achieve a high level of smoothing without attenuating the extrema in the data. The higher order derivative did not show any significant improvement in model accuracy, and hence the first derivate of the spectral measurement was chosen as the only data pretreatment step.

The data structure, Dj (j ) 1,2,3,4), of each of the four PLS models was applied to either the whole training group or one of the two subsets within the training group that is composed of different combinations of measurements such as the entire Raman spectrum (spectrum range 75-3300 cm-1), temperature, or slurry density as shown in eqs 1-4. The percent composition of the diastereomers was the output variable of each of the PLS models.

D1 ) [spectra(S)]

(1)

D2 ) [temp(T)spectra(S)]

(2)

D3 ) [slurry density(D)spectra(S)]

(3)

D4 ) [temp(T)slurry density(D)spectra(S)]

(4)

In order to compare the contribution of each input variable equally, temperature (T), slurry density (D), percent diastereomer composition (P), and spectra (S) were first scaled to have zero mean and variance equal to one as shown in eqs 5-10.

Tˆi )

Ti - Tavg σT

(5)

ˆi) D

Di - Davg σD

(6)

Pˆi )

Pi - Pavg σP

(7)

∫ fi(w) dw

(8)

Fi )

n

Favg )

∑ Fi i)1

n fi(w) - Favg S˜i ) σF

(9) (10)

ˆ , Pˆ, and Sˆ represented the scaled values, obtained by Here Tˆ, D subtracting the sample average values (Tavg, Davg, Pavg, and Favg) and dividing by the corresponding sample standard deviation (σ). The spectral data were scaled slightly differently because they are a distribution over the frequency range, but in the same spirit. The scaling was done using the average value Favg and the standard deviation σF of the area Fi under the spectral curve fi(w); see eqs 8-10. It should

4402 Crystal Growth & Design, Vol. 8, No. 12, 2008

Wong et al.

Figure 5. Graphical representation of residual and leverage projection onto the PCA model.17 be noted that fi(w) is the spectrum function with respect to the wavenumber w that uniquely corresponds to the spectral frequency.

3. Estimation Model 3.1. Principal Component Analysis. PCA is a useful data analysis tool that enables us to evaluate the weak or strong crosscorrelation among the measured variables, and to enable the dimensionality reduction of the data.18 The analysis results in projecting the data onto a new space defined by the calculated principal components (PCs). These PCs are linear combinations of the original measurement variables from the data set. Because the majority of the data variability can be effectively represented by a much smaller number of PCs than the number of the original variables, one thus obtains a substantial reduction in dimensionality of the original data. The use of PCA on the total data set, with every available measurement, provides information about the clustering or lack of clustering of the data to possible subsets. It provides information whether two local linear submodels might be more accurate than a single overall linear model in representing a possibly nonlinear process. This can also be useful in the development of the PLS models. The PCA analysis showed that only two PCs were needed to represent 96% of the data variability. The score plot (Figure 4) showed how the original data points project onto the two PCs. This plot indicated how samples are related to each other and clearly reveals that there are two clusters. The dotted circle groups data with slurry density between 13 and 40 mg/mL, while the solid line circle groups data with slurry density between 50 and 80 mg/mL. These clusters suggested two different linear models, one for each cluster, might be needed to represent the data more accurately than a single linear model for all the data. 3.2. Partial Least Square Model. The PLS modeling approach makes use of inverse calibration, which does not require a reference spectrum of the pure components.18 In order to investigate whether one or two linear model(s) should be used, the original calibration data set of the standards was separated as follows:

TG-All ) 80 data points Test-1 ) 15 data points TG-LD ) 46 data points Test-2 ) 11 data points TG-HD ) 40 data points Test-3 ) 4 data points

(11) (12) (13) (14) (15) (16)

The first training group (TG-All) included all the data points except the randomly selected 15 data points in the testing group

(Test-1) to be used for model testing. The testing groups included standard points taken from different days of the experiment with varying experimental conditions to ensure they are representative of later samples. In addition, the data points from the testing groups were set aside during model development. The second training group (TG-LD) included data points with slurry density between 13.3 and 40 mg/mL, and TG-HD included data points with slurry density between 50 and 80 mg/ mL. The second and third testing groups had the same data points as Test-1 but further separated into two groups according to slurry density. Since there were three training groups and within each group there were four PLS models (eqs 1-4), 12 PLS models were developed and compared in terms of prediction accuracy from the test data to test the prediction accuracy. 3.3. SIMCA. In order to find out whether one or two linear model(s) has better representation of the data set, the PLS models from TG-All and the PLS models from either TG-LD or TG-HD would be compared with their prediction accuracy of the dynamic crystallization experiments. For the above dynamic crystallization experiments, a decision needs to be made in whether to use the PLS model from the TG-HD or the TGLD data. This is resolved by the SIMCA (soft independent modeling of class analogies) test, which is applied in order to classify which subset of the training group data was more similar to the data of the dynamic crystallization experiments by calculating the sample-to-model distance for the two possible models. SIMCA uses PCA to model the shape and position of the new data from each of the two training groups.17-19 Once the PCA models for TG-LD and TG-HD are developed, the new samples from the crystallization experiments are projected onto each of the model spaces and the sample-to-model distances were calculated. The model with the shortest sample-to-model distance is chosen and is compared with the PLS models from TG-All. The sample-to-model distance calculation (eq 19) takes into account the sample residual, Q, and the sample leverage, T2, with respect to each model (Figure 5). The sample residual indicates how well the sample conformed to the PCA model (eq 17), while the sample leverage indicates how far the sample was from the center of the PCA model (eq 18). Again, the selection between the two training groups was based on the shortest sample-to-model distance calculated from eq 19.

Qi ) xi(I - PkPkT)xiT

(17)

Ti2 ) xiPkΛk-1PkTxiT

(18)

di ) √Qi + Ti2

(19)

where xi is the sample, Pk is the loadings vector of the PCA model, and Λk is the eigenvalue associated with the k number of PCs retained in the model. The calculation was repeated for each PCA class model from both subsets.

4. Result and Discussions 4.1. Raman Spectra of Pure Diastereomers. The Raman spectra of the pure diastereomer in Figure 6 show that there was only a slight difference between them. The circled regions of the spectra highlight the slight peak shifts between the diastereomers. Hence, chemometric techniques need be employed to account for the subtle differences in the whole spectrum. In addition, the spectra of the pure (S,R)-AD diastereomer were compared at different temperatures (Figure

Online Estimation of Diastereomer Composition

Crystal Growth & Design, Vol. 8, No. 12, 2008 4403

Figure 9. RMSECV vs PC for the four PLS models from training group TG-All. Figure 6. Raman spectra of pure (R,R)-AD and (S,R)-AD solids at the same temperature and slurry density. The circled regions of spectra are the slight differences between the diastereomers.

Figure 7. Raman spectra of pure (S,R)-AD with the same slurry density at different temperatures.

Figure 8. Raman spectra of pure (S,R)-AD with different slurry densities.

7) and with different slurry densities (Figure 8). The relative intensity differs slightly, and there are no peak shifts observed due to temperature changes. However, the Raman spectra of samples with different slurry density showed differences in peak positions and peak shape. The denser sample shown in Figure 8 had more distinct peak shapes that resembled the Raman spectrum of pure solid. The appearance of the peak shapes was an indication that the Raman analyzer detected a higher amount of solids as slurry density increased. The Raman spectra of the standards and of the crystallization experiments had a baseline offset problem (Figure 6). Several data preprocessing techniques such as multiscatter correction

(MSC), standard normal variate (SNV), and differentiation with the Savitzky-Golay method were applied to the raw spectra.17,18 The result from the first derivative worked best in removing the baseline offset and thus was chosen as the only preprocessing step after scaling as described in section 2.7. 4.2. PLS Calibration Model. Twelve PLS models were developed to investigate whether the additional process measurements of temperature and slurry density would improve the accuracy of the estimation model.20,21 In addition, the question of whether two linear models (TG-LD and TG-HD) would provide higher accuracy than using a single model (TG-All) representing all the data will be answered. Each of the PLS models was developed from one of the four data matrices (eqs 1-4) that either neglect or account for the variability in the additional process variables and were repeated for each of the three training groups (TG-All, TG-LD, and TG-HD). All twelve PLS models were first tested with the corresponding testing groups (eqs 12, 14, and 16), and the model performance was further examined with the data from the dynamic crystallization experiments. It should be noted that each of the calibration experiments only offered a static view of the crystallization since solute concentration of both diastereomers remained constant throughout the experiment; the solute and crystals were in phase equilibrium. In contrast, in the dynamic crystallization experiments, the solute concentration of the diastereomers in the solution phase was constantly changing as nucleation and crystal growth occurred. The estimation models were calculated by regressing the data matrix of each training group with the corresponding percent composition of the (S,R)-AD diastereomer using partial least square (PLS) regression. Each training group was further divided into m subgroups with two data points per subgroup for crossvalidation. For example, there are 43 subgroups in the training group of TG-All, 23 subgroups in the training group of TGLD, and 20 subgroups in the training group of TG-HD. The cross-validation method used (m - 1) subgroups to build the model and tested with the last subgroup as validation step. The process was repeated m times with all combinations of subgroups as training and testing sets. The number of latent variables (LVs) selected for each PLS model was based on the cross-validation error E1, provided that the percent variance captured in the response data Y was more than 90%. The crossvalidation error E1 is defined by (eq 20).

E1 )

√(

n

)

∑ (yies - yiex)2 i)1

⁄n

(20)

It should be noted that it was more difficult to determine the appropriate number of LVs to be used in the PLS models that incorporated all the data (TG-All). The E1 in Figure 9 flattened

4404 Crystal Growth & Design, Vol. 8, No. 12, 2008

Wong et al. Table 3. Information from Four PLS Models from Low Slurry Density Data (TG-LD)

E1 % Y variance no. of LVs process variables

model 5

model 6

model 7

model 8

5.10 94% 8 S

6.64 96% 7 T, S

6.70 96% 6 D, S

6.18 97% 7 T, D, S

Table 4. Information from Four PLS Models from High Slurry Density Data (TG-HD)

Figure 10. RMSECV vs PC for the four PLS models from training group TG-LD.

Figure 11. RMSECV vs PC for the four PLS models from training group TG-HD.

E1 % Y variance no. of LVs process variables

model 9

model 10

model 11

model 12

7.74 91% 4 S

7.70 92% 4 T, S

7.90 92% 4 D, S

7.90 92% 4 T, D, S

data points from TG-LD (Figure 13) and TG-HD (Figure 14) all lie relatively close to the 45° line indicating small prediction error, while the model data points from TG-All (Figure 12) have larger scatterings along the 45° line suggesting larger prediction error. Furthermore, the regression coefficients of temperature and slurry density were reported in Table 5 to illustrate the relative weight that temperature and slurry density has on the estimation models. It is evidenced that the regression coefficients of slurry density weight relatively more than the regression coefficients of temperature, which shows quantitatively the importance of slurry density in the model. The last step in assessing the accuracy of the PLS models is to calculate the prediction error with the testing groups (eqs 12, 14, and 16) that were set aside during model development. All 12 PLS models were used to predict the percent composition

Table 2. Information from Four PLS Models from All Data (TG-All)

E1 % Y variance no. of LVs process variables

model 1

model 2

model 3

model 4

6.45 97% 11 S

7.28 96% 11 T, S

6.55 95% 11 D, S

4.47 97% 14 T, D, S

as the number of LVs increased with no clear minimum, and as a result, the number of LVs selected was based mostly on the percent Y variance captured. Finally, the relative percent error (E2i) and the root-mean-square, E3, were also used to compare the model performance of the different PLS models (eqs 21 and 22).

E2i )

|

(yies - yiex)

E3 )

yiex

√(

n

|

× 100%

)

∑ (E2i)2 i)1

⁄n

(21)

Figure 12. Predicted value of Y vs measured value of Y for model 4 from model with all data (TG-ALL).

(22)

In eqs 20 and 21, n is the number of the total data points used in the calculation of the cross-validation and crystallization experimental results; the superscripts, es and ex, are the model estimation and the experimental result of yi (% (S,R)-AD), respectively. From the results of the cross-validation, the number of latent variables was selected with the minimum error E1 (Figures 9–11) and the percent variance at least 90%. The detailed information of the models is summarized in Tables 2-4. In addition, the predicted values, yp, of Y (% (S,R)-AD) were plotted with the measured values, ym, of Y (% (S,R)-AD) in Figures 12-14 to check the accuracy of the models. It was observed that the model

Figure 13. Predicted value of Y vs measured value of Y for model 8 from low slurry density data (TG-LD).

Online Estimation of Diastereomer Composition

Crystal Growth & Design, Vol. 8, No. 12, 2008 4405 Table 7. Detail Information of the PCA Model

no. of PCs % variance

Figure 14. Predicted value of Y vs measured value of Y for model 12 from high slurry density data (TG-HD). Table 5. Regression Coefficients of Temperature and Slurry Density in Models 4, 8, and 12 model

temperature

slurry density

4 8 12

-4.41 1.545 1.523

-20.539 -25.885 -4.977

Table 6. Model Accuracy, E1, for Testing Data training group

model with S

model with T, S

model with D, S

model with T, D, S

TG-All TG-LD TG-HD

9.30 9.91 5.08

9.73 10.53 5.18

8.28 8.78 5.73

8.16 9.60 5.43

of the (S,R)-AD diastereomer with the corresponding testing group (Table 6). The E1 values calculated from the testing groups were similar to the E1 values obtained from the crossvalidation results (Tables 2-4), which indicated that there was no overfitting of the models. The result showed that the PLS models that incorporated slurry density or with slurry density and temperature along with Raman spectral data performed slightly better than the models using only spectral data with or without temperature data (Table 6). Although the trend of the model prediction error in TG-HD was slightly different from the other groups, the relative differences in the prediction error were well within the sampling error of 6%. 4.3. Dynamic Crystallization Experiment. In order to test the models accuracy and the limit of the estimation models, the PLS models were used to predict the percent composition of the (S)-D diastereomer in four dynamic crystallization experiments. While the standards from the calibration set were premixed slurries in 15 mL vials, the new crystallization experiment was a time-evolving or dynamic run in a 250 mL round-bottom jacketed flask. Samples were drawn sequentially at different time instances after the onset of nucleation (sample collection began after seeding in the seeded experiment) and were submitted for HPLC analysis. During the dynamic crystallization experiments, Raman and temperature data were collected and the slurry densities were calculated from the withdrawn samples by weight analysis. According to the present understanding of the specific experimental system examined in the study, the (R,R)-AD diastereomer has a wider metastable zone and a slower growth rate compared with the (S,R)-AD diastereomer. Besides assessing the accuracy of the different PLS models, the unseeded crystallization (experiments 1 and 2) experiments aimed to investigate whether the (R,R)-AD diastereomer would crystallize out simultaneously with the (S,R)-AD diastereomer in an

PCA-LD

PCA-HD

3 95

3 97

unseeded environment. The HPLC analysis revealed that as nucleation occurred, both diastereomers initially crystallized out. However, due to the slower crystallization kinetics of the (R,R)AD diastereomer, the percent composition of the (S,R)-AD diastereomer slowly increased over time. On the other hand, the (R,R)-AD and (S,R)-AD seeded crystallization experiments (experiments 3 and 4) were designed to get samples with wider ranges of the percent composition of the (S,R)-AD or the (R,R)AD diastereomer to verify the prediction models. 4.3.1. SIMCA Result. As stated earlier, SICMA is used as a mean to decide which linear model (TG-LD or TG-HD) should be applied to the crystallization experiments to answer the question of whether one or two linear model(s) better represent the data variance as shown in Figure 4. Two PCA models were developed for each of the training group (TG-LD and TG-HD). The number of PC determined for each model essentially defined the multidimensional boundary regions of the data variance for each model (Table 7). Since the crystallization experiments were conducted at a larger scale (250 mL) where the mixing environment was different from the calibration standards (15 mL), the sample projection onto the PCA space appeared to be further apart from the models. Thus, the model selection was based on the closest sample-to-model distance rather than setting a fixed boundary using F-test statistic. The sample-to-model distance was calculated for each sample from the experiments (eq 19). Figures 15-18 plotted the sampleto-model distance relative to each model for all four experi-

Figure 15. Sample-to-model distance plot for the low density unseeded crystallization experiment (experiment 1).

Figure 16. Sample-to-model distance plot for the high density unseeded crystallization experiment (experiment 2).

4406 Crystal Growth & Design, Vol. 8, No. 12, 2008

Wong et al. Table 10. E1 of PLS Models from TG-LD or TG-HD model (S) model (T, S) model (D, S) model (T, D, S) expt expt expt expt

1 2 3 4

(LD) (HD) (HD) (LD)

18.9 7.7 8.9 38.3

10.8 8.0 9.4 35.3

11.1 10.1 9.0 5.7

7.4 9.9 8.8 5.4

Table 11. E3 PLS Models from TG-LD or TG-HD model (S) model (T, S) model (D, S) model (T, D, S) expt expt expt expt

1 2 3 4

(LD) (HD) (HD) (LD)

Figure 17. Sample-to-model distance plot for the (S,R)-AD seeded crystallization experiment (experiment 3).

22.79 11.37 10.45 133.93

13.01 12.42 10.67 122.68

13.14 13.23 10.53 20.67

8.99 12.99 9.99 19.62

Table 12. Prediction Values and E2i for Experiment 1 model 2 (TG-All)

model 8 (TG-LD)

HPLC

predicted

% error

predicted

% error

73.8 82.0 88.5 87.5 88.0 82.8

105.1 96.0 78.7 98.7 92.6 94.9

42.4 17.1 11.1 12.7 5.3 14.6

64.9 73.9 79.0 80.0 82.2 80.4

12.1 9.8 10.8 8.6 6.6 2.9

Table 13. Prediction Values and E2i for Experiment 2 model 4 (TG-All)

Figure 18. Sample-to-model distance plot for the (R,R)-AD seeded crystallization experiment.

model 9 (TG-HD)

HPLC

predicted

% error

predicted

% error

65.6 72.6 60.3 67.5 74.1 81.7

4.7 21.2 14 18.8 34.1 26

92.9 70.8 76.7 72.2 54 68.2

73.9 70.5 72.0 67.4 68.1 71.1

12.6 2.8 19.3 0.1 8.2 12.9

Table 8. Model Selection (SIMCA Result) Exp 1

Exp 2



TG-LD TG-HD

Exp 3





model 1 (S) model 2 (T, S) model 3 (D, S) model 4 (T, D, S) 1 2 3 4

19.6 76.8 23.0 26.5

16.2 87.5 36.5 12.8

34.2 84.0 14.2 35.9

model 3 (TG-All)



Table 9. E1 of PLS Models from TG-All expt expt expt expt

Table 14. Prediction Values and E2i for Experiment 3

Exp 4

41.6 50.9 17.2 37.5

ments. It was clear from the plots which model should be applied to the particular experiment (Table 8). An interesting note on the (R,R)-AD seeded crystallization experiment (Figure 18) was that the low-density models (TG-LD) were preferred even when the starting solute concentration was the same as the (S,R)-AD seeded experiment (experiment 3) and the high-density unseeded experiment (experiment 2). The reason for that was because the crystallization kinetics of the (R,R)-AD was very different from the (S,R)-AD. Because of the slower crystallization kinetics and the different solubility in solvent mixture, the slurry density of the (R,R)-AD seeded experiment was lower than the others throughout the experiment. 4.3.2. PLS Models Estimation Result. The PLS model predictions were compared with the HPLC data as showed in Tables 9-15 in which E1 and E2i, along with E3, were used as means to compare the models. The first observation made in comparing models among the three different training groups (TG-All, TG-LD, and TG-HD), was from the E1 result in Tables 9 and 10. It was observed that the model predictions with two models (Table 10) corresponding to the lower or higher slurry density data were much more accurate compared with the model predictions from an overall model (Table 9). Furthermore,

model 12 (TG-HD)

HPLC

predicted

% error

predicted

% error

89.2 89.4 92.7 93.1 90.6 91.1 91.6

73.2 73.0 65.8 72.3 80.7 79.7 84.0

17.9 18.4 29.0 22.3 10.9 12.5 8.3

69.8 84.2 91.1 93.1 94.0 90.8 91.5

21.7 5.8 1.7 0.0 3.7 0.3 0.1

Table 15. Prediction Values and E2i for Experiment 4 model 2 (TG-All)

model 8 (TG-LD)

HPLC

predicted

% error

predicted

% error

24.3 29 28 26.5 29.9 31 34

37.2 29.3 26.0 23.8 11.1 15.4 14.7

53.2 1.0 7.2 10.2 62.9 50.2 56.9

22.6 31.5 23.3 37.6 30.9 38.2 33.7

6.9 8.7 16.7 41.7 3.3 23.3 0.9

Tables 12-15 compared the best prediction model in Tables 9 and 10 of each experiment. It is evident that the estimation models from two training groups have better prediction accuracy in terms of E1, E2i, and E3 (eqs 20-22). Second, the E1 results from the TG-All models in Table 9 suggested that one model did not sufficiently represent the data variance of both the high and low slurry densities clusters (Figure 4). This observation may be explained with the two different trends in model performance displayed in Table 9. On the one hand, the E1 values in experiments 1 and 4 indicated that the PLS model that incorporated temperature and spectra gave much better accuracy. According to the SIMCA result, the samples from

Online Estimation of Diastereomer Composition

Crystal Growth & Design, Vol. 8, No. 12, 2008 4407

5. Conclusion

Figure 19. Score plot of PC1 vs. PC2 in the PCA analysis of all standards data points (Table 1) and samples from experiment 2. The solid line circles the location of the samples, the dotted circle indication location of the high-slurry density data points, and the dashed circle groups the low-slurry density data points.

experiments 1 and 4 have closer relation to the low slurry density calibration data points (Table 8). On the other hand, the result from experiments 2 and 3 followed a different trend in that the model that incorporated at least slurry density as one of the inputs has better performance and that both experiments have closer resemblance to the high-density calibration data points (Table 8). This inconsistency in the model performance trend suggested that using one linear model was clearly insufficient to model all the data variances. The superiority of using two models (TG-LD and TG-HD) that also include slurry density and temperature as inputs is clearly demonstrated (Table 10). When nucleation occurred or as the seeds were added in the supersaturated solution, there was only a thin layer of slurry in the solution. As the experiment progressed, the solution continue to cool at a set rate and more material crystallized out since the system tried to reach equilibrium between the two diastereomers and with the solution phase. As a result, both temperature and slurry density changed over the course of the experiment. The result clearly confirmed that the inclusion of slurry density and temperature into the PLS model improves the accuracy of estimation. Furthermore, one may observe a clear trend in E1 and E3 that the model accuracy increases from left to right in both Table 10 and Table 11. Model accuracy increased with the incorporation of process parameters. Also, the estimation models that included at least the slurry density had the lowest prediction error value. The result agreed with the observation made earlier with the regression coefficients in Table 5 in that slurry density has a relatively large weight in the model and it may be used as a correction factor to improve model accuracy. However, there was one exception in which model 9 (data matrix included Raman spectra only) has the smallest E1 and E3 that contradicted the above statement (Tables 10 and 11). This difference in trend could be understood by performing PCA on both the original calibration data points and the samples in experiment 2. It was shown in Figure 19 that the samples themselves formed another cluster far from the original calibration data. In addition, the sample-to-model distance (Figure 16) is the largest compared with the other experiments, which suggested that the samples might be close to the model prediction limits. It would also explain why the E1 of all the PLS models from TG-All (Table 8) in experiment 2 have higher than normal values.

Twelve PLS models were constructed with the same 95 calibration standards using Raman spectra, temperature, or slurry density data or combinations of these. The models were further tested and compared against data from a 250 mL scaled crystallization experiment. It has shown that the in situ Raman spectroscopy is capable of differentiating diastereomers in a crystallization slurry, provided that the changing process parameters of temperature and slurry density are included in the calibration model. In addition, it was shown that PCA was effective in screening the initial data and helped in deciding whether one or additional training groups would be needed. As multiple PLS models were constructed with separate training groups, the sample-to-model distance proved to be useful in identifying which PLS model should be used to estimate results from the dynamic crystallization experiments. Finally, it is argued that the accuracy of the estimation model should increase if the model accounts for as many of the critical process parameters that are changing during the experiment. In the case of cooling crystallization, slurry density and temperature are the critical process parameters that need be included in the modeling task. As a result, it is essential to find an alternative online measurement from which to infer slurry density. Acknowledgment. Financial support from Sepracor Inc in the form of an internship to Sze-Wing Wong and the availability of their experimental facilities is greatly appreciated. Financial support of the consortium on Batch Manufacturing of Pharmaceuticals and Chemicals at Tufts University is also greatly appreciated.

References (1) Qian, R. Y.; Botsaris, G. B. A New Mechanism for Nuclei Formation in Suspension Crystallizers: The Role of Interparticle Forces. J. Chem. Eng. Sci. 1997, 52, 3429–3440. (2) Elsner, M. P.; Menendez, D. F.; Muslera, E. A.; Seidel-Morgenstern, A. Experimental Study and Simplified Mathematical Description of Preferential Crystallization. Chirality 2005, 17, S183-S195. (3) Jacques, J., Collet, A., and Wilen, S. H., Enantiomers, Racemates, and Resolution; Wiley: New York, 1981. (4) Wilen, S. H.; Collet, A.; Jacques, J. Strategies in Optical Resolutions. Tetrahedron 1977, 33, 2725–2736. (5) Ferraro, J. R. Low-Frequency Vibrations of Inorganic and Coordination Compounds; Plenum Press: New York, 1971. (6) Wang, F.; Wachter, J. A.; Antosz, F. J.; Berglund, K. A. An Investigation of Solvent Mediated Polymorphic Transformation of Progesterone Using in Situ Raman Spectroscopy. Org. Process Res. DeV. 2000, 4, 391–395. (7) O’Sullivan, B.; Barrett, P.; Hsiao, G.; Carr, A.; Glennon, B. In Situ Monitoring of Polymorphic Transitions. Org. Process Res. DeV. 2003, 7, 977–982. (8) Ono, T.; Horst, H. T.; Jansens, P. J. Quantitative Measurement of the Polymorphic Transformation of L-Glutamic Acid Using in Situ Raman Spectroscopy. Cryst. Growth Des. 2004, 4, 465–469. (9) Hu, Y.; Liang, J. K.; Myerson, A. S.; Taylor, L. S. Crystallization Monitoring by Raman Spectroscopy: Simultaneous Measurement of Desupersaturation Profile and Polymorphic Form in Flufenamic Acid System. Ind. Eng. Chem. Res. 2005, 44, 1233–1240. (10) Scho¨ll, J.; Bonalumi, D.; Vicum, L.; Mazzotti, M.; Muller, M. In Situ Monitoring and Modeling of the Solvent-Mediated Polymorphic Transformation of L-Glutamic Acid. Cryst. Growth Des. 2006, 4, 881– 889. (11) Falcon, J. A.; Berglund, K. A. In Situ Monitoring of Antisolvent Addition Crystallization with Principal Components Analysis of Raman Spectra. Cryst. Growth Des. 2004, 4, 457–463. (12) Rades, T.; Pratiwi, D.; Fawcett, J. P.; Gordon, K. C. Quantitative Analysis of Polymorphic Mixtures of Ranitidine Hydrochloride by Raman Spectroscopy and Principal Components Analysis. Eur. J. Pharm. Biopharm. 2002, 54, 337–341.

4408 Crystal Growth & Design, Vol. 8, No. 12, 2008 (13) Starbuck, C.; Spartalis, A.; Wai, L.; Wang, J.; Fernandez, P.; Lindemann, C. M.; Zhou, G. X.; Ge, Z. Process Optimization of a Complex Pharmaceutical Polymorphic System via in Situ Raman Spectroscopy. Cryst. Growth Des. 2002, 2, 515–522. (14) Zhou, G., Wang, J., Ge, Z., Sun, Y. Ensuring Robust Polymorph Isolation Using In-Situ Raman Spectroscopy. Am. Pharm. ReV. 2002, Winter. (15) Caillet, A.; Puel, F.; Fevotte, G. In-line Monitoring of Partial and Overall Solid Concentration during Solvent-mediated Phase Transition using Raman Spectroscopy. Int. J. Pharm. 2006, 307, 201–208. (16) Pelletier, M. J. Quantitative Analysis Using Raman Spectrometry. Appl. Spectrosc. 2003, 57, 20A–42A.

Wong et al. (17) Madden, H. H. Comments on the Savitzky-Golay Convolution Method for Least-Squares Fit Smoothing and Differentiation of Digital Data. Anal. Chem. 1978, 50, 1383–1386. (18) Beebe, K. R.; Pell, R. J.; Seasholtz, M. B. Chemometrics: A Practical Guide; John Wiley & Sons, Inc.: New York, 1998. (19) Esbensen, K. H. MultiVariate Data Analysis, 5th ed.; CAMO Process AS: Trondheim, Norway, 2002. (20) Chemometrics Software I: PLS _Toolbox 3.5 by Eigenvector Research, Inc., Manson, WA. (21) Chemometrics Software II: The Unscrambler from Camo Inc., Trondheim, Norway.

CG701201X