Chemometrics for Analytical Data Mining in ... - ACS Publications

Mar 11, 2014 - The separation process for recovery of natural products from plants very ... from analytical data of process streams is demonstrated in...
Article pubs.acs.org/IECR

Chemometrics for Analytical Data Mining in Separation Process Design for Recovery of Artemisinin from Artemisia annua

Chandrakant R. Malwade, Haiyan Qu, Ben-Guang Rong,* and Lars P. Christensen

Department of Chemical Engineering, Biotechnology and Environmental Technology, University of Southern Denmark, Campusvej 55, DK-5230, Odense M, Denmark

ABSTRACT: The separation process for recovery of natural products from plants very often employs multiple separation techniques, and the key to the success of such processes is to find the synergy between the different separation techniques. Molecular level understanding of process streams is essential to determine the synergy between unit operations, and it can be attained through analysis of process streams using advanced process analyzers such as high performance liquid chromatography (HPLC), liquid chromatography−mass spectrometry (LC-MS), etc. The use of such process analyzers often generates an enormous amount of data, making it difficult to extract useful information. Therefore, application of chemometrics for extracting process information from analytical data of process streams is demonstrated in this work. The multivariate data analysis technique PARAFAC is used to extract chemical information such as the number of impurities, their relative concentrations, and finally their identification from rather complex analytical chromatograms of flash column chromatography (flash CC) fractions during purification of artemisinin from the crude extract of Artemisia annua. A crude extract of A. annua leaves obtained with dichloromethane is used in this work. Prior to the application of PARAFAC, the data set is preprocessed to remove baseline drift and peak misalignment caused by retention time shifts due to matrix effects. The process information extracted from analytical chromatograms by using the PARAFAC technique indicated the presence of impurities ranging from coumarins, polyacetylenes, and flavonoids to artemisinin related compounds in the flash CC fractions.

Industrial & Engineering Chemistry Research
Received: December 15, 2013. Revised: March 3, 2014. Accepted: March 11, 2014. Published: March 11, 2014
dx.doi.org/10.1021/ie404233z | Ind. Eng. Chem. Res. 2014, 53, 5582−5589
© 2014 American Chemical Society

1. INTRODUCTION

Natural products classified as secondary metabolites commonly serve as defense mechanisms in plants. They are used to fight off herbivores, pests, and pathogens; thus, many have interesting biological activities per se.1 The secondary metabolites form a vast pool of diverse chemical entities and have been known for many years as a major source of drugs or of lead compounds in drug discovery for the pharmaceutical industry. The role of secondary metabolites as an inspiration for the discovery of novel drugs with well-known or new modes of action is evident from the several high-selling drugs obtained directly or indirectly from natural sources.2 Prominent examples of drugs obtained from plants, such as the anticancer drugs paclitaxel from Pacific yew (Taxus brevifolia) and vincristine from Madagascar rosy periwinkle (Catharanthus roseus), the analgesic drug morphine from opium poppy (Papaver somniferum), and cardiac glycosides such as digitoxin from Digitalis purpurea for the treatment of cardiac arrhythmias and congestive heart failure, illustrate the benefits the pharmaceutical industry has reaped from natural products.3,4 Artemisinin is another prominent natural drug and is obtained from the dried leaves of the plant Artemisia annua. Artemisinin is an effective medicine against drug resistant malaria and has been recommended by the World Health Organization (WHO) in combination with other antimalarial medicines.5 The chemical synthesis of many important secondary metabolites has been developed more or less successfully, but chemical synthesis is still unable to compete with the biosynthetic machinery of plants because large scale production by chemical synthesis is not yet feasible for many natural products.6 Thus, plants remain a very attractive source of many important natural products. Therefore, recovery of natural products from plants has significant value for the pharmaceutical as well as the food and cosmetics industries.

However, recovery of natural products from plants is one of the most challenging tasks due to the presence of many impurities, the often low concentration of the target compound in the source, and the lack of knowledge about the properties of the target compound as well as the impurities. Therefore, separation process design for recovery of natural products from plants requires a multidisciplinary approach to deal with different aspects of the problem such as identification of impurities, selection of separation techniques and operating conditions, determination of equilibrium data, etc. In our previous work,7 we proposed a conceptual process synthesis methodology consisting of multiple separation techniques for recovery of natural products from plants. This methodology enables generation of process flow sheet alternatives by employing different separation techniques in different sequencing orders. However, designing the optimal separation process depends very much on an understanding, at the molecular level, of the process streams flowing between different unit operations, which in turn requires detailed analysis of process streams by using process analyzers. Therefore, in accordance with the guidance draft released by USFDA for the pharmaceutical industry,8 we have incorporated the Process Analytical Technology (PAT) framework into the methodology. The PAT framework consists of different tools such as advanced process analyzers (HPLC, LC-MS, GC-MS, NMR, UV, IR, Raman spectroscopy, etc.) for identification and quantification of impurities in various process streams, along with chemometrics to extract useful information from the large amount of noisy data generated by the analysis, which is very often the case during purification of natural products.

Chemometrics is a discipline concerned with the application of statistical and mathematical methods to complex analytical chemical data to extract more meaningful information rapidly.9 Application of chemometrics for extracting relevant chemical information from complex analytical data is gaining widespread acceptance to address problems in chemistry, biochemistry, medicine, biology, chemical engineering, and natural products chemistry.10 In natural products research, several applications of chemometric methods such as chemical fingerprinting of crude extracts, quality control of herbal medicines, HPLC method design, plant metabolomics, quantitative structure activity relationship (QSAR) studies, and extraction solvent design have been reported earlier.11−16 Although chemometric methods are widely used in bioprocess engineering17 and in the crystallization process at the manufacturing level,18 their applications at the interface of chemical process engineering and natural product chemistry are relatively limited, especially at the conceptual process design level. The objective of the present work is to apply chemometric methods to extract useful information from the large amount of analytical data obtained from analysis of various process streams, which not only presents the molecular level information of the systems but also improves the understanding of the separation characteristics of the target compound and its associated impurities.

Our preliminary lab scale experiments have confirmed that it is feasible to combine chromatography and crystallization operations to recover artemisinin from dried leaves of the plant A. annua as shown in Figure 1. It has been found that the chemical composition of fractions obtained by chromatographic separation is the key to obtaining a synergistic effect between these two operations.7 Therefore, the fractions obtained by flash CC containing artemisinin are subjected to detailed analysis by HPLC equipped with a diode array detector (DAD) in order to get insight into the chemical composition of the fractions. However, HPLC measurements yield a large amount of data with a wide range of impurities having sensitivity at different UV wavelengths. Therefore, in this work the multivariate data analysis technique PARAFAC19 is used to extract process information such as the number of chemical components present in the fractions, their relative concentrations, and pure UV spectra, which otherwise would take enormous time to extract manually from the crowded chromatograms. Major impurities recognized in the flash CC fractions are then identified by LC-MS/MS and quantified by HPLC-DAD (Figure 1). The algorithms used in the present work are executed in MATLAB R2010b software, and calculations are performed on an IBM PC with an Intel CORE i7 1.60 GHz processor and 4 GB of installed memory (RAM).

Figure 1. Process flow diagram for recovery of artemisinin from the plant Artemisia annua.

2. MATERIALS AND METHODS

2.1. Materials. Dried leaves of Artemisia annua were provided by Aarhus University, Department of Food Science, Denmark, and stored in the dark at room temperature until used as described previously.7 Organic solvents dichloromethane (purity ≥ 99.8%), ethyl acetate (purity ≥ 99.8%), and n-hexane (purity ≥ 99.9%) of HPLC grade obtained from VWR Chemicals, Denmark, were used in the experiments. Silica gel 60 (0.04−0.063 mm) purchased from Merck, Darmstadt, Germany, was used as the stationary phase for flash column chromatography (flash CC) experiments.

2.2. Extraction of Artemisinin from Dried Leaves of Artemisia annua. Artemisinin was extracted from dried leaves of A. annua by using the maceration technique, which involved immersion of the dried leaves of A. annua into the solvent with intermittent agitation. Dichloromethane was used for the extraction taking into consideration the solubility of artemisinin and ease of recovery. The extraction procedure included immersion of 150 g of dried leaves into 1.5 L of fresh solvent at room temperature followed by filtration after 6 h. The procedure was repeated with 1 L of fresh solvent, and the combined extract (2.5 L) was evaporated to obtain 12.5 g of crude extract containing artemisinin.

2.3. Purification of Crude Extract by Flash Column Chromatography. The crude extract obtained in the previous step was partially purified using flash CC to obtain artemisinin rich fractions. A total of 15 g of crude extract was separated on a 7 cm diameter column filled with normal phase silica conditioned in n-hexane. An adsorbent (silica gel) to solute (crude extract) ratio of 20:1 was used. Ethyl acetate and n-hexane were used as the mobile phase. Gradient elution was used to run the column under applied pressure. The gradient started with 100% n-hexane followed by n-hexane and ethyl acetate mixtures of composition 90, 80, 70, and 60% n-hexane by volume. In total, 40 fractions of 100 mL each were collected.

2.4. Analysis of Flash CC Fractions. All fractions were initially analyzed by thin-layer chromatography (TLC) to identify the fractions containing artemisinin. The fractions containing artemisinin (23 to 31) were then analyzed by analytical HPLC on a Dionex UltiMate 3000 Rapid Separation LC (RSLC) system consisting of an HPLC pump (LPG3400SD), an autosampler with sample cooler [WPS-3000(T)SL Analytical], a column compartment (TCC-3000SD), and a diode array detector (DAD-3000). A ZORBAX Eclipse XDB-C18 reverse phase column (dimensions 150 × 4.6 mm, particle size 5 μm, Phenomenex, Denmark) was used for separation. The mobile phase consisted of water and acetonitrile with 0.1% formic acid as modifier. Gradient elution was used starting with 1% acetonitrile, slowly increasing stepwise up to 99% in 66 min, as mentioned in our previous work.7 The column temperature was adjusted to 35 °C. A sample injection volume of 10 μL and an eluent flow rate of 0.8 mL/min were used. Chromatograms were recorded at wavelength resolution of 1 nm in the range 190−600


Figure 2. Exemplary set of chromatograms of fraction 27 measured at UV wavelengths from 190 to 415 nm. (a) Three-dimensional view; (b) two-dimensional view from retention time axis; and (c) two-dimensional view from wavelength axis.

3.1.1. Baseline Correction. Baseline drift is a common problem encountered during the measurement of chromatograms, and it is important to remove baseline drift prior to the application of any multivariate data analysis technique, especially in cases where identification and quantification of peaks is required. In practice it is difficult to avoid this problem during the measurements because many parameters contribute to it, such as fluctuations in UV lamp intensity, pump pressure, and column temperature, contaminated mobile phase, inadequate mixing during gradient generation, and isocratic blending, or their combination. Therefore, it is handled post-measurement either by subtracting the blank sample or by subtracting a polynomial fitted to the baseline points from the original chromatogram. In the present work, a cubic spline algorithm,21 which interpolates a polynomial baseline fit with the help of selected data points on the original chromatogram and subtracts it from the original chromatogram, was used to remove baseline drift. The selected data points on the chromatograms should be free of any chemical signal or noise. The baseline correction algorithm was applied individually to each sample matrix containing chromatograms measured at different wavelengths. Figure 3a shows the chromatograms of fraction 27 measured at 225 different UV wavelengths (190−415 nm) before baseline correction, and Figure 3b shows the same chromatograms after baseline correction. Figure 3b clearly shows negative peaks appearing in the retention time interval 56−66 min. The observed negative peaks are the result of data points selected on the noisy signals in that interval to interpolate a polynomial and its subsequent subtraction from the original chromatograms. However, we have confirmed with the blank run that the negative peaks do not represent the chemical signals of interest, and therefore they are not included in the PARAFAC analysis.

3.1.2. Retention Time Shift Alignment. Retention time shift alignment is another important pretreatment step before the application of multivariate data analysis techniques. In this work, the Interval Correlation Optimized Shifting algorithm (icoshift),22 which uses a piecewise linear correction function based on an insertion/deletion (I/D) model and optimizes the piecewise cross correlation using the fast Fourier transform, was used

nm. LC-MS data were generated on a LTQ XL (Linear Quadrupole 2D Ion Trap, LTQ20992, Thermo Scientific, U.S.A.) mass spectrometer operated in APCI positive mode and attached to an Accela HPLC pump and a DAD operating from 200 to 600 nm. Settings for the mass spectrometer were 50, 5, and 5 (arbitrary units) for sheath, auxiliary, and sweep gas flow rates (N2), respectively, a vaporizer temperature of 450 °C, a discharge current of 5 μA, a capillary temperature of 275 °C, a capillary voltage of 16 V, a tube lens of 35 V, and AGC target settings of 3 × 10^4 and 1 × 10^4 for full MS and MS/MS, respectively. Other parameters were the same as for analytical HPLC.

2.5. Data Set Obtained from Chromatograms. The original chromatograms obtained consisted of 19802 data points on the retention time axis (0−66 min) and 410 data points on the wavelength axis (190−600 nm). In order to reduce the computational time, every 10th data point on the retention time axis was taken, thereby reducing the size of the matrix containing chromatograms for one sample at different wavelengths to 1981 × 410. Furthermore, preliminary inspection of the chromatograms of all samples together showed that there was no chemical signal until 450 data points (i.e., 15 min) on the retention time axis and also after 225 data points (i.e., 415 nm) on the wavelength axis. Therefore, these parts of the chromatograms were removed, thereby further reducing the size of the data set to 1532 × 225. An exemplary set of baseline corrected chromatograms measured at 225 wavelengths for one sample is shown in Figure 2. Such matrices for 9 samples (fractions 23−31) were stacked one above another to form a three-way data set of size 9 × 1532 × 225.
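The data reduction described in Section 2.5 can be sketched in a few lines of NumPy. This is only an illustration on synthetic random matrices, not the authors' MATLAB code. The function name `reduce_chromatogram` and the exact cropping index are assumptions: dropping the first 449 thinned time points is simply what reproduces the reported 1532 rows from 1981.

```python
import numpy as np

N_SAMPLES = 9            # fractions 23-31
N_TIME_RAW = 19802       # retention time points (0-66 min)
N_WAVELENGTH_RAW = 410   # wavelength points as reported in the text


def reduce_chromatogram(raw, step=10, n_dropped=449, n_wavelengths=225):
    """Thin the retention time axis, then crop the signal-free regions.

    n_dropped=449 is an assumption chosen to match the reported size
    (1981 - 449 = 1532 rows); the text only says "until 450 data points".
    """
    thinned = raw[::step, :]                     # every 10th point -> 1981 rows
    return thinned[n_dropped:, :n_wavelengths]   # 1532 x 225


# Synthetic stand-in for the measured HPLC-DAD matrices (no real data here).
rng = np.random.default_rng(0)
reduced = [
    reduce_chromatogram(rng.random((N_TIME_RAW, N_WAVELENGTH_RAW), dtype=np.float32))
    for _ in range(N_SAMPLES)
]

# Stack the per-sample matrices into a three-way array: sample x time x wavelength.
data = np.stack(reduced, axis=0)
print(data.shape)  # (9, 1532, 225)
```

Thinning before cropping follows the order given in the text; with real data the raw matrices would be loaded from the instrument export instead of being generated.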

3. RESULTS AND DISCUSSION

3.1. Preprocessing of Data. The data set obtained after chromatogram measurements was subjected to preprocessing to remove artifacts such as baseline drift and retention time shift, which are introduced by fluctuations in the performance of instrument components and by matrix effects, without losing the chemical signals.20


set to 10^−6 (refs 19 and 23). In order to reduce the computation time further, the preprocessed data set was divided into 14 intervals of retention time containing chemical signals of interest as shown in Figure 5. PARAFAC was then applied to the individual intervals to determine the total number of chemical components present in each interval, their relative concentrations in all samples, and their pure UV spectra. Exemplary results from PARAFAC analysis of interval 8 containing the signal for artemisinin are shown in Figure 6. A one-component unconstrained PARAFAC model was fitted to the raw data of interval 8, and it was concluded with the help of diagnostics such as explained variance, visualization of retention time and UV mode loadings, and systematic variation in residuals that a one-component model was enough to fit the data. The explained variance of 99.36% for interval 8 indicates that 99.36% of the structured variation in the raw data is modeled by the one-component PARAFAC model and can be visualized in a more condensed form through the loadings of the fitted model. Fitting of a one-component model implies the presence of one chemical component in the interval. Figure 6a represents the raw data interval containing signals for artemisinin in all fractions at wavelengths ranging from 190 to 415 nm, while Figure 6b−d represents the retention time, sample, and UV mode loadings, respectively, obtained after fitting the one-component PARAFAC model to the interval. Likewise, all intervals were modeled by the PARAFAC technique individually, and the results obtained are summarized in Table 1, including the identity of the major natural products occurring as impurities in the individual fractions, which were identified by LC-MS/MS measurements. Corresponding UV spectral mode loadings and relative concentration profiles of the individual components present in the modeled intervals are shown in Figures 7 and 8, respectively.
Figure 9 presents the fractionation sequence of the target compound and the associated impurities of the flash CC. 3.3. Process Information Obtained from PARAFAC Analysis. 3.3.1. Chemical Contents of Flash CC Fractions. Application of PARAFAC technique enabled process information to be retrieved such as number of impurities present in the fractions, their relative concentrations, and UV profiles relatively

Figure 3. Exemplary chromatograms of fraction 27 measured at UV wavelengths from 190 to 415 nm. (a) before baseline correction; (b) after baseline correction.
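The idea behind the baseline correction illustrated in Figure 3 can be sketched as follows. Note the substitution: the paper uses a cubic spline through selected baseline points, whereas this sketch uses the simpler alternative that the text also mentions, a low-order polynomial fitted to baseline points (`np.polyfit`). The signal, the anchor indices, and the function name `subtract_baseline` are all hypothetical.

```python
import numpy as np


def subtract_baseline(time, signal, anchor_idx, degree=3):
    """Fit a polynomial to signal-free anchor points and subtract it.

    Simplified stand-in for the cubic spline interpolation used in the paper.
    """
    coeffs = np.polyfit(time[anchor_idx], signal[anchor_idx], degree)
    baseline = np.polyval(coeffs, time)
    return signal - baseline, baseline


# Synthetic chromatogram at one wavelength: linear drift plus one Gaussian peak.
time = np.linspace(15.0, 66.0, 1532)                       # minutes
drift = 0.02 * (time - 15.0) + 0.5
peak = 10.0 * np.exp(-0.5 * ((time - 51.0) / 0.15) ** 2)
signal = drift + peak

# Hypothetical anchor points, chosen by hand where no peak is present.
anchor_idx = np.array([0, 200, 400, 600, 800, 1300, 1531])

corrected, baseline = subtract_baseline(time, signal, anchor_idx)
print(f"largest baseline error: {np.max(np.abs(baseline - drift)):.2e}")
```

As the text notes for the 56−66 min region, anchor points placed on noisy or peak-containing regions distort the fitted baseline and can create negative peaks after subtraction, so the choice of anchors matters more than the choice of interpolant.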

to align the retention time shifts. Since this method is applicable to 1-way or 2-way data only, the 3-way chromatogram data set was unfolded to form a 2-way data set of size 1532 × 4050. For the alignment purpose, intervals containing most apparent peaks as shown in Figure 4a were selected, and each interval was aligned individually by using icoshift algorithm. The “max reference chromatogram” (Figure 4a shown in red) automatically generated by icoshift was used as the reference for alignment of all intervals. The aligned chromatograms are shown in Figure 4b, and the magnified region of the first peak clearly shows that all chromatograms have been aligned to the reference chromatogram shown in red in Figure 4a. After alignment, the data set is folded back into a 3-way data set. 3.2. PARAFAC Modeling. The alternating least-squares (ALS) algorithm is used to find the solution of PARAFAC model iteratively. To speed up the algorithm, the problem is initialized by using alternating trilinear decomposition (ATLD) approximation, and the convergence criterion for relative change in fit is

Figure 4. Retention time shift alignment of chromatograms to the reference chromatogram shown in red by using icoshift algorithm. (a) Before alignment; (b) after alignment.
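The core of interval-wise alignment, finding the shift that maximizes the cross correlation computed with the fast Fourier transform, can be sketched as below. This is not the icoshift implementation: it applies only a rigid circular shift per interval (no insertion/deletion model), and the reference is taken as the chromatogram with the largest total signal in the interval, which is an assumption about how a "max" reference could be chosen. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def best_shift(reference, segment, max_shift):
    """Shift (in points) of `segment` that maximizes its circular
    cross correlation with `reference`, computed via the FFT."""
    n = len(reference)
    xcorr = np.fft.irfft(np.fft.rfft(reference) * np.conj(np.fft.rfft(segment)), n)
    lags = np.concatenate([np.arange(0, max_shift + 1), np.arange(-max_shift, 0)])
    candidates = np.concatenate([xcorr[: max_shift + 1], xcorr[-max_shift:]])
    return int(lags[np.argmax(candidates)])


def align_interval(chromatograms, start, stop, max_shift=20):
    """Align one retention time interval of every chromatogram (rows)."""
    aligned = chromatograms.copy()
    block = chromatograms[:, start:stop]
    # Assumed reference: the row with the largest summed signal in the interval.
    reference = block[np.argmax(block.sum(axis=1))]
    for i, segment in enumerate(block):
        shift = best_shift(reference, segment, max_shift)
        # np.roll wraps around; acceptable here because interval edges are ~0.
        aligned[i, start:stop] = np.roll(segment, shift)
    return aligned


# Three synthetic chromatograms with the same peak at slightly different times.
x = np.arange(200)
make_peak = lambda center, height: height * np.exp(-0.5 * ((x - center) / 4.0) ** 2)
chroms = np.vstack([make_peak(97, 1.0), make_peak(100, 2.0), make_peak(104, 1.5)])

aligned = align_interval(chroms, start=50, stop=150)
print(np.argmax(chroms, axis=1), "->", np.argmax(aligned, axis=1))
```

Restricting the search to `max_shift` points keeps a peak from being matched to a neighbouring peak, which is also why the alignment is done interval by interval rather than on whole chromatograms.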


Figure 5. Chromatograms of flash CC fractions divided into 14 retention time intervals.

Figure 6. Exemplary results from one component PARAFAC model fitted to interval 8 containing signal for artemisinin. (a) Original data; (b) retention time mode loadings; (c) relative concentration profile of artemisinin; and (d) UV spectral mode loadings.
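A compact sketch of PARAFAC fitted by alternating least squares, applied to a synthetic one-component interval, may help to read Figure 6. Under the trilinear model each data element is approximated as x_ijk ≈ Σ_f a_if · b_jf · c_kf, with a, b, and c the sample, retention time, and UV mode loadings. The sketch departs from the paper in one stated respect: it starts from random loadings rather than an ATLD initialization. The synthetic data, function names, and array sizes are assumptions; the stopping rule uses the relative change in fit of 10^−6 given in the text.

```python
import numpy as np


def khatri_rao(U, V):
    """Column-wise Kronecker product; row u * V.shape[0] + v holds U[u] * V[v]."""
    return np.einsum("if,jf->ijf", U, V).reshape(-1, U.shape[1])


def parafac_als(X, n_components, n_iter=500, tol=1e-6, seed=0):
    """Unconstrained PARAFAC by alternating least squares.

    X has shape (samples, retention times, wavelengths). Random start is an
    assumption of this sketch; the paper initializes with ATLD.
    """
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    B = rng.random((J, n_components))
    C = rng.random((K, n_components))
    X1 = X.reshape(I, J * K)                      # unfold along samples
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)   # unfold along retention time
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)   # unfold along wavelength
    total = np.sum(X ** 2)
    prev_fit = 0.0
    for _ in range(n_iter):
        A = X1 @ np.linalg.pinv(khatri_rao(B, C)).T
        B = X2 @ np.linalg.pinv(khatri_rao(A, C)).T
        C = X3 @ np.linalg.pinv(khatri_rao(A, B)).T
        residual = X1 - A @ khatri_rao(B, C).T
        fit = 100.0 * (1.0 - np.sum(residual ** 2) / total)  # explained variance, %
        if abs(fit - prev_fit) <= tol * max(abs(prev_fit), 1e-12):
            break
        prev_fit = fit
    return A, B, C, fit


# Synthetic interval: one compound, 9 samples x 60 time points x 40 wavelengths.
rng = np.random.default_rng(1)
conc = rng.uniform(0.5, 2.0, size=9)
elution = np.exp(-0.5 * ((np.arange(60) - 30) / 5.0) ** 2)
spectrum = np.exp(-0.5 * ((np.arange(40) - 12) / 6.0) ** 2)
X = np.einsum("i,j,k->ijk", conc, elution, spectrum)
X = X + 0.01 * rng.standard_normal(X.shape)

A, B, C, explained = parafac_als(X, n_components=1)
print(f"explained variance: {explained:.2f}%")
```

The sample mode loadings `A` are proportional to the relative concentrations, which is why they can be read as concentration profiles across fractions; their absolute scale and sign are arbitrary because a scale factor can be moved freely between the three modes.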

Table 1. Summary of the Results Obtained from PARAFAC Modeling of Individual Intervals

no.   retention time range (min)   number of components   explained variance (%)   fraction number   type of compound
1     22.40−23.40                  1                      99.46                    25−30             coumarin
2     31.80−32.60                  1                      74.59                    23−29
3     38.80−39.50                  1                      99.91                    25                flavonoid24
4     39.66−40.13                  1                      92.86                    26−28
5     41.30−41.86                  1                      86.42                    23−30             art. derivative
6     41.90−42.43                  1                      90.27                    23−27
7     44.80−45.33                  1                      99.90                    25                casticin
8     50.76−51.63                  1                      99.36                    23−31             artemisinin
9     51.80−52.50                  1                      99.15                    23−27             art. derivative
10    52.70−53.20                  1                      98.19                    23−24
11    55.26−56.10                  2                      91.79                    23−25             polyacetylene,25 flavonoid24
12    56.76−57.10                  1                      94.68                    23−24             art. derivative
13    58.83−59.20                  1                      98.25                    23−31             dihydroartemisinic acid
14    59.56−59.80                  1                      93.56                    23−31             artemisinic acid

easily and faster. Distribution of chemical components present in the flash CC fractions obtained by the PARAFAC analysis of analytical chromatograms is shown in Table 1. The corresponding UV profiles and relative concentration profiles of the chemical components present in the flash CC fractions are shown in Figures 7 and 8, respectively. The relative concentration profile (sample mode loadings) and UV loadings of peak interval 11 clearly show the ability of PARAFAC to resolve coeluting peaks, thereby retrieving the relative concentration and UV profiles for coeluting chemical entities. Also, the relative


Figure 7. UV spectral mode loadings for all intervals.

Figure 8. Sample mode loadings (relative concentration profiles) for all intervals.

concentration profile of the impurities can give an idea about the fractions that could be combined for further processing. For example, it can be seen from the relative concentration profiles of intervals 3 and 7 that the components present in these intervals are found only in fraction 25. Therefore, it would be more appropriate to process this fraction individually.

3.3.2. Identification of Impurities. As observed in our earlier work,7 the impurities present in the flash CC fractions significantly affect the yield and purity of artemisinin during the crystallization step from the flash CC fractions; therefore, it is important to identify these impurities and assess their effect on the solubility of artemisinin. The UV profiles of the impurities extracted by using PARAFAC helped to identify at least the class of compound; however, the chemical structure of some impurities was confirmed by further analysis of fractions by LC-MS/MS. The identities of intervals 7 and 8 were confirmed as the flavonol casticin and artemisinin, respectively, by comparing their UV and LC-MS/MS spectra and retention times on HPLC with authentic standards. The UV profile of interval 1 indicated the presence of coumarin,26 intervals 3 and 11 flavonoids,24 interval 11 a polyacetylene,25 and intervals 5, 9, 12, 13, and 14 artemisinin related compounds. LC-MS analysis of the fractions confirmed the identities of the compounds present in intervals 1, 13, and 14 as coumarin (m/z 147 [M + H]+), dihydroartemisinic acid (m/z 237 [M + H]+, 219 [M + H − H2O]+, 201), and artemisinic acid (m/z 235 [M + H]+, 217 [M + H − H2O]+, 199, 189), respectively. Chemical structures of the identified compounds are shown in Figure 10. Other compounds could not be determined with certainty, due to their relatively low concentration.
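The reported [M + H]+ and [M + H − H2O]+ values can be cross-checked with a short calculation. The molecular formulas below are not stated in the text; they are the standard formulas of the three compounds (coumarin C9H6O2, dihydroartemisinic acid C15H24O2, artemisinic acid C15H22O2), combined with monoisotopic atomic masses. The dictionary and function names are hypothetical, and the sketch is only an arithmetic check, not part of the authors' workflow.

```python
# Monoisotopic atomic masses (u) and the proton mass.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
PROTON = 1.00727646
WATER = 2 * MONOISOTOPIC["H"] + MONOISOTOPIC["O"]

# Standard molecular formulas (assumed here; not given in the text).
FORMULAS = {
    "coumarin": {"C": 9, "H": 6, "O": 2},
    "dihydroartemisinic acid": {"C": 15, "H": 24, "O": 2},
    "artemisinic acid": {"C": 15, "H": 22, "O": 2},
}


def protonated_mz(formula):
    """m/z of the singly charged [M + H]+ ion for an element-count dict."""
    neutral = sum(MONOISOTOPIC[element] * count for element, count in formula.items())
    return neutral + PROTON


for name, formula in FORMULAS.items():
    mz = protonated_mz(formula)
    print(f"{name}: [M+H]+ = {mz:.2f}, [M+H-H2O]+ = {mz - WATER:.2f}")
```

Rounded to unit mass, these values agree with the ions quoted above (147; 237 and 219; 235 and 217), which is the resolution at which an ion trap instrument of this type is usually read.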


Figure 9. Fractionation sequence of the target compound and its associated impurities in the flash CC.

3.4. Application of Process Information. The process information obtained from PARAFAC analysis of analytical chromatograms of flash CC fractions provides the blueprint of the flash CC unit operation in terms of the fractionation sequence of artemisinin and coeluting impurities as shown in Figure 9. This fractionation sequence corresponds to the operating conditions used during flash CC operation and can be manipulated by changing the operating conditions. Thus, the obtained process information gives a better understanding of the flash CC operation by revealing the number of impurities and their relative concentrations in the fractions containing artemisinin. The impurities present in the fractions can influence the solubility of artemisinin and thereby its subsequent crystallization. Lapkin et al.27 studied the solubility of artemisinin in the presence of impurities such as casticin and deoxyartemisinin in a solvent mixture of hexane and ethyl acetate by using the COSMO-RS modeling approach and found that these impurities increase the solubility of artemisinin and can influence the crystallization of artemisinin. Therefore, the solubility of artemisinin in the presence of the identified impurities, coumarin, casticin, artemisinic acid, and dihydroartemisinic acid, will be measured experimentally or estimated by using the COSMO-RS approach in future work. If the impurities present in the fractions are found to influence the crystallization process of artemisinin negatively, e.g., by enhancing the solubility of artemisinin and/or impeding nucleation and crystal growth, then efforts will be directed toward altering the fractionation sequence by changing the operating conditions of the flash CC operation.

Figure 10. Chemical structure of artemisinin and major impurities found in flash CC fractions. (1) Artemisinin; (2) artemisinic acid; (3) dihydroartemisinic acid; (4) coumarin; and (5) casticin.

4. CONCLUSION

The multivariate data analysis technique PARAFAC is used to model the analytical chromatograms of flash CC fractions obtained during purification of artemisinin from dichloromethane extract. Basic process information such as the number of chemical components present in the fractions along with artemisinin, their concentration profiles, and their identification from pure UV spectra is retrieved from crowded analytical chromatograms relatively fast; obtaining it manually is otherwise time-consuming and laborious. This process information is useful in designing the downstream purification of artemisinin from the flash CC fractions by crystallization and thereby determining the synergistic effect between these two operations. Thus, the application of chemometric methods for mining process information from the vast amount of analytical data of process streams can speed up the separation process design for recovery of natural products, where lack of process information is always a major problem. Moreover, it can also provide a robust foundation for the detailed design of the separation process for recovery of natural products by revealing the molecular level understanding of process streams. Finally, the revealed process information together with the presented chemometrics method also provides an integrated approach for investigating the synergy effect and optimal design of the separation operations in terms of the fractionation sequences and concentration profiles of the impurities and their influence on the crystallization performance.

AUTHOR INFORMATION

Corresponding Author

*Tel.: +45 6550 7481; e-mail: [email protected]

Notes

The authors declare no competing financial interest.

ACKNOWLEDGMENTS

The authors would like to thank Xavier Fretté and Brian Hermansen at the Department of Chemical Engineering, Biotechnology and Environmental Technology, University of Southern Denmark, Odense, for their help with the LC-MS measurements.

REFERENCES

(1) Sarkar, S. D.; Nahar, L. An introduction to natural products isolation. In Natural Products Isolation, 3rd ed.; Sarkar, S. D., Nahar, L., Eds.; Springer: London, 2012; pp 1−25.
(2) Newman, D. J.; Cragg, G. M. Natural products as sources of new drugs over the 30 years from 1981 to 2010. J. Nat. Prod. 2012, 75, 311−335.
(3) Koehn, F. E.; Carter, G. T. The evolving role of natural products in drug discovery. Nat. Rev. Drug Discovery 2005, 4, 206−220.
(4) Harvey, A. L. Natural products in drug discovery. Drug Discovery Today 2008, 13, 894−901.
(5) WHO monograph on good agricultural and collection practices (GACP) for Artemisia annua L; World Health Organization: Switzerland, 2006.
(6) Nicolaou, K. C.; Sorensen, E. J.; Winssinger, N. The art and science of organic and natural products synthesis. J. Chem. Educ. 1998, 75 (10), 1226−1258.
(7) Malwade, C. R.; Rong, B.-G.; Qu, H.; Christensen, L. P. Conceptual process synthesis for isolation and purification of natural products from plants − A case study of artemisinin from Artemisia annua. Ind. Eng. Chem. Res. 2013, 52, 7157−7169.
(8) PAT - A Framework for Innovative Pharmaceutical Development, Manufacturing, and Quality Assurance; U.S. Food and Drug Administration: Rockville, MD, U.S.A., 2004.
(9) Wold, S. Chemometrics; what do we mean with it, and what do we want from it? Chemom. Intell. Lab. Syst. 1995, 30, 109−115.
(10) Lavine, B. K.; Workman, J. Chemometrics. Anal. Chem. 2013, 85, 705−714.
(11) Bernal, F. A.; Delgado, W. A.; Cuca, L. E. Fingerprint analysis of unfractionated Piper plant extracts by HPLC-UV-DAD coupled with chemometric methods. J. Chil. Chem. Soc. 2012, 57, 1256−1261.
(12) Gad, H. A.; El-Ahmady, S. H.; Abou-Shoerb, M. I.; Al-Azizi, M. M. Application of chemometrics in authentication of herbal medicines: a review. Phytochem. Anal. 2013, 24, 1−24.
(13) Berridge, J. C. Chemometrics and method development in high-performance liquid chromatography. Chemom. Intell. Lab. Syst. 1988, 3, 175−188.
(14) Jansen, J. J.; Smit, S.; Hoefsloot, C. J.; Smilde, A. K. The photographer and the greenhouse: how to analyse plant metabolomics data. Phytochem. Anal. 2010, 21, 48−60.
(15) Garcia, L. M. Z.; de Oliveira, T. F.; Soares, P. K.; Bruns, R. E.; Scarminio, I. S. Statistical mixture design - Principal component determination of synergic solvent interactions for natural product extractions. Chemom. Intell. Lab. Syst. 2010, 103, 1−7.
(16) Schmidt, B.; Jaroszewski, J. W.; Bro, R.; Witt, M.; Stærk, D. Combining PARAFAC analysis of HPLC-PDA profiles and structural characterization using HPLC-PDA-SPE-NMR-MS experiments: commercial preparations of St. John’s Wort. Anal. Chem. 2008, 80, 1978−1987.
(17) Lopes, J. A.; Costa, P. F.; Alves, T. P.; Menezes, J. C. Chemometrics in bioprocess engineering: process analytical technology (PAT) applications. Chemom. Intell. Lab. Syst. 2004, 74, 269−275.
(18) Yu, L. X.; Lionberger, R. A.; Raw, A. S.; D’Costa, R.; Wu, H.; Hussain, A. S. Applications of process analytical technology to crystallization processes. Adv. Drug Delivery Rev. 2004, 56, 349−369.
(19) Bro, R. PARAFAC: Tutorial and applications. Chemom. Intell. Lab. Syst. 1997, 38, 149−171.
(20) Amigo, J.; Popielarz, M.; Callejón, R.; Morales, M.; Troncoso, A.; Petersen, M.; Toldam-Andersen, T. Comprehensive analysis of chromatographic data by using PARAFAC2 and principal components analysis. J. Chromatogr. A 2010, 1217, 4422−4429.
(21) Press, W.; Teukolsky, S.; Vetterling, W.; Flannery, B. Interpolation and Extrapolation. In Numerical recipes in C: The art of scientific computing; Cambridge University Press: 2002; pp 105−128.
(22) Tomasi, G.; Savorani, F.; Engelsen, S. B. icoshift: An effective tool for the alignment of chromatographic data. J. Chromatogr. A 2011, 1218 (43), 7832−7840.
(23) Wu, H.; Shibukawa, M.; Oguma, K. An alternating trilinear decomposition algorithm with application to calibration of HPLC−DAD for simultaneous determination of overlapped chlorinated aromatic hydrocarbons. J. Chemom. 1998, 12, 1−26.
(24) Markham, K. R. Techniques of Flavonoid Identification; Academic Press: London.
(25) Manns, D.; Hartmann, R. Annuadiepoxide, a new polyacetylene from the aerial parts of Artemisia annua. J. Nat. Prod. 1992, 55, 29−32.
(26) Goodwin, R. H.; Pollock, B. M. Ultraviolet absorption spectra of coumarin derivatives. Arch. Biochem. Biophys. 1954, 49 (1), 1−6.
(27) Lapkin, A. A.; Peters, M.; Greiner, L.; Chemat, S.; Leonhard, K.; Liauw, M. A.; Leitner, W. Screening of new solvents for artemisinin extraction process using Ab initio methodology. Green Chem. 2010, 12, 241.