Recipe for Uncovering the Bioactive Components ... - ACS Publications

Jul 27, 2009 - Foo-Tim Chau,*,†,‡ Hoi-Yan Chan,‡ Chui-Yee Cheung,‡ Cheng-Jian Xu,‡ ... Kong, P.R. China, Research Center of Modernization of...
Anal. Chem. 2009, 81, 7217–7225

Recipe for Uncovering the Bioactive Components in Herbal Medicine

Foo-Tim Chau,*,†,‡ Hoi-Yan Chan,‡ Chui-Yee Cheung,‡ Cheng-Jian Xu,‡ Yizeng Liang,§ and Olav M. Kvalheim*,|

State Key Laboratory of Chinese Medicine and Molecular Pharmacology, Shenzhen, P.R. China, Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hung Hom, Kowloon, SAR Hong Kong, P.R. China, Research Center of Modernization of Chinese Medicines, College of Chemistry and Chemical Engineering, Central South University of Changsha, P.R. China, and Department of Chemistry, University of Bergen, Bergen, Norway

Using whole chromatographic profiles and measurements of total bioactivity as input, a quantitative pattern-activity relationship (QPAR) approach is proposed as a general method for providing two crucial pieces of information about complex bioactive mixtures: (i) a model for predicting total bioactivity from the chromatographic fingerprint and (ii) the features in the chromatographic profile responsible for the bioactivity. While the first piece of information is already available through existing approaches, the second one results from our ability to remove dominant features in the chromatographic fingerprints which mask the components specifically related to pharmacological activity. Our targeted approach makes information about bioactivity available at the molecular level and makes assessment of herbal medicine (HM) possible beyond just authentication and total bioactivity. As an example, the antioxidant property of the HM Radix Puerariae lobatae is measured through its reducing power toward a ferric ion complex. A partial least-squares (PLS) model is created to predict the antioxidant activity from the chromatographic fingerprint. Using the antioxidant activity as a target, the most discriminatory projection in the multivariate space spanned by the chromatographic profiles is revealed. From this target-projected component, the chromatographic regions most strongly connected to antioxidant activity are identified using the so-called selectivity ratio (SR) plot. The results are validated by prediction of samples not included in the modeling step.

* Corresponding authors. E-mail: [email protected]. Tel.: +852 3400 8672. Fax: +852 2364 9932. E-mail: [email protected]. Tel.: +47 5558 3366. Fax: +47 5558 9490.
† State Key Laboratory of Chinese Medicine and Molecular Pharmacology.
‡ The Hong Kong Polytechnic University.
§ Central South University of Changsha.
| University of Bergen.
(1) Li, P.; Qi, L.-W.; Liu, E.-H.; Zhou, J.-L.; Wen, X.-D. Trends Anal. Chem. 2008, 27, 66–77.
10.1021/ac900731z CCC: $40.75 © 2009 American Chemical Society. Published on Web 07/27/2009

Herbal medicine (HM) has gained broad interest in the research community during recent years.1 A single herb may contain hundreds of components spanning a concentration range of several orders of magnitude. Due to this complexity of HM,

the knowledge about composition and pharmacological activity of most HMs is still incomplete. This has been reflected in methods for authentication and quality control of HM.2,3 Traditionally, herbal extracts have been authenticated by just one or a few marker compounds. Markers need not be among the bioactive analytes or be unique to a specific herb. For instance, Ginsenoside is a bioactive and marker component for both Radix ginseng and Radix notoginseng.4 Despite their common origin, the pharmacological functions of these herbs are different. Radix ginseng acts as a tonic and stimulant, while Radix notoginseng eliminates blood stasis and stops bleeding.5 Thus, the pharmacological activity of a HM is not a single-component property. This fact has led to the development of methods for authentication of HM utilizing whole chromatographic fingerprints. Recently, Xu et al.6 used this so-called single-pattern approach to show that reliable authentication of herbs can only be achieved from the cross-correlation of whole chromatographic profiles.

(2) Mok, D. K. W.; Chau, F. T. Chemom. Intell. Lab. Syst. 2006, 82, 210–217.
(3) Zeng, Z. D.; Chau, F. T.; Chan, H. Y.; Cheung, C. Y.; Lau, T. Y.; Wei, S. Y.; Mok, D. K. W.; Chan, C. O.; Liang, Y. Z. Chin. Med. 2008, 3–9.
(4) The State Pharmacopoeia Commission of the People's Republic of China: Pharmacopoeia of the People's Republic of China; Chemical Industry Press: Beijing, 2005.
(5) Lu, G. H.; Zhou, Q.; Sun, S. Q.; Leung, K. S. Y.; Zhang, H.; Zhao, Z. Z. J. Mol. Struct. 2008, 883–884, 91–98.
(6) Xu, S. J.; Yang, L.; Tian, R.; Wang, Z.; Liu, Z. J.; Xie, P. S.; Feng, Q. J. Chromatogr. A 2009, 1216, 2163–2168.
(7) van Nederkassel, A. M.; Daszykowski, M.; Massart, D. L.; Vander Heyden, Y. J. Chromatogr. A 2005, 1096, 177–186.
(8) Daszykowski, M.; Vander Heyden, Y.; Walczak, B. J. Chromatogr. A 2007, 1176, 12–18.
(9) Wang, Y.; Wang, X.; Cheng, Y. Chem. Biol. Drug Des. 2006, 68, 166–172.

The single-pattern approach provides no information about the bioactivity and thus the therapeutic efficacy of HMs. However, the total bioactivity of HM can be related to the chromatographic profiles by means of multivariate modeling techniques. The antioxidant property of green tea was predicted from chromatographic fingerprints using partial least-squares (PLS) and uninformative variable elimination PLS (UVE-PLS) regression.7 A robust PLS procedure was later used for analysis of the same data.8 Wang et al.9 estimated the bioactivity of different combinations of six fractions from a herbal formulation by multiple linear regression (MLR) and artificial neural network (ANN). They used the weight composition of the fractions as input. However, information about bioactivity at the molecular level for HMs has

Analytical Chemistry, Vol. 81, No. 17, September 1, 2009


been difficult to obtain from the above-mentioned methods. Huang et al.10 present an overview of various experimental strategies for screening for the bioactive components. Identification of the active ingredients in HM has also been carried out by bioassay-guided fractionation.11,12 Experimental procedures for screening of bioactive components are generally laborious. Thus, modeling approaches based on MLR and ANN making use of the correlations between selected chromatographic peak areas and bioactivity have been proposed to detect active components.13,14

Adequate models for prediction of the total bioactivity from chromatographic profiles using multivariate regression are relatively simple to obtain,7,8 but reliable information about bioactivity at the molecular level is not directly available from these models. The reason for this is that most of the component variation within a collection of samples originating from the same kind of herbs is not due to differences in bioactive components but stems from variation in inactive components. This is fairly evident from previous investigations. For instance, three to eight PLS components were needed to model the total antioxidant capacity of green tea from whole chromatographic profiles.7 By means of UVE-PLS, some regions with small and noisy variables were removed from the chromatogram, but like the precursor PLS model, the UVE-PLS model was also dominated by orthogonal variation, i.e., features of the chromatographic profiles not related to components that are bioactive but to variation in molecular components that are inactive.

(10) Huang, X.; Kong, L.; Li, X.; Chen, X.; Guo, M.; Zou, H. J. Chromatogr. B 2004, 812, 71–84.
(11) Zhu, M.; Philipson, J. D.; Yu, H.; Greengrass, P. M.; Norman, N. G. Phytother. Res. 1997, 11, 231–236.
(12) Jin, W.; Shi, Q.; Hong, C.; Cheng, Y.; Ma, Z.; Qu, H. Phytomedicine 2008, 15, 768–774.
(13) Cheng, Y.; Wang, Y.; Wang, X. Comput. Biol. Chem. 2006, 30, 148–154.
(14) Wang, Y.; Jin, Y.; Zhou, C.; Qu, H.; Cheng, Y. Med. Biol. Eng. Comput. 2008, 46, 605–611.
(15) Kvalheim, O. M.; Karstang, T. V. Chemom. Intell. Lab. Syst. 1989, 7, 39–51.
(16) Kvalheim, O. M. Chemom. Intell. Lab. Syst. 1990, 8, 59–67.
(17) Trygg, J.; Wold, S. J. Chemom. 2002, 16, 119–128.
(18) Dumarey, M.; van Nederkassel, A. M.; Deconinck, E.; Vander Heyden, Y. J. Chromatogr. A 2008, 1192, 81–88.
(19) Rajalahti, T.; Arneberg, R.; Berven, F. S.; Myhr, K.-M.; Ulvik, R. J.; Kvalheim, O. M. Chemom. Intell. Lab. Syst. 2009, 95, 35–48.
(20) Rajalahti, T.; Arneberg, R.; Kroksveen, A. C.; Berle, M.; Myhr, K.-M.; Kvalheim, O. M. Anal. Chem. 2009, 81, 2581–2590.

To remove such variation from the pattern-activity model, other remedies must be used. Target projection (TP)15,16 and orthogonal partial least-squares (OPLS)17 represent two algorithms for eliminating such variation. By these approaches, a single multivariate projection expresses the relationship between the chromatographic fingerprint and the bioactivity. Dumarey et al.18 used the OPLS loading vector to select chromatographic regions with antioxidant capacity of green tea. However, as both TP and OPLS are covariance-based modeling approaches, the variable loading vector in itself can be misleading for selecting the regions in the chromatogram corresponding to the most bioactive components. The reason is that almost inactive components with relatively high concentrations can show high absolute covariance and thus mask more bioactive components present in smaller concentration. To cope with this situation, Rajalahti et al.19,20 introduced the so-called selectivity ratio (SR) plot for revealing regions in spectral or chromatographic profiles with both high explanatory and high predictive significance for the response


investigated. This plot provides the bridge from the pattern-activity models for predicting total bioactivity to the individual regions or components responsible for the bioactivity.

The present study aims at developing an analytical protocol that goes beyond the present state-of-the-art for characterization of HMs. Instead of just assessing the similarities between HMs on the basis of their chromatographic fingerprints and connecting these fingerprints to their total pharmacological activity, our new approach is able to reliably capture the bioactive regions in the chromatographic profiles and thus to provide information about bioactivity at the molecular level. We do this by establishing a quantitative model between the chromatographic fingerprints and the pharmacological activity in a single predictive target component to eliminate the problem of orthogonal variation. Subsequently, the selectivity ratios are calculated to eliminate the effect of variation in relative concentration of components. Radix Puerariae lobatae is used as an example to show how the method works and to investigate its performance. This HM is applied for treatment of cardiovascular disease and hypertension, for treatment of influenza and headache, and to promote production of body fluid.21-23

THEORY

Predicting Bioactivity from Chromatographic Fingerprints. Assume that we have measured the total bioactivity in a suite of samples from a HM and the chromatographic profiles for the same samples. Collect the chromatographic profiles in a data matrix X and the corresponding bioactivities in a vector y. A model describing the relation between bioactivity and chromatographic profile can be determined by least-squares regression

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{f} \qquad (1)$$

Here, b is the vector of regression coefficients connecting the chromatographic profiles X to the bioactivities y, and f contains the residuals. Thus, eq 1 is a formal expression of the quantitative pattern-activity relationships (QPAR), i.e., the mathematical model for connecting chromatographic fingerprint to bioactivity. Since the number of chromatographic variables (retention time points) is generally larger than the number of samples, the generalized inverse $\mathbf{X}^{+}$ has to be calculated to solve eq 1

$$\mathbf{b} = \mathbf{X}^{+}\mathbf{y} \qquad (2)$$
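To make eq 2 concrete, the following minimal sketch solves it with the Moore-Penrose pseudoinverse. This choice is an assumption made for illustration only: the paper obtains the generalized inverse from the PLS decomposition instead. The data are synthetic and all variable names are hypothetical.

```python
import numpy as np

# Synthetic stand-in for a fingerprint matrix: fewer samples than variables,
# as is typical for chromatographic profiles (assumed sizes, not the paper's).
rng = np.random.default_rng(0)
n_samples, n_vars = 10, 50
X = rng.normal(size=(n_samples, n_vars))
y = rng.normal(size=n_samples)

# Moore-Penrose pseudoinverse used here as one possible generalized inverse X+.
X_pinv = np.linalg.pinv(X)

# Eq 2: regression vector b = X+ y (the minimum-norm least-squares solution).
b = X_pinv @ y

# Eq 1 rearranged: residuals f = y - Xb.
f = y - X @ b
```

With more variables than samples and a full-rank X, this unrestricted solution reproduces y exactly, so the residuals vanish even for pure noise. That is one reason a generalized inverse built from only a few latent components is preferred for prediction.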

For PLS regression, the generalized inverse $\mathbf{X}^{+}$ can be expressed as the product of three matrices24

$$\mathbf{X}^{+} = \mathbf{W}\mathbf{R}^{-1}\mathbf{U}^{T} \qquad (3)$$

The three matrices are the orthonormal matrix W of PLS weights, the inverse of a bidiagonal matrix R, and the orthonormal matrix U of scores. By inserting eq 3 in eq 2, the PLS regression vector b can be estimated. Thus, the connection between chromatographic fingerprint and total bioactivity is established.

(21) Cao, Y.; Lou, C.; Zhang, X.; Chu, Q.; Fang, Y.; Ye, J. Anal. Chim. Acta 2002, 452, 123–128.
(22) Jiang, R. W.; Lau, K. M.; Lam, H. M.; Yam, W. S.; Leung, L. K.; Choi, K. L.; Waye, M. M. Y.; Mak, T. C. W.; Woo, K. S.; Fung, K. P. J. Ethnopharmacol. 2005, 96, 133–138.
(23) Chen, S. B.; Liu, H. P.; Tian, R. T.; Yang, D. J.; Chen, S. L.; Xu, H. X.; Chan, A. S. C.; Xie, P. S. J. Chromatogr. A 2006, 1121, 114–119.
(24) Manne, R. Chemom. Intell. Lab. Syst. 1987, 2, 187–197.

Target Projection. The PLS model calculated above can provide prediction of bioactivity from a chromatographic fingerprint. However, we cannot obtain information about bioactivity at the molecular level as the model is “corrupted” by variation originating from components that are either inactive or only possess very weak bioactivity. A manifestation of this effect is the complexity of the PLS models. Numerous PLS components are usually needed to describe X, and the information about y is scattered between the PLS components. This orthogonal variation17 masks our ability to achieve the second aim of our modeling: to reveal the bioactive regions in the chromatographic fingerprint. To obtain this goal, we need to find the direction in the multivariate predictive space most strongly related to the response. This can be accomplished by means of target projection (TP). Since the method is relatively well described in the literature,15,16,25 this paragraph only provides the points necessary to understand the results from the present work. The direction in space most strongly related to bioactivity is defined by the regression vector b. By projecting X on the normalized regression vector $\mathbf{w}_{TP} = \mathbf{b}/\lVert\mathbf{b}\rVert$, target-projected scores $\mathbf{t}_{TP}$ that are proportional to the predicted bioactivities are obtained

$$\mathbf{t}_{TP} = \mathbf{X}\mathbf{b}/\lVert\mathbf{b}\rVert \qquad (4)$$
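The steps up to eq 4 can be sketched as follows. The PLS1 routine below is a textbook NIPALS variant chosen for illustration; it is an assumption and not necessarily the algorithm implemented in the software used by the authors. The function name `pls1_regression_vector`, the synthetic data, and the choice of three components are all hypothetical.

```python
import numpy as np

def pls1_regression_vector(X, y, n_components):
    """PLS1 regression vector via NIPALS; X and y must already be mean-centered."""
    Xk = X.copy()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ y                  # weight: covariance of each variable with y
        w /= np.linalg.norm(w)
        t = Xk @ w                    # scores for this component
        tt = t @ t
        p = Xk.T @ t / tt             # X loadings
        W.append(w)
        P.append(p)
        q.append(y @ t / tt)          # regression of y on the scores
        Xk = Xk - np.outer(t, p)      # deflate X before the next component
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)

# Synthetic fingerprints: 20 samples, 100 retention-time points (assumed sizes).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 100))
y = 2.0 * X[:, 10] + 0.1 * rng.normal(size=20)

X_c = X - X.mean(axis=0)
y_c = y - y.mean()

b = pls1_regression_vector(X_c, y_c, n_components=3)

w_tp = b / np.linalg.norm(b)   # normalized regression vector
t_tp = X_c @ w_tp              # eq 4: target-projected scores
y_hat_c = X_c @ b              # centered predictions, proportional to t_tp
```

Because the scores differ from the predictions only by the factor ‖b‖, a plot of `t_tp` against measured bioactivity conveys the same information as a predicted-versus-measured plot.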

From the target-projected scores, the target-projected loadings $\mathbf{p}_{TP}$ are calculated as

$$\mathbf{p}_{TP} = \mathbf{X}^{T}\mathbf{t}_{TP}/(\mathbf{t}_{TP}^{T}\mathbf{t}_{TP}) \qquad (5)$$

The TP loadings represent the features in the chromatographic profiles explaining and predicting the bioactivity. However, the TP loadings are not directly useful for ranking the chromatographic regions according to bioactive capacity. The reason is that the model is estimated from covariances between the chromatographic variables and the bioactivity, and regions with components present in relatively large concentration may dominate the TP loadings even if the correlation with bioactivity is low (see Results and Discussion section for an example). The target projection reduces the QPAR model to a single-component TP model

$$\mathbf{X} = \hat{\mathbf{X}}_{TP} + \mathbf{E}_{TP} = \mathbf{t}_{TP}\mathbf{p}_{TP}^{T} + \mathbf{E}_{TP} \qquad (6)$$

This reduction to a single predictive component in X is possible if the total bioactivity is approximately additive in the bioactive molecular components. If the total bioactivity is dominated by interactions between molecules, the reduction to a single predictive target component is no longer feasible, and interpretation at the molecular level is not possible either. If this situation arises, common diagnostic tools for checking the model will reveal that we have encountered this problem.

Bioactive Regions from the Selectivity Ratio (SR) Plot. The loading vector defined by eq 5 displays the covariances between the vector of predicted bioactivities and the vector of intensities at each chromatographic retention time. Thus, a retention time region with large loadings may reflect a component with relatively high concentration and weak bioactivity rather than point to a region with strong bioactivity. To eliminate this “size” factor, we have to link bioactivity to a measure which is independent of relative concentration. One possibility is to calculate the correlation between the predicted bioactivity and the target-projected chromatographic profiles defined by eq 5. A more sensitive measure is obtained by comparing explained $v_{expl,i}$ and residual $v_{res,i}$ variance in the TP model for each chromatographic variable i. By means of eq 6, these variances can be calculated, and we can define a selectivity ratio SR for each spectral variable i

$$SR_{i} = v_{expl,i}/v_{res,i}, \quad i = 1, 2, 3, \ldots \qquad (7)$$

The selectivity ratios can be displayed similarly to a chromatogram in an SR plot. A high value means that the chromatographic variable has a strong (predictive) correlation to the bioactivity. Thus, the selectivity ratio provides a ranking of chromatographic regions according to bioactive capacity independent of relative concentrations of components. By multiplying the selectivity ratio with the sign of the corresponding regression coefficient, we can even reveal regions from the SR plot corresponding to components which reduce bioactivity. By comparing explained and residual variance in an F-test, it is possible to define boundaries on the selectivity ratios corresponding to a predefined probability level.20

(25) Kvalheim, O. M.; Rajalahti, T.; Arneberg, R. J. Chemom. 2009, 23, 49–55.
(26) Benzie, I. F. F.; Strain, J. J. Anal. Biochem. 1996, 239, 70–76.

EXPERIMENTAL SECTION

Plant Material and Sample Preparation. Seventy-eight samples of Radix Puerariae lobatae were collected from different provinces in China, while one sample was purchased at a local wholesale market in Hong Kong. All samples were ground into fine powder (100 mesh). An amount of 4 g of powder and 180 mL of double deionized water were added to a conical flask. It was placed in a 50 °C water bath without shaking or ultrasonication. The duration of the extraction was 75 min. After that, the extract was filtered through a Whatman No. 1 filter paper. The same extraction procedure was then applied to the filtered solid in a re-extraction step, and filtration was carried out again. The filtrates were combined for freeze-drying. The dry powder was collected for chromatographic and antioxidant characterization.

Ferric Reducing Antioxidant Power (FRAP) Assay. With the use of a COBAS FARA II spectrofluorometric centrifugal analyzer (Roche), the antioxidant activity of all extracts was measured by the FRAP method proposed by Benzie and Strain.26 The FRAP reagent was prepared by mixing acetate buffer, 2,4,6-tripyridyl-s-triazine (TPTZ) solution, and ferric chloride (FeCl3 · H2O) solution. Acetate buffer was prepared from sodium acetate trihydrate (Riedel-de-Haen, Germany) and glacial acetic acid (BDH Laboratory Supplies, England). TPTZ was bought from Sigma-Aldrich, and its solution was prepared in hydrochloric acid. FeCl3 · H2O was purchased from BDH Laboratory Supplies and dissolved in water. The FRAP reagent was freshly prepared by mixing 20 mL of 300 mmol/L acetate buffer at pH equal to 3.6, 2 mL of 10 mmol/L TPTZ in 40 mmol/L hydrochloric acid, and 2 mL of 20 mmol/L FeCl3 · H2O. At first, the absorbance (A1) of the FRAP reagent was measured at 593 nm. Afterward,

10 µL of the HM extract was added to the FRAP reagent. The absorbance (A2) of the extract and the FRAP reagent was recorded after 4 min. Then, the difference (∆A12) between the absorbances A1 and A2 was calculated. At the same time, 1000 µM of Fe2+ standard solution was added to the FRAP reagent, and the absorbance (A3) was measured. Thus, the change (∆A13) in the absorbance between A1 and A3 was determined. The conversion factor is evaluated by dividing 1000 by ∆A13. In this way, ∆A12 recorded for each extract can be converted to the FRAP value by multiplying it by the conversion factor. The quantity obtained represents the antioxidant activity level in terms of FRAP value. A series of concentrations of ascorbic acid were prepared to monitor the performance of the instrument.

Chromatographic Analyses. High-performance liquid chromatography (HPLC) analyses were performed on an Agilent 1100 series HPLC system with a column compartment and diode-array detector (DAD). HPLC-grade acetonitrile and acetic acid were purchased from the Tedia Company. Double deionized water was utilized in sample extraction and HPLC analyses. The HPLC-UV chromatograms of HM extracts were obtained by using a reversed-phase ODS Hypersil column with dimensions of 250 × 4.6 mm and a particle size of 5 µm. A binary gradient elution system was applied for analyzing the extracts. It was composed of acetonitrile (solvent A) and 0.3% acetic acid in water (solvent B). The gradient elution was carried out as follows: 0–15 min, 7–20%; 15–20 min, 20–25%; 20–30 min, 25–35%; 30–40 min, 35–50%; 40–45 min, 50–75%; and 75% of solvent A maintained for 10 min. The flow rate was 1 mL/min. An aliquot of 10 µL of each sample solution was injected into the HPLC system for analysis. The detection wavelength was set at 254 nm.

Standards. Five isoflavonoids were selected as standards for both qualitative and quantitative analyses in this investigation. They were Puerarin, Daidzin, Genistin, Daidzein, and Genistein and were purchased from the International Laboratory, USA. The purity of these standards was above 98%. Puerarin was dissolved in water, while the other chemical standards were dissolved in ethanol. As for the FRAP measurements, ascorbic acid was bought from Sigma-Aldrich, and ferrous sulfate was bought from Riedel-de-Haen. To quantify the amounts of these chemicals in the HM extracts, six different concentrations of each of them were prepared. Five injections were carried out on each of these six concentrations for all the chemical standards in the HPLC study. Then, the calibration curve of each one was obtained by plotting the average of the corresponding chromatographic peak area against the concentration. The chromatographic peak areas were determined by the Agilent ChemStation software.

(27) Yi, L. Z.; Yuan, D. L.; Liang, Y. Z.; Xie, P. S.; Zhao, Y. Anal. Chim. Acta 2007, 588, 207–215.

Chromatographic Validation. The chromatographic method used for analyzing Radix Puerariae lobatae was validated in terms of intraday and interday repeatability and linearity. Both intraday and interday repeatability were assessed by the relative standard deviation (RSD) of peak areas of the chemical standards identified in the extracts as well as the similarity indices of the chromatographic profiles involved. The Computer Aided Similarity Evaluation (CASE) software27 was used to compare the preprocessed


Figure 1. HPLC-UV mean chromatogram (red) and standard deviation (blue) of 78 extracts with FRAP values varying from 39 to 1240.

chromatographic fingerprints quantitatively in terms of similarity indices.

Data Analysis of Chromatographic Profiles. The retention time interval 9.013–31.393 min with a data point resolution of 0.02 min was selected for analysis. This provided 78 chromatographic profiles, each described by the intensities at 1120 retention times. Data preprocessing was performed using programs coded by us in MATLAB 7.2 for Windows (The Mathworks, Natick, MA, 2006). The asymmetric least-squares method was applied for automatic background correction.28 Correlation optimized warping (COW)29 was used to reduce retention time shift in the chromatographic profiles. The chromatograms were divided into 20 sections for alignment, and the maximal shift of time points was set to 10. PLS, TP, and calculation of selectivity ratios were executed using a prerelease of Sirius 8.0 from Pattern Recognition Systems AS, Bergen, Norway. All models were validated by randomly dividing the samples into two groups, a training set embracing two-thirds of the samples used for model building and a validation set consisting of one-third of the samples. The training set was further subdivided into two groups, and cross validation30 was used for determining the optimum number of PLS components A. The optimum number of significant PLS components was further assessed by predicting the bioactivity for the samples in the validation set with A, A-1, and A-2 PLS components.

RESULTS AND DISCUSSION

Figure 1 shows the mean chromatographic fingerprint together with the standard deviation of 78 extracts from Radix Puerariae lobatae. These extracts span the range from 39 to 1246 in antioxidant activity as determined by means of the FRAP assay. The average value is 496, and the standard deviation is 215. We shall now examine the ability of several approaches for determining bioactivity at the molecular level by using the chromatographic fingerprints for 78 extracts from Radix Puerariae lobatae.
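The validation scheme described under Data Analysis (random two-thirds/one-third split, two-group cross validation on the training set, then prediction of the validation set with A, A-1, and A-2 components) might be sketched as below. This is an illustration under stated assumptions: the data are synthetic, the PLS1 routine is a textbook NIPALS variant rather than the Sirius implementation, root-mean-square error stands in for the paper's SECV and SEP, and every function and variable name is hypothetical.

```python
import numpy as np

def pls1_coefficients(X, y, n_components):
    """Textbook NIPALS PLS1 on mean-centered X and y; returns the regression vector."""
    Xk, W, P, q = X.copy(), [], [], []
    for _ in range(n_components):
        w = Xk.T @ y
        w /= np.linalg.norm(w)
        t = Xk @ w
        tt = t @ t
        p = Xk.T @ t / tt
        W.append(w)
        P.append(p)
        q.append(y @ t / tt)
        Xk = Xk - np.outer(t, p)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)

def fit_predict(X_fit, y_fit, X_new, n_components):
    """Center on the fitting samples, fit PLS1, and predict the new samples."""
    x_mean, y_mean = X_fit.mean(axis=0), y_fit.mean()
    b = pls1_coefficients(X_fit - x_mean, y_fit - y_mean, n_components)
    return (X_new - x_mean) @ b + y_mean

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic data with the same number of samples as the study (78).
rng = np.random.default_rng(1)
n_samples, n_vars = 78, 40
X = rng.normal(size=(n_samples, n_vars))
y = X[:, :3] @ np.array([3.0, -2.0, 1.0]) + 0.1 * rng.normal(size=n_samples)

# Random split: two-thirds training, one-third validation.
perm = rng.permutation(n_samples)
n_train = (2 * n_samples) // 3
train_idx, valid_idx = perm[:n_train], perm[n_train:]

# Two-group cross validation inside the training set.
half = n_train // 2
folds = [train_idx[:half], train_idx[half:]]
max_components = 8
cv_error = {}
for a in range(1, max_components + 1):
    squared = []
    for k in range(2):
        held_out, fit = folds[k], folds[1 - k]
        pred = fit_predict(X[fit], y[fit], X[held_out], a)
        squared.append((y[held_out] - pred) ** 2)
    cv_error[a] = float(np.sqrt(np.mean(np.concatenate(squared))))
best_a = min(cv_error, key=cv_error.get)

# Prediction error on the validation set for A, A-1, and A-2 components.
sep = {
    a: rmse(y[valid_idx],
            fit_predict(X[train_idx], y[train_idx], X[valid_idx], a))
    for a in (best_a, best_a - 1, best_a - 2) if a >= 1
}
```

Comparing the entries of `sep` mirrors the check made in the paper: if a model with fewer components predicts the independent validation samples at least as well, the smaller model is preferred.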
Isoflavonoids in Radix Puerariae lobatae. It is well-known that isoflavonoids constitute one of the major classes of compounds in Radix Puerariae lobatae.22,23,31 Isoflavonoids exhibit different kinds of biological activity including anti-inflammatory activity, antithrombotic activity, antihypertensive activity, cancer chemo-preventive activity, and phyto-osteogen activity.23 According to the Chinese Pharmacopoeia,4 a single marker molecule, Puerarin, is recommended for the authentication of this HM. The content of Puerarin has to be larger than or equal to 2.4%. For our study, we selected five isoflavonoids as standards: Puerarin, Daidzin, Genistin, Daidzein, and Genistein. Puerarin, Daidzin, and Genistin are glycosides, while Daidzein and Genistein are aglycones. Furthermore, Puerarin and Daidzin are glycosides of Daidzein, while Genistin is the glycoside of Genistein.32-35 To determine the content of these molecules in the extracts of Radix Puerariae lobatae, six concentrations in the range between 1 and 100 ppm were prepared for each one, and calibration curves were established between chromatographic peak areas and concentrations. All regression equations provided linearity between concentration and peak area with correlation coefficients of 0.999. Puerarin was found to be present in all extracts, while Genistein was only found in four extracts. The main reason may be that Puerarin and Daidzin are more polar than Daidzein and Genistein as indicated by their retention time on the reversed-phase HPLC column. Water was used as the extraction solvent, providing relatively poorer extraction of the less polar compounds. The average content of Puerarin in the 78 extracts was 67.0 ppm with a range from 9.2 to 125.9 ppm. The average content of Daidzin, Genistin, and Daidzein was 8.9, 5.2, and 3.1, respectively. The correlation between content of Puerarin and antioxidant activity as measured by the FRAP assay was 0.458, while the correlation of the three other compounds to antioxidant activity was close to zero. The rather low correlation with total antioxidant activity for Puerarin illustrates the problem with the marker approach for authentication.

(28) Eilers, P. H. C. Anal. Chem. 2004, 76, 404–411.
(29) Tomasi, G.; van den Berg, F.; Andersson, C. J. Chemom. 2004, 18, 231–241.
(30) Bro, R.; Kjeldahl, K.; Smilde, A. K.; Kiers, H. A. L. Anal. Bioanal. Chem. 2008, 390, 1241–51.
(31) Lee, M.-H.; Lin, C.-C. Food Chem. 2007, 105, 223–228.
(32) Rong, H.; Stevens, J. F.; Deinzer, M. L.; Cooman, L. D.; Keukeleire, D. D. Planta Med. 1998, 64, 620–627.
(33) Chen, G.; Zhang, J.; Ye, J. J. Chromatogr. A 2001, 923, 255–262.
(34) Prasain, J. K.; Jones, K.; Kirk, M.; Wilson, L.; Smith-Johnson, M.; Weaver, C.; Barnes, S. J. Agric. Food Chem. 2003, 51, 4213–4218.
(35) Lee, C. H.; Yang, L.; Xu, J. Z.; Yeung, S. Y. V.; Huang, Y.; Chen, Z. Y. Food Chem. 2005, 90, 735–741.

PLS Model for Total Antioxidant Activity. Two-thirds of the extracts were selected as a training set for PLS modeling. By use of score plots from principal component analysis, two samples were identified as outliers. These samples were removed from the training set, and PLS using cross validation gave a model with six components, explaining 91.9% of the variance in the chromatographic profiles and 93.7% of the variance in the antioxidant activity. The standard error of cross validation (SECV) for prediction of the FRAP values was 91. Thus, the chromatographic fingerprint is a strong predictor of the antioxidant activity as expressed by the FRAP measurement. The estimated PLS model was used to predict the bioactivity in the 26 validation samples. The model explained 74.6% of the bioactivity with a standard error of prediction (SEP) equal to 91. The number of PLS components was reduced to five and the calculation repeated. For this PLS model, 80.2% of the variation in bioactivity was explained, while the SEP value was reduced to 86. Further reduction to four PLS components raised the SEP value to 100 and decreased the explained variance in bioactivity to 75.5%. At the same time, the prediction bias increased. For the five-

Figure 2. (a) Regression coefficients for PLS model calculated from full chromatographic profiles (1120 variables) for training set, (b) loadings on target-projected component determined from PLS model, and (c) SR plot from TP model.
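The quantities plotted in panels b and c follow from eqs 4 to 7 once a regression vector is available. The sketch below is a minimal illustration on a two-variable toy example, not a reproduction of the paper's data: the signals are deterministic sinusoids chosen so that the "active" and "inactive" sources are exactly uncorrelated, and an ordinary least-squares regression vector stands in for the PLS vector. The function name `target_projection` and all variable names are hypothetical.

```python
import numpy as np

def target_projection(X, b):
    """TP scores, TP loadings, and selectivity ratios for mean-centered X."""
    w_tp = b / np.linalg.norm(b)
    t_tp = X @ w_tp                        # eq 4
    p_tp = X.T @ t_tp / (t_tp @ t_tp)      # eq 5
    X_hat = np.outer(t_tp, p_tp)           # eq 6: explained part of X
    E = X - X_hat                          # eq 6: residual part of X
    v_expl = np.sum(X_hat ** 2, axis=0)
    v_res = np.sum(E ** 2, axis=0)
    return t_tp, p_tp, v_expl / v_res      # eq 7

# Toy signals over 40 "samples"; distinct harmonics are mutually orthogonal.
theta = 2.0 * np.pi * np.arange(40) / 40
active = np.sin(theta)          # concentration pattern of a bioactive component
inactive = np.cos(theta)        # concentration pattern of an inactive component
noise_x = np.sin(3 * theta)
noise_y = np.cos(5 * theta)

x_low = 0.1 * active + 0.03 * noise_x    # low intensity, strongly tied to activity
x_high = 0.5 * active + 5.0 * inactive   # high intensity, mostly inactive variation
X = np.column_stack([x_low, x_high])
y = active + 0.1 * noise_y

X_c = X - X.mean(axis=0)
y_c = y - y.mean()

# Stand-in regression vector (ordinary least squares instead of PLS).
b = np.linalg.lstsq(X_c, y_c, rcond=None)[0]

t_tp, p_tp, sr = target_projection(X_c, b)
signed_sr = np.sign(b) * sr     # sign marks variables that raise or lower activity
```

In this toy case the high-intensity variable has the larger TP loading while the low-intensity variable has by far the larger selectivity ratio, which is the masking effect the SR plot is designed to remove.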

component PLS model, only the two samples with the lowest bioactivity (FRAP values of approximately 50) showed prediction bias, with prediction of antioxidant activity being 3-4 times higher than measured. For the training set, the five-component PLS model gave almost the same explained variance in bioactivity and in the chromatographic profiles as the six-component model and no bias, confirming that the five-component model is appropriate with respect to predictive ability for this data set. Figure 2a shows the regression coefficients obtained for the five-component PLS model. The three regions with largest coefficients are located around retention times of 12.5, 15.6, and 22.8 min. The last retention time corresponds to the chromatographic elution window for Genistin. The correlation between concentration of Genistin in the extracts and total antioxidant capacity is close to zero. This observation shows that the size of regression coefficients may not necessarily reflect the most significant variables with respect to the response. Although it Analytical Chemistry, Vol. 81, No. 17, September 1, 2009

7221

Figure 3. Regression coefficients for the PLS model calculated from from five chromatographic regions (91 variables) for training set, loadings on target-projected component determined from PLS model, and the SR plot from the TP model.

appears to contradict common sense, this observation is in agreement with our previous work.19,20 Target Projection Loadings and SR Plot. The five-component PLS model was target projected onto the axis of optimal prediction of bioactivity. This procedure eliminates the problem of orthogonal variation which is obscuring the interpretation of complex mixtures such as HMs. The TP loading vector is shown in Figure 2b. This projection explains 14.1% of the total variation in chromatographic profiles. The three regions with largest loadings are centered at retention times of 14.9, 15.6, and 12.5 min. The component eluting around retention time 14.9 is Puerarin. This retention time region, which is displaying the largest loadings, was not among the regions found when using the size of the regression coefficients as selection criterion for importance with respect to antioxidant activity. By means of eq 7, the SR plot is obtained (Figure 2c). From the SR plot, the most influencing chromatographic regions can be reliably unveiled and ranked according to significance for the response. Three regions are evident from the SR plot, centered at retention times of 12.5, 13.3, and 15.6 min. While two of the three top-ranked regions in the SR plot are the same as those obtained by selection based on size of regression coefficients and TP loadings, the second in rank is not possible to unmask without the use of the SR plot. Note also that Puerarin, which is used as a marker compound for authentication and quality control for Radix Puerariae lobatae, is not among the top ranked components in terms of antioxidant activity. Robustness of SR Plot for Selection. Whether five or six PLS components are used for the training set model, the SR plot has the same appearance, and thus the same bioactive regions are detected. This shows that the results achieved from the SR 7222

Analytical Chemistry, Vol. 81, No. 17, September 1, 2009

plot are not critically dependent on the number of PLS components used for TP modeling. The SR plot also turned out to be robust toward the inclusion of the two outlying extracts in the training set, even though this requires an additional PLS component.

Determining the Right Selection. Now the crucial question arises: which of the three selections is the correct one? One simple check is to calculate the correlation between antioxidant activity and the chromatographic regions corresponding to the different candidates. The maximum correlation with antioxidant activity for each of the five regions, in order of increasing retention time (in min, given in parentheses), is 0.85 (12.5), 0.81 (13.3), 0.32 (14.9), 0.74 (15.6), and 0.28 (22.8). These numbers show that the SR plot provides the regions with the strongest connection to antioxidant activity. Another way to validate this result is to incorporate all five regions in a PLS model of bioactivity. A total of 91 chromatographic retention times are necessary to describe the five regions. For the training set, five PLS components are significant, with SECV equal to 101 and explained variance in chromatographic regions and antioxidant activity equal to 96.5% and 85.3%, respectively. The regression coefficients, the TP loadings, and the SR plot are shown in Figure 3. All five regions appear important from the regression coefficients. On the target-projected component, the region around retention time 22.8 min disappears, and in the SR plot the regions around 14.9 and 22.8 min both vanish. The component eluting around 22.8 min is Genistin, which has almost zero correlation to total antioxidant capacity. Furthermore, the region around 14.9 min corresponds to Puerarin, which also has rather low correlation to total antioxidant capacity. Thus, we can conclude that the SR plot provides the regions that have the strongest correlation to antioxidant activity. Selection based on the size of the regression

Figure 4. Selectivity ratio plot with boundaries determined from an F-test.

coefficients or the target projection loadings incorporates regions that are almost orthogonal to antioxidant activity. The reason for this outcome is that both PLS and TP modeling are basically covariance-based methods, i.e., they maximize the covariance between the chromatographic fingerprints and the antioxidant activity response. This implies that variables with large variance but relatively small correlation with antioxidant activity may undermine the use of the size of regression coefficients and/or loadings as a selection criterion for revealing the chromatographic regions of the components mainly responsible for the bioactivity. Figure 3 illustrates that regression coefficients and loadings are not always useful for interpretation and variable selection.

Variable Selection from the SR plot. An interesting test on the validity of our approach is to use the selectivity ratio for

Figure 6. SR plot from nonstandardized (red) and standardized (blue) chromatographic variables.

selecting the most significant variables with respect to total antioxidant activity. In previous work,20 we developed an F-test that compares the explained and residual variances used to calculate the selectivity ratio (see eq 7). The critical value for this F-test converges toward 1 for a large number of samples. For the present application, with 50 samples in the training set, the limit is 1.6 at the 95% probability level. Figure 4 shows the SR plot with this boundary. Note that variables that reduce antioxidant activity (i.e., those with negative regression coefficients) are plotted with negative values in the SR plot. Figure 4 reveals that two of the three regions we already discussed have selectivity ratios above the limit defined by the F-test, while the third one is just on the limit. If we use a slightly lower limit of approximately 1.4 for variable selection, corresponding to approximately p = 0.1, all three regions are well represented. The selected retention time intervals are 12.49-12.59,

Figure 5. Regression coefficients for the PLS model calculated from full chromatographic profiles of the training set with all chromatographic variables standardized to unit variance, loadings on target-projected component determined from the PLS model, and SR plot from the TP model.


Figure 7. (a) Correlation between chromatographic intensity and antioxidant activity plotted as a function of retention time and (b) normal probability plot of correlation between chromatographic intensity and antioxidant activity.

13.21-13.33, and 15.59-15.63 min. These 16 retention time points from the chromatographic profile provide a PLS model explaining 96.8% of the variance in the chromatographic profiles and 75.1% of the variance in total antioxidant activity, with an SECV of 116 for the training set. For the independent validation samples, the explained variance in total antioxidant activity was 82.9%, and the SEP was 111. Thus, this selection, representing only 1.5% of the original chromatographic profile, provides a model with decent prediction ability and explanatory power, showing that we have mostly removed chromatographic variation orthogonal to the total antioxidant activity. Having said that, the loss of predictive ability from the full to the reduced model shows that retention times with SR less than the critical value from the F-test also contribute to the total antioxidant activity. Analysis of synthetic mixtures of components with antioxidant activity differing by an order of magnitude or more may provide clues for defining the SR boundary from a combination of biological, chemical, and statistical knowledge.

Correlation-Based Selection. One way to avoid the “curse” of covariance on modeling and interpretation is to estimate the PLS model by maximizing the correlation between the antioxidant activity and the chromatographic fingerprint. Thus, we redid the analysis with all chromatographic variables standardized to unit variance. Three PLS components were found to be optimal for predicting the antioxidant activity from the standardized chromatographic variables. The model explained 22.6 and 92.2% of the variance in profiles and antioxidant activity, respectively, with SECV equal to 127. The small variance explained in the chromatographic variables is a result of many noisy variables that have gained


increased weight in the modeling through the standardization of variance. The regression coefficients, TP loadings, and SR plot are shown in Figure 5. Both the regression coefficients and the TP loadings are very noisy, and no selection is evident. On the other hand, the SR plot provides the same selection as with nonstandardized chromatographic variables, demonstrating the robustness of this approach for selecting the bioactive regions. This is further visualized in Figure 6. Thus, the SR plot filters out the noise introduced through the standardization of the chromatographic variables. In fact, previous theoretical considerations of TP have shown that the TP approach can enhance regions with small intensity but strong correlation to the response through their correlation to the more intense regions that are correlated to the response.19,20 Therefore, there is no need to standardize the chromatographic variables to enhance components with low concentration but high correlation to the response.

Correlation vs Selectivity Ratio. PLS modeling, which is used as a first step in our approach, is essentially a covariance-based technique. This means that variables with low correlation but high covariance with the response variable (total antioxidant activity), caused by large intensity, tend to dominate over variables with high correlation to the response but small intensity. We have seen in the paragraph above that standardization of variables to provide a correlation-based PLS model does not solve the problem, since the model becomes more prone to noise and orthogonal variation. Target projection followed by the SR plot cures this problem, but would it be possible to arrive at the same model, and thus the same components, by using the correlation coefficients directly for variable selection? Figure 7a displays the correlation coefficient between chromatographic intensity and total antioxidant activity as a function of retention time.
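The contrast drawn in this section between TP loadings, selectivity ratios, and plain correlation coefficients can be illustrated with a small numerical sketch. It is not the authors' implementation and uses synthetic data rather than the chromatograms of this study. The function names `pls1_coefficients` and `target_projection`, the three synthetic variables, and the use of two PLS components are assumptions made for illustration. The selectivity ratio is taken here as the ratio of explained to residual variance of each variable on the target-projected component, as described in the text for eq 7.

```python
import numpy as np

def pls1_coefficients(X, y, n_components):
    """Regression vector of a PLS1 model (NIPALS) for mean-centered X and y."""
    Xk, yk = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                 # weights: covariance of each variable with y
        w /= np.linalg.norm(w)
        t = Xk @ w                    # scores
        tt = t @ t
        p = Xk.T @ t / tt             # X loadings
        c = (yk @ t) / tt             # y loading
        Xk = Xk - np.outer(t, p)      # deflate X
        yk = yk - c * t               # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)

def target_projection(X, b):
    """Project X onto the normalized regression vector; return TP loadings and SR."""
    w_tp = b / np.linalg.norm(b)
    t_tp = X @ w_tp                           # target-projected scores
    p_tp = X.T @ t_tp / (t_tp @ t_tp)         # target-projected loadings
    X_tp = np.outer(t_tp, p_tp)               # variation explained by the TP component
    resid = X - X_tp                          # variation orthogonal to the response
    sr = (X_tp ** 2).sum(axis=0) / (resid ** 2).sum(axis=0)
    return p_tp, sr

# Synthetic "fingerprint" (hypothetical): 50 samples, 3 variables.
rng = np.random.default_rng(0)
n = 50
a = rng.normal(size=n); a -= a.mean()         # concentration of an active component
d = rng.normal(size=n); d -= d.mean()         # concentration of a dominant component
d -= a * (a @ d) / (a @ a)                    # make d uncorrelated with a in this sample
X = np.column_stack([
    a + 0.1 * rng.normal(size=n),             # variable 0: weak signal, active
    20.0 * (0.3 * a + d),                     # variable 1: intense, weakly related
    0.1 * rng.normal(size=n),                 # variable 2: noise
])
y = a + 0.1 * rng.normal(size=n)              # total activity
X -= X.mean(axis=0); y -= y.mean()

b = pls1_coefficients(X, y, n_components=2)
p_tp, sr = target_projection(X, b)
corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

print("TP loadings      :", np.round(p_tp, 3))
print("selectivity ratio:", np.round(sr, 3))
print("correlation      :", np.round(corr, 3))
```

In this toy setting the intense but weakly related variable is expected to carry the largest TP loading, while the selectivity ratio singles out the low-intensity active variable. This mirrors the behavior described for Puerarin versus the regions at 12.5, 13.3, and 15.6 min. The signed SR convention used in Figure 4 (negative values for negative regression coefficients) is omitted for brevity.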
The three regions with the largest correlation to total antioxidant activity are the same as those found by using the TP and SR plot. However, the regions do not stand out as in the SR plot. Rather, they represent the end points on a continuum of correlation coefficients. This is further illustrated by the linear trend in the normal probability plot (Figure 7b). This problem will become even more pronounced with increasingly complex mixtures and thus chromatograms. Pooled correlation coefficients from the validated PLS model may constitute an alternative to the SR plot for reliable variable selection, but this needs a thorough investigation before a conclusion can be reached.

CONCLUSIONS

The total bioactivity of a HM is intimately connected to its chromatographic fingerprint. As long as the contributions to the total bioactivity from the molecular components are approximately additive, prediction of bioactivity for complex mixtures from chromatographic fingerprints is a relatively simple task. Unveiling the chromatographic regions with components responsible for the bioactivity represents essentially a variable selection problem and is more demanding. This work, however, represents a proof of concept with respect to the ability of the SR plot to relate bioactivity to specific regions in the chromatographic fingerprint. This observation has two important consequences: First, with the knowledge of where to look for bioactive components, the investigator can concentrate effort on the components eluting

in the most promising chromatographic regions in the crucial matter of identifying the bioactive components. Second, the quality control of HM can be greatly improved. In addition to providing authentication by inclusion of all the main active and inactive components, the chromatographic fingerprint can, with our novel approach, be used to obtain quantitative information about the most bioactive components in a class of HMs. Multivariate modeling techniques and the derived selectivity ratio plots act as the bridge connecting the chromatographic profile to bioactivity at the molecular level for HM. The introduction of our new technology, which is built on quantitative pattern-activity relationships, targeted multivariate projections, and selectivity ratio plots, may not only represent an improvement in quality assessment of HM but also show a path to the discovery of drug leads from HMs. Although the pharmacological activity of a HM is caused by all the bioactive components, our approach makes it possible to select the most bioactive components as markers, providing a quality measure that can be related to the therapeutic efficacy of a class of HMs. The patent related to QPAR has been processed by the United States Patent and Trademark Office.36

ACKNOWLEDGMENT

We would like to thank Dr. Sibao Chen at State Key Laboratory of Chinese Medicine and Molecular Pharmacology and Prof. Iris F.F. Benzie and Dr. Yim Tong Szeto at the Hong Kong Polytechnic University for their help in collecting and authenticating samples and carrying out the FRAP analyses. Thanks are also due to the University Grant Committee for financial support to the Area of Excellence (AoE) Project Chinese Medicine Research and Further Development. Pattern Recognition Systems AS, Bergen, Norway is thanked for providing access to a prerelease of version 8.0 with many new features utilized in this work.

(36) Chau, F. F.; Xu, C. J. Method for Finding Active Ingredients from Chemical and Biological System. U.S. Patent Application 11/902,536, September 24, 2007.

Received for review April 6, 2009. Accepted July 16, 2009.

NOTE ADDED AFTER ASAP PUBLICATION This manuscript originally posted ASAP on July 27, 2009. An additional statement in the Conclusions and a reference were added, and the corrected manuscript posted ASAP July 31, 2009.

AC900731Z

