ARTICLE pubs.acs.org/ac
Calibration of Multiplexed Fiber-Optic Spectroscopy
Zeng-Ping Chen,*,† Li-Jing Zhong,† Alison Nordon,‡ David Littlejohn,‡ Megan Holden,‡ Mariana Fazenda,§ Linda Harvey,§ Brian McNeil,§ Jim Faulkner,|| and Julian Morris^
† State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha, 410082, Hunan, P.R. China
‡ WestCHEM/Centre for Process Analytics and Control Technology, Department of Pure and Applied Chemistry, and §Strathclyde Fermentation Centre, Institute of Pharmacy & Biomedical Science, University of Strathclyde, Glasgow G1 1XL, Scotland, U.K.
|| BioPharCEDD, GlaxoSmithKline, Beckenham, BR3 3BS Kent, U.K.
^ Centre for Process Analytics and Control Technology, School of Chemical Engineering and Advanced Materials, Newcastle University, NE1 7RU, Newcastle upon Tyne, U.K.
Supporting Information
ABSTRACT: Large-scale commercial bioprocesses that manufacture biopharmaceutical products such as monoclonal antibodies generally involve multiple bioreactors operated in parallel. Spectra recorded during in situ monitoring of multiple bioreactors by multiplexed fiber-optic spectroscopies contain not only spectral information of the chemical constituents but also contributions resulting from differences in the optical properties of the probes. The spectral variations induced by probe differences cannot be efficiently modeled by the commonly used multivariate linear calibration models or effectively removed by popular empirical preprocessing methods. In this study, for the first time, a calibration model is proposed for the analysis of complex spectral data sets arising from multiplexed probes. In the proposed calibration model, the spectral variations introduced by probe differences are explicitly modeled by introducing a multiplicative parameter for each optical probe, and then their detrimental effects are effectively mitigated through a “dual calibration” strategy. The performance of the proposed multiplex calibration model has been tested on two multiplexed spectral data sets (i.e., MIR data of ternary mixtures and NIR data of bioprocesses). Experimental results suggest that the proposed calibration model can effectively mitigate the detrimental effects of probe differences and hence provide much more accurate predictions than commonly used multivariate linear calibration models (such as PLS) with and without empirical data preprocessing methods such as orthogonal signal correction, standard normal variate, or multiplicative signal correction.
1. INTRODUCTION Over the past decade, the utilization of optical spectroscopies as in situ monitoring techniques within the bioprocessing industry has grown rapidly,1-8 owing to their advantages over other analytical techniques, such as short measurement time, multiplicity of analysis, nondestructive methodology, and flexibility. The application of spectroscopic techniques to bioprocess monitoring generally involves a calibration procedure. Conventional calibration practices mainly focus on installing a single probe in a reactor vessel and collecting calibration data over a series of bioreactor experiments. However, large-scale commercial bioprocesses that manufacture biopharmaceutical products such as monoclonal antibodies generally involve multiple bioreactors operated in parallel and, hence, need multiplexed fiber-optic spectroscopic techniques. Spectra recorded by multiplexed fiber-optic spectroscopies contain not only spectral information of the chemical constituents but also the contributions arising from optical differences in the probes, which cannot be efficiently modeled by the commonly used multivariate linear calibration models, such as partial least-squares (PLS),9 or effectively removed by popular empirical preprocessing methods, such as orthogonal signal correction (OSC),10 standard normal variate (SNV),11 and multiplicative signal correction (MSC).12 The present study seeks to investigate an optimal calibration solution for such complex spectral data sets arising from multiplexed probes.
© 2011 American Chemical Society
Received: November 30, 2010. Accepted: February 17, 2011. Published: March 07, 2011.
dx.doi.org/10.1021/ac103145a | Anal. Chem. 2011, 83, 2655–2659
2. THEORY 2.1. Multiplex Calibration Model (MCM). For I calibration mixture samples comprising J absorbing chemical components, whose spectra are recorded by the same fiber-optic probe, the Beer-Lambert law implies that the theoretical absorbance spectrum ($\mathbf{x}_i$, row vector) of sample i is a linear combination of the absorbance contributions of all J components:
$$\mathbf{x}_i = \sum_{j=1}^{J} c_{i,j}\,\mathbf{s}_j, \qquad i = 1, 2, \ldots, I \tag{1}$$

where $c_{i,j}$ is the concentration of the jth chemical component in the ith sample and the row vector $\mathbf{s}_j$ denotes the pure absorption spectrum of the jth chemical component. By assuming that the $\mathbf{s}_j$ (j = 1, 2, ..., J) are linearly independent, the multivariate linear calibration model built between $\mathbf{x}_i$ and $c_{i,j}$ (i = 1, 2, ..., I) can provide satisfactory predictions for the concentration of the jth component in future sample mixtures.

Table 1. Mass Fractions (%, w/w) of the Three Constituents in Both the Training and Test Ternary Mixture Samples^a
sample no.   acetone   ethanol   ethyl acetate
 1             0        100.0      0
 2           100.0        0        0
 3             0          0      100.0
 4            50.00      50.00     0
 5            50.02       0       49.98
 6             0         49.85    50.15
 7            33.29      33.41    33.29
 8            65.61      17.42    16.97
 9            17.07      66.02    16.90
10            17.00      17.05    65.95
11             6.00      85.00     9.00
12            26.01      60.99    13.00
13            42.01      33.02    24.98
14            82.99      10.05     6.97
15            47.01       7.00    45.99
16            11.00      18.04    70.96
^a Mixture nos. 1 to 10 are training samples while mixture nos. 11 to 16 are test samples.

However, when the spectra of calibration samples are recorded by K different fiber-optic probes, the simple model in eq 1 is not applicable due to the potential differences in the response profiles of different fiber-optic probes. Despite the care taken during probe manufacture to ensure identical physical characteristics, it was found that potential probe optical differences contributed significant variation to spectral signals.13 For such calibration spectra, the following more advanced model is proposed to account for the spectral difference introduced by the differences in the probes:

$$\mathbf{x}_i(k) = p_i \sum_{j=1}^{J} c_{i,j}\,\mathbf{s}_j + \mathbf{b}_k = \sum_{j=1}^{J} p_i c_{i,j}\,\mathbf{s}_j + \mathbf{b}_k, \qquad i = 1, 2, \ldots, I, \quad k \in [1, 2, \ldots, K] \tag{2}$$
where $\mathbf{x}_i(k)$ is the spectrum of the ith calibration sample recorded by the kth fiber-optic probe. The multiplicative parameter $p_i$ accounts for the possible difference in optical path length of the ith sample relative to a reference probe induced by the change of fiber-optic probe, and $\mathbf{b}_k$ represents the background profile of the kth probe. Because of the multiplicative effect of the model parameter $p_i$, the relationship between the measured absorbance spectra $\mathbf{x}_i(k)$ and $c_{i,j}$ (i = 1, 2, ..., I) does not follow the linear model implicitly assumed by commonly used multivariate linear calibration methods such as PLS. If the jth constituent is the target chemical component and either $\sum_{j=1}^{J} c_{i,j} = 1$ (which strictly holds when $c_{i,j}$ represents a unit-free concentration such as weight fraction or mole fraction) or the concentration of one constituent (or matrix substance) is approximately constant over the mixture samples, it can be shown that a linear relationship exists between $\mathbf{x}_i(k)$ and $p_i$ and also between $\mathbf{x}_i(k)$ and $p_i c_{i,j}$. Therefore, the multiplicative parameters $p_i$ (i = 1, 2, ..., I) for calibration samples can be estimated from their spectra by the optical path length estimation and correction (OPLEC) method.14 For better performance, a modified OPLEC algorithm has been employed in this study (see Section 2.2). After the estimation of the parameter vector $\mathbf{p}$ ($\mathbf{p} = [p_1; p_2; \ldots; p_I]$) for the I calibration samples, two calibration models can be built by multivariate linear calibration methods such as PLS. The first model is between $\mathbf{X}$ ($\mathbf{X} = [\mathbf{x}_1(k); \mathbf{x}_2(k); \ldots; \mathbf{x}_I(k)]$) and $\mathbf{p}$, and the other is between $\mathbf{X}$ and $\mathrm{diag}(\mathbf{c}_j)\mathbf{p}$ ($\mathrm{diag}(\mathbf{c}_j)\mathbf{p} = [p_1 c_{1,j}; p_2 c_{2,j}; \ldots; p_I c_{I,j}]$). Once the spectrum of a test sample has been recorded, the content of the target constituent in the test sample can be obtained by dividing the prediction of the second calibration model by the corresponding prediction of the first calibration model.
2.2. Modified OPLEC Method for the Estimation of Multiplicative Effect Vector p. Assume the first component in eq 2 is the target constituent in the mixtures and $\sum_{j=1}^{J} c_{i,j} = 1$; then eq 2 can also be expressed as

$$\mathbf{x}_i(k) = p_i c_{i,1}\,\Delta\mathbf{s}_1 + p_i\,\mathbf{s}_2 + \sum_{j=3}^{J} p_i c_{i,j}\,\Delta\mathbf{s}_j + \mathbf{b}_k, \qquad \Delta\mathbf{s}_j = \mathbf{s}_j - \mathbf{s}_2 \tag{3}$$
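The step from eq 2 to eq 3 relies only on the closure constraint. As a brief worked derivation (a sketch that assumes exactly the sum-to-one condition stated above), substituting $c_{i,2} = 1 - c_{i,1} - \sum_{j=3}^{J} c_{i,j}$ into the mixture sum gives:

```latex
\begin{aligned}
\sum_{j=1}^{J} c_{i,j}\,\mathbf{s}_j
  &= c_{i,1}\,\mathbf{s}_1
     + \Bigl(1 - c_{i,1} - \sum_{j=3}^{J} c_{i,j}\Bigr)\mathbf{s}_2
     + \sum_{j=3}^{J} c_{i,j}\,\mathbf{s}_j \\
  &= c_{i,1}\,(\mathbf{s}_1 - \mathbf{s}_2) + \mathbf{s}_2
     + \sum_{j=3}^{J} c_{i,j}\,(\mathbf{s}_j - \mathbf{s}_2).
\end{aligned}
```

Multiplying by $p_i$ and adding $\mathbf{b}_k$ reproduces eq 3. Written this way, the spectrum is linear in the quantities $p_i c_{i,1}$, $p_i$, and $p_i c_{i,j}$ (j ≥ 3), which is why both $\mathbf{p}$ and $\mathrm{diag}(\mathbf{c}_1)\mathbf{p}$ can be related linearly to the spectra.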
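Suppose the singular value decomposition of $\mathbf{X}$ ($\mathbf{X} = [\mathbf{x}_1(k); \ldots; \mathbf{x}_i(k); \ldots; \mathbf{x}_I(k)]$) can be expressed as follows:

$$\mathbf{X} = [\mathbf{U}_s, \mathbf{U}_n] \begin{bmatrix} \boldsymbol{\Sigma}_s & \mathbf{0} \\ \mathbf{0} & \boldsymbol{\Sigma}_n \end{bmatrix} [\mathbf{V}_s, \mathbf{V}_n]^{T} = \mathbf{U}_s \boldsymbol{\Sigma}_s \mathbf{V}_s^{T} + \mathbf{E} \tag{4}$$

where $\mathbf{E} = \mathbf{U}_n \boldsymbol{\Sigma}_n \mathbf{V}_n^{T}$; superscript “T” denotes the transpose; and subscripts “s” and “n” signify that the corresponding factors represent spectral information and noise, respectively. Suppose the actual number of spectroscopically active chemical components in the spectral data is r; then both $\mathbf{U}_s$ and $\mathbf{V}_s$ consist of r columns. Define $\mathbf{c}_1 = [c_{1,1}; \ldots; c_{i,1}; \ldots; c_{I,1}]$ and $\mathrm{diag}(\mathbf{c}_1)\mathbf{p} = [p_1 c_{1,1}; p_2 c_{2,1}; \ldots; p_I c_{I,1}]$. According to eqs 3 and 4, the following equations hold:

$$\mathbf{U}_s \mathbf{U}_s^{T}\,\mathbf{p} = \mathbf{p} \tag{5}$$

$$\mathbf{U}_s \mathbf{U}_s^{T}\,\mathrm{diag}(\mathbf{c}_1)\,\mathbf{p} = \mathrm{diag}(\mathbf{c}_1)\,\mathbf{p} \tag{6}$$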
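Equation 5 implies that $\mathbf{p}$ is a linear combination of the columns of $\mathbf{U}_s$:

$$\mathbf{p} = \mathbf{U}_s \mathbf{a}, \quad \text{where } \mathbf{a} = [a_1; a_2; \ldots; a_r] \tag{7}$$

Inserting eq 7 into eq 6, one obtains:

$$\mathbf{U}_s \mathbf{U}_s^{T}\,\mathrm{diag}(\mathbf{c}_1)\,\mathbf{U}_s \mathbf{a} = \mathrm{diag}(\mathbf{c}_1)\,\mathbf{U}_s \mathbf{a} \tag{8}$$

Since there is no requirement to know the absolute value of $p_i$, $p_i$ can be assumed to be no less than unity:

$$\mathbf{p} = \mathbf{U}_s \mathbf{a} \geq \mathbf{1} \tag{9}$$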
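The vector $\mathbf{a}$ satisfying both eq 8 and eq 9 can be obtained by solving the following constrained linear least-squares problem (which can be realized by the lsqlin function in MATLAB):

$$\min_{\mathbf{a}} \; \frac{1}{2}\,\bigl\| \bigl(\mathbf{U}_s \mathbf{U}_s^{T}\,\mathrm{diag}(\mathbf{c}_1)\,\mathbf{U}_s - \mathrm{diag}(\mathbf{c}_1)\,\mathbf{U}_s\bigr)\,\mathbf{a} \bigr\|_2^2, \quad \text{such that } -\mathbf{U}_s \mathbf{a} \leq -\mathbf{1} \tag{10}$$

After the calculation of $\mathbf{a}$, the multiplicative effect vector $\mathbf{p}$ can be simply estimated as $\mathbf{p} = \mathbf{U}_s \mathbf{a}$. The MATLAB code for the above modified OPLEC method is available in Supporting Information.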
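Once p is available for the calibration samples, the dual calibration step of Section 2.1 can be illustrated on a toy binary mixture. The sketch below is not the authors' implementation: it assumes two wavelengths, made-up pure spectra `S1` and `S2`, known p values for the calibration samples, no background and no noise, and it uses ordinary least squares as a stand-in for PLS. All function and variable names are hypothetical.

```python
# Toy illustration of the "dual calibration" idea for a binary mixture.
# Assumptions (not from the paper): two wavelengths, known p for the
# calibration samples, no background b_k, no noise, and ordinary least
# squares standing in for PLS.


def spectrum(p, c1, s1, s2):
    """Eq 2 without background: x = p * (c1*s1 + c2*s2), with c1 + c2 = 1."""
    c2 = 1.0 - c1
    return [p * (c1 * a + c2 * b) for a, b in zip(s1, s2)]


def fit_linear(X, y):
    """Least-squares weights w for X w ~ y (two columns, normal equations)."""
    a11 = sum(r[0] * r[0] for r in X)
    a12 = sum(r[0] * r[1] for r in X)
    a22 = sum(r[1] * r[1] for r in X)
    b1 = sum(r[0] * t for r, t in zip(X, y))
    b2 = sum(r[1] * t for r, t in zip(X, y))
    det = a11 * a22 - a12 * a12
    return [(b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det]


def predict(w, x):
    """Apply a two-weight linear model to one spectrum."""
    return w[0] * x[0] + w[1] * x[1]


# Hypothetical pure-component spectra at two wavelengths.
S1 = [1.0, 0.2]
S2 = [0.3, 0.9]

# Calibration samples: (multiplicative factor p, fraction c1 of the target).
CAL = [(1.0, 0.20), (1.3, 0.50), (1.6, 0.80), (1.1, 0.65)]
X = [spectrum(p, c1, S1, S2) for p, c1 in CAL]

w_p = fit_linear(X, [p for p, _ in CAL])          # model 1: X -> p
w_pc = fit_linear(X, [p * c1 for p, c1 in CAL])   # model 2: X -> p * c1


def dual_calibration_predict(x):
    """Divide the model-2 prediction (p*c1) by the model-1 prediction (p)."""
    return predict(w_pc, x) / predict(w_p, x)


x_test = spectrum(1.45, 0.35, S1, S2)  # unseen sample with a different p
c1_hat = dual_calibration_predict(x_test)
```

In this noise-free setting both p and p·c1 are exactly linear in the spectrum, so the ratio recovers the fraction of the target component even though the test sample has a multiplicative factor not seen during calibration. With real spectra, PLS models and an estimated p would take the place of the exact fits used here.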
3. APPLICATIONS 3.1. Multiplexed MIR Data of Ternary Mixtures. Sixteen ternary mixture samples (10 training samples and 6 test samples) comprising acetone, ethanol, and ethyl acetate were prepared by weight according to a ternary mixture design (Table 1). MIR spectra of each sample were acquired using an ABB FTLA2000 FT-IR spectrometer (ABB, Canada), equipped with a DTGS detector and SiC source and two 12 mm diameter Hastelloy immersion probes, each with a diamond ATR crystal. The probes were coupled one at a time to the spectrometer via approximately 1.5 m of silver halide polycrystalline infrared fibres (Fiber Photonics, Livingston, U.K.). Each sample was analyzed three times by each of the two probes. Spectra were acquired using 51 scans with a resolution of 16 cm-1 over the range of 0-4000 cm-1 using GRAMS software (Thermo Scientific, U.K.). It took 51 s to acquire one spectrum. Absorbance spectra were calculated using a background of air, which was measured prior to analysis of the samples. The spectral region between 578.6 and 1805.2 cm-1 was selected for further analysis.
3.2. Multiplexed NIR Data of Bioprocesses. The process under study was a monoclonal antibody producing cell culture process using Chinese Hamster Ovary (CHO) cell lines. CHO cells were cultivated in Applikon BioConsole XL and BioConsole ADI 1035 (Applikon Biotechnology Ltd., Worcestershire, U.K.) bioreactors of working volume 5 and 12 L, respectively. The medium used for the bioreactors was a generic proprietary complex animal component free seed medium. The set points of vital process variables were as follows: temperature 37 °C, pH 6.95 ± 0.1, agitation rate 263 rpm, aeration rate of O2 of 0.1 vvm and CO2 of 0.05 vvm. The whole process fluid was analyzed in real time using an in situ Bruker Matrix FT-NIR spectrometer (Bruker Optics Ltd., Coventry, England, U.K.) with an internal multiplexer for up to six transflectance fiber-optic probes, enabling simultaneous monitoring of six bioreactors.
Spectra between 11 995 and 4497 cm-1 were collected with a resolution of 8 cm-1 at a rate of 1 spectrum every 77 s; 32 scans were accumulated for each spectrum, and a spectrum of air was used as a reference. For the detailed experimental setup, refer to ref 13. The key parameter considered in the analysis was the concentration of glucose in the fluid. Samples were collected twice daily over the entire time course of each bioreaction for determination of the glucose concentration in supernatant samples using a YSI 7100 MBS analyzer (Yellow Springs, Ohio, U.S.A.); all the reference assays were carried out in triplicate. For each sample, the average of the three replicate assay values was calculated and used in the subsequent data analysis. The spectra within the region between 10 456 and 5056 cm-1, recorded at the same time as sampling for the offline assays, were selected to form the spectral data sets employed in this study. The calibration and test sets were
Figure 1. MIR spectra of the ternary mixture sample no. 8 collected by the first (red solid line) and second (blue dash line) ATR probes.
composed of 98 and 18 spectra, respectively. The assay values of glucose in the calibration and test sets were in the ranges of 3.13-6.08 g L-1 and 4.26-5.89 g L-1, respectively. 3.3. Data Analysis. Multiplex calibration models (MCM) were built for the MIR data of the ternary solvent mixtures and the NIR data of bioprocess liquors, and their predictive performance was compared with that of PLS calibration models with and without data preprocessing methods such as SNV, MSC, and OSC. The optimal multiplex calibration models and PLS calibration models were determined by cross validation using calibration spectra. Please note that for the MIR data of the ternary mixtures, the replicate spectra of each sample were included as individual calibration samples, but they were left out together during cross validation cycles. The root-mean-square error of prediction (RMSEP) was used as the performance criterion to assess the performance of the MCM and PLS calibration models.
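The two evaluation ingredients named in Section 3.3 can be written down compactly. The sketch below is a minimal illustration rather than the code used in the study: `rmsep` is the usual root-mean-square error over paired reference and predicted values, and `leave_sample_out_folds` (a hypothetical helper name) shows one way to keep all replicate spectra of a sample together during cross validation.

```python
import math


def rmsep(reference, predicted):
    """Root-mean-square error of prediction over paired values."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def leave_sample_out_folds(sample_ids):
    """Yield (train_idx, held_out_idx), holding out all replicates of one sample."""
    for held in sorted(set(sample_ids)):
        held_out_idx = [i for i, s in enumerate(sample_ids) if s == held]
        train_idx = [i for i, s in enumerate(sample_ids) if s != held]
        yield train_idx, held_out_idx


# Example: three samples, each measured in triplicate.
ids = [1, 1, 1, 2, 2, 2, 3, 3, 3]
folds = list(leave_sample_out_folds(ids))
error = rmsep([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Holding replicates out together avoids scoring a model on spectra whose near-duplicates were used to train it, which would otherwise make the cross-validation error look better than it is.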
4. RESULTS AND DISCUSSIONS It was observed that there were significant differences between the MIR spectra of the same ternary mixture sample recorded by two fiber-optic ATR-MIR probes (Figure 1). Other than the obvious baseline shift, the change in probe further scaled the whole spectral measurement by a given factor. This is the reason that the PLS model built on the calibration set consisting of only the spectra of the training samples measured by probe 1 (denoted by PLS1 in Table 2) gave erroneous predictions for samples with spectra recorded by probe 2 (Table 2). The predictive performance of the PLS calibration model for samples measured by probe 2 can be significantly improved by adding the spectra of three randomly selected training samples (no. 1, no. 5, and no. 8) recorded by probe 2 into the calibration set (denoted by PLSglobal in Table 2). However, the RMSEP values of the PLSglobal model for samples measured by probe 2 are still quite large (Table 2). This suggests that the complex multiplicative effects on the spectra induced by the change in probe cannot be effectively modeled by commonly used multivariate linear calibration methods such as PLS. The capabilities of some popular empirical preprocessing methods such as OSC, SNV, and MSC in modeling the multiplexed MIR spectral data of the ternary solvent mixtures were also investigated (Table 2). The application of
Table 2. RMSEP Values Obtained by Different Calibration Methods for Both the Training and Test Samples of the Ternary Solvent Mixtures with Their Spectra Being Measured by Different ATR-MIR Probes
                             acetone (%, w/w)    ethanol (%, w/w)    ethyl acetate (%, w/w)
methods           probe      training   test     training   test     training   test
PLS1^a            probe 1      1.0      1.4        0.2      0.6        0.7      0.8
                  probe 2      8.6      6.6        5.3      6.0        6.0      3.5
PLSglobal^b       probe 1      1.7      1.9        0.8      1.3        0.8      0.9
                  probe 2      5.0      2.9        1.9      1.6        5.7      3.2
OSC-PLSglobal^c   probe 1      1.4      2.1        0.8      1.0        0.8      0.9
                  probe 2      5.9      2.8        3.5      2.5        5.3      3.0
SNV-PLSglobal^c   probe 1      2.5      2.3        0.7      2.2        1.8      2.5
                  probe 2      3.2      3.5        2.2      2.0        2.9      2.1
MSC-PLSglobal^c   probe 1      0.9      1.7        1.6      1.8        1.5      2.2
                  probe 2      3.3      2.7        3.1      3.0        4.3      1.7
MCM^d             probe 1      0.9      0.9        0.4      1.4        0.8      1.3
                  probe 2      1.3      1.3        1.3      1.1        1.2      0.9
^a PLS1 denotes the PLS calibration model built on the spectra of the training samples recorded with ATR-MIR probe 1.
^b PLSglobal signifies the global PLS model built on the calibration set consisting of both the spectra of the training samples recorded with ATR-MIR probe 1 and the spectra of three randomly selected training samples (no. 1, no. 5, no. 8) recorded with ATR-MIR probe 2.
^c OSC-PLSglobal, SNV-PLSglobal, and MSC-PLSglobal represent the PLSglobal calibration models built on the calibration spectra preprocessed by OSC, SNV, and MSC, respectively.
^d The MCM model was established on the same calibration spectra as the PLSglobal calibration model.
OSC deteriorated the quality of the predictions. Since MSC and SNV could eliminate part of the multiplicative effects, some improvement in the predictive accuracy of the calibration model was observed when the calibration spectra were preprocessed by SNV or MSC. However, spectral pretreatment with approaches such as MSC and SNV is most effective only if the chemical variations between the spectra to be corrected and the reference spectrum (in the MSC case) are negligible or, alternatively, the method is applied to a spectral region with little chemical information. This is not the case for the multiplexed MIR data of ternary mixtures. Therefore, despite the application of MSC or SNV, there were still significant differences between the RMSEP values for samples analyzed by probe 1 and the corresponding values for samples analyzed by probe 2. In contrast, the proposed MCM method successfully modeled the multiplexed MIR spectral data of the ternary solvent mixtures (Table 2). The MCM calibration model built on the calibration spectra consisting of both the spectra of the training samples analyzed by probe 1 and the spectra of three training samples (no. 1, no. 5, and no. 8) analyzed by probe 2 gave concentration predictions with quite similar values of RMSEP for samples analyzed by both probes. More convincingly, the RMSEP values for acetone and ethyl acetate obtained by MCM for samples measured by probe 2 are two-to-fourfold better than the corresponding values achieved by PLSglobal, which demonstrates the capability of MCM in modeling multiplexed spectral data. Interestingly, the RMSEP values obtained by MCM for both ethanol and ethyl acetate in the test samples with their spectra being measured by probe 2 were slightly lower than the corresponding values where the spectra were obtained by probe 1.
One possible explanation for this is that a possibly varying background interference in the spectra of the test samples recorded by probe 1 affected, to some degree, the performance of MCM in correcting multiplicative effects. Because probe 2 produced significantly higher spectral signals than probe 1 for the same samples, which could to some extent neutralize the influence of the possible varying background on
Figure 2. RMSEP values (g L-1) of glucose obtained by different methods for the test samples arising from multiplexed NIR probes in a CHO cell culture process for antibody manufacture.
the performance of MCM, better predictions were obtained for test samples with their spectra being measured by probe 2.
It was demonstrated in a previous paper13 that the probe optical differences made significant contributions to the spectral variations in the multiplexed NIR spectral data sets arising from the CHO cell culture system. Preprocessing methods OSC, SNV, and MSC, which are commonly used to remove spectral variations uncorrelated with target constituents, were employed in the present study with a view to enhancing the predictive accuracy of the calibration models for the key analyte glucose (Figure 2). It was found that OSC did not improve the RMSEP value, and although MSC enhanced the quality of the predictions for the test samples to some extent, it caused the RMSEP value for the calibration samples to increase from 0.25 to 0.34. It seems that the spectral variations removed from the spectra of the calibration samples by MSC contain not only the contributions of the probe optical differences but also some valuable spectral information relating to glucose. In our opinion, this unintended removal of valuable information is one of the major drawbacks of many empirical preprocessing methods. The application of SNV also provided only marginal benefits in terms of the RMSEP value for the test samples. In contrast, the proposed MCM model outperformed all the other methods investigated in this study
(Figure 2), achieving about a 54% reduction in the RMSEP values for the test samples. Although the improvement in predictive accuracy for the multiplexed NIR spectral data is not as great as for the multiplexed MIR spectral data of the ternary solvent mixtures, it is still a significant achievement in the context of multiplex calibration models, especially when considering the much higher complexity of the bioprocess liquors in comparison with the ternary solvent mixtures and the larger number of multiplexed data sets.
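The behaviour of SNV and MSC discussed in this section can be reproduced on a toy spectrum. The sketch below is a minimal pure-Python illustration and not the preprocessing code used in the study; the four-point vectors `base`, `probe2`, and `other` are made-up values chosen only to show the effect.

```python
import math


def snv(x):
    """Standard normal variate: centre a spectrum and scale by its own std."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    return [(v - mean) / std for v in x]


def msc(x, ref):
    """Multiplicative signal correction: regress x on ref, then undo the fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_r = sum(ref) / n
    slope = (sum((r - mean_r) * (v - mean_x) for r, v in zip(ref, x))
             / sum((r - mean_r) ** 2 for r in ref))
    offset = mean_x - slope * mean_r
    return [(v - offset) / slope for v in x]


# Hypothetical reference spectrum and the same sample seen through a second
# probe that scales the signal by 1.8 and adds a constant offset of 0.05.
base = [0.10, 0.40, 0.90, 0.30]
probe2 = [1.8 * v + 0.05 for v in base]

# A chemically different sample measured without any probe effect.
other = [0.10, 0.80, 0.90, 0.30]

snv_gap = max(abs(a - b) for a, b in zip(snv(base), snv(probe2)))
msc_gap = max(abs(a - b) for a, b in zip(msc(probe2, base), base))
msc_distortion = max(abs(a - b) for a, b in zip(msc(other, base), other))
```

When the only difference between two spectra is a scale factor and an offset, both corrections remove it (`snv_gap` and `msc_gap` are at rounding level). When the spectrum differs chemically from the reference, MSC still fits a slope and offset and so alters a spectrum that needed no correction (`msc_distortion` is clearly nonzero), which is the limitation noted above.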
5. CONCLUSIONS The present study seeks to provide an optimal solution for the analysis of complex multiplexed spectra recorded by fiber-optic spectroscopies during in situ quantitative monitoring of chemical or biochemical processes. For the first time, a multiplex calibration model with a multiplicative parameter introduced to account for the spectral contributions arising from optical differences in the probes has been developed. Our findings indicate that the proposed multiplex calibration model can effectively mitigate the detrimental effects of probe differences and hence provide much more accurate predictions than commonly used multivariate linear calibration models (such as PLS) built using spectra with or without preprocessing by OSC, SNV, and MSC. This work is a timely and innovative response to the ever increasing need for improved accuracy in in situ chemical or biochemical process monitoring.
ASSOCIATED CONTENT
Supporting Information. MATLAB code for the modified OPLEC method. This material is available free of charge via the Internet at http://pubs.acs.org.

AUTHOR INFORMATION
Corresponding Author
*E-mail: [email protected]. Fax: +86 (0) 73188821916.

ACKNOWLEDGMENT
The authors thank National Natural Science Foundation of China (Grant 21075034), the Scientific Research Foundation for the Returned Overseas Scholars (Ministry of Education of China), “973” National Key Basic Research Program of China (Grant 2007CB310500), the Fundamental Research Funds for the Central Universities of China, and CPACT for financial support. A.N. acknowledges the award of a University Research Fellowship by the Royal Society, U.K. M.F., L.H., and B.M. acknowledge the support of GSK and the TSB Technology Programme of GSK.

REFERENCES
(1) Petersen, N.; Odman, P.; Padrell, A. E.; Stocks, S.; Lantz, A. E.; Gernaey, K. V. Biotechnol. Prog. 2010, 26, 263–271.
(2) Tünnemann, R.; Mehlmann, M.; Süssmuth, R. D.; Bühler, B.; Pelzer, S.; Wohlleben, W.; Fiedler, H. P.; Wiesmüller, K. H.; Gauglitz, G.; Jung, G. Anal. Chem. 2001, 73, 4313–4343.
(3) Jarute, G.; Kainz, A.; Schroll, G.; Baena, J. R.; Lendl, B. Anal. Chem. 2004, 76, 6353–6358.
(4) De Gelder, J.; Willemse-Erix, D.; Scholtes, M. J.; Sanchez, J. I.; Maquelin, K.; Vandenabeele, P.; De Boever, P.; Puppels, G. J.; Moens, L.; De Vos, P. Anal. Chem. 2008, 80, 2155–2160.
(5) Scarff, M.; Arnold, S. A.; Harvey, L. M.; McNeil, B. Crit. Rev. Biotechnol. 2006, 26, 17–39.
(6) Suehara, K.; Yano, T. Adv. Biochem. Eng. Biotechnol. 2004, 90, 173–198.
(7) Nordon, A.; Littlejohn, D.; Dann, A. S.; Jeffkins, P. A.; Richardson, M. D.; Stimpson, S. L. Analyst 2008, 133, 660–666.
(8) Kondepati, V. R.; Heise, H. M. Curr. Trends Biotechnol. Pharm. 2008, 2 (1), 117–132.
(9) Martens, H.; Martens, M. Multivariate Analysis of Quality: An Introduction; John Wiley and Sons: Chichester, 2001.
(10) Sjoblom, J.; Svensson, O.; Josefson, M.; Kullberg, H.; Wold, S. Chemom. Intell. Lab. Syst. 1998, 44, 229–244.
(11) Barnes, R. J.; Dhanoa, M. S.; Lister, S. J. Appl. Spectrosc. 1989, 43, 772–777.
(12) Geladi, P.; McDougall, D.; Martens, H. Appl. Spectrosc. 1985, 39, 491–500.
(13) Roychoudhury, P.; O’Kennedy, R.; McNeil, B.; Harvey, L. M. Anal. Chim. Acta 2007, 590, 110–117.
(14) Chen, Z. P.; Morris, J.; Martin, E. Anal. Chem. 2006, 78, 7674–7681.