Article pubs.acs.org/IECR
Optimization of Membrane Separation Processes for Protein Fractionation
Arjun Singh Bhadouria,†,§ Mirco Sorci,†,§ Minghao Gu,†,§ Georges Belfort,†,§ and Juergen Hahn*,†,‡,§
†Department of Chemical and Biological Engineering, ‡Department of Biomedical Engineering, and §Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, New York 12180, United States
ABSTRACT: Optimization of the operating parameters for membrane separation of proteins is usually performed in industry by varying one factor at a time. While this procedure cannot ensure that an optimum is found, it is generally too time-consuming and costly to determine optimal operating conditions by simultaneously varying all factors. This paper presents a procedure that can be generically used by, for example, the biotechnology industry, for determining close to optimal operating conditions for cases where relatively little data are available. The procedure fits a polynomial regression model to the operating parameters and performs a multidimensional optimization of the empirical model with regard to a chosen performance measure, for example, a combination of selectivity and purity. The technique is illustrated with a membrane ultrafiltration process for the separation of hen egg lysozyme from ribonuclease A and in a second case study for separating hemoglobin from bovine serum albumin.
1. INTRODUCTION Separation and recovery of proteins are important processes in biotechnology. Various techniques are used for protein separation including chromatography, electrophoresis, and membrane ultrafiltration (UF), among others. Limitations of chromatography or electrophoresis include in some cases dilution of product, difficulty with high titer (i.e., highly viscous) solutions, scale up, and the cost of the instrumentation. At the same time, membrane ultrafiltration has received a significant amount of attention because of its scalability and economics. In many recovery unit process lines, both chromatography and membrane filtration are integrated together for effective protein purification.1 Membrane-based unit operations have been traditionally viewed as size-based separation processes where the size of the solutes differs by at least an order of magnitude.2 However, factors other than molecular sieving can also affect the performance of ultrafiltration. For example, concentration polarization3 and electrostatic interactions between proteins and proteins and membranes4 have been investigated for the separation of proteins of similar size. The effect of pH on fractionation of proteins of similar size has been investigated by van Eijndhoven et al.5,6 Performing an optimization of factors affecting UF can be very beneficial due to the strong dependence of UF performance on these factors. There have been advances in the prediction of the rate of UF of charged colloidal dispersions.7,8 These models were extended to predict the rate of UF of proteins.9 However, the challenge for these approaches is that they use models involving particle−particle interactions and resolving the resulting complex expressions requires significant computational effort. An alternative approach to optimization involving a fundamental model description of the process is to use experimentally generated data to predict membrane UF performance. Bowen et al. 
have developed a neural network approach to predict the rate of ultrafiltration of proteins.10 However, neural network models tend to include a large number of parameters that require a significant amount of experimental data for estimation. Lin et al. have used response surface methods11 to examine the role of different factors affecting UF performance;12 however, their work does not use the model to optimize the operating conditions. This paper presents an empirical approach for optimizing operating parameters of UF. A polynomial model is fitted to experimental data and optimal operating conditions are determined by optimizing the model predictions. Special attention is paid to the order of the polynomial because the order is affected by the amount of available data but also by trends in the data. Furthermore, the approach can compute operating conditions for different performance measures. The outline of the paper is as follows. Section 2 presents the experimental design based upon regression models for the operating conditions that can be varied for UF. This section includes a discussion of the choice of model order as well as selection of different regression models for optimization of different performance measures. Section 3 presents two case studies involving experimental data where an optimization of the operating conditions is performed. The first case study involves separation of ribonuclease A (RNase A) from hen egg lysozyme (Lys) and involves available data, while the second case study involves the process described by Lin et al.12 for separation of hemoglobin (Hb) from bovine serum albumin (BSA). Conclusions are presented in section 4.
Special Issue: David Himmelblau and Gary Powers Memorial
Received: April 24, 2013
Revised: August 30, 2013
Accepted: September 6, 2013
Published: September 6, 2013
© 2013 American Chemical Society
dx.doi.org/10.1021/ie401303d | Ind. Eng. Chem. Res. 2014, 53, 5103−5109
Industrial & Engineering Chemistry Research
Article
2. OPTIMAL EXPERIMENTAL DESIGN
Experiments are traditionally performed by individually varying each factor affecting the experiment while holding the other factors constant. Sometimes several factors are varied simultaneously to determine interactions of the factors; however, this is only done after the effect of all individual factors has been determined. Often this step is neglected altogether because of the large number of experiments needed to investigate all interactions. Instead, when an optimum operating condition is found with respect to a factor being varied, this factor is set to either its nominal value or its optimal value for the rest of the experiment, and the next factor is varied until its optimal value is also determined. This procedure is repeated until the optimal values for all individual factors are found. The best operating condition is considered to be the combination of the individual optimal values for all factors. This type of approach works well if a system behaves linearly over the investigated operating conditions because the superposition principle ensures that the overall optimum is given by the optimal values for each individual factor. However, the superposition principle does not hold for systems with significant nonlinear behavior because interactions among the factors also need to be taken into account. Processes such as UF contain non-negligible nonlinearities, which require that optimal experimental design account for interactions among the factors. One approach to deal with nonlinearities among the factors is factorial design,13,14 which investigates interactions by varying combinations of the parameters. Unfortunately, factorial design requires a large number of experiments and is often impractical for investigating all combinations of values of different factors, because the number of required experiments increases exponentially with the number of factors to vary.
The next subsection will present a different approach based upon regression using polynomial models. This approach has the advantage that it can deal with nonlinearities while the experimental requirements are kept at a reasonable level.
2.1. Polynomial Regression. Polynomial regression is a linear regression method where the output, Y, is modeled as an nth order polynomial of the independent factors, xi. The regression is linear with respect to the parameters to be estimated. The objective used for estimating the model parameters is to minimize the root-mean-square modeling error $\varepsilon = \left(\sum_l \left(Y_l - \hat{Y}(x_1, x_2, \ldots, x_p)_l\right)^2\right)^{1/2}$, where Yl are the measured responses at each point l and Ŷ(x1,x2,...,xp)l are the estimated responses predicted by the model. A general polynomial model involving p independent factors is given by

$$\hat{Y}(x_1, x_2, \ldots, x_p) = a_0 + \sum_{i=1}^{p} a_i x_i + \sum_{j=1}^{p}\sum_{i=1}^{j} a_{ij} x_i x_j + \sum_{k=1}^{p}\sum_{j=1}^{k}\sum_{i=1}^{j} a_{ijk} x_i x_j x_k + \cdots \tag{1}$$

where the order of interactions should be decided based on the available data using cross validation to avoid overfitting. The parameters a0, ai, aij, and aijk of the polynomial in eq 1 are determined by minimizing the error ε between the model predictions and the data.
2.1.1. Optimal Operating Conditions. A necessary condition for an optimum of Ŷ is that all the derivatives of eq 1 with respect to the individual factors are zero:

$$\frac{\partial \hat{Y}}{\partial x_1} = \frac{\partial \hat{Y}}{\partial x_2} = \frac{\partial \hat{Y}}{\partial x_3} = \cdots = \frac{\partial \hat{Y}}{\partial x_p} = 0 \tag{2}$$

Equation 2 results in p equations that can be solved for the p factors, xi. A maximum of Ŷ exists if, in addition to the conditions stated in eq 2, the Hessian matrix given by eq 3 is negative definite, that is, all of its eigenvalues are negative.

$$\begin{bmatrix} \dfrac{\partial^2 \hat{Y}}{\partial x_1^2} & \cdots & \dfrac{\partial^2 \hat{Y}}{\partial x_1 \partial x_p} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 \hat{Y}}{\partial x_p \partial x_1} & \cdots & \dfrac{\partial^2 \hat{Y}}{\partial x_p^2} \end{bmatrix} \tag{3}$$

2.2. Experimental Design for Optimizing UF Performance. UF is a membrane-based filtration process used generally for separation or purification of macromolecular solutions, especially protein solutions. It replaces other methods such as size exclusion chromatography because of the lower costs, lower dilution of product, and easier scalability. The mode of operation for the feed to the membrane unit is usually dead-end or cross-flow, and the separation is achieved using a pressure gradient across the membrane (Figure 1).
Figure 1. Schematic for an ultrafiltration process.
A variety of different factors affect the performance of UF. Some of these can be varied over the course of an experiment, and they include (1) ionic strength of solution (x1), (2) pH of the solution (x2), (3) transmembrane pressure (x3), and (4) stirring speed (x4). Thus there are two solution parameters (x1, x2) and two operating parameters (x3, x4). There are several additional factors that also affect UF performance, for example, (5) type of monomer grafted to modify the membrane, (6) membrane charge type (e.g., negative or positive), (7) grafted monomer density, and (8) initial protein concentration. These factors cannot be modified during an experiment and are varied prior to conducting an
experiment. Hence, they are not part of the experimental design procedure investigated in this work. Given that there are four factors that can be modified during an experiment, the regression model needs to include four inputs. The only item left to decide for the regression model is the order of the model. The order should be determined by fitting the regression model to available experimental data and deciding which order provides a good fit. The more terms are included in the model, the better the fit will be. However, this comes at the cost of estimating more parameters and can lead to overfitting of the model. For example, if a linear regression model is sufficient, then only p + 1 parameters need to be estimated, of which only the p coefficients of the individual factors play a role for the optimization. However, the drawback of using linear regression models is that the optimal conditions will always lie at one of the boundaries of the operating conditions. Because this can be a limiting factor for optimizing operating conditions for UF, a regression model that is nonlinear in the factors should be used instead. One of the simplest such models is given by truncation of the cubic and higher order terms from eq 1:

$$\hat{Y} = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3 + a_4 x_4 + a_{11} x_1^2 + a_{12} x_1 x_2 + a_{13} x_1 x_3 + a_{14} x_1 x_4 + a_{22} x_2^2 + a_{23} x_2 x_3 + a_{24} x_2 x_4 + a_{33} x_3^2 + a_{34} x_3 x_4 + a_{44} x_4^2 \tag{4}$$

where the parameters ai and aij need to be estimated from available data. Equation 4 contains 15 parameters that need to be estimated. This requires a relatively rich data set, and sometimes further simplifying assumptions are made to align the model complexity with experimental data requirements.
2.2.1. Optimal Operating Conditions. To determine the optimal operating conditions for the regression model given by eq 4, the partial derivatives of Ŷ with respect to the variables x1, x2, x3, and x4 are computed and set to zero.

$$\frac{\partial \hat{Y}}{\partial x_1} = \frac{\partial \hat{Y}}{\partial x_2} = \frac{\partial \hat{Y}}{\partial x_3} = \frac{\partial \hat{Y}}{\partial x_4} = 0 \tag{5}$$

The conditions shown in eq 5 result in four linear equations, which have to be solved simultaneously:

$$\begin{bmatrix} 2a_{11} & a_{12} & a_{13} & a_{14} \\ a_{12} & 2a_{22} & a_{23} & a_{24} \\ a_{13} & a_{23} & 2a_{33} & a_{34} \\ a_{14} & a_{24} & a_{34} & 2a_{44} \end{bmatrix} \begin{bmatrix} x_1^* \\ x_2^* \\ x_3^* \\ x_4^* \end{bmatrix} = -\begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{bmatrix} \tag{6}$$

The system given by eq 6 can be analytically solved provided the parameters have been identified. Additionally, the Hessian matrix

$$\begin{bmatrix} 2a_{11} & a_{12} & a_{13} & a_{14} \\ a_{12} & 2a_{22} & a_{23} & a_{24} \\ a_{13} & a_{23} & 2a_{33} & a_{34} \\ a_{14} & a_{24} & a_{34} & 2a_{44} \end{bmatrix} \tag{7}$$

is required to be negative definite.
2.3. Performance Measures for UF. The predicted variable Ŷ from eq 4 refers to a performance measure that should be maximized for UF. There are several performance measures that are commonly used, depending upon how the separation process is used. Common measures in biotechnology are
1. Selectivity

$$\psi = \frac{S_{o,L}}{S_{o,R}} \tag{8}$$

2. Purity

$$P = \frac{100\, m_L}{m_L + m_R} \tag{9}$$

3. Yield

$$\upsilon = 100\left(\frac{C_p}{C_f}\right)\% \tag{10}$$

while water treatment usually involves the measures
4. Flux

$$F = \frac{V}{A t} \tag{11}$$

5. Rejection

$$R = 1 - \frac{C_p}{C_f} \tag{12}$$

where So,i = Cp,i/Cf,i is the sieving coefficient of protein i, mi is the mass of protein i in the permeate solution, Cp and Cf are protein concentrations in permeate and feed, respectively, V is the cumulative volume of feed through the membrane, A is the cross sectional area of the membrane, and t is the time elapsed during the filtration process. These performance measures are modeled as functions of the process variables ionic strength (x1), pH (x2), transmembrane pressure (x3), and stirring speed (x4). It should be noted that optimal operating conditions for one performance measure will likely be suboptimal for another measure. Accordingly, a trade-off between the optima for different performance measures is commonly required.
2.4. Model Order Selection via Cross-Validation. Model order determination is an important step in data fitting. While using a higher order model will give better fits, it may reduce the confidence in the parameter estimates, and the model will have a poorer predictive performance if the model order is too high and too many parameters are estimated. In order to avoid overfitting, techniques such as cross-validation,15,16 bootstrapping,17,18 Cp statistic,19 AIC,20 or Bayesian methods have been developed. From these methods, cross-validation is used in this work to obtain the model order. Cross-validation was initially used to evaluate the predictive validity of linear regression equations that were used to forecast a performance criterion.21,22 Cross-validation draws two samples from the same data set that is used for model fitting. The first sample is used to fit the data and is known as the calibration sample, while the second sample, called the validation sample, is used to evaluate the predictive capability of the model fit. Given a data set of N data points, the number of points used for calibration is NC and for validation is NV = N − NC.
The prediction error is calculated as a mean square error between the predicted response (as listed above in eq 1) and the experimental value. Cross-validation involves the following
steps: (1) split the data into calibration and validation samples; (2) fit the model to the calibration part of the data and calculate the prediction error of the fitted model using the validation part; (3) repeat steps 1 and 2 for different selections of NV points from the data set ($\binom{N}{N_V}$ different selections); (4) the cross-validation estimate of the test error is the sample average of the prediction errors obtained for each validation sample selection. The model is selected by minimizing the cross-validation estimate over the candidate model orders. For a limited size data set, as is common for most ultrafiltration experiments, a “leave one out” cross-validation procedure can be used where NV = 1 and NC (= N − 1) data points are used to fit the model. This is done for N (= $\binom{N}{1}$) different sets of calibration and validation samples. The model order is selected to be the one that results in the smallest averaged cross-validation estimate of the test error.
3. CASE STUDIES
The method described above is applied in two case studies. One case involves an available data set from our group, while the other case makes use of data taken from the literature. It is the objective for each case study to determine close to optimal operating conditions for UF.
3.1. UF of Protein Mixture of RNase A and Lys. 3.1.1. Experimental Procedure. Data from UF of a binary protein mixture of ribonuclease A (RNase A, MW 13.7 kDa, pI 9.5) and hen egg lysozyme (Lys, MW 14.3 kDa, pI 11) were collected.23 The filtration experiments were carried out varying one factor at a time with a polyethersulfone (PES) membrane modified by UV-induced graft polymerization. The monomer grafted on the membrane was 3-sulfopropyl methacrylate potassium (SPMP), which was negatively charged at all operating pH values to make use of electrostatic interactions.23 Fractionation of the protein mixture was carried out at pH 9.5 and pH 11, the isoelectric points of the individual proteins. Though Figure 2 shows that there is a larger difference between individual sieving at pH 9.5 than at pH 11, at pH 9.5 RNase A is neutral, while Lys is positively charged. Because Lys and the negatively charged membrane carry opposite charges, there is heavy fouling of the membrane with Lys and there is very low transmission. Contrary to this, at pH 11 there is strong electrostatic repulsion of RNase A, which is negatively charged at this pH, from the membrane (negative), which results in low adsorption and fouling, allowing high transmission of Lys (neutral) and a better separation than that at pH 9.5. Therefore the experiments show that pH 11 can be considered a good value for this factor. As such, pH values near 11 were used for collecting data for building the regression model. In the next part of the experiment, the protein mixture was filtered at pH near 11 (10.63 to 10.74) and at a conductivity gradient of 22.2−5.4 mS/cm (data shown in Figure 3). Based on the experiments, it can be seen that maximum selectivity at one pH value is achieved for conductivity ranging from 6 to 10 mS/cm or 60 to 100 mM NaCl.
Figure 2. Fractionation of individual proteins at a pH gradient.23
Figure 3. Models for ψ vs conductivity.
3.1.2. Polynomial Model Development. The polynomial regression models have two factors as inputs: pH and conductivity. The experiments were carried out using ultrafiltration equipment provided by Asahi Kasei Bioprocess Inc. in the pH range of 10.63 to 10.74. The model structure was chosen in such a way that only linear models were fit with respect to pH (x2) due to the limited number of different pH values used. Nonlinear relationships for ψ and P as functions of conductivity (x1) were developed, because a reasonable amount of conductivity data was available, including data collected over a large range of values. Both quadratic and cubic models were investigated, using a cross-validation approach, to evaluate whether a higher order model would improve the fit of the predictions to the data. Because this was the case here (Figure 3), a cubic model with respect to conductivity was used. It should be noted that neither a quadratic nor a cubic model provides an excellent fit for the data; however, the quadratic model is unable to capture some of the behavior of the process that is known from experience whereas the fit of the cubic model does not have this limitation in this case. Because only one value was used for the transmembrane pressure (x3) throughout the data collection, this input was neglected for further analysis. Lastly, the experiments did not involve stirring. Using the described assumption of a linear relationship of selectivity (and purity) with respect to pH and a cubic relation with respect to conductivity, the model reduces to

$$\psi = a_0 + a_1 x_1 + a_2 x_2 + a_{12} x_1 x_2 + a_{11} x_1^2 + a_{112} x_1^2 x_2 + a_{111} x_1^3 \tag{13}$$

which includes 7 parameters. Because only a linear model was fit with respect to pH (x2), there are no higher order terms for pH in the model. While there are three coefficients of terms involving pH (x2), these include interaction terms with conductivity, and these coefficients can be found even for data that only includes measurements taken at two different pH values. A model of the same structure, albeit with different parameter values, is developed for P. The parameters obtained for the individual models are given in Table 1, and the goodness of fit statistics are provided in Table 2.
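The comparison of a quadratic against a cubic model in conductivity by leave-one-out cross-validation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data points are synthetic and noise-free (generated from a known cubic so that the outcome is reproducible), conductivity is rescaled to [−1, 1] for numerical conditioning, and the helper names `fit_poly`, `predict`, and `loocv_error` are hypothetical.

```python
# Leave-one-out cross-validation (LOOCV) for choosing a polynomial order.
# Assumptions (not from the paper): synthetic, noise-free data and
# hypothetical helper names; only the Python standard library is used.


def fit_poly(ts, ys, order):
    """Least-squares fit of y = sum_k a_k * t**k via the normal equations."""
    n = order + 1
    A = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    b = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    a = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * a[j] for j in range(i + 1, n))
        a[i] = (b[i] - s) / A[i][i]
    return a


def predict(a, t):
    """Evaluate the fitted polynomial at t."""
    return sum(ak * t ** k for k, ak in enumerate(a))


def loocv_error(ts, ys, order):
    """Mean squared prediction error over all N leave-one-out splits."""
    sq_errors = []
    for i in range(len(ts)):
        t_cal = ts[:i] + ts[i + 1:]   # calibration sample (N - 1 points)
        y_cal = ys[:i] + ys[i + 1:]
        a = fit_poly(t_cal, y_cal, order)
        sq_errors.append((ys[i] - predict(a, ts[i])) ** 2)  # validation point
    return sum(sq_errors) / len(sq_errors)


# Hypothetical conductivities (mS/cm) spanning the 5.4-22.2 range of the study.
conductivity = [5.4, 7.0, 9.0, 11.5, 14.0, 18.0, 22.2]
lo, hi = min(conductivity), max(conductivity)
t = [2.0 * (x - lo) / (hi - lo) - 1.0 for x in conductivity]
# Synthetic response generated from a known cubic in the scaled variable.
y = [2.0 + 0.3 * ti - 0.8 * ti ** 2 + 0.5 * ti ** 3 for ti in t]

cv_errors = {order: loocv_error(t, y, order) for order in (1, 2, 3)}
best_order = min(cv_errors, key=cv_errors.get)
print(cv_errors, best_order)
```

With measured data the errors would include noise, and the lowest cross-validation error need not belong to the highest order; that is the situation in which the procedure guards against overfitting.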
Table 1. Model Parameters
parameter    selectivity, ψ    purity, P
a0           −365.3            −3732
a1           52.68             491.4
a2           34.14             355.9
a12          −4.871            −45.63
a11          −1.803            −14.92
a112         0.1639            1.366
a111         0.001244          0.008003

Table 2. Goodness of Fit Statistics for the Models
goodness of fit    selectivity, ψ    purity, P
SSE                0.01556           0.8164
R2                 0.9783            0.9828
adjusted R2        0.9523            0.9622
RMSE               0.05579           0.4041

3.1.3. Process Optimization. The two models given by eq 13 are optimized with respect to the inputs. The optimization was performed using the function “fmincon” in MATLAB. The optimization variables were restricted to the ranges over which data were collected (x1 from 5.4 to 22.2 mS/cm and x2 from 10.63 to 10.74). While the optimization was performed numerically, the optimal solution satisfies the conditions shown in eqs 2 and 3 because this is directly built into the MATLAB routines. The optimal conditions and the corresponding performance measures are given in Table 3. A plot of the selectivity as a function of the conductivity and pH is shown in Figure 4, which shows that the optimal conductivity does not lie at the boundary of the operating regime.
The optimal pH is close to the experimental results and lies at one of the boundaries because the models include a linear relationship between pH and the performance measures. However, the optimum conductivities, 5.59 and 5.42 mS/cm for the two performance measures, are quite different from any of the measured data points. This is an indication that the optimal operating point may be at none of the measured values; future experiments will have to validate these model predictions.
3.1.4. Trade-off between Performance Measures. None of the performance measures is optimal for all applications because each process has a unique set of requirements. Hence, different applications put different weights on each of the performance measures. For example, some processes focus on purity, while others deem a high selectivity to be more important. In practice, a combination of different criteria is usually used. The polynomial regression method presented in this work can account for this by combining the different performance measures, Ŷi, and providing a percentage weight, αi, to each measure:

$$\hat{Y} = \sum_i \alpha_i \hat{Y}_i \quad \text{with} \quad \sum_i \alpha_i = 1 \tag{14}$$
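A weighted objective of the form of eq 14 can be explored with a brute-force grid search over the sampled operating range, as sketched below. Several points are assumptions rather than the authors' procedure: the grid search stands in for MATLAB's fmincon, the two measures are combined without rescaling (the text does not state whether ψ and P were normalized before weighting), the grid spacing of 0.01 is arbitrary, and the function names are hypothetical. The coefficients are the rounded values of Table 1; because the terms of eq 13 nearly cancel, rounding to four significant figures can shift the computed optima, so the output is not expected to reproduce Table 3 exactly.

```python
# Brute-force grid search for a weighted combination of two fitted measures.
# The coefficients are the rounded values from Table 1; the unscaled weighted
# sum and the 0.01 grid spacing are assumptions made for this illustration.

PSI = dict(a0=-365.3, a1=52.68, a2=34.14, a12=-4.871,
           a11=-1.803, a112=0.1639, a111=0.001244)
PUR = dict(a0=-3732.0, a1=491.4, a2=355.9, a12=-45.63,
           a11=-14.92, a112=1.366, a111=0.008003)


def model(c, x1, x2):
    """Eq 13: linear in pH (x2), cubic in conductivity (x1)."""
    return (c["a0"] + c["a1"] * x1 + c["a2"] * x2 + c["a12"] * x1 * x2
            + c["a11"] * x1 ** 2 + c["a112"] * x1 ** 2 * x2
            + c["a111"] * x1 ** 3)


def weighted_optimum(alpha):
    """Maximize alpha*psi + (1 - alpha)*P over the sampled operating range."""
    best = None
    for i in range(540, 2221):          # x1 = 5.40 ... 22.20 mS/cm
        x1 = i / 100
        for j in range(1063, 1075):     # x2 = 10.63 ... 10.74
            x2 = j / 100
            value = (alpha * model(PSI, x1, x2)
                     + (1 - alpha) * model(PUR, x1, x2))
            if best is None or value > best[0]:
                best = (value, x1, x2)
    return best


for alpha in (0.0, 0.5, 1.0):
    value, x1_opt, x2_opt = weighted_optimum(alpha)
    print(f"alpha={alpha:.1f}: x1*={x1_opt:.2f} mS/cm, x2*={x2_opt:.2f}")
```

Because both models are linear in pH, the search always ends at a pH bound, whereas the optimal conductivity can move with the weight α. A gradient-based constrained solver would be preferable for more than two or three inputs, where a full grid becomes expensive.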
Equation 14 allows one to put a weight on a combination of performance measures, for example, weighing selectivity versus purity, and to derive a combined model involving these different weights. To illustrate this approach, a relationship for optimal conductivity with respect to different weights of selectivity is found. Optimal conductivity ranges from 5.42 to 5.59 mS/cm depending on the weights (Figure 5), while the optimal pH remains at 10.74 for all values of the weights. This shows that an optimum found for one performance measure might differ from an optimum for another performance measure. This approach can be extended to include different performance measures such as purity and yield or the cost of operation provided the economics of operation are known.

Table 3. Optimal Operating Conditions and Corresponding Performance Measures
performance measure    x1* (conductivity, mS/cm)    x2* (pH)
ψ* = 2.3071            5.59                         10.74
P* = 91.77             5.42                         10.74

Figure 4. Model for ψ.
Figure 5. Variation of optimal conductivity for different models.

3.2. UF of Protein Mixture of Hb and BSA. Lin et al.12 have used the response surface method to study the effects of various process variables on the fractionation of hemoglobin (Hb) from bovine serum albumin (BSA) using UF. They used a second-order polynomial regression model to fit BSA rejection with respect to the initial protein concentration (x1), pH of the solution (x2), transmembrane pressure (x3), and stirring speed (x4). Data for estimating the model parameters were collected by varying the initial protein concentration from
100 to 500 ppm, the pH of the solution from 4 to 7.5, the transmembrane pressure from 10 to 50 psi, and the stirring speed from 100 to 300 rpm. The resulting model is given by

$$R = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3 + a_4 x_4 + a_{11} x_1^2 + a_{12} x_1 x_2 + a_{13} x_1 x_3 + a_{14} x_1 x_4 + a_{22} x_2^2 + a_{23} x_2 x_3 + a_{24} x_2 x_4 + a_{33} x_3^2 + a_{34} x_3 x_4 + a_{44} x_4^2 \tag{15}$$

where the values of the 15 parameters are provided in Table 4.

Table 4. Coefficients of Model for Rejection of BSA Given by Equation 15 (from ref 12)
parameter    value
a0           1.605
a1           1.37
a2           −1.31
a3           2.04
a4           −0.17
a11          −0.64
a12          0.013
a13          −0.14
a14          0.27
a22          0.96
a23          0.86
a24          −0.11
a33          −1.68
a34          0.36
a44          −0.52

While the data for this model and the parameters are provided in ref 12, no optimization of this model was performed. Optimizing R from eq 15 with respect to the four inputs using the MATLAB function “fmincon” returns the results shown in Table 5.

Table 5. Optimal Operating Conditions for Model Given by Equation 15
x1* = 100 ppm    x2* = 7.5    x3* = 10 psi    x4* = 100 rpm

It can be seen from Table 5 that the optimal values for all four operating parameters are at the boundary of the operating conditions. This indicates that a simple linear model would have been sufficient to capture the characteristics of this system for optimization purposes. In fact, a linear model of the form

$$R = \tilde{a}_0 + \tilde{a}_1 x_1 + \tilde{a}_2 x_2 + \tilde{a}_3 x_3 + \tilde{a}_4 x_4 \tag{16}$$

can be generated by estimating the parameter values shown in Table 6 by fitting eq 16 to data generated by the nonlinear model (eq 15). Equation 16 has the same optimal operating regime as the model given by eq 15. The one advantage of using a linear model is that the data requirements for estimating the parameters of eq 16 from experimental data are significantly lower than if all parameters of eq 15 are estimated from an experiment. The result of this is that one has the option of performing fewer experiments to obtain the same model prediction accuracy by using eq 16 instead of eq 15 or, alternatively, that one can estimate the smaller number of parameters with a higher confidence for the same data set.

Table 6. Coefficients of Model for Rejection of BSA Given by Equation 16
parameter    value
ã0           1.7722
ã1           −0.8005
ã2           0.8005
ã3           −0.7820
ã4           −0.3575

4. CONCLUSIONS
Optimization of membrane UF operation is needed but commonly not performed in detail. The number of experiments to perform a rigorous optimization can be large, resulting in significant time and financial commitments. Optimal experimental design can be used to either reduce the number of experiments or maximize the information content from a predetermined number of experiments. This work focused on the latter aspect using polynomial regression models. The models include the factors that can be varied as inputs, and they predict certain performance measures of interest. Combinations of these performance measures can also be used. The resulting expression for the performance measures can be numerically optimized to determine operating conditions that maximize the performance measures. This type of approach is in stark contrast to commonly used linear approaches where one factor is varied at a time. Here with our nonlinear approach, we are able to account for nonlinearities and interactions among the factors. The polynomial regression method for UF was illustrated with two case studies. One study involved available experimental data and another study used a data-derived model from the literature. The first case study illustrated that the optimal operating conditions can be significantly different from the ones found by individually varying the parameters, that is, using a linear approach. The second case study illustrated that the model order should be chosen as low as possible, while still being able to fit the data well, if the main goal is to optimize a performance measure. Future work will focus on experimentally verifying that the computed operating conditions are indeed optimal and do not just represent an optimum for the model. This generic approach can be very useful for users who seek optimal operating conditions. We have shown for UF that users such as biotechnology companies producing proteins could benefit significantly by using our polynomial regression method to optimize performance and reduce costs.

■ AUTHOR INFORMATION
Corresponding Author
*Phone: +1 (518) 276 2138. Fax: +1 (518) 276 3035. E-mail: [email protected].
Notes
The authors declare no competing financial interest.
■ ACKNOWLEDGMENTS
The authors A.S.B. and J.H. gratefully acknowledge partial financial support by the National Science Foundation (Grant CBET#0941313) and the American Chemical Society (ACSPRF No. 50978-ND9). M.S., M.G., and G.B. acknowledge financial support by the U.S. Department of Energy, DOE (Nos. DE-FG02-90ER14114 and DE-FG02-07ER46429), the National Science Foundation (Grant CTS-94-00610), and Asahi Kasei Bioprocess Inc., Glenview, IL.
■ REFERENCES
(1) Shukla, A. A.; Hubbard, B.; Tressel, T.; Guhan, S.; Low, D. Downstream processing of monoclonal antibodies − Application of platform approaches. J. Chromatogr., B: Anal. Technol. Biomed. Life Sci. 2007, 848 (1), 28−39.
(2) Cherkasov, A. N.; Polotsky, A. E. The resolving power of ultrafiltration. J. Membr. Sci. 1996, 110, 79−82.
(3) Porter, M. C. Concentration polarization with membrane ultrafiltration. Ind. Eng. Chem. Prod. Res. Dev. 1972, 11 (3), 234−248.
(4) Mahsa, M. R.; Zydney, A. L. Role of electrostatic interactions during protein ultrafiltration. Adv. Colloid Interface Sci. 2010, 160, 40−48.
(5) van Eijndhoven, R. H. C. M.; Saksena, S.; Zydney, A. L. Protein fractionation using electrostatic interactions in membrane filtration. Biotechnol. Bioeng. 1995, 48, 406−414.
(6) Saksena, S.; Zydney, A. L. Effect of solution pH and ionic strength on the separation of albumin from immunoglobins (IgG) by selective filtration. Biotechnol. Bioeng. 1994, 43, 960−968.
(7) Bowen, W. R.; Jenner, F. Dynamic ultrafiltration model for charged colloidal dispersions: A Wigner-Seitz cell approach. Chem. Eng. Sci. 1995, 50, 1707−1736.
(8) Bacchin, P.; Aimar, P.; Sanchez, V. Model for colloidal fouling of membranes. AIChE J. 1995, 41, 368−376.
(9) Bowen, W. R.; Williams, P. M. Dynamic ultrafiltration model for proteins: A colloidal interaction approach. Biotechnol. Bioeng. 1996, 50, 125−135.
(10) Bowen, W. R.; Jones, M. G.; Yousef, H. N. S. Dynamic ultrafiltration of proteins - A neural network approach. J. Membr. Sci. 1998, 146, 225−235.
(11) Box, G. E. P.; Wilson, K. B. On the experimental attainment of optimum conditions. J. R. Stat. Soc. 1951, 13 (1), 1−45.
(12) Lin, S. H.; Hung, C. L.; Juang, R. S. Effect of operating parameters on the separation of proteins in aqueous solutions by dead-end ultrafiltration. Desalination 2008, 234, 116−125.
(13) Yates, F. Design and analysis of factorial experiments, Technical Communication no. 35 of the Commonwealth Bureau of Soils, London, 1937.
(14) Plackett, R. L.; Burman, J. P. The design of optimum multifactorial experiments. Biometrika 1946, 33, 305−325.
(15) Picard, R.; Cook, D. Cross-validation of regression models. J. Am. Stat. Assoc. 1984, 79, 575−583.
(16) Shao, J. Linear model selection by cross-validation. J. Am. Stat. Assoc. 1993, 88, 486−494.
(17) Efron, B. Estimating the error rate of a prediction rule: Improvement on cross-validation. J. Am. Stat. Assoc. 1983, 78, 316−331.
(18) Efron, B. How biased is the apparent error rate of a prediction rule? J. Am. Stat. Assoc. 1986, 81, 461−470.
(19) Mallows, C. L. Some comments on Cp. Technometrics 1973, 15, 661−675.
(20) Akaike, H. A new look at statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716−723.
(21) Mosier, C. I. Symposium: The need and means of cross-validation. I. Problems and designs of cross-validation. Educ. Psychol. Meas. 1951, 11, 5−11.
(22) Browne, M. W. Cross-validation methods. J. Math. Psychol. 2000, 44, 108−132.
(23) Sorci, M.; Gu, M.; Heldt, C. L.; Grafeld, E.; Belfort, G. A multidimensional approach for fractionating proteins using charged membranes. Biotechnol. Bioeng. 2013, 110, 1704−1713.