ARTICLE pubs.acs.org/IECR

Comparison of Two Robust Alternatives to the Box–Draper Determinant Criterion in Multiresponse Kinetic Parameter Estimation
Eduardo L. T. Conceição* and António A. T. G. Portugal
CIEPQPF, Department of Chemical Engineering, University of Coimbra, Polo II, Rua Sílvio Lima, 3030-790 Coimbra, Portugal
ABSTRACT: Generalized least-squares and maximum likelihood approaches for parameter estimation in multivariate response models have been prevalent in the chemical kinetics literature to date. In contrast, robust alternatives have received considerably less attention. These methods safeguard against possible deviations from the assumptions, such as the presence of outliers or nonnormality of the random errors. We compare, through Monte Carlo simulation, the performance of the classical Box–Draper determinant criterion (ML) with that of two robust estimators: the multivariate Huber M-estimator and the multivariate least-trimmed squares (MLTS) estimator. Although the results are not entirely conclusive, overall we find no compelling evidence for preferring either of the two robust methods over the conventional ML estimates. At the same time, it was unexpected to find that ML remains reasonable under mild outlier contamination and mild deviations from normality. This notwithstanding, one loses nothing by computing ML together with MLTS so that they cross-check each other as a safety measure.

1. INTRODUCTION
Nonlinear regression is extensively used for parameter estimation in chemical kinetics models. Often, many responses are measured simultaneously, usually in the form of concentration–time data. When several responses are available, the models associated with them frequently contain common parameters. For this reason, instead of using only the data for a particular response, the responses should be combined in a suitable way to obtain more efficient estimates of the parameters.
It is often overlooked that classical regression methods such as generalized least-squares or maximum likelihood estimation are extremely sensitive to misspecifications of the assumptions made concerning the data. Since assumptions are, at best, only approximate in practice, seemingly negligible departures from them can result in a noticeable performance loss. Of course, although this view is widely held in the statistical literature, it should be checked, as realistically as possible, for specific fields such as the present one, in the context of practical problems. Consequently, robust methods have the objective of safeguarding against slight departures from the postulated assumptions. In essence, they explicitly take into account the principle that their underlying idealized models will only be approximately true. A very accessible overview of this topic can be found in Morgenthaler.1 Recognizing that it is difficult, if not impossible, to cope with all possible deviations from the assumed model, most existing robust procedures focus on violations of the distributional assumption for the error process, in the form of data contamination by outliers (atypical observations, by virtue of being at a large distance from the bulk of the data). [For references on other deviations from the assumptions, see Müller (pp 187–188).2]
So far, not much work has been done in this direction for multivariate (or multiresponse) regression, although some robust estimators have been introduced recently in the linear-model setting. As far as we know, it is an open question whether the established asymptotic properties (what happens as the sample size n goes to infinity) of these estimators, such as consistency (convergence to the true value of the parameter as the sample size increases) and efficiency (mean squared error as small as possible), also hold in the context of nonlinear regression models. Moreover, asymptotic results can be a poor approximation for moderate to small sample sizes. Or, as stated by Le Cam and Yang3 (on p 175): "It must be pointed out that the asymptotics of the 'standard iid case' are of little relevance to practical use of statistics, in spite of their widespread study and use." Indeed, what is needed are small-sample results, and this means that guidance via computer simulation is important, as observed both by Huber (p 62)4 and Müller (p 184).2
A widely used measure of robustness is the breakdown point (BDP), which is, roughly speaking, the smallest proportion of bad observations that can lead an estimator to unreliable model parameter values. Therefore, achieving a reasonably high BDP is a desirable goal for robust procedures. High asymptotic efficiency, relative to an underlying Gaussian (normal) error distribution, has also been favored in the literature (a more thorough discussion of these points can be found on pages 5 and 6 of Huber and Ronchetti5). So far, all of the estimators proposed in the literature are extensions to the multivariate case of univariate robust regression estimators. (For background on univariate methods for robust linear regression, the reader is directed to Chapters 4 and 5 of Maronna, Martin, and Yohai.6) M-estimation was considered first by Koenker and Portnoy,7 but it suffers from two drawbacks: M-estimators do not safeguard against the presence of outliers in the independent (explanatory) variables, and they have a relatively
Received: March 16, 2011
Revised: September 27, 2011
Accepted: November 22, 2011
Published: November 22, 2011
© 2011 American Chemical Society

dx.doi.org/10.1021/ie2005324 | Ind. Eng. Chem. Res. 2012, 51, 1118–1130

Industrial & Engineering Chemistry Research

ARTICLE

low BDP in higher dimensions of the dependent (response) variables. On the other hand, the multivariate least-trimmed squares (MLTS) estimator investigated by Agullo et al.8 has a high BDP but suffers from low efficiency under Gaussian errors. A few methods have been proposed that combine high BDP and efficiency for the multivariate linear regression model, such as S-estimators,9 τ-estimators,10 CM-estimators,11 generalized S-estimators,12 and, more recently, MM-estimators.13 It is also possible to improve the efficiency of the MLTS while maintaining its BDP by applying the classical least-squares estimator solely to the observations whose squared Mahalanobis distance (based on the coefficients and error covariance matrix from an initial MLTS fit to the data) lies below an appropriate cutoff value.8 It is of interest to note that, for planned experiments, M-estimators can achieve better BDPs than for random designs, such as those arising in observational studies.14 In fact, in typical chemical kinetic studies, the values of the independent variables are set by the experimenter. As a consequence, it is unlikely for the independent variables to be corrupted by outliers. Furthermore, the number of responses is relatively small in many cases. Thus, for the fixed explanatory variables encountered in practice, the view that M-estimators are unappealing may be unrealistically pessimistic. In the univariate setting, Huber (see Huber and Ronchetti (p 198)5) recommended comparing the estimates from the usual least-squares procedure with those of the M-estimator and one of the high-BDP estimators, with the purpose of flagging discrepancies that merit closer scrutiny. Following this idea, a Monte Carlo simulation study was undertaken along the lines of that of Conceição and Portugal15 to gain some insight into the small-sample properties of the classical maximum likelihood (ML), M, and MLTS estimators in the context of kinetic modeling.
More specifically, we investigate the effects of the fraction of outliers in the sample, the outlying distance, and the form of the error distribution, in terms of both bias and efficiency. Decision trees were used for better interpretation of the features of the large amount of data (a mixture of numerical and categorical variables) generated for this examination. Because of their many desirable characteristics, such as interpretability, ability to cope with irrelevant inputs, and capability of handling data of mixed type, they are particularly suited "for serving as an off-the-shelf procedure for data mining" (see Hastie, Tibshirani, and Friedman (pp 350–352)16). The next five sections are organized as follows. Section 2 provides a description of the estimators, while section 3 does the same for the datasets that we examined. In section 4, we outline the basic features of the simulation studies, and results are presented and discussed in section 5. Finally, we make some closing comments in section 6.

2. MULTIRESPONSE ESTIMATION
2.1. Concentrated Maximum Likelihood Estimator. The relationship between the behavior of q response variables and the vector of explanatory variables x can be formulated by the multiresponse nonlinear regression model

    y_ij = f_j(x_i, θ) + e_ij,   i = 1, ..., n,  j = 1, ..., q   (1)

where y_ij is the measured value of the jth response observed at the corresponding n values of x_i, f_j represents the nonlinear function for the jth response, depending on a set of p unknown parameters θ = (θ_1, ..., θ_p)^t, and e_ij is the error term caused by various measurement sources. The transpose is represented by the superscript "t". In addition, we denote the residuals by r_ij(θ) = y_ij − f_j(x_i, θ). Hereafter, we shall assume that the vector of error terms e_i = (e_i1, ..., e_iq)^t possesses the following properties:

    E(e_i) = 0  and  E(e_i e_k^t) = Σ for i = k, 0 for i ≠ k,  for all i, k   (2)

where Σ is a fixed q × q covariance matrix. In this case, the errors have zero mean, measurements from different experiments are independent, but the errors for different responses in the same experiment can be correlated. Furthermore, all of the experimental runs have the same covariance matrix Σ. Let us further suppose that the vectors of error terms e_i are multivariate normal. Hence, the joint density function of all n responses y_i = (y_i1, ..., y_iq)^t is

    (2π)^(−nq/2) (det Σ)^(−n/2) exp{−(1/2) ∑_{i=1}^{n} (y_i − f_i(θ))^t Σ^(−1) (y_i − f_i(θ))}

(see pp 134–139 in Bates and Watts17 and pp 536–538 in Seber and Wild18), where f_i(θ) = (f_1(x_i, θ), ..., f_q(x_i, θ))^t and the determinant of Σ is written as det Σ. Taking logarithms, the log-likelihood function is given by

    L(θ, Σ) = const − (n/2) ln det Σ − (1/2) ∑_{i=1}^{n} (y_i − f_i(θ))^t Σ^(−1) (y_i − f_i(θ))
            = const − (n/2) ln det Σ − (1/2) trace{R(θ) Σ^(−1) R^t(θ)}   (3)

where "const" is an unimportant constant and we form the n × q residual matrix R(θ) with (i,j)th element r_ij(θ) (we write R = [(r_ij)]). Thus, the maximization of eq 3 is equivalent to minimizing

    L_1(θ, Σ) = ln det Σ + trace{Σ^(−1) V(θ)/n}   (4)

where

    V(θ) = ∑_{i=1}^{n} (y_i − f_i(θ)) (y_i − f_i(θ))^t = R^t(θ) R(θ)   (5)

We now concentrate the likelihood. It can be shown that, for fixed θ, Σ̂ = V(θ)/n (see p 138 of Bates and Watts17), which, when substituted back into eq 4, gives

    L_1(θ, V(θ)/n) = ln det(V(θ)/n) + q

Hence,

    θ̂_ML = arg min_θ det V(θ)   (6)

is the maximum likelihood estimator of θ. Also, the maximum likelihood estimate of Σ is Σ̂ = V(θ̂)/n. This result was first established by Box and Draper19 via a Bayesian route and is usually known as the Box–Draper determinant criterion.
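To make eq 6 concrete, the following sketch minimizes det V(θ) = det R^t(θ)R(θ) for a small two-response model with one shared rate parameter. The paper's own computations were done in R; this Python fragment, including the model, the data, and the optimizer choice, is purely illustrative.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical two-response model sharing one rate parameter theta:
# f1 = exp(-theta * x), f2 = 1 - exp(-theta * x)
def residual_matrix(theta, x, Y):
    f = np.column_stack([np.exp(-theta * x), 1.0 - np.exp(-theta * x)])
    return Y - f                       # n x q residual matrix R(theta)

def box_draper_objective(theta, x, Y):
    R = residual_matrix(theta[0], x, Y)
    V = R.T @ R                        # q x q crossproduct matrix V(theta), eq 5
    # Minimize det V(theta) (eq 6); the log-determinant is better conditioned.
    sign, logdet = np.linalg.slogdet(V)
    return logdet

rng = np.random.default_rng(1)
x = np.linspace(0.1, 5.0, 30)
true_theta = 0.8
Y = np.column_stack([np.exp(-true_theta * x), 1.0 - np.exp(-true_theta * x)])
Y += rng.normal(scale=0.02, size=Y.shape)   # independent Gaussian noise

# Derivative-free optimizer, since only the objective value is needed here
fit = minimize(box_draper_objective, x0=[0.3], args=(x, Y),
               method="Nelder-Mead")
theta_hat = fit.x[0]
```

Any minimizer of det V(θ) would do; the determinant, not the optimizer, is the point of the criterion.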

2.2. Multivariate M-Estimator. We now introduce the strategy of applying a univariate regression M-estimator to each response, studied by Koenker



and Portnoy.7 More precisely, we have

    θ̂_M = arg min_θ ∑_{i=1}^{n} ∑_{j=1}^{q} ρ(r_ij(θ)/σ_j)   (7)

where σ_j is some robust measure of dispersion for the jth response and ρ is a suitable loss function that downweights observations with large residuals. Probably the most common ρ function is Huber's:

    ρ(u) = u²/2             for |u| ≤ c
    ρ(u) = c(|u| − c/2)     for |u| > c   (8)

where a value of the tuning constant c must be selected and u is an arbitrary argument. Observe that the absolute error is used for large residuals, |u| → ∞, contrary to the least-squares approach, in which every observation is treated equally by the squared-error function. The tuning constant c regulates the degree of robustness of the procedure. This presents us with a dilemma, namely, a choice between robustness (when c is small) on the one hand and efficiency for the Gaussian model (when c is larger) on the other. Hence, a compromise is needed. Typical values lie in the range 1–2 (for example, c = 1.2, 1.25, 1.345, 1.5; see Wang et al.20 and references therein). Of course, the above approach ignores any possible correlation among the responses, this restriction being made for the sake of convenience and simplicity. Despite this weakness of not making full use of the multivariate nature of the data, it would be reasonable to expect little loss of efficiency for this M-estimator if the dependence between the response variables is small. Since the scale factors σ = (σ_1, ..., σ_q)^t are usually unknown, one must estimate them simultaneously with θ. This can be done with the "proposal 2" estimates of Huber, defined as

    (θ̂_M^t, σ̂_M^t)^t = arg min_{θ, σ > 0} ∑_{j=1}^{q} {∑_{i=1}^{n} ρ(r_ij(θ)/σ_j) σ_j + a σ_j}   (9a)

(see pp 37–38 of Huber4 and pp 172–176 in Huber and Ronchetti5). To obtain consistency of the scale estimate σ̂_M under normal errors, a should be chosen so that

    a = (n − p) E_Φ[χ(u)]  with  χ(u) = u ψ(u) − ρ(u)   (9b)

where ψ = ρ′ and Φ is the standard normal distribution function. We note that, unlike eq 8, if ρ is bounded, then the above approach would yield the trivial solution σ = 0, thus excluding the use of so-called redescending ρ functions, such as Hampel's three-part function or Tukey's biweight (see Figure 1). For that reason, the ρ function was chosen to be of Huber's type with a tuning parameter of c = 1.5. With it, we have χ(u) = ψ²(u)/2, and the consistency factor a in eq 9a is calculated by numerical integration of eq 9b:

    a = (n − p) E_Φ[ψ²(u)/2] = (n − p) ∫_{−∞}^{∞} (1/2) ψ²(u) (1/√(2π)) exp(−u²/2) du
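This expectation is easy to reproduce. The sketch below uses scipy.integrate.quad in place of the QUADPACK routine used by the authors (Python for illustration only; the paper's computations were done in R, and the values of n and p here are just example dimensions):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

c = 1.5  # Huber tuning constant used in the paper

def psi(u):
    # Huber's psi function: psi(u) = min(c, max(u, -c))
    return np.clip(u, -c, c)

def integrand(u):
    # chi(u) = psi(u)^2 / 2 weighted by the standard normal density
    return 0.5 * psi(u) ** 2 * norm.pdf(u)

# E_Phi[chi(u)] over the whole real line
expectation, _ = quad(integrand, -np.inf, np.inf)

n, p = 64, 10   # example dimensions only (e.g., n of the oil-shale dataset)
a = (n - p) * expectation
```

For c = 1.5 the expectation evaluates to about 0.389, so the consistency factor is roughly 0.389(n − p).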

Figure 1. Huber-type and redescending-type ρ-functions.

performed using the infinite-interval integration routine DQAGI from the QUADPACK package.21 Here, ψ is Huber's psi function, ψ(u) = min(c, max(u, −c)).
2.3. Multivariate Least-Trimmed Squares. The basis of our approach is the extension of the maximum trimmed likelihood (MTL) estimators of Hadi and Luceño22 to the multivariate setting. The idea behind it is to trim the logarithm of the likelihood function, namely,

    ∑_{i=h_L}^{h_U} l_{i:n}(θ)   (10)

where h_L ≤ h_U, (h_L, h_U) ∈ {1, 2, ..., n}, and l_{h_L:n}(θ) ≥ ... ≥ l_{h_U:n}(θ) represents the contributions of the observations to the log-likelihood function, sorted from largest to smallest. Such an ordering is always possible because the likelihood is scalar-valued. There will be the usual tradeoff between robustness and efficiency, which is controlled by specifying the values of the parameters h_L and h_U: the greater h_L and the smaller h_U, the less efficient (but also sturdier) the estimates. Note that, in the univariate case, when h_L = 1, h_U < n, and under the assumption that the measurement error is Gaussian, MTL is the same as the least trimmed squares (LTS) estimator of Rousseeuw.23 Therefore, it is natural to extend the definition of LTS to the general multivariate response case, based on this trimmed version of eq 10 for the multivariate normal distribution. From eq 3, we see that, apart from an unimportant constant, the contribution of the ith observation to the log-likelihood function is

    l_i(θ) = −(1/2) ln det Σ − (1/2) (y_i − f_i(θ))^t Σ^(−1) (y_i − f_i(θ))

Using this fact, taking h_L = 1 and n/2 ≤ h_U, and defining the squared distances of the residuals as

    d_i²(θ, Σ) = (y_i − f_i(θ))^t Σ^(−1) (y_i − f_i(θ))




for all vectors θ and all q × q positive definite matrices Σ, the MLTS estimator can be obtained in the following way (hereafter, for the sake of convenience, we shall omit the subscript U):

    (θ̂_MLTS, Σ̂_MLTS) = arg min_{θ, Σ} {h ln det Σ + ∑_{i=1}^{h} d²_{i:n}(θ, Σ)}   (11)

where d²_{1:n}(θ, Σ) ≤ ... ≤ d²_{h:n}(θ, Σ) is the ordered sequence of the h smallest squared distances of the residuals. Incidentally, note that it is possible to convert this constrained optimization problem into an unconstrained one by introducing a suitable parametrization,24 which ensures positive definiteness of the covariance matrix. In practice, it is often assumed that the covariance matrix is diagonal (that is, the measurement errors are uncorrelated). Thus, eq 11 becomes

    (θ̂_MLTS, σ̂_MLTS) = arg min_{θ, σ > 0} {2h ln ∏_{j=1}^{q} σ_j + ∑_{i=1}^{h} [∑_{j=1}^{q} (r_ij(θ)/σ_j)²]_{i:n}}   (12)

subject to the condition σ > 0, so that eq 12 is now a bound-constrained problem. At this point, it is appropriate to mention that, in order for the method to work, it is required that

    h > p + q(q + 1)/2 for eq 11,  h > p + q for eq 12   (13)

Remark. For small datasets, the above inequalities mean that some caution should be taken against discarding close to half of the data points, which gives the maximal BDP of ∼50%.
We conclude with a few words on an alternative formulation of the MLTS regression estimator. Agullo et al.8 proved that the MLTS estimates can be computed by minimizing the determinant of the minimum covariance determinant (MCD) estimate of the error covariance matrix, based on the residuals from θ. (The MCD is a well-known robust estimator of multivariate location and scatter introduced by Rousseeuw.23) Formally,

    θ̂_MLTS = arg min_θ det Σ̂_MCD(R(θ))   (14)
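The diagonal-covariance objective of eq 12 is cheap to evaluate once the residuals are in hand: standardize the residual matrix, compute the squared distance of each observation, and keep the h smallest. A Python sketch (the residual matrix, scales, and h below are hypothetical; this is not the authors' implementation):

```python
import numpy as np

def mlts_objective(R, sigma, h):
    """Eq 12: 2*h*ln(prod sigma_j) plus the sum of the h smallest
    standardized squared row norms of the n x q residual matrix R."""
    d2 = np.sum((R / sigma) ** 2, axis=1)   # squared distance per observation
    d2_trimmed = np.sort(d2)[:h]            # keep the h smallest (trimming)
    return 2 * h * np.sum(np.log(sigma)) + d2_trimmed.sum()

rng = np.random.default_rng(0)
R = rng.normal(size=(20, 2))   # hypothetical residuals, n = 20, q = 2
R[0] += 50.0                   # one gross outlier
sigma = np.array([1.0, 1.0])
h = 15                         # must satisfy h > p + q (eq 13)

obj_trimmed = mlts_objective(R, sigma, h)
obj_full = mlts_objective(R, sigma, 20)
```

The outlying observation has by far the largest squared distance, so it is among those trimmed away and never influences the criterion.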

which is equivalent to replacing V(θ) in eq 6 by the robust MCD scatter matrix of the residuals. Replacing the classical covariance matrix with a robust one is perhaps the most common approach used to develop robust multivariate techniques.
2.4. Computing the Estimates. In what follows, we briefly discuss the practical implementation issues regarding the choice of optimization method, the initialization of the population, the stopping rule, and the handling of boundary constraints.
2.4.1. Choice of Optimization Algorithm. The major computational difficulty with the MLTS estimates is that they cannot be calculated by standard gradient-based optimization algorithms, because the objective function is nondifferentiable and nonconvex. These optimization techniques are also not suitable for

M-estimates, since eq 8 has a discontinuous second derivative at |u| = c. Besides, it should be pointed out that, in order to ensure a fair comparison, we must use an identical algorithm for each estimator in the experimental testing. Therefore, we adopted the improved version, by Lee et al.,25 of the differential evolution (DE) algorithm proposed by Storn and Price26 for all the regression estimators. This method is a stochastic global search heuristic and a direct search algorithm (it does not require the objective function to be differentiable). DE is also easy to tune and applies to bound-constrained problems. For a detailed description of the DE heuristic, we refer to Chapter 2 of Price, Storn, and Lampinen,27 as well as the survey papers of Storn28 and Neri and Tirronen.29 Virtually every publication on this topic highlights the power of such a simple method to solve a large class of complex real-world problems.
2.4.2. Population Initialization. Unlike algorithms that operate on a single solution, DE is a population-based search strategy. The method maintains a population P_G of candidate solutions consisting of D-dimensional vectors θ̃:

    P_G = {θ̃_{1,G}, θ̃_{2,G}, ..., θ̃_{Npop,G}}

where θ̃ denotes the θ vector augmented with nuisance parameters such as σ, Npop specifies the population size, and G is the generation (iteration) to which the population belongs. The number of population vectors, Npop, remains constant during the optimization process. It is typical to initialize the population P_{G=0} with uniform random values within a hyperbox defined by

    θ̃_L ≤ θ̃_{i,0} ≤ θ̃_U,   i = 1, 2, ..., Npop

As a welcome side benefit, this largely alleviates the difficulty of determining sufficiently good starting values, which is always a challenge in nonlinear models, since one only needs to establish a range of acceptable values for the unknown parameters.
2.4.3. Termination Rule.
After initialization, P_{G=0} is subjected to certain evolutionary operators and evolves through several generations. Since candidate solutions usually converge to one point, the diversity decreases, resulting in a relatively homogeneous population. Therefore, the differences between the population members in a generation can be used to derive a measure of convergence. Following Krivy et al.,30 the termination criterion is to stop if

    median_{1≤i≤Npop} φ(θ̃_{i,G}) − min_{1≤i≤Npop} φ(θ̃_{i,G}) < ε f_0   (15a)

or

    G ≥ G_max   (15b)

where φ(θ̃) denotes the function minimized to estimate the parameters and ε is a given positive threshold. Here, f_0 is a scale factor, for which we use

    f_0 = det(R^t R)                                      for ML
    f_0 = ∑_j {∑_i ρ((y_ij − ȳ_j)/σ̂_j) σ̂_j + a σ̂_j}      for M
    f_0 = ∑_i ∑_j (y_ij − ȳ_j)²                           for MLTS
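A stripped-down DE loop with this stopping rule and the midpoint bound reset described in section 2.4.4 can be sketched as follows. This is a generic DE/rand/1/bin variant, not the improved algorithm of Lee et al.;25 the quadratic test function, the placeholder f_0 = 1, and all control settings are hypothetical.

```python
import numpy as np

def de_minimize(phi, lower, upper, npop=30, F=0.8, CR=0.9,
                eps=1e-8, gmax=500, seed=0):
    rng = np.random.default_rng(seed)
    D = len(lower)
    # random initial population inside the hyperbox [lower, upper]
    pop = rng.uniform(lower, upper, size=(npop, D))
    cost = np.array([phi(v) for v in pop])
    f0 = 1.0   # scale factor of eq 15a (problem-specific in the paper)
    for g in range(gmax):
        # stop when the population has nearly collapsed (eqs 15a/15b)
        if np.median(cost) - cost.min() < eps * f0:
            break
        for i in range(npop):
            r1, r2, r3 = rng.choice(npop, size=3, replace=False)
            mutant = pop[r1] + F * (pop[r2] - pop[r3])       # DE/rand/1
            cross = rng.random(D) < CR                       # binomial crossover
            trial = np.where(cross, mutant, pop[i])
            # out-of-bounds components reset midway to the violated bound (eq 16)
            low, high = trial < lower, trial > upper
            trial[low] = 0.5 * (pop[i][low] + lower[low])
            trial[high] = 0.5 * (pop[i][high] + upper[high])
            c = phi(trial)
            if c <= cost[i]:                                 # greedy selection
                pop[i], cost[i] = trial, c
    return pop[cost.argmin()], cost.min()

best, fbest = de_minimize(lambda v: np.sum((v - 0.3) ** 2),
                          np.zeros(2), np.ones(2))
```

For simplicity this sketch updates the population in place; textbook DE builds the next generation before replacing the current one, a design choice that does not change the character of the search.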





where R = [(y_ij − ȳ_j)], ȳ_j = ∑_i y_ij/n, and σ̂_j = median_i{|y_ij − median_i(y_ij)|}/0.6745. It should be mentioned that a criterion similar to eq 15a, but with the worst φ(θ̃_{i,G}) replacing median_{1≤i≤Npop} φ(θ̃_{i,G}), has been shown in a recent comparison study31 to be the best among several other choices. However, our experience with the particular variant of DE employed here is that, several times, a tiny fraction of the population stagnates in the search space even after the optimum has been found. Since the worst solution then gets stuck, any criterion based on it would fail, resulting in an excess of spurious generations.
2.4.4. Boundary Constraints. The search space Ω is defined as

    Ω = ∏_{j=1}^{D} [θ̃_{L,j}, θ̃_{U,j}],   θ̃_{L,j} < θ̃_{U,j},  j = 1, 2, ..., D

Several options exist for keeping newly generated parameter values that exceed their allowed ranges within bounds. In this work, an out-of-bounds component of the ith individual in generation G is reset to the midpoint between its old value and the violated bound (see p 86 of Price32):

    θ̃_{i,j,G+1} = (1/2)(θ̃_{i,j,G} + θ̃_{L,j})  if θ̃_{i,j,G+1} < θ̃_{L,j}
    θ̃_{i,j,G+1} = (1/2)(θ̃_{i,j,G} + θ̃_{U,j})  if θ̃_{i,j,G+1} > θ̃_{U,j}
    θ̃_{i,j,G+1} unchanged otherwise,  with j = 1, 2, ..., D   (16)

rather than simply replaced with the exceeded limit, as done by Lee et al.,25 since that choice can decrease the diversity of the population.
2.4.5. Implementation Effort. All computations, including the Monte Carlo calculations, were done using the open-source statistical software R.33 The base system provides a language that can be used both for programming and for issuing commands interactively. For the interested reader, we refer to the recent survey article by Horgan34 for an overview of the basic features of the language, as well as to the book by Chambers35 for more background on programming in R. At the same time, high-quality algorithms for numerical computation, pseudo-random number generation, sorting and searching, and many other tasks are included in the core system itself, automatically available for use. (For example, the quadrature routine DQAGI is available through the built-in function integrate().) In turn, this is supplemented by a large collection of software packages. This allows one to tackle the implementation in a natural and concise way. Moreover, because parameter estimation boils down, in essence, to an optimization problem, the key issue in writing the nonlinear regression code is the simplicity of the chosen optimization algorithm. And, indeed, DE is simple and easy to implement. As Storn so aptly put it on page 3 of his survey on DE:28 "Simplicity is an asset which is very important to anyone who considers optimization to be a necessary but not the primary task. DE's simplicity allowed many practicing engineers and researchers from very diverse disciplines to use global optimization without the need to be an optimization expert."

Scheme 1. Reaction Network for Pyrolysis of Oil Shale

3. DATASETS
3.1. Oil Shale Pyrolysis. Oil shale is a sedimentary rock containing insoluble organic matter known as kerogen, dispersed in and bound to the mineral matrix. Heating the rock to high temperature in the absence of oxygen thermally degrades the kerogen into a synthetic fuel referred to as shale oil, in a process called pyrolysis. After hydrotreating, shale oil can be refined into liquid fuels and many other products. A reaction network of lumped components was proposed by Ziegel and Gorman,36 as shown in Scheme 1. The dataset, taken from Appendix 1, pp 282–284, of Bates and Watts,17 contains n = 64 measurements of the concentrations of oil and bitumen, expressed as percentages of the initial amount of kerogen, for six temperatures ranging from 673 K to 798 K. In the experiments, kerogen and the byproducts coke and gas were not measured. The set of linear differential equations describing the time evolution of the concentrations is

    dC_kerogen/dt = −(k1 + k4) C_kerogen   (17a)

    dC_bitumen/dt = k1 C_kerogen − (k2 + k3) C_bitumen   (17b)

    dC_oil/dt = k4 C_kerogen + k2 C_bitumen   (17c)

together with the initial conditions

    C_kerogen(t0) = 1,  C_bitumen(t0) = 0,  C_oil(t0) = 0   (17d)

and an additional parameter t0 is needed to account for the unknown initiation time necessary to reach the threshold temperature for the reaction. In the above equations, k_i (min⁻¹) are the rate constants for the various steps of the reaction scheme, C denotes the molar concentration of the component given by the subscript, and t is the time. The analytical solution of eqs 17 takes the form

    C_kerogen = exp[−γ(t − t0)]   (18a)

    C_bitumen = (k1/α) {exp[−β(t − t0)] − exp[−γ(t − t0)]}   (18b)

    C_oil = [k1 k2/(α β)] {1 − exp[−β(t − t0)]} + [(α k4 − k1 k2)/(α γ)] {1 − exp[−γ(t − t0)]}   (18c)

where

    α = k1 + k4 − k2 − k3,  β = k2 + k3,  γ = k1 + k4   (18d)

Scheme 2. Reactions Involved in the Synthesis of 1,3-Propanediol from 3-Hydroxypropanal

Scheme 3. Reaction Network for Conversion of Methanol to Hydrocarbons
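A useful internal check on eqs 18a–18c: as t → ∞, eq 18c must approach k4/γ + k1k2/(βγ), the total oil yield obtained by integrating eqs 17 directly. The Python sketch below evaluates the closed-form profiles with hypothetical rate constants (illustrative values only, not fitted estimates):

```python
import numpy as np

def oil_shale_profiles(t, k1, k2, k3, k4, t0=0.0):
    """Analytical solution of the pyrolysis network, eqs 18a-18c."""
    alpha = k1 + k4 - k2 - k3
    beta = k2 + k3
    gamma = k1 + k4
    s = t - t0
    kerogen = np.exp(-gamma * s)
    bitumen = (k1 / alpha) * (np.exp(-beta * s) - np.exp(-gamma * s))
    oil = (k1 * k2 / (alpha * beta)) * (1.0 - np.exp(-beta * s)) \
        + ((alpha * k4 - k1 * k2) / (alpha * gamma)) * (1.0 - np.exp(-gamma * s))
    return kerogen, bitumen, oil

# Hypothetical rate constants (min^-1), chosen so that alpha != 0
k1, k2, k3, k4 = 0.4, 0.2, 0.1, 0.05
t = np.linspace(0.0, 200.0, 400)
ker, bit, oil = oil_shale_profiles(t, k1, k2, k3, k4)
```

Mass is not conserved among the three tracked species because the k3 step leaks bitumen to unmeasured coke and gas, so their sum should start at 1 and decrease, another quick consistency test.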

Assuming the Arrhenius equation for the temperature dependence of the rate constants, we reparametrize it to reduce the correlations between the activation energies and the pre-exponential terms, as discussed in Lohmann et al.,37 and write

    k_i = exp[ln k_i(T1) τ(T) + ln k_i(T2) (1 − τ(T))]   (19a)

with

    τ(T) = (T1/T) (T − T2)/(T1 − T2)   (19b)

where T is the absolute temperature (in kelvin) and T1 and T2 are reference temperatures that generally correspond to the two extreme values of the temperature range covered (also given in kelvin). In addition, in their study of this process, Bates and Watts (pp 188–198)17 found that the initiation time decreased with increasing temperature. These authors proposed the following linear relationship:

    t0 = t0(T_ref) − (b/R) (1/T − 1/T_ref)   (20)

where R is the gas law constant (in units of kJ/(mol K)) and T_ref is a reference temperature (equal to 723 K). Thus, we now have 10 adjustable parameters: four logarithms of rate constants at 673 K, four more at 798 K, the initiation time t0 at 723 K, and the slope b (expressed in min kJ⁻¹ mol) of t0 versus the negative scaled inverse temperature (1/R)[(1/T) − (1/T_ref)].
3.2. Hydrogenation of 3-Hydroxypropanal. We consider the production of 1,3-propanediol, an important industrial compound used predominantly as a monomer in the preparation of polyesters that are particularly suitable for textile-fiber applications, with several other interesting applications reported. The hydrogenation of 3-hydroxypropanal (HPA) to 1,3-propanediol (PDO) over a Ni/SiO2/Al2O3 catalyst powder was studied by Zhu et al.,38 who proposed the reaction mechanism outlined in Scheme 2 and established the corresponding mathematical model:

    dC_HPA/dt = −(r1 + r2) C_cat − (r3 + r4 − r−3)   (21a)

    dC_PDO/dt = (r1 − r2) C_cat   (21b)

    dC_Ac/dt = r3 − r4 − r−3   (21c)

where C_cat is the concentration of catalyst (10 g/L), r_i denotes the reaction rate of step i, and t is the time (min). The initial conditions in all cases were C_HPA = 1.34953 mol/L, C_PDO = 0, and C_Ac = 0 at t = 0. The reaction rates are expressed by

    r1 = k1 (p/H) C_HPA / [1 + K1 p/H + (K2 C_HPA)^(1/2)]³   (22a)

    r2 = k2 C_PDO C_HPA / [1 + K1 p/H + (K2 C_HPA)^(1/2)]   (22b)

    r3 = k3 C_HPA   (22c)

    r−3 = k−3 C_Ac   (22d)

    r4 = k4 C_Ac C_HPA   (22e)

where k1 and k2 (L²/(mol min g)), k3 and k−3 (min⁻¹), and k4 (L/(mol min)) are rate constants, p is the hydrogen pressure (MPa), K1 is the adsorption equilibrium constant (L/mol) for H2, K2 is the adsorption equilibrium constant (L/mol) for HPA, and H is the Henry's law constant, with a value of 137.9 MPa L/mol at 25 °C. In the above equations, seven parameters are to be determined: k1, k2, k3, k4, k−3, K1, and K2. The experimental data consist of n = 37 measurements of the concentrations of HPA and PDO versus time, collected at 45 °C and three pressures (2.6, 4.0, and 5.15 MPa), taken from Table 16.23 on p 309 of Englezos and Kalogerakis.39 The time integration is performed by the LSODA routine in the ODEPACK software package,40 accessed through the R package odesolve, an interface to the LSODA Fortran 77 integrator. Both the relative and absolute tolerances RTOL and ATOL were set to 10⁻⁶.
3.3. Methanol Conversion to Hydrocarbons. Maria41 analyzed the simplified reaction mechanism, with only first-order reactions, depicted in Scheme 3 for the conversion of methanol to hydrocarbons over ZSM-5 class zeolites. It is important to note that steps 4, 5, and 6 of the sequence are fictitious and were included to demonstrate the detection of redundant or unimportant reactions by means of the model-reduction procedure described there. Nevertheless, for the sake of coherence with previous work,42,43 we have decided to retain them.
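The model of eqs 21 and 22 integrates readily with scipy's LSODA interface, mirroring the tolerances quoted above. The rate-constant and adsorption-constant values in this Python sketch are hypothetical placeholders, not the estimates of Zhu et al.:38

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical parameter values (placeholders, not fitted estimates)
k1, k2, k3, km3, k4 = 1.5, 0.05, 0.02, 0.01, 0.03
K1, K2 = 10.0, 20.0
H = 137.9      # Henry's law constant, MPa L/mol
p = 4.0        # hydrogen pressure, MPa
Ccat = 10.0    # catalyst concentration, g/L

def rhs(t, y):
    CHPA, CPDO, CAc = y
    CHPA = max(CHPA, 0.0)                  # guard the square root
    den = 1.0 + K1 * p / H + np.sqrt(K2 * CHPA)
    r1 = k1 * (p / H) * CHPA / den ** 3    # eq 22a
    r2 = k2 * CPDO * CHPA / den            # eq 22b
    r3 = k3 * CHPA                         # eq 22c
    rm3 = km3 * CAc                        # eq 22d
    r4 = k4 * CAc * CHPA                   # eq 22e
    return [-(r1 + r2) * Ccat - (r3 + r4 - rm3),   # eq 21a
            (r1 - r2) * Ccat,                      # eq 21b
            r3 - r4 - rm3]                         # eq 21c

sol = solve_ivp(rhs, (0.0, 200.0), [1.34953, 0.0, 0.0],
                method="LSODA", rtol=1e-6, atol=1e-6)
CHPA_end, CPDO_end, CAc_end = sol.y[:, -1]
```

Under any positive choice of rate constants, HPA must fall from its initial value while PDO grows from zero, which gives a cheap plausibility check on the implementation.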




Scheme 4. Reaction Network for the Hydrogenation of Toluene

Assuming now the quasi-stationary hypothesis for the carbene CH2 (constant carbene concentration), the mass balances for each species lead to the differential equations

    dx_oxygenates/dt = −[2k1 − k1 x_olefins/((k23 + k63) x_oxygenates + x_olefins) + k4 + k5] x_oxygenates   (23a)

    dx_olefins/dt = k1 x_oxygenates (k23 x_oxygenates − x_olefins)/((k23 + k63) x_oxygenates + x_olefins) + k4 x_oxygenates   (23b)

    dx_paraffins/dt = k1 x_oxygenates (x_olefins + k63 x_oxygenates)/((k23 + k63) x_oxygenates + x_olefins) + k5 x_oxygenates   (23c)

which describe the time evolution of the molar fractions of the components (x), with five reaction constants k1, k23 (k23 = k2/k3), k4, k5, and k63 (k63 = k6/k3) to be estimated. Initial values are x_oxygenates = 1, x_olefins = 0, and x_paraffins = 0. The 16 experimental data points for the concentrations of the three species versus time are taken from p 402 of Floudas et al.42 The differential equations are integrated by LSODA, executed with a termination tolerance of 10⁻⁶ for the relative and absolute errors.
3.4. Hydrogenation of Toluene. This case uses the kinetic study of the hydrogenation of toluene to methylcyclohexane at atmospheric pressure and ambient temperature, in the presence of a commercial 5% Ru-act catalyst, reported by Belohlav et al.44 They concluded that the reaction scheme shown in Scheme 4 could be envisaged. The mole balances of the species involved are then as follows:

    dC_Tol/dt = −k_H,1 θ_Tol + k_D,1 θ_1MCH   (24a)

    dC_1MCH/dt = k_H,1 θ_Tol − (k_D,1 + k2) θ_1MCH   (24b)

    dC_MCH/dt = k2 θ_1MCH   (24c)

The surface coverages of toluene (θ_Tol) and 1-methylcyclohexene (θ_1MCH) are defined as

    θ_Tol = K_Tol^rel C_Tol / (K_Tol^rel C_Tol + C_1MCH + K_MCH^rel C_MCH)   (25a)

    θ_1MCH = C_1MCH / (K_Tol^rel C_Tol + C_1MCH + K_MCH^rel C_MCH)   (25b)

where K_i^rel represents the relative adsorption coefficients (K_i^rel = K_i/K_1MCH). The experimental dataset comprises n = 13 measurements of the time dependence of the q = 3 species concentrations. Numerical results are obtained with LSODA, with an absolute and relative termination accuracy of 10⁻⁶.

4. DESIGN OF THE MONTE CARLO STUDY A Monte Carlo study was conducted to compare the smallsample performance of the M and MLTS estimators with each other and with the classical multivariate maximum likelihood estimator. Table 1 summarizes some of the settings used to control the Monte Carlo simulations. Let us now examine the setup of the Monte Carlo experiments. 4.1. Experimental Design. We must select n design points for the independent variables xi at which to observe the responses yi. Unfortunately, it turns out that the proper choice of design to be employed in Monte Carlo studies is a difficult and controversial matter; therefore, any choice of values is likely to be problematic. As noted by Huber in p 152 of Huber and Ronchetti:5 “Incidentally, the difficulty to choose and agree on a representative selection of carrier matrices was the main reason why there never was a regression follow-up to the Princeton robustness study (see Huber 2002, p 1642).” 45 Primarily, we shall be concerned with the following: (a) the design points xi are not modeled as a random sample from a specified distribution (see p 59 in Huber4); (b) for fixed designs, it is realistic to expect the absence of outlying xi points. How do we do this? One way is to generate data points (xi,yi) whose xi is set to the original experimental values, since it also addresses the problem of performing realistic simulations. 4.2. Data Generation. 4.2.1. No Outliers. The responses yi were generated from model specified by eq 1 for given true values of design points xi, model parameters θ, different hypothetical distributions for the error term, and error covariance matrix Σ. Four distributions were considered for ei: • Multivariate Gaussian ei ≈ Nq ð0, ΣÞ

with initial conditions CTol = 1, C1MCH = 0, and CMCH = 0 at t = 0. In these equations, the subscript “Tol” denotes toluene, “1MCH” denotes 1-methylcyclohexene, and “MCH” denotes methylcyclohexane; kH,i (min1) represents the hydrogenation rate constants, kD,i (min1) represents the disproportionation rate constants, and k2 = kH,2 + kD,2 is an overall rate constant, C represents the normalized concentrations of the chemical species, and t is time. The fractional surface coverages for toluene (θTol)

• Two contaminated multivariate Gaussians ei ≈ 0:9Nq ð0, ΣÞ þ 0:1Nq ð0, 2ΣÞ and ei ≈ 0:7Nq ð0, ΣÞ þ 0:3Nq ð0, 5ΣÞ 1124


Table 1. Some Important Settings Controlling the Simulations^a

For each case study, the table records the generated-data settings (the error covariance matrix Σ and the sample size n: 64 for the oil shale pyrolysis, 37 for the hydrogenation of 3-hydroxypropanal, 16 for the methanol conversion to hydrocarbons, and 13 for the hydrogenation of toluene) together with the DE control variables: the true parameter values θ~, the search bounds θ~L and θ~U, and the settings Npop and Gmax.

^a θ~L and θ~U record the lower and upper bounds of the search space, respectively, and Npop and Gmax are the population size and maximum number of generations used by the variant of the differential evolution (DE) algorithm, respectively.24 The other DE control variables are set to the following values: crossover probability Cr = 0.8, base weighting factor Fb = 0.5, step factor ξ = 1.5, and termination tolerance of 10⁻⁸. In order to accelerate the progress of the solution, the values of θ~ shown in the table are placed in the initial population.
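The role of the DE control variables can be illustrated with SciPy's stock differential-evolution optimizer. This is only a sketch on a hypothetical one-parameter decay model: the paper uses its own DE variant, SciPy's `recombination` and `mutation` arguments merely play the roles of Cr and Fb, and SciPy's `popsize` is a per-parameter multiplier rather than the absolute Npop of Table 1.

```python
# Sketch: differential-evolution fit of a hypothetical first-order decay
# model by least squares.  Control-variable names follow SciPy, not the
# paper's DE variant.
import numpy as np
from scipy.optimize import differential_evolution

rng = np.random.default_rng(1)
t = np.linspace(0.0, 30.0, 16)
y_obs = np.exp(-0.2 * t) + rng.normal(0.0, 0.01, t.size)  # synthetic data

def sse(theta):
    """Sum-of-squares objective for the decay model y = exp(-k t)."""
    (k,) = theta
    return np.sum((y_obs - np.exp(-k * t)) ** 2)

res = differential_evolution(sse, bounds=[(1e-10, 1.0)], popsize=51,
                             maxiter=600, recombination=0.8, mutation=0.5,
                             tol=1e-8, seed=1)
print(res.x[0])  # recovers a value close to the true k = 0.2
```

Seeding the population with a good starting vector, as the footnote describes, is the same idea as SciPy's `init`/`x0` options: it accelerates convergence without changing the global search.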


which are denoted as CN− and CN+, respectively.
• Multivariate t-Student tq(k; λ, Δ) with k degrees of freedom, where λ ∈ R^q is the location parameter and Δ is a q × q symmetric positive definite matrix. Specifically, we used tq(3; 0, Σ/3), denoted as t3.
We can distinguish three degrees of deviation from normality, with the error distributions considered increasing in the order CN− < t3 < CN+. Random number generators from the R package mvtnorm46,47 were used for the multivariate normal and t-distributions. Finally, the true values for θ and Σ were set to the estimates obtained by regression of the original experimental data using the determinant criterion for multiresponse estimation. These generated data are referred to as "clean" data.
4.2.2. Outlier Generation. To assess the effect of data contamination by outliers, randomly selected observations from each uncontaminated dataset were modified to be "bad" data points. We employed four proportions of outliers in each simulated dataset, namely, 10%, 15%, 20%, and 30% contamination. Each of the q response variables is then shifted upward at the selected points by an amount λ(σjj)^{1/2}, where σjj represents the diagonal elements of Σ. We used λ = 5 for 10% and 20% bad points and λ = 10 for 15% and 30% bad points. To summarize, the data generation process can be organized in the following steps:
(1) Obtain the ML estimates θ̂ML and Σ̂ML from the original experimental data.
(2) Choose the sample size n, and define the design points xi.
(3) Compute the deterministic part fj(xi, θ) in the model given by eq 1, using θ̂ML as the true values of the parameters.
(4) "Clean" dataset: simulate responses yij by adding noise eij from a distribution to fj, using Σ̂ML as the true covariance matrix of measurement errors.
(5) "Contaminated" dataset: select a proportion of points at random and generate the outlying responses

$$y'_{ij} \leftarrow y_{ij} + \lambda\sqrt{\sigma_{jj}}, \qquad i \subset \{1, 2, ..., n\}, \quad j = 1, ..., q$$

4.3. Details of MLTS.
We begin by noting that, in systems with even a moderate number of responses and data of small size, the number of nuisance parameters to estimate in the error covariance matrix may be quite large relative to the sample size. In that case, the estimation becomes very difficult, if not impossible. With respect to the MLTS estimator, we must then determine whether the choice of h is large enough to ensure reliable estimates for all datasets. (Remember that the sample size for each simulated dataset is equal to that of the corresponding real one.) Accordingly, the condition described by eq 13 and the remark below it, together with the small size of two of the datasets, led us to ignore the correlations in order to reduce the number of nuisance parameters. Thus, we use the form of MLTS given by eq 12 with h = 0.75n, designated as MLTS(0.25).
4.4. Comparison Criteria. For the purpose of comparing the performance of the estimators, we used suitably normalized versions of two criteria employed by You.48 More specifically, the robust analogues of the (normalized) bias,

$$\mathrm{RB}_k = \frac{\mathrm{median}(\hat{\theta}_k) - \theta_k}{|\theta_k|} \quad (26a)$$

Figure 2. Tree for the data obtained only under Gaussian error with no trace of outlier contamination. The boxplots in the leaves depict the efficiency criterion MAD, expressed as a percentage, and the node size is shown at their tops. The splitting variable and adjusted p-value are displayed at the root node.

and relative efficiency, with respect to the ML estimator in the absence of outliers,

$$\mathrm{MAD}_k = \frac{\mathrm{median}\left(\left|\hat{\theta}_k^{\mathrm{ML,\,uncontaminated}} - \theta_k\right|\right)}{\mathrm{median}\left(\left|\hat{\theta}_k - \theta_k\right|\right)} \quad (26b)$$

were evaluated on the basis of 100 Monte Carlo replications for each of the k = 1, ..., p parameters of interest. (Recall that the relative efficiency can exceed 100%.) The same phenomenon is also observed in the simulation studies conducted for the case of univariate regression.15 Considering that this also happens for the efficiency measure based on the mean squared error (not shown), it is not clear whether this is a genuine property of the estimators or simply an artifact of our data.
5.3. Performance on Clean Data. In Figure 4, we compare all three methods when no outliers are present, for the distributions other than the normal.
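As a concrete illustration of eqs 26a and 26b, the sketch below evaluates both criteria on synthetic draws of a parameter estimate. The draws are placeholders for actual ML/M/MLTS fits, and the distributions and seed are ours, not the paper's.

```python
# Sketch of the comparison criteria RB (eq 26a) and MAD (eq 26b) evaluated on
# 100 synthetic Monte Carlo replications of a single parameter estimate.
import numpy as np

rng = np.random.default_rng(7)
theta_true = 2.0
est_ml_clean = theta_true + rng.normal(0.00, 0.10, 100)  # ML on clean data
est_robust   = theta_true + rng.normal(0.02, 0.12, 100)  # a robust estimator

def rb(est, theta):
    """Normalized bias, eq 26a: (median(theta_hat) - theta) / |theta|."""
    return (np.median(est) - theta) / abs(theta)

def mad_eff(est, ml_clean, theta):
    """Relative efficiency, eq 26b: ratio of median absolute errors."""
    return np.median(np.abs(ml_clean - theta)) / np.median(np.abs(est - theta))

print(abs(rb(est_robust, theta_true)) < 0.1)                        # True
print(0.0 < mad_eff(est_robust, est_ml_clean, theta_true) < 2.0)    # True
```

Because both criteria are built on medians rather than means, a handful of wildly failed replications does not dominate the summary, which is the point of using robust analogues.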

Figure 5. Absolute value of the normalized bias, |RB|, expressed as a percentage for all simulations at several distributions for clean data.


The initial (multiway) split is on ErrorDistr into each of the three distributions. Further separation is then accomplished by estimator for the CN− and CN+ distributions, splitting MLTS from the other two estimators. Here again, the use of M-estimates is not preferable to ML estimates. The middle panel shows that, at the t3 distribution, the decision tree was unable to discriminate any differences in the performance of the three estimators. As in the Gaussian case, the two panels on the left reveal that, under CN− noise conditions, MLTS performed significantly worse than the other two estimators. On the other hand, as can be seen from the right panels, MLTS now performs clearly better than the ML and M-estimators, which show similar behavior. These results indicate that it is unlikely that these robust methods will perform better than the determinant criterion unless the measurement error departs markedly from normality. Finally, let us consider bias. As shown in Figure 5, the data were not partitioned for the response variable RB. Thus, there is not much to say about bias, other than that most cases are no worse than 30%, with a distribution that is highly skewed toward the lower biases. This is still acceptable from a practical point of view.
5.4. Outliers in the Response Variable. We now turn our attention to the situation where the data are contaminated by outliers. Notice first that, in order to focus the analysis on the effect of gross errors rather than on specific error distributions, the predictor ErrorDistr was not used in this case. Regarding efficiency, the resulting tree is displayed in Figure 6. The first split is on perc.outliers with a split point of 20%. Subsequently, both child nodes are in turn split on the variable estimator. It is remarkable that the contamination fractions of 10%, 15%, and 20% showed quite similar behavior.
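The split-selection step at the heart of these trees can be sketched as follows. This toy example uses a plain CART-style variance-reduction criterion as a stand-in for the conditional inference trees actually employed, applied to synthetic MAD values that mimic a sharp performance drop beyond 20% contamination.

```python
# Minimal sketch of the tree idea: find the single best binary split of
# simulated efficiency values (MAD) on the contamination fraction.  The data
# are synthetic; real runs would use the Monte Carlo results.
import numpy as np

rng = np.random.default_rng(3)
perc = np.repeat([10, 15, 20, 30], 50)  # % outliers for each simulated run
mad = np.where(perc <= 20, rng.normal(90, 5, 200), rng.normal(40, 5, 200))

def best_split(x, y):
    """Split point minimizing the within-node sum of squared deviations."""
    best_s, best_cost = None, np.inf
    for s in np.unique(x)[:-1]:
        left, right = y[x <= s], y[x > s]
        cost = left.var() * left.size + right.var() * right.size
        if cost < best_cost:
            best_s, best_cost = s, cost
    return best_s

print(best_split(perc, mad))  # 20: the split point reported in Figure 6
```

Conditional inference trees differ from this sketch in selecting splits via permutation-test p-values, which avoids the variable-selection bias of raw variance reduction.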
This notwithstanding, MLTS appears to have some advantage over both ML and M-estimates, since its estimates have much less variability. The fact that ML is competitive against both MLTS and M-estimates under 30% contamination is quite puzzling. Of course, it is not surprising that, in this case, MLTS with 25% trimming suffers from a serious loss of efficiency, since its BDP is smaller than the amount of contamination. With respect to bias, we see from Figure 7 that, for contamination levels of ≤20%, all the estimators perform reasonably well; the biases of MLTS are below 20% on most occasions, and those of the ML and M-estimators are just slightly higher. With more severe contamination, we find that all the estimators are likely to yield inaccurate estimates. In summary, in the case of moderate contamination, the results give some support to the use of MLTS in preference to the determinant criterion.

Figure 6. Tree for the effect of outlier contamination on the efficiency of the estimators. The boxplots in the leaves depict the efficiency criterion MAD, expressed as a percentage, and the node size is shown at their tops. The splitting variable and adjusted p-value are displayed at each internal node.

Figure 7. Tree for the effect of outlier contamination on the biasedness of the estimators. The boxplots in the leaves depict the absolute value of the normalized bias, |RB|, expressed as a percentage, and the node size is shown at their tops. The splitting variable and adjusted p-value are displayed at each internal node.

6. CONCLUSIONS AND OUTLOOK
We have reported the results of a Monte Carlo study comparing the use of two robust estimators with the maximum likelihood (determinant criterion) estimator in multiresponse nonlinear models. Although the results were quite mixed, they somewhat support the combined use of the classical and robust techniques under outlier contamination, namely, that of the MLTS estimator. In contrast, Huber's M-estimates do not show any clear advantage over conventional ML estimates. Contrary to our findings for univariate response models,15 it is difficult to draw strong conclusions from these data alone. Clearly, more research is needed in this area to provide adequate guidelines. We suspect that the major reason for the mixed results is the large variability of the ranking performance of the estimators among the four datasets, which we do not show here. In other words, the relative performance of the conventional and robust methods is clearly model-dependent. Therefore, a natural question arises as to which features of the chemical kinetic models influence the statistical properties of the estimators, and how. We believe that an adequate answer to this question would require the use of data mining techniques to analyze a large number of kinetic datasets. This carries a heavy computational burden that may be addressed using parallel processing. This is left for future work.

’ AUTHOR INFORMATION
Corresponding Author
*Tel.: +351 239 798 748. Fax: +351 239 798 703. E-mail: etc@eq.uc.pt.

’ ACKNOWLEDGMENT
The authors thank the three anonymous reviewers for their helpful comments and suggestions.

’ NOMENCLATURE
a = consistency factor involved in eq 9b
b = regression coefficient involved in eq 20
c = Huber's function tuning constant
C = concentration
E = mathematical expectation operator; activation energy
e = measurement random errors
fj = nonlinear function for the jth response
G = generation counter
H = Henry's law constant
hL = lower summing limit in eq 10
hU = upper summing limit in eq 10
h = trimming constant for MLTS (n − h = number of trimmed observations)
Ki = adsorption constant of species i
k = reaction rate constant
ℓi = contribution of the ith observation to the log-likelihood function
MAD = efficiency criterion
Npop = number of population vectors
n = number of observations
p = number of parameters
q = number of responses
R = gas constant
RB = normalized bias
rij = residuals from the fit in eq 1
r = reaction rate
T = absolute temperature
t = time
t0 = initiation time
V = matrix defined by eq 5
xi = mole fraction of species i
x = vector of explanatory variables
y = response variable
Greek Symbols

λ = factor used for outlier generation
θ = vector of unknown model parameters
θ̂ = estimate of θ
θi = surface coverage of species i
θ~ = vector of all parameters
θ~L = lower bound of vector θ~
θ~U = upper bound of vector θ~
ρ = some symmetric function of the residuals
Σ = covariance matrix of the measurement errors, with elements σij


σ = vector of robust residual scale estimates of the M and MLTS estimators

’ REFERENCES
(1) Morgenthaler, S. A Survey of Robust Statistics. Stat. Methods Appl. 2007, 15, 271–293.
(2) Müller, C. H. Robust Planning and Analysis of Experiments; Lecture Notes in Statistics 124; Springer-Verlag: New York, 1997.
(3) Le Cam, L.; Yang, G. L. Asymptotics in Statistics: Some Basic Concepts, 2nd ed.; Springer-Verlag: New York, 2000.
(4) Huber, P. J. Robust Statistical Procedures, 2nd ed.; SIAM: Philadelphia, PA, 1996.
(5) Huber, P. J.; Ronchetti, E. M. Robust Statistics, 2nd ed.; Wiley: Hoboken, NJ, 2009.
(6) Maronna, R. A.; Martin, R. D.; Yohai, V. J. Robust Statistics: Theory and Methods; Wiley: Chichester, U.K., 2006.
(7) Koenker, R.; Portnoy, S. M Estimation of Multivariate Regressions. J. Am. Stat. Assoc. 1990, 85, 1060–1068.
(8) Agulló, J.; Croux, C.; Van Aelst, S. The Multivariate Least-Trimmed Squares Estimator. J. Multivariate Anal. 2008, 99, 311–338.
(9) Van Aelst, S.; Willems, G. Multivariate Regression S-Estimators for Robust Estimation and Inference. Stat. Sinica 2005, 15, 981–1001.
(10) García Ben, M.; Martínez, E.; Yohai, V. J. Robust Estimation for the Multivariate Linear Model Based on a τ-Scale. J. Multivariate Anal. 2006, 97, 1600–1622.
(11) Bai, Z.; Chen, X.; Wu, Y. On Constrained M Estimation and Its Recursive Analog in Multivariate Linear Regression Models. Stat. Sinica 2008, 18, 405–424.
(12) Roelant, E.; Van Aelst, S.; Croux, C. Multivariate Generalized S-Estimators. J. Multivariate Anal. 2009, 100, 876–887.
(13) Kudraszow, N. L.; Maronna, R. A. Estimates of MM Type for the Multivariate Linear Model. J. Multivariate Anal. 2011, 102, 1280–1292.
(14) Mizera, I.; Müller, C. H. Breakdown Points and Variation Exponents of Robust M-Estimators in Linear Models. Ann. Statist. 1999, 27, 1164–1177.
(15) Conceição, E. L. T.; Portugal, A. A. T. G. Finite-Sample Comparison of Robust Estimators for Nonlinear Regression Using Monte Carlo Simulation: Part I. Univariate Response Models. Comput. Chem. Eng. 2011, 35, 530–544.
(16) Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning, 2nd ed.; Springer Science+Business Media: New York, 2009.
(17) Bates, D. M.; Watts, D. G. Nonlinear Regression Analysis and Its Applications; Wiley: New York, 1988.
(18) Seber, G. A. F.; Wild, C. J. Nonlinear Regression; Wiley: New York, 1989.
(19) Box, G. E. P.; Draper, N. R. The Bayesian Estimation of Common Parameters from Several Responses. Biometrika 1965, 52, 355–365.
(20) Wang, Y.-G.; Lin, X.; Zhu, M.; Bai, Z. Robust Estimation Using the Huber Function with a Data-Dependent Tuning Constant. J. Comput. Graph. Stat. 2007, 16, 468–481.
(21) Piessens, R.; de Doncker-Kapenga, E.; Überhuber, C. W.; Kahaner, D. K. QUADPACK: A Subroutine Package for Automatic Integration; Springer-Verlag: Berlin, 1983.
(22) Hadi, A. S.; Luceño, A. Maximum Trimmed Likelihood Estimators: A Unified Approach, Examples, and Algorithms. Comput. Stat. Data Anal. 1997, 25, 251–272.
(23) Rousseeuw, P. J. Least Median of Squares Regression. J. Am. Stat. Assoc. 1984, 79, 871–880.
(24) Pinheiro, J. C.; Bates, D. M. Unconstrained Parametrizations for Variance-Covariance Matrices. Stat. Comput. 1996, 6, 289–296.
(25) Lee, M. H.; Han, C.; Chang, K. S. Dynamic Optimization of a Continuous Polymer Reactor Using a Modified Differential Evolution Algorithm. Ind. Eng. Chem. Res. 1999, 38, 4825–4831.
(26) Storn, R.; Price, K. Differential Evolution. A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces. J. Global Optim. 1997, 11, 341–359.


(27) Price, K. V.; Storn, R. M.; Lampinen, J. A. Differential Evolution. A Practical Approach to Global Optimization; Springer: Berlin, 2005.
(28) Storn, R. In Advances in Differential Evolution; Chakraborty, U. K., Ed.; Studies in Computational Intelligence, Vol. 143; Springer-Verlag: Berlin, Heidelberg, Germany, 2008; pp 1–31.
(29) Neri, F.; Tirronen, V. Recent Advances in Differential Evolution: A Survey and Experimental Analysis. Artif. Intell. Rev. 2010, 33, 61–106.
(30) Křivý, I.; Tvrdík, J.; Krpec, R. Stochastic Algorithms in Nonlinear Regression. Comput. Stat. Data Anal. 2000, 33, 277–290.
(31) Zielinski, K.; Laur, R. In Advances in Differential Evolution; Chakraborty, U. K., Ed.; Studies in Computational Intelligence, Vol. 143; Springer-Verlag: Berlin, Heidelberg, Germany, 2008; pp 111–138.
(32) Price, K. V. In New Ideas in Optimization; Corne, D., Dorigo, M., Glover, F., Eds.; McGraw-Hill: London, 1999; Chapter 6, pp 79–108.
(33) R Development Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2009. (ISBN 3-900051-07-0.)
(34) Horgan, J. M. Programming in R. WIREs Comput. Stat. 2011, 4, 75–84.
(35) Chambers, J. M. Software for Data Analysis: Programming with R; Springer Science+Business Media: New York, 2008.
(36) Ziegel, E. R.; Gorman, J. W. Kinetic Modelling with Multiresponse Data. Technometrics 1980, 22, 139–151.
(37) Lohmann, T.; Bock, H. G.; Schlöder, J. P. Numerical Methods for Parameter Estimation and Optimal Experiment Design in Chemical Reaction Systems. Ind. Eng. Chem. Res. 1992, 31, 54–57.
(38) Zhu, X. D.; Valerius, G.; Hofmann, H.; Haas, T.; Arntz, D. Intrinsic Kinetics of 3-Hydroxypropanal Hydrogenation over Ni/SiO2/Al2O3 Catalyst. Ind. Eng. Chem. Res. 1997, 36, 2897–2902.
(39) Englezos, P.; Kalogerakis, N. Applied Parameter Estimation for Chemical Engineers; Marcel Dekker: New York, 2001.
(40) Hindmarsh, A. C. In Scientific Computing; Stepleman, R. S., Ed.; North-Holland: Amsterdam, 1983; pp 55–64.
(41) Maria, G. An Adaptive Strategy for Solving Kinetic Model Concomitant Estimation—Reduction Problems. Can. J. Chem. Eng. 1989, 67, 825–832.
(42) Floudas, C. A.; Pardalos, P. M.; Adjiman, C. S.; Esposito, W. R.; Gümüş, Z. H.; Harding, S. T.; Klepeis, J. L.; Meyer, C. A.; Schweiger, C. A. Handbook of Test Problems in Local and Global Optimization; Nonconvex Optimization and Its Applications 33; Kluwer Academic Publishers: Dordrecht, 1999.
(43) Vidaurre, G.; Vasquez, V. R.; Whiting, W. B. Robustness of Nonlinear Regression Methods under Uncertainty: Applications in Chemical Kinetics Models. Ind. Eng. Chem. Res. 2004, 43, 1395–1404.
(44) Belohlav, Z.; Zamostny, P.; Kluson, P.; Volf, J. Application of Random-Search Algorithm for Regression Analysis of Catalytic Hydrogenations. Can. J. Chem. Eng. 1997, 75, 735–742.
(45) Huber, P. J. John W. Tukey's Contributions to Robust Statistics. Ann. Statist. 2002, 30, 1640–1648.
(46) Genz, A.; Bretz, F.; Miwa, T.; Mi, X.; Leisch, F.; Scheipl, F.; Hothorn, T. mvtnorm: Multivariate Normal and t Probabilities; R package, version 0.9-9, 2010.
(47) Genz, A.; Bretz, F. Computation of Multivariate Normal and t Probabilities; Lecture Notes in Statistics, Vol. 195; Springer-Verlag: Berlin, 2009.
(48) You, J. A Monte Carlo Comparison of Several High Breakdown and Efficient Estimators. Comput. Stat. Data Anal. 1999, 30, 205–219.
(49) Hawkins, D. M. Recursive Partitioning. WIREs Comput. Stat. 2009, 1, 290–295.
(50) Loh, W.-Y. Tree-Structured Classifiers. WIREs Comput. Stat. 2010, 2, 364–369.
(51) Breiman, L.; Friedman, J. H.; Olshen, R. A.; Stone, C. J. Classification and Regression Trees; Wadsworth: Belmont, CA, 1984.
(52) Quinlan, J. R. C4.5: Programs for Machine Learning; Morgan Kaufmann: San Mateo, CA, 1993.
(53) Hothorn, T.; Hornik, K.; Zeileis, A. Unbiased Recursive Partitioning: A Conditional Inference Framework. J. Comput. Graph. Stat. 2006, 15, 651–674.
