Optimal Experimental Design for the Characterization of Liquid

terms: the (co)variances—of the model parameters and/or the predictions. ...... Bertakis , E.; Kalem , M.; Pfennig , A. Model-based geometry opt...
1 downloads 0 Views 449KB Size
Article pubs.acs.org/IECR

Optimal Experimental Design for the Characterization of Liquid− Liquid Equilibria D. Dechambre,† L. Wolff,† C. Pauls, and A. Bardow* Chair of Technical Thermodynamics, RWTH Aachen University, 52062 Aachen, Germany S Supporting Information *

ABSTRACT: Measurements of liquid−liquid equilibria (LLE) are characterized by high experimental effort, long equilibration times, high sample consumption, and hence high costs. To reduce these efforts, we analyze LLE measurements using optimal experimental design (OED). Using OED, we identify experiments with the highest information content for LLE systems of both type I and type II. The NRTL and UNIQUAC gE models are employed. The optimal designs are compared to conventional designs. The optimal designs employ very few (3−5) distinct experiments located in regions where the curvature of the binodal curve is high. Thereby, at least half of the experimental effort can be saved without losing either prediction or parameter accuracy.

1. INTRODUCTION Chemical process design depends on accurate knowledge of phase equilibria. Errors in phase equilibria can potentially lead to useless process designs. Thus, reliable knowledge on phase equilibria is indispensable.1 Liquid−liquid equilibria (LLE), in particular, are of considerable importance: LLE determine, for example, the distribution of chemicals in extraction and heterogeneous azeotropic distillation processes or in the environment. Because predictive thermodynamic models are typically not yet sufficiently accurate for LLE,2 experimental LLE data still have to be collected. However, LLE experiments are often accompanied by high experimental effort, long equilibration times, high sample consumption, and therefore high costs. Hence, there is a trade-off between measurement costs and accuracy.3,4 LLE data are typically measured by equilibrating two liquid phases in an equilibrium cell, taking samples from both phases, and determining their respective compositions by an appropriate analytical method. This experimental approach requires relatively large sample volumes, leading to long equilibration times. To overcome these limitations, attempts have been made to automate and miniaturize experiments for partition coefficients5−9 and even for full LLE measurements.10 If the setup is to be applicable over a wide range of components, the phase composition is typically determined by chromatographic methods.6−8,10 In these cases, the chromatographic methods often are the bottleneck in the experimental workflow because of long analysis times. More rapid analytical methods could be employed to determine phase compositions spectroscopically,5,9 but such methods are limited to optically accessible components. Reducing the number of experiments would therefore be desirable if the same accuracy of LLE data could be retained. This goal can be achieved by the method of model-based optimal experimental design (OED), which identifies the experiments with the highest information content.11 OED has been used to determine optimal experimental setups,12 as well as optimal operating schedules of kinetic experiments.13−17 © 2014 American Chemical Society

Regarding phase equilibria, OED has already been successfully applied to the characterization of vapor−liquid equilibria (VLE) for binary systems18−20 and multinary reactive VLE systems.21 The results of these studies can help the experimenter to design experiments more efficiently. In this work, the theoretical potential of OED for the measurement of LLE was explored. The main design variable was the overall composition to be used in the experiments. In section 2, the LLE model is presented, and the OED method for LLE measurements is introduced. The calculations are based on the nonrandom two-liquid (NRTL) and universal quasichemical (UNIQUAC) gE models. Conventional designs are used as a benchmark to evaluate the benefit of the optimal designs. The results are given in section 3. We studied two ternary mixtures: one system with only one partially miscible binary subsystem (type I LLE22) and one system with two partially miscible binary subsystems connected by one miscibility gap (type II LLE22). In subsection 3.1, the optimal design of a type I LLE described by the UNIQUAC model is discussed; in subsection 3.2, the optimal designs for the same type I LLE described by the NRTL model and for a type II LLE described by both the UNIQUAC and NRTL models are discussed. More case studies are presented in the Supporting Information. The presented theoretical study shows the potential for reducing the experimental effort by a factor of at least 2 without losing prediction accuracy. Conclusions are given in section 4.

2. MODEL-BASED OPTIMAL EXPERIMENTAL DESIGN In this section, we present the model used for the calculation of LLE and a short introduction to the theory of model-based optimal experimental design (OED). Excellent detailed accounts of OED theory can be found in the books of Bard,11 Atkinson et al.,23 and Walter and Pronzato.24 Received: Revised: Accepted: Published: 19620

September 12, 2014 November 19, 2014 November 20, 2014 November 20, 2014 dx.doi.org/10.1021/ie5035573 | Ind. Eng. Chem. Res. 2014, 53, 19620−19627

Industrial & Engineering Chemistry Research

Article

Figure 1. Positions of the optimally designed experiments and the equally distributed conventional experiments in the ternary phase diagrams. Left: (10) (10) α(ξ(10) opt ) (see eq 17). Center: α(ξConv1), using three distinct experiments (see eq 18). Right: α(ξConv2), using 10 distinct experiments (see eq 19). α is a linear scalar [0, 1] to represent the overall compositions z in the middle of the tie lines (cf. subsection 2.1).

The goal is to obtain the parameters τij from the experimental procedure. For the design of experiments, the derivatives of the model equations g(w,z,θ) with respect to the parameters are needed. For nonlinear models, these derivatives depend on the values of the parameters. Hence, an initial estimate of these parameters, θ̂, is needed (cf. subsection 2.2). This estimate could result from predictive models, such as the universal quasichemical functional group activity coefficients (UNIFAC) model27 or the conductor-like screening model for real solvents (COSMO-RS).28 The entire model g(w,z,θ) is solved using the isothermal flash algorithm proposed by Michelsen and Møllerup.29 To simplify the representation of the overall composition z, only overall compositions in the middle of the tie lines are considered in the following. This allows the vectorial quantity z to be replaced by a linear scalar measure α ∈ [0, 1]. α is 0 at the binary subsystem, and α is 1 near the critical point in type I LLE and at the second binary subsystem in type II LLE (see Figure 1). The scaling of α is not necessary; one could optimize the composition z directly. 2.2. Theory of Optimal Experimental Design. A design is defined by a specific combination of experiments. To represent a design, we introduce the design vector ξ

The aim of OED is to identify a combination of experimentsa designthat minimizes the uncertaintiesin statistical terms: the (co)variancesof the model parameters and/or the predictions. In the scope of this work, an LLE experiment essentially consists of the preparation of an overall composition of components, z, that separates into two phases, ′ and ″ with phase compositions x′ and x″, which are analyzed subsequently. Hence, the aim of OED for LLE is to determine optimal overall compositions z to be used in the experiments. For this purpose, a model is needed to relate the overall compositions to the corresponding phase compositions. 2.1. LLE Model. The aim of the LLE model is to relate the phase compositions x′ and x″ to overall compositions z. In experiments, we specify the overall composition z which separates into two phases. We measure the phase compositions x′ and x″. The LLE model has to include material balances as well as equilibrium conditions. Equilibrium of the chemical potentials of both phases is assumed to hold which reduces to the well-known form of equal activities expressed by activity coefficients and mole fractions. Hence, the entire model g(w,z,θ) becomes g(w, z , θ) = 0 ⎡ γ′(θ , x′)xi′ − γ″(θ , x″)xi″, i = 1, ..., nc − 1 ⎤ i ⎢i ⎥ ⎢ ⎥ nc − 1 ⎞ nc − 1 ⎛ ⎛ ⎞ ⎢ γ′ (θ , x′)⎜1 − ∑ x ′⎟ − γ″(θ , x″)⎜1 − ∑ x ″⎟ ⎥ i i ⎜ ⎟ ⎜ ⎟ nc ⎥ = ⎢ nc ⎝ ⎠ ⎝ ⎠ i=1 i=1 ⎢ ⎥ ⎢ ⎥ ⎛ ⎞ ″ ″ − − z x z x 1 1 1 1 ⎢ zi − xi′ − ⎜1 − ⎟xi″, i = 2, ..., nc − 1⎥ ⎥⎦ x1′ − x1″ x1′ − x1″ ⎠ ⎝ ⎣⎢ 2nc − 2

∈

α α2 ⋯ αnexp ⎫ ⎧ ⎪ 1 ⎪ ⎬ ξ=⎨ ⎪v ⎪ ⎩ 1 v2 ⋯ vnexp ⎭

ξ combines experimental conditions αi with design weights vi ∈ [0, 1] such that ∑vi = 1; that is, vi specifies the f raction of experimental effort spent at αi.24 Because experiments can be repeated, the number of distinct experiments nexp is less than or equal to the total number of experiments Nexp. If Nexp experiments are performed with each experiment repeated ri times, the design weights are determined as vi = ri/Nexp and are thus rational numbers. These kinds of designs are called exact designs and are denoted by ξ(Nexp). When allowing the design weights to be real numbers and increasing the total number of experiments Nexp → ∞, ξ can be regarded as the normalized probability distribution of the experiments. This formulation leads to so-called continuous designs, which are denoted by ξ* in the following. Typically, optimal designs employ only a small set of distinct experimental settings nexp even for continuous designs.24

(1)

Here ⎛ x′ ⎞ w = ⎜ ⎟ ∈ 2nc − 2 ⎝ x″ ⎠

(3)

(2)

is the vector of phase compositions, consisting of the conjunction of the (nc − 1)-dimensional mole fraction vectors x of each phase, where nc represents the number of components. The first two lines in eq 1 describe the equilibrium conditions; the third line of eq 1 describes the material balance. In this work, the activity coefficients γi(θ, x) were calculated using the NRTL25 and UNIQUAC26 models. The parameters θ of the activity coefficient models correspond to the interaction parameters τij between components i and j. 19621

dx.doi.org/10.1021/ie5035573 | Ind. Eng. Chem. Res. 2014, 53, 19620−19627

Industrial & Engineering Chemistry Research

Article

ellipsoid by minimizing the determinant of the parameter covariance Vθ11

To apply OED theory, the following assumptions are typically made:11 (1) The model is identifiable and of exact structural type; that is, the parameters θ are at least locally identifiable from the experiments,30−32 and the model is assumed to apply exactly if the true values of the variables w and z and the parameters θ are known. (2) The following properties of the errors are assumed: (a) There is no error in the independent variables z. (b) There is no systematic error in the measurements. (c) Errors in different experiments are independent and normally distributed: w̃ ∼ 5(0, Vw ). (d) Errors in each experiment are distributed with the same covariance matrix Vw. The properties of the errors are of major importance, because the measurement variances induce the variances in the parameters, which, in turn, lead to variances in the model predictions. The strength of this error propagation depends on the experimental design. Thus, there is an optimal design minimizing model parameter and/or prediction variances. An approximate covariance matrix of the parameters, Vθ, can be calculated using the Fisher information matrix F11 Vθ (θ ̂, ξ) ≈ [NexpF(θ ̂, ξ)]−1

ξD = argmin det{Vθ (θ ̂, ξ)}

(2) To determine the experimental settings leading to the most accurate predictions of the phase compositions, we apply the Goptimal design, which minimizes the maximum of the standardized prediction variance dPred23 ξG = argmin max dPred(z , ξ , θ)̂ ξ

(9)

nw

nw

u

v

∑ ∑ σ uvdPred, uv(z, ξ , θ)̂

(10)

Vw−1

where σ is the uvth element of and dPred,uv is the uvth element of VPred. Thus, dPred is an average value of the variance in the predicted phase compositions x′ and x″. Therefore, the G-optimal design is a min−max design that determines the experimental settings ξ that minimize the mean variance in the predictions of phase compositions at the worst total composition z. The D- and G-optimal designs are interlinked through the general equivalence theorem (GET).23,24,33 The GET states that, in the continuous case, the G- and D-optimal designs are equal uv

nexp

∑ vμBμT(AμVwAμT)−1Bμ + Σ0−1

* ξG* = ξD* = :ξopt

(5)

(11)

and that the maximum of the standardized prediction variance is equal to the number of parameters np

where Aμ =

1 Nexp

dPred(z , ξ , θ)̂ =

(4)

μ=1

z

The standardized prediction variance dPred is a normalized, scalar measure of VPred at any overall phase composition z

The Fisher information matrix measures the sensitivity of the model with regard to the parameter values. The Fisher information matrix F can be calculated as11 F (θ ̂ , ξ ) =

(8)

ξ

∂g ∂w

and wμ, zμ, θ ̂

Bμ =

∂g ∂θ

* ) = np max dPred(z , ξopt (6)

The construction of D-optimal designs ξ*D can be executed exp) exp+1) sequentially: Given an initial design ξ(N , the design ξ(N is D D obtained by simply adding the experiment giving the largest decrease in det(Vθ). This is always the experiment in which exp) dPred(z,ξ(N ) is maximal.23 As Nexp → ∞, the sequentially D generated D-optimal exact design converges to the D-optimal continuous design, which is equal to the G-optimal continuous design according to the GET. As exact G-optimal designs cannot be built sequentially, we built a continuous design (cf. eq 11). Subsequently, the design weights of the continuous design were rounded to represent a discrete number of experiments in an exact design. To evaluate the quality of a design ξ, we employ the Def f iciency23

In eq 5, the term in parentheses quantifies the uncertainty in the predictions of the model as a consequence of the measurement uncertainty by incorporating the measurement covariance matrix Vw and the sensitivity of the model equations g to the measurement quantities w. If the impact of measurement uncertainty is large, Fisher information is low. The matrices B express the sensitivity of the model equations to the unknown parameters θ. Fisher information is high where the model output is highly sensitive to the parameter values. Σ0 is the variance−covariance matrix of possibly available preliminary information on parameter uncertainty from previous experiments or simulations. Knowledge of Vθ allows for the calculation of the covariance matrices of the predictions, VPred, for any total composition z11 VPred(z , ξ , θ)̂ = (A−1B)Vθ (A−1B)T

(12)

z

wμ, zμ, θ ̂

⎧ ⎫1/ np ⎪ det[F(θ ̂, ξ)] ⎪ ⎬ ζD = ⎨ ⎪ * )] ⎪ ⎩ det[F(θ ̂, ξopt ⎭

(7)

Hence, VPred describes the variances in the predicted phase compositions x′ and x″ to be expected when the design ξ is executed. Both the parameter covariance Vθ and the prediction covariance VPred are matrices. To minimize these parameter or prediction covariances, scalar measures of the matrices are needed as follows: (1) To determine the experimental settings minimizing the uncertainty in the parameters θ, we use the popular D-optimal design to minimize the volume of the parameter confidence

(13)

23

and the G-ef f iciency

*) max dPred(z , ξopt ζG =

z

max dPred(z , ξ) z

=

np max dPred(z , ξ) z

(14)

The two efficiencies are measures for the information loss in parameter and prediction accuracy, respectively, when one uses the design ξ instead of ξ*; that is, ζi = 0 denotes entire information loss, whereas ζi = 1 implies no information loss. 19622

dx.doi.org/10.1021/ie5035573 | Ind. Eng. Chem. Res. 2014, 53, 19620−19627

Industrial & Engineering Chemistry Research

Article

Vw = I

The efficiencies are defined such that an experiment with a value of ζi = 0.5 would have to be conducted (1/ζi) = 2 times to obtain the same accuracy as in the optimal experiment. For nonlinear models, F depends on the estimated model parameters θ̂ (cf. section 2.1). It is common to assume estimates θ̂ to already be in a local neighborhood of the true values θ.11,34 If the estimates are not in a local neighborhood, the designed experiments based on those initial parameter estimates will not be optimal with respect to the real parameters. An iterative procedure is then necessary to alternate between the design and execution of experiments and a subsequent refinement of the parameters (cf. refs 24 and 35). If required, robust optimal design methods could be employed. For example, robust designs optimize the expected value of the uncertainty based on the probability distribution for the estimated model parameters θ̂.16,24,36−38

(15)

Any diagonal measurement variance matrix Vw with equal diagonal entries will give the same results, because our objective function (eq 9, G-optimality) is scaled by the measurement variance. 3.1. LLE of Type I with the UNIQUAC Model. In this section, the impact of the experimental design on the quality of the predictions made by the UNIQUAC model for a type I LLE experiment is discussed. OED yields a continuous optimal design ξopt * that concentrates on three distinct positions ⎧ 0.22 0.64 1 ⎫ * =⎨ ⎬ ξopt ⎩ 0.2726 0.3635 0.3639 ⎭

(16)

This means that 27% of the experimental effort is spent at the overall composition of α = 0.22, 36% of the experimental effort is spent at the overall composition of α = 0.64, and 37% of the experimental effort is spent at the overall composition of α = 1 (near the critical point). ξ(10) opt contains the following experiments

3. RESULTS In this section, a design of experiments for LLE of type I described by the UNIQUAC model26 is presented in detail. A brief summary of the results of the OEDs for LLE of type I described by the NRTL model25 as well as for LLE of type II described by the UNIQUAC and NRTL models follows; the details can be found in the Supporting Information. The objective function for the optimal design is the minimization of the variance of the predictions. Therefore, an exact G-optimal design ξ(10) opt , containing a total number of 10 experiments, was built by rounding the design weights of the G* . This continuous design ξopt * was optimal continuous design ξopt built sequentially according to the iterative procedure described in subsection 2.2. Therefore, α was discretized into 100 commensurate sections. In each iteration step, the maximum of the standardized prediction variance dPred(z,ξD(Nexp)) was identified by evaluating dPred(z,ξD(Nexp)) at these discrete supporting points. The optimal exact and continuous designs, ξ(10) opt and ξ* opt, respectively, were compared to conventional, (10) intuitive designs, ξ(10) Conv1 and ξConv2. Each conventional design also contains a total of 10 experiments. The conventional designs were chosen to have equidistantly distributed experiments: The first conventional design ξ(10) Conv1 contains as many distinct experiments as the optimal design, whereas the second (10) always contains 10 equally conventional design ξConv2 distributed, distinct experiments (i.e., without repetition of any experiment). The OED methodology assumes an exact structural model and the structural identifiability of the model parameters (see subsection 2.2 and refs 11, 23, and 24). It is well-known that the widely used gE models are not structurally exact, as the models represent approximations of the real phase behavior of vapor−liquid and liquid−liquid equilibria.25,26 If required, robust estimation methods could be applied to deal with errors in the model structure.24 To fulfill the requirements of OED and assess the potential of the optimal experimental design method for the prediction of LLE, we chose to simulate experimental data using gE models with a fixed set of parameters. The model parameters were obtained by adjusting the gE models to experimental data for the ternary system sulfolane−n-heptane−p-xylene. The parameters are locally identifiable because the Fisher information matrix has full rank.30 Mitsos et al. describe a method to ensure the unique solution of the parameter estimation problem.39 The measurement variance Vw was chosen to be the identity matrix

⎧ 0.22 0.64 1 ⎫ (10) ⎬ ξopt =⎨ ⎩ 0.3 0.3 0.4 ⎭

(17)

Here, three experiments are executed at the overall composition of α = 0.22, three experiments are executed at the overall composition of α = 0.64, and four experiments are executed at the overall composition of α = 1. To benchmark the performance of the computed optimal design, conventional, intuitive designs were used. The first conventional design ξ(10) Conv1 contains as many distinct experiments as the optimal design (three in this case), which are evenly distributed between the partially miscible binary subsystem and the critical point ⎧ 0 0.5 1 ⎫ (10) ⎬ ξConv1 =⎨ ⎩ 0.3 0.3 0.4 ⎭

(18)

Because a conventional design consisting of only three distinct experiments is uncommon in practice, we consider another conventional design ξ(10) Conv2 as a second benchmark. This second conventional design consists of 10 distinct experiments that are evenly distributed between the partially miscible binary subsystem and the critical point (10) ξConv2 ⎧ 0 0.11 0.22 0.33 0.44 0.56 0.67 0.78 0.89 1 ⎫ ⎬ =⎨ ⎩ 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1⎭ (19)

Figure 1 (left) shows the positions of the experiments belonging to optimal design ξopt * and ξ(10) opt in a ternary phase diagram, that is, where experiments should be executed to minimize the variance of the predictions made by the UNIQUAC model. The phase diagrams at the center and right of Figure 1 show the positions of the experiments belonging to the first and second conventional designs, ξ(10) Conv1 and ξ(10) Conv2, respectively. In the optimal design, experiments are moved from the binary subsystems to areas where the curvature of the binodal curve is pronounced. Figure 2 shows the predicted standardized variances dPred (cf. eq 10) of the optimal designs ξ*opt and ξ(10) opt , as well as of the (10) conventional designs ξ(10) Conv1 and ξConv2, plotted as functions of α. Hence, it shows the expected average uncertainty in the 19623

dx.doi.org/10.1021/ie5035573 | Ind. Eng. Chem. Res. 2014, 53, 19620−19627

Industrial & Engineering Chemistry Research

Article

conventional design ξ(10) Conv1. Hence, no binary experiment is necessary here. Regarding the prediction variance of conventional design ξ(10) Conv2, an even distribution with low values over a large range of α can be observed. Only at high values of α does a dramatic increase occur. This region corresponds to the high-curvature region of the binodal curve near the critical point. This increase can be explained by the distribution of the experimental effort * and ξ(10) on the experimental positions: In optimal designs ξopt opt , more effort is spent on the region near the critical point than in conventional design ξ(10) Conv2. Hence, a lack of sampling in the region near the critical point leads to insufficient information on the degree of curvature of the binodal curve. 3.2. LLE of a Type I System with the NRTL Model and LLE of a Type II System with the NRTL and UNIQUAC Models. In this section, we briefly summarize the OED for a type I LLE described by the NRTL model and the OEDs for a type II LLE described by the UNIQUAC and NRTL models. Details can be found in the Supporting Information. The type I LLE is based on the experimental data for the ternary system sulfolane−n-heptane−p-xylene, and the type II LLE is based on experimental data for the ternary system 3-methyltetrahydrofuran−di-n-butyl ether−water. Figure 3 shows the positions of the experiments of the optimal experimental designs. Just as in the design for type I LLE described by the UNIQUAC model (cf. subsection 3.1), the optimal experiments concentrate on regions where the curvature of the binodal curve is pronounced. Table 1 reports the G and D-efficiencies, ζG (eq 14) and ζD (eq 13), respectively, and the mean predicted standardized variances, mean(dPred) (eq 10), of the continuous optimal designs ξ*opt and the exact optimal and conventional designs (10) (10) ξ(10) opt , ξConv1, and ξConv2 for type I and type II LLE described by the NRTL and UNIQUAC models. The exact designs consist of 10 experiments each, distributed according to the design weights vi (cf. eq 3). Comparison of the continuous optimal design ξ*opt and the exact optimal design ξ(10) opt shows little loss of information when building an exact design based on a continuous design (cf. introduction of section 3). By definition, the G- and Defficiencies ζG and ζD are 100% for the continuous design, whereas for the exact designs, the values of the D-efficiencies ζD are ≥99% and the values of the G-efficiencies ζG are ≥83%. The

Figure 2. Predicted standardized variances for the optimal design plotted against the design variable α. α is a linear scalar [0, 1] to represent the overall compositions z in the middle of the tie lines (cf. subsection 2.1).

predicted phase compositions once the corresponding design (10) (10) ξ*opt, ξ(10) opt , ξConv1, or ξConv2, has been executed. For the continuous optimal design ξ*opt, we find that *)≤6 dPred(ξopt

(20)

where 6 is the number of parameters in the UNIQUAC model for a ternary system. Thus, the computed optimal design is optimal according to the general equivalence theorem (cf. eq 12). The values of the predicted standardized variances of the exact optimal design, dPred(ξ(10) opt ), agree well with the values of the predicted standardized variances of the continuous optimal design, dPred(ξopt * ). Hence, the exact design ξ(10) opt is a good approximation of the continuous design ξopt * (cf. introduction of (10) section 3). The prediction variances of ξopt are evenly distributed and hence independent of α; that is, the quality of the predicition is equally good all along the binodal curve. In contrast, the prediction variance of conventional design ξ(10) Conv1 is the largest for almost any value of α, especially at the large local maxima between the experimental supporting points: The maximum of dPred(ξ(10) Conv1) is more than 3 times higher than the maximum of dPred(ξ(10) opt ). Thus, the total number of experiments in conventional design ξ(10) Conv1 has to be increased threefold to achieve a comparable accuracy in the predictions as in optimal design ξ(10) opt . It can therefore be concluded that the information gain on the shape of the binodal curve is low when executing experiments on the binary subsystem, as in

Figure 3. Positions of the optimally designed experiments in the ternary phase diagrams. Left: α(ξ(10) opt ) based on the NRTL model for type I LLE. (10) Center: α(ξ(10) opt ) based on the NRTL model for type II LLE. Right: α(ξopt ) based on the UNIQUAC model for type II LLE. α is a linear scalar [0, 1] to represent the overall compositions z in the middle of the tie lines (cf. subsection 2.1). 19624

dx.doi.org/10.1021/ie5035573 | Ind. Eng. Chem. Res. 2014, 53, 19620−19627

Industrial & Engineering Chemistry Research

Article

Table 1. G and D Efficiencies ζG (Eq 14) and ζD (Eq 13) and Mean Predicted Standardized Variances mean(dPred) (Eq 10) of (10) (10) Conventional Designs ξ(10) Conv1 and ξConv2 and of Optimal Designs ξ* opt and ξopt for Type I and Type II LLE Based on the NRTL and UNIQUAC Models UNIQUAC model ξ(10) Conv1

ξ(10) Conv2

ξ*

ζG ζD mean(dPred)

0.28 0.69 12.03

0.37 0.75 5.44

1 1 5.22

ζG ζD mean(dPred)

0.06 0.55 14.08

0.24 0.63 5.75

1 1 5.57

NRTL model ξ(10) opt Type I 0.88 0.99 5.48 Type II 0.83 0.99 6.09

ξ(10) Conv1

ξ(10) Conv2

ξ*

ξ(10) opt

0.50 0.83 7.79

0.37 0.78 5.40

1 1 5.43

0.92 0.99 5.64

0.36 0.79 6.42

0.34 0.74 5.20

1 1 5.39

0.88 0.99 5.57

predicted standardized variances that are in the range of and even slightly lower than the values of the optimal designs. This is due to the very low values of the predicted standardized variances where the curvature of the binodal curve is low as a consequence of the many distinct experiments of ξ(10) Conv2. One common characteristic of the optimal designs is the high fraction of experimental effort spent at regions near the critical point. With current gE models, the critical region is usually not well predicted. Thus, in practice, the predicted optimal design point might actually not even lie in the two-phase region. For practical applications, α should therefore be initially reduced to α ∈ [0, 0.8], for example, when using gE models, to avoid the region near the critical point. However, our general findings about the resulting optimal designs remain qualitatively the same as shown by an optimal design study for α ∈ [0, 0.8] in the Supporting Information. Based on the first measurements, the optimal design can then be iteratively refined according to the procedure described in section 2.2. The presented findings also hold for mixtures with smaller miscibility gaps. This is demonstrated for the mixture cyclohexane−methanol−toluene in the Supporting Information. Although optimal experiments are also carried out in the partially miscible binary subsystem in this case, more than 70% of the experimental effort is spent in the high-curvature region of the miscibility gap.

predicted standardized variances of the optimal designs are low even in high-curvature regions of the binodal curve. Again, this is because the experiments concentrate on high-curvature regions. The objective function of the OED is the minimization of the variance of the predictions, that is, maximization of the Gefficiencies ζG. Regarding the G-efficiencies, we observe that the first (10) performs poorly in all cases: conventional design ξConv1 ) does not exceed 50% for the designs based on the ζG(ξ(10) Conv1 NRTL model and is even less than 28% for the designs based on the UNIQUAC model. For the conventional design based on the UNIQUAC model for a type II LLE, the G-efficiency ζG(ξ(10) Conv1) is particularly low (6%) because of an increased curvature of the binodal curve described by the UNIQUAC model compared to the binodal curve described by the NRTL model. The second conventional design, ξ(10) Conv2, also provides low values of G-efficiencies: ζG(ξ(10) Conv2) ≤ 37%. The G-efficiencies are especially low for the conventional designs applied to type II LLE described by the UNIQUAC model. This is due to very high predicted standardized variances in regions where the curvature of the binodal curve is large. In these cases, the prediction accuracy is increased by a factor of up to 14 through the use of the optimal design. However, the second conventional design, ξ(10) Conv2, performs well in regions where the curvature of the binodal curve is low. The values of the D-efficiencies of the first conventional design, ζD(ξ(10) Conv1), are between 55% and 93%, whereas the values of the D-efficiencies of the second conventional design, ζD(ξ(10) Conv2), are between 63% and 78%. Hence, with regard to the parameters, less information is lost than for the predictions in comparison to the optimal designs. The parameter accuracy can thus be increased by a factor up to 2 through the use of an optimal experimental design. According to the general equivalence theorem (cf. eq 12), the maximum value of the predicted standardized variance does not exceed the value of the number of parameters (i.e., six for a gE model describing a ternary system) for a continuous optimal experimental design. As this is the lowest maximum value that can be achieved, it is a suitable benchmark for comparison with the mean values of the predicted standardized variances of the different designs to each other. The first conventional design, ξ(10) Conv1, shows the largest mean predicted standardized variance in all cases. The reason for this is the distinct maxima of the mean predicted standardized variances, which are located between the experimental supporting points (cf. Figure 2). The second conventional design, ξ(10) Conv2, shows mean values of the

4. CONCLUSIONS In this work, the theoretical potential of model-based optimal experimental design (OED) for the characterization of type I and type II LLE has been investigated. The designs are based on the NRTL and UNIQUAC gE models, because these models are widely used in liquid−liquid phase equilibria calculations. These gE models are well-known to usually not be structurally exact. Because OED is a model-based procedure, the information gained by using OED is always determined by the quality of the underlying model, and our analysis thus provides an upper bound for the efficiency improvements to be expected in practice. The results show that simply reducing the number of distinct experimental settings and distributing them equally leads to poor predictions in regions where no measurements have been performed, especially if those regions include an increased curvature of the binodal curve. Simply increasing the number of equidistantly distributed supporting points in conventional designs leads to good predictions in most parts of the phase diagram. However, this approach results in high experimental effort and loss of performance in high-curvature regions of the binodal curve. 19625

dx.doi.org/10.1021/ie5035573 | Ind. Eng. Chem. Res. 2014, 53, 19620−19627

Industrial & Engineering Chemistry Research

Article

Calligraphic Symbol

Our investigations show that optimal experimental designs for the characterization of LLE employ very few (3−5) distinct experiments and that a large fraction of the experimental effort is located in regions where the curvature of the binodal curve is high. Thereby, at least half of the experimental effort can be saved without losing either prediction or parameter accuracy. Hence, the significant potential of OED for LLE is demonstrated: fewer experiments leading to higher prediction accuracy. The full potential will be exploited with improved LLE models, but it is already possible to derive a simple design rule to improve LLE experiments: Perform more experiments in high-curvature regions of the binodal curve.



5 = normal distribution Superscripts

′ = phase ′ ″ = phase ″ * = continuous (design) (i) = i-exact (design) Subscripts



ASSOCIATED CONTENT

* Supporting Information S

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Author Contributions †

D.D. and L.W. ontributed equally to this work

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS This work was performed as part of the Cluster of Excellence “Tailor-Made Fuels from Biomass”, which is funded by the Excellence Initiative of the German federal and state governments to promote science and research at German universities. D.D. acknowledges support by the Fonds National de la Recherche, Luxembourg (788803).



REFERENCES

(1) Dohrn, R.; Pfohl, O. Thermophysical propertiesIndustrial directions. Fluid Phase Equilib. 2002, 194−197, 15−29. (2) Case, F. H.; Chaka, A.; Moore, J. D.; Mountain, R. D.; Ross, R. B.; Shen, V. K.; Stahlberg, E. A. The sixth industrial fluid properties simulation challenge. Fluid Phase Equilib. 2011, 310, 1−3. (3) Dohrn, R.; Treckmann, R.; Olf, G. A centralized thermophysicalproperty service in chemical industry. Inst. Chem. Eng. Symp. Ser. 1997, 142, 113−124. (4) Wakeham, W.; Assael, M.; Atkinson, J.; Bilek, J.; Fareleira, J.; Fitt, A.; Goodwin, A.; Oliveira, C. Thermophysical property measurements: The journey from accuracy to fitness for purpose. Int. J. Thermophys. 2007, 28, 372−416. (5) Carlsson, K.; Karlberg, B. Determination of octanol−water partition coefficients using a micro-volume liquid−liquid flow extraction system. Anal. Chim. Acta 2000, 423, 137−144. (6) Bolden, R. D.; Hoke, S. H., II; Eichhold, T. H.; McCauley-Myers, D. L.; Wehmeyer, K. R. Semi-automated liquid−liquid back-extraction in a 96-well format to decrease sample preparation time for the determination of dextromethorphan and dextrorphan in human plasma. J. Chromatogr. B 2002, 772, 1−10. (7) Kuzmanović, B.; van Delden, M. L.; Kuipers, N. J. M.; de Haan, A. B. Fully Automated Workstation for Liquid−Liquid Equilibrium Measurements. J. Chem. Eng. Data 2003, 48, 1237−1244. (8) Alimuddin, M.; Grant, D.; Bulloch, D.; Lee, N.; Peacock, M.; Dahl, R. Determination of log D via Automated Microfluidic Liquid− Liquid Extraction. J. Med. Chem. 2008, 51, 5140−5142. (9) Marine, N. A.; Klein, S. A.; Posner, J. D. Partition Coefficient Measurements in Picoliter Drops Using a Segmented Flow Microfluidic Device. Anal. Chem. 2009, 81, 1471−1476. (10) Dechambre, D.; Pauls, C.; Greiner, L.; Leonhard, K.; Bardow, A. Towards automated characterisation of liquid−liquid equilibria. Fluid Phase Equilib. 2014, 362, 328−334. (11) Bard, Y. Nonlinear Parameter Estimation; Academic Press: New York, 1974. (12) Bertakis, E.; Kalem, M.; Pfennig, A. Model-based geometry optimization of a Nitsch cell using the Fisher information matrix. Chem. Eng. Sci. 2008, 63, 4881−4887. (13) Galvanin, F.; Barolo, M.; Bezzo, F. Online Model-Based Redesign of Experiments for Parameter Estimation in Dynamic Systems. Ind. Eng. Chem. Res. 2009, 48, 4415−4427. (14) Galvanin, F.; Boschiero, A.; Barolo, M.; Bezzo, F. Model-Based Design of Experiments in the Presence of Continuous Measurement Systems. Ind. Eng. Chem. Res. 2011, 50, 2167−2175. (15) Flores-Sánchez, A.; Flores-Tlacuahuac, A.; Pedraza-Segura, L. L. Model-Based Experimental Design to Estimate Kinetic Parameters of the Enzymatic Hydrolysis of Lignocellulose. Ind. Eng. Chem. Res. 2013, 52, 4834−4850. (16) Barz, T.; Arellano-Garcia, H.; Wozny, G. Handling Uncertainty in Model-Based Optimal Experimental Design. Ind. Eng. Chem. Res. 2010, 49, 5702−5713. (17) Kalem, M.; Bertakis, E.; Pfennig, A. Anwendung der modellbasierten Versuchsplanung zur Ermittlung von Kinetikparametern bei Grenzflächenreaktionen. Chem.-Ing.-Tech. 2008, 80, 79−85.

Detailed discussion of LLE of type I described by the NRTL model and LLE of type II described by the NRTL and UNIQUAC models. NRTL and UNIQUAC models and NRTL and UNIQUAC derivatives with respect to model parameters and mole fractions. NRTL and UNIQUAC parameters. Derivatives of the LLE model with respect to the model parameters and mole fractions. This material is available free of charge via the Internet at http://pubs.acs.org.



D = D-optimal G = G-optimal opt = optimal μ = measurement

NOTATION

Roman Symbols

dPred = standardized variance of the predictions F = Fisher information matrix nc = number of components np = number of parameters nexp = number of distinct experiments Nexp = total number of experiments ri = number of repetitions v = design weight Vw = covariance matrix of the measurements VPred = covariance matrix of the predictions Vθ = covariance matrix of the parameters w = conjunction of phase compositions x = phase composition z = overall composition Greek Symbols

α = tie-line variable γi = activity coefficient of species i θ = model parameters ξ = design 19626

dx.doi.org/10.1021/ie5035573 | Ind. Eng. Chem. Res. 2014, 53, 19620−19627

Industrial & Engineering Chemistry Research

Article

(18) Sutton, M. The analysis and design of binary vapour-liquid equilibrium experiments. Part II The design of experiments. Canadian Journal of Chemical Engineering 1977, 609−613. (19) Doví, V. G.; Reverberi, A. P.; Maga, L. Optimal design of sequential experiments for error-in-variables models. Comput. Chem. Eng. 1993, 17, 111−115. (20) Nilles, M.; Wozny, G.; Barz, T. Kalibrierung dynamischer Modelle durch adaptive, optimale Versuchspläne. Chem.-Ing.-Tech. 2014, 86, 953−965. (21) Alsmeyer, F. Neue Mess- und Auswertemethoden zur Bestimmung von Dampf-Flüssigkeits-Gleichgewichten in reagierenden Systemen; Dissertation; VDI-Verlag: Düsseldorf, Germany, 2003; Reihe 3, Nr. 787. (22) Treybal, R. E. Liquid Extraction; McGraw-Hill: New York, 1963. (23) Atkinson, A. C.; Donev, A. N.; Tobias, R. D. Optimum Experimental Designs, with SAS; Oxford University Press: Oxford, U.K., 2007. (24) Walter, E.; Pronzato, L. Identification of Parametric Models; Springer Verlag: New York, 1997. (25) Renon, H.; Prausnitz, J. M. Local compositions in thermodynamic excess functions for liquid mixtures. AIChE J. 1968, 14, 135−144. (26) Abrams, D. S.; Prausnitz, J. M. Statistical thermodynamics of liquid mixtures: A new expression for the excess Gibbs energy of partly or completely miscible systems. AIChE J. 1975, 21, 116−128. (27) Fredenslund, A.; Jones, R. L.; Prausnitz, J. M. Groupcontribution estimation of activity coefficients in nonideal liquid mixtures. AIChE J. 1975, 21, 1086−1099. (28) Klamt, A. Conductor-like Screening Model for Real Solvents: A New Approach to the Quantitative Calculation of Solvation Phenomena. J. Phys. Chem. 1995, 99, 2224−2235. (29) Michelsen, M. L.; Møllerup, J. Thermodynamic Models: Fundamentals and Computational Aspects; Tie-Line Publications: Holte, Denmark, 2007. (30) Miao, H.; Xia, X.; Perelson, A. S.; Wu, H. On identifiability of nonlinear ODE models and applications in viral dynamics. SIAM Rev. 2011, 53, 3−39. (31) Galvanin, F.; Ballan, C. C.; Barolo, M.; Bezzo, F. A general model-based design of experiments approach to achieve practical identifiability of pharmacokinetic and pharmacodynamic models. J. Pharmacokinet. Pharmacodyn. 2013, 40, 451−467. (32) Grewal, M.; Glover, K. Identifiability of linear and nonlinear dynamical systems. IEEE Trans. Autom. Control 1976, 21, 833−837. (33) Kiefer, J.; Wolfowitz, J. The equivalence of two extremum problems. Can. J. Math. 1960, 12, 234. (34) Schöneberger, J. C.; Arellano-Garcia, H.; Wozny, G. Local Optima in Model-Based Optimal Experimental Design. Ind. Eng. Chem. Res. 2010, 49, 10059−10073. (35) Marquardt, W. Model-based experimental analysis of kinetic phenomena in multi-phase reactive systems. Chem. Eng. Res. Des. 2005, 83, 561−573. (36) Walter, E.; Pronzato, L. Optimal experiment design for nonlinear models subject to large prior uncertainties. Am. J. Physiol.: Regul., Integr. Comp. Physiol. 1987, 253, R530−R534. (37) Pronzato, L.; Walter, E. Robust Experiment Design via Maximin Optimization. Math. Biosci. 1988, 89, 161−176. (38) Chu, Y.; Hahn, J. Integrating parameter selection with experimental design under uncertainty for nonlinear dynamic systems. AIChE J. 2008, 54, 2310−2320. (39) Mitsos, A.; Bollas, G. M.; Barton, P. I. Bilevel optimization formulation for parameter estimation in liquid−liquid phase equilibrium problems. Chem. Eng. Sci. 2009, 64, 548−559.

19627

dx.doi.org/10.1021/ie5035573 | Ind. Eng. Chem. Res. 2014, 53, 19620−19627