
Ind. Eng. Chem. Res. 1999, 38, 3036-3045

Effect of Systematic and Random Errors in Thermodynamic Models on Chemical Process Design and Simulation: A Monte Carlo Approach

V. R. Vasquez and W. B. Whiting*

Chemical Engineering Department, University of Nevada-Reno, Reno, Nevada 89557-0136

* To whom correspondence should be addressed. E-mail: [email protected]. Tel.: 1-775-784-6360. Fax: 1-775-784-4764.

A Monte Carlo method is presented to separate and study the effects of systematic and random errors present in thermodynamic data on chemical process design and simulation. Analysis of thermodynamic data found in the literature gives clear evidence of the presence of systematic errors, in particular for liquid-liquid equilibria data. For systematic errors, the data are perturbed systematically with a rectangular probability distribution, and to analyze random errors, the perturbation is carried out randomly with normal probability distributions. Thermodynamic parameters are obtained from appropriate regression methods and used to simulate a given unit operation and to obtain cumulative frequency distributions, providing a quantitative risk assessment and a better understanding of the role of uncertainty in process design and simulation. The results show that the proposed method can clearly distinguish when one type of error is more significant. Potential applications are safety factors, process modeling, and experimental design.

1. Introduction

The quality of any computational method used for simulation and prediction is measured in terms of the accuracy and precision of the output results, which are affected by all the uncertainty sources present in the computational procedure. These include errors present in the input data or variables, numerical and modeling errors, and any assumptions made. The evaluation of the effects of these uncertainty sources on the quality of the computational model often cannot be carried out using analytical statistical methods because of the complexity involved. From the chemical process simulation standpoint, we can visualize the computer model as a set of numerical steps that produces the evaluation of a given unit operation or process. For instance, the performance evaluation of a distillation column involves a basic step sequence defined by the acquisition of experimental data, thermodynamic modeling, regression of model parameters, and modeling of the distillation process. The behavior of the error propagation in these kinds of computer models is highly nonlinear and complex, so Monte Carlo methods are preferred for studying the uncertainty propagation.

When uncertainty and sensitivity analyses are performed, it is usually implicitly assumed that the thermodynamic and unit-operation models are reasonable representations of reality. Thus, these studies are mainly conducted with the experimental data errors as the main source of uncertainty. Even though treating the experimental data errors as the main source can be useful for many practical situations, it has been common to include only the random experimental errors, neglecting the effect of systematic or bias errors.

In this work, a new Monte Carlo method is introduced to evaluate the impact of systematic errors and to differentiate their effects from those of random errors on predicted process performance. The method consists of identifying the systematic trends and random errors present in the experimental data, defining appropriate probability distributions for both types of errors, and performing Monte Carlo simulation using samples from these distributions. The method was tested using both a synthetic example and case studies of liquid-liquid extraction operations. The results show that the method is capable of distinguishing when one type of error has a dominant effect over the other. For the liquid-liquid extraction cases studied, it is shown that systematic errors can have a significant impact on the predicted process performance and therefore markedly influence the uncertainty analysis.

Section 2 contains a brief description of the characteristics of random and systematic errors. In section 3, the basic uncertainty-propagation mechanism through computer models of the type used in chemical engineering applications is discussed, followed by a detailed description of the new Monte Carlo approach for the uncertainty analysis of these two types of errors in complex models. Applications to the prediction of unit-operation performance are presented in section 5 with two case studies of liquid-liquid extraction. In addition, the method presented can be used to facilitate decision making in fields related to safety-factor selection, modeling, and experimental data measurement and design. An important problem not addressed in this work is the effect of modeling error in uncertainty-propagation analysis, which is certainly an open problem for further research. Modeling errors can have significant effects on process design and simulation, as pointed out by Vasquez and Whiting1 using the UNIQUAC and NRTL models in the process performance prediction of liquid-liquid extraction operations.



2. Random and Systematic Errors


In general, random and systematic errors can be defined as a measure of the state of information available for a given set of experimental data or observable parameters. The definition of the state of information for experimental measurements is a controversial issue and is usually related to the concept of data space. Some authors define the data space as all the conceivable instrumental responses,2 but this definition is vague from the practical standpoint because an experimental measurement is a function of many additional variables external to the instrument itself. For instance, systematic external fluctuations in the working environment as well as external random fluctuations significantly affect the instrument output in many situations. The experimentalist has the task of reporting these effects either quantitatively or qualitatively, but the tools available are still quite basic. Additionally, identifying the error sources is not straightforward, making it even more difficult to report estimates for the "true" experimental condition. Traditional practice involves reporting only the sample average (x̄) with a variability measure based on the instrument (precision) statistics. However, at least a qualitative idea of systematic and other external errors should be reported. According to the ANSI/ASME PTC 19.1 Standard,3 a measurement uncertainty analysis should be able to identify more than four sources of systematic or bias errors (otherwise, there is too great a chance that some important sources will go unrecognized), and authors such as Hayward4 indicate that a good uncertainty analysis might reveal dozens of primary measurement errors. The main measurement error sources, as classified by ANSI/ASME,3 lie in (a) calibration, (b) data acquisition, and (c) data reduction procedures. Calibration errors come from the assumption that the instrument calibration is done in such a way that its response is identical to some known standard in the working environment. Data-acquisition errors are related to error sources in the process such as excitation voltages, signal conditioning, probe errors, and so forth. Finally, data-reduction errors involve errors in calibration curve fitting and computational resolution. All of these error sources are subject to random and systematic variations, which should be quantified. The total error for a given experiment can be decomposed, as defined by ANSI/ASME,3 as

\delta = \beta + \epsilon \qquad (1)

where β is a fixed bias error and ε is a random precision error. The bias error is assumed constant for a given experiment. The methodology suggested3 to estimate β is based on bias limits defined as follows:

\sigma_S = \left[ \sum_{j=1}^{3} \sum_{i=1}^{K} \sigma_{S,ij}^2 \right]^{1/2} \qquad (2)

where σS represents an average of the elemental bias errors, j runs over the categories involved (i.e., the (a), (b), and (c) error types), and i runs over the sources within a given category. Similarly, the same approach is used to define the total random error based on individual standard deviation estimates:

S_R = \left[ \sum_{j=1}^{3} \sum_{i=1}^{K} S_{R,ij}^2 \right]^{1/2} \qquad (3)

A similar approach for including both random and bias errors is presented by Dietrich,5 with minor variations. The main difference lies in the use of a Gaussian tolerance probability κ multiplying a quadrature sum of both types of errors:

U = \kappa \left[ S_R^2 + \sigma_S^2 \right]^{1/2} \qquad (4)

where κ is the value used to define uncertainty intervals for means of large samples of Gaussian populations, of the form µ ± κσ. Additional formulas are presented for small samples, replacing the Gaussian probability term κ by a tolerance probability from the Student t distribution, which is used equivalently to define uncertainty intervals for means of small samples as x̄ ± ts. Formulations for effective degrees of freedom and effective t values are also given for when both types of errors are present. One problem with this formulation, as pointed out by Dietrich,5 is that it is very difficult to estimate the effects of systematic errors on the output results. The ANSI/ASME PTC 19.1 Standard,3 in contrast, establishes that the random uncertainty is estimated by statistical analysis of experimental data, while the systematic uncertainty is estimated by nonstatistical methods, suggesting that one should keep the random uncertainty separate from the systematic one.

The International Organization for Standardization (ISO), in the Guide to the Expression of Uncertainty in Measurement,6 also conceptually defines the roles of random and systematic error in the measurement process, and the use of additive correction factors is suggested to decrease the effect of systematic errors when the source is identified. As noted by ISO, a systematic error and its cause cannot be completely known; therefore, the compensation cannot be complete. Thus, ISO does not present specific methods and techniques to deal with this problem. The combination of random and systematic uncertainties for statistical analysis has become even more controversial in recent years, as pointed out by Dietrich,5 because of the difficulties found in linking random and systematic probability distributions when dealing with highly complex models.
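As a quick illustration of how eqs 2-4 combine, the following Python sketch evaluates the root-sum-square bias and random indices for a hypothetical error budget; all elemental values are invented for demonstration and are not taken from the standard.

```python
import numpy as np

# Hypothetical elemental errors for the three ANSI/ASME categories:
# (a) calibration, (b) data acquisition, (c) data reduction.
bias_elements = {"calibration": [0.10, 0.05],   # sigma_S,ij
                 "acquisition": [0.08],
                 "reduction":   [0.02, 0.03]}
random_elements = {"calibration": [0.04],       # S_R,ij
                   "acquisition": [0.06, 0.02],
                   "reduction":   [0.01]}

def rss(groups):
    """Root-sum-square over all elemental errors (eqs 2 and 3)."""
    return np.sqrt(sum(e ** 2 for cat in groups.values() for e in cat))

sigma_S = rss(bias_elements)                    # eq 2
S_R = rss(random_elements)                      # eq 3
kappa = 1.96                                    # ~95% Gaussian tolerance
U = kappa * np.sqrt(S_R ** 2 + sigma_S ** 2)    # eq 4 (Dietrich)
print(f"sigma_S = {sigma_S:.4f}, S_R = {S_R:.4f}, U = {U:.4f}")
```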

3. Propagation of Uncertainty in Computer Models

A computer model, in a broad sense, can be defined as any computer code that implements mathematical models for a given phenomenon that cannot be analyzed directly or analytically because of its complexity. Any unit-operation simulation involving the use of experimental data as an initial step to obtain parameters easily falls into this category. Random and systematic error propagation in this kind of model becomes very complex from the analytical standpoint, particularly the treatment of bias errors. Most of the work on error-propagation theory is defined in terms of Taylor's series expansion techniques3,7 applied to the model involved for independent random errors, and it mainly considers only the first two terms in the series. For instance, for a function with two independent variables, the expansion is expressed as

f(x + \delta x, y + \delta y) \approx f(x, y) + \frac{\partial f}{\partial x}\,\delta x + \frac{\partial f}{\partial y}\,\delta y + R_2 \qquad (5)

where δx and δy are the errors associated with the variables x and y, and R₂ is the remainder after two terms, which is generally assumed to be insignificant. On the basis of eq 5, the ANSI/ASME PTC 19.1 Standard3 presents formulas for the equivalent propagated variance for several functions in terms of the variances of the independent variables. For example, for a function r = f(x,y), the expressions are of the type

S_r^2 = \left( \frac{\partial r}{\partial x} S_x \right)^2 + \left( \frac{\partial r}{\partial y} S_y \right)^2 \qquad (6)
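For a concrete check of eq 6, the following sketch compares the first-order propagated uncertainty with a direct Monte Carlo estimate for an illustrative function r = xy; the function and the standard deviations are hypothetical, chosen only for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative function and input uncertainties (hypothetical values).
f = lambda x, y: x * y
x0, y0 = 2.0, 3.0
Sx, Sy = 0.05, 0.10

# Eq 6: first-order (Taylor) variance propagation for r = x*y,
# with analytic partials dr/dx = y and dr/dy = x.
Sr_linear = np.hypot(y0 * Sx, x0 * Sy)

# Direct Monte Carlo estimate for comparison.
x = rng.normal(x0, Sx, 100_000)
y = rng.normal(y0, Sy, 100_000)
Sr_mc = f(x, y).std()

print(f"linear: {Sr_linear:.4f}  monte carlo: {Sr_mc:.4f}")
```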

From a practical standpoint, it is obvious that this type of approach cannot be implemented even for moderately complex computer models, because of the inability to develop the expressions for the Taylor's series expansion; extremely poor resolution of the error is obtained as the complexity of the model increases. To get around this problem, Monte Carlo methods are very popular. Iman and Conover8 and Iman and Helton9 pioneered the uncertainty analysis of computer models through the use of Monte Carlo methods. Basically, their approach consists of assigning probability distributions to the model parameters involved and then sampling from those distributions to obtain parameter sets, which are used to study the variation or imprecision of the output or dependent variables of the model arising from the collective variation of the independent parameters. The parameter sets are obtained via random-sampling procedures, with modifications to accelerate the convergence of the cumulative frequency distributions for the output variables. The most common modification is the use of stratified sampling, in particular, Latin hypercube sampling (LHS),10 where the sample space is divided into intervals of equal probability and one sample is taken at random from within each interval (a one-dimensional sketch is given at the end of this section).

All of these uncertainty analysis methods based on Monte Carlo simulation with probability distributions for the model parameters have the disadvantages that they cannot incorporate the effect of systematic errors and that parameter correlation has to be taken into account to obtain reliable results. Determining the parameter correlation is a difficult task when dealing with nonlinear models, as shown by Bates and Watts11,12 and Cook and Witmer.13 Iman and Conover14 suggested a pairing procedure to approximate the parameter correlation when sampling using LHS. However, Vasquez et al.15 showed that the resolution of this pairing procedure for highly nonlinear models can be very poor; they developed a new sampling technique called equal probability sampling (EPS) that automatically takes the parameter correlation into account, increasing the resolution of the parameter-space sampling substantially. Most of the emphasis given in the literature to the analysis of systematic errors is related to specific physical situations or cases, such as parameter-bias elimination for log-transformed data16 or the reporting of experimental data ratios to reduce bias effects.17

4. The Monte Carlo Approach

The two previous sections clearly demonstrate the necessity for further development of methods for incorporating the effects of random, systematic, and even modeling errors in uncertainty and sensitivity analyses of computer models. Because of the obvious complexity of error propagation in a computer model, which can involve hundreds of equations and variables, Monte Carlo methods seem to be the most practical way of incorporating these types of errors in the uncertainty analysis.

4.1. Method. It is reasonable to incorporate the systematic and random error effects in the uncertainty analysis at the stage of the computer model where they are initially detected. In other words, if we have a computer model to simulate the performance of a liquid-liquid extraction operation involving fluid-phase equilibria predictions (assuming no modeling error), the presence of systematic and random errors will be detected mainly in the experimental data used to regress the parameters of the computer model. Thus, the main idea behind the proposed approach is to perturb the experimental data according to the types of errors present and their probability distributions, and then to obtain the necessary model parameter sets to perform Monte Carlo simulations of the system under the uncertain conditions detected. This approach has the advantage of not explicitly dealing with the estimation of how experimental errors propagate to the parameter estimates, which is needed to define appropriate probability distributions of the parameters if the uncertainty analysis is carried out starting from the parameter space, as is usually done.8,9 Additionally, the analysis of systematic trends present in the experimental data becomes easier.

As a first step, the approach consists of defining probability distributions for the random errors based on experimental evidence. This includes information about instrument statistics and any other source of information available. Second, one defines the nature of any systematic error present in the experimental data by comparison with standard values, with literature sources, and/or from the experimental conditions used to obtain the data. With this information, one establishes any bias limits and dynamic dependencies of the bias errors on the model variables, to define appropriate probability distributions. Once the statistical information for the errors involved has been generated, there are two possibilities for setting up the uncertainty analysis: (a) study the random effects and the systematic ones separately or (b) analyze the combined effect of both types of errors. In this work, we are interested in quantifying the influence of the bias and random errors on the uncertainty analysis individually as well as their combined effect.

For each experimental datum, using its probability distribution defined from the experimental random information [in practice, the probability distributions will often be assumed normal, N(x̄, sx²)], one generates n samples according to the precision desired for the estimated cumulative distributions. These values are then collated to produce n pseudo-experimental data sets. Then, one regresses n sets of model parameters using the n pseudo-experimental data sets as input. Evaluations of the computer model are performed using the n parameter sets obtained from the previous step, and the cumulative frequency distributions (cfd) for the output variables are constructed. These cfd's represent the probability distributions of the output variables under the influence of random errors.

Ind. Eng. Chem. Res., Vol. 38, No. 8, 1999 3039

The statistical characterization of the cfd quantifies the effect of the random errors on the computer model.

For the case of systematic errors, the approach consists of randomly generating pseudo-experimental data sets inside the bias limits by shifting the original data set according to a representative rectangular probability distribution. Notice that the systematic or bias limits can vary dynamically with the independent variables, so the shifting procedure has to take this into account. The methodologies suggested for the estimation and analysis of systematic errors by the authors and organizations mentioned in section 2 do not include the possibility of any dynamic behavior of the systematic errors with the variables involved in the experiment or measuring process. With the n pseudo-experimental data sets obtained systematically, the model parameters are again regressed and evaluated in the computer model to obtain the cumulative frequency curves quantifying the influence of the systematic errors on the output variables.

Having the cumulative frequency distributions produced by considering the systematic and random errors separately, it is now possible to evaluate the effect of each type separately and to determine which has the more pronounced influence on the output variables, thus improving the state of information of the model for decision-making purposes. Additionally, as mentioned earlier, it is of interest to evaluate the combined effect of both types of errors on an output variable of the computer model. This can be done by systematically shifting the experimental data set as explained above for the systematic case and then shifting each individual datum by a random normal deviate using the methodology suggested for the analysis of random errors. This procedure is repeated n times to generate the pseudo-experimental data sets necessary to perform the parameter regressions and the computer-model evaluations or simulations. The cumulative frequency distribution produced by this approach quantifies the combined effect of both types of errors. The proposed approach is demonstrated below with a synthetic example taken from the literature and then with specific thermodynamic cases.
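A minimal skeleton of the random-error branch of the method might look as follows; `regress_parameters` and `simulate_process` are hypothetical placeholders for the model-specific regression and unit-operation simulation steps, which the paper performs with its own regression code and a process simulator.

```python
import numpy as np

def monte_carlo_cfd(data, sigma, regress_parameters, simulate_process, n=100):
    """Perturb each experimental datum with a normal deviate, regress
    model parameters from the resulting pseudo-experimental data set,
    simulate the unit operation, and collect the outputs.  The sorted
    outputs, plotted against (i + 1)/n, form the empirical cumulative
    frequency distribution (cfd)."""
    rng = np.random.default_rng()
    outputs = []
    for _ in range(n):
        pseudo = rng.normal(loc=data, scale=sigma)  # one pseudo data set
        params = regress_parameters(pseudo)         # e.g., UNIQUAC b_ij
        outputs.append(simulate_process(params))    # e.g., % extracted
    return np.sort(outputs)
```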

4.2. A Basic Example. A numerical example involving an exponential function, presented by Dolby and Lipton,18 is used to demonstrate the use of the new approach for analyzing the effects of systematic and random errors using Monte Carlo methods. The model is of the form

y(x) = \alpha + \beta \gamma^x \quad (0 < \gamma < 1) \qquad (7)

where x is the independent variable and α, β, and γ are parameters. For the analysis of random error effects, the experimental data were simulated by adding random normal deviates to a hypothetical set of "true values" defined by points satisfying the relationship

y(x) = 1.0 + 6.0\,(0.55)^x \qquad (8)

The hypothetical "true values" were obtained by evaluating x_{k+1} = k in eq 8 for k = 0.0, ..., 6.0 with unit increments. The normal distributions used for each of the data points are N(y(x_i), σ_y) and N(x_i, σ_x) for i = 1, ..., 7, where σ_y is computed as 10% of y(x_i) and σ_x as 10% of x_i. The bias error effects are simulated by randomly choosing β values from a uniform distribution, U[4.0, 8.0]. Thus, eq 7 with α = 1.0, β = 8.0, and γ = 0.55 and with α = 1.0, β = 4.0, and γ = 0.55 defined the synthetic upper and lower bias limits for y.
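A sketch of the random-error branch for this example is shown below; it uses SciPy's Powell minimizer as a stand-in for the implementation of ref 19, with a bound enforcing 0 < γ < 1.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
x_true = np.arange(7.0)                     # x_k = 0, 1, ..., 6
y_true = 1.0 + 6.0 * 0.55 ** x_true         # eq 8

def y_at_one_from_fit():
    # Perturb both coordinates with 10% random normal deviates
    # (sigma_x is zero for the first point, where x = 0).
    x = rng.normal(x_true, 0.10 * x_true)
    y = rng.normal(y_true, 0.10 * y_true)
    sse = lambda p: np.sum((y - (p[0] + p[1] * p[2] ** x)) ** 2)
    res = minimize(sse, x0=[1.0, 6.0, 0.5], method="Powell",
                   bounds=[(None, None), (None, None), (0.01, 0.99)])
    alpha, beta, gamma = res.x
    return alpha + beta * gamma ** 1.0      # eq 7 evaluated at x = 1.0

cfd = np.sort([y_at_one_from_fit() for _ in range(100)])
# Plotting cfd against (np.arange(100) + 1) / 100 gives a curve of the
# kind shown in Figure 2b.
```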

Figure 1. Upper and lower synthetic systematic error limits defined for the model y(x) = α + βγ^x. The points shown also include a 10% random normal variation on hypothetical experimental data.

Figure 2. Comparison of the systematic and random effects on the cumulative frequency curve estimation for the function value at x = 1.0 in the equation y(x) = α + βγ^x.

Figure 1 shows these two limits and hypothetical experimental data illustrating the behavior of the random error used. One hundred pseudo-experimental data sets with random deviates were used to regress the parameters α, β, and γ of eq 7 using a direct-search optimization method (Powell's method19). With each parameter set obtained, eq 7 was evaluated at x = 1.0. The results are presented in Figure 2b as a cumulative frequency distribution, which shows the influence of the simulated random errors on that specific value of the model. Similarly, 100 parameter sets (α, β, γ) were regressed from pseudo-experimental data sets using systematic deviates for y, as explained before. Figure 2a presents the corresponding cumulative frequency distribution of eq 7 evaluated at x = 1.0. Comparing parts a and b of Figure 2, it is clear that the effect of the systematic error is more significant than that produced by the random errors. This was expected in this case, showing that the methodology is able to distinguish random and systematic effects when they are significantly different. Another important observation is that the mean of the cfd for the random errors is a good estimate of the unknown "true" value of the output variable from the probabilistic standpoint. However, this is not the case for the cfd obtained for the systematic effects: any value on that distribution can be the unknown "true" value, because every value is equally likely in this case. Knowledge of the cfd broadness therefore becomes very important for decision making, even more so than in the case of random error effects. The cfd broadness is a measure of how uncertain and sensitive the output variables are to systematic effects. Also notice that, because this is a simple model, systematic input deviates from a uniform distribution produce a uniform effect on the distribution of the output variable. For more complex models, as shown later, this effect is no longer observed because of nonlinear effects on the uncertainty propagation.

3040 Ind. Eng. Chem. Res., Vol. 38, No. 8, 1999

Figure 2c shows the cumulative frequency distribution obtained by including both types of errors combined in the uncertainty analysis. It is observed that, in this particular case, the broadness of the curve is mainly defined by the systematic effects. In other words, the individual uncertainty effects of the systematic and random errors are not additive when both act together. This suggests that the dominant individual error type controls the uncertainty propagation through the model.

Under accepted experimental analysis and measurement methods, it is expected that the experimental value reported corresponds to the mean of a series of experimental measurements called replicates. Thus, in principle, the statistics associated with the reported values should also be included. Unfortunately, this is commonly not done, and usually only a vague percentage value for the random uncertainty present in the data is reported, with no mention of any possible systematic error. This situation can affect the resolution of the proposed method because of the inherent uncertainty in choosing the bias limits and because one is using the reported mean values to incorporate the random error effects. The consequence is an overestimation of the uncertainty effects.

To evaluate the resolution and/or sensitivity of the proposed method, uncertainty levels were assigned to the definition of the bias limits and to the precision of the estimation of the hypothetical experimental mean values. As explained before, in Figure 2 (parts a and c) the bias limits were arbitrarily defined using β values of 4.0 and 8.0 for the lower and upper limits, respectively. For practical purposes, it is convenient to define the bias limits in such a way that they represent extreme limits or a worst-case scenario. To see the effect of using uncertain bias limits for the systematic errors, we determined the cumulative frequency distribution of the function y(x) evaluated at x = 1.0 using a normal distribution with mean µ = 6.0 and standard deviation σ = 1.0 to calculate the lower and upper bias limits of the parameter β in eq 7. One hundred random pairs of β values were drawn from this distribution to define bias limits, and the proposed approach was then used to generate a cfd for each pair of bias limits obtained during the sampling. All of the cumulative frequency distribution curves generated are plotted in Figure 3.

Figure 3. Systematic error effects on the cumulative frequency curve estimation for the function value at x = 1.0 in the equation y(x) = α + βγ^x for 100 bias limits, simulating the uncertainty effect on the bias-limits definition.

The main points to observe are the overall broadness of all the cfd curves together and the general shape of the distributions. Comparing these characteristics with those of Figure 2a, we can see that the latter has similar broadness and shape. This means that the approximate extreme bias limits used there (in this example, a hypothetical worst-case scenario) cover reasonably well the assumed normal distribution of the bias limits.

The case of error in the experimentally reported mean-value random error estimate (S_R) is studied by assuming an order-of-magnitude smaller relative error than the one present in single experimental measurements. It is important to mention that, when a reasonable sample size is used, the variance of the mean estimate should be very small compared with the uncertainty of individual experiments. In the present example, the pseudo-experimental true values were first perturbed by drawing values from a normal distribution with the true values as means and a 1% standard deviation, and then the methodology proposed for the analysis of random errors was applied to each of the data sets generated by this first random perturbation. The results are presented in Figure 4, which consists of 100 cfd curves. Notice that the uncertainty of the mean estimation is not very significant in this case, and the cfd curves are similar to those in Figure 2b. This suggests that when the uncertainty of the mean estimation is small, it is reasonable to expect good resolution from the proposed approach.

The combined effect of uncertainty in the bias-limits definition and in the experimental mean estimation was simulated by combining the two cases explained above: first the bias-limits uncertainty was introduced, followed by the uncertainty in the mean estimation. The results are presented in Figure 5. As pointed out for the case without uncertainty in the bias limits and mean estimation (see the discussion of Figure 2), the error propagation is mainly defined by the systematic errors in this case, and the broadness of the cfd shown in Figure 2c is similar to the general broadness of the cfd curves presented in Figure 5.

In general, if there is evidence of significant uncertainty in the bias-limits definition and experimental mean estimation, the methodology described above can be incorporated as an additional step to study the impact of these uncertainties on the estimation of the cumulative frequency distributions, and in that way evaluate whether more precision in the input data is required to improve the uncertainty analysis.
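The bias-limits resolution study just described can be sketched as a nested Monte Carlo loop. For brevity, this hypothetical sketch exploits the simplicity of eq 7 and evaluates y(1) = α + βγ directly at the shifted β (with α = 1.0, γ = 0.55) instead of re-regressing the parameters as the full method does.

```python
import numpy as np

rng = np.random.default_rng(4)

def systematic_cfd(beta_low, beta_high, n=100):
    # Simplification for this sketch: the systematic shift in the data
    # maps almost directly onto beta for eq 7, so we evaluate
    # y(1) = 1.0 + beta * 0.55 at beta ~ U[beta_low, beta_high].
    beta = rng.uniform(beta_low, beta_high, n)
    return np.sort(1.0 + beta * 0.55)

# 100 random pairs of bias limits drawn from N(6, 1), simulating
# uncertainty in the definition of the bias limits themselves (Figure 3).
curves = [systematic_cfd(*np.sort(rng.normal(6.0, 1.0, 2)))
          for _ in range(100)]
```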


Figure 4. Random effects on the cumulative frequency curve estimation for the function value at x = 1.0 in the equation y(x) = α + βγ^x for 100 pseudo-experimental data sets, simulating the effect of uncertainty in the experimental mean estimation.

Figure 6. Systematic error limits defined for the left phase in the liquid-liquid equilibria of the diisopropyl ether(1) + acetic acid(2) + water(3) ternary system at 25 °C based on evidence of bias errors found using experimental data from different sources.

5. Thermodynamic Applications

To illustrate the application of the uncertainty analysis approach presented in this work, two liquid-liquid extraction cases using the UNIQUAC activity coefficient model are studied, involving liquid-liquid equilibria predictions for the ternary systems diisopropyl ether + acetic acid + water and chloroform + acetone + water.

5.1. The UNIQUAC Model. The UNIQUAC model (Abrams and Prausnitz20) is widely used for vapor-liquid and liquid-liquid equilibria. In this work, the parameters regressed for this model are b_ij and b_ji according to the following equation and assumptions:20-23

\ln \gamma_i = \ln\frac{\Phi_i}{x_i} + \frac{z}{2} q_i \ln\frac{\theta_i}{\Phi_i} - q_i \ln t_i - q_i \sum_j \frac{\theta_j \tau_{ij}}{t_j} + l_i + q_i - \frac{\Phi_i}{x_i} \sum_j x_j l_j \qquad (9)

where

\theta_i = \frac{q_i x_i}{q_T}; \quad q_T = \sum_k q_k x_k; \quad \Phi_i = \frac{r_i x_i}{r_T}; \quad r_T = \sum_k r_k x_k;

l_i = \frac{z}{2}(r_i - q_i) + 1 - r_i; \quad \tau_{ij} = \exp(b_{ij}/T); \quad t_i = \sum_k \theta_k \tau_{ki}; \quad z = 10
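For reference, a direct transcription of eq 9 and its auxiliary definitions into Python might look as follows; this is a sketch that assumes nonzero mole fractions, not the regression code used in this work.

```python
import numpy as np

def uniquac_lngamma(x, r, q, b, T, z=10.0):
    """ln(gamma_i) from eq 9 for one liquid phase.
    x: mole fractions (all nonzero); r, q: UNIQUAC size and surface
    parameters; b: matrix of binary interaction parameters b_ij in
    kelvin (tau_ij = exp(b_ij / T)); T: temperature in kelvin."""
    x, r, q, b = map(np.asarray, (x, r, q, b))
    Phi = r * x / np.dot(r, x)            # Phi_i = r_i x_i / r_T
    theta = q * x / np.dot(q, x)          # theta_i = q_i x_i / q_T
    l = (z / 2.0) * (r - q) + 1.0 - r     # l_i
    tau = np.exp(b / T)                   # tau_ij
    t = theta @ tau                       # t_i = sum_k theta_k tau_ki
    return (np.log(Phi / x) + (z / 2.0) * q * np.log(theta / Phi)
            - q * np.log(t) - q * (tau @ (theta / t))
            + l + q - (Phi / x) * np.dot(x, l))
```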

Figure 5. Combined effect of uncertainty in the bias-limits definition and pseudo-experimental mean estimation on the cumulative frequency curve estimation for the function value at x = 1.0 in the equation y(x) = α + βγ^x.


The objective function is defined based on the minimization of the distances between experimental and estimated mole fractions, using an inside-variance estimation method proposed by Vasquez and Whiting24 as the regression approach, which automatically reweights the objective function. The optimization procedure used was the multidimensional Powell method, discarding the direction of largest decrease to avoid the problems of the quadratically convergent method (for details, see Press et al.19).

5.2. Diisopropyl Ether(1) + Acetic Acid(2) + Water(3). The experimental data for this system were taken from Treybal,25 Othmer and White,26 and Hlavaty and Linek.27 The last two data sets are also reported by Sørensen and Arlt.28 Figure 6 shows the experimental tie lines for the different data sets mentioned. Clear systematic trends are observed in the left phase of the system, as well as systematic variations in the tie-line slopes.

Table 1. Upper and Lower Limits Used To Define the Systematic Trends Found in the Liquid-Liquid Ternary System Diisopropyl Ether(1) + Acetic Acid(2) + Water(3)a

tie line    lower (mol %)    upper (mol %)
1           0.00             2.00
2           3.50             8.80
3           5.60             10.00
4           12.00            19.20
5           17.80            26.20
6           20.50            30.00

a The values presented correspond to the minimum and maximum compositions of water (component 3) used in Procedure 1 for the left phase only of the data set reported by Hlavaty and Linek.27

Figure 7. Uncertainty of the percentage of acetic acid extracted in the liquid-liquid extractor under different sources of uncertainty: systematic, random, and a combination of both.

According to the original references and additional literature related to experimental data reporting (see, for example, Higashiuchi et al.29), it is usually claimed that these kinds of experimental measurements are accurate within 1% of the experimental value, with errors normally distributed. The data set reported by Hlavaty and Linek27 was used to introduce the systematic and random deviates according to the evidence presented (see Figure 6) and thus to apply the approach presented in this work. To simulate the effect of the random errors, each experimental datum was independently perturbed using random numbers generated from a normal distribution with the experimental value as the mean and a standard deviation of 0.3236% of the mean, which defines limits of 1% as the maximum for the experimental data. One hundred pseudo-experimental data sets were generated in this way and used to regress 100 binary-interaction-parameter sets for the UNIQUAC equation. Then, 100 simulations of a liquid-liquid extraction operation were performed to study the effect of the random errors on the predicted performance.

The unit operation used is an example selected from Treybal,25 which consists of 8000 kg/h of an acetic acid-water solution, containing 30% acid (mass), which is to be counter-currently extracted with diisopropyl ether to reduce the acid concentration in the solvent-free raffinate product. The column has 8 equilibrium stages, the solvent feed is 12500 kg/h, and the column operates at 23.5 °C. The output variable is the percentage of acetic acid extracted at steady-state conditions in the column or extractor. The extraction was simulated using the ASPEN Plus (a trademark of Aspen Technology Inc., Cambridge, MA) process simulator. Figure 7b shows the cumulative frequency distribution obtained for this case, where the variation observed in the percentage of acetic acid extracted is between 73.3 and 74.6%.

For the analysis of the systematic error effects on the predicted performance of this unit operation, the bias limits defined in Figure 6 were used together with Procedure 1 (below) to generate the required pseudo-experimental data sets to regress the parameters of the UNIQUAC model. Table 1 presents the specific limits used for water (component 3) for the data set of Hlavaty and Linek.27

Figure 7a presents the cumulative frequency distribution obtained under the influence of systematic errors. It is observed that the effect of this type of error on the uncertainty of the predicted performance is larger than that caused by random errors. Additionally, the effect of combining both systematic and random errors was studied in this example. The pseudo-experimental data sets are generated by first applying Procedure 1 to introduce the systematic deviates in the data and then perturbing the data generated from the first step by introducing random deviates for each datum, as explained before. Figure 7c presents the cumulative frequency distribution for the combined effect of both types of error. It is observed that the distribution characteristics are very similar to those obtained from the systematic error effects alone, suggesting that the error type with the larger effect mainly defines the uncertainty propagation. Another important observation is the shape of the cumulative distribution for the systematic or combined effects. Notice that its shape cannot be predicted or inferred from the statistical properties of the systematic deviates (i.e., a uniform distribution) because of the nonlinearities and complexity of the computer model. This was not the case for the relatively simple model presented before in the basic example of Dolby and Lipton18 (see eq 7).

Procedure 1. Method for pseudo-experimental data generation based on systematic limits defined for liquid-liquid equilibria of ternary systems (a code sketch follows the steps).
1. Identify the phases where systematic errors are present.
2. Define approximate systematic limits on the experimental-phase envelope for the phases affected. Use a binary plot to represent the phase equilibria (see, for example, Figure 6).
3. Choose an experimental data set representative of the phase envelope. Give emphasis to experimental sets that cover the regions with systematic trends.
4. Define the minimum and maximum values for the composition of the component on the x axis, using the systematic limits defined on the phase envelope (see, for example, Table 1).
5. Represent the experimental tie lines by straight lines of the form y = ax + b, where a and b are vectors containing the slopes and intercepts of the tie lines involved, x is the composition of the component on the x axis, and similarly for y.
6. Generate from a uniform distribution U[0,1] a factor f1 for randomly shifting the phase or phases with systematic errors.
7. For a given phase, change the composition of the component on the x axis by x_i^new = x_i^min + f1(x_i^max − x_i^min), where i = 1, ..., M, and M is the number of tie lines in the experimental data set.
8. Generate another factor f2 from U[0,1] and randomly assign a sign (+ or −) to it.
9. Estimate a typical fractional variation g_s for the tie-line slopes by analysis of the available experimental data sets.
10. Let a_i^new = a_i + f2·g_s·a_i.
11. Recompute the vector b using the new vector a^new and the data of the phase not being changed.
12. Recompute the composition of the component on the y axis by y_i^new = a_i^new·x_i^new + b_i^new.
13. Repeat the procedure to generate n pseudo-experimental data sets, where n is the number of simulations to be performed.
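A possible implementation of steps 5-12 for the phase with systematic errors is sketched below; the array names are illustrative, since the paper specifies the steps rather than code.

```python
import numpy as np

def procedure1(left, right, x_min, x_max, g_s, rng):
    """Generate one pseudo-experimental data set (Procedure 1).
    left, right: (x, y) composition arrays for the two end points of
    each tie line (left = phase with systematic errors); x_min, x_max:
    per-tie-line systematic limits from the phase envelope (e.g.,
    Table 1); g_s: typical fractional variation of the tie-line slopes."""
    xl, yl = left
    xr, yr = right
    a = (yr - yl) / (xr - xl)                    # step 5: tie-line slopes

    f1 = rng.random()                            # step 6: one shift factor
    xl_new = x_min + f1 * (x_max - x_min)        # step 7: shift x values

    f2 = rng.random() * rng.choice([-1.0, 1.0])  # step 8: signed factor
    a_new = a + f2 * g_s * a                     # steps 9-10: new slopes

    b_new = yr - a_new * xr                      # step 11: unchanged phase
    yl_new = a_new * xl_new + b_new              # step 12: new y values
    return xl_new, yl_new

# Step 13: call procedure1 n times (once per Monte Carlo simulation).
```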


Figure 8. Liquid-liquid equilibria experimental tie lines for the chloroform(1) + acetone(2) + water(3) ternary system at 25 °C from different experimental data sources.

5.3. Chloroform(1) + Acetone(2) + Water(3). This is a very well-known system, with several equilibrium data sets available in the literature. The experimental data used in this example are from Bancroft and Hubard,30 Brancker and Hunter,31 Reinders and De Minjer,32 and Ruiz and Prats,33 all of them also reported by Sørensen and Arlt.28 Figure 8 shows the experimental tie lines of these four data sets. It can be seen that there is a clear systematic trend for the left phase of the system. On the basis of the original references for the experimental data sets, the average experimental or random error reported is around 1% for the compositions; as in the previous example, no reference is made to the presence of systematic trends.

The data set from Reinders and De Minjer32 was chosen to study the effect of the random and systematic errors on the predicted process performance of a liquid-liquid extraction operation. Any of the other data sets could be used for this purpose, but special attention has to be paid to ensuring that the systematic errors are well represented. For this particular system, the larger systematic errors, according to Figure 8, appear to be in the left phase at relatively low compositions of water and acetone in the ternary system. This means that the regions with more systematic error are not strongly weighted when predicting process performance, because one expects a reasonably high separation in the equilibrium stages during the extraction.

Figure 9. Systematic error limits defined for the left phase in the liquid-liquid equilibria of the chloroform(1) + acetone(2) + water(3) ternary system at 25 °C based on evidence of bias errors found using experimental data from different sources.

To obtain a low acetone composition in the extract, a large amount of water would have to be used, making the process impractical from the engineering standpoint. However, the uncertainty in the predicted process performance due to this type of error is still significant, and larger than the uncertainty propagated from random errors, as shown below.

Using the methodology described above to study the random error effects, with 1% of the compositions as error limits, 100 pseudo-experimental data sets were generated and used to regress 100 binary-interaction-parameter sets for the UNIQUAC model. An example taken from Smith34 was used to simulate a unit operation in which water is used to separate a chloroform-acetone mixture in a simple counter-current extraction column with two equilibrium stages. The feed contains equal amounts of chloroform and acetone on a weight basis, the column operates at 25 °C and 1 atm, and a solvent/feed mass ratio of 1.565 is used. The output variable for this case is the percentage of acetone extracted. The operation was simulated using the ASPEN Plus process simulator. The cumulative frequency distribution obtained for the percentage of acetone extracted is presented in Figure 10b, which shows a small effect of the experimental random error on the predicted performance.

For the systematic error case, the bias limits were defined using the left-phase experimental envelope as reference; they are indicated by dashed lines in Figure 9, and the numerical values of the limits used for water (component 3) are presented in Table 2. One hundred pseudo-experimental data sets were obtained using Procedure 1, and then 100 binary-interaction-parameter sets were regressed and used to evaluate the predicted performance of the liquid-liquid extraction. Figure 10a presents the cumulative frequency distribution for the effects caused by the systematic trend detected in the experimental data. Its broadness is significantly larger than that for the case of random errors (Figure 10b), showing that the random errors have a smaller effect than the systematic errors on the predicted performance, even though the total variation in the percentage extracted is not very large.


Figure 10. Uncertainty of the percentage of acetone extracted in the liquid-liquid extractor under different sources of uncertainty: systematic, random, and a combination of both.

Table 2. Upper and Lower Limits Used To Define the Systematic Trends Found in the Liquid-Liquid Ternary System Chloroform(1) + Acetone(2) + Water(3)a

tie line    lower (mol %)    upper (mol %)
1           2.00             6.00
2           2.90             6.25
3           4.00             7.00
4           5.75             8.00
5           8.75             11.00
6           12.00            14.25
7           12.50            14.75
8           14.75            16.50
9           16.25            18.00
10          19.90            19.99
11          22.45            22.50
12          25.16            25.18
13          26.59            26.60
14          28.86            28.86
15          33.13            33.13
16          34.12            34.12
17          49.92            49.92

a The values presented correspond to the minimum and maximum compositions of water (component 3) used in Procedure 1 for the left phase only of the data set reported by Reinders and De Minjer.32

In general, if the safety factors used in the design do not cover the variation observed in the percentage of acetone extracted because of systematic errors, it would be advisable to obtain more accurate experimental data to avoid under- or over-design problems, because every predicted performance value in Figure 10a is equally likely. Notice that this type of conclusion cannot be obtained without an uncertainty analysis of this kind, which shows the importance and value of these methods in process design and simulation. Analyzing Figure 10c, which shows the cumulative frequency distribution for the combined effect of systematic and random errors, one observes (as in the previous example) that the error type with the larger effect on the uncertainty analysis mainly defines the final error propagation.

6. Concluding Remarks

A new approach based on Monte Carlo methods to study the effect of systematic and random errors on uncertainty analyses was developed. The results show that the proposed method clearly distinguishes when one type of error has a larger effect than the other on the predicted process performance of the specific study cases, and that systematic errors can play a significant role in the uncertainty analysis. Additionally, it was found that the error type with the larger influence mainly defines the broadness of the cumulative frequency distributions in the uncertainty analysis. Also, the shape of the cumulative frequency distribution cannot be inferred from the probability distributions used to introduce the systematic deviates when the computer model involved is significantly complex. For the case of random, normally distributed errors, the cfd of the output generally tends to have normal characteristics.

In general, experimental-data articles in the literature (particularly for thermodynamic data) do not adequately assess and report the presence of possible systematic error. It was shown that systematic errors may be a very important and common source of uncertainty. In addition, the method presented can be used to facilitate decision making in safety-factor selection, modeling, and experimental data measurement and design.

As pointed out in the Introduction, modeling errors can have a significant impact on uncertainty propagation through computer models. It is well-known in the field of thermodynamics that most molecular-based models have limitations, and these effects have to be analyzed when performing uncertainty and sensitivity analyses. This is an open problem for further research. The goal is to be able to quantify separately the contributions of random, systematic, and modeling errors and in that way improve decision making and process design. Research efforts are being conducted to incorporate the effect of modeling errors in the quantification and analysis of uncertainty propagation in thermodynamic models.

Acknowledgment

This work was supported, in part, by the National Science Foundation Grant CTS-96-96192.

Literature Cited

(1) Vasquez, V. R.; Whiting, W. B. Uncertainty and Sensitivity Analysis of Thermodynamic Models Using Equal Probability Sampling (EPS). Comput. Chem. Eng. 1999, in press.
(2) Tarantola, A. Inverse Problem Theory: Methods for Data Fitting and Model Parameter Estimation; Elsevier Science Publishers: New York, 1987.
(3) American Society of Mechanical Engineers. ANSI/ASME PTC 19.1-1985: Measurement Uncertainty, Part 1; ASME: New York, 1985.
(4) Hayward, A. T. J. Repeatability and Accuracy; Mechanical Engineering Publications Ltd.: New York, 1977.
(5) Dietrich, C. F. Uncertainty, Calibration and Probability, 2nd ed.; Adam Hilger: New York, 1991.
(6) International Organization for Standardization (ISO). Guide to the Expression of Uncertainty in Measurement; ISO: Switzerland, 1993.
(7) Taylor, J. R. An Introduction to Error Analysis, 2nd ed.; University Science Books: Sausalito, CA, 1997.
(8) Iman, R. L.; Conover, W. J. Small Sample Sensitivity Analysis Techniques for Computer Models, with an Application to Risk Assessment. Commun. Stat.-Theor. Methods 1980, A9 (17), 1749-1874.
(9) Iman, R. L.; Helton, J. C. An Investigation of Uncertainty and Sensitivity Analysis Techniques for Computer Models. Risk Anal. 1988, 8 (1), 71-90.
(10) Iman, R. L.; Shortencarier, M. J. A FORTRAN 77 Program and User's Guide for the Generation of Latin Hypercube and Random Samples for Use with Computer Models; Report NUREG/CR-3624; National Technical Information Service: Springfield, VA, 1984.
(11) Bates, D.; Watts, D. Relative Curvature Measures of Nonlinearity. J. R. Stat. Soc. B 1980, 42 (1), 1-25.
(12) Bates, D.; Watts, D. Parameter Transformations for Improved Approximate Confidence Regions in Nonlinear Least Squares. Ann. Stat. 1981, 9 (6), 1152-1167.
(13) Cook, R. D.; Witmer, J. A. A Note on Parameter-Effects Curvature. J. Am. Stat. Assoc. 1985, 80 (392), 872-878.
(14) Iman, R. L.; Conover, W. J. A Distribution-Free Approach to Inducing Rank Correlation Among Input Variables. Commun. Stat.-Simul. Comput. 1982, 11 (3), 311-334.
(15) Vasquez, V. R.; Whiting, W. B.; Meerschaert, M. A Sampling Technique for Correlated Parameters in Nonlinear Regression Models Based on Equal Probability Sampling (EPS). SIAM J. Sci. Comput. 1999, submitted for publication.
(16) Gatland, I. R.; Thompson, W. J. Parameter Bias Elimination for Log-Transformed Data with Arbitrary Error Characteristics. Am. J. Phys. 1993, 61 (3), 269-272.
(17) Chakroun, W.; Taylor, R. P.; Steele, W. G.; Coleman, H. W. Bias Error Reduction Using Ratios to Baseline Experiments: Heat Transfer Case Study. J. Thermophys. 1993, 7 (4), 754-757.
(18) Dolby, G. R.; Lipton, S. Maximum Likelihood Estimation of the General Nonlinear Functional Relationship with Replicated Observations and Correlated Errors. Biometrika 1972, 59 (1), 121-129.
(19) Press, W. H.; Teukolsky, S. A.; Vetterling, W. T.; Flannery, B. P. Numerical Recipes, 2nd ed.; Cambridge University Press: New York, 1994.
(20) Abrams, D. S.; Prausnitz, J. M. Statistical Thermodynamics of Liquid Mixtures: A New Expression for the Excess Gibbs Energy of Partly or Completely Miscible Systems. AIChE J. 1975, 21 (1), 116-128.
(21) Anderson, T. F.; Prausnitz, J. M. Application of the UNIQUAC Equation to Calculation of Multicomponent Phase Equilibria. 1. Liquid-Liquid Equilibria. Ind. Eng. Chem. Process Des. Dev. 1978, 17 (4), 561-567.
(22) Novák, J. P.; Matouš, J.; Pick, J. Liquid-Liquid Equilibria; Elsevier: New York, 1987.
(23) Sørensen, J. M.; Magnussen, T.; Rasmussen, P.; Fredenslund, A. Liquid-Liquid Equilibrium Data: Their Retrieval, Correlation and Prediction. Part II: Correlation. Fluid Phase Equilib. 1979, 3, 47-82.
(24) Vasquez, V. R.; Whiting, W. B. Regression of Binary Interaction Parameters for Thermodynamic Models Using an Inside-Variance Estimation Method (IVEM). Fluid Phase Equilib. 1999, submitted for publication.
(25) Treybal, R. E. Mass Transfer Operations, 3rd ed.; McGraw-Hill: Singapore, 1981.
(26) Othmer, D. F.; White, R. E.; Trueger, E. Liquid-Liquid Extraction Data. Ind. Eng. Chem. 1941, 33, 1240-1248.
(27) Hlavaty, K.; Linek, J. Liquid-Liquid Equilibriums in Four Ternary Acetic Acid-Organic Solvent-Water Systems at 24.6 °C. Collect. Czech. Chem. Commun. 1973, 38, 374.
(28) Sørensen, J. M.; Arlt, W. Liquid-Liquid Equilibrium Data Collection; Chemistry Data Series; DECHEMA: Frankfurt/Main, Germany, 1980; Vol. 5.
(29) Higashiuchi, H.; Sakuragi, Y.; Iwai, Y.; Arai, Y.; Nagatani, M. Measurement and Correlation of Liquid-Liquid Equilibria of Binary and Ternary Systems Containing Methanol and Hydrocarbons. Fluid Phase Equilib. 1987, 36, 35-47.
(30) Bancroft, W. D.; Hubard, S. S. A New Method for Determining Dineric Distribution. J. Am. Chem. Soc. 1942, 64, 347-353.
(31) Brancker, A. V.; Hunter, T. G.; Nash, A. W. The Quaternary System Acetic Acid-Chloroform-Acetone-Water at 25 °C. J. Phys. Chem. 1940, 44, 683-698.
(32) Reinders, W.; De Minjer, C. H. Vapour-Liquid Equilibria in Ternary Systems. VI. The System Water-Acetone-Chloroform. Recl. Trav. Chim. Pays-Bas 1947, 66, 573-604.
(33) Ruiz, B. F.; Prats, R. D. Quaternary Liquid-Liquid Equilibria: Experimental Determination and Correlation of Equilibrium Data. Part I. System Water-Acetone-Acetic Acid-Chloroform. Fluid Phase Equilib. 1983, 10, 77-93.
(34) Smith, B. D. Design of Equilibrium Stage Processes; McGraw-Hill: New York, 1963.

Correlation and Prediction. Part II: Correlation. Fluid Phase Equilibr. 1979, 3, 47-82. (24) Vasquez, V. R.; Whiting, W. B. Regression of Binary Interaction Parameters for Thermodynamic Models Using an Inside-Variance Estimation Method (IVEM). Fluid Phase Equilibr. 1999, submitted for publications. (25) Treybal, R. E. Mass Transfer Operations, 3rd ed.; McGrawHill: Singapore, 1981. (26) Othmer, D. F.; White, R. E.; Trueger, E. Liquid-Liquid Extraction Data. Ind. Eng. Chem. 1941, 33, 1240-1248. (27) Hlavaty, K.; Linek, J. Liquid-Liquid Equilibriums in Four Ternary Acetic Acid-Organic Solvent-Water Systems at 24.6 °C. Collect. Czech. Chem. Commun. 1973, 38, 374. (28) Sørensen, J. M.; Arlt, W. Liquid-Liquid Equilibrium Data Collection; Chemistry Data Series; DECHEMA: Frankfurt/Main, Germany, 1980; Vol. 5. (29) Higashiuchi, H.; Sakuragi, Y.; Iwai, Y.; Arai, Y.; Nagatani, M. Measurement and Correlation of Liquid-Liquid Equilibria of Binary and Ternary Systems Containing Methanol and Hydrocarbons. Fluid Phase Equilibr. 1987, 36, 35-47. (30) Bancroft, W. D.; Hubard, S. S. A New Method for Determining Dineric Distribution. J. Am. Chem. Soc. 1942, 64, 347353. (31) Brancker, A. V.; Hunter, T. G.; Nash, A. W. The Quaternary System Acetic Acid-Chloroform-Acetone-Water at 25 °C. J. Phys. Chem. 1940, 44, 683-698. (32) Reinders, W.; De Minjer, C. H. Vapor-Liquid Equilibria in Ternary Systems. VI. System Water-Acetone-Chloroform. Recl. Trav. Chim. Pays-Bas. 1947, 66, 573-604. (33) Ruiz, B. F.; Prats, R. D. Quaternary Liquid-Liquid Equilibria: Experimental Determination and Correlation of Equilibrium Data. Part I. System Water-Acetone-Acetic AcidChloroform. Fluid Phase Equilibr. 1983, 10, 77-93. (34) Smith, B. D. Design of Equilibrium Stage Processes; McGraw-Hill: New York, 1963.

Received for review November 30, 1998
Revised manuscript received April 7, 1999
Accepted April 14, 1999