Ind. Eng. Chem. Res. 2010, 49, 9469–9485


Modeling Temperature-Dependent Properties of Oxygen, Argon, and Nitrogen via Response Modeling Methodology (RMM) and Comparison with Acceptable Models

Haim Shore
Department of Industrial Engineering and Management, Ben-Gurion University of the Negev, POB 653, Beer-Sheva 84105, Israel

Diamanta Benson-Karhi*
Department of Management and Economics, The Open University of Israel, POB 808, Raanana 43107, Israel

In a recent paper, temperature-dependent properties of water were modeled via response modeling methodology (RMM), and the resultant models were compared to models obtained by TableCurve2D (a dedicated software for relational modeling) and to “Acceptable models” recommended by DIPPR (a widely used database for constant and temperature-dependent physical properties). In this paper, we extend the comparison to oxygen, argon, and nitrogen. Model comparison has been conducted for 10 temperature-dependent physical and thermodynamic properties. Detailed results are reported in this paper. Summary tables, which rank the various models in terms of goodness-of-fit and stability over all properties, are provided. The three variations of the RMM model (two-, three-, and four-parameter models) compare favorably with other models, often with more parameters, in terms of both goodness-of-fit and stability. The unique desirable properties of RMM models are discussed.

Introduction

Response modeling methodology (RMM) is a recently developed platform for empirical modeling.¹ RMM has been applied to a myriad of disciplines in science and engineering²⁻⁴ and was shown to deliver good modeling properties. It was also used to model and predict temperature-dependent properties of hydrocarbons by correlations based on the similarity of molecular structures.⁵ This paper demonstrates application of the RMM approach to model thermodynamic properties of pure substances, where the relationship with temperature is monotone convex. Thus, it is a continuation of an ongoing research effort aimed at validating the effectiveness of RMM as a general modeling platform in the chemical engineering discipline. The next section delivers an overview of the RMM approach. In a recent paper,⁶ Benson-Karhi, Shore, and Shacham (henceforth BSS) modeled thermodynamic properties of water, employing three different modeling approaches.
The first approach used TableCurve2D (henceforth TC), a dedicated software for relational modeling, which ranks thousands of models stored in its data bank according to their goodness-of-fit (relative to the available data). To select the final model (among those offered by TC), a special new routine for comparison of models is introduced here, based on the information-theoretic approach.⁷ The latter uses Akaike’s information criterion (AIC) to select a best subset of models and then ranks the candidate models according to their “relative likelihood to be the best model” (relative to the data and the complete subset of candidate models; for details, refer to the section Selecting the Best Model). The second approach uses “Acceptable models” recommended by DIPPR. The latter is a widely used database for constant and temperature-dependent physical properties and the modeling of these properties. DIPPR recommends a single acceptable model for each property, across different substances (estimates of parameters may differ among substances). This model is identified in DIPPR by a unique equation number, for example, DIPPR 101. The acceptable model can be purely theoretical, purely empirical, or a combination thereof. The third approach, based on RMM, provides a uniform model (with three variations) for all substances and all properties. The three variations are the two-parameter RMM2, the three-parameter RMM3, and the four-parameter RMM4 (refer to the next section for details).

The three approaches compared may be ranked as moving from the most particular, TC, which offers a tailor-made model for each substance-property combination, to DIPPR, which offers a single model for each property, to RMM, which is the most general. Consequently, the most interesting comparison in this study is that between models offered by DIPPR and those obtained via RMM. However, models obtained by all three approaches are compared in terms of both goodness-of-fit and stability (details are given in the section Criteria for Goodness-of-Fit and Stability).

In this paper, we extend the results obtained for water⁶ to three important and much researched pure substances: oxygen, argon, and nitrogen. The motivation for the present paper is to demonstrate that, with regard to how the three approaches (TC, DIPPR, and RMM) compare, results obtained for water can be extended to other pure substances, notwithstanding the very extensive research efforts that have been invested in the past in modeling thermodynamic properties of these substances.⁸

In the next section, we provide a brief overview of RMM and the variations of its basic model as these are used in the paper. The following section deals with the routine developed to identify the subset of best candidate models (from those ranked by TC) and how the best model is finally selected.

* To whom correspondence should be addressed. E-mail: [email protected].
In the following section, we review criteria used in this paper to

10.1021/ie100981y © 2010 American Chemical Society. Published on Web 08/25/2010


Ind. Eng. Chem. Res., Vol. 49, No. 19, 2010

compare models with respect to goodness-of-fit and stability and then expound, in the subsequent section, the properties modeled and the source data set used for each. We then display the results. The 10 subsections that comprise the Results section are arranged in the same order in which the respective properties appear in DIPPR. Summary comparisons of rankings of the three types of models are given in the following section, in terms of goodness-of-fit and stability, for all four substances (oxygen, argon, nitrogen, and water). Some practical conclusions are also delineated. The last section delivers a summary and some conclusions.

RMM: An Overview

RMM: The Basic Concept. Observing how monotone convex relationships are often modeled in science and engineering, one is surprised to find that certain basic patterns keep reappearing, which are associated with varying degrees of monotone convexity. Thus, Newton’s gravitation law represents moderate convexity, expressed by a simple power law, while radioactive decay is represented by an exponential law, associated with more extreme convexity. Even more extreme convexity is delivered by the Gompertz growth model (a double-exponential function, henceforth referred to as an exponential-exponential function¹,⁴). One realizes that all these basic functional patterns can be arranged in hierarchical fashion on the “ladder of monotone convex functions”¹ so that climbing the “ladder” results in models with more intense convexity. The hierarchy of functions represented on the ladder consists of repeated appearances of a basic cycle comprising “linear, power, exponential” elements in the modeled functions.
These functions have been shown to be widely prevalent in the exact sciences and in disciplines of applied science and engineering.¹,⁴,⁹ Following are several explicit functions belonging to the ladder (some have already been alluded to earlier):
• Newton’s Kinetic Energy: $E_k(V) = M(V^2/2)$ (a power function; V is velocity)
• Radioactive Decay: $R(t) = R_0 e^{-kt}$ (an exponential function; t is time)
• Antoine Equation: $\log(P) = A + [B/(T + C)]$ (an exponential-power function; P is pressure, and T is temperature)
• Arrhenius Formula: $R_e(T) = A e^{-E_a/(k_B T)}$ (an exponential-power function; $R_e$ is the rate of a chemical reaction as a function of temperature, T)
• Gompertz Growth Model: $Y = b_1 \exp(-b_2 e^{-b_3 X})$ (an exponential-exponential function; Y can be population size, and X can be time or any influencing environmental variable).

RMM captures the models of the ladder by a single stochastic model, with a small number of parameters that help achieve stability in RMM estimated models. Furthermore, at the cost of two additional parameters added to the basic RMM model, an additional cycle of “linear, power, exponential” can be added, allowing even more extreme convexity to be modeled (this will be expounded in the next subsection). Simplified versions of the basic RMM model, with a smaller number of parameters (either 2, 3, or 4), have been shown to be effective in representing monotone convex relationships with a single regressor variable, and they compare favorably with existing models.¹,⁵,⁶ These simplified models, used throughout this paper, are introduced in the subsection below.

RMM: The General Model and Variations. At the heart of RMM is a relational model, which describes a modeled response, Y, in terms of a linear combination of effects (the linear predictor, LP, denoted η, which transmits systematic variation to the response), two possibly correlated normal errors,

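$\varepsilon_1$ and $\varepsilon_2$, with correlation $\rho$ and standard deviations $\sigma_{\varepsilon_1}$ and $\sigma_{\varepsilon_2}$, respectively, and a vector of parameters

$$W = \log(Y) = \frac{\alpha}{\lambda}\left[(\eta + \varepsilon_1)^{\lambda} - 1\right] + \mu_2 + \varepsilon_2 \qquad (1)$$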

where $\{\alpha, \lambda, \mu_2\}$ are parameters that need to be determined. Note that $\varepsilon_1$ implies that there is uncertainty (either measurement imprecision or otherwise) in the explanatory variables (LP), which is additional to the uncertainty associated with the response ($\varepsilon_2$). Equation 1 will henceforth be referred to as the RMM origin model. It is easy to verify that eq 1 is flexible in modeling monotone convex relationships having varying degrees of convexity. Ignoring the error terms and the scale parameter $\mu_2$, consider the following simplified model

$$Y = \exp\left[\frac{\alpha}{\lambda}\left(\eta^{\lambda} - 1\right)\right] \qquad (2)$$

One can easily derive the first four “steps” (functions) of the ladder from eq 2:¹
1. Linear: $\lambda = 0$, $\alpha = 1$ [note that as $\lambda \to 0$, $(1/\lambda)[\eta^{\lambda} - 1] \to \log(\eta)$]
2. Power: $\lambda = 0$, $\alpha \neq 1$
3. Exponential: $\lambda = 1$
4. Exponential-power: $\lambda \neq 0$, $\lambda \neq 1$.

How would one extract the exponential-exponential and exponential-exponential-power cases (the next two steps on the ladder)? Insert $\exp\{(\beta/\kappa)[\eta^{\kappa} - 1]\}$ for η in eq 2 (two new parameters, β and κ, are introduced). One can easily verify that for $\kappa \to 0$ and $\beta \to 1$, we have $\exp\{(\beta/\kappa)[\eta^{\kappa} - 1]\} \to \eta$, so that eq 2 is recovered. Thus, for the price of two additional parameters, β and κ, an additional cycle of the basic “linear, power, exponential” pattern is invoked, allowing us to climb the ladder to its fifth and sixth steps. Indeed, this is a general principle; with the introduction of two additional parameters, an additional repetition of the basic “linear, power, exponential” cycle is achieved to obtain models with an ever-increasing degree of monotone convexity.¹ A variation of the RMM model, realistically assuming that $\varepsilon_1 \ll \eta$, can be derived from eq 1¹

$$W = \log(Y) = \mu_2 + \frac{\alpha}{\lambda}\left[e^{\lambda \log(\eta) + (\lambda \varepsilon_1/\eta)} - 1\right] + \varepsilon_2 \qquad (3)$$
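The ladder steps derived from eq 2 can be checked numerically. The following is a minimal sketch, assuming the error-free form of eq 2; the function name `ladder` and the test value of η are hypothetical choices made only for illustration.

```python
import math

def ladder(eta: float, alpha: float, lam: float) -> float:
    """Error-free eq 2: Y = exp{(alpha/lam) * (eta**lam - 1)}."""
    if abs(lam) < 1e-12:
        # Limit lam -> 0: (eta**lam - 1)/lam -> log(eta), hence Y = eta**alpha.
        return eta ** alpha
    return math.exp(alpha / lam * (eta ** lam - 1.0))

eta = 2.5  # arbitrary illustrative value of the linear predictor

linear = ladder(eta, alpha=1.0, lam=0.0)       # step 1: Y = eta
power = ladder(eta, alpha=3.0, lam=0.0)        # step 2: Y = eta**3
exponential = ladder(eta, alpha=1.0, lam=1.0)  # step 3: Y = exp(eta - 1)
near_zero = ladder(eta, alpha=3.0, lam=1e-8)   # small lam approaches the power case

print(linear, power, exponential, near_zero)
```

Evaluating the general branch with a very small λ and comparing it with the power case illustrates the limit noted in step 1 of the list.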

Equation 3 is often preferable to eq 1. The reason is that in searching for the parameters of eq 1, we may encounter imaginary values (when the value of the expression within the brackets occasionally turns negative). This problem does not exist in estimating the parameters of eq 3. Estimation procedures for eqs 1 and 3 are detailed in ref 1 and references therein. For a model with a single regressor (the predictor variable, X), one can derive from eq 3 the simple model¹

$$W = \log(Y) = \log(M_Y) + \frac{a}{b}\left[e^{bZ} - 1\right] + cZ \qquad (4)$$

where {a, b, c} are parameters that need to be determined, $Z = X - M_X$, and $M_X$ and $M_Y$ are the medians of X and the response, Y, respectively. Note that when $Z = 0$ ($X = M_X$), $Y = M_Y$. This implies that if the medians of the predictor variable and the response are estimated from observational data, eq 4 is a three-parameter model. For this reason, eq 4 will be denoted as RMM3. Also note that since eq 4 represents a quantile function of Y in terms of Z (a deterministic relationship), various error structures may be assumed for eq 4 for parameter estimation. Finally, note that eq 4 has been shown to deliver excellent


representation to quantile functions (expressed in terms of standard normal quantiles) of a large spectrum of diversely shaped distributions.¹,¹⁰ Further explanation for the validity of eq 4, with some examples from chemical engineering, can be found in ref 1 (section 17.3) and in ref 3. In a like manner, we may derive from eq 1, for a single-regressor model, the four-parameter model (denoted RMM4)

$$W = \log(Y) = \log(M_Y) + \frac{a}{b}\left[(1 + cZ)^{b} - 1\right] + dZ \qquad (5)$$

with Z as defined earlier. It is again emphasized that both eqs 4 and 5 are tailored for univariate modeling only. Also, note that although $M_X$ and $M_Y$ need estimation, we do not count them in the number of parameters attributed to either RMM3 or RMM4 (namely, the three- and four-parameter models). Although this practice can be debated, it conforms to the common practice where parameters associated with preprocessing of data do not enter in the count of a model’s parameter set, for example, in calculating AIC values for comparison of models. Other examples of such unaccounted parameters are those associated with data transformations (like parameters associated with a Box-Cox transformation) or estimates of parameters used in the centering of explanatory variables in polynomial linear regression (by subtracting the means) or in standardizing these variables (by centering and then dividing by the standard deviations).

The support of eqs 4 and 5 is [0,∞]. (The support of a function is the range for which it is defined.) If data suggest that there is a strictly positive lower limit, far from zero, these models may require relocation via introducing a location parameter, L, so that eqs 4 and 5 become, respectively

$$W = \log(Y - L) = \log(M_Y - L) + \frac{a}{b}\left[e^{bZ} - 1\right] + cZ \qquad (4a)$$

$$W = \log(Y - L) = \log(M_Y - L) + \frac{a}{b}\left[(1 + cZ)^{b} - 1\right] + dZ \qquad (5a)$$

If the problem is the existence of both negative and positive response values in the data (RMM requires that all observations share the same sign), the above equations are more conveniently written as

$$W = \log(Y + \delta) = \log(M_Y + \delta) + \frac{a}{b}\left[e^{bZ} - 1\right] + cZ \qquad (4b)$$

$$W = \log(Y + \delta) = \log(M_Y + \delta) + \frac{a}{b}\left[(1 + cZ)^{b} - 1\right] + dZ \qquad (5b)$$

Note that the definition of Z is not altered by the introduction of the location parameters (L or δ). If the latter are unknown and need to be estimated, setting them equal to the absolute value of the lowest available observation has been found to be good practice. Alternatively, L or δ may serve as an additional parameter that needs estimation. Note, however, that introducing a location parameter to an RMM model requires that estimation of parameters, for example, via nonlinear regression, be implemented on the original scale (not the log scale). Empirical fitting of RMM3 to numerous data sets has shown that quite often we obtain, for the parameters in eq 4, $a \approx c$. This implies that the following two-parameter model may also provide a good fit to available data (we will denote this model RMM2)

$$W = \log(Y) = \log(M_Y) + a\left[\frac{1}{b}\left(e^{bZ} - 1\right) + Z\right] \qquad (6)$$

Relocation of the above model (with either L or δ) gives

$$W = \log(Y - L) = \log(M_Y - L) + a\left[\frac{1}{b}\left(e^{bZ} - 1\right) + Z\right] \qquad (6a)$$

$$W = \log(Y + \delta) = \log(M_Y + \delta) + a\left[\frac{1}{b}\left(e^{bZ} - 1\right) + Z\right] \qquad (6b)$$

Occasionally, one wishes to conduct the fitting process by working with small values of Z. In this case, it is advisable to standardize X according to the following formula: $Z = (X - M_X)/\sigma_X$. The new two-, three-, and four-parameter models (corresponding to eqs 6, 4, and 5, respectively) are

two parameters, RMM2(a, b):
$$Y = M_Y \exp\left\{\left[\frac{a}{b}\left(e^{bZ} - 1\right) + aZ\right]\right\}$$

three parameters, RMM3(a, b, c):
$$Y = M_Y\left\{\exp\left[a\left(e^{bZ} - 1\right) + cZ\right]\right\}$$

four parameters, RMM4(a, b, c, d):
$$Y = M_Y \exp\left\{\left[\frac{a}{b/c}\left((1 + cZ)^{b/c} - 1\right) + dZ\right]\right\} \qquad (7)$$
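To make the three variants concrete, the following is a minimal sketch that evaluates RMM2, RMM3, and RMM4 in the forms of eqs 6, 4, and 5, with Z standardized as described above. It does not reproduce the reparameterized four-parameter form of eq 7. All function names and numerical values are hypothetical and serve only as an illustration.

```python
import math

def standardize(x: float, m_x: float, s_x: float) -> float:
    """Z = (X - M_X) / sigma_X."""
    return (x - m_x) / s_x

def rmm2(z: float, m_y: float, a: float, b: float) -> float:
    """Eq 6: log Y = log M_Y + a*[(e^{bZ} - 1)/b + Z]; requires b != 0."""
    return m_y * math.exp(a * (math.expm1(b * z) / b + z))

def rmm3(z: float, m_y: float, a: float, b: float, c: float) -> float:
    """Eq 4: log Y = log M_Y + (a/b)*(e^{bZ} - 1) + c*Z; requires b != 0."""
    return m_y * math.exp(a / b * math.expm1(b * z) + c * z)

def rmm4(z: float, m_y: float, a: float, b: float, c: float, d: float) -> float:
    """Eq 5: log Y = log M_Y + (a/b)*[(1 + cZ)^b - 1] + d*Z; requires 1 + c*Z > 0."""
    return m_y * math.exp(a / b * ((1.0 + c * z) ** b - 1.0) + d * z)

# Hypothetical medians, spread, and parameters (not taken from the paper).
m_t, s_t, m_y = 100.0, 20.0, 5.0e4
z = standardize(120.0, m_t, s_t)

y2 = rmm2(z, m_y, a=0.8, b=-0.3)
y3 = rmm3(z, m_y, a=0.8, b=-0.3, c=0.8)   # c = a, so this coincides with RMM2
y4 = rmm4(z, m_y, a=0.8, b=-0.3, c=0.1, d=0.5)
print(y2, y3, y4)
```

By construction, each function returns the median response when Z = 0, and RMM3 with c = a collapses to RMM2, which mirrors the motivation given for the two-parameter model.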

Note that the four-parameter model in eq 7 is reparameterized (relative to eq 5) for computational convenience. For example, if in the four-parameter model the estimated b or c tends to 0, limiting forms for this model are easily suggested. All results in this paper refer to RMM models as presented in eq 7, with temperature as the predictor variable (X) and the modeled property as the response (Y).

Semiempirical Modeling with RMM. Modeling temperature-dependent physical properties is an ongoing scientific endeavor that spans several centuries. While some models developed over the years originated in purely theoretical arguments, unsatisfactory accuracy has motivated development of alternative models that are either empirical or semiempirical in nature. For example, most current acceptable models for vapor pressure are based on an original theory-based model (the Clapeyron equation). However, this model has been enhanced empirically, resulting in various semiempirical models, like the Riedel formula.⁸ RMM modeling results in purely empirical models. Occasionally, semiempirical models are required where certain requirements, originating in theoretical arguments, need to be met by the model. In this section, we demonstrate how RMM-based models can be converted to be compatible with certain requirements derived from theory.

As a first example, consider the requirement that the constructed model pass through a single specified point, like the vapor pressure at the normal boiling point (NBPt). This pressure is theoretically known to be 1 atm ($1.013 \times 10^5$ Pa). In that case, the fitted RMM model will be constructed to pass through the boiling point, instead of the medians. For RMM4, for instance, the resulting RMM model will be

$$W = \log(Y) = \log(1.013 \times 10^5) + \frac{a}{b}\left[(1 + cZ)^{b} - 1\right] + dZ, \quad \text{where } Z = \frac{T - \mathrm{NBPt}}{\sigma_T} \qquad (8)$$
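The anchoring in eq 8 can be illustrated with a short sketch. The boiling point, spread, and parameter values below are placeholders chosen for illustration and are not estimates from the paper; only the pressure of 1.013 × 10^5 Pa comes from the text.

```python
import math

P_NBP = 1.013e5  # Pa, vapor pressure at the normal boiling point (from the text)

def rmm4_anchored(t: float, nbp: float, sigma_t: float,
                  a: float, b: float, c: float, d: float) -> float:
    """Eq 8: RMM4 forced through (NBPt, 1.013e5 Pa) instead of the medians."""
    z = (t - nbp) / sigma_t
    w = math.log(P_NBP) + a / b * ((1.0 + c * z) ** b - 1.0) + d * z
    return math.exp(w)

nbp = 90.0       # K, placeholder boiling point
sigma_t = 15.0   # K, placeholder standard deviation of temperature

# At T = NBPt, Z = 0, so the model returns the anchoring pressure
# regardless of the (hypothetical) parameter values.
p_at_nbp = rmm4_anchored(nbp, nbp, sigma_t, a=1.2, b=-0.5, c=0.2, d=0.9)
p_above = rmm4_anchored(nbp + 10.0, nbp, sigma_t, a=1.2, b=-0.5, c=0.2, d=0.9)
print(p_at_nbp, p_above)
```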


Figure 1. Percentage deviations, $\Delta H_{vap} = H_{vap}(\mathrm{expt.}) - H_{vap}(\mathrm{calc.})$, of the oxygen experimental enthalpy of vaporization, $H_{vap}(\mathrm{expt.})$, from calculated values, $H_{vap}(\mathrm{calc.})$, using RMM (eq 10) as a function of temperature T/K.

Another demonstrative example of semiempirical modeling with RMM addresses the enthalpy of vaporization, known to be 0 at the critical temperature, while its first derivative approaches negative infinity at that point. Data used here are those of oxygen, given in Table A1 (Supporting Information). The DIPPR model is (DIPPR 106)

$$Y = a(1 - T_r)^{\,b + cT_r + dT_r^{2} + eT_r^{3}}, \qquad T_r = \frac{T}{T_c} \qquad (9)$$

where Y is expressed in J/kmol, $T_r$ is the reduced temperature, and $T_c$ is the critical temperature of oxygen (154.581 K). The model’s estimated parameters, using these data, are (DIPPR assigns e = 0) $a = 9.22 \times 10^6$, $b = 0.5704$, $c = -0.6386$, and $d = 0.4405$. These parameters will be used when we compare the performance of the DIPPR suggested model to that of the fitted RMM model. From eq 9, when $T = T_c$, we have $T_r = 1$, so that $Y = 0$, in accordance with the first theoretical requirement. The first derivative of the fitted eq 9, at $T_r = 1$, is $Y' = -4.722 \times 10^9$ (expressed in the same units as Y since $T_r$ is a pure (unit-free) number). This is a large negative number, approximately compatible with the second requirement. We now construct an RMM semiempirical model. The suggested model is

$$Y = a(1 - T_r)^{f(T_r)}, \qquad T_r = \frac{T}{T_c}, \qquad f(T_r) = \exp\left\{\frac{b}{c}\left[(T_r)^{c} - 1\right] + dT_r\right\} \qquad (10)$$

where $f(T_r)$ is given by an RMM model. The first requirement of having zero enthalpy of vaporization at the critical point is met by eq 10. Fitting this model to the data, we obtain the following parameter estimates: $a = 9.124 \times 10^6$, $b = 1.877$, $c = 2.801$, and $d = -0.9695$. Introducing these values into eq 10 and differentiating, we evaluate the derivative close to the critical point, at $T = T_c - 0.001 = 154.580$ K: for the DIPPR model, $Y' = -4.015 \times 10^7$, and for the RMM model, $Y' = -3.998 \times 10^7$. Both of these values are consistent with the second requirement of approaching $-\infty$ at the critical point. Note that at the critical point itself, no definite limit could be found for the RMM model. Comparing the goodness-of-fit of the two models, we obtain the following for the root-mean-squared error (RMSE; the standard deviation of the residuals): RMSE = 4188 for the DIPPR model and RMSE = 4217 for the RMM model. These results imply that RMM and DIPPR deliver comparable accuracy. Figures 1 and 2 present residual scatter plots for the fitted RMM and DIPPR models,

Figure 2. Percentage deviations, $\Delta H_{vap} = H_{vap}(\mathrm{expt.}) - H_{vap}(\mathrm{calc.})$, of the oxygen experimental enthalpy of vaporization, $H_{vap}(\mathrm{expt.})$, from calculated values, $H_{vap}(\mathrm{calc.})$, using DIPPR 106 (eq 9) as a function of temperature T/K.

respectively. Scatters appear highly random, implying that all structure in the data was indeed captured by the models. In summary, RMM modeling can also be instrumental in building semiempirical models, where the requirements are carefully built into the model, and the rest of the model, the part not directly associated with the requirements, is represented by a purely empirical RMM model.

Selecting the Best Model

The software TC (TableCurve2D) was used to identify candidate models that may best describe a given data set. TC has a bank of nearly 3700 models, which, given a data set, are estimated via least squares. These models are then ranked according to some selected goodness-of-fit statistic, like the Pearson correlation. We found the criteria offered by TC for ranking models wanting and developed a new procedure. The new procedure relies on the information-theoretic approach of Burnham and Anderson,⁷ which is based on maximum likelihood (ML) estimates. Estimation via least squares is equivalent to ML estimation for a normal additive error with constant variance. The new procedure first checks whether the assumptions regarding the error hold. If residual normality and constancy of variance are in doubt, a log transformation is applied. For instance, as a result of a normality check, the vapor pressure of all three substances was log-transformed. This led to the selection of models other than those originally suggested by TC. The normality of the residuals was checked via the graphic test ∆SNP (delta stabilized normal probability plot).¹¹ Also, the data were screened for outliers, defined as observations exceeding the $\pm 3\sigma_\varepsilon$ limits, where $\sigma_\varepsilon$ is the standard deviation of the fitted model’s error. Most of the outliers were first or last observations near the triple point or critical temperature. A third check was inspection of the spread of the errors to pinpoint a lack of randomness (existence of a pattern in the scatter plot).
This could indicate that the data set originated in calculated values (rather than empirical observations). We aspired to use only data sets based on real observations, and although in most cases the data source provides this information, a further check was made for each data set used. Once preliminary checks had been concluded, the final model was selected, among those offered by TC, pursuing the information-theoretic approach of Burnham and Anderson.⁷ The latter base their approach on AIC (to be detailed in the next section), ranking a subset of candidate models according to the model’s “relative likelihood to be the best model” (relative to the data and the complete subset of candidate models)

$$\Delta_i = \mathrm{AICc}_i - \min_i(\mathrm{AICc}_i), \qquad W_i = \frac{e^{-\Delta_i/2}}{\sum_{j=1}^{K} e^{-\Delta_j/2}} \qquad (11)$$

where K is the number of candidate models (the subset of best models selected, based on ∆), AICc is the corrected AIC (find details in the next section, eqs 13 and 15), and $W_i$ is the “relative likelihood” of model i. Only models with $\Delta_i \le 10$ are included in this set (based on an empirical criterion recommended in ref 7). It is desired to have a (preferably) single model with an associated W that is much larger than the corresponding values of all other models in the set. Such a scenario was often encountered in our analyses. Figure 3 displays the flowchart of the procedure to select the best TC model. In this chart, SE is the standard error (of the fitted model), serving as a first criterion to select two subsets of N1 and N2 models (modeling with the response expressed on the original and on the logarithmic scales, respectively), and R1 and R2 are the respective numbers of models remaining after conducting the normality test. According to the relative sizes of R1 and R2, a decision is made whether to adhere to the original scale or to proceed with a logarithmic scale (for the response). A normality test is then reapplied that screens out models still not compatible with the normality assumption. The remaining K1 models are then screened based on the $\Delta_i$ criterion (eq 11). Any model with $\Delta_i > 10$ is screened out. For the remaining K2 models, values of $\{W_i\}$ are calculated anew, and a new subset that has cumulative values of W that exceed a certain threshold (usually 80%) is selected as the final set of K3 candidate models. From this subset, the best model is selected (the one with the largest W). Note that since TC does not require that the same model be used for each property, irrespective of substance, it is expected that TC will very often provide the best-fitting model (in terms of goodness-of-fit). In fact, this was the case for most analyses. Thus, the selected TC model serves as the baseline against which one may assess the effectiveness of the models offered by DIPPR and by RMM. However, the superior goodness-of-fit delivered by many of the selected TC models, which often have a large number of parameters, is in many cases more than offset by reduced stability. This will be amply demonstrated in the analyses displayed in this paper.

Figure 3. A flowchart of the decision process to select the best model offered by TC.

Criteria for Goodness-of-Fit and Stability

Two of the most important dimensions of the quality of an estimated model are goodness-of-fit and stability. Estimating them requires that appropriate criteria be defined and their measurement procedures specified. In this section, we expound the criteria used in this paper.

Criteria for Goodness-of-Fit. Two criteria were used in this study, both based on measuring the response in the original scale.

MSE: Mean-Squared Error. MSE is defined by

$$\mathrm{MSE} = \frac{SSE}{N - p}, \qquad SSE = \sum_{i=1}^{N}(y_i - \hat{y}_i)^2 \qquad (12)$$

with $\hat{y}_i$ denoting the predicted value of the ith observation, N as the sample size, and p as the number of estimated parameters. The square root of MSE is denoted as RMSE (root-mean-squared error) or, simply, $S_e$.

Corrected Akaike’s Information Criterion (AICc). The Kullback-Leibler (K-L) information criterion, I(f,g), is an information-theoretic measure that expresses the information loss incurred when model g is used to approximate reality, the latter given by the true model f. Since f is usually unknown, the best we can do is minimize, based on given data, an estimator of the expected K-L information loss. This is given by AIC, which, assuming for the model’s residual an additive normal error with constant variance, is given by

$$\mathrm{AIC} = N \log\left[\hat{\sigma}_\varepsilon^2(\mathrm{ML})\right] + 2k, \qquad \hat{\sigma}_\varepsilon^2(\mathrm{ML}) = \frac{SSE}{N}, \qquad k = p + 1 \qquad (13)$$

where $\hat{\sigma}_\varepsilon^2(\mathrm{ML})$ is an ML estimate of the variance of the residuals. We wish to minimize the expected K-L information loss, and therefore, minimization of AIC is required (for the selected model). The assumption that the models estimated via TC have an additive normal error (an assumption that is checked by the procedure outlined in the earlier section) allows us to estimate the parameters via the least-squares approach (as offered by TC) and then calculate the associated AIC values by the simple relationship

$$SSE = N\hat{\sigma}_\varepsilon^2(\mathrm{ML}) = (N - p)\hat{\sigma}_\varepsilon^2(\mathrm{LS}) \qquad (14)$$

where $\hat{\sigma}_\varepsilon^2(\mathrm{LS})$ is the MSE given by eq 12. A correction for the AIC statistic was proposed for cases where the number of parameters in the model is large relative to the size of the data set. This resulted in a corrected AIC¹²

$$\mathrm{AICc} = \mathrm{AIC} + \frac{2k(k + 1)}{N - k - 1} \qquad (15)$$
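The goodness-of-fit statistics of eqs 12, 13, and 15, together with the ranking by $\Delta_i$ and $W_i$ in eq 11, can be sketched as follows. This is a minimal illustration and not the authors' routine; the function names, the candidate models, and their SSE values are hypothetical.

```python
import math

def mse(sse: float, n: int, p: int) -> float:
    """Eq 12: MSE = SSE / (N - p)."""
    return sse / (n - p)

def aicc(sse: float, n: int, p: int) -> float:
    """Eqs 13 and 15, with k = p + 1 and the ML variance estimate SSE / N."""
    k = p + 1
    aic = n * math.log(sse / n) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)

def akaike_weights(scores: dict, max_delta: float = 10.0) -> dict:
    """Eq 11: keep models with Delta_i <= max_delta and normalize exp(-Delta_i / 2)."""
    best = min(scores.values())
    deltas = {name: value - best for name, value in scores.items()}
    kept = {name: d for name, d in deltas.items() if d <= max_delta}
    total = sum(math.exp(-0.5 * d) for d in kept.values())
    return {name: math.exp(-0.5 * d) / total for name, d in kept.items()}

# Hypothetical candidates: name -> (SSE, number of estimated parameters p).
candidates = {"model A": (0.45, 2), "model B": (0.31, 3), "model C": (0.30, 7)}
n_obs = 25

scores = {name: aicc(sse, n_obs, p) for name, (sse, p) in candidates.items()}
weights = akaike_weights(scores)
best_model = max(weights, key=weights.get)
print(scores, weights, best_model)
```

In this made-up example, the seven-parameter candidate has the smallest SSE but is screened out by the $\Delta_i \le 10$ rule because of the penalty terms, which illustrates why MSE and AICc can rank models differently for short data sets.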

The two statistics, MSE and AICc, though not statistically independent, were used as goodness-of-fit statistics throughout this study. MSE and AICc resulted in different rankings, especially for short data sets such as the liquid viscosity of oxygen, where n = 9 (see Table 13, and find further details about these measures in ref 7).

Criteria for Stability. A stable model is one whose parameters are stable relative to data not used in the fitting procedure. For example, an estimated stable model would deliver accurate predicted values even for a temperature range different from that used in estimating the model’s parameters. In this study, two criteria were formulated to judge a model’s stability: the relative size of the 95% confidence interval and PRESS (prediction error sum of squares).

95% Confidence Interval for a Parameter Estimate. This is a go/no-go measure, which judges any fitted model to be nonstable if any of its parameter estimates has a 95% confidence interval width that exceeds the parameter’s estimate. Though this criterion is arbitrary with respect to the associated threshold (namely, that for a stable estimate, the relative confidence interval width does not exceed unity), it has been shown to provide an effective measure of model stability. In the tables showing the results of this study, any indication of instability is marked by the confidence intervals given in bold.

PRESS. This is a traditional statistic used to check a model’s stability. It is defined by

$$\mathrm{PRESS} = \sum_{i=1}^{N} e_{(i)}^{2} = \sum_{i=1}^{N}\left(y_i - \hat{y}_{(i)}\right)^2 \qquad (16)$$

i)1

where $\hat{y}_{(i)}$ is the model’s predicted value for observation i, given that the model has been estimated with observation i omitted (from the sample of N data points).¹³ Minimization of PRESS is desired. A large gap between the square root of PRESS and RMSE (or $S_e$) is indicative of overfitting (the model contains too many parameters relative to the information provided by the data). An overfitted model yields small residuals but predicts poorly for data not used in the fitting. A typical case of overfitting is routinely encountered in fitting a polynomial with too many terms. While PRESS is routinely given (and easily calculated) for linear regression implementations, it is rarely given, in available statistical software packages, when fitting models via nonlinear regression. Therefore, a special program was written in MATHEMATICA that provided PRESS values for all models compared.

Properties and Data Sources

Fifteen temperature-dependent physical and thermodynamic properties were initially addressed. These properties, with relevant information (data sources, ranges and median values for temperature, the response, and other relevant information), are displayed in Appendices A and B in the Supporting Information. Note that outliers excluded from the analyses (and detailed in Appendix A, Supporting Information) refer to observations analyzed via TC, with residuals exceeding $\pm 3\sigma_\varepsilon$, where $\sigma_\varepsilon$ is the standard deviation of the residuals (in most cases, these outliers came from observations near the critical or triple points). For solid density (SD), solid vapor pressure (SVP), solid heat capacity (SHC), and surface tension (ST), no acceptable sets of measured data were found in DIPPR. Also, for solid

thermal conductivity (STC), no compatible DIPPR model was found. Therefore, these five properties are excluded from all analyses. For the liquid density (LD), liquid heat capacity (LHC), and ideal gas heat capacity (IGHC), acceptable data sets were found for oxygen and nitrogen only (find details in Appendix A, Supporting Information). Data sets used in this paper all derive from DIPPR. Although we are aware that more accurate data sets may currently be available from various sources, we believe that using a single source for all data sets is beneficial for the reader, who may wish to reproduce the results of the analyses presented. Also, qualifications for all data sets used are easily accessible via the DIPPR database. We judge that not resorting to the currently best-available data sets does not diminish the validity of the comparisons performed in this paper and the conclusions derived therefrom.

Results

Analysis is uniformly structured for all properties according to the following scheme. First, once an appropriate data set was identified, it was plotted to ensure that continuous monotone convexity was not violated (a basic requirement of RMM, though a certain adaptation of RMM allows relaxation of this restriction). In cases where points at either end of the set indicated noncontinuity (due to proximity to critical points, as with LD), those points were not included in the analysis (see details in Appendix A in the Supporting Information). After the initial screening of the data, the best model suggested by TC was identified (as detailed earlier). Then, the parameters of DIPPR’s acceptable model were re-estimated (though they are given in DIPPR) in order to ensure that later comparisons of models would be based on estimates derived from identical data sets. Finally, RMM analysis was implemented, namely, the parameters of all three versions of RMM (eq 7) were estimated.
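The leave-one-out logic behind PRESS (eq 16) applies to any fitting routine used in this scheme. The sketch below is not the MATHEMATICA program mentioned earlier. As a stand-in for the nonlinear fits, it assumes a closed-form least-squares line for log(Y), and the data are synthetic, generated only for illustration.

```python
import math

def fit_loglinear(xs, ys):
    """Least-squares line log(y) = u + v*x; returns a predictor on the original scale."""
    n = len(xs)
    ws = [math.log(y) for y in ys]
    mean_x = sum(xs) / n
    mean_w = sum(ws) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    v = sum((x - mean_x) * (w - mean_w) for x, w in zip(xs, ws)) / sxx
    u = mean_w - v * mean_x
    return lambda x: math.exp(u + v * x)

def press(xs, ys, fit):
    """Eq 16: refit with observation i omitted, then predict observation i."""
    total = 0.0
    for i in range(len(xs)):
        predict = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - predict(xs[i])) ** 2
    return total

def rmse(xs, ys, fit, p):
    """Square root of eq 12 for a model with p estimated parameters."""
    predict = fit(xs, ys)
    sse = sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - p))

# Synthetic data: an exponential trend with a small alternating perturbation.
xs = [60.0 + 5.0 * i for i in range(10)]
ys = [2.0 * math.exp(0.03 * x) * (1.0 + 0.01 * (-1) ** i) for i, x in enumerate(xs)]

press_value = press(xs, ys, fit_loglinear)
rmse_value = rmse(xs, ys, fit_loglinear, p=2)
print(press_value, rmse_value)
```

Passing the fitting routine as an argument keeps `press` independent of the model form, so the same loop could wrap a nonlinear least-squares fit of any of the compared models.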
The vapor pressure analysis scheme contains comparison of three additional models: Antoine, Riedel, and Wagner.8

Liquid Density (LD, ρ). The plot of the nitrogen LD is given in Figure S1 (Supporting Information) as an illustrative representative plot. Analysis by TC for oxygen on the logarithmic scale showed that the leader model is a polynomial of order 14 (eq 6064 in TC)

$$\log(\rho) = a + bT + cT^2 + \dots + oT^{14}$$

Analysis by TC for nitrogen on the original scale showed that the leader model is a polynomial of order 10 (eq 6860 in TC)

$$\rho = a + bT + cT^2 + \dots + kT^{10}$$

With regard to DIPPR, the acceptable model for LD (for convenience, displayed on both the original and logarithmic scales) is DIPPR 105

$$\rho = \frac{a}{b^{\,[1 + (1 - T/c)^d]}}$$

$$\log(\rho) = a - b\left[1 + \left(1 - \frac{T}{c}\right)^d\right]$$

(note that a and b are expected to be different for the two models). Applying RMM, all three models (RMM2, RMM3, and RMM4) were fitted.
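The remark that a and b differ between the two scales can be made concrete with a short sketch. It assumes natural logarithms and purely illustrative parameter values (not the fitted values of this paper); the function names are hypothetical. On the logarithmic scale the roles of a and b are played by log(a) and log(b) of the original-scale model.

```python
import math


def dippr105(T, a, b, c, d):
    """DIPPR 105 on the original scale: rho = a / b**(1 + (1 - T/c)**d)."""
    return a / b ** (1.0 + (1.0 - T / c) ** d)


def dippr105_log(T, a_log, b_log, c, d):
    """DIPPR 105 on the log scale: log(rho) = a_log - b_log*(1 + (1 - T/c)**d)."""
    return a_log - b_log * (1.0 + (1.0 - T / c) ** d)


# Illustrative (hypothetical) parameters; c plays the role of a critical temperature.
a, b, c, d = 1.5, 0.2, 128.0, 0.22
T = 90.0  # must be below c for the power term to be real

rho = dippr105(T, a, b, c, d)
log_rho = dippr105_log(T, math.log(a), math.log(b), c, d)
print(rho, math.exp(log_rho))  # the two scales give the same density
```

Since b is below 1 in this illustration, its log-scale counterpart is negative, which is why the signs of the estimates can differ between the two fits.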

Tables 1 and 2 display results obtained for goodness-of-fit and stability, respectively. Table 1 shows that among the RMM models, RMM4 is the best fitting model. Table 2 displays estimates of parameters’ values for DIPPR and RMM, with the associated confidence interval widths. Due to the large number of parameters in the selected TC models, these are omitted from Table 2. Note that results reported in Tables 1 and 2 refer to the logarithmic scale for oxygen and to the original scale for nitrogen.

Table 1. Goodness-of-Fit Statistics (MSE and AICc) for the Liquid Density, ρliq (a)

| substance | model | number of parameters | normality | MSE | AICc |
|---|---|---|---|---|---|
| Oxygen | TC#6064 | 15 | + | 7.86 × 10^-9 | -883.6 |
| | RMM4 | 4 | + | 5.90 × 10^-8 | -808.4 |
| | DIPPR | 4 | + | 4.26 × 10^-7 | -711.5 |
| | RMM3 | 3 | + | 8.63 × 10^-6 | -565.5 |
| | RMM2 | 2 | - | 6.68 × 10^-5 | -466.6 |
| Nitrogen | TC#6860 | 11 | + | 1.63 × 10^-6 | -385.4 |
| | RMM4 | 4 | + | 5.86 × 10^-5 | -294.1 |
| | DIPPR | 4 | + | 4.32 × 10^-4 | -232.0 |
| | RMM3 | 3 | + | 3.85 × 10^-3 | -165.9 |
| | RMM2 | 2 | - | 8.71 × 10^-3 | -142.2 |

(a) RMM models refer to eq 7. DIPPR refers to eq DIPPR 105 from the DIPPR database.

Table 2. Stability Statistics for the Liquid Density, ρliq

| substance | parameter | DIPPR value | DIPPR confidence interval width | RMM4 value | RMM4 confidence interval width |
|---|---|---|---|---|---|
| Oxygen | a | 1.457 | 0.1164 | 0.2117 | 0.009978 |
| | b | -1.200 | 0.05700 | -0.03161 | 0.03726 |
| | c | 154.0 | 0.5445 | -0.5198 | 0.009021 |
| | d | 0.3035 | 0.01818 | -0.03286 | 0.006728 |
| | PRESS | 4.802 × 10^-5 | | 9.248 × 10^-6 | |
| Nitrogen | a | 1.535 | 1.277 | 0.4705 | 0.1056 |
| | b | 0.1991 | 0.1581 | -0.3764 | 0.06281 |
| | c | 128.3 | 1.769 | -0.5730 | 0.01544 |
| | d | 0.2183 | 0.06411 | 0.1344 | 0.06694 |
| | PRESS | 0.02736 | | 0.04924 | |

Vapor Pressure (VP, P). Wagner et al.14,15 have published high-precision data for the vapor pressure of oxygen, argon, and nitrogen. The ranges of the data were from the triple point to the critical point for argon and nitrogen and from the normal boiling point to the critical point for oxygen (see details in Appendix A in the Supporting Information). A plot of vapor pressure for oxygen is given in Figure S2 (Supporting Information). The performance of RMM for a wide range of vapor pressure data was compared to that of the following models (see ref 8 for details):

(a) TC analysis for oxygen and nitrogen, on a logarithmic scale, showed that the leader is the seven-parameter model (eq 6504 in TC)

$$\log(P) = a + \frac{b}{T} + \frac{c}{T^2} + \frac{d}{T^3} + \frac{e}{T^4} + \frac{f}{T^5} + \frac{g}{T^6}$$

Analysis by TC for argon on the logarithmic scale showed that the leader is (eq 4203 in TC)

$$\log(P) = a + bT + cT^{1.5} + dT^2\log(T) + \frac{e}{T^2}$$

(b) With regard to DIPPR, the acceptable model on a logarithmic scale is DIPPR 101

$$\log(P) = a + \frac{b}{T} + c\log(T) + dT^e$$

(c) Applying RMM, all three models (RMM2, RMM3, and RMM4) were fitted. (Due to the wide spectrum of models being compared, only the performance of RMM2 is displayed.)

(d) Wagner’s equation. For oxygen:

$$\log(P_r) = \frac{a\tau + b\tau^{1.5} + c\tau^3 + d\tau^6 + e\tau^9}{T_r}, \qquad \tau = 1 - T_r$$

For argon and nitrogen:

$$\log(P_r) = \frac{a\tau + b\tau^{1.5} + c\tau^3 + d\tau^6}{T_r}, \qquad \tau = 1 - T_r$$

where Tr is the reduced temperature (Tr = T/Tc), Tc is the critical temperature, Pr is the reduced pressure (Pr = P/Pc), and Pc is the critical pressure (see details in Appendix A, Supporting Information).

(e) Antoine equation

$$\log(P) = A + \frac{B}{T + C}$$

(f) The extended Riedel equation

$$\log(P) = A - \frac{B}{T} + C\log(T) + DT^2 + \frac{E}{T^2}$$

Table 3. Goodness-of-Fit Statistics (MSE and AICc) for the Vapor Pressure, P (a)

| substance | model | number of parameters | normality (b) | MSE | AICc |
|---|---|---|---|---|---|
| Oxygen | Wagner | 5 | | 6.08 × 10^-9 | -3341 |
| | TC#6504 | 7 | | 1.04 × 10^-8 | -3244 |
| | Riedel | 5 | | 3.94 × 10^-8 | -3011 |
| | RMM2 | 2 | | 1.49 × 10^-7 | -2779 |
| | DIPPR | 5 | | 2.98 × 10^-6 | -2245 |
| | Antoine | 3 | | cannot be fitted | |
| Argon | Wagner | 4 | + | 4.62 × 10^-8 | -955.7 |
| | TC#4203 | 5 | + | 1.03 × 10^-7 | -908.5 |
| | Riedel | 5 | + | 1.81 × 10^-7 | -876.6 |
| | RMM2 | 2 | + | 5.64 × 10^-7 | -815.8 |
| | DIPPR | 5 | + | 1.09 × 10^-6 | -774.3 |
| | Antoine | 3 | - | 5.09 × 10^-5 | -557.8 |
| Nitrogen | Wagner | 4 | + | 4.53 × 10^-8 | -1154 |
| | TC#6504 | 7 | + | 1.38 × 10^-7 | -1063 |
| | Riedel | 5 | + | 2.73 × 10^-7 | -1033 |
| | RMM2 | 2 | + | 6.43 × 10^-6 | -814.9 |
| | DIPPR | 5 | + | 2.25 × 10^-5 | -733.1 |
| | Antoine | 3 | + | 7.85 × 10^-5 | -645.8 |

(a) RMM models refer to eq 7. DIPPR refers to eq DIPPR 101 from the DIPPR database.
(b) For oxygen, only four normality marks (+ + + -) are available for the five fitted models, so they are not assigned to individual rows.
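The vapor pressure correlations (b) and (d) to (f) listed in this section can be written as small functions, which makes their structure easy to compare. This is only a sketch: natural logarithms are assumed, the function names are hypothetical, and all coefficient values used below are illustrative rather than fitted values from this paper.

```python
import math


def wagner_log_pr(T, Tc, coeffs, exponents=(1.0, 1.5, 3.0, 6.0)):
    """Wagner form: log(Pr) = sum(k * tau**p) / Tr, with tau = 1 - Tr.

    Pass exponents=(1.0, 1.5, 3.0, 6.0, 9.0) and five coefficients for the
    five-term oxygen variant.
    """
    Tr = T / Tc
    tau = 1.0 - Tr
    return sum(k * tau ** p for k, p in zip(coeffs, exponents)) / Tr


def antoine_log_p(T, A, B, C):
    """Antoine form as written in the text: log(P) = A + B / (T + C)."""
    return A + B / (T + C)


def riedel_log_p(T, A, B, C, D, E):
    """Extended Riedel: log(P) = A - B/T + C*log(T) + D*T**2 + E/T**2."""
    return A - B / T + C * math.log(T) + D * T ** 2 + E / T ** 2


def dippr101_log_p(T, a, b, c, d, e):
    """DIPPR 101: log(P) = a + b/T + c*log(T) + d*T**e."""
    return a + b / T + c * math.log(T) + d * T ** e


# Illustrative (hypothetical) values only.
Tc = 150.0
wagner_coeffs = (-6.0, 1.2, -0.6, -1.8)
print(wagner_log_pr(100.0, Tc, wagner_coeffs))
print(antoine_log_p(100.0, 20.0, -800.0, -5.0))
```

At T = Tc the Wagner form gives log(Pr) = 0, that is, P = Pc, which is one reason it behaves well up to the critical point; the Antoine and Riedel forms have no such built-in constraint.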

Table 3 shows statistics for the goodness-of-fit for all six models, and Table 4 shows stability statistics for TC, DIPPR, and RMM2. Figure 4 displays a demonstrative percentage error plot for oxygen, obtained by RMM2 (eq 7). The three-parameter Antoine equation could not be fitted to data covering this wide temperature range of VP. It can be seen that the accuracy of RMM2 is significantly higher than that of the five-parameter DIPPR equation. The Wagner equation provides the most accurate correlations for the VP of oxygen,


Table 4. Stability Statistics for the Vapor Pressure, P TC parameter

DIPPR confidence interval width

value

RMM2 confidence interval width

value

value

confidence interval width

0.5123 -0.5805

7.28 × 10-5 3.05 × 10-4

Oxygen a b c d e f g PRESS

39.03 -2133 5.48 × 106 -7.85 × 108 6.35 × 1010 -2.74 × 1012 4.92 × 1013 1.950 × 10-6

4.445 3.131 × 103 9.132 × 105 1.412 × 108 1.221 × 1010 5.596 × 1011 1.062 × 1013

21.78 -1028 -2.947 7.140 × 10-4 1.494

62.85 599.3 14.17 0.0293 6.293

5.136 × 10-4

2.801 × 10-5

Argon a b c d e PRESS

-8.492 1.00037 0.3324 0.033 -0.02730 0.00305 1.434 × 10-5 1.19 × 10-4 -17448 913.4 6.480 × 10-6

a b c d e f g PRESS

30.01 -10944 2.08 × 106 -2.25 × 108 1.37 × 1010 -4.42 × 104 5.95 × 1012

24.49 -1010 -2.948 1.59 × 10-5 2.177 1.950 × 10-6

20.98 229.6 4.364 2.71 × 10-4 2.839

0.6211 -0.7129

84.51 680.1 19.75 2.90 × 10-2 12.55

0.6511 -0.7284

2.73 × 10-4 8.83 × 10-4

3.491 × 10-5

Nitrogen 5.427 2930 6.52 × 105 7.66 × 107 5.003 × 109 1.725 × 1011 2.453 × 1012

22.92 -864.6 -2.802 3.76 × 10-4 1.632

625.6

argon, and nitrogen, but it should be noted that this equation was actually optimized (by stepwise regression) for the very same substances. From the results of this comparison, it can be concluded that the general-purpose RMM competes favorably with the most accurate specific VP equations for representing data covering a wide temperature range. It should be noted that, in practice, values of RMM parameters’ estimates are typically confined to the range {-1,1} or thereabouts. This special feature of the RMM model makes LS parameter estimation much simpler (compared to estimating other models) and would almost surely guarantee a global optimum solution. In terms of stability, RMM2 delivers superior performance. The stability of the parameters’ estimates of the DIPPR model is weak, and estimates for the parameters of the TC model for oxygen and nitrogen exceed 10^13.

Enthalpy of Vaporization (EV, Hvap). The plot of the oxygen data is given in Figure S3 (Supporting Information). Analyzing oxygen by TC, on the original scale, showed that the leader is (eq 7502 in TC)

Figure 4. Percentage deviations, ΔP = P(expt.) - P(calc.), of the oxygen experimental vapor pressure, P(expt.), from calculated values, P(calc.), using RMM2 (eq 7), as a function of temperature T/K.

1.550 × 10-3

7.28 × 10-4 2.10 × 10-3

4.535 × 10-4

$$H_{vap} = \frac{a + c\sqrt{T}}{1 + b\sqrt{T} + dT}$$

Analyzing argon by TC, on the original scale, showed that the leader is (eq 91 in TC)

$$(H_{vap})^2 = a + bT^3$$

Analyzing nitrogen by TC, on the original scale, showed that the leader is an eighth-degree polynomial (eq 6058 in TC)

$$H_{vap} = a + bT + cT^2 + \dots + iT^8$$

The DIPPR acceptable model, uniform across all substances, is given by eq 9 (DIPPR 106)

$$H_{vap} = a(1 - T_r)^{\,b + cT_r + dT_r^2 + eT_r^3}, \qquad T_r = \frac{T}{T_c}$$
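The DIPPR 106 form above can be sketched as a single function in which unused exponent terms default to zero, so that the two-parameter and four-parameter variants are the same code path. The function name and the numerical values are hypothetical and serve only as an illustration.

```python
def dippr106(T, Tc, a, b, c=0.0, d=0.0, e=0.0):
    """DIPPR 106: Hvap = a * (1 - Tr)**(b + c*Tr + d*Tr**2 + e*Tr**3), Tr = T/Tc."""
    Tr = T / Tc
    exponent = b + c * Tr + d * Tr ** 2 + e * Tr ** 3
    return a * (1.0 - Tr) ** exponent


# Illustrative (hypothetical) values; not the estimates reported in this paper.
Tc = 150.0
a, b = 8.0e6, 0.38

# Two-parameter variant (c = d = e = 0).
print(dippr106(90.0, Tc, a, b))

# Four-parameter variant (e = 0).
print(dippr106(90.0, Tc, a, b, c=-0.2, d=0.1))
```

With a positive exponent, the form forces the enthalpy of vaporization to vanish at the critical temperature, while a is the value extrapolated to Tr = 0.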

For oxygen and nitrogen, values suggested by DIPPR are given only for the first four parameters (e = 0). For argon, values for the first two parameters are given (c = d = e = 0). Applying RMM, all three models (RMM2, RMM3, and RMM4) were fitted. Table 5 shows statistics for the goodness-of-fit for all five models, and Table 6 shows the stability statistics. Although parameters’ estimates for the selected TC model for nitrogen are not shown (there are nine of these), stability checking indicates strong multicollinearity, a sure sign of overfitting (using too many parameters in the model). Figure 5 displays a demonstrative percentage error plot for nitrogen, obtained by RMM4 (eq 7). In terms of both goodness-of-fit and stability, RMM competes favorably with the DIPPR models. It should be noted that despite the wide range of the response values (see details in Appendix A, Supporting Information), values of RMM parameters are confined to the {-1,1} range.

Liquid Heat Capacity (LHC, Cp). The plot of the oxygen data is given in Figure S4 (Supporting Information).

Table 5. Goodness-of-Fit Statistics (MSE and AICc) for the Enthalpy of Vaporization, Hvap (a)

| substance | model | number of parameters | normality | MSE | AICc |
|---|---|---|---|---|---|
| Oxygen | TC#7502 | 4 | + | 2.008 × 10^7 | 346.1 |
| | DIPPR | 4 | + | 2.083 × 10^7 | 346.9 |
| | RMM4 | 4 | + | 3.730 × 10^7 | 358.5 |
| | RMM3 | 3 | + | 1.404 × 10^8 | 379.6 |
| | RMM2 | 2 | + | 1.887 × 10^8 | 385.0 |
| Argon | TC#91 | 2 | + | 9.769 × 10^6 | 359.3 |
| | RMM3 | 3 | + | 1.031 × 10^7 | 362.4 |
| | RMM2 | 2 | + | 1.154 × 10^7 | 363.0 |
| | RMM4 | 4 | + | 1.128 × 10^7 | 366.6 |
| | DIPPR | 2 | + | 6.981 × 10^7 | 402.6 |
| Nitrogen | DIPPR | 4 | + | 6.82 × 10^6 | 495.9 |
| | RMM4 | 4 | + | 8.153 × 10^6 | 501.4 |
| | TC#6058 | 9 | + | 6.92 × 10^6 | 508.6 |
| | RMM3 | 3 | + | 5.77 × 10^8 | 631.8 |
| | RMM2 | 2 | - | 9.30 × 10^8 | 645.0 |

(a) RMM models refer to eq 7. DIPPR refers to eq DIPPR 106 from the DIPPR database.

Figure 5. Percentage deviations, ΔHvap = Hvap(expt.) - Hvap(calc.), of the nitrogen experimental enthalpy of vaporization, Hvap(expt.), from calculated values, Hvap(calc.), using RMM4 (eq 7), as a function of temperature T/K.

Analysis by TC on the original scale for oxygen showed that the leader model is (eq 8002 in TC)

$$C_p = a + b\exp\left(-\frac{T}{c}\right)$$

Analysis by TC on the original scale for nitrogen showed that the leader model is (eq 7112 in TC)

$$\sqrt{C_p} = \frac{a + cT}{1 + bT + dT^2}$$

The DIPPR acceptable model is DIPPR 100, a fourth-degree polynomial

$$C_p = a + bT + cT^2 + dT^3 + eT^4$$

The resulting statistics of the goodness-of-fit and stability are displayed in Tables 7 and 8, respectively. RMM is superior both in terms of goodness-of-fit (unexpectedly, better even than the proposed TC model) and in terms of stability. Fitting RMM3 to nitrogen data, parameter c assumes a small value (-4.67 × 10^-5), and its relative contribution is negligible (less than 0.1% in terms of the response values). This suggests that perhaps switching to RMM3 with only two parameters (c = 0) would provide a more stable and still well-fitting model

two-parameter RMM3(a, b):

$$Y = M_Y \exp\left[a\left(e^{bZ} - 1\right)\right]$$

Tables 7 and 8 contain (for nitrogen) the resulting statistics of the two-parameter RMM3.

Ideal Gas Heat Capacity (IGHC, Cgas^id). The plot of the oxygen data is given in Figure S5 (Supporting Information). It seems that there is an inflection point which causes the graph of the data to behave like an S-shaped function. This violates the basic RMM assumption of monotone convexity, though with certain modifications, S-shaped functions are easily modeled via RMM.4 These modifications are not warranted by the moderate deviation from monotone convexity displayed in Figure S5 (Supporting Information). Therefore, RMM models, as used in BSS,6 will be employed in this analysis. However, diminished accuracy is expected. Implementing TC, we have restricted the candidate models to those having a constant sign in the first derivative. This is an option provided by TC, and it implies that the first derivative of Y has the same sign throughout the whole range but is not necessarily monotone. Applying TC to the data on the original scale, the leading model for oxygen is (eq 7610 in TC):

Table 6. Stability Statistics for the Enthalpy of Vaporization, Hvap TC

DIPPR confidence interval width

value parameter a b c d PRESS

parameter a b c d PRESS

confidence interval width

value

9.495 × 10 -0.05797 -746 352 -0.001060

RMM4 9.22 × 10 0.5703 -0.6384 0.4404

6

628 360 0.01051 56 274 9.513 × 10-4 4.932 × 108

5.848 × 108

213 570 0.1015 0.1784 0.08686

0.3418 0.5489 -0.3108 -0.04428 2.145 × 109

argon 5.018 × 10 8.76 × 10 -1.429 × 107 1.05 × 105 2.315 × 108 13

confidence interval width

value

oxygen 6

parameter a b PRESS

RMM

10

RMM2 8.18 × 10 89 390 0.2992 0.01183 9.684 × 1010 6

-0.01483 0.00029 0.3803 0.04669 2.738 × 108

nitrogen 7.567 × 106 0.4432 -0.3777 0.2980

0.06240 0.4144 0.1448 0.03128

RMM4

2.411 × 108

1.835 × 105 0.08501 0.12641 0.04912

0.4041 0.03094 -0.5165 0.02517

4.780 × 108

0.03094 0.07629 0.01954 0.02233

$$C_{gas}^{id} = \frac{a + cT + eT^2 + gT^3 + iT^4 + kT^5}{1 + bT + dT^2 + fT^3 + hT^4 + jT^5}$$

Analysis by TC on the original scale for nitrogen showed that the leader model is (eq 7608 in TC)

$$C_{gas}^{id} = \frac{a + cT + eT^2 + gT^3 + iT^4}{1 + bT + dT^2 + fT^3 + hT^4}$$

The acceptable model recommended by DIPPR is uniform for IGHC and is given by (DIPPR 107)

$$C_{gas}^{id} = a + b\left[\frac{c/T}{\sinh(c/T)}\right]^2 + d\left[\frac{e/T}{\cosh(e/T)}\right]^2$$

RMM modeling, as expected, delivers inferior performance for all models. Tables 9 and 10, respectively, display the goodness-of-fit and stability statistics. Table 9 compares the goodness-of-fit of all five models for oxygen, while for nitrogen, no RMM version gave a satisfactory fit to the IGHC data. Table 10 shows statistics associated with the DIPPR model only. Although parameters’ estimates for the selected TC models are not shown (there are 11 for oxygen and 9 for nitrogen), stability checking indicates strong multicollinearity, a sure sign of overfitting (using too many parameters in the model relative to the data available).

Table 7. Goodness-of-Fit Statistics (MSE and AICc) for the Liquid Heat Capacity, Cp (a)

| substance | model | number of parameters | normality | MSE | AICc |
|---|---|---|---|---|---|
| Oxygen | RMM4 | 4 | + | 6472 | 219.6 |
| | RMM3 | 3 | + | 49433 | 266.3 |
| | TC#8002 | 3 | + | 151539 | 293.2 |
| | DIPPR | 5 | + | 397671 | 320.8 |
| | RMM2 | 2 | + | 618752 | 325.2 |
| Nitrogen | RMM3 | 2 | + | 19416 | 115.8 |
| | RMM4 | 4 | + | 8392 | 116.4 |
| | TC#7112 | 4 | + | 10265 | 118.6 |
| | DIPPR | 5 | + | 49249 | 145.2 |
| | RMM2 | 2 | + | 344542 | 147.5 |

(a) RMM models refer to eq 7. DIPPR refers to eq DIPPR 100 from the DIPPR database.

Table 8. Stability Statistics for the Liquid Heat Capacity, Cp

Oxygen

| parameter | TC value | TC confidence interval width | DIPPR value | DIPPR confidence interval width | RMM4 value | RMM4 confidence interval width |
|---|---|---|---|---|---|---|
| a | 53339 | 520.0 | 209314 | 125892 | -0.3199 | 0.03338 |
| b | 1.428 | 1.051 | -7716 | 5644 | 0.3666 | 0.2881 |
| c | -13.94 | 1.017 | 139.8 | 91.63 | -0.4355 | 0.08214 |
| d | | | -1.107 | 0.6402 | -0.06550 | 0.03606 |
| e | | | 0.003263 | 0.001629 | | |
| PRESS | 5.437 × 10^6 | | 1.744 × 10^7 | | 544697 | |

Nitrogen

| parameter | TC value | TC confidence interval width | DIPPR value | DIPPR confidence interval width | RMM2 value | RMM2 confidence interval width |
|---|---|---|---|---|---|---|
| a | 258.1 | 95.36 | 675764 | 1.364 × 10^7 | 0.1032 | 0.01011 |
| b | -5.190 × 10^-3 | 0.01112 | -29794 | 61962 | 1.048 | 0.06457 |
| c | -1.807 | 0.3599 | 538.5 | 1039 | | |
| d | -1.770 × 10^-5 | 6.51 × 10^-5 | -4.347 | 7.681 | | |
| e | | | 0.01330 | 0.02111 | | |
| PRESS | 250075 | | 1.641 × 10^7 | | 325030 | |

Table 9. Goodness-of-Fit Statistics (MSE and AICc) for the Ideal Gas Heat Capacity, Cgas^id (a)

| substance | model | number of parameters | normality | MSE | AICc |
|---|---|---|---|---|---|
| Oxygen | TC#7610 | 11 | + | 0.7097 | 53.05 |
| | DIPPR | 5 | + | 90.67 | 98.84 |
| | RMM2 | 2 | + | 245313 | 241.3 |
| | RMM4 | 4 | + | 194057 | 241.5 |
| | RMM3 | 3 | + | 887385 | 267.8 |
| Nitrogen | DIPPR | 5 | + | 21.12 | 71.16 |
| | TC#7608 | 9 | + | 24.34 | 95.96 |

(a) RMM models refer to eq 7. DIPPR refers to eq DIPPR 107 from the DIPPR database.

Table 10. Parameter Values and Confidence Interval Widths of the DIPPR Model Fitted to IGHC, Cgas^id

| parameter | oxygen value | oxygen confidence interval width | nitrogen value | nitrogen confidence interval width |
|---|---|---|---|---|
| a | 29112 | 18.86 | 29108 | 7.559 |
| b | 9837 | 412.2 | 8563 | 765.3 |
| c | 2434 | 153.1 | 3265 | 303.1 |
| d | 8988 | 554.2 | 8246 | 696.1 |
| e | 1136 | 27.17 | 1664 | 38.30 |

Second Virial Coefficient (SVC, B). The plot of oxygen data is given in Figure S6 (Supporting Information). Analysis of oxygen by TC, on the original scale, showed that the leader model is the Lorentzian cumulative distribution function (eq 8078 in TC)

$$B = a + \frac{b}{\pi}\left[\arctan\left(\frac{T - c}{d}\right) + \frac{\pi}{2}\right]$$

Analysis of argon by TC, on the original scale, showed that the leader model is (eq 7912 in TC)

$$B = \frac{a + c\log(T)}{1 + b\log(T) + d\log^2(T)}$$

Analysis of nitrogen by TC, on the original scale, showed that the leader model is (eq 6605 in TC)

$$B = a + \frac{b}{X} + \frac{c}{X^2} + \frac{d}{X^3} + \frac{e}{X^4} + \frac{f}{X^5} + \frac{g}{X^6}, \qquad X = \log(T)$$

The DIPPR acceptable model is uniform for SVC and is given by (DIPPR 104)

$$B = a + \frac{b}{T} + \frac{c}{T^3} + \frac{d}{T^8} + \frac{e}{T^9}$$

Modeling with RMM requires that all data have the same sign (the sign of predicted values is solely determined by the response median, as formerly detailed). Because some of the data in Figure S6 (Supporting Information) are negative, a location parameter is added to the RMM models to allow for such values. As noted earlier (eqs 4b, 5b, and 6b), addition of a location parameter is part of preprocessing of the data, and therefore, in the ensuing calculation of goodness-of-fit statistics for the various RMM models, it is not considered an additional estimated parameter (namely, it is not counted in the number of degrees of freedom). Running RMM2 with the additional location parameter, we obtain a δ value for each data set to be used for all three versions of RMM (see details in Appendix A, Supporting Information). In all three data sets, these values of δ imply that after relocating the data, all response observations are negative, namely, they have the same sign (as required).

The statistics for the goodness-of-fit and stability are displayed in Tables 11 and 12, respectively. RMM4 is superior in terms of both the goodness-of-fit and stability (unexpectedly, better even than the proposed TC model). Note once again the uniformly centered values of the RMM4 parameters’ estimates, which are confined to the {-1,1} range (unlike DIPPR parameters).

Table 11. Goodness-of-Fit Statistics (MSE and AICc) for the Second Virial Coefficient, B (a)

| substance | model | number of parameters | normality | MSE | AICc |
|---|---|---|---|---|---|
| Oxygen | RMM4 | 4 | + | 7.952 × 10^-8 | -366.9 |
| | TC#8078 | 4 | + | 1.169 × 10^-7 | -358.0 |
| | DIPPR | 5 | + | 1.406 × 10^-7 | -351.3 |
| | RMM2 | 2 | + | 3.616 × 10^-6 | -283.0 |
| | RMM3 | 3 | + | 3.492 × 10^-6 | -282.0 |
| Argon | RMM4 | 4 | + | 2.561 × 10^-7 | -355.3 |
| | TC#7912 | 4 | + | 7.651 × 10^-7 | -329.0 |
| | DIPPR | 5 | + | 9.851 × 10^-7 | -320.6 |
| | RMM3 | 3 | + | 1.417 × 10^-6 | -316.3 |
| | RMM2 | 2 | + | 5.116 × 10^-5 | -232.0 |
| Nitrogen | TC#6605 | 7 | + | 5.95 × 10^-10 | -686.9 |
| | RMM4 | 4 | + | 7.69 × 10^-10 | -684.6 |
| | RMM3 | 3 | + | 2.11 × 10^-7 | -500.9 |
| | DIPPR | 5 | + | 5.08 × 10^-7 | -468.4 |
| | RMM2 | 2 | + | 1.15 × 10^-6 | -446.4 |

(a) RMM models refer to eq 7. DIPPR refers to eq DIPPR 104 from the DIPPR database.

Liquid Viscosity (LV, ηliq). The plot of the nitrogen data is given in Figure S7 (Supporting Information). Analysis of oxygen by TC, on the original scale, showed that the leader model is (eq 1101 in TC)

$$\eta_{liq} = a + bT^2\sqrt{T} + \frac{c}{T}$$

Analysis of argon and nitrogen by TC, on the original scale, showed that the leader model for both data sets is (eq 6122 in TC)

$$\sqrt{\eta_{liq}} = a + bT + cT^2 + dT^3$$

The DIPPR acceptable model is given by (DIPPR 101)

$$\eta_{liq} = \exp\left[a + \frac{b}{T} + c\log(T) + dT^e\right]$$

For oxygen and nitrogen, d = e = 0 (a three-parameter model is actually suggested). Table 13 shows that RMM modeling delivers better goodness-of-fit statistics relative to DIPPR’s acceptable model, while, as expected, the TC model is the leader. Table 14 shows stability statistics for the TC model, the DIPPR model, and the two-parameter RMM model. RMM2 was chosen both for oxygen and nitrogen data (the leader RMM3 for oxygen was found to be unstable in terms of parameters’ values). While fitting RMM3 to the argon data, the estimate for the parameter b was small, implying that its relative contribution was negligible (less than 0.1% in terms of the response values). This suggests that perhaps switching to RMM3 with b = 0 would provide a more stable and still well-fitting model (with only two parameters). The TC models show the highest stability, followed by the RMM model. Once again, note the uniformly centered values of the RMM parameters’ estimates, which are confined to the {-1,1} range (unlike DIPPR parameters).

Table 12. Stability Statistics for SVC, B TC parameter

value

a b c d e PRESS

-0.1466 2.184 43.57 19.89

a b c d e PRESS

-0.1782 -0.5954 0.02954 0.09258

DIPPR confidence interval width

RMM4 confidence interval width

value

value

confidence interval width

-1.072 -0.1406 1.333 -0.1654

0.04123 0.3346 0.1378 0.1556

Oxygen 0.5158 0.5166 2.477 4.191 7.911 × 10-6

0.0397 0.00148 -15.83 0.5462 -84 394 8068 14 1.25 × 1014 1.51 × 10 16 8.6 × 1015 -1.02 × 10 5.335 × 10-5

2.910 × 10-6

Argon 0.03968 0.03096 0.006549 0.009189

0.03468 -14.20 -91 042 1.411 × 1014 -6.54 × 1015

0.0670

0.00254 1.357 35 210 5.1 × 1014 2.5 × 1016

1.476 -5.501 -0.02022 -1.513

66.45 5.197 0.8706 0.1342 6.455 × 10-3

55 489 Nitrogen

a b c d e PRESS

3.876 × 10-8

0.04162 0.00189 -12.59 0.9513 -118 163 26 327 1.63 × 1015 2.45 × 1015 17 1.46 × 1017 -2.05 × 10 2.476 × 10-4

-1.009 -0.3238 1.055 -0.6224

0.01413 0.08322 0.02943 0.03812 6.084 × 10-8

Table 13. Goodness-of-Fit Statistics (MSE and AICc) for the Liquid Viscosity, ηliq (a)

| substance | model | number of parameters | normality | MSE | AICc |
|---|---|---|---|---|---|
| Oxygen | TC#1101 | 3 | + | 4.16 × 10^-14 | -262.9 |
| | RMM3 | 3 | + | 5.92 × 10^-14 | -259.8 |
| | DIPPR | 3 | + | 1.17 × 10^-13 | -253.6 |
| | RMM2 | 2 | + | 3.38 × 10^-13 | -249.9 |
| | RMM4 | 4 | + | 6.30 × 10^-14 | -248.9 |
| Argon | TC#6122 | 4 | + | 5.11 × 10^-14 | -384.1 |
| | RMM3 | 2 | + | 5.97 × 10^-12 | -329.5 |
| | RMM2 | 2 | + | 6.63 × 10^-12 | -328.1 |
| | RMM4 | 4 | + | 7.41 × 10^-12 | -319.4 |
| | DIPPR | 5 | + | 8.05 × 10^-12 | -312.4 |
| Nitrogen | TC#6122 | 4 | + | 2.86 × 10^-14 | -708.1 |
| | RMM2 | 2 | + | 1.16 × 10^-12 | -626.9 |
| | RMM3 | 2 | + | 1.27 × 10^-12 | -624.8 |
| | RMM4 | 4 | + | 1.12 × 10^-12 | -623.7 |
| | DIPPR | 3 | + | 1.39 × 10^-12 | -621.0 |

(a) RMM models refer to eq 7. DIPPR refers to eq DIPPR 101 from the DIPPR database.

Vapor Viscosity (VV, ηvap). The plot of oxygen data is given in Figure S8 (Supporting Information). Analysis of oxygen by TC, on the original scale, showed that the leader model is (eq 2587 in TC)

$$\eta_{vap} = a + bT^2 + cX^2 + dX, \qquad X = \log(T)$$

Analysis of argon by TC, on the original scale, showed that the leader model is (eq 3112 in TC)

$$\eta_{vap} = a + bX^2 + c\,\frac{T}{X} + dX, \qquad X = \log(T)$$

Analysis of nitrogen by TC, on the original scale, showed that the leader model is (eq 6605 in TC)

$$\eta_{vap} = a + bX + cX^2 + dX^3 + eX^4 + fX^5, \qquad X = \sqrt{T}$$

DIPPR recommends an acceptable model that is uniform for VV (DIPPR 102)

$$\eta_{vap} = \frac{aT^b}{1 + c/T + d/T^2}$$

For oxygen and nitrogen, d = 0 (a three-parameter model is actually suggested). The resulting statistics of the goodness-of-fit and stability are displayed in Tables 15 and 16, respectively. Figure 6 displays a demonstrative percentage error plot for nitrogen, obtained by RMM4 (eq 7). Table 15 shows that RMM modeling delivers three models that all show good performance. RMM4 leads among all RMM versions in terms of goodness-of-fit statistics and competes favorably with the DIPPR model. According to Table 16, RMM4 is also well-behaved with respect to stability (for both measures).

Liquid Thermal Conductivity (LTC, κliq). The plot of the oxygen data is given in Figure S9 (Supporting Information). The graph displays a mirror image of an S-shaped plot, being concave for low-temperature values and convex for high values. This violates the basic RMM assumption of monotone convexity. Consequently, RMM may be anticipated to perform poorly (the results do not bear out this prediction). Implementing TC, we have restricted the candidate models to those having a constant sign in the first derivative, an option provided by TC. Applying TC to the data on the original scale, the leading model for oxygen is (eq 3271 in TC)

$$\kappa_{liq} = a + b\log(T) + \frac{c}{T\sqrt{T}} + de^{-T}$$

Analysis of nitrogen by TC, on the original scale, showed that the leader model is (eq 1327 in TC)

$$\log(\kappa_{liq}) = a + bT^3 + c\,\frac{\log(T)}{T^2}$$

Since the argon LTC data set contains only eight observations, no attempt was made to analyze this property by TC. However, comparison between DIPPR and RMM is presented. With regard to DIPPR, the acceptable model for LTC is a fourth-degree polynomial (DIPPR 100). For oxygen and nitrogen, only values for the first two parameters are suggested by DIPPR, which implies that DIPPR leads to a linear model. For

Table 14. Stability Statistics for LV, ηliq TC value parameter a b c PRESS

DIPPR confidence interval width oxygen

-6.700 × 10-4 1.1 × 10-5 1.4 × 10-9 7.1 × 10-11 0.06862 5.44 × 10-4 6.083 × 10-13

9.190 -98.22 -3.696

0.6952 8.868 0.1335

0.07433 -1.280 × 10-3 9.15 × 10-6 -2.41 × 10-8

-3

7.08 × 10 1.94 × 10-4 1.74 × 10-6 0.55 × 10-8

-8.868 204.3 -0.383 -1.29 × 10-22 10 6.421 × 10-8

1.190 × 10-12

confidence interval width RMM2

value

1.9 × 10-3 0.02685

-0.2207 -0.1474

4.007 × 10-12

RMM3(a,b ) 0,c)

argon

parameter a b c d PRESS

value

1.851 × 10-12

parameter a b c d e PRESS

RMM

confidence interval width

33.22 501.3 6.569 3.919 0.2302

-0.01220

0.04758

-0.4981

0.03576

1.589 × 10-10

nitrogen 0.06672 -1.400 × 10-3 1.18 × 10-5 -3.64 × 10-8 8.823 × 10-13

-3

8.99 × 10 2.79 × 10-4 2.85 × 10-6 9.65 × 10-9

RMM2 20.74 -274.3 -5.980 4.411 × 10-11

9.548 160.3 1.724

-0.2188 0.04010 3.817 × 10-11

0.0083 0.1137


Table 17. Goodness-of-Fit Statistics (MSE and AICc) for the Liquid Thermal Conductivity, Kliqa

Table 15. Goodness-of-Fit Statistics (MSE and AICc) for the Vapor Viscosity, ηvapa model

number of parameters

normality

MSE

AICc

number of parameters

model

Oxygen TC#2587 DIPPR RMM4 RMM2 RMM3

4 3 4 2 3

TC#3112 DIPPR RMM4 RMM3 RMM2

4 4 4 3 2

+ + + + +

MSE

AICc

6.954 × 10-16 1.221 × 10-15 1.227 × 10-15 2.647 × 10-15 2.597 × 10-15

-1773 -1746 -1744 -1707 -1707

7.54 × 10-16 1.32 × 10-15 1.88 × 10-15 5.79 × 10-13 2.22 × 10-12

-967 -950 -941 -782 -746

TC#3271 RMM4 RMM3 RMM2 DIPPR

4 4 3 2 2

RMM2 RMM3 DIPPR RMM4

2 3 3 4

TC#1327 RMM3 RMM4 RMM2 DIPPR

3 3 4 2 2

+ + + + -

6.922 × 10-8 1.623 × 10-7 2.329 × 10-7 2.451 × 10-6 2.777 × 10-6

-800.6 -758.8 -742.5 -628.5 -622.4

5.164 × 10-7 4.538 × 10-7 5.288 × 10-7 4.594 × 10-7

-106.1 -99.27 -98.05 -82.29

6.165 × 10-8 7.442 × 10-8 8.049 × 10-8 1.449 × 10-7 1.882 × 10-7

-374.8 -370.5 -366.6 -357.0 -351.0

Argon

+ + + + +

8.45 × 10-16 1.02 × 10-15 3.02 × 10-14 1.16 × 10-13 4.07 × 10-13

+ + + -

+ + + + Nitrogen

Nitrogen 6 4 3 3 2

normality Oxygen

Argon

TC#6701 RMM4 DIPPR RMM3 RMM2


-3,223 -3204 -2891 -2765 -2649

+ + + + +

a RMM models refer to eq 7. DIPPR refers to eq DIPPR 100 from the DIPPR database.

a

RMM models refer to eq 7. DIPPR refers to eq DIPPR 102 from the DIPPR database.

argon, only values for the first three parameters are suggested, which implies that DIPPR leads to a quadratic model. RMM modeling, contrary to expectations, delivers superior performance for all models. Tables 17 and 18, respectively, display the goodness-of-fit and stability statistics. Figure 7 displays a demonstrative percentage error plot for argon, obtained by RMM2 (eq 7). Table 17 shows that, in terms of goodness-of-fit, the simple RMM2 outperforms DIPPR for all three data sets. Table 18 shows that stability statistics indicate multicollinearity for the TC model, a sure sign of overfitting. The simple RMM2 performs better than the DIPPR model, both in terms of the goodness-of-fit and stability, for all three data sets.

Figure 6. Percentage deviations, Δηvap = ηvap(expt.) - ηvap(calc.), of the nitrogen experimental vapor viscosity, ηvap(expt.), from calculated values, ηvap(calc.), using RMM4 (eq 7), as a function of temperature T/K.

Vapor Thermal Conductivity (VTC, κvap). The plot of the oxygen data is given in Figure S10 (Supporting Information). Analysis by TC on the original scale for oxygen showed that the leader model is (eq 6307 in TC)

$$\kappa_{vap} = a + bX + cX^2 + \dots + jX^9, \qquad X = \log(T)$$

Analysis of argon by TC, on the original scale, showed that the leader model is (eq 2156 in TC)

$$\kappa_{vap} = a + bT + c\log(T) + \frac{d}{\sqrt{T}}$$

Analysis of nitrogen by TC, on the original scale, showed that the leader model is (eq 6405 in TC)

Table 16. Stability Statistics for VV, ηvap TC parameter

value

DIPPR confidence interval width

RMM4 confidence interval width

value

value

confidence interval width

Oxygen a b c d PRESS

6.93 × 10-5 7.56 × 10-6 1.4 × 10-12 2.73 × 10-12 2.8 × 10-13 4.40 × 10-6 2.92 × 10-6 -3.37 × 10-5 3.692 × 10-14

1.176 × 10-6 0.5563 105.9

a b c d PRESS

5.51 × 10-5 0.35 × 10-5 1.13 × 10-7 3.55 × 10-6 4.07 × 10-9 1.69 × 10-7 1.33 × 10-6 -2.71 × 10-5 2.342 × 10-14

1.12 × 10-6 0.5808 110.8 -1505

1.86 × 10-7 0.02202 13.06

6.859 × 10-14

0.8806 -0.4228 0.2650 0.07919

0.1958 0.2995 0.1207 0.05699

6.403 × 10-14

Argon 6.53 × 10-8 7.188 × 10-3 9.789 697.2

0.6944 -0.2184 0.9661 0.03193

5.98 × 10-8 0.01216 10.51

0.4694 -0.2534 0.5055 0.1018

5.025 × 10-14

0.01034 0.05489 0.02120 0.01535

7.359 × 10-14

Nitrogen a b c d PRESS

6.27 × 10-7 0.6136 48.12 8.260 × 10-14

2.947 × 10-12

0.01116 0.03488 8.535 × 10-3 8.389 × 10-3

9.629 × 10-14

$$\kappa_{vap} = a + bX + \frac{c}{X} + dX^2 + \frac{e}{X^2} + fX^3 + \frac{g}{X^3} + hX^4 + \frac{i}{X^4}, \qquad X = \log(T)$$

DIPPR’s acceptable model is identical to that for VV (DIPPR 102). A three-parameter version of this model is proposed for oxygen and argon, and a four-parameter model is proposed for nitrogen. A comparison of the five models of TC, DIPPR, and RMM is given in Table 19 for the goodness-of-fit and in Table 20 for the stability. With regard to both dimensions, RMM4 outperforms the DIPPR model. Although parameters’ estimates for the selected TC models for oxygen and nitrogen are not shown in Table 20 (there are 10 for oxygen and 9 for nitrogen), stability checking indicates strong multicollinearity. The PRESS statistic is also missing for these models due to numerical complexity in using MATHEMATICA for such models.

Table 18. Stability Statistics for LTC, κliq

| substance | parameter | TC value | TC confidence interval width | DIPPR value | DIPPR confidence interval width | RMM2 value | RMM2 confidence interval width |
|---|---|---|---|---|---|---|---|
| Oxygen | a | 1.243 | 0.01499 | 0.2701 | 3.586 × 10^-3 | -0.1402 | 3.416 × 10^-3 |
| | b | -0.2276 | 2.87 × 10^-3 | -1.330 × 10^-3 | 3.39 × 10^-5 | 0.6345 | 0.07591 |
| | c | -56.96 | 1.644 | | | | |
| | d | 2.63 × 10^20 | 5.30 × 10^20 | | | | |
| | PRESS | 3.926 × 10^-6 | | 1.470 × 10^-4 | | 1.351 × 10^-4 | |
| Argon | a | | | 0.3036 | 0.04466 | -0.1669 | 7.65 × 10^-3 |
| | b | | | -2.440 × 10^-3 | 7.83 × 10^-4 | 0.3376 | 0.1643 |
| | c | | | 5.044 × 10^-6 | 3.42 × 10^-6 | | |
| | PRESS | | | 7.873 × 10^-6 | | 8.344 × 10^-6 | |
| Nitrogen | a | -1.776 | 0.05371 | 0.2622 | 2.861 × 10^-3 | -0.1135 | 1.739 × 10^-3 |
| | b | -5.64 × 10^-7 | 2.15 × 10^-8 | -1.645 × 10^-3 | 2.84 × 10^-5 | 0.5283 | 0.04737 |
| | c | 35.76 | 66.26 | | | | |
| | PRESS | 1.645 × 10^-6 | | 4.955 × 10^-6 | | 4.647 × 10^-6 | |

Figure 7. Percentage deviations, Δκliq = κliq(expt.) - κliq(calc.), of the argon experimental liquid thermal conductivity, κliq(expt.), from calculated values, κliq(calc.), using RMM2 (eq 7), as a function of temperature T/K.

Table 19. Goodness-of-Fit Statistics (MSE and AICc) for the Vapor Thermal Conductivity, κvap (a)

| substance | model | number of parameters | normality (b) | MSE | AICc |
|---|---|---|---|---|---|
| Oxygen | TC#6307 | 10 | | 1.040 × 10^-9 | -807.4 |
| | RMM4 | 4 | | 5.684 × 10^-8 | -659.8 |
| | DIPPR | 3 | | 7.896 × 10^-8 | -648.2 |
| | RMM3 | 3 | | 7.852 × 10^-7 | -556.3 |
| | RMM2 | 2 | | 1.017 × 10^-6 | -547.4 |
| Argon | TC#2156 | 4 | | 8.134 × 10^-10 | -577.6 |
| | RMM4 | 4 | | 4.714 × 10^-9 | -528.4 |
| | DIPPR | 3 | | 4.310 × 10^-8 | -468.3 |
| | RMM3 | 3 | | 1.063 × 10^-6 | -378.6 |
| | RMM2 | 2 | | 2.068 × 10^-5 | -297.1 |
| Nitrogen | TC#6405 | 9 | + | 1.762 × 10^-9 | -788.9 |
| | RMM4 | 4 | + | 1.943 × 10^-8 | -702.7 |
| | DIPPR | 4 | + | 1.152 × 10^-7 | -631.5 |
| | RMM3 | 3 | + | 1.115 × 10^-6 | -542.3 |
| | RMM2 | 2 | - | 1.217 × 10^-6 | -540.2 |

(a) RMM models refer to eq 7. DIPPR refers to eq DIPPR 102 from the DIPPR database.
(b) For oxygen and argon, only three and four normality marks (all +), respectively, are available for the five models, so they are not assigned to individual rows.

Summary Comparison and Practical Conclusions

Summary Comparison. Table 21 displays sample sizes for all four substances (including water) and for all properties (in the order in which they appear in DIPPR, identical to the order in Appendix A, Supporting Information). Table 22 displays models’ rankings, in terms of the goodness-of-fit and stability, as obtained by TC, DIPPR, and RMM (with its three versions). Entries denote 1 for the best model. Values for PRESS are occasionally missing for TC due to the large size of the model’s

parameter set (this occasionally made calculation of PRESS numerically prohibitive). Property 8 (IGHC) is not monotone convex throughout the whole range. Although RMM models occasionally delivered a good fit for this property, the lack of monotone convexity violates one of the basic assumptions of RMM. Property 8 was therefore excluded from the summary tables showing rankings across all properties (Tables 23 and 24). Table 23 is a frequency table showing rankings in terms of MSE and AICc (combined) for all substances across properties (property 8 was excluded for the reasons just explained). Table 24 similarly displays frequencies for the stability measure, PRESS. Comparing DIPPR’s acceptable models to models derived via RMM (which is the comparison of relevance, as explained in BSS6), we find that RMM fares favorably. For example, counting how often each approach ranks best or second best in terms of goodness-of-fit, TC appears 69 times, RMM 58 times, and DIPPR 25 times (Table 23). Likewise, in terms of stability, RMM appears 32 times, TC 24 times, and DIPPR 20 times (Table 24).

Practical Conclusions. (1) Observing the models offered by TC and the associated stability statistics, it seems that in many cases these models overfit the data, resulting in models with a number of parameters that is larger than required


Table 20. Stability Statistics for VTC, Kvap TC parameter

value

DIPPR confidence interval width

RMM4 confidence interval width

value

value

confidence interval width

Oxygen a b c d PRESS

0.9525 -0.06431 7288

0.3850 0.02756 1810 4.148 × 10-6

0.8080 0.1062 -0.07980 0.1412 0.5624 0.03323 -3 0.07779 1.840 · 10 3.309 × 10-6

Argon a b c d PRESS

-0.1120 2.493 × 10-3 1.49 × 10-7 1.49 × 10-5 0.01904 3.38 × 10-4 0.2904 0.01027 2.623 × 10-8

6.576 × 10-4 0.6178 75.29

8.86 × 10-5 0.01653 18.05

2.150 × 10-6

0.6831 -0.3615 1.478 0.06290

0.02075 0.09539 0.05313 0.01453

6.048 × 10-7

Nitrogen 3.684 × 10-4 1.03 × 10-4 0.7593 0.03469 33.57 41.48 -609.7 3087 -6 5.498 × 10

a b c d PRESS

Table 21. Sample Sizes Used for All Four Substances sample size no.

property

1. solid density (SD)
2. liquid density (LD, ρ)
3. solid vapor pressure (SVP)
4. vapor pressure (VP, P)
5. enthalpy of vaporization (EV, Hvap)
6. solid heat capacity (SHC)
7. liquid heat capacity (LHC, Cp)
8. ideal gas heat capacity (IGHC, Cgas^id)
9. second virial coefficient (SVC, B)
10. liquid viscosity (LV, ηliq)
11. vapor viscosity (VV, ηvap)
12. solid thermal conductivity (STC)
13. liquid thermal conductivity (LTC, κliq)
14. vapor thermal conductivity (VTC, κvap)
15. surface tension

oxygen argon nitrogen water 49 177 22 24 19 23 9 51 49 40

31 57 31

68 20

24 13 28 19 8 28

11 19 33 23 93 32 23 40

11 64 22 49 41 28 16 54 31 41 18 18 41

or justified, in view of the available data. Although the AICc statistic is supposed to guard against such overfitting, it is the research team’s feeling that this measure is not sufficient for the intended purpose, and stability checking is always very much desired. Here, the PRESS statistic, rarely used in chemical engineering research (as inspection of the related literature can attest), is very effective in identifying ill-conditioning. To see this, consider the modeling of nitrogen vapor pressure. Neither MSE nor AICc indicates ill-conditioning for the TC model (Table 3). However, the associated huge PRESS value (Table 4) clearly does. In this case, only for RMM2 do the statistics seem to indicate a stable model. As another example of TC’s tendency to offer overfitted models with too many parameters, consider oxygen liquid density. Here, TC offers a model with 15 parameters. Although stability statistics for this model are not shown, one would doubt that a model with such a high number of parameters is useful or can ever be considered stable.

(2) Once initial RMM modeling indicates a linear relationship in the available data, there is no justification to continue using RMM for parameter estimation. Since RMM4 includes the linear case as a special case, a good practical rule is to fit RMM4 first. If the parameter estimates, together with the associated confidence intervals, point toward a linear relationship (for example, b = d = 0 is indicated), then simple linear regression should be used to derive estimates for the linear model. Similar practices can be followed if any of the other special cases of the general

1.040 0.1639 0.6052 -0.1700

0.1054 0.05342 8.637 × 10-3 0.06841

2.729 × 10-6

RMM model are indicated by the parameter estimates and the respective confidence intervals.

(3) Since TC fits a different model to each property-substance combination while DIPPR attempts to deliver a uniform model for each property, it is natural to expect that in most cases TC would deliver a better goodness-of-fit. However, comparing DIPPR to RMM, the latter goes a step further in that it offers uniform models for all properties. As seen in Table 22, RMM’s general platform for modeling monotone convex relationships (and other types, like S-shaped relationships) is effective in both the goodness-of-fit and stability comparisons. The uniformity of practice offered by RMM and the variety of special cases that it offers (only a certain version, assuming a single error term, was used here) may qualify it to serve as a general purpose platform for empirical modeling of chemical relationships.

(4) The uniformity of the RMM approach extends to its parameter values. As seen from the RMM estimates in the various tables, most values are confined to the [-1, 1] range. This stands in stark contrast to the parameter estimates offered by the other approaches (TC and DIPPR). The practical advantage of this observation can hardly be overestimated. When a model is fit to the data, restricting the search routine to a uniform range for all parameters would almost surely guarantee a globally optimal solution (that is, globally best estimates). Such an assertion cannot be extended to platforms where different models are used either for each property or for each substance-property combination.

Conclusions

In this paper, results are presented from a comprehensive research study aimed at using RMM to model chemical relationships and comparing the resulting models to models obtained by other approaches.
A newly developed routine to select, in terms of goodness-of-fit criteria, the best empirical model out of a set of given models was used to select models provided by TC. The selected model then served as a baseline for further comparisons between models offered by DIPPR and RMM. Two dependent criteria were used for the goodness-of-fit (MSE and AICc), and two criteria were used for the stability (relative width of 95% confidence intervals of estimated parameters and PRESS). PRESS was found to be an effective measure for the lack of stability, particularly for models with a large number of parameters, where AICc was unable to detect overfitting.
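To make the goodness-of-fit and PRESS statistics concrete, the sketch below computes MSE, AICc, and PRESS for a straight-line fit. It is not the routine used in this study: the data are hypothetical, PRESS is obtained by brute-force leave-one-out refitting, MSE is taken as RSS/(n - k), and AICc is taken in one common least-squares form, n ln(RSS/n) + 2k + 2k(k + 1)/(n - k - 1), with k the number of fitted parameters (an assumption; conventions differ on whether the error variance is counted in k).

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def press(x, y):
    """Prediction error sum of squares via leave-one-out refitting."""
    total = 0.0
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (a + b * x[i])) ** 2
    return total

def aicc(rss, n, k):
    """Small-sample corrected AIC (assumed least-squares form)."""
    return n * math.log(rss / n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)

# Hypothetical data, roughly linear with small scatter.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

a, b = fit_line(x, y)
n, k = len(x), 2                       # k = 2 fitted parameters (a and b)
rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
mse = rss / (n - k)                    # assumed definition of MSE
press_value = press(x, y)

print(f"MSE   = {mse:.4f}")
print(f"AICc  = {aicc(rss, n, k):.2f}")
print(f"PRESS = {press_value:.4f}")
```

For a linear least-squares fit, each leave-one-out residual is at least as large in magnitude as the corresponding ordinary residual, so PRESS is never smaller than the residual sum of squares. A PRESS value far larger than RSS suggests that the fit depends heavily on individual observations.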


Table 22. Rankings of Models According to Goodness-of-Fit (MSE, AICc) and Stability (PRESS) Statisticsa oxygen property

measure

1. SD

p MSE AICc PRESS p MSE AICc PRESS p MSE AICc PRESS p MSE AICc PRESS p MSE AICc PRESS p MSE AICc PRESS p MSE AICc PRESS p MSE AICc PRESS p MSE AICc PRESS p MSE AICc PRESS p MSE AICc PRESS p MSE AICc PRESS p MSE AICc PRESS p MSE AICc PRESS

2. LD

3. SVP

4. VP

5. EV

6. SHC

7. LHC

8. IGHC

9. SVC

10. LV

11. VV

13. LTC

14. VTC

15. ST

a

TC

15 1 1

7 1 1 1 4 1 1 1

DIPPR

4 3 3 2

5 3 3 3 4 2 2 2

argon RMM

TC

DIPPR

nitrogen RMM

4 2 2 1

2 2 2 2 4 3 3 3

3 2 2 2 11 1 1

5 3 3 3 5 2 2

4 1 1 1 2 3 3

4 2 2 2 3 1 1 1 4 1 1 1 4 1 1 1 10 1 1

5 3 3 3 3 2 2 2 3 2 2 3 2 3 3 3 3 3 3 2

4 1 1 1 2 3 3 3 4 3 3 2 2 2 2 2 4 2 2 1

TC

11 1 1

5 1 1 1 2 1 1 1

4 2 2 2 4 1 1 1 4 1 1 1

4 1 1 1

5 3 3 2 2 3 3 3

5 3 3 3 5 3 3 3 4 2 2 2 3 2 2 1 3 3 3 3

2 2 2 3 2 2 2 2

4 1 1 1 2 2 2 2 4 3 3 3 2 1 1 2 4 2 2 2

7 1 1 3 9 2 2

DIPPR

4 3 3 1

5 3 3 2 4 1 1 1

water RMM

4 2 2 2

2 2 2 1 4 3 3 2

4 1 2 1 9 2 2

5 3 3 3 5 1 1

2 2 1 2

7 1 1 1 4 1 1 1 6 1 1 1 3 1 1 1 9 1 1

5 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 4 3 3 2

4 2 2 2 2 2 2 2 4 2 2 2 2 2 2 2 4 2 2 1

TC

DIPPR

RMM

2 1 1 1 16 1 1

2 1 1 1 5 2 2 1 5 2 2 2 5 3 3 2 5 2 2 1 2 1 1 1 5 2 3 3 5 2 2

2 1 1 1 4 3 3 2 3 3 3 3 4 2 2 1 4 3 3 2 2 1 1 1 4 1 1 1 3 3 3

5 1 1 1 4 2 2 3

5 3 3 1 3 3 3 3 4 3 3 2

4 2 2 2 3 2 2 2 3 1 1 1

5 1 1 1 5 3 3 3

4 3 3 3 4 2 2 2

3 2 2 2 4 1 1 1

3 1 1 1 9 1 1 21 1 1 2 1 1 1 4 3 2 2 12 1 1 8 1 1

1 is best. The p is the number of parameters. RMM relates to any of its models.
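The "best or second best" counts quoted in the Summary Comparison can be re-derived from the sum rows of Tables 23 and 24. The short sketch below does this arithmetic; the dictionary layout and the names are illustrative only, while the frequencies are those reported in the two tables.

```python
# Rank frequencies from the "sum" rows of Table 23 (goodness-of-fit)
# and Table 24 (stability, PRESS); keys 1, 2, 3 are the ranks.
goodness_of_fit = {
    "TC": {1: 57, 2: 12, 3: 3},
    "DIPPR": {1: 6, 2: 19, 3: 49},
    "RMM": {1: 19, 2: 39, 3: 16},
}
stability = {
    "TC": {1: 20, 2: 4, 3: 3},
    "DIPPR": {1: 8, 2: 12, 3: 17},
    "RMM": {1: 13, 2: 19, 3: 5},
}

def best_or_second(freq):
    """Number of times each approach is ranked first or second."""
    return {approach: ranks[1] + ranks[2] for approach, ranks in freq.items()}

gof_counts = best_or_second(goodness_of_fit)
stab_counts = best_or_second(stability)
print(gof_counts)   # {'TC': 69, 'DIPPR': 25, 'RMM': 58}
print(stab_counts)  # {'TC': 24, 'DIPPR': 20, 'RMM': 32}
```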

As expounded and demonstrated in the Practical Conclusions subsection and throughout this paper, the RMM approach provides four highly desirable modeling features. (1) Uniformity of modeling, in that a single origin model (or some basic variations thereof, as used in this paper) provides a platform for modeling different properties belonging to different substances. (2) Model stability, resulting from the small number of parameters associated with RMM models. (3) Sound theoretical basis, in that RMM demonstrably represents monotone convexity as a continuous spectrum rather than as a discrete-valued ladder of functions. This enhances the probability that RMM fits monotone convex data well. (4) RMM is a truly empirical modeling approach, in that the final structure of the model (belonging to the ladder) is determined by the data rather than in advance. This property is inherent to any RMM modeling that uses the RMM origin model, and it was amply demonstrated earlier when RMM was introduced. (Note that this property was somewhat obscured in the above data analyses since the three models used were simplified spin-offs (variations) of the origin model; the reader is referred to ref 1 for further details.)

Table 23. Goodness-of-Fit (MSE plus AICc) Frequency Table of the Rankings of Models Across All Properties^a

| substance | approach | rank 1 | rank 2 | rank 3 | missing | sum |
|---|---|---|---|---|---|---|
| oxygen | TC | 14 | 4 | 0 | 8 | 26 |
| oxygen | DIPPR | 0 | 6 | 12 | 8 | 26 |
| oxygen | RMM | 4 | 8 | 6 | 8 | 26 |
| argon | TC | 10 | 2 | 0 | 14 | 26 |
| argon | DIPPR | 0 | 4 | 10 | 12 | 26 |
| argon | RMM | 4 | 8 | 2 | 12 | 26 |
| nitrogen | TC | 15 | 3 | 0 | 8 | 26 |
| nitrogen | DIPPR | 2 | 0 | 16 | 8 | 26 |
| nitrogen | RMM | 1 | 15 | 2 | 8 | 26 |
| water | TC | 18 | 3 | 3 | 2 | 26 |
| water | DIPPR | 4 | 9 | 11 | 2 | 26 |
| water | RMM | 10 | 8 | 6 | 2 | 26 |
| sum | TC | 57 | 12 | 3 | 32 | 104 |
| sum | DIPPR | 6 | 19 | 49 | 30 | 104 |
| sum | RMM | 19 | 39 | 16 | 30 | 104 |

^a 1 is best. Property 8 is excluded.

Table 24. Stability (PRESS) Frequency Table of Rankings of Models Across All Properties^a

| substance | approach | rank 1 | rank 2 | rank 3 | missing | sum |
|---|---|---|---|---|---|---|
| oxygen | TC | 5 | 2 | 0 | 6 | 13 |
| oxygen | DIPPR | 0 | 4 | 5 | 4 | 13 |
| oxygen | RMM | 4 | 3 | 2 | 4 | 13 |
| argon | TC | 5 | 1 | 0 | 7 | 13 |
| argon | DIPPR | 1 | 2 | 4 | 6 | 13 |
| argon | RMM | 1 | 4 | 2 | 6 | 13 |
| nitrogen | TC | 5 | 0 | 1 | 7 | 13 |
| nitrogen | DIPPR | 2 | 2 | 5 | 4 | 13 |
| nitrogen | RMM | 2 | 7 | 0 | 4 | 13 |
| water | TC | 5 | 1 | 2 | 5 | 13 |
| water | DIPPR | 5 | 4 | 3 | 1 | 13 |
| water | RMM | 6 | 5 | 1 | 1 | 13 |
| sum | TC | 20 | 4 | 3 | 25 | 52 |
| sum | DIPPR | 8 | 12 | 17 | 15 | 52 |
| sum | RMM | 13 | 19 | 5 | 15 | 52 |

^a 1 is best. Property 8 is excluded.

With regard to the empirical modeling property, it seems that RMM remains unmatched by any other existing modeling methodology. A recent paper5 extends application of RMM to predict temperature-dependent properties of pure compounds using molecular descriptors. This modeling is conducted in the framework of the structure-structure correlation (S-S-C) approach.

Acknowledgment

DIPPR 801 is a project of the Design Institute for Physical Properties, sponsored by AIChE. MATHEMATICA is a registered trademark of Wolfram Research. TableCurve2D is a registered trademark of SYSTAT.

Supporting Information Available: Appendix A. Information about data sets used to analyze properties for oxygen, argon, and nitrogen. Appendix B. Data sources and DIPPR classification of the data quality for the analyzed properties of oxygen, argon, and nitrogen. Table 1. Data for the oxygen enthalpy of vaporization, EV (Hvap/J · kmol^-1). Figure S1. Plot of data for the nitrogen liquid density (kmol · m^-3). Figure S2. Plot of data for the oxygen vapor pressure (MPa). Figure S3. Plot of data for the oxygen enthalpy of vaporization (J/kmol). Figure S4. Plot of data for the oxygen liquid heat capacity [J/(kmol · K)]. Figure S5. Plot of data for the oxygen ideal gas heat capacity [J/(kmol · K)]. Figure S6. Plot of data for the oxygen second virial coefficient (m^3/kmol). Figure S7. Plot of data for the nitrogen liquid viscosity (Pa · s). Figure S8. Plot of data for the oxygen vapor viscosity (Pa · s). Figure S9. Plot of data for the oxygen liquid thermal conductivity [W/(m · K)]. Figure S10. Plot of data for the oxygen vapor thermal conductivity [W/(m · K)]. This material is available free of charge via the Internet at http://pubs.acs.org.

Literature Cited

(1) Shore, H. Response Modeling Methodology (RMM): Empirical Modeling for Engineering and Science; World Scientific Publishing Co. Ltd.: Singapore, 2005.
(2) Shore, H. Response Modeling Methodology (RMM) - A new approach to model a chemo-response for a monotone convex/concave relationship. Comput. Chem. Eng. 2003, 27 (5), 715–726.
(3) Shore, H. The Random Fatigue Life Model as a special case of the RMM model - A comment on Pascual. Commun. Statistics 2004, 33 (2), 537–539.
(4) Shore, H.; Benson-Karhi, D. Forecasting S-shaped diffusion processes via response modeling methodology. J. Oper. Res. Soc. 2007, 58 (6), 720–729.
(5) Shacham, M.; Brauner, N.; Shore, H.; Benson-Karhi, D. Predicting temperature-dependent properties by correlations based on similarity of molecular structures: application to liquid density. Ind. Eng. Chem. Res. 2008, 47 (13), 4496–4504.
(6) Benson-Karhi, D.; Shore, H.; Shacham, M. Modeling temperature-dependent properties of water via Response Modeling Methodology (RMM) and comparison with acceptable models. Ind. Eng. Chem. Res. 2007, 46 (10), 3446–3463.
(7) Burnham, K. P.; Anderson, D. R. Model Selection and Multimodel Inference; Springer-Verlag: New York, 2004.
(8) Daubert, T. E. Evaluated equation forms for correlating thermodynamic and transport properties with temperature. Ind. Eng. Chem. Res. 1998, 37, 3260–3267.
(9) Shore, H. Response Modeling Methodology (RMM) - Validating evidence from engineering and the sciences. Qual. Reliab. Eng. Int. 2004, 20, 61–79.
(10) Shore, H.; A’wad, F. Statistical comparison of the goodness-of-fit delivered by five families of distributions used in distribution fitting. Commun. Statistics 2010, 39 (10), 1707–1728.
(11) Michael, J. R. The stabilized probability plot. Biometrika 1983, 70 (1), 11–17.
(12) Hurvich, C. M.; Tsai, C.-L. Regression and time series model selection in small samples. Biometrika 1989, 76, 297–307.
(13) Myers, R. H.; Montgomery, D. C. Response Surface Methodology; Wiley: New York, 2002.
(14) Wagner, W. New vapor pressure measurements for argon and nitrogen and a new method for establishing rational vapor pressure equations. Cryogenics 1973, 13, 470–482.
(15) Wagner, W.; Ewers, J.; Pentermann, W. New vapor pressure measurements and new rational vapour-pressure equation. J. Chem. Thermodyn. 1976, 8, 1049–1060.

Received for review April 30, 2010. Revised manuscript received July 3, 2010. Accepted July 20, 2010. IE100981Y