PVT Correlations of Indian Crude Using Support Vector Regression

Energy Fuels 2009, 23, 5483–5490 Published on Web 09/01/2009

DOI: 10.1021/ef900518f

Sarit Dutta and J. P. Gupta*
Department of Chemical Engineering, Indian Institute of Technology, Kanpur-208016, U. P., India

Received May 24, 2009. Revised Manuscript Received August 6, 2009

Correlations for bubble point pressure, solution gas-oil ratio, oil formation volume factor (for both saturated and undersaturated crude), and viscosity (for both saturated and undersaturated crude) have been developed for Indian crude using support vector regression (SVR). Detailed comparisons have been made with several widely used correlations available in the literature. A radial basis function (RBF) kernel was used together with the ε-insensitive loss function to develop the SVR models. The model hyperparameters were optimized using a combination of grid search and the Nelder-Mead simplex algorithm. The quadratic programming (QP) problem resulting from the SVR formulation was solved using sequential minimal optimization (SMO). The developed models outperformed most existing correlations, giving significantly lower values of average absolute relative error for the properties studied. The results are favorable enough that the models could be integrated into most reservoir modeling software.

Introduction

Since Katz1 developed five methods for predicting crude oil shrinkage in 1942, a considerable volume of literature has accumulated on PVT correlations of crude oil. Various correlations have been proposed by researchers from all over the world, with varying degrees of accuracy in terms of average error, and based on crude samples from different oil fields. Most of these correlations were developed empirically using graphical or regression methods. A comprehensive review of various empirical correlations is provided by Sutton.2 In the late 1990s, rapid growth of data mining techniques and computational capability led some researchers3-6 to use advanced soft computing tools, especially artificial neural networks (ANN), for the development of PVT correlations. Elsharkawy and Gharbi7 compared viscosity correlations developed using classical regression methods with those developed using ANN and concluded that the latter performed significantly better. Al-Marhoun and Osman8 presented correlations for bubble point pressure and saturated oil formation volume factor (OFVF) for Saudi Arabian crudes using a multilayer perceptron (MLP) trained by backpropagation with early stopping. This study showed a significant increase in accuracy over Al-Marhoun's previous correlations,9,10 also developed for Saudi crude. Osman et al.11 also used ANN (but with a different training algorithm) for OFVF at bubble point pressure based on crudes from the Middle East, Malaysia, Colombia, and Gulf of Mexico fields.

Computing techniques other than ANN were also used. Malallah et al.12 used a graphical alternating conditional expectation method for bubble point pressure and OFVF. El-Sebakhy13 used support vector regression (SVR) to generate correlations for bubble point pressure and saturated OFVF from three different published PVT databases, and reported that these outperformed empirical and neural network models. These studies indicate that correlations based on data mining techniques are more accurate than empirical correlations. However, most of these correlations were found to be appropriate for the specific region where the parameters were measured, but not for other regions. The unavailability of a single universal correlation suited for all kinds of crudes underlines the need for correlations specific to a geographical area, as discussed by Hanafy et al.14 with reference to Egyptian crudes. Among various data mining techniques, SVR has recently received much attention from researchers in different fields as a better alternative to ANN.15-18

*To whom correspondence should be addressed. E-mail: jpg@iitk.ac.in.

(1) Katz, D. L. Drill. Prod. Prac., API 1942, 137–147.
(2) Sutton, R. P. In Petroleum Engineering Handbook; SPE: Richardson, TX, 2006; Vol. 1, pp 256–331.
(3) Gharbi, R. Energ. Fuels 1997, 11, 372–378.
(4) Elsharkawy, A. M. Modeling of properties of crude oil and gas systems using RBF network. SPE Paper 49961, SPE Asia Pacific Oil and Gas Conference and Exhibition, Perth, Australia, October 12-14, 1998.
(5) Gharbi, R. B.; Elsharkawy, A. M.; Kartoub, M. Energ. Fuels 1999, 13, 454–458.
(6) Gharbi, R.; Elsharkawy, A. M. SPE Reserv. Eval. Eng. 1999, 2, 255–265.
(7) Elsharkawy, A. M.; Gharbi, R. Adv. Eng. Softw. 2001, 32, 215–224.
(8) Al-Marhoun, M. A.; Osman, E. A. Using artificial neural networks to develop new PVT correlations for Saudi crude oils. SPE Paper 78592, Abu Dhabi International Petroleum Exhibition and Conference, Abu Dhabi, UAE, October 13-16, 2002.
(9) Al-Marhoun, M. A. J. Petrol. Technol. 1988, 40, 650–666.
(10) Al-Marhoun, M. A. J. Can. Petrol. Technol. 1992, 31, 22–26.
(11) Osman, E. A.; Abdel-Wahhab, O. A.; Al-Marhoun, M. A. Prediction of oil PVT properties using neural networks. SPE Paper 68233, Middle East Oil Show, Bahrain, March 17-20, 2001.
(12) Malallah, A. M.; Gharbi, R.; Algharaib, M. Energ. Fuels 2006, 20, 688–698.
(13) El-Sebakhy, E. J. Petrol. Sci. Eng. 2009, 64, 25–34.
(14) Hanafy, H.; Macary, S. M.; ElNady, Y. M.; Bayomi, A. A.; El Batanony, M. H. Empirical PVT correlations applied to Egyptian crude oils exemplify significance of using regional correlations. SPE Paper 37295, International Symposium on Oilfield Chemistry, Houston, TX, February 18-21, 1997.
(15) Yu, P. S.; Chen, S. T.; Chang, I. F. J. Hydrol. 2006, 328, 704–716.
(16) Gandhi, A. B.; Joshi, J. B.; Jayaraman, V. K.; Kulkarni, B. D. Chem. Eng. Sci. 2007, 62, 7078–7089.
(17) Rajasekaran, S.; Gayathri, S.; Lee, T. Ocean Eng. 2008, 35, 1578–1587.
(18) Zheng, L. G.; Zhou, H.; Cen, K. F.; Wang, C. L. Expert Syst. Appl. 2009, 36, 2780–2793.

© 2009 American Chemical Society

Historically, ANN was developed heuristically,


that is, the theory for ANN developed following extensive trials and experimentation, whereas SVR was developed the other way round. ANN models are prone to local minima problems, which are largely absent in SVR models.19 Unlike ANN, SVR automatically selects the model size by selecting the support vectors. Moreover, SVR gives a sparse solution, which makes the algorithm computationally less intensive than ANN. These advantages often lead to SVR models outperforming ANN models in practice. The current study aims to present correlations for the PVT behavior of Indian crude, based on samples from various onshore oil fields in western India, especially Gujarat, using SVR.

(19) Cristianini, N.; Shawe-Taylor, J. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods; Cambridge University Press: 2000.
(20) Smola, A. J.; Scholkopf, B. Stat. Comput. 2004, 14, 199–222.
(21) Vapnik, V. N. The Nature of Statistical Learning Theory, 2nd ed.; Springer: New York, 2000.
(22) Bishop, C. M. Pattern Recognition and Machine Learning; Springer: New York, 2006.

Methodology of SVR

The methodology of SVR is discussed in detail by Cristianini and Shawe-Taylor19 and Smola and Scholkopf,20 and only a brief outline is provided below. Suppose there are N instances $x_1, \ldots, x_N$ and corresponding labels $t_1, \ldots, t_N$, where $x_i \in \mathbb{R}^N$ and $t_i \in \mathbb{R}$. It is required to determine the function mapping the instances to the labels. A linear regression model can be expressed as a linear combination of fixed basis functions $\phi(x)$. Hence,

$$ y(x) = w^T \phi(x) + b \tag{1} $$

where $\phi = (\phi_1, \phi_2, \ldots, \phi_m)^T$ is the feature vector. The adaptive parameters w and b can be determined by minimizing the regularized error function $\tilde{E}$ with respect to w and b

$$ \tilde{E} = \beta E_D + \frac{\rho}{2} w^T w \tag{2} $$

where $E_D$ is the sum of the squares of errors over all the instances of the estimation set, and β and ρ are weighting factors. However, because of lack of sparseness, this solution is computationally very intensive. To obtain a sparse solution, Vapnik21 proposed minimizing a different form of error function, which can be obtained by replacing the sum of squared error term in eq 2 by:

$$ E_\varepsilon(y(x) - t) = \begin{cases} 0, & \text{if } |y(x) - t| < \varepsilon \\ |y(x) - t| - \varepsilon, & \text{otherwise} \end{cases} \tag{3} $$

where ε is some predefined value. Replacing the squared error term in eq 2 with eq 3, we have:

$$ \tilde{E}_\varepsilon = \sum_{i=1}^{N} E_\varepsilon(y(x_i) - t_i) + \frac{\rho}{2} w^T w = C \sum_{i=1}^{N} E_\varepsilon(y(x_i) - t_i) + \frac{1}{2} w^T w \tag{4} $$

where, by convention, the inverse of the coefficient of the regularization term appears in front of the error term as the constant C. The two forms differ only by a positive scale factor, so they have the same minimizer. $\tilde{E}_\varepsilon$ is called the ε-insensitive error. This function neglects all deviations $|y_i - t_i| < \varepsilon$ and penalizes larger ones. The constant C > 0 determines the trade-off between the regularization term and the amount up to which deviations greater than ε are tolerated. Two slack variables $\xi_i \ge 0$ and $\xi_i^* \ge 0$ are now introduced for each training instance $x_i$, so that the corresponding label $t_i$ can be expressed as:

$$ t_i \le y_i + \varepsilon + \xi_i, \qquad t_i \ge y_i - \varepsilon - \xi_i^* \tag{5} $$

where $y_i = y(x_i)$, equality holds for $t_i$ outside the ε-insensitive tube, and inequality holds for $t_i$ inside the tube. Substituting $t_i$ from eq 5 into eq 4 and expressing the latter in terms of the slack variables:

$$ \tilde{E}_\varepsilon = C \sum_{i=1}^{N} (\xi_i + \xi_i^*) + \frac{1}{2} w^T w \tag{6} $$

Minimizing eq 6 is a quadratic programming (QP) problem that can be expressed as:

$$ \text{minimize} \quad C \sum_{i=1}^{N} (\xi_i + \xi_i^*) + \frac{1}{2} w^T w \qquad \text{subject to} \quad \begin{cases} t_i - y_i \le \varepsilon + \xi_i \\ y_i - t_i \le \varepsilon + \xi_i^* \\ \xi_i, \xi_i^* \ge 0 \end{cases} \tag{7} $$

The QP problem above can be solved more easily in its dual form.20 On dualizing using Lagrange multipliers, the problem can be expressed as:

$$ \text{maximize} \quad -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \langle \phi(x_i), \phi(x_j) \rangle - \varepsilon \sum_{i=1}^{N} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{N} t_i (\alpha_i - \alpha_i^*) \qquad \text{subject to} \quad \begin{cases} \sum_{i=1}^{N} (\alpha_i - \alpha_i^*) = 0 \\ \alpha_i, \alpha_i^* \in [0, C] \end{cases} \tag{8} $$

where $\langle \cdot, \cdot \rangle$ denotes the inner product, and $\alpha_i$, $\alpha_i^*$, $\eta_i$, and $\eta_i^*$ are the Lagrange multipliers. The transformation from the primal to the dual problem is given in Smola and Scholkopf20 and Bishop.22 It can be seen from eq 8 that the problem depends on the feature vectors $\phi(x)$ only through their inner products. If a function $k(x, x') = \langle \phi(x), \phi(x') \rangle$ exists, then eq 8 can be written as:

$$ \text{maximize} \quad -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) k(x_i, x_j) - \varepsilon \sum_{i=1}^{N} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{N} t_i (\alpha_i - \alpha_i^*) \qquad \text{subject to} \quad \begin{cases} \sum_{i=1}^{N} (\alpha_i - \alpha_i^*) = 0 \\ \alpha_i, \alpha_i^* \in [0, C] \end{cases} \tag{9} $$

Such a function $k(x, x')$ is called the kernel function, which allows determination of $\langle \phi(x_i), \phi(x_j) \rangle$ without explicitly computing $\phi(x)$. Each kernel has some constants, collectively called hyperparameters, which are selected by the user. The solution of eq 9 is given by:

$$ y(x) = \sum_{i=1}^{N} (\alpha_i - \alpha_i^*) k(x, x_i) + b \tag{10} $$
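As a rough illustration of the ε-insensitive error in eqs 3 and 4, the sketch below evaluates the loss for a few residuals and assembles the second form of eq 4. The function names and the numerical values are illustrative assumptions and are not taken from the study.

```python
def eps_insensitive(residual, eps):
    """Eq 3: zero inside the eps-tube, linear outside it."""
    return max(0.0, abs(residual) - eps)


def eps_insensitive_error(predictions, labels, weights, C, eps):
    """Second form of eq 4: C * sum of eps-insensitive losses + 0.5 * w.w"""
    data_term = sum(eps_insensitive(y - t, eps) for y, t in zip(predictions, labels))
    reg_term = 0.5 * sum(w * w for w in weights)
    return C * data_term + reg_term


# Hypothetical toy values (not from the paper)
predictions = [1.0, 2.5, 4.0]
labels = [1.05, 2.0, 4.0]
weights = [1.0, 2.0]

# Only the second residual (0.5) falls outside the tube of half-width 0.1
print([eps_insensitive(y - t, 0.1) for y, t in zip(predictions, labels)])
print(eps_insensitive_error(predictions, labels, weights, C=10.0, eps=0.1))
```

Residuals smaller than ε contribute nothing to the error, which is what allows many training points to drop out of the final model.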


Table 1. Properties of Indian Crude

| parameter | minimum | maximum | mean | std. deviation |
| --- | --- | --- | --- | --- |
| reservoir pressure (psia) | 1286.36 | 5940 | 3074.17 | 1252.24 |
| reservoir temperature (°F) | 144 | 298 | 224 | 44.64 |
| solution GOR (scf/stb) | 10 | 1998.91 | 554.21 | 428.64 |
| gas gravity (air = 1) | 0.63 | 1.54 | 0.994 | 0.173 |
| API gravity (°API) | 15 | 45 | 36.33 | 8.45 |
| bubble point pressure (psia) | 100 | 4338 | 1557.65 | 856.43 |
| saturated OFVF (rb/stb) | 1.00 | 1.99 | 1.31 | 0.234 |
| undersaturated OFVF (rb/stb) | 1.035 | 1.957 | 1.372 | 0.248 |
| dead oil viscosity (cP) | 0.329 | 25 | 1.556 | 2.586 |
| saturated oil viscosity (cP) | 0.1 | 15.5 | 0.969 | 1.790 |
| undersaturated oil viscosity (cP) | 0.08 | 27.77 | 1.426 | 3.862 |

Table 2. Details of SVR Models and Input Variables

| parameter | input variables | complete data set size | training set size | test set size | no. of support vectors |
| --- | --- | --- | --- | --- | --- |
| bubble point pressure | T, Rs, γg, γo | 372 | 298 | 74 | 273 |
| solution GOR | T, Pb, γg, γo | 372 | 298 | 74 | 288 |
| saturated OFVF | T, Rs, γg, γo | 530 | 424 | 106 | 405 |
| undersaturated OFVF | T, P, Pb, γg, γo, Bob, Rsb | 263 | 211 | 52 | 182 |
| saturated oil viscosity | Pb, μod, γg, γo | 435 | 348 | 87 | 304 |
| undersaturated oil viscosity | P, Pb, μob | 252 | 202 | 50 | 74 |
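The set sizes in Table 2 correspond to holding out roughly 20% of each data set, and the text states that the training data were normalized to lie between zero and one. A minimal sketch of both steps follows. It assumes a random shuffle, integer division by five for the test size (which happens to reproduce the sizes in Table 2), and reuse of the training-set range when scaling other data; none of these details are stated by the authors.

```python
import random


def split_sizes(n_total):
    """Return (n_train, n_test) with about 20% held out (integer division by 5)."""
    n_test = n_total // 5
    return n_total - n_test, n_test


def train_test_split(rows, seed=0):
    """Shuffle a copy of rows and hold out about 20% as the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train, _ = split_sizes(len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def fit_min_max(train_rows):
    """Column-wise minima and maxima of the training rows."""
    columns = list(zip(*train_rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows, lo, hi):
    """Map each column to [0, 1] using the given range (constant columns map to 0)."""
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


# Complete data set sizes from Table 2
for n in (372, 530, 263, 435, 252):
    print(n, split_sizes(n))
```

Fitting the range on the training rows only keeps information from the test set out of the model, which is one common choice rather than a documented step of this study.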

and

$$ w = \sum_{i=1}^{N} (\alpha_i - \alpha_i^*) \phi(x_i) \tag{11} $$

Equation 10 is called the support vector expansion. As shown in Smola and Scholkopf,20 the Lagrange multipliers are nonzero only for those $t_i$ which are outside the ε-tube, so that the coefficient $(\alpha_i - \alpha_i^*)$ of $\phi(x_i)$ vanishes for all $x_i$ inside the ε-tube. All $x_i$ with nonvanishing coefficients are called support vectors. Using the Karush-Kuhn-Tucker conditions, which state that at the point of solution the product of the dual variables and the constraints should vanish, b can be computed:22

$$ b = t_i - \varepsilon - w^T \phi(x_i) = t_i - \varepsilon - \sum_{j=1}^{N} (\alpha_j - \alpha_j^*) k(x_i, x_j) \tag{12} $$

Development of SVR Models

Data Acquisition. Data used for this study were provided by the Institute of Reservoir Studies, Ahmedabad, a unit of Oil and Natural Gas Corporation (ONGC), India, and were obtained by analysis of several bottom hole samples from various onshore oil fields in Western India, especially Gujarat. Each data set was checked for missing data, and points with missing values were rejected. This left 372 data sets for bubble point pressure and GOR, 530 for saturated OFVF, 263 for undersaturated OFVF, 435 for saturated oil viscosity, and 252 for undersaturated oil viscosity. Statistical analysis of different crude oil properties is shown in Table 1. Six SVR models were developed, one for each of the PVT properties listed in Table 2. Approximately 20% of the data was used as the test set and the rest as the training set. For each property, the sizes of the test and training sets, number of support vectors, and input variables are listed in Table 2. Training data were preprocessed by normalization so that all values lie between zero and one. Developing the model comprises choosing a kernel, fixing values of ε and C, and solving the QP problem in eq 8. The LIBSVM package by Chang and Lin23 was used to solve the QP problem. LIBSVM uses a fast and efficient implementation of the widely used sequential minimal optimization (SMO) method24,25 for solving large QP problems. This package was also used for testing the performance of the models on data not used during training.

Choosing the Kernel and Its Hyperparameters. There are four commonly used kernels: linear, radial basis function (RBF), polynomial, and sigmoidal. Linear kernels are preferred when the number of features is very large and nonlinear mapping may not significantly improve performance; for fewer features, nonlinear mapping gives better results.26 Although RBF, polynomial, and sigmoidal kernels all provide nonlinear mapping, under some parameters21 the sigmoidal kernel is not valid (i.e., not the inner product of two vectors). Polynomial kernels have more hyperparameters, and the kernel values can range from zero to infinity, which may cause numerical difficulties. In contrast, RBF kernels involve determination of only two hyperparameters, and the kernel value ranges from zero to one, so they are simpler to handle. On the basis of these recommendations, the RBF kernel was chosen for all the SVR models developed. The RBF kernel is given by the following equation:

$$ k(x_i, x_j) = \exp(-\lambda \| x_i - x_j \|^2) \tag{13} $$
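To make eq 13 and the support vector expansion of eq 10 concrete, the following sketch evaluates the RBF kernel and a prediction for a hand-made set of support vectors. The support vectors, coefficients, bias, and λ used here are hypothetical values chosen for illustration, not fitted parameters from this study.

```python
import math


def rbf_kernel(x_i, x_j, lam):
    """RBF kernel of eq 13: exp(-lam * ||x_i - x_j||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-lam * sq_dist)


def svr_predict(x, support_vectors, coeffs, b, lam):
    """Support vector expansion of eq 10; coeffs[i] stands for (alpha_i - alpha_i^*)."""
    return sum(c * rbf_kernel(x, sv, lam) for c, sv in zip(coeffs, support_vectors)) + b


# Hypothetical toy model (not taken from the paper)
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
coeffs = [0.5, -0.25]

print(rbf_kernel([0.2, 0.4], [0.2, 0.4], lam=2.0))  # identical points give the maximum value
print(svr_predict([0.0, 0.0], support_vectors, coeffs, b=0.1, lam=2.0))
```

Because the kernel value always lies between zero and one, inputs that have been normalized to the unit interval keep every term of the sum on a comparable scale.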

For RBF-kernel-based SVR models, three quantities need to be fixed before building the model: the kernel parameter λ > 0, the penalty factor C > 0, and ε ≥ 0. There is no clearly defined method for choosing the hyperparameters, and it is currently an active area of research in learning theory. A review of various existing methods has been provided by Cherkassky and Ma.27 The parameter C determines the trade-off between the model complexity and the degree to which deviations larger than ε are tolerated (eq 6). If C is too large (infinity), then the objective is to minimize the ε-insensitive error only, without regard to the model complexity part of the optimization formulation, whereas a very low value of C serves only to increase the flatness of the model. The parameter ε controls the width of the ε-insensitive zone used to fit the training data. A very high value of ε can considerably reduce the number of support vectors used to construct the regression function, leading to higher error, whereas a very low value will increase the number of support vectors, thereby increasing model complexity and reducing sparseness of the model. The kernel parameter λ > 0 controls the width of the Gaussian; with the form in eq 13, λ is inversely proportional to its variance.

The problem can be viewed as a constrained optimization problem (note that the hyperparameters are not bounded; rather, they have inequality constraints), where one seeks the hyperparameter values that minimize the k-fold cross-validation error for the training set. Following Hsu et al.,26 a simple grid search method was used to determine optimum values of the hyperparameters. For a user-defined value of ε, first a coarse grid search is conducted, which gives an idea from the contour plot of where the minimum may lie; then a finer grid search is performed in the region showing lower values of the objective function. In addition, the Nelder-Mead simplex algorithm was used to pinpoint the minimum.

An example of this technique is given here for the case of bubble point pressure. ε was varied from $10^{-1}$ to $10^{-5}$. For each value of ε, a coarse grid search was performed to find the minimum of the 10-fold cross-validation error in the range of $2^{-5}$ to $2^{10}$ for C, and $2^{-15}$ to $2^{5}$ for λ, with exponentially increasing grid size. An exponentially increasing grid size was chosen because it allows sampling of the function over a large area. The contour plot for the coarse grid search for bubble point pressure is shown in Figure 1a. From the figure, it can be seen that the minimum lies within the contour of function value 10. Next, a fine grid search was performed (Figure 1b) within the region identified from Figure 1a. Following this, a point was chosen from within the lowest contour to serve as the initial point for the simplex algorithm. Various initial points were chosen, and the one giving the lowest cross-validation error was taken as the final value. The simplex minimization was performed using MATLAB Optimization Toolbox 3.0.1. The variation of Eaar and the number of support vectors with ε is shown in Figure 2. As can be seen from the figure, in the case of bubble point pressure for ε ≤ $10^{-2.7}$, the error and number of support vectors level off. Hence, ε = $10^{-2.7}$ and the corresponding values of C (= 73.6143) and λ (= 2.6854) were chosen for building the SVR model. A similar technique was used to determine the hyperparameters for the other models, the values of which are shown in Table 3.

Figure 1. Grid search for SVR model for bubble point pressure.

Figure 2. Variation of Eaar and number of support vectors with ε for bubble point pressure model.

(23) Chang, C. C.; Lin, C. J. LIBSVM: A Library for Support Vector Machines; 2001. Available at: http://www.csie.ntu.edu.tw/~cjlin.
(24) Platt, J. C. In Advances in Kernel Methods - Support Vector Learning; Scholkopf, B., Burges, C. J. C., Smola, A. J., Eds.; MIT Press: Cambridge, MA, 1998; Chapter Fast Training of Support Vector Machines Using Sequential Minimal Optimization.
(25) Joachims, T. In Advances in Kernel Methods - Support Vector Learning; Scholkopf, B., Burges, C. J. C., Smola, A. J., Eds.; MIT Press: Cambridge, MA, 1998; Chapter Making Large-Scale SVM Learning Practical.
(26) Hsu, C.; Chang, C. C.; Lin, C. J. A Practical Guide to Support Vector Classification; Technical Report, 2008.
(27) Cherkassky, V.; Ma, Y. Neural Networks 2004, 17, 113–126.

Results and Discussions

Statistical analysis was performed on the developed SVR models to check their performance for data not used in building the model. The statistical measures used were average absolute relative error (Eaar), maximum (Emax) and minimum (Emin) absolute relative error, correlation coefficient (r2), and


Table 3. Parameters of RBF Kernel for Different SVR Models

| PVT property | C | λ | ε |
| --- | --- | --- | --- |
| bubble point pressure | 73.0852 | 2.6854 | $10^{-2.7}$ |
| sol. GOR | 7.6143 | 1.1227 | $10^{-2.9}$ |
| saturated OFVF | 32.3703 | 2.5856 | $10^{-3}$ |
| undersaturated OFVF | 47.1000 | 0.2312 | $10^{-3.3}$ |
| saturated oil viscosity | 65.2094 | 0.5004 | $10^{-3.8}$ |
| undersaturated oil viscosity | 20.4551 | 0.0513 | $10^{-4}$ |

Table 4. Statistical Analysis of SVR Model for Bubble Point Pressure

| data | Eaar (%) | Emax (%) | Emin (%) | r2 (%) | σare |
| --- | --- | --- | --- | --- | --- |
| training set | 1.848 | 33.633 | 0.019 | 99.75 | 3.465 |
| test set | 4.859 | 48.844 | 0.021 | 99.47 | 7.328 |
| complete data set | 2.447 | 48.844 | 0.019 | 99.70 | 4.651 |
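The hyperparameter values in Table 3 come from a coarse grid search followed by a finer one on exponentially spaced grids. A minimal sketch of that two-pass search is given below. The objective `toy_cv_error` is a hypothetical stand-in for the 10-fold cross-validation error, the grid steps and the size of the fine window are assumptions, and the final Nelder-Mead simplex refinement described in the text is left out.

```python
import math


def frange(start, stop, step):
    """Evenly spaced values from start to stop inclusive."""
    count = int(round((stop - start) / step))
    return [start + i * step for i in range(count + 1)]


def grid_search(cv_error, log2_c_grid, log2_lam_grid):
    """Evaluate cv_error(C, lam) on a log2-spaced grid.

    Returns (error, log2_C, log2_lam) for the best grid point.
    """
    return min(
        (cv_error(2.0 ** lc, 2.0 ** ll), lc, ll)
        for lc in log2_c_grid
        for ll in log2_lam_grid
    )


def toy_cv_error(C, lam):
    """Hypothetical smooth stand-in for the cross-validation error (arbitrary minimum)."""
    return (math.log2(C) - 6.2) ** 2 + (math.log2(lam) - 1.4) ** 2


# Coarse pass over the ranges quoted in the text: C in 2^-5..2^10, lambda in 2^-15..2^5
coarse_err, lc0, ll0 = grid_search(toy_cv_error, frange(-5, 10, 1), frange(-15, 5, 1))

# Fine pass in a small window around the coarse minimum (window and step are assumptions)
fine_err, lc1, ll1 = grid_search(
    toy_cv_error, frange(lc0 - 1, lc0 + 1, 0.1), frange(ll0 - 1, ll0 + 1, 0.1)
)

print("coarse:", 2.0 ** lc0, 2.0 ** ll0, coarse_err)
print("fine:  ", 2.0 ** lc1, 2.0 ** ll1, fine_err)
```

In practice the fine-grid result would serve as the starting point for a simplex minimizer, and the whole procedure would be repeated for each trial value of ε.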

Table 5. Statistical Analysis of SVR Model for Solution GOR

| data | Eaar (%) | Emax (%) | Emin (%) | r2 (%) | σare |
| --- | --- | --- | --- | --- | --- |
| training set | 6.460 | 71.681 | 0.001 | 98.31 | 8.742 |
| test set | 7.744 | 32.211 | 0.029 | 98.61 | 7.425 |
| complete data set | 6.715 | 71.681 | 0.001 | 98.37 | 8.502 |

Table 6. Statistical Analysis of SVR Model for Saturated OFVF

| data | Eaar (%) | Emax (%) | Emin (%) | r2 (%) | σare |
| --- | --- | --- | --- | --- | --- |
| training set | 0.515 | 8.889 | 0.001 | 99.66 | 1.241 |
| test set | 1.067 | 8.747 | 0.001 | 99.39 | 1.583 |
| complete data set | 0.625 | 8.889 | 0.001 | 99.60 | 1.333 |

Table 7. Statistical Analysis of SVR Model for Undersaturated OFVF

| data | Eaar (%) | Emax (%) | Emin (%) | r2 (%) | σare |
| --- | --- | --- | --- | --- | --- |
| training set | 0.057 | 1.836 | $1.0 \times 10^{-4}$ | 99.99 | 0.183 |
| test set | 0.088 | 0.680 | 0.003 | 99.99 | 0.115 |
| complete data set | 0.063 | 1.836 | $1.0 \times 10^{-4}$ | 99.99 | 0.172 |

Table 8. Statistical Analysis of SVR Model for Saturated Oil Viscosity

| data | Eaar (%) | Emax (%) | Emin (%) | r2 (%) | σare |
| --- | --- | --- | --- | --- | --- |
| training set | 11.467 | 158.590 | 0.027 | 99.79 | 18.114 |
| test set | 12.297 | 121.629 | 0.079 | 99.75 | 18.174 |
| complete data set | 11.633 | 158.590 | 0.027 | 99.78 | 18.108 |

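standard deviation of absolute relative error (σare). These are defined below.

$$ E_{aar} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{x_{expt} - x_{predicted}}{x_{expt}} \right| \times 100\% \tag{14} $$

$$ E_{max} = \max \left[ \left| \frac{x_{expt} - x_{predicted}}{x_{expt}} \right| \right] \times 100\% \tag{15} $$

$$ E_{min} = \min \left[ \left| \frac{x_{expt} - x_{predicted}}{x_{expt}} \right| \right] \times 100\% \tag{16} $$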

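The error measures of eqs 14-16 can be computed directly from paired experimental and predicted values, as in the sketch below. The sample values are hypothetical, and the use of the population standard deviation for σare is an assumption, since the form used by the authors is not stated. The correlation coefficient is left out because its exact definition is not given in the text.

```python
import statistics


def relative_errors_percent(expt, predicted):
    """Absolute relative error of each sample, in percent."""
    return [abs((e - p) / e) * 100.0 for e, p in zip(expt, predicted)]


def error_summary(expt, predicted):
    """Summary statistics of the absolute relative error."""
    are = relative_errors_percent(expt, predicted)
    return {
        "E_aar": sum(are) / len(are),  # eq 14
        "E_max": max(are),             # eq 15
        "E_min": min(are),             # eq 16
        # Assumption: population standard deviation of the absolute relative errors
        "sigma_are": statistics.pstdev(are),
    }


# Hypothetical values for illustration only
expt = [100.0, 200.0, 400.0]
pred = [110.0, 190.0, 400.0]
summary = error_summary(expt, pred)
for name, value in summary.items():
    print(name, round(value, 3))
```

Because every measure is relative to the experimental value, properties with values close to zero (such as low viscosities) can produce large percentage errors from small absolute deviations.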
Table 9. Statistical Analysis of SVR Model for Undersaturated Oil Viscosity

| data | Eaar (%) | Emax (%) | Emin (%) | r2 (%) | σare |
| --- | --- | --- | --- | --- | --- |
| training set | 3.784 | 23.342 | 0.007 | 99.94 | 4.395 |
| test set | 3.627 | 21.849 | 0.041 | 99.86 | 4.627 |
| complete data set | 3.752 | 23.342 | 0.007 | 99.90 | 4.434 |

The results of the analysis are presented in Tables 4-9. Crossplots are presented for each of these models in Figure 3. These consist of scatter plots of experimental against model-predicted values, with a solid line indicating the locus of points equidistant from both axes, that is, for any point on the line both values are equal. For a square plot, this is the 45° line. For an ideal model, all points should lie on this line. In the case of real models, there are deviations from this line, which show how closely a model fits the data. Among the models developed, the best fit to the data is achieved in the case of undersaturated OFVF, with Eaar, r2, Emax, Emin, and σare of 0.063%, 99.99%, 1.836%, 0.0001%, and 0.172, respectively. The error measures for bubble point pressure, solution GOR, saturated OFVF, and undersaturated oil viscosity are all within acceptable range. However, from Figure 3f it may be observed that there are certain localized deviations from the 45° line. The error measures for saturated oil viscosity are high compared to the other models, although the predicted values correlate highly with the experimental data.

Comparison with Existing Correlations. The performance of the SVR models was compared with popular existing correlations in terms of Eaar, r2, Emax, Emin, and σare. The results are shown in Tables 10-15.

Bubble Point Pressure. From Table 10 it can be observed that Eaar is least (2.447%) in the case of the SVR model. Among the empirical models, Eaar is lowest in the case of Standing,28 with a value of 19.386%. Elsharkawy and Alikhan's29 correlation shows a comparatively higher Eaar value of 26.741%, although the other empirical models have Eaar close to that of Standing's. The correlation coefficient for SVR (99.70%) is higher than that of the empirical correlations (≈93%). In terms of maximum error also, the SVR model surpasses the other correlations, with a value of 48.844%. SVR also gives the lowest σare value at 4.651, whereas that of empirical correlations is around 16%. On the basis of the above discussion, it can be said that the SVR model shows the best performance and should be used for evaluating bubble point pressure for Indian crude.

Solution GOR. Among the solution GOR models, the SVR model (Table 11) shows the minimum Eaar of 6.715%, whereas the empirical models show higher values ranging from 18.777% to 42.190%. The SVR model exhibits a correlation coefficient of 98.37%, which is higher than that of the empirical models (≈93%). Emin is below 0.2% for all models, but Emax varies widely from 51.297% for Glasø31 to 178.731% for Elsharkawy and Alikhan.29 σare increases from 8.502 for SVR to 33.357 for Elsharkawy and Alikhan's29 correlation. On Eaar, r2, and σare, the SVR model therefore outperforms the existing correlations.

Saturated OFVF. From Table 12, it can be observed that although the SVR model gives the best performance on all counts, the performance of the other correlations is only marginally worse. Al-Marhoun's9 correlation produces a comparatively larger Eaar (5.051%), but his improved correlation10 published in 1992 considerably reduces the error to 2.786% and brings it closer to the results of the SVR model.

Undersaturated OFVF. From Table 13, it can be seen that the SVR model shows the lowest Eaar of 0.063%. The empirical models show comparatively higher values ranging from 2.486 to 4.887%. The SVR model also correlates highly with the experimental values, having a correlation coefficient very close to 100%. The empirical correlations have comparatively lower values, with Vazquez and Beggs's30 correlation showing a considerably lower value of 96.49%. In terms of Emax and Emin, SVR exhibits minimum values of 1.836% and $1.0 \times 10^{-4}$%, respectively. The crossplot for SVR shows a nearly perfect fit (Figure 3d). Thus, the SVR model shows better performance than the empirical correlations for Indian crude.

Figure 3. Crossplots for various SVR models.

(28) Standing, M. B. Drill. Prod. Prac., API 1947, 275–287.
(29) Elsharkawy, A. M.; Alikhan, A. A. J. Petrol. Sci. Eng. 1997, 17, 291–302.
(30) Vazquez, M.; Beggs, H. D. J. Petrol. Technol. 1980, 32, 968–970.

Saturated Oil Viscosity. From Table 14, it can be seen that the SVR model has the minimum Eaar of 11.633%. The empirical models show Eaar values between 15.427% and 20.786%. The correlation coefficient for the SVR model is maximum at 99.78%, although Elsharkawy and Alikhan's35 and Petrosky and Farshad's36 correlations perform only marginally worse at 99.57 and 99.50%, respectively. In terms of Emax and Emin, Beggs and Robinson37 show the best values of 104.031% and $4.4 \times 10^{-5}$%, respectively. σare is similar for Beggs and Robinson,37 Petrosky and Farshad,36 and the SVR model at around 17%.

Undersaturated Oil Viscosity. Beal,40 Labedi,39 and Elsharkawy and Alikhan's35 correlations all perform comparably on all counts, although Beal's and Labedi's are marginally better than that of Elsharkawy and Alikhan. All these correlations give Eaar values of about 3%. The SVR model gives a marginally higher Eaar of 3.752% and fails to better the performance of the empirical correlations. All the correlations show high correlation coefficients, greater than 99%. Emin is less than 0.1% for all the correlations, but Emax is around 25%, except for that of Petrosky and Farshad,36 which shows a higher Emax of 43.059%.

Table 10. Comparison of Bubble Point Pressure Correlations

| correlation | Eaar (%) | Emax (%) | Emin (%) | r2 (%) | σare |
| --- | --- | --- | --- | --- | --- |
| SVR | 2.447 | 48.844 | 0.019 | 99.70 | 4.651 |
| Standing28 | 19.386 | 73.028 | 0.027 | 93.32 | 14.231 |
| Vazquez and Beggs30 | 23.096 | 84.303 | 0.011 | 93.42 | 18.139 |
| Glasoe31 | 21.864 | 75.109 | 0.094 | 94.14 | 16.101 |
| Al-Marhoun9 | 20.893 | 211.789 | 0.020 | 92.62 | 22.293 |
| Al-Shammasi32 | 19.430 | 103.100 | 0.014 | 93.62 | 14.976 |
| Elsharkawy and Alikhan29 | 26.741 | 66.094 | 0.195 | 88.91 | 15.077 |

Table 11. Comparison of GOR Correlations

| correlation | Eaar (%) | Emax (%) | Emin (%) | r2 (%) | σare |
| --- | --- | --- | --- | --- | --- |
| SVR | 6.715 | 71.681 | 0.001 | 98.37 | 8.502 |
| Standing28 | 20.146 | 72.276 | 0.022 | 93.18 | 13.016 |
| Vazquez and Beggs30 | 20.916 | 61.366 | 0.013 | 93.63 | 13.441 |
| Glasoe31 | 18.777 | 51.297 | 0.075 | 94.51 | 11.793 |
| Al-Marhoun9 | 27.561 | 138.245 | 0.029 | 91.85 | 22.301 |
| Al-Shammasi32 | 20.536 | 65.970 | 0.017 | 93.93 | 12.886 |
| Elsharkawy and Alikhan29 | 42.190 | 178.731 | 0.186 | 92.33 | 33.357 |

Table 12. Comparison of Saturated OFVF Correlations

| correlation | Eaar (%) | Emax (%) | Emin (%) | r2 (%) | σare |
| --- | --- | --- | --- | --- | --- |
| SVR | 0.625 | 8.889 | 0.001 | 99.60 | 1.333 |
| Standing28 | 2.895 | 28.542 | 0.008 | 96.79 | 2.786 |
| Vazquez and Beggs30 | 5.068 | 17.724 | 0.016 | 95.54 | 4.031 |
| Glasoe31 | 3.643 | 22.409 | 0.004 | 96.89 | 2.931 |
| Al-Marhoun9 | 5.051 | 19.120 | 0.002 | 96.44 | 4.271 |
| Al-Marhoun10 | 2.786 | 23.461 | 0.005 | 97.19 | 2.655 |
| Al-Shammasi32 | 3.314 | 15.768 | 0.003 | 96.83 | 2.734 |
| Elsharkawy and Alikhan29 | 3.308 | 24.110 | 0.007 | 97.00 | 2.930 |

Table 13. Comparison of Undersaturated OFVF Correlations

| correlation | Eaar (%) | Emax (%) | Emin (%) | r2 (%) | σare |
| --- | --- | --- | --- | --- | --- |
| SVR | 0.063 | 1.836 | $1.0 \times 10^{-4}$ | 99.99 | 0.172 |
| Vazquez and Beggs30 | 4.887 | 18.596 | 0.002 | 96.49 | 4.088 |
| Petrosky and Farshad34 | 3.740 | 16.259 | 0.041 | 98.76 | 2.730 |
| Al-Marhoun10 | 2.486 | 9.119 | 0.007 | 98.95 | 1.720 |
| Dindoruk and Christman33 | 3.376 | 15.676 | 0.001 | 98.32 | 2.552 |

Table 14. Comparison of Saturated Oil Viscosity Correlations

| correlation | Eaar (%) | Emax (%) | Emin (%) | r2 (%) | σare |
| --- | --- | --- | --- | --- | --- |
| SVR | 11.633 | 158.590 | 0.027 | 99.78 | 18.108 |
| Chew and Connally38 | 19.288 | 168.211 | 0.007 | 99.45 | 22.560 |
| Beggs and Robinson37 | 19.463 | 104.031 | $4.4 \times 10^{-5}$ | 98.10 | 17.115 |
| Labedi39 | 15.427 | 295.80 | 0.001 | 96.86 | 20.654 |
| Elsharkawy and Alikhan35 | 19.421 | 125.682 | 0.016 | 99.57 | 15.728 |
| Petrosky and Farshad36 | 20.786 | 119.986 | 0.171 | 99.50 | 16.652 |

Table 15. Comparison of Undersaturated Oil Viscosity Correlations

| correlation | Eaar (%) | Emax (%) | Emin (%) | r2 (%) | σare |
| --- | --- | --- | --- | --- | --- |
| SVR | 3.752 | 23.342 | 0.007 | 99.90 | 4.434 |
| Beal40 | 2.535 | 24.949 | 0.001 | 99.80 | 3.731 |
| Vazquez and Beggs30 | 3.603 | 25.295 | 0.001 | 99.85 | 4.559 |
| Labedi39 | 2.620 | 23.949 | 0.001 | 99.97 | 3.626 |
| Elsharkawy and Alikhan35 | 2.625 | 31.516 | 0.006 | 99.75 | 4.310 |
| Petrosky and Farshad36 | 3.770 | 43.059 | 0.005 | 99.26 | 5.464 |

(31) Glasoe, O. J. Petrol. Technol. 1980, 32, 785–795.
(32) Al-Shammasi, A. A. Bubblepoint pressure and oil formation volume factor correlations. SPE Paper 53185, Middle East Oil Show and Conference, Bahrain, February 20-23, 1999.
(33) Dindoruk, B.; Christman, P. G. PVT Properties and Viscosity Correlations for Gulf of Mexico Oils. SPE Paper 71633, SPE Annual Technical Conference and Exhibition, New Orleans, LA, September 30 to October 3, 2001.
(34) Petrosky, G. E., Jr.; Farshad, F. F. Pressure-volume-temperature correlations for Gulf of Mexico. SPE Paper 26644, SPE Annual Technical Conference and Exhibition, Houston, TX, October 3-6, 1993.
(35) Elsharkawy, A. M.; Alikhan, A. A. Fuel 1999, 78, 891–903.
(36) Petrosky, G. E., Jr.; Farshad, F. F. Viscosity correlations for Gulf of Mexico crude oils. SPE Paper 29468, SPE Production Operations Symposium, Oklahoma City, OK, April 2-4, 1995.
(37) Beggs, H. D.; Robinson, J. R. J. Petrol. Technol. 1975, 27, 1140–1141.
(38) Chew, J.; Connally, C. A., Jr. Trans. AIME 1959, 216, 23–25.
(39) Labedi, R. J. Petrol. Sci. Eng. 1992, 8, 221–234.
(40) Beal, C. Trans. AIME 1946, 165, 94–115.

Conclusions

Bubble point pressure, solution gas-oil ratio, oil formation volume factor (for both saturated and undersaturated crude), and viscosity (for both saturated and undersaturated crude) for Indian crude were computed using several widely used correlations available in the literature. However, the accuracy of the predicted values in terms of average absolute relative error, correlation coefficient, maximum and minimum absolute relative error, and standard deviation of absolute relative error was found to be unacceptable when compared with the experimental data. In order to obtain more accurate predictions, new correlations for bubble point pressure, solution gas-oil ratio, oil formation volume factor (for both saturated and undersaturated crude), and viscosity (for both saturated and undersaturated crude) of Indian crude were developed using SVR. For bubble point pressure, the SVR model outperforms the existing correlations with Eaar of 2.447%, r2 of 99.70%, Emax of 48.844%, Emin of 0.019%, and σare of 4.651. Similarly, for solution GOR, the developed correlation outperforms the existing ones, with the SVR model showing higher accuracy with Eaar, r2, Emax, Emin, and σare values of 6.715%, 98.37%, 71.681%, 0.001%, and 8.502, respectively. In the case of OFVF of saturated oil also, the SVR model shows better performance than the empirical correlations with Eaar, r2, Emax, Emin, and σare values of 0.625%, 99.60%, 8.889%, 0.001%, and 1.333, respectively. For OFVF of undersaturated oil, the SVR model allows more accurate prediction than the existing correlations, with Eaar, r2, Emax, Emin, and σare values of 0.063%, 99.99%, 1.836%, 0.0001%, and 0.172, respectively. For saturated oil viscosity, the SVR model performs better than the existing correlations, with Eaar, r2, Emax, Emin, and σare values of 11.633%, 99.78%, 158.590%, 0.027%, and 18.108, respectively.

Nomenclature

Bob = Oil formation volume factor at bubble point pressure (rb/stb)
Bo = Undersaturated oil formation volume factor (rb/stb)
C = Penalty factor in support vector regression formulation (-)
Ẽ = Regularized error function (-)
ED = Sum of squared errors (-)
Ẽε = ε-insensitive error (-)
Eaar = Average absolute relative error (-)
Emax = Maximum absolute relative error (-)
Emin = Minimum absolute relative error (-)
k(x, x') = Kernel function (-)
N = Number of training instances (-)
P = Reservoir pressure (psia)
Pb = Bubble point pressure (psia)
r2 = Correlation coefficient (-)
Rs = Solution gas oil ratio (scf/stb)
Rsb = Solution gas oil ratio at bubble point pressure (scf/stb)
T = Reservoir temperature (°F)
ti = ith label (-)
y(.) = Predictor function (-)
γAPI = API gravity (°API)
γo = Oil specific gravity (-)
γg = Gas gravity (-)
φ(x) = Basis function (-)
αi, αi*, ηi, ηi* = Lagrange multipliers (-)
β = Coefficient of ED in regularized error function (-)
ε = Sensitivity band of the ε-insensitive error (-)
λ, δ, d = Kernel hyperparameters (-)
μo = Undersaturated oil viscosity (cP)
μod = Dead oil viscosity (cP)
μob = Saturated oil viscosity (cP)
ρ = Coefficient of sum squared weights in regularized error function (-)
σare = Standard deviation of absolute relative error (-)
ξi, ξi* = Slack variables in convex optimization (-)
b = Bias vector (-)
w = Weight vector (-)
t = Label vector (-)
xi = ith instance vector (-)
φ = Feature vector (-)
ℝ = Set of real numbers (-)
ℝ^N = Euclidean N-space (-)

Conversion Factors

°API: 141500/(131.5 + °API) = kg/m3
cP × 10^-3 = Pa s
°F: 273.15 + (°F - 32)/1.8 = K
psi × 6.894757 = kPa
rb/stb = m3/m3
scf/stb × 1.781071 × 10^-1 = m3/m3

Acknowledgment. The authors thankfully acknowledge the cooperation extended by ONGC in providing the data used for this study and also the useful suggestions given by various researchers at the Institute of Reservoir Studies (ONGC), Ahmedabad, Gujarat, under the dynamic leadership of Dr. R. V. Marathe, Director.
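The factors listed under Conversion Factors can be wrapped in small helper functions when moving between field units and SI units. The sketch below is a direct transcription of those factors; the function names are illustrative and do not come from the paper.

```python
def api_to_density_kg_m3(api):
    """Oil density in kg/m3 from API gravity: 141500 / (131.5 + API)."""
    return 141500.0 / (131.5 + api)


def cp_to_pa_s(viscosity_cp):
    """Viscosity in Pa s from centipoise."""
    return viscosity_cp * 1e-3


def fahrenheit_to_kelvin(temp_f):
    """Temperature in K from degrees Fahrenheit."""
    return 273.15 + (temp_f - 32.0) / 1.8


def psi_to_kpa(pressure_psi):
    """Pressure in kPa from psi."""
    return pressure_psi * 6.894757


def scf_per_stb_to_m3_per_m3(gor):
    """Gas-oil ratio in m3/m3 from scf/stb."""
    return gor * 1.781071e-1


# An oil of 10 degrees API has the density of water, 1000 kg/m3
print(api_to_density_kg_m3(10.0))
# Freezing point of water as a quick check of the temperature conversion
print(fahrenheit_to_kelvin(32.0))
```

These helpers apply to the inputs of Table 1 as reported, for example to convert reservoir temperature from °F to K before use in a simulator that works in SI units.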