+
+
Energy & Fuels 1996, 10, 1053-1059
1053
Application of Artificial Neural Network for Estimating Tight Gas Sand Intrinsic Permeability Ali A. Garrouch* Department of Petroleum Engineering, Kuwait University, P.O. Box 5969, Safat 13060, Kuwait
Nejib Smaoui Department of Mathematics and Computer Sciences, Kuwait University, P.O. Box 5969, Safat 13060, Kuwait Received January 24, 1996. Revised Manuscript Received May 31, 1996X
A supervised feed-forward back-propagation neural network model has been developed and used to estimate tight gas sand permeability from porosity, mean pore size, and mineralogical data. A sensitivity study on different neural network architectures led to the choice of an optimal network topology consisting of an eight-neuron input layer, two five-neuron hidden layers that use nonlinear sigmoid transfer functions, and a linear single-neuron output layer. The network model has been trained on a data set from a tight gas sand well and tested on some core samples data that were not seen by the network during training. The optimal network architecture was able to estimate back the permeability from the training set within 0.89% average relative error and was able to predict the permeability of the test data set within 3.3% average relative error. This is a remarkable result, since linear and nonlinear multivariate regression models were unable to predict the intrinsic permeability within less than 40% average relative error.
Introduction Low-permeability tight gas sands are a potentially abundant source of natural gas. Over the past 20 years, new technologies and higher gas prices enhanced production of substantial amounts of gas reserves in lowpermeability sands. As development of these resources continues, new and reliable techniques for estimating tight gas sand permeability become essential for assessment of gas in place, recoverable reserves, and productivity estimates and for making set-pipe decisions in the field. Although accurate permeability estimates are essential for all rock types, this accuracy becomes critical in the low-permeability range associated with tight gas sands. Indeed, formation evaluation in tight gas sand formations is complicated by the complex rock texture, clay effects, and shale instability leading to poor hole quality. This inevitably makes wireline permeability estimates versus well depth sometimes inaccurate and hard to make. For such formations in which the permeability can be as small as a few microdarcies and may vary over 3 and 4 orders of magnitude, flow rates become difficult to measure and pressure buildup tests may last weeks. The use of core analysis combined with wireline mineralogical data is, therefore, an adequate alternative for accurately estimating the intrinsic permeability as a function of depth. Traditionally, rock permeability has been related to several petrophysical parameters such as porosity, surface area, and tortuosity or indices that are closely related to these parameters.7 With the recent development of techniques for estimating rock mineralogy5 from wireline measurements and the advancement of core X
Abstract published in Advance ACS Abstracts, July 15, 1996.
analysis techniques,9-14 several researchers have tried to relate intrinsic permeability to clay mineralogy and rock texture. Herron6 indicated that a significant amount of permeability information was contained in the weight fractions of sedimentary minerals because fundamental pore system properties that control permeability depend strongly on mineralogy. For rocks bearing a large amount of clay minerals, grain size, grain morphology, and surface area are intimately tied to the total mineral assemblage. Herron6 estimated permeability of clastic sediments using porosity and mineralogical data obtained from geochemical well logs. He used the maximum feldspar abundance as an indicator (1) Collins, R. E. Flow of Fluids through Porous Materials; Research and Engineering Consultants: Inc.: CO, 1990. (2) Coskun, S. B.; Wardlaw, N. C.; and Haverslew, B. J. Pet. Sci. Eng. 1993, 8, 279-292. (3) Dorfman, M. N.; Newey, J. J.; Coates, G. R. Geological applications of wireline logs. 1990, 48, 113-120. (4) Everett, R. V.; Herron, M. M.; Pirie, G.; Schweitzer, J.; Edmundson, H. Presented at the SPE 60th Annual Technical Conference, Las Vegas, Sept 22-25, 1985; SPE 14176. (5) Grau, J. A.; Herron, M. M. Presented at the American Association of Petroleum Geologists Annual Meeting, June 7-10, Los Angeles, 1987. (6) Herron, M. M. Presented at the SPWLA Twenty-Eighth Annual Logging Symposium, June 29-July 2, 1987; Paper HH. (7) Kukal, G. C.; Simons, K. E. SPE Form. Eval. 1986, 1, 609-622. (8) Lake, L. W. Enhanced Oil Recovery; Prentice Hall: Englewood Cliffs, New Jersey, 1989. (9) Luffel, D. L.; and Howard, W. E. SPE Form. Eval. 1988, 3, 705710. (10) Odom, I. E.; Doe, T. W.; Dott, R. H., Jr. J. Sediment. Petrol. 1976, 46, 862-870. (11) Schechter, R. S. Oil Well Stimulation; Prentice Hall: Englewood Cliffs, New Jersey, 1992. (12) Scheidegger, A. E. The Physics of Flow through Porous Media; Macmillan: New York, 1960. (13) Soeder, D. J.; Randolph, P. L. SPE Form. Eval. 1987, 2, 129136. (14) Soeder, D. J. SPE Form. Eval. 1986, 1, 16-22.
+
+
1054 Energy & Fuels, Vol. 10, No. 5, 1996
Garrouch and Smaoui
is estimated from the resistivity formation factor and, therefore, relates to rock cementation degree.1 A similar derivation8,16 invoking the concept of hydraulic radius illustrates even further the explicit dependence of permeability on average pore diameter Dp, given by the Carmen-Kozeny equation
k ) (1/72τ)[φ3Dp2/(1 - φ)2]
Figure 1. Comparison between measured and derived permeability for three wells using the Herron6 model. After Herron6 (1987).
of textural and mineralogical maturity and noticed that abundance of cementing agents such as calcite decreased the permeability, but they were less harmful than the clay minerals. The Herron6 model is given as
∑BiMi)
k ) Af[φ3/(1 - φ)2] exp(
(1)
where Mi stands for the abundance of the ith mineral and Bi is a constant for the ith mineral.6 These constants are positive for quartz and feldspar and negative for clay minerals as well as for cements such as calcite and dolomite. Af is an empirical constant that is a function of the maximum feldspar content in a given zone.6 Herron used this model to predict permeability for three tight gas sand wells in different parts of the world (Venezuela, California, and Oklahoma). As can be seen in Figure 1, the agreement between the laboratory-measured and derived permeability based on this model is unsatisfactory for a considerable number of data points presented. The model inaccuracy is possibly due to the existence of a different nonlinear relationship between permeability and mineralogy. Failure to accommodate for effects of a significant independent variable can also prohibit the model from capturing the overall data variance. The dependencies of permeability on tortuosity and pore size, for instance, were not accounted for in this model. A simple rate of momentum balance on a capillary tube model reveals the dependence of rock permeability on tortuosity, a function of rock cementing, porosity, and pore size distribution. This is illustrated by the expression for permeability
φ k) 2 8τ
∫0∞fpr4 dr ∫0∞fpr2 dr
(2)
where k is the absolute permeability (m2), φ is the porosity, fp is a normalized pore size distribution with respect to radius (m-1), and r is the pore radius (m). The dimensionless variable τ stands for tortuosity and
(3)
Other forms of this relationship exist that relate k to the surface area per unit volume of solid.12 Even though these are simplified models, they can be used as a tool for detecting potentially influencing parameters. The aim of this paper is to use a generalized artificial neural network model capable of capturing intricate input-output relationship as an alternative for ad-hoc and multivariate regression models for predicting permeability in tight gas sand formations. The following sections outline procedures for preparing data used for training and present the design and testing of the network topology. Discussion and Results 1. Data Preparation and Processing. The data used in this study consists of permeability, porosity, mean pore radius, and weight fractions of quartz, chert, plagioclase, K-feldspar, calcite, dolomite, chlorite, kaolonite, illite, and smectite for 175 core samples. The rock samples were obtained from the Chimney Butte tight gas sand field from a depth of 7360-7785 ft for SFE no. 4 well. The permeability of these core samples was measured using a computer-controlled un-steadystate gas permeameter that can measure permeability down to 0.1 µD. Permeability for 34 core samples was smaller than this limiting range, and, therefore, data for these rock samples were eliminated to leave us with data for a total of 141 core samples. These data were divided randomly into a training data set made of 118 samples and a testing data set made of 23 rock samples. Porosity was measured with the helium porosimeter, and the mineralogical weight fractions were obtained using Fourier transform infrared spectroscopy. The mean pore radius was obtained using a Micromeritics poresizer (Model 9320). This is a 30 000 psia automated system that can measure pore diameters ranging from approximately 360 to 0.006 µm by mercury intrusion. Experimental error analysis on these measured variables is presented in the Appendix. Permeability did not correlate well with the independent variables. Figures 2 and 3 illustrate the apparent complex relationship between permeability, porosity, and weight fraction of illite and smectite. Similar relationships were observed with the other variables except for the mean pore radius, which gave a 0.82 coefficient of determination when correlated logarithmically with permeability. Linear and logarithmic multivariate regression analysis on permeability as a function of all these variables gave coefficients of determination equal to 0.85 and 0.89, respectively. At (15) Veelenturf, L. P. J. Analysis and Applications of Artificial Neural Networks; Prentice Hall: London, 1995. (16) von Englehardt. The Origin of Sediments and Sedimentary Rocks; Wiley: and Sons: New York, 1977.
+
Tight Gas Sand Intrinsic Permeability
+
Energy & Fuels, Vol. 10, No. 5, 1996 1055
Figure 2. Permeability versus porosity for tight gas sand rock samples used in the training data set.
Figure 4. Optimal back-propagation neural network paradigm.
Figure 3. Permeability versus illite and smectite weight fraction for tight gas sand rock samples used in the training data set.
this point one is left either with an ad-hoc procedure for finding a nonlinear model that better fits the data or with a systematic procedure based on the neural network architecture proposed here. We went along with the latter choice because of the inherent ability of neural network to capture hidden complex nonlinear patterns.15 This systematic procedure is also free of any guessing routines associated with nonlinear modeling. Because of the lack of obvious pattern relationships between these variables and the permeability and to avoid cumbersome computation time, we reduced the number of input variables by combining some weight fractions into single-input variables. Pore-lining lowswelling clay, such as kaolonites and chlorites, are not as damaging as pore-bridging high-swelling clays such as illite and smectite.11 For this reason, the weight fractions of chlorite and kaolonite were added to a single-input variable. Likewise, weight fractions of illite and smectite were added to give another single-input variable. Cementing agents such as calcite and dolomite were also added into another variable. Detrital minerals such as plagioclase (NaAlSi3O8-CaAl2Si2O8) and K-feldspar10 (KAlSi3O8) have similar compositions and were also lumped together into another input
variable. Chert-like quartz made with mainly SiO2, however, known for its large microporosity,2,3,13 was left as a single variable. The remaining inputs consist of porosity, mean pore size, and quartz weight fraction. The neural network is, therefore, designed for the purpose of estimating tight gas sand permeability from eight inputs. These are porosity and mean pore size radius and weight fractions of quartz, chert, plagioclase and feldspar, calcite and dolomite, chlorite and kaolonite, and illite and smectite. The network design and testing are discussed next. 2. Network Paradigm Design. A neural network is a logical structure in which multiple nodes, or neurons, operating in parallel, communicate with each other through connecting synapses. The network ability to model complex nonlinear relationships without any a priori assumptions about the nature of the relationships is its greatest advantage. Figure 4 presents our optimal network topology, which consists of an eightneuron input layer, two five-neuron hidden layers, both with nonlinear sigmoid functions, and a single-neuron output layer with linear transfer function. An arbitrary neuron from the jth layer connects with all neurons from the previous layer. Figure 5 shows the structure of a single neuron from the first hidden layer of the paradigm presented in Figure 4. This jth neuron occupies a general position in the network since it accepts inputs from nodes in the input layer and sends its output to neurons to the second hidden layer. Each synaptic connection is weighted. That is, when the ith neuron sends its output to the jth neuron, that output is multiplied by a weighting coefficient of the i,j synapse, wij. The ith neuron computes its activation (uj), which consists of the product of all inputs from nodes of the previous layer multiplied by their corresponding synaptic weights minus the bias (θj) of the jth neuron:
+
+
1056 Energy & Fuels, Vol. 10, No. 5, 1996
Garrouch and Smaoui
weight adjustment on a hidden node h, with input signal xh, leading to output node k is given by
∆whk ) ηdkxh
(8)
Here, η is the learning rate and dk is the error function at the output node k derived as
dk ) yk(1 - yk)(zk - yk)
1st hidden layer
dk ) 1/2(1 - yk2)(zk - yk)
(9a)
2nd hidden layer (9b)
The full error factor generated in response to a particular input pattern (p), for nodes of the second hidden layer, is derived as
∑k dkpwhk
dhp ) 1/2(1 - H2)
Figure 5. Structure of an arbitrary neuron from the first hidden layer.
uj )
∑j xiwij - θj
(4)
The neuron then computes a signal transfer function to the activation S(uj) which it outputs to all nodes of the next layer. The transfer functions in the first and second hidden layers are, respectively, given by
S1(u) ) 1/(1 + e-u)
(5)
S2(u) ) [2/(1 + e-2u)] - 1
(6)
These functions belong to the class of S-shaped functions and have advantageous characteristics such as being continuous, differentiable at all points, and monotonically increasing. They also accept inputs varying from -∞ to ∞ and produce outputs over a finite range from 0 to 1 and from -1 to 1, respectively. The network paradigm discussed herein is supervised; that is, the network is trained in what response it makes to each input it receives. At the same time that this target response is being fed in, the network is outputting its own response, which is a function of the initial synaptic weights and its past training. The network compares its actual response with the target response and adjusts its weights in such a way to minimize the sum of the square of the errors (SSE), using the gradient descent method, also known as the generalized delta rule.15 The global error, SSE, is given by
SSE ) 1/2
∑p ∑k (zk - yk)2
(7)
In this notation zk is the target vector of the kth output node and yk is the actual output vector of the kth output node. The p subscript refers to the specific input vector pattern used. The weights leading into an output node k are adjusted in proportion to the difference between the actual node output and its target output. The
(10)
H represents the output from the hidden node h. The weight adjustment on a hidden node h, with input signal xh, leading to output node k is, therefore, calculated using eq 8 with dk substituted by dhp from eq 10. With correction terms thus established for the nodes of the second hidden layer, the following equation is then applied to nodes of the first hidden layer (layer M):
dmp ) M(1 - M)
∑h dhpwmh
(11)
The network uses a random generator to initialize the weight matrices and bias vectors for all synapses. The target vector is computed followed by SSE. If the SSE is greater than a fixed value, then the new weight adjustments are computed for all hidden layers one by one until the synaptic weights of the input layer are reached. These weights are adjusted to give a global minimum value of SSE. The error surface is generally irregular, containing peaks, valleys, and craters. The goal of this back-propagation paradigm is to locate a global minimum in the error surface. Momentum technique has been implemented for the network to ignore local minima in the error surface. The weight adjustment of eq 8 is replaced by the following expression
∆whk-new ) (1 - δ)ηdkxh + δ∆whk-old
(12)
Typically, the momentum constant (δ) is set to 0.95. In order to shorten the training time, an adaptive learning rate procedure is implemented. If the new SSE exceeds the old one by more than a predefined ratio (typically 1.04), the new weights are discarded. The learning rate is decreased (typically ηnew ) 0.7ηold), and the weight matrix is recalculated using the updated learning rate value. If the new SSE value is less than the old value, the learning rate is increased by a factor of 1.05 in order to speed up training while maintaining training stability. Since our input layer consists of eight neurons, our first and second hidden layers are made of five neurons each, and the output layer is made of one neuron, corresponding weighting matrices become WI-H1(5 × 8) for synapses connecting the input nodes with nodes of the first hidden layer and WH1-H2(5 × 5) for synapses connecting the first hidden layer with the second one.
+
+
Tight Gas Sand Intrinsic Permeability
Energy & Fuels, Vol. 10, No. 5, 1996 1057
Figure 6. Sum of the squared errors (SSE) versus the number of epochs for the optimal paradigm.
[
The matrix WI-H1 is, therefore, given by
ω1,1 ω2,1 WI-H1) ω3,1 ω4,1 ω5,1
ω1,2 ω2,2 ω3,2 ω4,2 ω5,2
ω1,3 ω2,3 ω3,3 ω4,3 ω5,3
... ... ... ... ...
ω1,8 ω2,8 ω3,8 ω4,8 ω5,8
]
(13)
Figure 6 shows an example plot of SSE versus the number of iterations, or epochs, for the optimal network. At the end of the learning phase, when the SSE has reached its goal, the weights are frozen. For a set of data not included in the training data set, the feedforward model presents a fixed structure to the new inputs that it processes. This implies that at this stage the network operations are those of a deterministic mechanism defined by a weight matrix and a bias vector between all consecutive layers. After exhausting a single hidden layer architecture option, a sensitivity study was conducted to fine-tune both the SSE and the number of neurons for a double hidden layer architecture option. This is done as follows: we construct a network by fixing an equal number of neurons in the two hidden layers, then vary the SSE for that particular architecture and compute the corresponding average relative error (Π h ), defined as
Π h )
100 n
n
∑ i)1
|yi - zi| yi
Figure 7. Average relative error versus number of neurons used in the hidden layers for both training and test data sets (SSE ) 0.002).
(14)
As shown in Figures 7 and 8, an architecture of two hidden layers having five neurons each gave the lowest average relative error (Π h ) for the testing data set (3.3% with a standard deviation on the average errors of 5.3%). This same paradigm gave a Π h value of 0.89% over the training data set with a standard deviation of 0.66%. Figure 7 shows that the lowest Π h value computed over the training data set, by itself, is not necessarily an indicator of the best paradigm. As the number of neurons increased, the Π h value of the training data decreased slightly, whereas the Π h value corresponding to the test data set reached a minimum
Figure 8. Average relative error versus sum of the squared errors (SSE) associated with the optimal paradigm for both training and test data sets.
and then increased. A small training data average relative error (10%) test data average relative error indicates an architecture that did not capture the physical model but rather overfitted the data. “Overfitting” describes a situation in which all training data points are well fit but the fitting model takes wild oscillations between these points. 3. Results Comparison with Other Models. The optimal network paradigm, discussed earlier, has been first trained using input data for 118 tight gas sand core samples. Figure 9 shows the percent relative error between the artificial neural network (ANN) estimate and the actual permeability values for all core samples used in the training. The neural network model is then presented with a test data set consisting of data for 23 core samples which were not included in the training data set. Figure 10 shows the excellent agreement (Π h ) 3.3%) between the actual permeability values and those estimated using the ANN model. The same training data set has been used to develop a high-order nonlinear regression model using multivariate regression analysis, with a coefficient of determination equal
+
+
1058 Energy & Fuels, Vol. 10, No. 5, 1996
Figure 9. Percent relative error between actual permeability values and ANN permeability estimates.
Figure 10. Comparison of ANN estimates with actual permeability values for the test data set.
to 0.89. The model is of the form
log(k) ) log b + m1 log(x1) + . . . + mn log(xn) (15) where b, m1, . . ., mn represent constants generated from regression analysis, and x1, . . ., xn represent input variables of this study. Even though the nonlinear regression model was able to estimate most of the training permeability data within an acceptable relative error less than 10% (Figure 11), four of the permeability values were estimated with a relative error ranging between 10 and 200%. The regression analysis model estimate of permeability values used in the testing data set, however, was for some points as good as guessing (Figure 12), with an average relative error of approximately 41%. We have also estimated the permeability for both training and test data sets using the Herron model (eq 1). Our data did not make the exception over previous data tested with Herron model.6 The average relative errors were approximately 163% and 111% for training and test data sets, respectively. Table 1 presents a summary of average relative error (Π h ) and standard deviation (σ) for a variety of models used in this study. This comparison between the ANN performance and
Garrouch and Smaoui
Figure 11. Percent relative error between actual permeability values and nonlinear regression model estimates for the training data set.
Figure 12. Percent relative error between nonlinear regression model estimates and actual permeability values for the test data set. Table 1. Comparison of Average Relative Errors and Standard Deviations for a Variety of Models Used in This Study model neural network nonlinear regression linear regression Herron6 model
training data set
test data set
Π h ) 0.89% σ ) 0.66% Π h ) 3.9% σ ) 17.6% Π h ) 27.7% σ ) 39.7% Π h ) 162.9% σ ) 193.5%
Π h ) 3.3% σ ) 5.3% Π h ) 41.7% σ ) 38.5% Π h ) 124.7% σ ) 313.3% Π h ) 111.1% σ ) 202.7%
linear and nonlinear model performances illustrates vividly the striking superiority of the ANN model in estimating every single permeability value within a very accurate relative error margin. These results also show the ability of the network model to capture the complex nonlinear relationship that exists between the permeability, on one hand, and mineralogical data, porosity, and mean pore size, on the other hand. As such, these results are restricted to SFE no. 4 well of the Chimney Butte field. The database for the training data set has to be expanded to validate these results on other wells from the same field and if possible from other tight gas sand fields around the world.
+
+
Tight Gas Sand Intrinsic Permeability
Conclusions An optimal neural network has been designed and tested for estimating tight gas sand permeability from mineralogical data combined with porosity and mean pore size. The back-propagation model uses momentum and adaptive learning rate techniques for searching a global error minimum. The momentum technique minimizes the chance of the network to get stuck into local minimum. The adaptive learning rate technique increases the speed and reliability of the network by optimizing the learning rate step size while maintaining learning stability. Implementation of the latter technique eliminates trial and error problems associated with back-propagation models. A sensitivity study has been performed to help avoid overfitting paradigms. This neural network model presents a superior, systematic, and reliable alternative to multivariate regression analysis and ad-hoc nonlinear models, plagued by guessing, for estimating intrinsic permeability of tight gas sand formations from core analysis data. These results present a potentially significant universal application for estimating tight gas sand permeability. This can be accomplished by (i) building a database from existing mineralogical data and core analysis data from tight gas sand fields around the world and then (ii) using this data to train an optimal neural network architecture. The trained network can then be fed with mineralogical data derived from geochemical logs in conjunction with porosity as well as mean pore size data to yield a fast and reliable estimate of rock permeability. A variety of oil companies as well as research institutions3,4,6 dispose of this kind of data. Acknowledgment. We express deep gratitude and thanks to Kuwait University for the financial support of this project (EP007). Appendix Experimental error analysis on these measured variables is obtained using the chain rule. The analysis
Energy & Fuels, Vol. 10, No. 5, 1996 1059
assumes that the absolute error in each single variable measured is equal to half of the finest reading of the instrument used to measure that particular variable. For instance, the absolute error in porosity (φ ) Vp/Vb), measured using the helium poresimeter, is given by
∆φ ≈ (|∂φ/∂Vp|)∆Vp + (|∂φ/∂Vb|)∆Vb
(A1)
where Vp and Vb stand for the core sample pore volume and bulk volume, respectively. Here the bulk volume depends on the core sample radius (r) and length (L) as follows:
Vb ) πr2L
(A2)
The pore volume is obtained using a mass balance and by applying Boyle’s law on the helium poresimeter core sample system. The pore volume expression is given by
Vp ) (PrefVref/P) - Vref
(A3)
where Vref and Pref are the volume of the reference chamber and the initial pressure of the reference chamber, respectively. The variable P stands for the equilibrium pressure measured after expanding helium originally in the reference chamber into the sample chamber. In order to calculate the absolute error on either the bulk volume Vb, or the pore volume Vp, one has to apply again the chain rule on Vb and Vp expressions defined by eqs A2 and eq A3, respectively. The absolute error in the bulk volume is given by
∆Vb ≈ (|∂Vb/∂r|)∆r + (|∂Vb/∂L|)∆L
(A4)
The absolute error in measuring the core sample length (∆L), for instance, is equal to half the finest reading of the instrument used for measuring the core sample length, which is the caliper in this case. The relative errors calculated this way were approximately 3% for the porosity, 7% for the mineral weight fractions, and 4% for the mean pore radius. EF960017W