Article pubs.acs.org/EF

Multivariate Analysis of Crude Oil Composition and Fluid Properties Used in Multiphase Flow Metering (MFM)

Andreas L. Tomren,*,†,§ Tanja Barth,†,§ and Kjetil Folgerø‡,§

† Department of Chemistry, University of Bergen, Bergen, Norway
‡ Christian Michelsen Research AS, Bergen, Norway
§ Michelsen Centre, Bergen, Norway

ABSTRACT: Crude oil characterization by infrared (IR) spectroscopy and whole oil gas chromatography (GC) has been used to provide data for establishing multivariate prediction models for physical properties of crude oils. The parameters of interest are used in multiphase flowmeters (MFMs) for monitoring production and transport of petroleum fluids, and permittivity parameters are of special interest. Data for 20 crude oils and condensates have been acquired and modeled using partial least squares (PLS) modeling. Good quality predictions were obtained for density, velocity of sound, and static and high frequency permittivity. Biodegradation of crude oil is the main cause of variation in the modeled variables. The data required are obtained in standard analytical procedures, and thus the approach has a considerable potential for use in on-site quality assurance.

Received: April 12, 2012 Revised: July 27, 2012 Published: August 2, 2012

© 2012 American Chemical Society

dx.doi.org/10.1021/ef300620r | Energy Fuels 2012, 26, 5679−5688

1. INTRODUCTION

Modern, sophisticated analytical techniques generate large amounts of data that reflect both the composition and properties of crude oils and crude oil fractions. This opens up possibilities for estimating a wide range of important properties and quality parameters for such complex mixtures based on a few, or even a single, set of analytical data. Conventionally, the values of the physical properties of interest are mostly used in the form of simpler numerical scales or expressions, that is, the density, permittivity, or octane number. Such parameters have traditionally been individually determined, but applications based on estimations or calibrations using complementary data have been increasing. Typically, the potential for estimating a number of conventional classification parameters based on near infrared (NIR) spectral data using multivariate statistical methods has recently been reported (refs 1, 2), and this approach is already extensively used in modern refinery operations (e.g., ref 3). Similar strategies have also been used previously to predict more limited sets of parameters (refs 4−9).

Determining the permittivity of mixed oil−water−gas (OWG) fluid flow is a typical example of a measurement where calibration models could be useful. In crude oil production monitoring, multiphase flowmeters (MFMs) are used for online monitoring of the OWG flow in pipelines, where the output of the measurement is the volume or mass of each phase passing the flowmeter in a given time. Data from multiphase meters help in optimizing petroleum production, increasing the oil recovery, and lowering the investments and operational costs (refs 10, 11). Permittivity measurements provide input for the determination of flow rates and relative distributions of the fluids in several well established measurement technologies (ref 12). Such systems are, however, calibrated to the initial oil composition and may lose precision over time due to changes in the fluid compositions that result in changes in the actual permittivities relative to the incorporated calibration values. Monitoring the permittivity regularly is one approach to quality assurance of the meter readings, but the necessary instrumentation is often not easily available. Quality control based on monitoring the fluid composition using standard crude oil analytical data provided by generally available analytical instrumentation in combination with multivariate estimation of the required parameters is thus an attractive alternative.

An MFM measures the permittivity of oil−water−gas mixtures, and it is important to know the permittivity of each phase (oil, water, and gas) in order to calculate the ratio of the different phases in the mixture. The topic of this work is to determine the permittivity, in addition to other important variables for MFM metering such as density and velocity of sound, of the crude oil phase, and to investigate which compounds in the oil are important for the variation in the different variables. Permittivity of liquids is normally measured using dielectric spectroscopy, and the spectra are described by the parameters of a Cole−Cole model (ref 13) fitted to the dielectric spectrum. When predicting the permittivity of a crude oil, either the parameters of the Cole−Cole equation or the whole spectrum can be modeled (ref 14).

This work presents models for prediction of the Cole−Cole parameters for the permittivity spectra based on compositional analysis of crude oils. Some additional parameters that are relevant for the permittivity and flow monitoring in general are also modeled. Both spectral profiles and compound based compositional data are tested for their potential to provide reliable estimates of the parameter values. Standard IR spectra comprise the spectral data, while individual hydrocarbon distributions are determined using whole oil gas chromatography. For both analytical techniques, the analytical data are so extensive that direct bivariate correlation is not feasible, and data treatment procedures that can handle large data sets are needed. PCA (principal component analysis) of the whole data set is therefore used initially to explore the systematic variation in the analytical data (ref 15), while partial least squares (PLS) is used to establish the multivariate calibration models (ref 16). The quality of the models is determined by evaluating the accuracy of the prediction of samples not included in the initially modeled data. Simple data pretreatment is applied to make the data set internally comparable.
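The calibrate-then-validate idea outlined above can be illustrated with a small sketch. It is a generic single-response PLS (PLS1) fitted with the NIPALS algorithm on mean-centered data and checked on one held-out sample. The function names (`pls1_fit`, `pls1_predict`) and the toy data are assumptions made for illustration; this is not the software or the crude oil data used in this work.

```python
# Minimal PLS1 (one response) via NIPALS, standard library only.
# Illustrative sketch: names and toy data are hypothetical.

def pls1_fit(X, y, n_lv):
    """Fit a mean-centered PLS1 model with up to n_lv latent variables."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                 # centered y
    comps = []
    for _ in range(n_lv):
        # Weight vector: direction in X with maximum covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:  # nothing left in y that X can describe
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next latent variable.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, load, q))
    return x_mean, y_mean, comps


def pls1_predict(model, x):
    """Predict the response for one new sample x."""
    x_mean, y_mean, comps = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in comps:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * lj for ej, lj in zip(e, load)]
    return y_hat


# Toy calibration set following y = 1 + 2*x1 - x2 exactly.
X_train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y_train = [1.0, 3.0, 0.0, 2.0, 4.0]
model = pls1_fit(X_train, y_train, n_lv=2)

# Validation object that was not part of the calibration set.
predicted = pls1_predict(model, [3.0, 2.0])
deviation = 5.0 - predicted  # measured minus predicted
```

Because the toy relation is exactly linear and two latent variables span both input variables, the held-out value is reproduced to numerical precision. With real, noisy data the number of latent variables becomes a trade-off between unmodeled information and fitted noise.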

2. MATERIAL AND METHODS

2.1. Crude Oil Samples. Twenty crude oils, originating mainly from the North Sea, have been analyzed for physical and electrical properties and chemical composition. The types of measurements performed and the number of variables produced are specified in Table 1. The data have also previously been used in a preliminary presentation of some models for an extended range of parameters (ref 9). The data set consists of 4 condensates, which are light crude oils with a clear yellow to brown color, while the remaining 16 crude oils are black and opaque. The biodegradation level of all the crude oils in the data set has been determined on the qualitative scale of Peters and Moldowan (ref 17) by visual inspection of the hydrocarbon distribution as observed in the GC traces. Prior to all measurements, all oils have been placed in an oven at 60 °C for 4 h to dissolve waxes that may have precipitated during storage. They have also been shaken and turned upside down multiple times to homogenize the oils thoroughly.

Table 1. Types of Analytical Data and Number of Variables from Each Procedure

analysis                   no. of variables   units
density                    1                  g/mL at 20 °C
velocity of sound          1                  m/s at 30 °C
whole oil GC               82                 normalized peak area
FTIR                       1738               absorbance
dielectric spectroscopy    5                  as specified for eq 2, 20 °C

2.2. Infrared (IR) Spectroscopy. A Nicolet Protégé 460 FTIR (Fourier transform infrared) spectrometer with an ATR (attenuated total reflection) measuring cell, equipped with a diamond crystal, has been used for obtaining FTIR spectra of the oils. One drop of oil is placed on the crystal, and measurements (32 scans, giving an averaged spectrum) are taken. For quality assurance, 5 drops have been measured for each oil to eliminate differences due to lack of homogenization, and the average of the resulting five spectra has been used in the modeling. The data are given as absorbance: A = log(1/R), where R is the percentage reflectance divided by 100. Percent reflectance shows the amount of infrared energy reflected from the sample: %R = (I_S/I_B) × 100, where I_S is the intensity of infrared energy reflected from the sample and I_B is the intensity of the infrared energy passing through the reflection accessory without a sample in place.

2.3. Whole Oil Gas Chromatography (WOGC). The oils have been analyzed on a ThermoFinnigan Trace GC instrument equipped with a flame ionization detector (FID). The stationary phase is a HP-PONA dimethylpolysiloxane column (50 m × 0.20 mm × 0.5 μm) from Agilent Technologies. The mobile phase is helium. The temperature program is as follows: 30 °C for 15 min, 1.5 °C/min to 60 °C, 4 °C/min to 320 °C, and 320 °C for 35 min. The injector temperature is 300 °C, while the FID is kept at 350 °C. Warm, homogenized crude oil (1 μL) is introduced manually into the GC system through a syringe, using split injection. The assignment of chromatographic peaks and quality assessment are based on the Norwegian Standard Oil (NSO-1) (ref 18), in which the compounds have been identified by GC-MS. By manual inspection, the compounds in the chromatograms have been identified and quantified for all the oils, assuming a constant response factor. The quantified values for two gas chromatograms for each oil are averaged, giving the quantified values used in the modeling.

2.4. Velocity of Sound. The velocity of sound of the oils was measured by the technique described by Bjørndal et al. (2008) (ref 19), where the measurement cell was modified to include a pressure seal and a pressure transmitter. As eq 1 shows, the velocity is dependent on the density of the given fluid.

$$ c_{\mathrm{fluid}} = \sqrt{\frac{K}{\rho}} \qquad (1) $$

Equation 1: Velocity of sound in a given fluid. c_fluid = speed of sound in the given fluid, K = bulk modulus of the given fluid, and ρ = density of the given fluid (ref 20).

2.5. Dielectric Spectroscopy. The dielectric spectra were measured on a measurement system for complex permittivity measurements at 20 °C, based on a system developed by Christian Michelsen Research AS in 1996 (ref 21).

2.6. Density. The density of the oils has been obtained by using an Anton Paar K.G. DMA 60 densitometer with a DMA 602 measuring cell. Air and distilled water are first measured for calibration, then the oil. Five measurements for each oil have been averaged, and the resulting value is used in the modeling.

2.7. Data Set for the Multivariate Analysis. The values for density and velocity of sound are used directly as variables in the data set. A total of 82 compounds have been quantified from the GC analysis, all of them compounds containing only carbon and hydrogen. A total of 1738 variables have been collected from the FTIR, corresponding to the absorbance at each wavenumber. Five variables have been derived from the dielectric spectroscopy, corresponding to the 5 variables in the Cole−Cole equation obtained from curve fitting of the spectra (eq 2). The data available for the analysis are summarized in Table 1.

$$ \varepsilon^{*} = \varepsilon_{\infty} + \frac{\varepsilon_{s} - \varepsilon_{\infty}}{1 + (j\omega\tau)^{1-\alpha}} - j\,\frac{\sigma}{\omega\varepsilon_{0}} \qquad (2) $$

Equation 2: Cole−Cole equation, for curve fitting of dielectric spectroscopy. ε* = relative permittivity (dimensionless), ε∞ = high frequency permittivity (dimensionless), εs = static permittivity (dielectric constant at low frequencies, dimensionless), ω = angular frequency (radians/second), τ = macroscopic relaxation time (seconds), σ = finite conductivity (siemens/meter), α = empirical factor (distribution factor, dimensionless), ε0 = permittivity in vacuum (farads/meter).
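To make the role of the five Cole−Cole parameters concrete, the sketch below evaluates eq 2 at a given frequency. It is only an illustration: the function name `cole_cole` and the parameter values are assumptions chosen to be of the same order as the values reported for crude oils in this work, and the code is not the curve-fitting procedure applied to the measured spectra.

```python
import math

EPS0 = 8.8541878128e-12  # permittivity of vacuum, F/m


def cole_cole(freq_hz, eps_inf, eps_s, tau, alpha, sigma):
    """Complex relative permittivity from eq 2 at frequency freq_hz (Hz)."""
    omega = 2.0 * math.pi * freq_hz  # angular frequency, rad/s
    relaxation = (eps_s - eps_inf) / (1.0 + (1j * omega * tau) ** (1.0 - alpha))
    conduction = 1j * sigma / (omega * EPS0)
    return eps_inf + relaxation - conduction


# Hypothetical parameter set, for illustration only.
params = dict(eps_inf=2.14, eps_s=2.22, tau=4.2e-9, alpha=0.5, sigma=0.0)

eps_low = cole_cole(1.0, **params)    # well below the relaxation frequency
eps_mid = cole_cole(1.0e6, **params)  # within the dispersion region
eps_high = cole_cole(1.0e15, **params)  # well above the relaxation frequency
```

With zero conductivity, the real part approaches εs at low frequency and ε∞ at high frequency, which is why these two parameters are the best constrained by a fit, while α and τ only shape the transition between them.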

3. MULTIVARIATE ANALYSIS

3.1. Data Pretreatment. Data collected directly from an instrument is termed raw data. Raw data can contain noise, baseline drift, scattering effects, and other factors that may influence the significant information in the data set. Therefore, it may be necessary to pretreat the raw data in order to remove effects that do not represent chemical, physical, electrical, or biological properties of the sample. To find the variation between the objects, the raw data need to be centered. This can be done by calculating the average value for each variable and then subtracting this from each of the original variables. By doing this, the origin of the coordinate system is placed at the center of gravity of the data set.

$$ X_{\mathrm{centred}}(i,j) = X(i,j) - \frac{1}{N}\sum_{i=1}^{N} x(i,j) \qquad (3) $$

Equation 3: Centring of the data set. X = full matrix, i = column in matrix, j = row in matrix, N = total number of objects.

Pretreatment of GC Data. When injecting samples onto the GC column, it is not certain that the amount of sample is exactly the same in every injection. To eliminate any effects from this, the quantified amounts have been normalized to constant sum. This is done by dividing each of the selected variables of an object by the sum of the selected variables for that object, to obtain the relative distribution of the variables in each object. This procedure is normal for GC (refs 22, 23). For models based on GC, the data sets have been centered and normalized to constant sum.

Pretreatment of IR Data. For the models based on FTIR, the raw data have been centered, but no further pretreatment has been done.

3.2. Modeling. Multivariate data analysis has been performed using the SIRIUS program package, version 7.0 (ref 24). PLS models have been built, based on GC and FTIR data, in order to investigate the possibility of predicting the other physical and electrical properties that have been measured. The data sets are first investigated as PCA models to determine the degree of systematic variation in the data. The principal components (PCs) provide a system of orthogonal axes that each describe a maximum of the systematic variation remaining in the data set. The PLS models are then established. PLS models generate latent variables (LVs) that are orthogonal and describe the maximum covariation between the independent data matrix (the analytical data) and the dependent data (the predicted properties).

Each model is based on 17 of the 20 oils in the data sets. Three oils in each data set have been omitted from the models in order to use them as validation objects. It is important to balance the number of validation objects against the total number of samples, as the model might be less robust if too many oils are omitted from the model building. The predictive quality of the models can be examined by testing the apparently unknown objects against the models. For each model, one biodegraded oil, one nonbiodegraded oil, and one condensate have been omitted from the data set, as these oils in general are chemically different. These validation objects have been chosen based on their values for each variable; for each model, there is one validation object with a high value, one with a low value, and one with a medium value. This is done in order to validate the predictive quality of the model, given unknown samples with high, low, and medium values of the modeled variable. Building a PLS model yields a model in the form of eq 4.

$$ Y = B_{0} + B_{1} X_{1} + B_{2} X_{2} + B_{3} X_{3} + \dots + B_{N} X_{N} \qquad (4) $$

Equation 4: Regression model for PLS modeling. N is the total number of variables in the model. B0 is the starting point; B1, B2, ..., BN are the regression coefficients for variables 1, 2, ..., N; and X1, X2, ..., XN are variables 1, 2, ..., N.

By evaluating the regression coefficients for a given model, one can detect which variables have the most significant effect on the model, be it positive or negative, and hence which variables are most important for the variation in the model. Nevertheless, the combined effect of all the coefficients is usually more important than the effect of any single coefficient.

In the modeling stage, the goal was to find the models that gave as low deviations as possible for the validation objects. The number of latent variables (LVs) giving the lowest deviations is chosen for each model. Based on this, the number of LVs in the different models is not necessarily the same, causing some of the models to be more robust than the others. The number of LVs has been chosen bearing in mind that if too many LVs are chosen, there is a possibility that noise is modeled as well as the significant signal, causing the model to give poor predictions for unknown samples. Also, if too few LVs are chosen, there is a possibility that some of the significant information remains unmodeled and gets categorized as noise, also causing the model to give poor predictions for unknown samples.

4. RESULTS

4.1. Initial Data Evaluation. The ranges for the measured physical properties for all the oils are given in Table 2. The measurements span a reasonable range of values for most of the parameters, with the exception of τ, dielectric relaxation time, and σ, finite conductivity, where the value is zero for 3 and 10 of the samples, respectively. This lack of variation can represent a problem for establishing robust models. The uncertainty in α and τ is high because they are based on the curve fit of the Cole−Cole model, while σ is close to zero for oils in general. This means that, among the parameters extracted from the Cole−Cole model, static permittivity and high frequency permittivity are easier to determine than α, τ, and σ.

The standard deviations in Table 2 are based on replicate experiments. For the permittivity variables, the standard deviation is based on two replicates, and for density on three replicates, while for velocity of sound no replicates were measured. The experimental uncertainty in Table 2 is the standard deviation multiplied by 2, since ±1.96 standard deviations about the mean mark the range within which a sample has a 95% chance of being part of the population. Standard deviations for σ and velocity of sound could not be obtained because of the insufficient number of measurements. The standard deviation for τ is given in %, as the standard deviation varies relatively with size.

Table 2. Range of Measured Physical Properties

property                      range            standard dev.   exptl uncertainty
static permittivity           2.017−2.270      0.02            0.05
high frequency permittivity   1.996−2.337      0.02            0.04
α                             0.0318−0.9210    0.03            0.06
τ (s)                         0−5.7 × 10−7     7.29%
σ (S/m)                       0−1 × 10−7
density (g/mL)                0.7300−0.940     0.007           0.013
velocity of sound (m/s)       1141−1447

PCA performed on GC data of the sample set of crude oils shows that three groupings occur, separating biodegraded oils, nonbiodegraded oils, and condensates, as shown in Figure 1. The two first PCs explain 67.6% of the total variance in the data set,

which shows that there is a high degree of systematic variation in the data, as opposed to unsystematic noise. Figure 1 shows that biodegraded oils span out in one direction from the origin of the coordinate system, while the nonbiodegraded oils span out in a different direction of the coordinate system. Condensates are partly clustering with the nonbiodegraded oils and partly spanning out in a different direction than biodegraded and nonbiodegraded oils. This indicates, as expected, that biodegraded and nonbiodegraded oils are chemically different, based on GC data.

Figure 1. PCA score plot of GC data. Brown circles = biodegraded oils. Blue squares = nonbiodegraded oils. Light blue triangles = condensates.

PCA performed on IR data of the same set of crude and model oils shows similar groupings (Figure 2), though with a tendency to less variation within each group. The two first PCs explain 93% of the total variance in the data set, which shows that the systematic variation in the data is high. Figure 2 shows that biodegraded oils span out in one direction from the origin of the coordinate system, while the nonbiodegraded oils span out in a different direction of the coordinate system. Condensates also span out in a different direction of the coordinate system. This indicates that biodegraded oils, nonbiodegraded oils, and condensates are different based on IR data.

Figure 2. PCA score plot of IR data. Brown circles = biodegraded oils. Blue squares = nonbiodegraded oils. Light blue triangles = condensates.

The presence of such systematic variation indicates that the models for each data type describe variations due to chemical compositional factors. If this variation is connected directly or indirectly to the properties to be modeled, there is a potential for generating prediction models for the parameters in question and for using them to predict unknown physical properties of new oils based on GC and IR data. Some objects in the data set might be classified as outliers, but since all of the oils in the data set are representative samples, they cannot be excluded from the model. If they are excluded, the range in which the model is valid might be reduced. A larger data set would most likely expand the range of the model, in addition to reducing the impression that some objects are outliers.

4.2. Prediction Models by PLS. Table 3 gives an overview of the established models, presented on a qualitative scale ranging from poor, via OK for distinguishing between high and low value, via OK, via good, to very good. This is done by comparing the experimental uncertainty of the measurements (shown in Tables 4 and 5) with the prediction error of the validation objects. If the error is much lower than the experimental uncertainty, the predictive quality of the model is rated as very good; if the error is similar to the experimental uncertainty, the predictive quality of the model is rated as good. If the difference between the experimental uncertainty and the prediction error is larger, the prediction errors are more closely examined. If the validation objects are predicted to be of high value, while the measured values are low, the model is rated as poor. If the validation objects are predicted to be of high value, and the measured values are of high value, the model is rated as

Table 3. Overview of Established Models with Qualitative Evaluation

Models based on GC data
modeled variable              coding              no. LVs   explained variance(a)   quality
static permittivity           permvar e_st        8         99.29% (99.32%)         good
high frequency permittivity   permvar e_inf       8         99.77% (99.36%)         good
α                             permvar a           6         96.56% (83.24%)         OK
τ                             permvar t           7         98.68% (96.94%)         OK for distinguishing between high and low level
σ                             permvar s                                             poor
density                       density             6         95.49% (97.66%)         good
velocity of sound             velocity of sound   5         94.43% (96.38%)         good

Models based on IR data
modeled variable              coding              no. LVs   explained variance(a)   quality
static permittivity           permvar e_st        4         96.93% (98.52%)         good
high frequency permittivity   permvar e_inf       5         97.38% (98.29%)         good
α                             permvar a           4         96.87% (77.51%)         OK
τ                             permvar t           6         98.78% (91.75%)         OK for distinguishing between high and low level
σ                             permvar s                                             OK for distinguishing between high and low level
density                       density             3         89.80% (96.03%)         good
velocity of sound             velocity of sound   5         99.57% (99.95%)         very good

(a) Total explained variance from the independent block, with total explained variance from the dependent block in parentheses.

Table 4. Overview of the Quantitative Predictive Quality of the Established Models for IR

Low value object
modeled variable              coding              meas.          pred.            dev.
static permittivity           permvar e_st        2.08           2.04             0.04
high frequency permittivity   permvar e_inf       2.04           2.03             0.01
α                             permvar a           0.33           0.26             0.07
τ                             permvar t           0              −2.79 × 10−7     2.79 × 10−7
σ                             permvar s           0              7.70 × 10−9      7.70 × 10−9
density                       density             0.766          0.762            0.004
velocity of sound             velocity of sound   1266           1268             −2

Medium value object
modeled variable              coding              meas.          pred.            dev.
static permittivity           permvar e_st        2.22           2.15             0.07
high frequency permittivity   permvar e_inf       2.14           2.12             0.02
α                             permvar a           0.51           0.43             0.08
τ                             permvar t           4.20 × 10−9    9.24 × 10−8      −8.82 × 10−8
σ                             permvar s           9.80 × 10−9    4.30 × 10−9      5.5 × 10−9
density                       density             0.824          0.833            −0.009
velocity of sound             velocity of sound   1347           1344             3

High value object
modeled variable              coding              meas.          pred.            dev.
static permittivity           permvar e_st        2.43           2.39             0.04
high frequency permittivity   permvar e_inf       2.22           2.24             −0.02
α                             permvar a           0.72           0.62             0.1
τ                             permvar t           4.24 × 10−7    2.54 × 10−7      1.7 × 10−7
σ                             permvar s           7.70 × 10−8    6.75 × 10−8      9.5 × 10−9
density                       density             0.926          0.940            −0.014
velocity of sound             velocity of sound   1398           1397             1

Summary
modeled variable              coding              avg. dev. val. obj.(a)   exptl. unc.(b)
static permittivity           permvar e_st        0.05                     0.05
high frequency permittivity   permvar e_inf       0.02                     0.04
α                             permvar a           0.08                     0.06
τ                             permvar t           1.79 × 10−7
σ                             permvar s           7.57 × 10−9
density                       density             0.009                    0.013
velocity of sound             velocity of sound   2

(a) Average deviation of the validation objects. (b) Standard deviation for measured values (see Table 2) multiplied by 2, since ±1.96 standard deviations about the mean mark the range within which a sample has a 95% chance of being part of the population.

Table 5. Overview of the Quantitative Predictive Quality of the Established Models for GC

Low value object
modeled variable              coding              meas.          pred.            dev.
static permittivity           permvar e_st        2.08           2.05             0.03
high frequency permittivity   permvar e_inf       2.04           2.05             −0.01
α                             permvar a           0.27           0.42             −0.15
τ                             permvar t           0              −9.00 × 10−10    9.00 × 10−10
σ                             permvar s           0              1.92 × 10−8      −1.92 × 10−8
density                       density             0.766          0.788            −0.022
velocity of sound             velocity of sound   1157           1189             −32

Medium value object
modeled variable              coding              meas.          pred.            dev.
static permittivity           permvar e_st        2.22           2.22             0
high frequency permittivity   permvar e_inf       2.14           2.14             0
α                             permvar a           0.50           0.48             0.02
τ                             permvar t           4.00 × 10−9    4.20 × 10−9      −2 × 10−10
σ                             permvar s           9.70 × 10−9    3.16 × 10−8      −2.19 × 10−8
density                       density             0.824          0.847            −0.023
velocity of sound             velocity of sound   1324           1329             −5

High value object
modeled variable              coding              meas.          pred.            dev.
static permittivity           permvar e_st        2.43           2.41             0.02
high frequency permittivity   permvar e_inf       2.22           2.22             0
α                             permvar a           0.59           0.68             −0.09
τ                             permvar t           7.70 × 10−9    8.60 × 10−9      −9 × 10−10
σ                             permvar s           7.70 × 10−8    6.24 × 10−8      1.46 × 10−8
density                       density             0.904          0.902            0.002
velocity of sound             velocity of sound   1423           1387             36

Summary
modeled variable              coding              avg. dev. val. obj.(a)   exptl. unc.(b)
static permittivity           permvar e_st        0.02                     0.05
high frequency permittivity   permvar e_inf       0.003                    0.04
α                             permvar a           0.09                     0.06
τ                             permvar t           3.37 × 10−9
σ                             permvar s           1.86 × 10−8
density                       density             0.016                    0.013
velocity of sound             velocity of sound   24

(a) Average deviation of the validation objects. (b) Standard deviation for measured values multiplied by 2, since ±1.96 standard deviations about the mean mark the range within which a sample has a 95% chance of being part of the population.

Energy & Fuels

Article

Figure 3. Predicted value plotted against measured for (a) density based on GC, (b) velocity of sound based on GC, (c) static permittivity (e_st) based on IR data, (d) high frequency permittivity (e_inf) based on IR data, (e) density based on IR data, (f) velocity of sound based on IR data. Brown squares = biodegraded oils. Blue squares = nonbiodegraded oils. Light blue squares = condensates. Red squares = validation objects.

evaluated based on a modeling perspective. As clearly shown in Figure 3e and f, the validation objects do not stand out in comparison to the model objects, and with a maximum error of 0.22% for the model for IR, it is difficult to argue against the statement that the model is very good. However, since no replicate measurements of velocity of sound are done, this cannot be verified completely. 4.3. Chemical Significance of the Models. Figure 4a−f shows the regression coefficients for the same set of models, giving an overview of which variables are important for building the model, and therefore, the most important variables when looking at variation in the different properties. Figure 4 shows that positive effect on the model for density based on GC data originates almost exclusively from branched alkanes and the higher molecular weight straight chained alkanes, while negative effect originates almost exclusively from low to medium molecular weight straight chained alkanes. This indicates that biodegradation of crude oil has an important effect on the variance of density in crude oils, since the smallest straight chained alkanes are the first to be removed or altered during biodegradation.17 Similar trends are observed for the model for velocity of sound, as shown in Figure 4b. Positive effects on the model originate almost exclusively from branched alkanes and the higher molecular weight straight chained alkanes, while negative

either OK for distinguishing between high and low value, or OK, depending on how close to the experimental uncertainty the prediction errors are. The number of LVs for the different models are also given in Table 3. Figure 3 shows modeling results for density and velocity of sound based on GC data, and static permittivity, high frequency permittivity, density, and velocity of sound based on IR data, all presented as predicted value plotted against measured value. These figures represent the best modeling results for each variable for GC and IR. As the models for the permittivity variables α, τ, and σ at best give “OK” results, the results for these models are not shown here. In addition, the modeling for static and high frequency permittivity based on GC data give good quality models, which already have been presented in ref 9. Figure 3a and b shows that prediction of density and velocity of sound based on GC data are possible with good results within the range of oil compositions represented in the data. Figure 3c−f shows that prediction of static permittivity, high frequency permittivity, density, and velocity of sound, all based on IR data, are possible with good results within the range of oil compositions represented in the data. No standard deviation is obtained for velocity of sound, since there was only made one measurement for each oil, so the quality of these models is 5684

dx.doi.org/10.1021/ef300620r | Energy Fuels 2012, 26, 5679−5688

Energy & Fuels

Article

Figure 4. Regression coefficients for model for (a) density based on GC data, (b) velocity of sound based on GC data, (c) static permittivity (e_st) based on IR data, (d) high frequency permittivity based on IR data, (e) density based on IR data. The biodegradation effect is more evident for the model for density than for the models for static and high frequency permittivity, as there are fewer effects that might be considered as noise. (f) Regression coefficients for the model for velocity of sound based on IR data. Coding for the variables is given in Appendix 1.

The coefficients for high frequency permittivity looks very similar as for static permittivity, with a positive effect from CH3, both stretch and bend, and a negative effect from CH2 stretch. In addition, there is a negative effect from CH2 bend at around 1465 cm−1, enhancing the indication that biodegradation is the most important effect on the model. For velocity of sound, the same trends for CH3 and CH2 as for the models for density and e_inf are present, but there are also a lot of effects that looks like noise. However, the predictive quality of the model is very good. 4.4. Precision of Prediction. Figure 5 shows the predicted and measured values for the three validation objects for the models for static permittivity, high frequency permittivity, permittivity variable α, and density, all based on IR data, and Table 3 summarizes the quantitative modeling results. It is clear that the deviations are small for most of the models and validation objects. Exceptions are the permittivity variables α, τ, and σ, where α is considered to give the best results of these three variables. For the variables static permittivity and α, all validation objects are predicted with a value that is lower than the measured. It is not possible using PLS to determine whether this is a bias or due to the choice of validation objects. A larger data set

effects originate almost exclusively from small to medium molecular weight straight chained alkanes. Thus, biodegradation has an important effect also on the variance of the velocity of sound in crude oils. For static permittivity modeled from IR, the area between 2800 cm−1, which corresponds to the signals originating from CH3 stretch, have a positive effect, while signals originating from CH2 stretch have a negative effect. This is an indication of biodegradation, as a high amount of CH3 indicates a high amount of branched alkanes and a high amount of CH2 indicates a high amount of straight chained alkanes.4 As straight chained alkanes are the first compounds to be attacked during biodegradation, the branched alkanes will dominate the composition increasingly. Other regions having a positive effect are the CH3 bend at around 1375 cm−1 and CO at around 1450 cm−1; these signals also indicate biodegradation, as the CO might origin from carboxylic acids formed during biodegradation.4 From 800 cm−1 to 694 cm−1, there are signals having both positive and negative effects, these signals are in the fingerprint region and are not very easy to assess, but it is likely that they originate from C−H bonds in aromatic compounds. The positive effects are due to the fact that biodegraded oils have more absorbance in these regions. 5685

dx.doi.org/10.1021/ef300620r | Energy Fuels 2012, 26, 5679−5688


Figure 5. Predicted and measured values for PLS models for static permittivity, high frequency permittivity, permittivity variable α, and density based on IR data. (a) High value validation objects; (b) medium value validation objects; (c) low value validation objects.

Figure 6. Predicted and measured values for PLS models for static permittivity, high frequency permittivity, permittivity variable α, and density based on GC data. (a) High value validation objects; (b) medium value validation objects; (c) low value validation objects.

might clarify this, as more validation objects can be used in the model validation. Compared to the deviations that Satya et al.1 achieved for density based on NIR spectral data, we see that the results are somewhat better; Satya et al. achieved deviations of 1.6% and 5.3% for their two validation objects, while the three validation objects in this work have deviations of 0.5%, 1.1%, and 1.5%. Figure 6 shows the predicted and measured values for the three validation objects for the models for static permittivity, high frequency permittivity, permittivity variable α, and density, all based on GC data, and Table 4 summarizes the quantitative modeling results. It is clear that the deviations are small for most of the models and validation objects. Exceptions are the permittivity variables α, τ, and σ, where α is considered to give the best results of the three variables. Compared to the deviations that Satya et al.1 achieved for their model of density based on NIR, we see that the deviations for the model for density based on GC in this work are in the same range; Satya et al. achieved deviations of 1.6% and 5.3% for their two validation objects, while the three validation objects in this work have deviations of 2.9%, 2.8%, and 0.3%.
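The comparison above can be made explicit with a short calculation. This is a minimal sketch, assuming the quoted deviations are absolute relative errors in percent (the text does not define them explicitly); the helper names `percent_deviation` and `summarize` are hypothetical and only the deviation values themselves are taken from the text.

```python
# Validation deviations (percent) exactly as quoted in the text.
deviations = {
    "this work, IR": [0.5, 1.1, 1.5],
    "this work, GC": [2.9, 2.8, 0.3],
    "Satya et al. (NIR)": [1.6, 5.3],
}


def percent_deviation(predicted, measured):
    """Absolute relative deviation in percent (assumed definition)."""
    return abs(predicted - measured) / abs(measured) * 100.0


def summarize(values):
    """Return (mean, maximum) of a list of percent deviations."""
    return sum(values) / len(values), max(values)


summary = {name: summarize(vals) for name, vals in deviations.items()}
for name, (mean_dev, max_dev) in summary.items():
    print(f"{name}: mean {mean_dev:.2f}%, max {max_dev:.1f}%")
```

With only two or three validation objects per model, such means are indicative at best, which is consistent with the cautious wording "somewhat better" and "in the same range" used above.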

5. DISCUSSION
As Figures 3−6 and Tables 4 and 5 show, PLS calibration models for prediction of several properties of crude oil based on both GC data and IR data can be built with good results. The results are comparable to, and in some cases better than, those of previously established models.1 The reason for the improved quality of the models in this work is uncertain, as the basis for the models in Satya et al.1 cannot be identified precisely. The improvement may be due to more consistent experimental data, since all measurements were performed by one operator during a short time period. The sample quality may also be relevant. In addition, the distribution of the sample properties over a reasonable range may contribute, since the sample sets are not very large. Overall, the values estimated for the experimental uncertainties and the model uncertainties lie in the same range, indicating that the data give a good basis for accurate predictions. The model for velocity of sound based on IR data is in fact very good, at least from a modeling perspective, with an average


error of 0.12% for the validation objects, while the models for τ (dielectric relaxation) and σ (conductivity) are less accurate. For quality assurance purposes for the flowmeters, the models for static permittivity, high frequency permittivity, density, and velocity of sound, based on GC and IR, respectively, are considered precise enough to be useful. As a minimum, the prediction of values significantly different from those used in the initial calibration of the MFM will indicate that a new calibration of the system is required. The models also give information on the chemical basis for the variations in the sample set. For most of the models, the regression coefficients, both for GC and IR, strongly indicate that biodegradation of crude oils is the main cause of variation in the modeled variables. For models based on GC, the main positive contributors to the model are branched alkanes and long, straight chained alkanes. As low to medium molecular weight straight chained alkanes are the first molecules to be depleted by

Table A1. Variable Coding and Variable Names for the GC Data

variable coding       variable name
iC5                   iso-pentane
nC5                   n-pentane
22dm-C4               2,2-dimethylbutane
cyC5                  cyclopentane
23dm-C4               2,3-dimethylbutane
2m-C5                 2-methylpentane
3m-C5                 3-methylpentane
nC6                   n-hexane
22dm-C5               2,2-dimethylpentane
m-cyC5                methylcyclopentane
24dm-C5               2,4-dimethylpentane
223tm-C4              2,2,3-trimethylbutane
benzene               benzene
33dm-C5               3,3-dimethylpentane
cyC6                  cyclohexane
2m-C6                 2-methylhexane
23dm-C5               2,3-dimethylpentane
11dm-cyC5             1,1-dimethylcyclopentane
3m-C6                 3-methylhexane
1c.3dm-cyC5           cis-1,3-dimethylcyclopentane
1t.3dm-cyC5           trans-1,3-dimethylcyclopentane
1t.2dm-cyC5           trans-1,2-dimethylcyclopentane
nC7                   n-heptane
m-cyC6                methylcyclohexane
113tm-cyC5            1,1,3-trimethylcyclopentane
e-cyC5                ethylcyclopentane
25dm-C6               2,5-dimethylhexane
223tm-C5/24dm-C6      2,2,3-trimethylpentane/2,4-dimethylhexane
1c.2t.4tm-cyC5        cis-1-trans-2,4-trimethylcyclopentane
33dmC6                3,3-dimethylhexane
1t.2c.3tm-cyC5        trans-1-cis-2,3-trimethylcyclopentane
234tm-C5              2,3,4-trimethylpentane
Toluen/233tm-C5       toluene/2,3,3-trimethylpentane
23dm-C6               2,3-dimethylhexane
2m-C7                 2-methylheptane
4m-C7                 4-methylheptane
3m-C7                 3-methylheptane
1.c3dm-cyC6           cis-1,3-dimethylcyclohexane
1.14dm-cyC6           trans-1,4-dimethylcyclohexane
11dm-cyC6             1,1-dimethylcyclohexane
1t.2dm-cyC6           trans-1,2-dimethylcyclohexane
nC8                   n-octane
e-cyC6                ethylcyclohexane
i-C9                  iso-nonane
e-benzene             ethylbenzene
m-xylene              meta-xylene
p-xylene              para-xylene
4m-C8                 4-methyloctane
2m-C8                 2-methyloctane
3m-C8                 3-methyloctane
o-xylene              ortho-xylene
nC9                   n-nonane
i-C10                 iso-decane
nC10                  n-decane
i-C11                 iso-undecane
nC11                  n-undecane
nC12                  n-dodecane
i-C13                 iso-tridecane
i-C14                 iso-tetradecane
nC13                  n-tridecane
i-C15                 iso-pentadecane
nC14                  n-tetradecane
i-C16                 iso-hexadecane
nC15                  n-pentadecane
nC16                  n-hexadecane
i-C18                 iso-octadecane
nC17                  n-heptadecane
pristane              pristane
nC18                  n-octadecane
phytane               phytane
nC19                  n-nonadecane
nC20                  n-icosane
nC21                  n-henicosane
nC22                  n-docosane
nC23                  n-tricosane
nC24                  n-tetracosane
nC25                  n-pentacosane
nC26                  n-hexacosane
nC27                  n-heptacosane
nC28                  n-octacosane
nC29                  n-nonacosane
nC30                  n-triacontane

biodegradation, the branched and large molecular weight straight chained alkanes will have a proportionally higher relative abundance in the crude oil mixture. The small to medium molecular weight alkanes are mostly negative contributors, further supporting the conclusion that biodegradation is the main cause of variation in the modeled variables. The permittivity of a pure compound increases with increasing number of carbon atoms in the compound;25 hence, biodegraded oils generally have a higher permittivity than nonbiodegraded oils, since they have a higher relative abundance of higher molecular weight straight chained alkanes. This is also observed in the measurements for this data set. For the models based on IR, the biodegradation effect is also observed, since significant values for the regression coefficients tend to originate from areas in the IR spectra that are influenced by the biodegradation process. For all models, CH3 stretch regions are positive contributors, while CH2 stretch regions are negative contributors, reflecting that biodegraded


(5) Parisotto, G.; Ferrao, M.; Muller, A. L. H.; Muller, E. I.; Santos, M. F. P.; Guimaraes, R. C. L.; Dias, J. C. M.; Flores, E. M. M. Energy Fuels 2010, 24, 5474−5478.
(6) Abbas, O.; Rebufa, C.; Dupuy, N.; Permanyer, A.; Kister, J. Talanta 2008, 75, 857−871.
(7) Flumignan, D. L.; Ferreira, F. O.; Tininis, A. G.; de Oliveira, J. G. Chemom. Intell. Lab. Syst. 2008, 92, 53−60.
(8) Peinder, P.; Visser, T.; Petrauskas, D. D.; Salvatori, F.; Soulimani, F.; Weckhuysen, B. M. Vib. Spectrosc. 2009, 51, 205−212.
(9) Tomren, A. L.; Barth, T.; Folgerø, K. Unpublished results 2012.
(10) Falcone, G.; Harrison, B. Oil Gas J. 2011, 109 (10), 68−73.
(11) Thorn, R.; Johansen, G. A.; Hammer, E. A. In 1st World Conference on Industry Process Tomography, Buxton, Greater Manchester, U.K., 1999.
(12) Falcone, G.; Hewitt, G. F.; Alimonti, C.; Harrison, B. J. Pet. Technol. 2002, 54, 77.
(13) Cole, K. S.; Cole, R. H. J. Chem. Phys. 1941, 9, 341−351.
(14) Carlson, J. E.; Tomren, A. L.; Folgerø, K.; Barth, T. Chemom. Intell. Lab. Syst. 2012, submitted for publication.
(15) Wold, S.; Esbensen, K.; Geladi, P. Chemom. Intell. Lab. Syst. 1987, 2, 37.
(16) Hoskuldsson, A. J. Chemom. 1995, 9, 91.
(17) Peters, K. E.; Moldowan, J. M. The Biomarker Guide: Interpreting Molecular Fossils in Petroleum and Ancient Sediments, 1st ed.; Prentice Hall: Englewood Cliffs, NJ, 1993.
(18) Weiss, H. M.; Wilhelms, A.; Mills, N.; Scotchmer, J.; Hall, P. B.; Lind, K.; Brekke, T. NIGOGA, The Norwegian Industry Guide to Organic Geochemical Analyses, edition 4.0; Norsk Hydro, Statoil, Geolab Nor, SINTEF Petroleum Research, and the Norwegian Petroleum Directorate: Norway, 2000. Available online: http://www.npd.no/engelsk/nigoga/default.htm
(19) Bjørndal, E.; Frøysa, K. E.; Engeseth, S. A. IEEE Trans. Ultrason., Ferroelectr. Freq. Control 2008, 55 (8), 1794−1808.
(20) Halliday, D.; Resnick, R.; Walker, J. Fundamentals of Physics, 5th ed. extended; Wiley: Hoboken, NJ, 1997.
(21) Folgerø, K. PhD thesis, University of Bergen, 1996.
(22) Blomquist, G.; Johansson, E.; Söderström, B.; Wold, S. J. Chromatogr. 1979, 173, 7−17.
(23) Karrer, L. L.; Gordon, H. L.; Rothstein, S. M.; Miller, J. M.; Jones, T. R. B. Anal. Chem. 1983, 55, 1723−1728.
(24) SIRIUS, Pattern Recognition Systems AS (PRS AS), Version 7.0, 2004. Available online: http://www.prs.no/Sirius/Sirius.html (accessed March 28, 2012).
(25) Maryott, A. A.; Smith, E. R. Table of Dielectric Constants of Pure Liquids, Circular 514; United States Department of Commerce, National Bureau of Standards: Washington, DC, 1951.

oils have higher absorbance in the CH3 regions than nonbiodegraded oils, while they have lower absorbance in the CH2 regions. A strong absorbance in the CH3 regions indicates a high amount of branched alkanes, while a strong absorbance in the CH2 regions indicates more straight chained alkanes. Biodegradation also causes increased contents of polar compounds in crude oils, either due to production of, for example, organic acids as metabolites in the microbial processes, or due to the partial removal of nonpolar bulk hydrocarbons. The IR spectra can reflect this increase directly in the parts of the spectra that reflect carbon−oxygen bonds. The regression coefficients in these regions have a positive effect on the model, and the biodegraded oils tend to have stronger absorbance there than the oils that are not biodegraded, which further supports the interpretation that biodegradation is the major cause of variation in the modeled variables.

The condensate samples are strongly dominated by hydrocarbons in the low molecular weight range, so the differences in the hydrocarbon composition will be dominant in these samples. A larger sample set consisting of condensates only might reveal patterns of variation that are not obvious in the sample set used here.

Overall, GC- and IR-based PLS calibration models of properties that are important in MFM operation have shown good predictive quality, supporting their usefulness in oil production and transport facilities. A typical application would be to detect changes in oil composition during the production lifetime of an oil field and highlight the need for updating calibration values. Both analytical techniques provide data that give good model results. If MFM calibration is the major purpose, IR spectroscopy is the best approach. It is a much faster and easier measuring technique, and portable measuring devices for IR spectroscopy already exist. GC measurements take several hours but are already in use for quality control in other contexts, and may therefore be a good choice in a combined flow assurance perspective.
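The sign pattern of the regression coefficients and the recalibration check described above can be illustrated with a small sketch. It is not the modeling procedure used in this work: it assumes a one-component PLS1 model written in plain Python, two invented "spectral" variables (CH3 and CH2 stretch absorbance), invented permittivity-like responses, and a hypothetical calibration value and tolerance. All names and numbers are illustrative only.

```python
def pls1_one_component(X, y):
    """Fit a one-component PLS1 model; return (coefficients, x_means, y_mean)."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction in X-space with maximum covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Scores, then regression of the centered response on the scores.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    return [q * wj for wj in w], x_means, y_mean


def predict(model, x):
    """Predict the response for one sample x from a fitted model."""
    coef, x_means, y_mean = model
    return y_mean + sum(c * (xj - m) for c, xj, m in zip(coef, x, x_means))


def needs_recalibration(predicted, calibration_value, tolerance):
    """Flag a prediction that departs from the value used at calibration."""
    return abs(predicted - calibration_value) > tolerance


# Invented data: columns are [CH3 stretch, CH2 stretch] absorbance.
X = [[0.30, 0.70], [0.35, 0.66], [0.40, 0.61],
     [0.45, 0.54], [0.50, 0.50], [0.55, 0.46]]
y = [2.05, 2.08, 2.12, 2.17, 2.20, 2.23]  # invented permittivity-like values

model = pls1_one_component(X, y)
new_prediction = predict(model, [0.60, 0.40])  # hypothetical new oil sample
flag = needs_recalibration(new_prediction, calibration_value=2.14, tolerance=0.05)
print(model[0], new_prediction, flag)
```

In this invented data set the CH3 coefficient comes out positive and the CH2 coefficient negative, mirroring the pattern reported for the IR models, and the more CH3-rich hypothetical sample is flagged for recalibration. A real application would use full spectra or chromatograms, more components, and proper validation.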

APPENDIX A
Table A1 shows the variable coding and variable names for the gas chromatography data, as used in Figure 4.

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS
The funding for this work comes from the Norwegian Research Council through the Michelsen Centre for Research Based Innovation and from Norwegian industrial partner Roxar.



REFERENCES

(1) Satya, S.; Roehner, R. M.; Milind, D. D.; Hanson, F. V. Energy Fuels 2007, 21, 998−1005.
(2) Morris, R. E.; Hammond, M. H.; Cramer, J. A.; Johnson, K. J.; Giordano, B. C.; Kramer, K. E.; Rose-Pehrsson, S. L. Energy Fuels 2009, 23, 1610−1618.
(3) Statoil, Norway. Advanced analysis technology. Available online: http://www.statoil.com/en/technologyinnovation/refiningandprocessing/oilrefining/nir/pages/default.aspx (accessed March 1, 2012).
(4) Genov, G.; Nodland, E.; Skaare, B. B.; Barth, T. Org. Geochem. 2008, 39 (8), 1229−1234.
