Article pubs.acs.org/est

Uncertainty of Oil Field GHG Emissions Resulting from Information Gaps: A Monte Carlo Approach

Kourosh Vafi† and Adam R. Brandt*,†

†Department of Energy Resources Engineering, Stanford University, Stanford, California 94305, United States

ABSTRACT: Regulations on greenhouse gas (GHG) emissions from liquid fuel production generally work with incomplete data about oil production operations. We study the effect of incomplete information on estimates of GHG emissions from oil production operations. Data from California oil fields are used to generate probability distributions for eight oil field parameters previously found to affect GHG emissions. We use Monte Carlo (MC) analysis on three example oil fields to assess the change in uncertainty associated with learning information. Single-factor uncertainties are most sensitive to ignorance of the water−oil ratio (WOR) and steam−oil ratio (SOR), resulting in distributions with coefficients of variation (CV) of 0.1−0.9 and 0.5, respectively. Using a combinatorial uncertainty analysis, we find that only a small number of variables need to be learned to greatly improve the accuracy of the MC mean. At most, three pieces of data are required to reduce the bias in the MC mean to less than 5% (absolute). However, the parameters of key importance in reducing uncertainty depend on oil field characteristics and on the metric of uncertainty applied. Bias in the MC mean can remain even after multiple pieces of information are learned, if key pieces are left unknown.



INTRODUCTION

Regulations such as the California low carbon fuel standard (LCFS) and the European Union fuel quality directive (FQD) seek to incentivize the production of less greenhouse gas (GHG) intensive fuels.1−3 In order to improve understanding of the benefits of low-GHG fuels, significant effort has been made to better understand the life cycle GHG impacts of the "baseline" fuels produced from conventional oil resources.1,2,4−8 Understanding emissions from crude oil production and processing is of key importance because crude oil supplies over 86% of global liquid fuel feedstocks (with the remainder supplied by natural gas liquids, biofuels, and synthetic fuels).9

GHG emissions from petroleum production can be computed in many ways. If operating data are available, energy consumption (fossil fuels or electricity) can be computed per barrel of oil produced. Approaches that use direct energy consumption data can provide accurate results, yet such data are generally accessible only to operating companies. Therefore, regulators and other interested external parties require independent means to approximate GHG emissions from oil production. For this reason, GHG emissions models have been developed to estimate emissions from oil producing regions without access to complete data.4,6,10−12

Stanford University, in partnership with the California Air Resources Board, has worked to develop an open-source GHG emissions estimation tool for the oil industry, called the oil production greenhouse gas emissions estimator (OPGEE).4,13,14 OPGEE is a GHG emissions model that models emissions from the well to the refinery entrance gate (WTR boundary). OPGEE is built in a modular fashion, with separate modules for parts of the production chain, including drilling, production, processing, and transport of crude. All sources of direct emissions are included in the model, including both combustion and fugitive emissions. OPGEE is designed to work with variable data inputs and uses "smart" default values to allow computation of emissions even in the absence of complete data; smart defaults are adjusted automatically by OPGEE depending on known properties of the oil field. OPGEE is explained in detail in extensive technical documentation and in peer-reviewed articles.4,5

In another paper,14 we compared several of these models with OPGEE. One key result of that comparison was the importance of using open-source models to transparently assess emissions, especially when they apply to energy or GHG policy. Another key result was a significant divergence between "bottom-up" engineering-based methods such as OPGEE and methods that rely on aggregate industry-average values. Using industry-average energy intensities for petroleum production introduces uncertainty by ignoring variation in oil field characteristics, processes, and operating conditions.

Received: April 29, 2014; Revised: August 8, 2014; Accepted: August 10, 2014


dx.doi.org/10.1021/es502107s | Environ. Sci. Technol. XXXX, XXX, XXX−XXX


Process-based models can give estimates of emissions, provided that the model is populated with reliable input data. However, a lack of specific oil field data can introduce uncertainty into even the most detailed models. In most cases, only some of the input variables to a model like OPGEE are publicly available. The uncertainty in estimated GHG emissions thus depends on how unknown inputs are treated and how sensitive the model is to those inputs. Given the importance of GHG emissions from crude oil production, we argue that the uncertainty associated with estimates of GHG emissions from these processes has not been addressed in sufficient detail in published work.

Previous models analyze uncertainty in various ways. One report performs sensitivity analysis of model results to key input assumptions,12 while another calculates simple low−high ranges that serve to bound likely emissions.7 In contrast, a study by Venkatesh et al. more explicitly examined emissions uncertainty and created cumulative emissions distributions for a variety of world regions.15

This paper explores in detail one source of uncertainty: uncertainty that arises due to sparse availability of input data for engineering-based models of crude oil GHG emissions. We systematically test the effects of sparse input data. One goal is to determine whether specific pieces of information are most effective in reducing the uncertainty of GHG emissions estimates. Another goal is to understand how uncertainty is reduced upon learning information. The results point to methods that may be used to better inform model users about the appropriate level of effort to expend in developing crude oil data sets. This analysis does not address the uncertainty in constructing crude oil "baseline" values for a given region across many crude oils (see further discussion below).

METHODS

Uncertainty in estimates of GHG emissions from oil production can arise from a number of sources. First, not all data are generally available for a given oil field, so default values must be used in place of specific field information. Second, there can be uncertainty from imprecision in measurement, collection, and reporting of input data into public databases or other published sources. Third, inaccuracies can arise from structural model uncertainties, such as errors in simulating or approximating the physics of petroleum production; for example, OPGEE models wellbore frictional losses without accounting for the (complex) effects of two-phase flow. Fourth, models include certain production technologies (e.g., sucker-rod pump) when other technologies may be applied in practice (e.g., electric submersible pump). Fifth, the actual emissions impacts of producing a fuel can vary with market interactions within the fuel sector and in other sectors, as has been shown clearly in the biofuels literature.16,17 This study focuses solely on the first source of uncertainty, as we judge it to be among the largest sources of uncertainty in crude oil modeling. Future analysis of the OPGEE model will explore other sources of uncertainty.

With regard to uncertainty resulting from a lack of information, two general classes of data can be missing or available for a given oil field. For simplicity we call these activity data and technology data. Activity data represent how much of a given input or requirement is used or applied in a given oil field, or which activities are applied to the oil field; for example, activity data can include barrels of water lifted per bbl of oil produced (the water−oil ratio, or WOR), or the application of steam injection. Technology data, on the other hand, represent the efficiencies or emissions factors associated with the use of a given technology. For example, OPGEE assumes that sucker-rod pump efficiency is 65%,5 while reported literature values range from 60 to 70%18 and some poorly maintained real-world systems likely have even lower efficiencies.

This uncertainty analysis focuses on unavailability of activity data, as these are known to be highly variable between oil fields and are likely to represent a much larger source of uncertainty. For example, WORs vary by a factor of over 100 between fields, directly affecting the work of fluid lifting. These activity data are therefore a much larger driver of variation in lifting work than variation in pump efficiency.

OPGEE v1.1 Draft A19 is modified to include a Monte Carlo (MC) simulator. The general method of this analysis is to quantify the uncertainty associated with incomplete data by modifying probability distributions of data inputs to the MC model and studying the effect on the probability distributions of the calculated emissions. We model cases where particular pieces of information are unknown and drawn from probability distributions (from the reported data), compared to cases where the true values of those pieces of information are assumed to be known. Due to the computational effort involved in Monte Carlo simulations, we examine the effect of availability of eight model inputs (see Table 1). While other model inputs also affect emissions estimates, we analyze these eight parameters because they were found to be key model inputs in previous studies.4,7,8,12,13

Table 1. Selected Oil Fields and Model Inputs for Each Field

parameter                    Midway-Sunset    Wilmington     Beverly Hills
recovery method              steam flooding   conventional   conventional
production rate (kbbl/d)     83.3             35.3           2.2
API gravity                  19.0             19.5           33.1
field depth (ft)             2698             2824           7545
GOR (scf/bbl)                161              341            1284
WOR (bbl water/bbl oil)      7.2              37.2           10.9
SOR (bbl water/bbl oil)      4.94             NA             NA
RSPC                         0.28             NA             NA

Distributions for the input parameters are shown in the Supporting Information (Table S1), constructed using pool- and field-level data from up to 1381 pools and oil fields in California.20,23 We used EasyFit software to find the best-fit probability distribution function (PDF) to the California oil field data for each parameter.20−23 All available distributions are allowed. Goodness of fit is measured by Kolmogorov−Smirnov and chi-square tests, with the best-scoring distribution chosen (see Table 1 for the resulting best-fitting distributions for the eight varied parameters). In some cases no single PDF fit the entire range of available data with acceptable error. In these cases, the data range was split into two or three smaller ranges, and a combination of PDFs, or PDFs and probability distribution tables, was used. Some divergence between the fitted PDFs and tables and the underlying data exists, which could introduce uncertainty. This source of uncertainty is


judged to be small due to the good fit between data and distributions (see Figures S1−S8, Supporting Information).

The generated PDFs and probability tables were used to generate a set of random values for each input parameter. A total of 10000 values for each parameter were drawn for each run; 10000 trials proved more than sufficient to ensure convergence of the MC simulation (see Figures S19 and S20, Supporting Information, and the associated text). These parameters were then input into OPGEE using a purpose-built VBA MC simulator. After each run of 10000 trials, the average, standard deviation (SD), and range encompassing 90% of observed GHG emissions were recorded and analyzed. In all runs, all input parameters other than the eight studied parameters are set to OPGEE's smart default values for California.

To test our method, we study three oil fields in California chosen to represent oil recovery operations with low, medium, and high GHG emissions: Beverly Hills, Wilmington, and Midway-Sunset, respectively. Table 1 shows the input data for each field.20 The default conventional oil production method in OPGEE assumes water reinjection and use of downhole pumps for oil lifting.

While definitions in the literature vary, here we define two uncertainty metrics. First, inaccuracy is defined as the difference between the mean of an MC simulation results distribution and the "true" value obtained when all information is known; inaccuracy could also be called bias in the mean of an MC simulation. Second, we define imprecision as the dispersion around the mean of MC simulation results. We quantify imprecision in an absolute sense using the SD and in a normalized sense using the coefficient of variation (CV, SD/mean). Ideally, learning information would reduce both inaccuracy and imprecision.
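The fit-then-sample workflow described above can be sketched in a few lines. The snippet below is a simplified stand-in: it fits candidate PDFs to synthetic WOR data (in place of EasyFit and the California data set), selects by Kolmogorov−Smirnov statistic, and pushes 10000 draws through a toy linear emissions model (in place of OPGEE). All distributions and coefficients are illustrative assumptions, not values from this paper.

```python
# Sketch of distribution fitting + Monte Carlo propagation.
# Synthetic data and a toy emissions model stand in for EasyFit/OPGEE.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic "California WOR data" (bbl water / bbl oil), long-tailed.
wor_data = rng.lognormal(mean=1.5, sigma=1.0, size=1381)

# Fit several candidate PDFs; keep the best Kolmogorov-Smirnov score.
candidates = {d: getattr(stats, d) for d in ("lognorm", "gamma", "expon")}
fits = {}
for name, dist in candidates.items():
    params = dist.fit(wor_data, floc=0)
    ks = stats.kstest(wor_data, name, args=params).statistic
    fits[name] = (ks, params)
best_name, (best_ks, best_params) = min(fits.items(), key=lambda kv: kv[1][0])

# Monte Carlo: 10,000 draws of the unknown WOR through a toy
# emissions model (base emissions + lifting term linear in WOR).
draws = candidates[best_name].rvs(*best_params, size=10_000, random_state=rng)
emissions = 4.0 + 0.35 * draws               # g CO2eq/MJ, illustrative only
mean, sd = emissions.mean(), emissions.std(ddof=1)
cv = sd / mean                               # imprecision, normalized
lo, hi = np.percentile(emissions, [5, 95])   # 90% range, as recorded per run
```

Note that the long tail of the fitted WOR distribution carries directly into a long right tail of the emissions distribution, which is the behavior discussed for Figure 1.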
First, we examine the inaccuracy and imprecision in a field's estimated emissions resulting from a lack of knowledge of each of the eight studied parameters. In this single-factor sensitivity analysis, we fix all input parameters to their known values except one and examine the inaccuracy and imprecision introduced by drawing the remaining parameter from our California-specific distributions. For each free input we compute the mean, SD, and CV of the generated MC results distribution.

While this single-factor sensitivity analysis illustrates which pieces of knowledge are most important for a given oil field, it does not account for the cumulative effects of ignorance of multiple pieces of information. Therefore, a multifactor cumulative sensitivity analysis is performed. In the first step of this multifactor analysis we assume that all eight study variables are unknown: all input values are drawn from California-specific distributions rather than set to known values. The result at this stage is identical for any field and represents the distribution of emissions for a "generic" California oil field about which nothing is known. From this starting point, we study the value of increased information on our distribution of estimated emissions for each of the three California oil fields. As knowledge about a particular oil field increases, parameter distributions are replaced with fixed, "known" values from Table 1. This is performed separately for each consistent combination of parameters, with 10000 MC trials generated per combination. When a piece of information is "learned", it is assigned a single real number in place of the

original distribution (i.e., all other sources of uncertainty are neglected). If, for a given oil field, it is learned that conventional oil recovery methods are used, two of our eight input parameters are no longer applicable: steam−oil ratio (SOR) and ratio of steam produced by cogeneration (RSPC). At any step of the simulation where the method of recovery is unknown, the California-wide SOR and RSPC distributions are used.20 If the recovery method is learned to be conventional oil production, the MC simulator inactivates the steam flooding module, so that SOR and RSPC values are not drawn from their distributions and do not affect the calculated GHG emissions. The total number of combinations in our multifactor sensitivity analysis is calculated as

    N_c = Σ_i [ N! / (i! (N − i)!) ] − N_inc    (1)

where N_c is the number of consistent combinations for a given number of fixed inputs, i is the number of data distributions selected to be replaced with fixed values, and N is the total number of MC simulation input variables for the oil field. Some number of combinations, N_inc, are inconsistent, and these are removed. For example, if SOR is fixed to a nonzero value, the method of recovery should not be allowed to alternate between conventional and steam-based recovery methods. In some cases, the range of inputs is truncated on the basis of fundamental engineering knowledge, while the full number of combinations is still possible; for example, the method of recovery is not allowed to vary freely in all cases, as combinations can arise that are not observed in practice (e.g., API > 40 combined with steam flooding). The rules used to remove inconsistent combinations are described in detail in the Supporting Information.

The computational challenge arises because there are many ways for i of N inputs to be known while the rest remain uncertain. Even after removal of inconsistent combinations, N_c = 38 combinations for the Beverly Hills and Wilmington fields and 86 for Midway-Sunset. While future studies could examine more than eight inputs, the number of combinations that must be run grows with the factorial of N, and the computational burden quickly expands. In addition, correlations between input variables are included through the logic of the generated PDFs, depending on the order in which data are learned. For example, if it is learned that the oil field has a low API gravity (i.e., high density), then it becomes more probable that the field applies thermal oil recovery. See Figure S9 and the associated discussion in the Supporting Information.
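Equation 1 counts subsets of learnable inputs and removes the inconsistent ones. The mechanics can be sketched as below; note that the single consistency rule used here is a simplified, hypothetical stand-in for the full rule set in the paper's Supporting Information, so the resulting count differs from the N_c = 86 reported for Midway-Sunset.

```python
# Sketch of eq 1: enumerate combinations of "learned" inputs, then
# discard inconsistent ones. The consistency rule below is illustrative
# only (a stand-in for the paper's full rule set).
from itertools import combinations
from math import comb

INPUTS = ["method", "API", "depth", "GOR", "prod_rate", "WOR", "SOR", "RSPC"]

def consistent(known):
    # Illustrative rule: fixing SOR or RSPC only makes sense once the
    # recovery method itself is fixed, so such combinations are removed.
    if ("SOR" in known or "RSPC" in known) and "method" not in known:
        return False
    return True

# Total nonempty subsets: sum over i of C(N, i) = 2^N - 1 = 255 for N = 8.
n_total = sum(comb(len(INPUTS), i) for i in range(1, len(INPUTS) + 1))

# N_c: subsets surviving the consistency filter.
n_consistent = sum(
    1
    for i in range(1, len(INPUTS) + 1)
    for combo in combinations(INPUTS, i)
    if consistent(combo)
)
print(n_total, n_consistent)
```

The factorial growth mentioned in the text is visible here: adding a ninth input roughly doubles the number of subsets that must each be run for 10000 MC trials.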



RESULTS AND DISCUSSION

Figure 1 shows the result of a single-factor sensitivity analysis for our three studied fields. In this case, all inputs are known for each field except WOR, which is drawn from the California-specific distribution of WORs. Skewed distributions result: lower values of WOR quickly cease to have appreciable impact (emissions asymptotically approach a lower bound), while high values of WOR generate a "long tail" of high-emissions scenarios. This is due to the long tail in underlying WOR values (see Figure S7, Supporting Information) as well as the increasing importance of WOR at high values (modeled GHG emissions vary nonlinearly with WOR due to frictional


effects). Note that the distribution for each field is centered around a different baseline, governed by the other parameters for each field.

Figure 1. Sensitivity of three California oil fields to variation in WOR. All data are fixed to field-specific information, except WOR, which is drawn from the distribution of California WORs.

Table 2 shows the result, for each of the three fields, of fixing all input parameters except the listed parameter, displaying the mean, SD, and CV for each. Across fields, learning WOR and (where applicable given the production method) SOR are the most impactful parameters for reducing inaccuracy and imprecision.

Figure 2 illustrates the multifactor cumulative uncertainty analysis for the Midway-Sunset oil field. Figure 2a shows the probability distribution of GHG emissions calculated using OPGEE for Midway-Sunset when we have no field-specific knowledge: all eight studied parameters are unknown and therefore drawn from California-specific distributions. The mean GHG emissions for this "generic" California oil field equal 10.6 g of CO2 equiv/MJ LHV crude oil. The standard deviation (SD) is 8.3 g of CO2 equiv/MJ LHV crude oil, and the 90% confidence range is 2.7−27.7 g of CO2 equiv/MJ LHV crude oil. Additions to knowledge about the oil field, and replacement of distributions with known values, will change the distribution of results (the mean, SD, and CV will change). It should be noted that fixing a variable does not necessarily reduce the SD in all cases; the SD depends on the interaction of the input variables in the underlying mathematical model (see the Supporting Information).

Next, we show results arising from learning (fixing) parameters. Parts b−d of Figure 2 show selected results for the Midway-Sunset oil field. Figure 2b shows that fixing the API gravity to the reported value for the field (19.0° API) results in a slight change in the distribution of MC simulation results: mean GHG emissions increase to 12.5 g of CO2 equiv/MJ LHV crude oil, while the SD increases to 8.9 g of CO2 equiv/MJ LHV crude oil. At this stage, the method of recovery is not known: it can be either conventional or steam flooding, selected using a probability table based on API gravity (see the Supporting Information). When we fix the method of recovery to thermal oil recovery, as in Figure 2c, the distribution of results shows a notable increase in mean GHG emissions (16.7 g of CO2 equiv/MJ LHV crude oil) and a reduction in the SD (7.9 g of CO2 equiv/MJ LHV crude oil). By next fixing the SOR to 4.94, in Figure 2d, a significant reduction in the dispersion of the results occurs: mean GHG emissions increase to 23.7 g of CO2 equiv/MJ LHV crude oil and the SD is reduced to 4.4 g of CO2 equiv/MJ LHV crude oil.

A bifurcation in the resulting emissions distributions occurs when a binary variable such as production method is chosen; this is seen in the transition from Figure 2b to Figure 2c. The distribution in Figure 2b is in fact made up of two underlying distributions, one resulting from choosing thermal recovery (as seen in Figure 2c) and one resulting from choosing conventional recovery. See Figure S25 in the Supporting Information.

Figure 3 shows the mean results from MC simulations of all meaningful combinations of fixing a set number of input data for our three studied oil fields. In each case, there is only one way in which no data can be known: all eight input parameters are drawn from distributions, and the "generic" California field is represented. Similarly, for each field there is only one possible case where all inputs are known (either six or eight input parameters are fixed, depending on the recovery method). In between these extremes, a large number of combinations are possible, given by eq 1. Note that mean expected emissions for each field start at the California "generic" case, shown as a distribution in Figure 2a, and then gradually shift toward the "true" value on the right-hand side of the figure as more information is learned about each field.

Table 2. Single-Variable Sensitivity Analysis: Sensitivity of WTR GHG Emissions from Three California Fields to Uncertainty in Each Piece of Information^a

Midway-Sunset                       Wilmington                        Beverly Hills
param^b     mean    SD     CV       param      mean    SD     CV      param      mean   SD     CV
depth       22.97   0.47   0.02     depth      13.38   5.12   0.39    depth      5.04   0.70   0.14
GOR         22.94   0.04   0.00     GOR        10.34   0.04   0.00    GOR        5.77   0.56   0.10
prod rate   22.90   0.08   0.00     prod rate   8.61   1.15   0.13    prod rate  5.48   0.21   0.04
WOR         24.02   2.49   0.10     WOR         4.86   4.34   0.89    WOR        7.56   6.41   0.85
SOR         15.87   7.56   0.48     method     16.74   7.33   0.44    method     5.62   0.00   0.00^c
RSPC        22.44   3.30   0.15

^a All other information is held constant at known values, while the unknown parameter is drawn from California-specific distributions. SD = standard deviation; CV = coefficient of variation (SD/mean). ^b See section S4 in the Supporting Information for definitions and technical explanation of the parameters. ^c Method has SD = 0 for Beverly Hills because the single-variable uncertainty analysis cannot select thermal recovery as the production method: because of correlations between API gravity and thermal recovery (never applied in California above API = 28), Beverly Hills has no probability of applying thermal recovery.


Figure 2. Change in emissions distribution in Midway-Sunset field upon learning of information. (a) Overall California distribution. (b) California distributions with API gravity fixed to Midway-Sunset field API. (c) Production method fixed (to thermal oil recovery). (d) Steam−oil ratio fixed.
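The Figure 2 progression (learn one parameter at a time and watch the MC distribution shift and tighten) can be mimicked with a toy two-parameter model. The distributions, coefficients, and "true" values below are illustrative stand-ins for OPGEE's equations, not the paper's model.

```python
# Toy illustration of the Fig. 2 learning progression: all inputs drawn
# from regional distributions, then fixed one at a time.
import numpy as np

rng = np.random.default_rng(1)
N = 10_000

def emissions(sor, wor):
    # Steam term + lifting term; g CO2eq/MJ, illustrative only.
    return 3.0 + 3.5 * sor + 0.3 * wor

# Stage 0: nothing known -- both inputs drawn from regional distributions.
sor = rng.lognormal(0.8, 0.6, N)
wor = rng.lognormal(1.5, 1.0, N)
stage0 = emissions(sor, wor)

# Stage 1: SOR learned (fixed to a "true" field value of 4.94).
stage1 = emissions(4.94, wor)

# Stage 2: WOR also learned -- all uncertainty in this toy model is gone.
stage2 = emissions(4.94, 7.2)

for name, e in [("none known", stage0), ("SOR known", stage1)]:
    print(f"{name}: mean={np.mean(e):.1f}, SD={np.std(e):.1f}")
print(f"all known: {stage2:.1f} (SD = 0)")
```

As in the paper's Midway-Sunset case, fixing a high "true" SOR shifts the mean upward (because the regional prior underweights high-SOR fields) while shrinking the dispersion; the zero-SD endpoint mirrors the no-dispersion right-hand side of Figure 3, which holds only because other uncertainty sources are excluded.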

Parts a and c of Figure 3 show a general trend in learning: as more parameters are learned (set to fixed values rather than drawn from distributions), the inaccuracy in emissions estimates decreases, tending toward the result obtained when all information is known. This reduction in inaccuracy, however, varies with the learned parameters and with characteristics of the oil field. For example, in Figure 3c we see that after fixing only one parameter for the Beverly Hills field, we can obtain a result from OPGEE that is near the final result obtained upon fixing all six parameters. Figure 3a shows that only three pieces of information are required before the result for the Midway-Sunset field converges to near its final value. However, we also see pathological behavior in some cases: in Figure 3a, one particular combination of learning five of eight pieces of information about the Midway-Sunset field gives a result that is still very near the "generic" California case.

Parts a and c of Figure 3 show another interesting phenomenon: in a small number of cases, learning can increase inaccuracy. For example, we know that once all information is learned, Midway-Sunset should have high emissions relative to our "generic" California case, while Beverly Hills should have low emissions. We see, however, that a few combinations of learned parameters can move MC mean emissions estimates for these fields lower and higher, respectively, than the "generic" California case. Similarly, learning a piece of information can, in uncommon cases, increase imprecision rather than decrease it (seen in both SD and CV; see Figures S10−S18, Supporting Information). We explore the mathematical justification for this behavior in the Supporting Information.

Examining individual combinations is instructive. We can see some cases in Figure 3 where significant information is applied

but inaccuracy persists. In the case of Midway-Sunset, inaccurate (low) emissions estimates are associated with combinations in which SOR is unknown. Thus, even in a case where seven of eight input variables are fixed, a lack of information about SOR in a thermal oil field can cause notable inaccuracy. In the case of Beverly Hills, the large positive inaccuracy remaining when five of six variables are known occurs when the water−oil ratio (WOR) remains unknown.

Two interpretations of the results in Figure 3 are possible. First, the analysis shows that, in all cases, learning only a few pieces of information can result in a very accurate MC simulation (i.e., the MC simulation mean is very close to the mean when all information is known). Conversely, there are cases where almost all parameters are known (e.g., five of six or seven of eight), yet significant inaccuracy (±50%) remains. This points to a clear need for future studies: across global oil fields, which pieces of information are vital to model crude oil emissions with reasonable accuracy?

Note that the right-hand side of Figure 3a−c shows no dispersion: after all information is learned about a field, no uncertainty remains (inaccuracy and imprecision are removed). This is not correct except in the limited analysis performed here (see the previous discussion about included sources of uncertainty).

By examining all possible combinations shown in Figure 3, we can determine the probability that learning information will reduce inaccuracy or imprecision. Figure 4 shows the probability of improvement in accuracy or precision upon learning a specified number of randomly selected parameters. Figure 4 is representative of the impact of learning either when no foreknowledge is available regarding the most impactful pieces of information or when the analyst cannot control the


type of information learned. We see that in nearly all cases, learning two or three parameters results in robust improvements in accuracy and precision (P > 0.8 of improvement). The Wilmington field is a counterexample: because its true emissions are very close to the mean when no information is learned (i.e., it is a very "average" California oil field), learning information reduces the imprecision but does not robustly improve the accuracy until most of the parameters are learned.

Rather than a random learning process, ideally we would learn the information that is most useful for reducing uncertainty (e.g., follow the most rapid path in Figure 3 toward the value at right). Figure 5 shows the "best case" and "worst case" information learning paths for the Midway-Sunset field (see Figures S23 and S24, Supporting Information, for Wilmington and Beverly Hills). For each diagram, a filled cell represents information learned at that stage of the cumulative sensitivity analysis. Parameters with large numbers of filled cells in the "best case" paths are parameters that should be learned early (e.g., API gravity, production method, or SOR). As we can see, the inaccuracy decreases rapidly: after three pieces of information are learned, the absolute inaccuracy falls from 53% to 4% (for Midway-Sunset, these three key pieces are API gravity, production method, and SOR). Similar, though not identical, learning paths are found to most effectively reduce imprecision. In contrast, the right-side diagrams (red) show the "worst case" learning paths. With poor learning choices, significant inaccuracy and imprecision remain even if seven of eight parameters are known (in Midway-Sunset, if SOR is not known). For some parameters, best and worst paths involve learning information in approximately opposite order (e.g., GOR is learned later in best paths and earlier in worst paths). The best-path analysis across the three fields suggests that the key parameters are API gravity, production method, WOR, and, if the field uses steam injection, SOR. API gravity is a key input not through the direct impact of hydrocarbon density, but through its effect on the probability of applying thermal recovery methods.

The rest of the parameters are of minor importance in California. We do not have enough information to know whether this result holds globally. Clearly, the use of default values drawn from regional distributions has different implications for the different fields studied here. Using regional distributions gives a more accurate estimate of emissions for Wilmington than for Midway-Sunset and Beverly Hills. The lesson is that where fields are atypical, using regional distributions or other defaults for key input parameters can result in error.

Figure 3. Accuracy of mean WTR GHG emissions as a function of increased data availability. All parameters that are not fixed are drawn from California-specific distributions.

Figure 4. Probability of improvement in accuracy and precision as a function of number of parameters fixed (learned). See text for definitions of precision and accuracy.


Figure 5. Best and worst paths of learning information for the Midway-Sunset field. Metrics include inaccuracy in mean MC estimate relative to case with all information (top, % error), standard deviation (SD, middle), and coefficient of variation (CV, bottom). Note that the best path to reduce uncertainty varies somewhat depending on the metric chosen. PR and PM stand for production rate and production method, respectively.
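A best/worst learning path like those in Figure 5 can be found by greedy search over which parameter to learn next, scoring each candidate by the absolute bias of the resulting MC mean. The sketch below uses a hypothetical three-parameter toy model with made-up priors and "true" values; it is not OPGEE or the Midway-Sunset model, only the search mechanics.

```python
# Greedy best/worst "learning path" search over which parameter to fix
# next, judged by |bias| of the MC mean. Toy model, illustrative only.
import numpy as np

rng = np.random.default_rng(2)
N = 5_000
TRUE = {"SOR": 4.94, "WOR": 7.2, "depth": 2698.0}
PRIORS = {
    "SOR": lambda n: rng.lognormal(0.8, 0.6, n),
    "WOR": lambda n: rng.lognormal(1.5, 1.0, n),
    "depth": lambda n: rng.lognormal(7.8, 0.5, n),
}

def mc_mean(known):
    # Known parameters are fixed; unknown ones are drawn from priors.
    vals = {p: np.full(N, TRUE[p]) if p in known else PRIORS[p](N)
            for p in PRIORS}
    e = 3.0 + 3.5 * vals["SOR"] + 0.3 * vals["WOR"] + 1e-4 * vals["depth"]
    return e.mean()

truth = mc_mean(set(TRUE))  # all information known

def greedy_path(best=True):
    known, path = set(), []
    while len(known) < len(TRUE):
        scored = {p: abs(mc_mean(known | {p}) - truth)
                  for p in TRUE if p not in known}
        pick = min(scored, key=scored.get) if best else max(scored, key=scored.get)
        known.add(pick)
        path.append(pick)
    return path

print("best path:", greedy_path(True))    # SOR is learned first in this toy
print("worst path:", greedy_path(False))  # SOR is deferred to the end
```

Because the toy SOR term dominates the bias, the best path learns SOR immediately while the worst path defers it, mirroring the Midway-Sunset finding that an unknown SOR leaves large inaccuracy even with seven of eight parameters fixed.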

Note that uncertainty in emissions from an individual field assessment is different from uncertainty in the average over a large number of fields, such as that computed for regional baseline carbon intensities (CIs).24 The results of this study should not be applied to regional baseline analyses. For example, the California LCFS models hundreds of oil fields or producing regions that supply California with crude oil. In these cases, missing data from a given oil field may result in over- or underprediction for that particular field, but compensating errors can reduce the imprecision and inaccuracy resulting from missing information. For example, OPGEE's default WOR is derived from a best fit across many global oil fields.5 In a baseline analysis that examines many oil fields, this relationship may either underpredict or overpredict WOR (and thus emissions) for particular fields, but these errors will offset each other when the regional baseline is computed. The extent of improvement in accuracy and precision due to compensating errors is currently unknown.

This analysis is exploratory, not predictive. Our methodology relies on knowing the underlying data and their distributions a priori and, therefore, cannot be used in regions where the distributions of parameters across all fields are not known. Because we examine only three fields in a single region, any findings about the best learning path to reduce imprecision and inaccuracy are strictly indicative and do not supply a predictive tool to guide data gathering efforts. This suggests an important future study: examine the effect of data-gap uncertainty across a large number of oil fields (i.e., hundreds of oil fields from various world regions). This would allow a much more robust predictive understanding of the reduction of uncertainty with information and would help to develop best learning paths to facilitate efficient gathering of information. Such a study faces two difficulties. First, few countries report comprehensive data across many oil fields. Second, the computational effort required would likely necessitate a modification of the approach used here. Possible approaches include parallelization of the current approach across many machines or reduction in the computational effort of the Monte Carlo runs.

Lack or inaccessibility of oil field data has been frequently reported in studies of GHG emissions from petroleum production. We recommend improved transparency in reporting oil field data to aid modeling of the GHG impacts of crude oil production, as well as more comprehensive assessment of uncertainty when analyzing crude oil GHG intensity.
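The compensating-error argument can be demonstrated with a toy simulation. Everything below is invented for illustration: the lognormal WOR distribution, the linear intensity model, and the "regional default" (set to the prior mean) are assumptions, not OPGEE's actual WOR correlation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a region of fields whose true WORs follow an assumed
# lognormal distribution; per-field WOR is "unknown" to the analyst,
# who substitutes a single regional default for every field.
n_fields = 300
true_wor = rng.lognormal(mean=1.5, sigma=0.8, size=n_fields)
default_wor = np.exp(1.5 + 0.8**2 / 2)   # mean of the assumed distribution

def intensity(wor):
    # Toy intensity model: emissions rise linearly with water handling.
    return 5.0 + 0.3 * wor

# Error made for each field when the default replaces the true value.
field_err = intensity(default_wor) - intensity(true_wor)

print("mean |field-level error|:   ", np.abs(field_err).mean())
print("|regional-average error|:   ", abs(field_err.mean()))
```

The field-level errors are large, but because the default tracks the center of the regional distribution, over- and underpredictions largely cancel in the regional average, which is why single-field and baseline uncertainties must be treated separately.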



ASSOCIATED CONTENT

Supporting Information

Methodology, SD, CV and error diagrams, convergence analysis, binary variable and bifurcation data, and error distribution data. This material is available free of charge via the Internet at http://pubs.acs.org.



AUTHOR INFORMATION

Corresponding Author

*Corresponding author: [email protected].

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

Support for K.V. for this work was provided by Stanford University School of Earth Sciences faculty support funds, via A.R.B. We




acknowledge prior financial support for OPGEE model development from the California Air Resources Board, the European Commission, and the Carnegie Endowment for International Peace.



REFERENCES

(1) Farrell, A.; Sperling, D. A Low-Carbon Fuel Standard for California. Part 1: Technical Analysis; Institute of Transportation Studies, University of California: Davis, CA, 2007; http://www.energy.ca.gov/low_carbon_fuel_standard/UC_LCFS_study_Part_1-FINAL.pdf (accessed April 28, 2014).
(2) Farrell, A.; Sperling, D. A Low-Carbon Fuel Standard for California. Part 2: Policy Analysis; Institute of Transportation Studies, University of California: Davis, CA, 2007; http://www.energy.ca.gov/low_carbon_fuel_standard/UC_LCFS_study_Part_2-FINAL.pdf (accessed April 28, 2014).
(3) Directive 2009/30/EC of the European Parliament and of the Council of 23 April 2009.
(4) El-Houjeiri, H. M.; Brandt, A. R.; Duffy, J. E. Open-source LCA tool for estimating greenhouse gas emissions from crude oil production using field characteristics. Environ. Sci. Technol. 2013, 47 (11), 5998−6006.
(5) El-Houjeiri, H. M.; McNally, S.; Brandt, A. R. OPGEE v1.1 DRAFT A; Excel spreadsheet model, 2013: https://pangea.stanford.edu/researchgroups/eao/ (accessed April 28, 2014).
(6) Skone, T. J.; Gerdes, K. J. An evaluation of the extraction, transport and refining of imported crude oils and the impact on life cycle greenhouse gas emissions; Office of Systems, Analysis and Planning; DOE/NETL-2009/1346; Technical report; National Energy Technology Laboratory: Pittsburgh, PA, 2008; http://www.netl.doe.gov/File%20Library/Research/Energy%20Analysis/Publications/DOE-NETL-2009-1346-LCAPetr-BasedFuels-Nov08.pdf (accessed April 28, 2014).
(7) Comparison of North American and Imported Crudes; Technical report; Jacobs Consultancy and Life Cycle Associates for Alberta Energy Research Institute: Calgary, Alberta, 2009; http://eipa.alberta.ca/media/39640/life%20cycle%20analysis%20jacobs%20final%20report.pdf (accessed April 28, 2014).
(8) EU Pathway Study: Life Cycle Assessment of Crude Oils in a European Context; Jacobs Consultancy and Life Cycle Associates for Alberta Petroleum Marketing Commission: Calgary, Alberta, 2012.
(9) EIA. International Energy Statistics: Petroleum Production; U.S. Energy Information Administration: http://www.eia.gov (accessed August 5, 2014).
(10) EcoInvent; Swiss Centre for Life Cycle Inventories; http://www.ecoinvent.ch/ (accessed April 28, 2014).
(11) GaBi; http://database-documentation.gabi-software.com/ (accessed April 28, 2014).
(12) Rosenfeld, J.; Pont, J.; Law, K.; Hirshfeld, D.; Kolb, J. Comparison of North American and imported crude oil life cycle GHG emissions; TIAX LLC, prepared for Alberta Energy Research Institute: Alberta, Canada, 2009; http://www.assembly.ab.ca/lao/library/egovdocs/2009/aleri/173913.pdf (accessed April 28, 2014).
(13) El-Houjeiri, H.; Brandt, A. R. Exploring the variation of GHG emissions from conventional oil production using an engineering-based LCA model; LCA XII: Tacoma, WA, Sep 25−27, 2012; American Center for Life Cycle Assessment.
(14) Vafi, K.; Brandt, A. R. Comparing LCA models of greenhouse gas emissions from production of crude oil: modeling differences and results variation. Environ. Sci. Technol. 2014, submitted for publication.
(15) Venkatesh, A.; Jaramillo, P.; Griffin, W. M.; Matthews, H. S. Uncertainty analysis of life cycle greenhouse gas emissions from petroleum-based fuels and impacts on low carbon fuel policies. Environ. Sci. Technol. 2011, 45 (1), 125−131.
(16) Searchinger, T.; Heimlich, R.; et al. Use of U.S. croplands for biofuels increases greenhouse gases through emissions from land-use change. Science 2008, 319 (5867), 1238−1240.
(17) Lemoine, D. M.; Plevin, R. J.; et al. The climate impacts of bioenergy systems depend on market and regulatory policy contexts. Environ. Sci. Technol. 2010, 44 (19), 7347−7250.
(18) Takács, G. Sucker-Rod Pumping Manual; PennWell Books: Tulsa, OK, 2003.
(19) El-Houjeiri, H. M.; McNally, S.; Brandt, A. R. Oil Production Greenhouse Gas Emissions Estimator OPGEE v1.1 Draft A, user guide and technical documentation; Stanford University: Stanford, CA, 2013.
(20) Low Carbon Fuel Standard Program Meetings, MCON Inputs; ARB meeting, Mar 5, 2013; California Air Resources Board: Sacramento, CA; http://www.arb.ca.gov/fuels/lcfs/lcfs_meetings/lcfs_meetings.htm (accessed April 28, 2014).
(21) California Oil and Gas Fields, Volumes I−III; California Department of Conservation, Division of Oil, Gas, and Geothermal Resources: 1982, 1992, 1998 (GOR data collected from this source and processed by Clearly, K.).
(22) Monthly Oil and Gas Production and Injection Report, January to December 2010; California Department of Conservation: http://opi.consrv.ca.gov/opi/opi.dll (accessed April 28, 2014) (API data collected from this source by Duffy, J. and processed for the API−GOR correlation).
(23) California Oil and Gas Fields, Volumes I−III; California Department of Conservation: http://www.conservation.ca.gov/dog/pubs_stats/Pages/technical_reports.aspx (accessed April 28, 2014).
(24) Kocoloski, M.; Mullins, K. A.; Venkatesh, A.; Griffin, W. M. Addressing uncertainty in life-cycle carbon intensity in a national low-carbon fuel standard. Energy Policy 2013, 56, 41−50.
