Article pubs.acs.org/est
Uncertainty in Regional-Average Petroleum GHG Intensities: Countering Information Gaps with Targeted Data Gathering
Adam R. Brandt,* Yuchi Sun, and Kourosh Vafi
Department of Energy Resources Engineering, Stanford University, Stanford, California 94305, United States
Supporting Information
ABSTRACT: Recent efforts to model crude oil production GHG emissions are challenged by a lack of data. Missing data can affect the accuracy of oil field carbon intensity (CI) estimates as well as the production-weighted CI of groups (“baskets”) of crude oils. Here we use the OPGEE model to study the effect of incomplete information on the CI of crude baskets. We create two different 20 oil field baskets, one of which has typical emissions and one of which has elevated emissions. Dispersion of CI estimates is greatly reduced in baskets compared to single crudes (coefficient of variation = 0.2 for a typical basket when 50% of data is learned at random), and field-level inaccuracy (bias) is removed through compensating errors (bias of ∼5% in above case). If a basket has underlying characteristics significantly different than OPGEE defaults, systematic bias is introduced through use of defaults in place of missing data. Optimal data gathering strategies were found to focus on the largest 50% of fields, and on certain important parameters for each field. Users can avoid bias (reduced to
50% of information is randomly learned. Smart learning strategies described below greatly reduce the learning required to reduce bias. Figure 4 shows the results of learning strategies based on field size (defined in Table 2). Large circles represent learning
Figure 2. Reduction of imprecision increases as basket sizes increase. Coefficient of variation (SD/mean) across 25 attempts of estimating the CI of randomly created baskets of various sizes. As the number of crudes in a basket increases, the COV decreases regardless of how much information is learned for each field. In each case, information is learned randomly at the rate noted (e.g., COV 25% indicates 25% of information for each field in a basket is learned at random).
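The dispersion measure in the Figure 2 caption (COV = SD/mean of the production-weighted basket CI across repeated attempts) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the field data are synthetic, `N_PARAMS`, `DEFAULT`, and all function names are hypothetical, and the additive per-parameter model is a toy stand-in, not OPGEE.

```python
import random
import statistics

N_PARAMS = 10   # hypothetical number of inputs per field (assumption)
DEFAULT = 1.0   # hypothetical default CI contribution per input (assumption)


def make_fields(n_fields, rng):
    """Create synthetic fields: a production volume and 'true' per-input CI terms."""
    return [
        {
            "production": rng.uniform(1.0, 100.0),
            "true": [rng.uniform(0.5, 1.5) for _ in range(N_PARAMS)],
        }
        for _ in range(n_fields)
    ]


def estimate_field_ci(field, learn_rate, rng):
    """Toy field CI: each input is learned with probability learn_rate, else defaulted."""
    return sum(t if rng.random() < learn_rate else DEFAULT for t in field["true"])


def basket_ci(fields, cis):
    """Production-weighted average CI of a basket."""
    total = sum(f["production"] for f in fields)
    return sum(f["production"] * ci for f, ci in zip(fields, cis)) / total


def basket_cov(fields, learn_rate, rng, attempts=25):
    """COV (SD/mean) of the basket CI across repeated random-learning attempts."""
    estimates = [
        basket_ci(fields, [estimate_field_ci(f, learn_rate, rng) for f in fields])
        for _ in range(attempts)
    ]
    return statistics.stdev(estimates) / statistics.mean(estimates)


rng = random.Random(0)
for size in (5, 30):
    fields = make_fields(size, rng)
    print(size, basket_cov(fields, learn_rate=0.5, rng=rng))
```

In this toy model, learning nothing or learning everything gives a COV of zero because every attempt returns the same estimate. This mirrors the caution in the text that heavy use of defaults can make imprecision look spuriously low.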
learning strategies (randomly learning 25%, 50%, and 75% of data). In these cases, 25 attempts are made at each learning rate, then randomly defined baskets of different sizes are created post facto and their dispersion is assessed. Baskets of 5 crudes have greater COV between random number generator (RNG) attempts than baskets of 25 or 30 crudes, with diminishing benefits from added crudes seen at all learning rates. Given that crude baselines for regions like California or the EU include hundreds of oil fields, this implies that randomness in learning contributes little to basket COV. Note that in Figure 2 the COV is smaller for the random baskets with 25% learning than for those with 50% learning. This is because cases where few data are learned (e.g., only 25% of data are learned at random) have many inputs filled with OPGEE defaults and, therefore, can exhibit spuriously low measures of imprecision. However, these cases with low fractions of parameters learned at random can suffer from bias (see below).
This beneficial averaging across fields cannot eliminate bias in a basket with significantly “atypical” underlying field properties (where atypical is defined relative to OPGEE defaults). Recall that the elevated-emissions basket was defined as a basket of 20 crudes with a production-weighted CI 1.5 times that of the OPGEE baseline. In this case, missing information is
Figure 4. Effect of field-size-based learning strategies on reducing inaccuracy (bias) in the elevated-emissions basket. See Table 2 for definitions of field-size-based learning cases.
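A field-size-based learning case of the kind named in the Figure 4 caption can be sketched as follows: complete information is learned for the largest fields, and the remaining fields keep a default CI. This is a toy illustration only. Table 2's definitions are not reproduced here, so ranking by production volume, the value of `DEFAULT_CI`, the four-field basket, and all function names are assumptions.

```python
DEFAULT_CI = 10.0  # hypothetical default field CI (assumption)


def weighted_ci(productions, cis):
    """Production-weighted average CI."""
    return sum(p * ci for p, ci in zip(productions, cis)) / sum(productions)


def bias_after_learning_top(fields, top_fraction, default_ci=DEFAULT_CI):
    """Relative bias of the basket CI when only the largest fields are fully learned.

    fields: list of (production, true_ci) tuples.
    top_fraction: share of fields, ranked by production, with complete information.
    """
    ranked = sorted(fields, key=lambda f: f[0], reverse=True)
    n_learned = round(top_fraction * len(ranked))
    productions = [p for p, _ in ranked]
    true_cis = [ci for _, ci in ranked]
    # Learned fields use their true CI; unlearned fields fall back to the default.
    estimated = [
        ci if i < n_learned else default_ci for i, ci in enumerate(true_cis)
    ]
    true_basket = weighted_ci(productions, true_cis)
    return (weighted_ci(productions, estimated) - true_basket) / true_basket


# Toy elevated-emissions basket: every field's true CI is 1.5 times the default.
toy_basket = [(100.0, 15.0), (50.0, 15.0), (10.0, 15.0), (5.0, 15.0)]
for fraction in (0.0, 0.5, 1.0):
    print(fraction, bias_after_learning_top(toy_basket, fraction))
```

Because the basket CI is production-weighted, learning the largest fields first removes most of the bias in this toy basket. Learning the top half of the fields shrinks the relative bias from about −33% to about −3%.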
strategies where complete information is learned about the top 10%, 20%, 30%, etc. of fields. These symbols show good agreement (inaccuracy