Environ. Sci. Technol. 1994, 28, 1370-1374
Automobile Emissions Are Statistically γ-Distributed
Yi Zhang, Gary A. Bishop, and Donald H. Stedman*
Department of Chemistry, University of Denver, Denver, Colorado 80210

In the present study, we investigated the statistical distributions of on-road CO and HC emissions by comparing the skewness and kurtosis of the distributions. The results of the analysis based on remote sensing data sets show that on-road automobile CO and HC emission distributions are well represented by a γ-distribution. Laboratory dynamometer emissions measurements are similarly distributed; thus, classical statistics based on the assumption of normal distributions cannot be applied.
Introduction

Automobile emissions such as CO, HC, NOx, and other toxic air pollutants play important roles in urban pollution inventories (1). In many locations, the photochemical transformation of oxides of nitrogen and hydrocarbons gives rise to violations of the ozone standard (2). Carbon monoxide standards are violated as a result of direct emission of the gas under conditions of persistent meteorological stagnation. Proper characterization of automobile emission behavior demands the availability of the statistical distributions of the on-road emissions. Studies of these distributions are important in urban pollution control for a number of reasons: (1) to examine the value of introducing new numerical models in the description of automobile emission behavior, (2) to examine the value of introducing new control strategies for improving urban air quality, (3) to reduce and simplify the substantial quantity of emissions data being collected, and (4) to evaluate current procedures for the analysis of measured automobile emissions.

Our previous studies have evaluated various factors affecting on-road automobile emissions, emphasizing low-order moments (e.g., means and variances) of statistical distributions, and have introduced a γ-distribution empirically for automobile emissions (3-6). Pitchford and Johnson applied a γ-distribution representation of on-road CO data to demonstrate their newly developed empirical model for vehicle exhaust emissions (7). In this paper, higher order moments (skewness and kurtosis) are investigated for these distributions. The results based on several field studies of the on-road emissions data sets collected by means of remote sensing are presented.
Experimental Design

The data sets of the present study were collected at various locations in the United States during the period of 1989-1992 by means of the remote sensor developed by the University of Denver. The basic instrument measures the carbon monoxide/carbon dioxide ratio (CO/CO2) and the hydrocarbon/carbon dioxide ratio (HC/CO2) in the exhaust of an on-road vehicle passing through an infrared light beam directed across a single lane of roadway. The IR absorption bands caused by CO, HC, and CO2 in the exhaust plume are isolated by separate bandpass filters at 4.6, 3.4, and 4.3 μm, respectively, and ratioed to the reference absorption at 3.9 μm to eliminate the effect of dust and smoke behind the vehicle. The signals are digitized and acquired by a computer system. Error-checking routines in the computer eliminate invalid data in which the noise exceeds preset bounds, caused, for example, by an inadequate amount of exhaust, pedestrians, bicycles, or vehicles with high-mounted exhaust pipes. On-site calibration is performed daily with a certified gas cylinder containing known concentration ratios of CO, CO2, propane (for HC), and nitrogen. The details of the instrument system are described elsewhere (8, 9).

The University of Denver's remote sensor for on-road emissions has been used to measure the emissions of more than 600 000 vehicles in many locations around the world. The measurements have been validated by showing that they accurately reflect the instantaneous exhaust emissions by means of double-blind comparisons with vehicles of known emissions sponsored by the California Air Resources Board (CARB) and General Motors (GM) Research Laboratories (10, 11). More recent studies by the CARB for both CO and HC also show that the remote sensing CO readings are correct within ±5% of the values reported by an on-board gas analyzer and within ±15% for HC (12).

The present study was designed to take into account both the skewness and kurtosis of the remote sensing data distributions. The individual distributions were mostly based on single-day measurements with an average size of 3000 records. For each individual distribution, skewness and kurtosis were calculated, and the correlation between these two parameters was examined to validate the γ-distribution of on-road emissions. Mobile source emission distributions were also investigated by an analysis of skewness- and kurtosis-related parameters of CO distributions in each model year from 1970 to 1992 based on a fleet of 154 110 vehicles. Finally, an approximation of the on-road emission function for the fleet was made by means of orthogonal polynomials for the γ-function [Γ(α)].

* To whom all correspondence should be addressed.
© 1994 American Chemical Society

Statistical Theory

The shape of a statistical probability density function f(x) can be characterized by moments, either the raw moments $m_r$ (the moments about the origin of the random variable x), given by

$$m_r = \int_{-\infty}^{\infty} x^r f(x)\,dx \qquad (1)$$

or the central moments $\mu_r$ (the moments about the mean of the random variable x), given by

$$\mu_r = \int_{-\infty}^{\infty} (x - m_1)^r f(x)\,dx \qquad (2)$$

Particularly important are $m_0$, the peak area; $m_1$, the mean of the distribution of x; and $\mu_2 = \sigma^2$, where σ is the standard deviation of the distribution and $\mu_2$ is the variance. Different distributions can also be compared by the determination of dimensionless moments about the mean, defined as

$$\mu_r / \sigma^r \qquad (3)$$

Particularly important are the moment of skewness (S), the degree of lack of symmetry:

$$S = \mu_3 / \sigma^3 \qquad (4)$$

and the moment of kurtosis (K), the degree of peakedness:

$$K = \mu_4 / \sigma^4 - 3 \qquad (5)$$

For a normal (Gaussian) distribution, both S and K are zero. For distributions not symmetric about their expected value, S will be nonzero, being positive if the right tail is heavier than the left and negative if the left tail is heavier than the right. When the tails of the distribution have more mass than those of the normal, K is positive, and the distribution is said to be leptokurtic. When the tails of the distribution are lighter than those of a normal, K is negative, and the distribution is said to be platykurtic. Figures 1 and 2 show schematically distributions when S and K are positive, zero, and negative. Table 1 shows S and K for a number of distributions.

Figure 1. Effect of skewness (S) shown for distributions with K = 0 and normalized for equal area.

Figure 2. Effect of kurtosis (K) shown for distributions with S = 0 and normalized for equal area.

Table 1. Skewness (S), Kurtosis (K), and Coefficients ($C_s$ and $C_k$) for Some Probability Density Functions (PDFs)

PDF            S        K      C_s      C_k
normal         0        0      0        0
γ              2/√α     6/α    1/√α     1/√α
exponential    2        6      1        1

α is one of the two important shape parameters (α and β) of γ-distributions.

For the γ-distribution:
$$f(x) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha-1} \exp(-x/\beta) \qquad (6)$$

and the exponential distribution (when α = 1 in the γ-distribution):

$$f(x) = \frac{1}{\beta} \exp(-x/\beta) \qquad (7)$$

the mean and the variance of the distributions are given by μ = αβ and σ² = αβ². S and K are taken from statistics textbooks (13, 14). $C_s$ and $C_k$ are skewness and kurtosis coefficients, which are given by $C_s = S/2$ and $C_k = (K/6)^{1/2}$. For a γ-distribution, S = 2/√α and K = 6/α (Table 1), so that $C_s = C_k = 1/\sqrt{\alpha}$. When the data fit a γ-distribution (i.e., $C_s = C_k$), a plot of $C_s$ against $C_k$ will therefore fall on a straight line of unit slope through the origin. If $C_s = C_k = 1$, the distribution is exponential. This method was used by Liu (15) for examining raindrop size distributions.

Figure 3. $C_s$ vs $C_k$ for the on-road CO emission distributions obtained in the Denver, Los Angeles, and Chicago areas during 1990-1992.
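As a rough illustration of the coefficient test described above, the following Python sketch computes S, K, $C_s$, and $C_k$ from a sample and compares them with the γ-distribution expectation $C_s = C_k = 1/\sqrt{\alpha}$. It is not the authors' code: the function name `shape_coefficients`, the synthetic sample, and the parameter values are assumptions made for illustration, and simple population (biased) moment estimators are assumed.

```python
import math
import random


def shape_coefficients(values):
    """Return (S, K, C_s, C_k) for a sample, using population moments."""
    n = len(values)
    mean = sum(values) / n
    # Central moments mu_2, mu_3, mu_4 about the sample mean.
    mu2 = sum((v - mean) ** 2 for v in values) / n
    mu3 = sum((v - mean) ** 3 for v in values) / n
    mu4 = sum((v - mean) ** 4 for v in values) / n
    skew = mu3 / mu2 ** 1.5          # S = mu_3 / sigma^3
    kurt = mu4 / mu2 ** 2 - 3.0      # K = mu_4 / sigma^4 - 3 (zero for a normal)
    c_s = skew / 2.0                 # C_s = S / 2
    # C_k = (K / 6)^(1/2) is undefined for platykurtic samples (K < 0).
    c_k = math.sqrt(kurt / 6.0) if kurt >= 0 else float("nan")
    return skew, kurt, c_s, c_k


if __name__ == "__main__":
    rng = random.Random(0)           # fixed seed; synthetic data only
    alpha, beta = 0.4, 2.5           # hypothetical shape and scale values
    # random.gammavariate uses the same parameterization as eq 6.
    sample = [rng.gammavariate(alpha, beta) for _ in range(200_000)]
    s, k, c_s, c_k = shape_coefficients(sample)
    print(f"S={s:.2f} K={k:.2f} C_s={c_s:.2f} C_k={c_k:.2f}")
    print(f"gamma expectation: C_s = C_k = {1 / math.sqrt(alpha):.2f}")
```

For strongly skewed samples the kurtosis estimate converges slowly, so some scatter of ($C_s$, $C_k$) points about the unit-slope line is to be expected even for data drawn exactly from a γ-distribution.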
Results

The on-road emissions data collected in the Denver, Los Angeles, and Chicago areas during the period of 1990-1992 were divided into 1-day sampling intervals for a single location, with an average size of 3000 records. Both $C_s$ and $C_k$ for the CO and HC emission distributions were calculated. The resulting $C_s$ vs $C_k$ plots are presented in Figure 3 (50 CO distributions) and Figure 4 (45 HC distributions). For all the distributions, the ($C_s$, $C_k$) points fall close to the straight line $C_s = C_k$, which suggests that all on-road CO and HC emission distributions are fairly well represented by a γ-distribution. No ($C_s$, $C_k$) points fall close to (1, 1), suggesting the inadequacy of the exponential distribution. For CO emission distributions, $C_s$ and $C_k$ are in a range of 1.3-2.1 with an offset of about 0.1 down from the straight line. In contrast, $C_s$ and $C_k$ for HC distributions are in a larger range of 1.2-8.5 and cluster around the 1:1 straight line. The comparison of the $C_s$ vs $C_k$ plots of CO and HC emission distributions suggests that the shape and scale of HC emission distributions fluctuate much more than those of CO emission distributions and that HC emission distributions, having larger $C_s$ and $C_k$ values, are more skewed and leptokurtic than CO emission distributions. The average $C_s$ values are 1.57 and 4.02, and the average $C_k$ values are 1.42 and 3.91, for all CO and HC samples, respectively.

Figure 4. $C_s$ vs $C_k$ for the on-road HC emission distributions obtained in the Denver, Los Angeles, and Chicago areas during 1990-1992.

In order to study on-road emission distributions as a function of model year, an analysis was carried out of $C_s$ and $C_k$ from CO distributions in each model year based on 154 110 measurements collected in various locations of the United States during the period of 1989-1992. For this analysis, all the measurements in a particular model year were organized as an individual distribution. Figure 5 shows the result of $C_s$ vs $C_k$ for this analysis, illustrating the same phenomenon as Figure 3 except for the larger $C_s$ and $C_k$ range. Figure 6 shows the evolution of $C_s$ and $C_k$ from the 1970 model year and older to the 1992 model year. For all model years, $C_s$ and $C_k$ change with time homogeneously, which suggests that $C_s$ and $C_k$ are highly correlated. In other words, a highly skewed on-road emission distribution is also highly leptokurtic.
In addition, $C_s$ and $C_k$ of CO emissions decrease with increasing vehicle age, which indicates that the newer fleets have more skewed and leptokurtic distributions than the older fleets because most newer vehicles are in the cleanest category, as expected. This phenomenon can be attributed to poor maintenance and/or emissions system tampering increasing with age. The fact that emissions system tampering increases with increasing age is clearly demonstrated by the U.S. Environmental Protection Agency tampering surveys (16). The slight drop in $C_s$ and $C_k$ for the newest model year is unexpected. We attribute this to the fact that the newest vehicles emit so little CO that they show a tendency toward a normal distribution, in which the spread reflects the variation among vehicles off the assembly line plus instrument noise. The 1992 model year data set also has the smallest population of measurements of all model years (ages).

Figure 5. $C_s$ vs $C_k$ for the on-road CO emission distributions of each model year based on 154 110 measurements obtained in various locations of the United States during 1989-1992.

Figure 6. $C_s$ and $C_k$ versus model year from pre-1970 model year to 1992 model year.

The distributions can be approximated by eq 6, where α and β can be obtained from the calculated mean and variance (μ = αβ and σ² = αβ²) and the γ-function [Γ(α)] can be obtained from a special statistical table or by means of orthogonal polynomials. Figure 7 gives an impression of a fit of a real distribution, in this case the CO distribution of the 154 110 measurements obtained in 1989-1992. The bars show the measured distribution of CO emissions by 5% CO category, while the line represents the γ-distribution approximation as a function of the actual mean in each CO category (the number above every corresponding bar). This excellent fit is typical for a number of experiments we have performed. Limited investigations of laboratory dynamometer emissions with a group of oxygenated fuels test data sets run by the EPA and the Colorado Department of Health confirm the skewed nature of the CO and HC distributions.
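The step of obtaining α and β from the calculated mean and variance can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' procedure: the function names `gamma_moment_fit` and `gamma_pdf` are hypothetical, and Γ(α) is evaluated with the standard-library `math.lgamma` rather than with a statistical table or orthogonal polynomials as described in the text.

```python
import math


def gamma_moment_fit(values):
    """Estimate (alpha, beta) from the sample mean and variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    # Invert mu = alpha * beta and sigma^2 = alpha * beta^2.
    alpha = mean ** 2 / var
    beta = var / mean
    return alpha, beta


def gamma_pdf(x, alpha, beta):
    """Density of eq 6: x^(alpha-1) exp(-x/beta) / (beta^alpha Gamma(alpha))."""
    if x <= 0:
        # Simplification: the density is reported as 0 at and below zero,
        # although for alpha < 1 it actually diverges as x approaches 0.
        return 0.0
    # Work in log space; math.lgamma returns log Gamma(alpha).
    log_f = ((alpha - 1.0) * math.log(x) - x / beta
             - alpha * math.log(beta) - math.lgamma(alpha))
    return math.exp(log_f)


if __name__ == "__main__":
    # Shape and scale values quoted in the Figure 7 caption.
    alpha, beta = 0.29, 3.27
    print("mean %CO implied by alpha * beta:", alpha * beta)
    for x in (0.5, 1.5, 2.5):
        print(x, gamma_pdf(x, alpha, beta))
```

With measured %CO readings in a list `readings` (a hypothetical name), `gamma_moment_fit(readings)` would return the pair to pass to `gamma_pdf`.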
Figure 7. Comparison of the data to a γ-distribution approximation for the CO distribution of the 154 110 measurements (α = 0.29, β = 3.27). Over each bar is the mean %CO for that category. The horizontal axis is the %CO category.

Figure 8 shows a γ-distribution fit for one of the laboratory dynamometer CO emission FTP data sets. Compared with on-road emissions, the laboratory dynamometer emissions show less agreement with a γ-distribution, but they are still similarly distributed. The poorer agreement of the laboratory dynamometer distributions may be caused by an insufficient number of test vehicles and/or unrepresentative test vehicle recruitment.

Figure 8. CO emission distribution of 236 vehicles as measured by laboratory dynamometer using the federal test procedure, showing a γ-distribution. The data set was from the EPA AH 7 5 - F I F Blend Tests as of 3/30/89 with PFI: HiEtOH and HiMTBE. The horizontal axis is the CO category (gm/mile).

Discussion

Since both on-road and dynamometer automobile emissions are statistically γ-distributed, it is obvious that the classical statistical procedures cannot be applied. Various distribution-free or nonparametric tests and robust analysis techniques may be used to evaluate and estimate mobile source emissions. The challenge is thus to decide which statistical estimator is most appropriate in a given situation, which will expose us to the smallest risk, which will give us the most information at the lowest cost, and so forth.

One deeply rooted concern in statistical data analysis is the rejection of outliers. Over the years, one commonly used algorithm has been based on the classical statistics that any point outside ±zSD is to be regarded as an outlier. An obvious disadvantage is that extreme values strongly affect the SD, so that when the distribution is not normal one would either fail to detect true outliers or throw out good measurements as well. In the case of automobile emissions, the authors suggest the use of a more robust algorithm such as Huber's rule (17) for outlier rejection, with limits set at $x_m \pm k\,\mathrm{MAD}$, where $x_m$ is the median of the data set, MAD is the median of the absolute deviations from the median, and the coefficient k is chosen between 3 and 5. Nevertheless, outliers should not be labeled as such solely on the basis of a fixed statistical rule; the decision should rest for the most part on scientific experience. For on-road remote sensing data, the software rejects only chemically impossible results (>19% CO or >3.9% HC).

Robust sampling statistics of emission data require large population N values to properly characterize the high-emitting vehicle contribution because the automobile emissions picture is dominated by only a small number of high emitters. As noted by Pitchford and Johnson (7), this requirement can be met most efficiently by on-road remote sensing. Though conventional emission monitoring techniques can also be used in a random sampling mode, the inconvenience to motorists would likely discourage their frequent use.

Conclusion

The analyses of the distributions of mobile source emissions performed in this study lead to the following conclusions. First, on-road CO and HC emissions are never normally distributed. In contrast, they are well represented by a γ-distribution. Since laboratory dynamometer emissions are similarly distributed, the rejection of outliers based on normal statistics is not valid. Second, the highly positively skewed and leptokurtic nature of the distributions illustrates that most vehicles are low in CO emissions as well as HC emissions. The overall averages of the emission distributions are controlled by the distribution tails, which contain the vehicles with extremely high emission rates. Third, the fraction of the vehicles with high emission rates increases with age. This conclusion is supported by the fact that the skewness and kurtosis of the distributions decrease homogeneously with increasing vehicle age.
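The median-based rejection limits mentioned in the Discussion (median ± k·MAD, with k between 3 and 5) can be sketched in a few lines of Python. The function names `mad_limits` and `flag_outliers`, the default k = 3.5, and the toy readings are assumptions for illustration only; this is not the software used for the remote sensing data, which rejects only chemically impossible results.

```python
import statistics


def mad_limits(values, k=3.5):
    """Return (lower, upper) limits at median -/+ k * MAD."""
    x_m = statistics.median(values)
    # MAD: median of the absolute deviations from the median.
    mad = statistics.median([abs(v - x_m) for v in values])
    return x_m - k * mad, x_m + k * mad


def flag_outliers(values, k=3.5):
    """Return the values lying outside the median -/+ k * MAD limits."""
    lower, upper = mad_limits(values, k)
    return [v for v in values if v < lower or v > upper]


if __name__ == "__main__":
    readings = [1, 2, 3, 4, 100]     # toy numbers, not measured emissions
    print(mad_limits(readings))
    print(flag_outliers(readings))
```

For strongly skewed emission data such a rule will flag genuine high emitters, which is consistent with the caution above that flagged points should be judged on scientific grounds rather than discarded automatically.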
Literature Cited

(1) Fujita, E. M.; Croes, B. E.; Bennett, C. L.; Lawson, D. R.; Lurmann, F. W.; Main, H. H. J. Air Waste Manage. Assoc. 1992, 42, 267-76.
(2) Finlayson-Pitts, B. J.; Pitts, J. N., Jr. Atmospheric Chemistry: Fundamentals and Experimental Techniques; John Wiley and Sons: New York, 1986; Part 1.
(3) Stedman, D. H.; Bishop, G. A.; Peterson, J. E.; Guenther, P. L.; McVey, I. F.; Beaton, S. P. On-Road Carbon Monoxide and Hydrocarbon Remote Sensing in the Chicago Area; ILENR/REAQ-91/14; Final report to Illinois Department of Energy and Natural Resources, available through ENR Clearinghouse, (217) 785-2800, 1991.
(4) Stedman, D. H.; Bishop, G. A.; Beaton, S. P.; Peterson, J. E.; Guenther, P. L.; McVey, I. F.; Zhang, Y. On-Road Remote Sensing of CO and HC Emissions in California; Final report to California Air Resources Board, 1991; Contract No. A932189.
(5) Peterson, J. M.; Stedman, D. H. Chemtech 1992, Jan, 47-53.
(6) Zhang, Y.; Stedman, D. H.; Bishop, G. A.; Guenther, P. L.; Beaton, S. P.; Peterson, J. E. Environ. Sci. Technol. 1993, 27, 1885-1891.
(7) Pitchford, M.; Johnson, B. Environ. Sci. Technol. 1993, 27, 741-748.
(8) Bishop, G. A.; Starkey, J. R.; Ihlenfeldt, A.; Williams, W. J.; Stedman, D. H. Anal. Chem. 1989, 61, 671A-677A.
(9) Guenther, P. L.; Stedman, D. H.; Bishop, G. A.; Hannigan, J.; Bean, J.; Quine, R. Remote Sensing of Automobile Exhaust; Final report to the American Petroleum Institute, 1991.
(10) Lawson, D. R.; Groblicki, P. J.; Stedman, D. H.; Bishop, G. A.; Guenther, P. L. J. Air Waste Manage. Assoc. 1990, 40, 1096-1105.
(11) Stephens, R. D.; Cadle, S. H. J. Air Waste Manage. Assoc. 1991, 41, 39-46.
(12) Ashbaugh, L. L.; Lawson, D. R.; Bishop, G. A.; Guenther, P. L.; Stedman, D. H.; Stephens, R. D.; Groblicki, P. J.; Parikh, J. S.; Johnson, B. J.; Huang, S. C. On-Road Remote Sensing of Carbon Monoxide and Hydrocarbon Emissions During Several Vehicle Operating Conditions; Presented at the Air & Waste Management Association/Environmental Particulate Source Controls, Phoenix, AZ, 1992.
(13) Freund, J. E. Mathematical Statistics; Prentice-Hall, Inc.: Englewood Cliffs, NJ, 1992; pp 127-158.
(14) Madansky, A. Prescriptions for Working Statisticians; Springer-Verlag: New York, 1988; pp 36-40.
(15) Liu, Y. Atmos. Environ. 1992, 26A, 2713-2716.
(16) Motor Vehicle Tampering Survey 1989; EPA Field Operations and Support Division, EPA Office of Air and Radiation: Washington, DC, May 1990.
(17) Davies, P. L. Fresenius Z. Anal. Chem. 1988, 331, 513-519.

Received for review December 22, 1993. Revised manuscript received March 10, 1994. Accepted March 15, 1994.
Abstract published in Advance ACS Abstracts, April 15, 1994.