Characterization and Matching of Oil Samples Using Fluorescence Spectroscopy and Parallel Factor Analysis

Anal. Chem. 2005, 77, 2210-2217

Jan H. Christensen,*,†,‡,§ Asger B. Hansen,† John Mortensen,‡ and Ole Andersen‡

Department of Environmental Chemistry and Microbiology, National Environmental Research Institute, Frederiksborgvej 399, 4000 Roskilde, Denmark, Department of Life Sciences and Chemistry, Roskilde University, Universitetsvej 1, 4000 Roskilde, Denmark, and Department of Natural Sciences, Royal Veterinary and Agricultural University, Thorvaldsensvej 40, 1870 Frederiksberg C, Denmark

A novel approach for matching oil samples by fluorescence spectroscopy combined with three-way decomposition of spectra is presented. It offers an objective fingerprinting based on the relative composition of polycyclic aromatic compounds (PACs) in oils. The method is complementary to GC-FID for initial screening of oil samples but can also be used for prescreening in the field, onboard ships, using a portable fluorescence spectrometer. Parallel factor analysis (PARAFAC) was applied to fluorescence excitation-emission matrixes (EEMs) of heavy fuel oils (HFOs), light fuel oils, lubricating oils, crude oils, unknown oils, and a sample collected in the spill area two weeks after the Baltic Carrier oil spill (Denmark, 2001). A total of 112 EEMs were decomposed into a five-factor PARAFAC model using excitation wavelengths from 245 to 400 nm and emission wavelengths from 280 to 550 nm. The PARAFAC factors were compared to EEMs of PAC standards with two to five rings, and the comparisons indicate that each of the factors can be related to a mixture of PACs with similar fluorescence characteristics: a mixture of naphthalenes and dibenzothiophenes, fluorenes, phenanthrenes, chrysenes, and five-ring PACs, respectively. Oils were grouped in score plots according to oil type. Except for HFOs and crude oils, the method easily discriminated between the four oil types. Minor overlaps of HFOs and crude oils were observed along all five PARAFAC factors, and the variability of crude oils was large along factor 2 due to a varying content of five-ring PACs. The spill sample was correctly assigned as a HFO with a PAC pattern similar to that of oil from the cargo tank of the Baltic Carrier by comparing the correlation coefficients of scores for the oil spill sample and possible source oils (i.e., oils in the database).
* Corresponding author. Telephone: +45-35282366. Fax: +45-35282398. E-mail: [email protected]. † National Environmental Research Institute. ‡ Roskilde University. § Royal Veterinary and Agricultural University.

Analytical Chemistry, Vol. 77, No. 7, April 1, 2005. 10.1021/ac048213k CCC: $30.25. © 2005 American Chemical Society. Published on Web 02/18/2005.

Accidental and deliberate oil spills occur frequently in the natural environment, and they can affect the ecosystem as well as human health due to the high content of toxic and mutagenic compounds in oil. Polycyclic aromatic compounds (PACs), which are present in high concentrations in oil, form a large group of relatively persistent compounds, several being carcinogenic, mutagenic, or both. Accordingly, the U.S. Environmental Protection Agency (http://www.epa.gov) has classified 16 individual PACs as priority pollutants. In the Danish maritime territory, the frequency of minor oil spills (e.g., due to tank washings) was ∼400 per year during the 15-year period from 1987 to 2001, and worldwide the number of spills is enormous. Thus, there is a constant need for improving existing methods for oil characterization and identification in order to determine the source of spills.

The standard method for chemical characterization of oil consists of initial screening using gas chromatography-flame ionization detection (GC-FID) followed by more comprehensive analyses by gas chromatography/mass spectrometry (GC/MS).1,2 Numerous methods for chemical fingerprinting based on GC/MS data exist, most of which basically compare the relative abundances of selected petroleum hydrocarbons in spill samples and suspected sources.1,3-7 However, initial screening of oil samples to reject obvious nonmatches from further analysis is also an important part of a multiple-criteria approach for oil spill identification, and it is often based on visual comparison of GC-FID chromatograms.1 This method is subjective, time-consuming, and often limited mainly to the comparison of the n-alkane distribution, pristane/phytane ratio, and size and position of the unresolved complex mixture.

Fluorescence spectroscopy is a screening method complementary to GC-FID, since it focuses on a different part of the oil, namely, the PACs.8-10 The fluorescence of individual PACs is highly specific and efficient, the latter due to the presence of delocalized electrons within the aromatic rings and because their rigid structure does not allow for efficient vibrational relaxation. Several authors have described the use of full-scan excitation-emission fluorescence spectra (also known as EEM: excitation-emission matrix) and synchronous fluorescence scan for identification and quantification of PACs in simple mixtures.11,12 However, fluorescence is a complex process in multicomponent mixtures such as oil, which contains hundreds of individual PACs, and quenching and energy-transfer processes can extensively affect the analysis. Quenching occurs when the ground-state fluorophore and a ground-state quencher molecule form a stable complex or when an excited-state fluorophore collides with and transfers energy to a ground-state quencher molecule. Furthermore, effects such as self-quenching of individual PACs and energy transfer between PACs combine to produce large red-shifts in the resulting fluorescence emission and thereby affect the EEM.13 The effects of these processes on fluorescence of multicomponent mixtures mainly depend on concentration and can thus be reduced by sample dilution.13

Fluorescence spectroscopy has been used for forensic analysis of oil and petroleum products since at least the 1980s.10 However, due to the compositional complexity of oil and petroleum products, excitation, emission, and excitation-emission scans are broad and without fine structure, and analyses of fluorescence spectra have until recently been limited to visual comparison or simple subtraction methods.10,14 Recent work has, however, demonstrated that the use of multivariate statistical methods (e.g., principal component analysis, PCA) enables a more objective matching of emission spectra of oil spill samples with spectra from suspected sources.8 Fluorescence EEMs collected for several oil samples give rise to three-way data, which can be arranged in a cube.

(1) Daling, P. S.; Faksness, L. G.; Hansen, A. B.; Stout, S. A. Environ. Forensics 2002, 3, 263-78.
(2) Wang, Z. D.; Fingas, M.; Page, D. S. J. Chromatogr., A 1999, 843, 369-411.
(3) Christensen, J. H.; Tomasi, G.; Hansen, A. B. Environ. Sci. Technol. 2005, 39, 255-60.
(4) Christensen, J. H.; Hansen, A. B.; Tomasi, G.; Mortensen, J.; Andersen, O. Environ. Sci. Technol. 2004, 38, 2912-18.
(5) Short, J. W. Environ. Forensics 2002, 3, 349-55.
(6) Stout, S. A.; Uhler, A. D.; McCarthy, K. J. Environ. Forensics 2001, 2, 87-98.
(7) Wang, Z. D.; Fingas, M.; Sigouin, L. Environ. Forensics 2002, 3, 251-62.
(8) Li, J. F.; Fuller, S.; Cattle, J.; Way, C. P.; Hibbert, D. B. Anal. Chim. Acta 2004, 514, 51-6.
Analysis of three-way data by PCA requires initial unfolding of the data cube into a two-way matrix with each excitation-emission pair defined as a variable. An alternative method for decomposing EEMs is to retain the three-way structure and apply a multiway decomposition technique such as parallel factor analysis (PARAFAC). The PARAFAC model has its origin in psychometrics,15,16 and it is being increasingly used for analysis of fluorescence EEMs. PARAFAC is suitable for decomposing three-way fluorescence data since the model corresponds to the underlying physical model in fluorescence,17 namely, Beer's law, which applies for sufficiently diluted solutions.18 PARAFAC has recently been applied to resolve mixtures of PACs into pure factors,11,19 but resolution and quantification of individual PACs were limited to mixtures of relatively few individual compounds (i.e., 3-10). It is unlikely that multiway methods are able to resolve all individual compounds present in complex environmental samples (e.g., soil, sediment, biota) because the fluorescence characteristics of isomers with similar chemical structure are almost identical. Furthermore, background fluorescence from, for example, proteins and fulvic and humic acids present in environmental samples in varying amounts complicates the analysis even further.

The PAC contents of crude oil and petroleum products are dominated by C0-C4 homologue series of naphthalene, phenanthrene, dibenzothiophene, fluorene, and chrysene. Each of these five groups comprises a large number of individual compounds, which differ only in degree and position of alkylation. Thus, isomers within each group of PACs have quite similar fluorescence EEMs.

This paper presents a rapid and objective screening method for characterization and matching of crude oils and petroleum products based on their PAC composition. The method comprises fluorescence EEM measurements, data preprocessing, PARAFAC modeling, and spill/source matching. Preprocessing of fluorescence EEMs prior to PARAFAC analysis is important, and several techniques are presented that reduce the variability unrelated to the chemical composition. Subsequently, EEMs of oil samples are decomposed into factors with chemical relevance (i.e., average spectra of different groups of PACs). These PARAFAC factors characterize oil samples based on their PAC composition and can thus be used to match spill samples to suspected sources.

(9) Pharr, D. Y.; Mckenzie, J. K.; Hickman, A. B. Ground Water 1992, 30, 484-9.
(10) Siegel, J. A.; Fisher, J.; Gilna, C.; Spadafora, A.; Krupp, D. J. Forensic Sci. 1985, 30, 741-59.
(11) JiJi, R. D.; Cooper, G. A.; Booksh, K. S. Anal. Chim. Acta 1999, 397, 61-72.
(12) Patra, D.; Mishra, A. K. TrAC-Trend. Anal. Chem. 2002, 21, 787-98.
(13) Smith, G. C.; Sinski, J. F. Appl. Spectrosc. 1999, 53, 1459-69.
(14) Maher, W. A. B. Environ. Contam. Toxicol. 1983, 30, 413-9.
(15) Carroll, J. D.; Chang, J. J. Psychometrika 1970, 35, 283.
(16) Harshman, R. A. Working Pap. Phonetics 1970, 16.
(17) Tomasi, G. Ph.D. Dissertation. The Department of Food Science, The Royal Veterinary and Agricultural University, Denmark, 2005.
(18) Skoog, D. A.; West, D. M.; Holler, F. J. Fundamentals of Analytical Chemistry, 7th ed.; Saunders College Publishing: Fort Worth, TX, 1996.
(19) Beltran, J. L.; Guiteras, J.; Ferrer, R. J. Chromatogr., A 1998, 802, 263-75.

EXPERIMENTAL SECTION

Methods. Oil samples used in this study were part of the oil database at the forensic oil spill laboratory, National Environmental Research Institute, Denmark. Oils in the database are stored at -20 °C in airtight vials at a total oil concentration of 2000 mg/L in dichloromethane (Rathburn, HPLC grade).
Oil samples selected for this study included 30 crude oils, 19 heavy fuel oils (HFO), 21 light fuel oils (LFO), 6 lubricating oils (Lub), and 13 unknown oils (Unk). Oil classification was based on GC-FID fingerprinting data and a priori knowledge. In addition, an oil sample from the cargo tank of the Baltic Carrier and an environmental spill sample collected two weeks after the Baltic Carrier spill accident (March 29, 2001, Denmark) were selected and analyzed in triplicate. A blank sample of dichloromethane was analyzed each day, and a reference oil consisting of a mixture of a North Sea crude oil, LFO, HFO, and Lub was analyzed often (17 replicates in total). The mixture was prepared in such a way that the four oils contributed approximately equally to the combined fluorescence signal. Furthermore, 10 individual PACs (naphthalene, fluorene, phenanthrene, 1-methylphenanthrene, 2-methylphenanthrene, 3,6-dimethylphenanthrene, dibenzothiophene, pyrene, chrysene, and perylene) were analyzed in appropriate dilutions (5-50 ng/mL).

Oil samples were further diluted to a total oil concentration of 0.05-20 µg/mL to ensure that the maximum fluorescence signal was between 60 and 90% of the maximum allowed with a constant high detector voltage. The optimal fluorescence detector voltage was 850 V, and at this voltage, the sample absorbance in the wavelength range 240-600 nm was below 0.05 for all oil types. This was the criterion used to avoid pronounced light absorption and, thus, to reduce inner filter effects and effects of quenching and energy-transfer processes on the measured EEMs. The UV-visible spectrum was measured on a Shimadzu UV-2401 spectrometer, and the absorbance did not exceed the limit of 0.05 within the wavelength range for any sample.

EEMs were measured on a Varian Eclipse fluorescence spectrophotometer in scan mode. A collection of emission scans from 250 to 600 nm with 2-nm increment was obtained at varying excitation wavelengths ranging from 240 to 475 nm with 5-nm increment. The bandwidths were 5 nm for both excitation and emission, and the scan rate was 4800 nm/min, the latter leading to a scan time of less than 10 min per sample. Each scan was composed of 176 emission and 48 excitation wavelengths.

Data. A total of 112 oil samples were analyzed by fluorescence spectroscopy besides blanks and PAC standards. EEM regions with low or noisy fluorescence were removed, resulting in a reduced three-way array of the size 112 × 136 × 32, with excitation ranging from 245 to 400 nm and emission from 280 to 550 nm.

Data Preprocessing. Fluorescence at emission wavelengths below the excitation wavelength does not occur physically, so the fluorescence intensity in this region is zero. The modeling aspect of this has been investigated by Andersen and Bro,20 who argued that indiscriminately inserting zeros in this triangle of the EEM may interfere with the trilinearity of data (e.g., the basic assumption of multilinearity is that each variable describes the same phenomenon) and that missing values should be inserted instead.20

Rayleigh and Raman scatter show up in three-way fluorescence data as diagonal lines across EEMs.20 The Rayleigh scatter is elastic (i.e., there is no energy loss); hence, the scattered emission wavelength is equal to that of excitation (Figure 1a). Conversely, Raman scatter is inelastic, and emission is shifted to longer wavelengths compared to excitation due to energy loss. Scatter is unrelated to the chemical sample composition and cannot be modeled adequately by a few PARAFAC factors.20 Several methods have been used to reduce detrimental effects due to scatter.20-22 In this study, the mean EEM of four blanks was subtracted from each sample EEM. This procedure removed most of the Raman scatter, but it was inefficient in removing the effects due to Rayleigh scatter. The arrow in Figure 1b marks residual effects of Rayleigh scatter. Subsequently, missing values were inserted in the lower right triangle of EEMs. In addition to excitation-emission pairs with emission wavelength lower than excitation, diagonal lines with emission wavelengths from 0 to 10 nm higher than excitation were included, which removed data still affected by Rayleigh scatter. A small triangle of zeros, which did not interfere with the trilinearity, was added to the EEMs in the region of missing data (Figure 1c). This was done to speed up the modeling process and to prevent the model from generating large spurious peaks in the region with missing data.23

Furthermore, EEMs were corrected for instrument biases by applying an excitation/emission correction spectrum derived from a combination of a Rhodamine spectrum and the spectrum from a ground quartz diffuser.22 The excitation/emission correction removed small artifacts in EEMs due to variations in detector efficiency as a function of wavelength and changed the form of the EEM (Figure 1d). The arrows mark those sections of the EEM most affected by the excitation/emission correction.

Figure 1. Preprocessing of fluorescence EEM. (a) EEM of a typical HFO without preprocessing, (b) EEM after blank subtraction, (c) missing values and a triangle of zeros have been inserted, and (d) excitation/emission corrected EEMs.

Characterization and matching of oil samples depend on the relative composition of PACs in oils rather than their concentrations. Hence, the estimated PARAFAC factors were normalized prior to classification and matching of oil samples. This gave results equivalent to those obtained by normalizing the EEMs prior to PARAFAC modeling, either using all data or using a limited set of excitation-emission pairs with low relative standard deviation in replicate reference oil samples.

(20) Andersen, C. M.; Bro, R. J. Chemom. 2003, 17, 200-15.
(21) Bro, R. Chemom. Intell. Lab. 1999, 46, 133-47.
(22) Stedmon, C. A.; Markager, S.; Bro, R. Mar. Chem. 2003, 82, 239-54.

Figure 2. Five-factor PARAFAC model. (a) EEM of a typical HFO, (b) PARAFAC modeling of data, and (c) residuals.

Parallel Factor Analysis. PARAFAC is a generalization of bilinear PCA to higher order arrays (i.e., three or higher). PCA decomposes two-way data into a product of a score matrix T and a loading matrix P describing the systematic variations in data, plus a matrix of residuals, E.24 Likewise, PARAFAC decomposes N-way arrays into N loading matrices. Thus, if fluorescence EEMs are arranged in a three-way array X of dimensions I × J × K, where I is the number of samples, J the number of emission wavelengths, and K the number of excitation wavelengths, PARAFAC decomposes them into three matrices A (the score matrix), B, and C (loading matrices) with elements $a_{if}$, $b_{jf}$, and $c_{kf}$.
These matrices are sometimes called first, second, and third mode loadings, respectively:25

$$x_{ijk} = \sum_{f=1}^{F} a_{if}\, b_{jf}\, c_{kf} + e_{ijk} \qquad (i = 1, \ldots, I;\; j = 1, \ldots, J;\; k = 1, \ldots, K) \qquad (1)$$

where $x_{ijk}$ is the intensity of fluorescence for the ith sample at emission wavelength j and excitation wavelength k. The number of columns in the loading matrices (F) is the number of PARAFAC factors, and $e_{ijk}$ are the residuals, which represent variability not accounted for by the model. Figure 2 shows (a) the fluorescence EEM for a HFO, (b) the five-factor PARAFAC model of the sample EEM, and (c) the residuals.

Bilinear models have an intrinsic rotational freedom, and uniqueness is attained via ad hoc mathematical constraints, such as orthogonality in PCA.22,25 Owing to the orthogonality constraint, it is unlikely that PCA components describe underlying chemical spectra. Conversely, the PARAFAC factors can be uniquely determined up to trivial permutation and scaling.25,26 Under mild conditions, uniqueness can be obtained if no two proportional columns are present in any of the matrices A, B, or C (e.g., if two compounds have the same concentration profile over the samples, they cannot be uniquely identified) and if the global minimum of the loss function is attained.17 Under such circumstances, mixtures of fluorescence signals from PACs can mathematically be separated into their constituents. This is done in such a way that the score vector $a_f$ is directly proportional to the concentration (with the quantum yield as proportionality factor) and $b_f$ and $c_f$ are estimates of the emission and excitation spectra of the underlying chemical constituents, respectively.22

However, experimental EEMs often deviate from trilinearity and are affected by scatter effects and measurement variability, which leads to inaccurate model estimates. To ensure that the model estimation is led in the right direction and to speed up the modeling process, certain constraints can be applied.20 Specifically, in the case of modeling fluorescence EEMs, "nonnegativity" constraints are often imposed on the estimates of A, B, and C. These constraints improve the robustness of the modeling and are at the same time in accordance with chemical a priori knowledge (i.e., negative concentrations and fluorescence intensities are impossible). A detailed introduction to PARAFAC and examples of its applications to analysis of fluorescence EEMs have recently been published.20-22,25

The PARAFAC model was fitted in MATLAB ver. 6.5.1 using the N-way toolbox.27 The convergence criteria and maximum number of iterations used throughout the modeling were $10^{-6}$ to $10^{-8}$ and 5000-10 000, respectively. Determination of the appropriate number of factors was mainly based on split-half analysis, analysis of residuals,20,25 and comparison of PARAFAC factors with EEMs of individual PACs.

(23) Thygesen, L. G.; Rinnan, A.; Barsberg, S.; Moller, J. K. S. Chemom. Intell. Lab. 2004, 71, 97-106.
(24) Wold, S.; Esbensen, K.; Geladi, P. Chemom. Intell. Lab. 1987, 2, 37-52.
(25) Bro, R. Chemom. Intell. Lab. 1997, 38, 149-71.
(26) Riu, J.; Bro, R. Chemom. Intell. Lab. 2003, 65, 35-49.
(27) Andersson, C. A.; Bro, R. Chemom. Intell. Lab. 2000, 52, 1-4.
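As a rough illustration of the trilinear model in eq 1, the decomposition can be sketched with a plain alternating least-squares (ALS) loop in Python/NumPy. This is not the N-way toolbox implementation used in the study: the sketch assumes a complete array (no missing values), applies no nonnegativity constraints, and the function names `khatri_rao` and `parafac_als`, the iteration limit, and the tolerance are hypothetical choices made only for this example.

```python
import numpy as np


def khatri_rao(U, V):
    """Column-wise Kronecker product; row (u, v) holds U[u, f] * V[v, f]."""
    n_factors = U.shape[1]
    return (U[:, None, :] * V[None, :, :]).reshape(-1, n_factors)


def parafac_als(X, n_factors, n_iter=500, tol=1e-10, seed=0):
    """Fit x_ijk ~ sum_f a_if * b_jf * c_kf by unconstrained ALS.

    X has shape (I, J, K): samples x emission x excitation.
    Returns the score matrix A, the loadings B and C, and the final
    sum of squared residuals.
    """
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    # Random starting values for the two loading matrices (an assumption;
    # other initializations are possible).
    B = rng.random((J, n_factors))
    C = rng.random((K, n_factors))
    # Unfold the cube along each mode so every update is a linear
    # least-squares problem.
    X1 = X.reshape(I, J * K)                     # columns ordered (j, k)
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)  # columns ordered (i, k)
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)  # columns ordered (i, j)
    prev = np.inf
    for _ in range(n_iter):
        A = X1 @ np.linalg.pinv(khatri_rao(B, C)).T
        B = X2 @ np.linalg.pinv(khatri_rao(A, C)).T
        C = X3 @ np.linalg.pinv(khatri_rao(A, B)).T
        resid = X1 - A @ khatri_rao(B, C).T
        sse = float(np.sum(resid ** 2))
        # Stop when the relative change in the loss becomes negligible.
        if abs(prev - sse) <= tol * max(sse, 1e-300):
            break
        prev = sse
    return A, B, C, sse
```

For an array `X` of shape (samples, emission, excitation), `A, B, C, sse = parafac_als(X, 5)` would return candidate scores and loadings for a five-factor model. Because ALS can stop in a local minimum, several random starts (different `seed` values) would normally be compared and the solution with the lowest sum of squared residuals retained.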


Figure 3. Excitation and emission loadings from split-half analysis. Loadings obtained from the complete data set (solid line) and the two random halves (dashed lines) are compared.

Measurement of the Similarity of Samples. Since the relative abundances of individual factors and not absolute concentrations are relevant for chemical fingerprinting, $a_f$ was normalized according to eq 2 prior to spill/source matching,

$$a^{*}_{f} = a_{f} \Big/ \sum_{f=1}^{F} a_{f} \qquad (2)$$

where $a^{*}_{f}$ is the normalized score vector. Subsequently, the similarity of oil samples was calculated using the correlation coefficient of the normalized scores, $a^{*}_{f}$,

$$r = \frac{\sum_{f=1}^{F} (a^{*}_{1f} - \bar{a}^{*}_{1})(a^{*}_{2f} - \bar{a}^{*}_{2})}{\sqrt{\sum_{f=1}^{F} (a^{*}_{1f} - \bar{a}^{*}_{1})^{2} \, \sum_{f=1}^{F} (a^{*}_{2f} - \bar{a}^{*}_{2})^{2}}} \qquad (3)$$

where r is the correlation coefficient, $a^{*}_{1f}$ and $a^{*}_{2f}$ are the normalized scores of the fth factor for the first and second samples, and $\bar{a}^{*}_{1}$ and $\bar{a}^{*}_{2}$ are the mean normalized scores for the first and second samples.

RESULTS AND DISCUSSION

A large number of PARAFAC models were estimated using an increasing number of factors (from 1 to 10), with split-half analysis, dividing data into two random halves (56 × 136 × 32 in each set). The models were estimated first without constraints and then enforcing nonnegativity on all three modes. The use of nonnegativity constraints prevented small negative regions from showing up in excitation and emission loadings and significantly increased the speed of the modeling process. Bro suggested the use of residual and leverage analyses for initial outlier detection in PARAFAC modeling.25 Plots of the fit quality (sample sum-of-squared errors) and the sample leverage revealed no outliers.

Determining the Optimal Number of Factors. Repeated split-half analyses with different random splits and random initialization indicate that at least five factors can be reliably extracted from the data set. Convergence is fast (within 200 iterations) to a consistent five-factor solution, and a clear bend around five factors was found in the sum-of-squares of errors. Figure 3 shows the close to perfect correspondence between the excitation and emission loadings for the five-factor PARAFAC model using the complete data set and the two independent random halves, respectively. With random initialization, the PARAFAC modeling occasionally reached a different solution (local minimum); however, these solutions always had a higher sum-of-squares of errors.

It is difficult to establish from split-half analysis and analysis of the residuals whether more than five factors can be reliably extracted. Extraction of too few factors leads to inadequate modeling of data and to the presence of systematic variations in the residuals because two independent groups of chemical constituents have been modeled insufficiently (e.g., by one common factor). On the other hand, when too many factors are extracted, the true factors may be modeled by a larger number of highly correlated factors, and the model becomes more difficult to fit.

It may come as a surprise that five seems to be the correct number of factors, since oil is composed of hundreds of PAC isomers. However, the fluorescence EEMs of individual PACs overlap extensively, and the data set consists of relatively few samples (i.e., 56 samples in each split-half set). PARAFAC is likely to extract a larger number of reliable factors when more chemical variation is included in the data set.26 Correspondingly, we expect that PARAFAC is able to extract more reliable factors from the complete data set than is suggested by split-half analysis.

Chemical Interpretation of PARAFAC Factors. Chemical interpretation of factors in 5-10-factor PARAFAC models is based on EEMs of selected PACs with two to five rings and on fluorescence characteristics (excitation and emission maxima) for a broad range of PACs. The content of petrogenic PACs in oils is dominated by naphthalene, phenanthrene, dibenzothiophene, fluorene, and chrysene homologue series and to a lesser extent by PACs with five or more rings.2 Comparisons of the fluorescence characteristics of the five PARAFAC factors and EEMs of individual PACs indicate that each of the factors can be related to these main constituents.

Figure 4. Comparison of PARAFAC excitation and emission loadings with spectra of selected PACs. Loadings and spectra of individual PACs have been normalized to ease visual comparison. Furthermore, excitation and emission loadings are shown together in the plots.

In Figure 4, the excitation and emission loadings from the five-factor PARAFAC model are compared with those of individual PACs with most similar spectra (based on the correlation coefficient). Fluorene, chrysene, and phenanthrene have excitation and emission spectra similar to factors 1, 3, and 4, respectively. Fluorescence EEMs of 1-methylphenanthrene, 2-methylphenanthrene, and 3,6-dimethylphenanthrene show that emission spectra for alkylated phenanthrenes are slightly shifted (to higher wavelengths) depending on the position and degree of alkylation. Hence, complex mixtures of C0-C4-phenanthrenes give broad PARAFAC factors without fine structure. Likewise, EEMs of PAC standards and a priori knowledge of excitation and emission characteristics of a broad range of PACs13,19 indicate that factors 2 and 5 can be associated with five-ring PACs and a mixture of naphthalenes and dibenzothiophenes, respectively. The association of factor 2 with a complex mixture of five-ring PACs is mainly based on a priori knowledge of excitation and emission maxima of PACs with five to six rings: the wavelengths of the excitation and emission maxima generally increase with the number of aromatic rings (e.g., perylene emits at higher wavelengths than phenanthrene).

Small systematic variations are present in the residuals of the five-factor model. These systematic variations become less apparent for PARAFAC models with six and seven factors. The two additional factors in a seven-factor model are used to split the "phenanthrene" and the "naphthalene/dibenzothiophene" factors into separate factors, the latter associated with naphthalenes and dibenzothiophenes, respectively (not shown). These observations indicate that PARAFAC might be able to reliably extract one or two additional factors from the complete data set. However, since the "resolution power" (i.e., ability to distinguish oil samples from different sources) of models with five to seven factors is comparable, the lower rank five-factor model is used in the subsequent characterization and matching of oil samples.

Characterization of Oil Samples. Oil samples can be characterized from plots of the normalized sample vectors $a^{*}_{f}$, which map relationships between samples based on their chemical composition. These plots are equivalent to score plots in PCA, but factors are not arranged according to the explained variance as in PCA. Panels a and b in Figure 5 show the plot of the normalized first and second sample vectors, $a^{*}_{1}$ versus $a^{*}_{2}$, and the third and fifth sample vectors, $a^{*}_{3}$ versus $a^{*}_{5}$, designated factor 1 versus factor 2 and factor 3 versus factor 5, respectively, in the following sections.

Figure 5. PARAFAC score plots. (a) Factor 1 vs factor 2 and (b) factor 3 vs factor 5. Symbols for LFOs, HFOs, Lubs, crude oils, unknown oil samples, replicate reference oils, and the triplicate spill and ship samples are explained in the legend. The 17 replicate references are encircled, and arrows mark the position of spill and Baltic Carrier ship samples.

Figure 6. Distribution and variability of factors for LFOs, HFOs, Lubs, and Crudes. The identities of PACs associated with factors are listed in the plot: naphthalenes/dibenzothiophenes (N/DBT), fluorenes (F), phenanthrenes (P), chrysenes (C), and five-ring PACs (5-ring).

The four oil types can, except for HFOs and crude oils, easily be distinguished in the two score plots. However, there is some overlap of HFOs and crude oils, and the scores of crude oils, especially along factor 2 (i.e., five-ring PACs), vary extensively. The latter is not surprising, as the heaviness of crude oils, which influences the relative composition of five-ring PACs, varies depending on source rock and thermal maturity. Conversely, LFOs, HFOs, and Lubs are refined petroleum products containing PAC fractions from specific and limited boiling point ranges. The content of five-ring PACs is, for example, below the detection limit (factor 2 is zero) for most LFOs and Lubs. It is evident from the low variability of scores for replicate reference oil samples compared to the total variability of samples in the data set that the resolution power of the five-factor PARAFAC model is high. Based on these considerations, factor 4 (associated with the relative content of phenanthrenes) is the least discriminative.

Figure 6 shows the average normalized scores as well as the standard deviation of scores within the four classes of oils (HFOs, LFOs, Lubs, crude oils). The crude oils have the most even distribution of factors, whereas LFOs and Lubs have a higher relative content of low-boiling compounds (naphthalenes/dibenzothiophenes, fluorenes) compared to crude oils and HFOs. Conversely, HFOs have a higher relative content of high-boiling compounds (chrysenes, five-ring PACs) compared to LFOs and Lubs.

Matching of Spill Samples to Oils in the Database. A sample collected in the spill area two weeks after the Baltic Carrier spill accident was compared to samples in the database using the correlation coefficient (r). The triplicate samples from the cargo tank of the Baltic Carrier and three additional HFOs gave the highest match to the spill samples (r = 0.998-0.999). The correlation coefficients for the remaining samples in the database were between 0.996 and 0.05. The high similarity of samples from the cargo tank of the Baltic Carrier and the spill samples can also be seen graphically in the score plots (Figure 5), since spill samples are clustered with samples from the Baltic Carrier. Consequently, the spill sample can be assigned as a HFO with a PAC pattern similar to oil from the Baltic Carrier, which is also the correct assignment, based on a priori knowledge.

Chemical fingerprinting by GC-FID and GC/MS revealed that the spill sample was only slightly affected by weathering processes. Generally, weathering processes affect the composition of the PAC fraction, thereby affecting the matching of heavily weathered samples. Short-term weathering processes such as evaporation, which changes the composition immediately after a spill accident, are currently taken into account neither in our approach nor in the standard GC-FID screening. However, the chemical composition is affected systematically by weathering processes,28 which is expected to affect the distribution of PARAFAC factors in oil samples accordingly.

(28) Wang, Z. D.; Fingas, M.; Blenkinsopp, S.; Sergy, G.; Landriault, M.; Sigouin, L.; Foght, J.; Semple, K.; Westlake, D. W. S. J. Chromatogr., A 1998, 809, 89-107.
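The spill/source matching described above (normalization of scores as in eq 2, followed by the correlation coefficient of eq 3) can be sketched in Python/NumPy as follows. This is a minimal sketch, not the authors' MATLAB code: the function names are hypothetical, each row of a score array is assumed to hold the F factor scores of one sample, and no real data from the study are reproduced.

```python
import numpy as np


def normalize_scores(A):
    """Eq 2: divide each sample's scores by their sum over the F factors."""
    A = np.asarray(A, dtype=float)
    return A / A.sum(axis=1, keepdims=True)


def score_correlation(a1, a2):
    """Eq 3: correlation coefficient between two normalized score profiles."""
    d1 = a1 - a1.mean()
    d2 = a2 - a2.mean()
    return float(np.sum(d1 * d2) / np.sqrt(np.sum(d1 ** 2) * np.sum(d2 ** 2)))


def rank_sources(spill_scores, source_scores):
    """Return (database index, r) pairs for candidate sources, best match first."""
    spill = normalize_scores(np.atleast_2d(spill_scores))[0]
    sources = normalize_scores(source_scores)
    r = [score_correlation(spill, s) for s in sources]
    order = np.argsort(r)[::-1]  # descending correlation
    return [(int(i), r[int(i)]) for i in order]
```

With a spill sample's five scores in `spill_scores` and one row per database oil in `source_scores`, `rank_sources(spill_scores, source_scores)` would list the candidate sources by decreasing r. A sample whose normalized scores are all equal has zero variance and would lead to a division by zero in eq 3; the sketch does not guard against that case.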

CONCLUSION

A novel fingerprinting approach for classification and matching of oil samples has been developed and applied in this study. It is based on three-way decomposition of excitation-emission fluorescence spectra by PARAFAC analysis and subsequent classification and matching of oil samples. It is complementary to GC-FID for screening of oil samples as part of multiple-criteria approaches since it focuses on a different part of the oil (i.e., the PACs).

The PARAFAC excitation and emission loadings were compared to the corresponding spectra of selected PACs, and the results indicated that the factors describe true chemical constituents (i.e., groups of compounds). At least five reliable factors could be extracted from the collection of EEMs, but further investigations showed that two additional factors might be reliably extracted from the complete data set. Likewise, we expect that increasing the data set further may lead to extraction of an even larger number of reliable factors. This may improve the resolution power of the model and provide more detailed information about the distribution of PACs.

Compared to visual comparison of GC-FID fingerprints, which is the standard method for initial screening of oil samples, the new approach is more objective. The measured EEMs are imported to MATLAB, where they are preprocessed and modeled by PARAFAC. Subsequently, spill samples can be matched to oils in the database based on their normalized scores. The complete procedure is computerized, and human intervention can almost exclusively be limited to determination of the optimal number of factors. Portable fluorescence spectrometers are commercially available, and combined with a laptop personal computer, the procedure may be implemented for on-site (i.e., onboard ships during port state control) initial prescreening of spill samples and suspected sources. Thus, fixing the number of factors enables a complete analysis of a spill sample and a suspected source oil, including dilution of oil samples, excitation-emission scan, and PARAFAC analysis, in less than 1 h. In fact, the fluorescence spectrometer used in this study is portable and has previously been used for field measurements of dissolved organic carbon in the Danish maritime territory.

ACKNOWLEDGMENT

The authors acknowledge Giorgio Tomasi and Colin Stedmon for many fruitful discussions on PARAFAC and fluorescence, and the technical assistance of Linus Malmquist and Jørgen Avnskjold. This work was supported by Roskilde University, The National Environmental Research Institute, and the Natural Sciences Research Foundation, all Denmark.

Received for review December 2, 2004. Accepted January 17, 2005. AC048213K
