Feature pubs.acs.org/ac

Pixel-Based Analysis of Comprehensive Two-Dimensional Gas Chromatograms (Color Plots) of Petroleum: A Tutorial

Søren Furbo,*,† Asger B. Hansen,‡ Thomas Skov,§ and Jan H. Christensen†

We demonstrate how to process comprehensive two-dimensional gas chromatograms (GC × GC chromatograms) to remove nonsample information (artifacts), including background and retention time shifts. We also demonstrate how this, combined with further reduction of the influence of irrelevant information, allows for data analysis without integration or peak deconvolution (pixel-based analysis).

† University of Copenhagen, Faculty of Science, Department of Plant and Environmental Sciences, Thorvaldsensvej 40, DK-1871 Frederiksberg C, Denmark
‡ Research and Development Division, Haldor Topsøe A/S, Nymoellevej 55, DK-2800 Kongens Lyngby, Denmark
§ University of Copenhagen, Department of Food Science, Rolighedsvej 30, DK-1958 Frederiksberg C, Denmark

Supporting Information

Comprehensive two-dimensional gas chromatography (GC × GC) offers unrivaled separation power with peak capacities up to 10 times higher than one-dimensional GC.1 This allows for comprehensive analysis of complex samples, such as petroleum or environmental samples. GC × GC chromatograms can be presented as images (color plots),2 with closely related compounds (compound classes) clustering together and homologues producing easily recognizable patterns (roof tile effect). However, the sheer amount of data makes data analysis and presentation challenging. Several data handling strategies are used today, including visual inspection, integration, and peak deconvolution. During visual inspection of color plots, the experience and pattern recognition capabilities of the analyst are used for quality control and extraction of trends in data. Although the human mind can discover even complicated trends, this strategy is time-consuming,3 and the results depend on the analyst’s experience. The strategy is inherently subjective and qualitative and prone to missing details and unexpected trends. Furthermore, the analyst can only comprehend a limited number of samples, and thus this strategy is not feasible when comparing many samples (e.g., >100). An alternative strategy is to reduce the amount of data before interpretation. This is usually done by integration, either directly or following peak deconvolution.4 Compounds or groups of compounds can be quantified from GC × GC chromatograms by integrating individual peaks5 or groups of peaks.6 Automated peak integration requires a choice of integration parameters, where improper choices can lead to large errors in integration results or missed peaks.7 If all peaks are integrated individually, each sample provides hundreds, or even thousands, of areas, too much information to interpret directly. If only a few target compounds are of interest, integrating only these will circumvent this problem.8 Likewise, if the general composition of the sample is sought and not the details, groups of peaks can be integrated, leading to a much simpler description.6 However, these two latter techniques lead to a loss of information instead of a complete characterization of the sample complexity.

Peak deconvolution and integration of GC × GC/MS data can provide a peak table for each chromatogram with the retention times, intensity (peak height or area), and mass spectrum for each peak, which can then form the basis for further analysis.9,10 The mass spectra can be used as a basis for database lookup for identification of compounds, making this a

© 2014 American Chemical Society

Published: June 30, 2014

dx.doi.org/10.1021/ac403650d | Anal. Chem. 2014, 86, 7160−7170


useful technique for, e.g., metabolic pathway identification.11 However, manual inspection of peak tables is usually necessary due to errors in peak deconvolution and mass spectral matching. This quality control step is very time-consuming, as peak tables can contain over 1000 entries per sample.10 To use either of these data reduction methods to completely characterize complex samples, further data analysis is needed. However, such data analysis could also be done directly on the data prior to data reduction: The detector intensity at each combination of first dimension retention time (1tR) and second dimension retention time (2tR) could be analyzed in place of peak areas. This approach has been termed pixel-based analysis,3 as a 1tR,2tR pair represents a pixel in a color plot. The corresponding approach for one-dimensional chromatography, using intensities at individual retention time (tR) points or tR, m/z pairs as input variables for multivariate data analysis, has been applied successfully,12,13 deducing complicated trends in large sample sets (>100 samples) in a rapid and objective way. As this approach does not rely on finding peaks, there are no problems with incorrectly set integration parameters or with errors in peak deconvolution. Consequently, manual inspection of the mass spectral matching can be skipped. However, this approach has challenges of its own: Multivariate analysis usually finds the largest contributions to the signal or to variations within the data. Nonsample variation in the chromatograms, such as retention time shifts, peak shape changes, or intensity variations, might be so large that the chemical variations are overlooked.14,15 Even when chemical variation is explained, a naive approach to multivariate analysis might catch trivial variations in the most intense peaks while missing the important variations in the smaller peaks.
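The core idea of pixel-based analysis can be sketched in a few lines (our illustration in Python with NumPy, not code from this work; the dimensions and random data are arbitrary): each chromatogram is unfolded into one row of a samples × pixels matrix, which is then fed to a multivariate method such as PCA.

```python
import numpy as np

# Toy stack of GC x GC chromatograms: samples x 1tR x 2tR intensities.
rng = np.random.default_rng(0)
n_samples, n_1tR, n_2tR = 6, 50, 40
chromatograms = rng.random((n_samples, n_1tR, n_2tR))

# Unfold: every (1tR, 2tR) pair becomes one variable (pixel).
X = chromatograms.reshape(n_samples, n_1tR * n_2tR)

# PCA via SVD of the mean-centered matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * s        # sample coordinates along each PC
loadings = Vt         # one loading vector per PC

# A loading vector can be folded back into a color plot for inspection.
pc1_plot = loadings[0].reshape(n_1tR, n_2tR)
```

Folding a loading vector back to the 1tR × 2tR grid is what makes loading plots directly comparable to color plots.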
To remove nonsample variation and focus the analysis on the relevant information, the data must be processed in various ways. Selecting the proper processing parameters is not trivial, can be somewhat subjective, and is beyond the training of most analytical chemists. The pixel-based approach has also been used on GC × GC data.2,16,17,14 However, when it comes to allowing the reader to perform pixel-based analysis, the approaches used in these papers leave much to be desired. The processing steps are not described in sufficient detail for others to repeat the analysis: This includes the choice of processing steps, parameter sensitivity analysis, and optimization of the parameters. Mohler et al.17 use replicate samples to assess the quality of the signal processing. However, even this approach is not well suited to a more complex data set, as it requires 12 chromatograms per biological replicate. Furthermore, the described approaches skip steps which could improve the quality of the analysis. In particular, scaling is not used. The aim of the present study was twofold: first, to develop a novel stepwise method for signal processing of GC × GC data; second, to give a tutorial for processing of GC × GC data that provides a thorough and clear description of each individual step, parameter sensitivity, and how to use replicates for objective optimization of the data processing. To accomplish this aim, two data sets were processed and analyzed with principal component analysis (PCA). The first data set consisted of the same sample having undergone hydrotreatment to various degrees. The challenge here was to map the time development, as in, e.g., environmental biodegradation studies.18 The second data set consisted of 75 middle distillates used as petroleum feeds in refinery processing. The challenge here was to identify the types of samples and explain the differences,

such as in oil spill identification analysis.19 As both of these data sets consisted of petroleum samples, their successful analysis can only demonstrate the applicability of the pixel-based approach to such samples. To make the choice of processing parameters easier and less subjective, we introduce a quality factor, a novel estimate of the proportion of the variation in the data set that is due to sample variation. This factor can be calculated for combinations of processing parameters and be used to validate the selection. It provides an objective measure for choosing the optimal processing parameters for specific data sets. In the following sections, we describe the steps needed to prepare GC × GC data for pixel-based analysis: to ensure that the data acquisition provides useful data for multivariate analysis, to remove instrument artifacts from the data, and to prepare the data for multivariate analysis. The use of this tiered approach for obtaining low-artifact data from GC × GC is novel. As for the analysis itself, PCA was performed, but the processed data can be analyzed by many other multivariate data analysis techniques.



METHODOLOGY

Types of Chromatograms. Chromatograms serve different purposes in this work. In order to distinguish them, we will separate them into three groups: (i) Sample chromatograms are the chromatograms to be analyzed. (ii) Facilitator chromatograms guide the data preprocessing. They will typically be replicate chromatograms of the same mixture of the samples. (iii) Quality control chromatograms are used to check the analytical process, the data treatment, and whether the final analysis describes sample variation. They can either be replicate chromatograms of a mixture sample, like the facilitator chromatograms, or replicates of the sample chromatograms.

Analytical Considerations. Replicates. Sampling replicates and analytical replicates are important in analytical chemistry applications for quality control. When multivariate data analysis is to be performed, replicates have additional uses: When correcting for retention time shifts, a chromatogram is needed as alignment target; for scaling, replicates can be used to determine the proportion of nonsample variation at each variable. Pretreatment relying on facilitator samples works best if they contain all compounds. The easiest way to ensure this is to use a mixture of all samples as the facilitator sample, taking into account large differences in concentrations when deciding on the ratios of samples used.13 Another use of facilitator chromatograms is to tune the signal processing steps in order to minimize the variation between identical samples.20 As a consequence, these chromatograms cannot later be used for quality control of the signal processing, as they will by definition exhibit low variation. For example, if a batch of facilitator chromatograms is used to determine the optimal scaling parameters, the variables where these chromatograms exhibit large variations (e.g., from residual retention time shift) will be down-weighted. Separate sets of replicates are thus needed for facilitation and quality control.
Sequence Order. The performance of analytical instruments is not constant: Retention times and response factors change. These changes can affect the analysis to the point where the chemical variation can be completely hidden.14 If all samples of a particular kind are analyzed immediately after each other, or there are other correlations between the nature of the samples and the run order, this can be hard to separate from


Fourier transform (PAFFT),28 and then the shift for each 2tR is estimated by fitting the tentative shifts to a polynomial. Chromatographic regions where the tentative shift changes rapidly as a function of 1tR are weighted down, as the 2tR shift is not expected to change rapidly with 1tR. Here, we will test both COW of 1D chromatograms and the fast Fourier transform alignment method. When aligning chromatograms, a common alignment target must be chosen. To ensure proper alignment, it is preferable that all compounds are present.29 This can be ensured by selecting a facilitator chromatogram. Furthermore, selecting a facilitator chromatogram from the middle of the analytical sequence ensures that it is representative.30

Chromatographic Background. Chromatographic background will here be defined as the slowly changing part of the nonsample signal, excluding, e.g., the unresolved complex mixture (UCM) of petroleum analysis. In GC, column bleed is a major component of the chromatographic background.31 With many overlapping peaks, such as in the UCM, estimating the underlying background can be almost impossible. This is problematic, as the intensity of the background can vary between chromatograms.32 If such variation is present in the data during analysis, it could be mistaken for sample variation. In addition to possible mistakes about trends in sample composition, this could mask true sample variation from the analysis. As the chromatographic background changes slowly, its intensity where peaks elute can be estimated from nearby peak-free regions of the chromatogram.3 However, as it is unclear how to select such regions if not manually, this approach will be subjective and time-consuming. Instead, we have developed a novel approach to estimating chromatographic background in GC × GC: As the temperature is almost constant during one 2D separation, the background can also be assumed to have a constant intensity in each 2D chromatogram.
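This assumption suggests a simple estimator: take the minimum intensity of each 2D chromatogram as its background and smooth the resulting series with an asymmetric least-squares fit, so that 2D chromatograms lacking peak-free regions do not distort the estimate. A minimal sketch in Python with NumPy/SciPy (our illustration on toy data, not the authors' Matlab code; `lam` and `p` are assumed tuning parameters):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def als_baseline(y, lam=1e5, p=0.01, n_iter=10):
    # Asymmetric least-squares baseline (Eilers-style): points above the
    # current baseline get weight p, points below get 1 - p, so the fit
    # is pulled toward the lower envelope of y.
    n = len(y)
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    w = np.ones(n)
    z = y
    for _ in range(n_iter):
        W = sparse.diags(w)
        z = spsolve((W + lam * D.T @ D).tocsc(), w * y)
        w = np.where(y > z, p, 1 - p)
    return z

# Toy GC x GC data: one row per modulation (2D chromatogram), with a
# slowly rising background (e.g., column bleed) under random "peaks".
rng = np.random.default_rng(1)
n_mod, n_2tR = 200, 60
drift = np.linspace(5.0, 8.0, n_mod)
chrom = drift[:, None] + rng.random((n_mod, n_2tR))

# Background per 2D chromatogram = its minimum intensity, smoothed by
# ALS in case some 2D chromatograms have no peak-free region.
minima = chrom.min(axis=1)
background = als_baseline(minima)
corrected = chrom - background[:, None]
```

The curvature penalty (`lam`) leaves linear trends unpenalized, so a slowly drifting bleed level is followed closely.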
The chromatographic background for a 2D chromatogram can be estimated by the minimum intensity of that 2D chromatogram if it contains regions without peaks. This is the case for all 2D chromatograms if the modulation period is set so that no wraparound occurs, i.e., so that the last compound of every 2D chromatogram is eluted before the first compound of the next 2D chromatogram elutes, as is generally recommended.4 If a few 2D chromatograms have peaks at all retention times, the minimum of each 2D chromatogram can be used as the input values for asymmetric least-squares baseline estimation.33 This can be done iteratively:34 At each iteration, points with intensities far above the baseline estimated in the previous iteration do not affect the baseline estimation. If more than a few 2D chromatograms have peaks at all retention times, other methods, such as rolling-ball baseline removal,35 are more suitable.

Signal Processing. After reducing artifacts (e.g., retention time shifts and chromatographic baselines), it may be necessary to further process the data to remove residual artifacts and emphasize important variation: To remove unwanted variation, such as noise and background, filters can be applied. To focus the analysis on subtle variation, the large variation can be removed by multiplying each chromatogram, or individual parts of each chromatogram, by a factor (normalization).36 The differences between the chromatograms can be emphasized over their similarities by subtracting the mean of each variable from the data (mean-centering).37 To reduce the influence of large signals, the data can be transformed.22 To adjust the

the chemical variation between the samples. As a consequence, the sequence order of the samples should not coincide with the structure of the data set. Normally, randomizing the run order will ensure that no such coincidence occurs. The analytical variations in the quality control and facilitator chromatograms should be similar to those in the sample chromatograms, as their purpose is to mimic the analytical variations in the sample chromatograms. As sequence order is a source of analytical variation,13 it is advisable to distribute the quality control and facilitator samples evenly throughout the analytical sequence.21 Practically, this can be ensured by dividing the samples randomly into groups of, e.g., 10, adding a quality control and a facilitator sample to each, and randomizing the order within each group.

Instrumental Artifacts. Retention Time Shifts. Retention time shifts in chromatographic data are a major hurdle for pixel-based analysis of GC × GC data,22 as one of the assumptions in pixel-based analysis is that the intensity of a 1tR, 2tR pair represents the same information in different chromatograms. This assumption only holds if the same compounds contribute to the intensity of a 1tR, 2tR pair in all chromatograms, which is only the case if the compounds have the same retention times across chromatograms.
2tR shifts have been reported to be a bigger impediment for analysis than 1tR shifts.17,14 For one-dimensional chromatography, many approaches have been developed to align chromatograms, e.g., correlation optimized warping (COW),23 dynamic time warping,24 and iCoshift.25 For COW, which works by allowing the lengths of segments of the chromatograms to change, the parameters segment length (how long the segments are) and slack (how much the segments are allowed to be stretched or contracted) must be chosen.23 This selection can either be guided by knowledge of the magnitude of the shift,23 or an automated optimization routine can be followed.26 In comprehensive two-dimensional chromatography, the challenge can be expected to be even greater, as the possibility for shifts in two dimensions exists. Several approaches have been used, the simplest being to use a one-dimensional alignment method for each 1D or 2D chromatogram.17,14 This approach does not take advantage of the structure of GC × GC chromatograms. If the one-dimensional alignment method returns erroneous alignments in some cases, the GC × GC chromatogram will be misaligned. If there is an error in the alignment of one or more of, e.g., the 2D chromatograms, a peak that had the same 2tR in all 2D chromatograms before alignment will be split up into several peaks with different 2tRs after alignment. This problem can be limited by limiting how large a shift the alignment technique is allowed to correct. Another strategy for applying 1D shift algorithms to GC × GC chromatograms that does take advantage of the structure of GC × GC chromatograms has been proposed by Zhang et al.:27 The 2D-COW technique interpolates the shifts of the entire GC × GC chromatogram from the COW-estimated shifts of a subset of the 1D and 2D chromatograms, under the assumption that closely coeluting peaks shift similarly between chromatograms. As the shifts are continuous in 1tR and 2tR, the peaks will not be split up.
However, one erroneous alignment will still cause misalignment of the GC × GC chromatogram. We have developed a novel alignment technique for GC × GC that utilizes the same assumption as 2D-COW while being less likely to propagate erroneous alignments from the underlying 1D alignment technique: First, tentative shifts are found for each 2D chromatogram by peak alignment by fast


Transformation. The peaks in GC × GC can span a large intensity range. As multivariate methods focus on explaining the largest part of the variation, random variations in the most intense peaks may be modeled instead of meaningful variation in the less intense peaks. To avoid this, the data can be transformed, i.e., the intensity is replaced by a function of the intensity.37 This will typically reduce the influence of the most intense peaks and increase the influence of less intense areas of the chromatograms. If variation in the concentration of the compounds that give rise to the most intense peaks is not important, this focus can be advantageous. However, it can also increase the influence of noise on the analysis, as the less intense variables tend to have a lower signal-to-noise ratio. If an initial PCA of the data does not group facilitator samples together, transforming the data could be considered. Logarithmic transformation is often used,22 though power transformation might be better equipped to handle noise.37 The power chosen in power transformation is arbitrary, so logarithmic transformation is more objective.

Scaling and Mean Centering. Mean centering subtracts the mean of each variable from the data, to focus the analysis on the differences between the samples.38 Before multivariate analysis, data are often autoscaled, i.e., in addition to mean centering, each variable is divided by its standard deviation.22 This ensures that each variable has an equal probability of affecting the model. However, this is often far from ideal for chromatographic data.38 A pixel representing a peak should be more likely to affect the model than one representing background or noise.
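On an unfolded samples × pixels matrix, these operations are one-liners. A sketch in Python with NumPy (our illustration on random toy data, not code from this work; the facilitator matrix stands in for replicate chromatograms of a mixture sample):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.random((8, 1000)) * 100 + 1   # toy samples x pixels matrix

# Transformation: replace the intensity by a function of the intensity.
X_log = np.log(X)                     # logarithmic transformation
X_pow = X ** (1 / 3)                  # power transformation (arbitrary power)

# Mean centering: subtract the mean of each variable (pixel).
Xc = X_log - X_log.mean(axis=0)

# Autoscaling: mean centering plus division by each variable's standard
# deviation, giving every pixel an equal chance to affect the model.
X_auto = Xc / X_log.std(axis=0)

# Scaling by the inverse standard deviation measured in facilitator
# (replicate) chromatograms instead down-weights pixels dominated by
# nonsample variation.
fac = np.log(rng.random((4, 1000)) * 100 + 1)
X_scaled = Xc / (fac.std(axis=0) + 1e-12)
```

The small constant added to the facilitator standard deviation is a guard against division by zero, not part of the method.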
Scaling by the inverse of either the standard deviation or the relative standard deviation21 of the intensities in facilitator chromatograms decreases the influence of the chromatographic regions that are most affected by nonsample variation while increasing the influence of peak regions.13 Another way to decrease the influence of the chromatographic regions most affected by nonsample variation is variable selection. Here, some variables are discarded, i.e., have weight 0. The nondiscarded variables all have equal opportunity to affect the model, i.e., they have the same weight. If this weight is arbitrarily set to 1, variable selection is simply scaling where the only possible weights are 1 and 0. Because more details can be reflected in the weights for scaling, more information about the reliability of the variables can be included in scaling. The less reliable variables are not completely removed, so the information they describe can still be reflected in the analysis. As their influence is reduced, analysis can be performed even though there are thousands of times as many variables as there are samples. This makes scaling an essential part of the data processing before pixel-based analysis. Scaling according to the analytical uncertainty requires the presence of facilitator chromatograms. As these have not been included in any previously published investigation of pixel-based analysis of GC × GC data, their use there has not been published prior to this manuscript.

Analysis. The selection of multivariate analysis methods for the processed data depends on the samples and on the expected result of the analysis. If the signals sought are independent, independent component analysis can be applied.14 If a nested tree of samples is sought, clustering can be applied.16 To describe the differences between known groups of samples, partial least-squares regression-discriminant analysis (PLSDA)16 is an option.
If the data are ordered in a tensor, multiway methods like parallel factor analysis

influence each variable has on the analysis, each variable can be multiplied by a scaling factor (scaling).38

Filtering. Filters can be used to remove unwanted parts of the signal. The selection is usually based on differences in how quickly the intensities change, with signal changing much faster or much slower than the typical chromatographic peak being removed from the data. By taking the first or second derivative, a constant background or a background that changes linearly with tR (linear background) can be removed.12 Differentiation is often implemented using the Savitzky−Golay smoothing filter39 to limit the decrease in signal-to-noise ratio (S/N). However, this approach requires selection of the filter width and the order of the fitted polynomials, making it harder to automate and somewhat subjective. By smoothing with, e.g., Gaussian broadening, the S/N will be improved, but unwanted peak broadening is introduced. Furthermore, the remaining noise is more peak-like than the noise before the filtering.40 This makes it harder to distinguish noise and peaks and thus harder to estimate the S/N. Combining the two approaches, the second derivative Gaussian filter40 improves the S/N and removes a constant and linear background while slightly narrowing the peaks. It requires one parameter to be set, the filter width. The filter gives the largest S/N improvement if the filter width is 2−3 times the peak width.40 An error of up to 40% in the estimation of the peak width only reduces the S/N increase by 10%.40 Its main drawback is that intense peaks can hide nearby nonintense peaks. This is less of a problem with MS detection, as the larger peak must have a higher signal in all mass channels for the small peak to be completely hidden. In our experience, the second derivative filter is easy to use and effective in improving S/N and removing background.
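A second derivative Gaussian filter is convolution with the second derivative of a Gaussian kernel; since constant and linear backgrounds have a zero second derivative, they vanish. A sketch in Python with NumPy/SciPy on a synthetic trace (our illustration; the peak position, widths, and the sign flip that keeps peaks positive are our choices):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

# Synthetic 1D trace: a Gaussian peak on a linear background plus noise.
rng = np.random.default_rng(3)
t = np.arange(500, dtype=float)
peak = 100 * np.exp(-0.5 * ((t - 250) / 5) ** 2)
background = 0.05 * t + 20            # constant + linear background
signal = peak + background + rng.normal(0, 1, t.size)

# Second derivative Gaussian filter: smooth and differentiate twice in
# one convolution. The background's second derivative is zero, so it is
# removed; the sign is flipped so peaks stay positive. The filter width
# (sigma) is set to roughly 2-3x the peak width.
filtered = -gaussian_filter1d(signal, sigma=12, order=2)
```

The single parameter (`sigma`) corresponds to the filter width discussed above; the peak maximum stays in place while the background is suppressed.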
Analysis is also possible without filtering, as shown with the analysis of the first data set (see the Results and Discussion, GC × GC-FID analysis).

Normalization. When normalizing, a normalization factor is calculated for each chromatogram. The intensity of all data points in that chromatogram is then divided by that factor. Different normalization schemes can be chosen depending on the desired focus of the analysis36 and knowledge of the variation between the samples.3 If one or more internal standards (ISs) have been added to the samples, their intensity can be used for normalization. This will correct for nonsample variation (e.g., variations stemming from detector sensitivity and injected volume) with two limitations: The variation must have been introduced after the ISs were added, and the variation must affect the sample compounds in the same way as it affects one of the ISs.41 Normalization to the sum of intensities3 or to the Euclidean norm36 will remove concentration effects from the data set, in addition to the variations removed by the IS approach with one IS. This can be advantageous if patterns are more important than concentrations but can make interpretation more complicated.13 In degradation studies, the total amount of compounds will usually decrease with time, so the apparent signal of refractory compounds will increase if normalization is used.12 Furthermore, if background is not completely removed before normalization, the contribution of the background will affect the result. In this case, normalizing to the sum of areas known to contain peaks, such as the 10% highest intensity data points, or to the sum of pixels with low standard deviation in replicates12 will often be a better strategy.
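The normalization variants just described differ only in the per-chromatogram factor. A sketch in Python with NumPy (our illustration on random toy data, not code from this work):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.random((5, 10000))            # toy samples x pixels matrix

# Normalize each chromatogram to the sum of its 10% most intense
# pixels, reducing the influence of residual background on the factor.
k = X.shape[1] // 10
top_sums = np.sort(X, axis=1)[:, -k:].sum(axis=1)
X_norm = X / top_sums[:, None]

# Alternative: Euclidean-norm normalization removes overall
# concentration effects from the data set.
X_eucl = X / np.linalg.norm(X, axis=1, keepdims=True)
```

Normalizing to an internal standard would instead divide each chromatogram by the intensity of the IS peak region in that chromatogram.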


(PARAFAC) can be utilized.17 Whatever the final application is, PCA is always a good starting point. It finds the largest contribution to total variation in the data set, making it excellent for discovering outliers, flaws in the signal processing, and the main trends in the data. Here, we will only apply PCA. To determine which principal components (PCs) describe sample variation, the loading coefficients can be visually investigated for indications of artifacts: If a PC describes a tR shift of certain peaks, these peaks will resemble the first derivative of peaks in the loading coefficients of that PC.13 This can happen for both chromatographic directions in GC × GC. Ultimately, determining which PCs are relevant will be a subjective judgment.

Parameter Optimization Strategy. The amount of variation in the intensity of each pixel that is due to chemical differences between the samples (Var(Isignal)) can be used to assess the effectiveness of data processing. As the total intensity (Itotal) is the sum of Isignal and intensity from other sources (Inoise), Var(Itotal) = Var(Isignal) + Var(Inoise), assuming that Isignal and Inoise are uncorrelated. Var(Inoise) can be estimated as the variance in intensity in different chromatograms of the same sample, e.g., facilitator chromatograms (Var(Ifacilitator)), as Var(Isignal) is zero here. It follows that Var(Isignal) = Var(Itotal) − Var(Ifacilitator). If this is negative, it can be assumed that all variation is due to nonsample differences, and Var(Isignal) is zero. The average sample-induced variation over all pixels is Σ(Var(Itotal) − Var(Ifacilitator))/N, where the summation is over all pixels with a positive summand and N is the total number of pixels. In order to make this comparable between, e.g., native and transformed data, the ratio of this to the average total variance in the pixels can be used (see eq 1):

the purpose of the analysis must be taken into account when considering such steps.36



EXPERIMENTAL SECTION

Two sets of samples were analyzed. The first sample set consisted of one straight-run light gas oil (feed) and seven product samples obtained from catalytic hydrotreating. The seven samples were obtained from a fixed bed pilot unit loaded with Haldor Topsoe TK-607 BRIM catalyst operated at 60 barg and 320 °C using liquid hourly space velocities (LHSV) ranging from 8.5 h−1 to 0.75 h−1. The variation in LHSV gives rise to different residence times of the oil in the reactor and thereby different sulfur-conversion levels. In addition to these samples, a sample consisting of equal proportions of all other samples was prepared for use as facilitator sample. The samples were analyzed on a Trace GC Ultra (Thermo Fisher Scientific Inc.) fitted with an Agilent 355 sulfur chemiluminescence detector (Agilent Technologies), a flame ionization detector (FID), a two-stage cryogenic (liquid CO2) modulator, a 14.5 m Zebron ZB-1 column (0.25 mm i.d., 1 μm film thickness) from Phenomenex Inc. as a primary column, and a 3.5 m BPX50 (0.1 mm i.d., 0.1 μm film thickness) from SGE Analytical Science Pty. Ltd. as a secondary column. Only the FID signal was used. The samples were run in random order, with the facilitator sample run in duplicate. This procedure was repeated to yield replicate chromatograms of each sample as quality control chromatograms and four chromatograms of the facilitator sample. Initial PCA placed two chromatograms far from the rest of the chromatograms, indicating that they were markedly different from even replicate chromatograms of the same samples. As these two chromatograms were the first to be recorded on different Monday mornings, this was assumed to arise from impurities in the gas supply, with the 1D column acting as an active sampler during the weekend. These chromatograms were discarded before further analysis.
In one of the remaining chromatograms, a localized spike was removed, as its shape indicated that it was not a chromatographic peak. The data set hence consisted of eight sample chromatograms, six quality control chromatograms, and four facilitator chromatograms. Each chromatogram described the intensity at every combination of 668 1tR and 800 2tR, or at 534 400 data points. The second data set consisted of 75 petroleum samples. They were mainly light gas oils (LG), light cycle oils (LC), and kerosenes (KE), and blends of these. The use of these labels may be inconsistent between samples, as the labels were applied by different suppliers. In addition to these samples, two mixture samples were prepared: a facilitator sample consisting of equal proportions of each sample, and a quality control sample consisting of equal proportions of every LG sample, every LC sample, and every sample being a mixture of LG and LC. The samples were analyzed on a LECO Pegasus 4D GC × GC/MS (LECO Corporation, MI) incorporating a secondary GC oven and a four-stage modulator (liquid N2), and fitted with a 15 m ZB5 column (0.25 mm i.d., 0.5 μm film thickness) from Phenomenex Inc. as a primary column and a 1.5 m BPX50 (0.1 mm i.d., 0.1 μm film thickness) from SGE Analytical Science Pty Ltd. as a secondary column. As additional quality control, 25 of the samples were randomly chosen to be recorded in duplicate, four in triplicate, and one in quadruplicate. The samples were analyzed in random order and in batches, each batch consisting of 13 samples, the facilitator sample, and the quality control sample.

A = [Σ(Var(Itotal) − Var(Ifacilitator))/N]/[ΣVar(Itotal)/N] = Σ(Var(Itotal) − Var(Ifacilitator))/ΣVar(Itotal) (1)
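A minimal sketch of eq 1 in Python with NumPy (our illustration with simulated chromatograms; the variance estimator and noise levels are our choices):

```python
import numpy as np

def quality_factor(samples, facilitators):
    # Eq 1: proportion of the average per-pixel variance attributable to
    # sample differences. Inputs: (n_chromatograms, n_pixels) arrays.
    var_total = samples.var(axis=0, ddof=1)
    var_fac = facilitators.var(axis=0, ddof=1)
    # Pixels with a negative summand count as zero sample variance.
    signal_var = np.clip(var_total - var_fac, 0.0, None)
    return signal_var.sum() / var_total.sum()

rng = np.random.default_rng(5)
noise = 0.1
# Six samples with genuine chemical differences plus noise ...
truth = np.linspace(0.0, 1.0, 6)[:, None] * np.ones((1, 500))
samples = truth + rng.normal(0.0, noise, truth.shape)
# ... and four facilitator replicates that differ only by noise.
facilitators = 1.0 + rng.normal(0.0, noise, (4, 500))
A = quality_factor(samples, facilitators)
```

By construction A lies between 0 (all variation explained by the facilitator replicates) and 1 (no nonsample variation), so it can be compared across processing-parameter choices.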

A can be used to evaluate the quality of preprocessing even if the assumptions it is derived from do not hold, e.g., if heteroscedastic data make Isignal and Inoise correlated. In such cases, A will still be higher for a preprocessing method that is better at removing nonsample variations. The same principle can be applied to the PCs from a PCA of partly processed data: If PCA of the data after processing with a certain choice of processing parameters groups the facilitator chromatograms together, and PCA of the data after processing with other parameters does not, the processing parameters that do not group the facilitator samples together are not optimal. Evaluating whether the chromatograms group together is done by visual inspection and is thus subjective. The grouping of facilitator samples in PCA is closely linked to the desired goal: ensuring that only sample information influences the analysis. Thus, it is a better measure of the quality of processing than the magnitude of A. However, the magnitude of A is less subjective and can be used even if the facilitator samples do not group together in the first PC (PC1) or when different processing parameters produce an equal degree of grouping. Some processing steps focus the analysis by removing unwanted sample information, e.g., cropping and normalization. Such steps can decrease A even when facilitating the analysis, so A cannot be used to assess which normalization to use. Instead,

dx.doi.org/10.1021/ac403650d | Anal. Chem. 2014, 86, 7160−7170

Analytical Chemistry

Feature

Initial PCA indicated that six of the samples were outliers, and visual inspection confirmed incomplete release from the modulator at various 1tR in these. These samples were discarded before further analysis, and the final data set hence consisted of 134 chromatograms, divided into 75 sample chromatograms, 10 facilitator chromatograms, and 49 quality control chromatograms. Each chromatogram contained 1 175 600 mass spectra (2139 1tR × 600−500 2tR), each containing intensities from m/z = 50 to m/z = 350 (401 m/z values). For both data sets, the chromatographic raw data were exported as netCDF files and imported into Matlab (The MathWorks, Inc., Natick, MA) using built-in Matlab functions. All further analysis was done in Matlab. For the first data set, Matlab 7.10.0.499 (R2010a) 32-bit (glnx86) was used; for the second data set, Matlab 7.13.0.564 (R2011b) 64-bit (win64) was used.
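GC × GC data are recorded as one continuous detector trace per run; a minimal sketch of folding such a trace into a 2D intensity matrix is shown below. The fold assumes the stated 668 1tR × 800 2tR grid of the first data set and no partial modulations; the trace here is synthetic:

```python
import numpy as np

# Fold a continuously recorded 1-D detector trace into a 2-D chromatogram:
# each modulation period becomes one column of the color plot.
points_per_modulation = 800   # 2tR axis (points per modulation period)
n_modulations = 668           # 1tR axis (number of modulations)
trace = np.arange(points_per_modulation * n_modulations, dtype=float)

# chrom[j, i] is the j-th 2tR point of the i-th modulation.
chrom = trace.reshape(n_modulations, points_per_modulation).T
```

Each column of `chrom` is one second-dimension separation; the matrix is what the alignment and background steps below operate on.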



RESULTS AND DISCUSSION GC × GC-FID (Data Set 1). A facilitator chromatogram recorded in the middle of the analytical sequence was chosen as the alignment target. For each chromatogram and 1tR, the 2tR shift was approximated by PAFFT. For each chromatogram, the 2tR shifts were then estimated by fitting a linear function to these tentative 2tR shifts, and the estimated shift was removed from the chromatograms. Correcting for 1tR shift with COW was tested by aligning chromatograms consisting of all data points with identical 2tR with the alignment target. All combinations of segment lengths 20, 30, and 40 and slacks 1, 2, and 3 were tested. A segment length of 30 and a slack of 3 were chosen, as this combination resulted in the highest value of A. The effects of alignment on the facilitator chromatograms are shown in Figure 1b,c. The background was estimated for each 1tR as the minimum intensity of the corresponding 2D chromatogram. As a few 2D chromatograms exhibited wrap-around, these estimates were used as input values for iterative asymmetric least-squares estimation of a spline function.34 When the iteration converged, the resulting background was subtracted from the chromatograms. After testing from one to ten splines, five splines were chosen, as this resulted in a slightly higher increase in A than the other choices (A = 28.3%−28.4% after background correction with different numbers of splines). It seems to be more important that baseline correction is performed at all than which exact parameters are used (Figure 1d, red lines). The effect of background removal is shown in Figure 1b. The data were not normalized: the samples were injected undiluted, so the total concentration of hydrocarbons was constant, and a high-level overview of the chemical differences was sought.36 As PCA of the untransformed data did not group the facilitator chromatograms together, logarithmic transformation and power transformations with powers 1/2, 1/3, and 1/10 were tested.
The facilitator chromatograms only grouped together in PCA of the logarithmically transformed data. This is expected for a data set where the most abundant compounds (paraffins) do not vary in concentration, as random differences in their intensities, although small relative to the total intensities, can be much larger than nonrandom differences in intensities for less abundant compounds. Any analysis describing the largest absolute variations will then describe random variations. Analysis of logarithmically scaled data will instead describe the largest variations relative to the intensities, which is less likely to be random.
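This argument can be illustrated numerically with hypothetical intensities: an abundant, constant compound with a small random fluctuation dominates the variance on the raw scale, while a two-fold systematic change in a minor compound dominates after log transformation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
paraffin = 1000.0 * (1 + rng.normal(scale=0.02, size=n))  # constant, 2% random wobble
minor = np.linspace(1.0, 2.0, n)                          # real two-fold trend

# Raw scale: the tens-of-units random wobble of the abundant peak dwarfs
# the one-unit systematic change of the minor compound ...
raw_dominated = paraffin.var() > minor.var()

# ... but after log transformation the two-fold trend dominates, since
# the log variance reflects relative, not absolute, changes.
log_dominated = np.log(minor).var() > np.log(paraffin).var()
```

Any analysis chasing the largest raw variance would therefore model the random wobble; on the log scale it models the real trend instead.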

Figure 1. (a−c) Two consecutive 2D chromatograms (borders marked by dashed lines) with 1tR around 60 min from replicate facilitator chromatograms in the GC × GC-FID data set at different steps in the processing. (a) Raw chromatograms. (b) After background subtraction, the intensities are much closer to matching. (c) After retention time alignment, the positions of the peaks match between the chromatograms. (d) Percentage of the information in the data set that is sample variation (A) as a function of preprocessing step. The blue line represents the final processing route, while other colors demonstrate the effect of alternative choices: the red lines represent alternative numbers of splines for background estimation, and the purple line represents normalization.
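The background estimation described above uses the per-1tR minima as its starting point; a simplified sketch that stops at the minima (the paper refines them further with an iterative asymmetric least-squares spline fit, ref 34, to cope with wrap-around) could look like this:

```python
import numpy as np

def subtract_min_background(chrom):
    """Simplified background correction: estimate the background of each
    2-D chromatogram (one column per 1tR) as its minimum intensity and
    subtract it.  This is only the input stage of the method in the text,
    which fits a smooth spline through these minima."""
    chrom = np.asarray(chrom, dtype=float)
    background = chrom.min(axis=0, keepdims=True)  # one estimate per 1tR
    return chrom - background

# Synthetic example: a peak on top of column-wise baseline offsets.
peaks = np.zeros((50, 4))
peaks[20:25, 2] = 10.0                            # a peak in the third column
raised = peaks + np.array([1.0, 2.0, 3.0, 4.0])   # per-column background levels
corrected = subtract_min_background(raised)
```

Subtracting the per-column minimum recovers the peak-only matrix exactly in this idealized, noise-free case.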

For each pixel, data were scaled by the inverse of the relative standard deviation of the facilitator chromatograms to scale down noise variables and focus the analysis on the variables that describe chemical variation. The final data set consisted of 14 processed chromatograms of samples, each consisting of 400 800 data points (800 2tR × 501 1tR), in addition to four chromatograms of facilitator samples. The data set (14 samples × 400 800 data points) was analyzed with PCA. The score plot of PC1 vs PC2 for this PC model is shown in Figure 2a. Most of the variation in the data set (63%) is described by PC1. The PC1 scores are large and positive at the start of the hydrotreatment test and decrease as the hydrotreatment progresses. The PC1 loading plot is shown in Figure 2b. The triaromatics, naphthenodiaromatics, and diaromatics have positive loading coefficients (red), indicating
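A generic sketch of this scale-then-PCA step (not the authors' MATLAB code; dimensions here are small and synthetic, and the epsilon guard is an added assumption to avoid division by zero):

```python
import numpy as np

def rsd_scale_and_pca(X, facilitators, n_components=2, eps=1e-9):
    """Scale every pixel by the inverse of the facilitator relative
    standard deviation (noisy pixels are weighted down), mean-centre,
    then compute PCA scores and loadings via the SVD."""
    rsd = facilitators.std(axis=0, ddof=1) / (np.abs(facilitators.mean(axis=0)) + eps)
    Xs = X / (rsd + eps)                  # inverse-RSD scaling
    Xc = Xs - Xs.mean(axis=0)             # mean-centre before PCA
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components]
    explained = s ** 2 / (s ** 2).sum()   # variance explained per PC
    return scores, loadings, explained

rng = np.random.default_rng(2)
base = rng.random(50) + 0.5
facilitators = base + rng.normal(scale=0.05, size=(4, 50))
X = base + rng.normal(scale=0.5, size=(14, 50))
scores, loadings, explained = rsd_scale_and_pca(X, facilitators)
```

Each row of `scores` places one chromatogram in the score plot; each row of `loadings`, refolded to the 2D grid, gives a loading plot like Figure 2b.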


hydrocarbons is to saturate polycyclic aromatic compounds. The sulfur in the feed is mostly present as benzothiophenes and dibenzothiophenes, which elute in the diaromatics and triaromatics bands, respectively. Their concentrations, relative to the hydrocarbons, are too small for their behavior to be determined from this analysis. There is a group of compounds eluting in the diaromatics band with a 1tR of 57−70 min which have negative loadings in PC1. Visual inspection of the chromatograms confirms that these peaks have increasing signal intensity during the hydrotreatment and are not artifacts of the data processing. The identities of these peaks have not been established, but they could be biphenyls produced from direct hydrodesulphurization of dibenzothiophenes42 or, less likely given the catalyst,43 from cracking of naphthenodiaromatics. Hence, the pixel-based approach is able to pinpoint uncharacteristic behavior of individual peaks, even in tight clusters. The PC2 scores are large and positive for samples at the start and end of the hydrotreatment and negative in the middle of the hydrotreatment process. All compounds that PC1 shows are consumed have negative loading coefficients in PC2, adding no new information about the process. For the compounds that PC1 shows are produced, the naphthenoaromatics show negative PC2 loadings and the polynaphthenes show positive PC2 loadings, indicating that the naphthenoaromatics are produced earlier than the polynaphthenes. Further PCs mainly contain a 1tR shift. While more meaningful components could potentially have been extracted if the 1tR alignment had removed more of the shift, finding more than two important PCs is not expected with only eight samples from a single process. Comparison with Integration-Based Analysis. To validate the pixel-based method, chromatograms from the same set of samples were integrated: for each sample, the areas of 195 groups of peaks were calculated.
Each group consisted of all of the compounds with the same functional groups and total number of carbons (i.e., single-carbon-number groups). The functional groups were either n-paraffins, iso-paraffins, mononaphthenes, dinaphthenes, monoaromatics, naphthenoaromatics, diaromatics, naphthenodiaromatics, triaromatics, naphthenotriaromatics, or tetraaromatics, and the number of carbon atoms was between 6 and 30. The iso-paraffin/n-paraffin ratios

Figure 2. Part of the results from pixel-based PCA of GC × GC-FID data. (a) Scores plot of PC1 versus PC2. Each chromatogram is represented by a blue cross. Scores of analytical duplicates are connected by a line. The position of the feed is indicated by “Feed”, and the progress of hydrotreatment is indicated by the curved arrow. PC1 is monotonically decreasing with the degree of hydrotreatment, while PC2 decreases at the start of the hydrotreatment and increases toward the end. (b) PC1 loading coefficients, (c) PC2 loading coefficients, and (d) average GC × GC chromatogram of eight samples. The black lines in parts b−d indicate the boundaries between classes of compounds. The classes are indicated by blue capital letters. They are paraffins (A), naphthenes (B), aromatics (C), naphthenoaromatics (D), diaromatics (E), naphthenodiaromatics (F), and triaromatics (G). In parts b and c, areas where no compounds elute are gray, other colors indicate the sign and magnitude of the loading coefficients: dark blue (negative and large), light blue (negative and small), green (zero), yellow (positive and small), and red (positive and large).

that they are consumed during hydrotreatment. The naphthenoaromatics and polynaphthenes have negative coefficients (blue), indicating that they are produced. The paraffins and aromatics have loading coefficients around zero (green), indicating that they are neither produced nor consumed during hydrotreatment. The main effect of the hydrotreatment on

Figure 3. Results of PCA of 193 areas obtained by traditional integration of GC × GC-FID data. (a) Scores plot, each sample is represented by a point. The feed sample is indicated by “Feed”, and the progress of hydrotreatment is indicated by the curved arrow. (b) Loadings plot, each isomer group is represented by a point. The groups are designated by their type (n-paraffins (nPar), iso-paraffins (iPar), mononaphthenes (mNaph), dinaphthenes (diNaph), monoaromatics (mAro), naphthenoaromatics (NmAro), diaromatics (diAro), naphthenodiaromatics (NdiAro), triaromatics (triAro), naphthenotriaromatics (NtriAro) or tetraaromatics (tetraAro)) and total carbon number.


of the samples were nearly identical in all of the hydrotreated samples except one. This sample was considered an outlier and excluded from the analysis, as neither n-paraffins nor iso-paraffins are expected to be produced or consumed during hydrotreatment. The remaining samples were analyzed with PCA. The scores plot of PC1 vs PC2 (Figure 3a) is similar to the scores plot from the pixel-based analysis, just rotated and mirrored, as is expected from PCA. This indicates that the underlying phenomena described are identical. The PC1−PC2 loading plot is shown in Figure 3b. Peaks from C11−C14 diaromatics have higher intensities in the feed, C11−C14 naphthenoaromatics have higher intensities in the intermediate samples, and C11−C13 dinaphthenes have higher intensities in the higher space velocity samples. However, more detailed information on changes in the composition cannot be extracted from the data. It is not clear to what degree the overall trend can be extended to compounds with other numbers of carbon atoms or whether this pattern holds for, e.g., all diaromatics with 13 carbon atoms. In particular, the compounds in the diaromatics band with a 1tR of 57−70 min, which pixel-based analysis discovered to grow in concentration during hydrotreatment, were not detected by the integration-based approach. While this could be corrected by integrating more groups, each containing fewer or even only a single compound, the fundamental problem is inherent to integration-based analysis: in any integration (or indeed any data reduction step), there is a risk of losing important information, in this case the differences between the intensities of compounds within the same integration window. Furthermore, integrating more groups would take much longer, and more variables would make loading plots like the one presented in Figure 3 harder to interpret. The integration-based analysis gave no indication that more detailed analysis was necessary.
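The information loss inside an integration window can be illustrated with hypothetical numbers: two compounds sharing one window trend in opposite directions, so the group area, which is all the integration-based analysis sees, stays flat:

```python
import numpy as np

# Two compounds share one integration window.  Across five samples one
# grows while the other shrinks by the same amount.
compound_a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # grows during hydrotreatment
compound_b = np.array([9.0, 8.0, 7.0, 6.0, 5.0])  # shrinks
group_area = compound_a + compound_b              # what integration reports

group_is_flat = np.allclose(group_area, group_area[0])
pixels_vary = compound_a.std() > 0 and compound_b.std() > 0
```

The group area is constant at 10, yet every underlying pixel carries a clear trend, which is exactly the kind of pattern a pixel-based analysis retains.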
The pixel-based approach is thus better equipped for untargeted or less targeted analysis. This also highlights the subjectivity of the integration approach: the detail level of the integration needs to be chosen, and if a detail level finer than what was originally planned is required, the analysis must start over. This is in contrast to the pixel-based approach, where a change of focus only requires changing the normalization, the scaling, or the cropping to remove the unwanted variation. GC × GC-TOFMS (Data Set 2). Each mass channel was filtered with a second derivative Gaussian filter using a filter width of 60 data points, two times the width of a typical peak as judged by visual inspection, which provides the maximum increase in S/N,30 and negative intensities were removed by setting them to zero. This produced a massive increase in A (Figure 4). Around half of this improvement is due to removal of 50 Hz noise (from the power line), an artifact specific to this instrument, with most of the remainder being due to background correction (see the Supporting Information). After filtering, the mass channels were summed to produce a filtered total ion chromatogram (TIC). All further data analysis was performed on this filtered TIC. A facilitator chromatogram from the middle of the chromatographic sequence was chosen as an alignment target. For each 1tR in each chromatogram, the 2tR shift was approximated by PAFFT. The 2tR shifts were estimated by fitting a second-order polynomial to these approximated 2tR shifts. The estimated 2tR shift was then removed, while the 1tR shifts were not corrected for. Normalization to the sum of all intensities, to the sum of all intensities weighted by the inverse of the relative standard deviation of the normalized facilitator chromatograms, to the

Figure 4. Amount of sample variation as a percentage of total variation (A) at different points in processing of the GC × GC/MS data set, calculated based on the TIC and filtered TIC. The green lines represent alternate normalizations.

Euclidean norm of the intensities, and to the maximum of the intensities were tested. Normalization to the weighted sum was chosen, as only this approach increased A, as shown in Figure 4. As PCA of the untransformed data grouped the replicate facilitator chromatograms closer together than the sample chromatograms, no transformation was done before analysis. For each pixel, data were scaled by the inverse of the relative standard deviation of the normalized facilitator chromatograms. This increased A, as the pixels where the GC × GC chromatograms of the same sample show high variability are weighted down. If residual artifacts remain in the data, pixels that are more heavily influenced by them will be weighted down, which will increase the proportion of sample variation in the data set. The final processed data set contained 134 chromatograms, including 11 facilitator chromatograms and 11 chromatograms of the quality control sample. Each chromatogram consisted of 420 986 data points. The processed data were analyzed with PCA. For this data set, one set of quality control analytical replicates would have sufficed, as the analytical replicates of the quality control sample and the analytical replicates of the samples exhibited the same grouping behavior during PCA. With one quality control chromatogram and one facilitator chromatogram recorded for every 13 sample chromatograms, 90 chromatograms are needed to analyze 75 samples. The first two PCs of the PC model are shown in Figure 5. The score plot of PC1 vs PC2 is shown in Figure 5a. Samples of the same type group together, with kerosenes (yellow stars) having large negative PC1 score values and large positive PC2 score values, light cycle oils (blue crosses) having slightly higher PC1 score values and large negative PC2 score values, and light gas oils (green ×) having large positive PC1 and PC2 score values.
Blended samples (diamonds, color corresponds to the types of the samples mixed) have score values in-between the pure groups they are blends of. The PC1 loading plot is shown in Figure 5b. The loading coefficients of high boiling compounds are positive (red), and the loading coefficients of the low boiling compounds are negative (blue-green). A sample with a large positive PC1 score value will contain more of the high boiling compounds and less of the low boiling compounds compared to an average sample. PC1 thus describes the volatility of the sample. Figure 5c shows the PC2 loading coefficients. Paraffins have positive PC2 loading coefficients (red), and diaromatics have negative PC2 loadings (blue). Thus,


essential. However, most of the increase in A provided by the filtering was due to the removal of 50 Hz noise and background (as shown in Table S1 in the Supporting Information), both of which could have been dealt with in other ways. The 50 Hz noise could be removed by a 4-point (1/50 s) running average, and the chromatographic background could be removed in a manner similar to how it was removed in the first data set.
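The 4-point running average works because, at the 200 Hz sampling rate implied by "4 points per 1/50 s" (an inference from the text, not a stated instrument setting), the boxcar spans exactly one period of the 50 Hz interference:

```python
import numpy as np

fs = 200.0                        # 4 points per 1/50 s implies 200 Hz sampling
t = np.arange(0, 1, 1 / fs)
mains = 0.5 * np.sin(2 * np.pi * 50.0 * t)    # 50 Hz power-line pickup
peak = np.exp(-((t - 0.5) ** 2) / 0.002)      # much slower chromatographic peak

# A 4-point boxcar spans exactly one 50 Hz period, so the interference
# averages to zero while the peak is only slightly smoothed and shifted.
kernel = np.ones(4) / 4
filtered = np.convolve(peak + mains, kernel, mode="same")
clean = np.convolve(peak, kernel, mode="same")
residual_50hz = np.abs(filtered - clean)[4:-4].max()  # ignore edge effects
```

Away from the edges the 50 Hz component is nulled to numerical precision, while the peak shape is barely distorted by the short averaging window.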



CONCLUSION The pixel-based approach is a powerful analysis strategy for GC × GC data if the artifacts in the data can be handled. The steps necessary to remove artifacts from data from petroleum samples have been discussed in detail. The approach finds the same overall trends as integration coupled to multivariate analysis and is better at locating subtle patterns that can easily be missed otherwise. It presents the analysis as images that are closely related to the chromatography. These require some training to interpret, especially the later, less important loadings. While we have not directly compared the pixel-based approach to deconvolution, it can be assumed that deconvolution will share many of the shortcomings of integration. Furthermore, deconvolution cannot be used with GC-FID data and requires time-consuming manual inspection of the results. However, once the analysis has been performed, the identities of the compounds are already established, facilitating interpretation of the results. For large petroleum data sets, two extra chromatograms per 13 samples, or 1.15 chromatograms per sample, suffice to guide the processing and to ensure the quality of the analytical work and of the processing. However, using this bare minimum is not recommended, as it is impossible to tell how much quality control is needed before the chromatograms have been recorded, and the only way to add more facilitator or quality control samples is to redo the data recording. In particular, data sets where more sources of variation influence the data, as will typically be the case for biological data sets, will probably require more quality control samples. We have presented an objective function (A) to guide the choice of processing parameters, allowing the pixel-based approach to be more objective than integration, which requires the manual setting of integration limits.
A alone is not sufficient to ensure a suitable selection of all parameters, but combined with the grouping of facilitator chromatograms relative to the sample chromatograms, it provides an effective guide for choosing processing parameters. 2tR alignment and scaling improved A in both data sets, while removal of instrument artifacts and background removal were only important in one of the investigated data sets. Transformation of the data can be necessary if compounds giving rise to intense peaks do not vary in concentration between the samples. The focus of the analysis can be changed without redoing most of the work, by applying a different normalization or scaling or by cropping the chromatograms before analysis.

Figure 5. Part of the results of PCA of the GC × GC-TOFMS data set. (a) Scores plot for PC1 vs PC2. Facilitator chromatograms are orange triangles, replicate quality control chromatograms are red squares, KEs (kerosenes) are yellow stars, LCs (light cycle oils) are blue crosses, LGs (light gas oils) are green ×, and other types are black circles. Mixes between KEs, LCs, and LGs are diamonds, with their colors indicating the amounts of the individual components. (b) PC1 loading coefficients, (c) PC2 loading coefficients, and (d) mean chromatogram. Blue is the lowest intensity, then cyan, yellow, and red. The black lines in parts b−d indicate the boundaries between classes of compounds. The classes are indicated by red capital letters. They are paraffins (A), aromatics and naphthenoaromatics (B), and diaromatics, naphthenodiaromatics, and triaromatics (C). In parts b and c, areas where no compounds elute are gray; other colors indicate the sign and magnitude of the loading coefficients: dark blue (negative and large), light blue (negative and small), green (zero), yellow (positive and small), and red (positive and large).

PC2 describes the paraffinicity versus the aromaticity of the samples. Investigation of the higher PCs (see the Supporting Information) reveals that PC3 describes further boiling point differences, that PC4 describes further aromaticity differences, particularly in the heavier compounds, and that PC5 and PC6 both describe the proportion of monoaromatics to diaromatics as well as some residual 2tR shift. For the higher PCs, the variation in the quality control chromatograms is nearly as large as for the sample chromatograms, and though some of them might contain relevant sample information, no further investigations were performed. More detailed analysis could be performed on a subset of the samples, such as investigations into the differences within the group of light cycle oils, or a subsection of the chromatograms could be used; e.g., if only data points with 1.5 s < 2tR < 2 s were analyzed, the patterns of monoaromatics could be investigated. Normalizing each compound group separately would remove the overall patterns, leaving the finer, within-group patterns to be analyzed.36 While GC × GC/MS allows for increased peak capacity, as compounds that show chromatographic overlap can be separated on differences in their mass spectra, we have not utilized this advantage in the analysis, as it was performed on the TIC. The filtering was performed before summing to the TIC. As filtering with a second derivative Gaussian filter can hide peaks if overlap exists and as overlap is much more prevalent in the TIC than in the individual EICs, this was



ASSOCIATED CONTENT

Supporting Information

Additional information as noted in text. This material is available free of charge via the Internet at http://pubs.acs.org.



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].


Notes


The authors declare no competing financial interest.



ACKNOWLEDGMENTS Asbjørn S. Andersson and Rasmus G. Egeberg, Haldor Topsøe A/S, are gratefully acknowledged for carrying out catalytic hydrotreating tests and providing samples for this study.

Biographies

Søren Furbo (born Værløse, Denmark, 1980) holds a position as postdoc in the Analytical Chemistry Group, Department of Plant and Environmental Sciences, University of Copenhagen, Denmark. His research focus is on making advanced processing of chromatographic data accessible to the nonprogrammer by integrating it into an easy-to-use graphical user interface. During his Ph.D. he worked on analysis of multidimensional chromatographic data, with special focus on comprehensive two-dimensional gas chromatography (GC × GC) of petroleum, in a collaboration between Haldor Topsøe A/S and the Analytical Chemistry group at the University of Copenhagen. Here, he transferred existing techniques for processing chromatographic data to GC × GC, developed novel techniques for removal of artifacts from GC × GC chromatograms, and investigated different approaches to analyzing GC × GC data.



REFERENCES

(1) Oh, C.; Huang, X.; Regnier, F. E.; Buck, C.; Zhang, X. J. Chromatogr., A 2008, 1179, 205−215. (2) Schoenmakers, P.; Marriott, P.; Beens, J. LC−GC Eur. 2003, 16, 335−339. (3) Gröger, T.; Schäffer, M.; Putz, M.; Ahrens, B.; Drew, K.; Eschner, M.; Zimmermann, R. J. Chromatogr., A 2008, 1200, 8−16. (4) Dallüge, J.; Beens, J.; Brinkman, U. A. J. Chromatogr., A 2003, 1000, 69−108. (5) Gaines, R. B.; Ledford, E. B.; Stuart, J. D. J. Microcol. Sep. 1998, 10, 597−604. (6) Van De Weghe, H.; Vanermen, G.; Gemoets, J.; Lookman, R.; Bertels, D. J. Chromatogr., A 2006, 1137, 91−100. (7) de la Mata, A. P.; Nizio, K. D.; Harynuk, J. J. J. Chromatogr., A 2012, 1255, 190−195. (8) Kallio, M.; Hyötyläinen, T. J. Chromatogr., A 2007, 1148, 28−235. (9) Welthagen, W.; Shellie, R.; Spranger, J.; Ristow, M.; Zimmermann, R.; Fiehn, O. Metabolomics 2005, 1, 65−73. (10) Risticevic, S.; DeEll, J. R.; Pawliszyn, J. J. Chromatogr., A 2012, 1251, 208−218. (11) Aura, A.-M.; Mattila, I.; Hyotylainen, T.; Gopalacharyulu, P.; Bounsaythip, C.; Oresic, M.; Oksman-Caldentey, K.-M. Mol. Biosyst. 2011, 7, 437−446. (12) Christensen, J. H.; Hansen, A. B.; Karlson, U.; Mortensen, J.; Andersen, O. J. Chromatogr., A 2005, 1090, 133−145. (13) Christensen, J. H.; Tomasi, G. J. Chromatogr., A 2007, 1169, 1−22. (14) Vial, J.; Nocairi, H.; Sassiat, P.; Mallipatu, S.; Cognon, G.; Thiébaut, D.; Teillet, B.; Rutledge, D. N. J. Chromatogr., A 2009, 1219, 2866−2872. (15) van Mispelaar, V. G.; Janssen, H.-G.; Tas, A. C.; Schoenmakers, P. J. J. Chromatogr., A 2005, 1071, 229−237. (16) Gröger, T.; Welthagen, W.; Mitschke, S.; Schäffer, M.; Zimmermann, R. J. Sep. Sci. 2008, 31, 3366−3374. (17) Mohler, R. E.; Dombek, K. M.; Hoggard, J. C.; Young, E. T.; Synovec, R. E. Anal. Chem. 2006, 78, 2700−2709. (18) Brown, K.; Donnelly, K. Environ. Pollut., Ser. B: Chem. Phys. 1983, 6, 119−132. (19) Wang, Z.; Fingas, M.; Landriault, M.; Sigouin, L.; Feng, Y.; Mullin, J. J. Chromatogr., A 1997, 775, 251−265. (20) Christensen, J.
H.; Tomasi, G.; Hansen, A. B. Environ. Sci. Technol. 2005, 39, 255−260. (21) Christensen, J. H.; Hansen, A. B.; Tomasi, G.; Mortensen, J.; Andersen, O. Environ. Sci. Technol. 2004, 38, 2912−2918. (22) Pierce, K. M.; Kehimkar, B.; Marney, L. C.; Hoggard, J. C.; Synovec, R. E. J. Chromatogr., A 2012, 1255, 3−11. (23) Nielsen, N.; Carstensen, J.; Smedsgaard, J. J. Chromatogr., A 1998, 805, 17−35. (24) Kassidas, A.; MacGregor, J. F.; Taylor, P. A. AIChE J. 1998, 44, 864−875. (25) Savorani, F.; Tomasi, G.; Engelsen, S. J. Magn. Reson. 2010, 202, 190−202. (26) Skov, T.; van den Berg, F.; Tomasi, G.; Bro, R. J. Chemom. 2006, 20, 484−497. (27) Zhang, D.; Huang, X.; Regnier, F. E.; Zhang, M. Anal. Chem. 2008, 80, 2664−2671. (28) Wong, J. W. H.; Durante, C.; Cartwright, H. M. Anal. Chem. 2005, 77, 5655−5661. (29) Tomasi, G.; van den Berg, F.; Andersson, C. J. Chemom. 2004, 18, 231−241.

Asger B. Hansen (born in Stadil, Denmark, 1951) holds a position as research scientist at the Chemical Analytical Department, the R&D Division at Haldor Topsøe A/S, Denmark, a global supplier of catalysts and technologies, where he is heading the development and applications of GC × GC techniques for characterizing petroleum refinery streams. He received his M.Sc. degree in chemistry from Aarhus University, Denmark, in 1980. Prior to his current position he was research scientist for 12 years at Risø National Laboratory, Denmark, and senior scientist for 16 years at the National Environmental Research Institute, Denmark. Most of his professional career has been devoted to the analysis, characterization, and monitoring of petroleum hydrocarbons, both as energy sources and as pollutants in the environment, using chromatographic and spectroscopic techniques. He has authored and coauthored more than 50 peer-reviewed papers, conference proceedings, and book chapters.

Thomas Skov (born Aarhus, Denmark, 1975) is an Associate Professor at the Spectroscopy and Chemometrics section, Department of Food Science, Faculty of Science, University of Copenhagen, Denmark. His research spans a range of areas in chemometrics, process analytical technology and chemistry, multiway chemometrics, biotechnological process optimization, and hyphenated and multidimensional chromatographic techniques. He has published more than 40 peer-reviewed papers and book chapters on these topics. His research motivation is mainly to make clever methods that can be applied by more than just experts within chemometrics to make sure that data from chromatographic instruments can be analyzed even better.

Jan H. Christensen (born in Hillerød, Denmark, 1973) is an Associate Professor in the Analytical Chemistry group, Department of Plant and Environmental Sciences, University of Copenhagen, Denmark.
He heads the Research Centre for Advanced Analytical Chemistry (RAACE) and is currently responsible for >15 research grade analytical instruments. He has pioneered cutting-edge analytical and chemometric methods for oil hydrocarbon fingerprinting and now works with all aspects of contaminant fingerprinting, petroleomics, and metabolomics. He develops analytical platforms and new tools to handle and process complex data from cutting-edge analytical instrumentation and applies this chemical fingerprinting concept to the analysis of complex mixtures of organic contaminants and in numerous industry project collaborations with the petrochemical, environmental, and food industries. He has authored and coauthored more than 60 peer-reviewed papers and book chapters on these topics. His current research focus is on development and application of multidimensional chromatography platforms in combination with chemometric data analysis.


(30) Bylund, D.; Danielsson, R.; Malmquist, G.; Markides, K. E. J. Chromatogr., A 2001, 961, 237−244. (31) Poole, C. F. The Essence of Chromatography, 1st ed.; Elsevier Science B.V.: Amsterdam, The Netherlands, 2003. (32) Furbo, S.; Christensen, J. H. Anal. Chem. 2012, 84, 2211−2218. (33) Mazet, V.; Carteret, C.; Brie, D.; Idier, J.; Humbert, B. Chemom. Intell. Lab. Syst. 2005, 76, 121−133. (34) Danielsson, R.; Allard, E.; Sjöberg, P. J. R.; Bergquist, J. Chemom. Intell. Lab. Syst. 2011, 108, 33−48. (35) Schmarr, H.-G.; Bernhardt, J. J. Chromatogr., A 2010, 1217, 565−574. (36) Gallotta, F. D.; Christensen, J. H. J. Chromatogr., A 2012, 1235, 149−158. (37) van den Berg, R. A.; Hoefsloot, H. C.; Westerhuis, J. A.; Smilde, A. K.; van der Werf, M. J. BMC Genomics 2006, 7, 142. (38) Bro, R.; Smilde, A. K. J. Chemom. 2003, 17, 16−33. (39) Savitzky, A.; Golay, M. J. E. Anal. Chem. 1964, 36, 1627−1639. (40) Danielsson, R.; Bylund, D.; Markides, K. E. Anal. Chim. Acta 2002, 454, 167−184. (41) Christensen, J. H.; Tomasi, G.; de Lemos Scofield, A.; de Fatima Guadalupe Meniconi, M. Environ. Pollut. 2010, 158, 3290−3297. (42) Wang, H.; Prins, R. J. Catal. 2008, 258, 153−165. (43) Du, H.; Fairbridge, C.; Yang, H.; Ring, Z. Appl. Catal., A 2005, 294, 1−21.
