Orthogonality Aspects - ACS Publications - American Chemical Society

Article pubs.acs.org/ac

Assessment of Two-Dimensional Separative Systems Using Nearest-Neighbor Distances Approach. Part 1: Orthogonality Aspects

Witold Nowik,*,†,‡,⊥ Sylvie Héron,† Myriam Bonose,† Mateusz Nowik,§ and Alain Tchapla†

† Groupe de Chimie Analytique de Paris-Sud EA 4041, LETIAM, IUT d’Orsay, Univ. Paris-Sud, Plateau de Moulon, 91400 Orsay, France
‡ Laboratoire de Recherche des Monuments Historiques, 29 rue de Paris, 77420 Champs-sur-Marne, France
§ Department of Measurement and Electronics, Faculty of Electrical Engineering, Automatics, Computer Science and Biomedical Engineering, AGH University of Science and Technology, Aleja Adama Mickiewicza 30, 30-059 Kraków, Poland

Supporting Information

ABSTRACT: We propose here a new approach to the evaluation of two-dimensional and, more generally, multidimensional separations based on topological methods. We consider the apex plot as a graph, which could further be treated using a topological tool: the measure of distances between the nearest neighbors (NND). Orthogonality can be thus defined as the quality of peak dispersion in normalized separation space, which is characterized by two factors describing the population of distances between nearest neighbors: the lengths (di(o)) of distances and the degree of similarity of all lengths. Orthogonality grows with the increase of both factors. The NND values were used to calculate a number of new descriptors. They inform about the extent of peak distribution, like the arithmetic mean (A̅(o)) of NNDs, as well as about the homogeneity of peak distribution, like the geometric mean (G̅(o)) and the harmonic mean (H̅(o)). Our new, NND-based approach was compared with another recently published method of orthogonality evaluation: the fractal dimensionality (DF). The comparison shows that the geometric mean (G̅(o)) is the descriptor behaving in the most similar way to dimensionality (DF) and the harmonic mean (H̅(o)) displays superior sensitivity to the shortest, critical distances between peaks. The latter descriptor (H̅(o)) can be considered as sufficient to describe the degree of orthogonality based on NND. The method developed is precise, simple, easy to implement, and possible to use for the description of separations in a true or virtual system of any number of dimensions.

Received: April 27, 2013. Accepted: September 3, 2013. Published: September 3, 2013.
© 2013 American Chemical Society. dx.doi.org/10.1021/ac4012705 | Anal. Chem. 2013, 85, 9449−9458

The development of two-dimensional (2D) separation techniques for separation of complex mixtures has been increasing in popularity recently, especially in the case of 2D gas chromatography (GC × GC), but also 2D liquid chromatography (LC × LC) thanks to a large choice of separation conditions (modes, stationary phases).1−8 They are often applied to the analysis of extracts from plants displaying nutritive or pharmacological interest.9−13 Two-dimensional separation is based on efficient separation systems (stationary phase/mobile phase) displaying strongly different selectivity in each dimension. The difference of selectivity is often called the “degree of orthogonality”. The more selectivity difference observed, the more orthogonal the system under consideration.14−22 Orthogonality in separative sciences can be defined as follows: “orthogonality of two dimensions of separation exists when the retention mechanisms in these dimensions can be considered as statistically independent”.17 According to Pellett et al.19 successful practical “orthogonal” separation should fulfill two conditions. First, the change in selectivity in the second dimension compared to the “primary” dimension should be great, allowing extraction of the “new” peaks from coelutions coming from the first dimension. Second, the resulting “orthogonal” separation should fully separate all sample peaks, preferably with baseline resolution.

For the purpose of comparison between various two-dimensional systems of separation, it is necessary to find an appropriate way to evaluate their orthogonality. Several approaches were used in this aim. They are periodically reviewed and compared.4−7,23 Orthogonality is usually investigated in normalized space defined as a rectangular surface delimited by the first and last retained compound in each dimension, and it may be evaluated mainly through the measure of (1) independence of dimensions, meaning divergence of retention mechanisms through visual comparison of projections of the retention in Cartesian space,24−27 correlation tests,7,13,28−32 regression trees,33,34 factor analysis,13,35,36 PCA27−29,37−40 or HCA,27,32,33,41,42 measure of spreading angle,34 and through the information theory42 or divergence of stationary phases’ descriptors;19,44−46 (2) spatial spreading of peaks by information theory25,35,36,43,47−52 or system dimensionality;53 (3) peak capacity;18,35,54,55 as well as (4) the part occupied by peaks in separation space (often expressed as percentage).4,20,35,56−60

Some general comments on the choice of appropriate methodology were given by Stoll et al.4 and Schure.53 The former authors indicated that correlation is an insufficient approach to orthogonality and the calculation of the fraction of 2D separation area is better adapted. The latter author, quoting earlier observations of Watson et al.,20 suggested that peak spreading measure methods give a better expression of orthogonality than methods based on the measure of the occupied fraction of separation space.

Many methods of evaluation of orthogonality are difficult to use because of complex theory or fastidious calculations, e.g., information theory.35 Other approaches do not seem well-adapted, e.g., correlation methods or space occupation calculations. Some of them lack univocal policy in the choice of primary parameters, e.g., the size of bins in the space occupation approach4 or the criteria for definition of initial slope in the system dimensionality graphic approach using the box counting method.53 That can result in necessarily approximate values.

To overcome these drawbacks, we developed a new approach to the multidimensional systems orthogonality which is based on topological measure of the shortest distances between adjacent objects, called nearest-neighbor distance (NND).61,62 The application of this method, using additional descriptors proposed in an original publication,61 was tentatively tried for evaluation in 2D chromatography.63 However, the descriptors were complex and the results were judged unsatisfactory.

In the present work, we propose a series of simpler descriptors: arithmetic mean (A̅(o)), geometric mean (G̅(o)), and harmonic mean (H̅(o)) of NNDs. They allow the evaluation of peak spreading quality. The approach is illustrated by predictive 2D separation of anthraquinone derivatives in reversed-phase liquid chromatography starting from retention data collected for various stationary phases.64,65 The phases with the most pronounced selectivity divergences were selected, and their virtual 2D couplings were investigated using the nearest-neighbor distances calculation and evaluated with some newly proposed descriptors. The obtained results were compared with another, recent approach to the measure of orthogonality: the system dimensionality (DF).53

THEORETICAL BACKGROUND

Definition of Orthogonality from the Point of View of 2D Separation Quality. The main requirement of effective separations is to allow separation of all compounds from the sample. In complex samples, where coelutions may occur, the number of maxima (peak apexes) observed in a chromatogram can be a sufficient criterion to compare and range several separation systems. Thus, the chromatograms displaying the highest number of peaks indicate the best two-dimensional systems. If several chromatograms contain the same number of peaks, their separation power can be measured as peak spreading uniformity. The aim of obtaining the most uniform peak dispersion is to get sufficient room between any pair of peaks to “insert” the additional compounds which may possibly appear in other analyses of similar samples. Considering the frequency of peak clustering occurring in separation of real samples, it could be stated that the separation quality depends on the resolution of the worst resolved peaks. Then the shortest distances between the neighbor peaks appear extremely important in description of the whole separation. Two conditions are thus to be taken into account to estimate orthogonality: (1) spread maximization: the peak apexes should be the most distant from each other; (2) spread uniformity: the dispersion of the lengths of distances between any apex and its closest neighbor should be as narrow as possible. That leads to a definition of orthogonality as “maximal and uniform distribution of all separated compounds in separation space”. Estimation of this distribution can be treated as a topological problem of distances between n points (peak apexes) scattered in 2D space.

Figure 1. Minimal spanning tree (MST) connecting nine randomly distributed points on Euclidean surface through nearest-neighbor distances (NNDs, dotted lines).

Nearest-Neighbor Distances. The shortest of all distances connecting a point with all other points (n − 1) from the population (N) containing n elements is defined as the nearest-neighbor distance (di). The points connected by di are thus nearest neighbors. Each point has only one nearest neighbor but can be itself a nearest neighbor of several points. For example, let us take three nonequidistant points A, B, and C which have the lengths of their mutual distances ordered growing diAB < diBC < diCA. The nearest neighbor for B is A (and inversely), and for C is B (in this case inversion is not true). So they will be connected A−B−C. The points A and C are not connected, as they are already connected to their nearest neighbors. The point B is thus connected to two points. The number of di is 2. The connection of each point in population N with its nearest neighbor gives (n − 1) distances di. The so-obtained connected graph is called minimal spanning tree (MST), and its total length is the sum of all distances connecting the nearest neighbors (Figure 1). The minimal spanning tree is thus the shortest way, or at least equal in the length to any other possibility to connect all n points with no cycles using Euclidean distances. This approach was initially
used by Borůvka in 1926 to set up the electricity network in Moravia region (Czech Republic, EU) requiring the minimal cost of construction.66 The classical applications of the MST are for 2D surfaces, but it is easy to imagine using them in the n-dimensional space, including one-dimensional space (linear distribution). In the latter case the MST is equivalent to the distance from first to last point on the line.

The measure of distances between the nearest neighbors (di) in the population of peaks (N) may provide a lot of information useful to qualify the orthogonality potential of a two-dimensional separation system. Several aspects of two-dimensional apex plots can be described using retention parameters, namely, peak spreading and dispersion homogeneity.

Data Preparation: Retention Parameters. To accentuate the selectivity effect of systems and remove the proportion of variability coming from overall retention, the normalization of retention times is necessary. The normalization proposed by Steuer et al.25 allows obtaining the “retention parameters”:

$$\chi_i = \frac{t_i - t_1}{t_n - t_1} \tag{1}$$

where ti is the retention time of the ith compound, and t1 and tn are the retention times of the first and last eluted compounds, respectively. The values obtained vary from 0 to 1 from the least to the most retained compound in each dimension. This type of normalization takes into account only the time window effectively occupied by compounds, which is usually narrower than the whole analysis space. The so-normalized 2D space will be later called “normalized separation space” or just “normalized space”.

Descriptor of Peak Spreading. Using minimal distances calculation in the normalized separation space it is possible to approach peak spreading. To express the quality of this spreading in normalized space, the arithmetic mean (A̅(o)) of distances di(o) will be used:

$$\bar{A}_{(o)} = \frac{\sum_{i=1}^{n-1} d_{i(o)}}{n-1} \tag{2}$$

where n is the number of peaks. In predictive systems it corresponds to the number of injected compounds, and in real separations it signifies the number of observed apexes.

Descriptors of Dispersion Homogeneity. According to human perception experiments, the visual appreciation of the distribution homogeneity of objects on a surface is mostly due to the possible appearance of “detached” points and clusters and their respective ratios. Quantification of that appreciation could be fairly well modeled by the geometric mean (G̅(o)) of distances di(o) between n points (distinguishable peak apexes):67

$$\bar{G}_{(o)} = \left( \prod_{i=1}^{n-1} d_{i(o)} \right)^{1/(n-1)} \tag{3}$$

For a series of data with a minimal value (dimin) and a maximal value (dimax), the geometric mean G̅(o) gives more weight to low values in comparison with arithmetic mean (A̅(o)): dimin < G̅(o) < A̅(o) < dimax. This observation is consistent with our conditions of evaluation of orthogonality, which emphasize the importance of small distances between not uniformly distributed peak apexes. However, there is a possibility to give even more importance to the values close to the lower bound of the data series using the harmonic mean (H̅(o)), because dimin < H̅(o) < G̅(o). In this case, the shortest and most critical distances from the entire population of distances have the most decisive influence on the value of this descriptor. So, we decided to use the harmonic mean (H̅(o)) of distances (di(o)) to reveal the importance of the presence of the shortest distances in normalized separation space:

$$\bar{H}_{(o)} = \frac{n-1}{\sum_{i=1}^{n-1} \frac{1}{d_{i(o)}}} \tag{4}$$

with n being the number of observed peak apexes.

Comparison with System Dimensionality. We decided to compare the proposed nearest-neighbor distances approach with the recently proposed expression of system dimensionality,53 which gives a very interesting connection between the orthogonality and sample dimensionality defined by Giddings.15 The dispersion of points on a surface can be measured as fractal dimension DF, using the equation

$$D_F = \lim_{\varepsilon \to 0} \frac{\log N(\varepsilon)}{\log (1/\varepsilon)} \tag{5}$$

In this equation, N(ε) stands for the number of geometrically defined surface units (squares), obtained from the division of the normalized surface and filled with peak apexes, and ε is the size of the surface unit (square). The plot of change of peak apexes density, log N(ε) versus log ε, gives the fractal dimension DF from the slope of linear correlation of the initial points, multiplied by −1. The fractal dimension for 1D separations takes the value 0 < DF ≤ 1 and for 2D systems 1 < DF ≤ 2.

EXPERIMENTAL SECTION

Software. Calculation of the retention coefficients and apex plots was done with Excel (Microsoft France, Issy-les-Moulineaux, France). The statistical treatments, principal component analysis (PCA) used in selection of stationary phases and hierarchical cluster analysis (HCA) employed for NND calculations, were performed using XLStat (AddinSoft, Paris, France), an Excel add-in tool. The intermediary data (di(o) of NND) were extracted from HCA analysis, and orthogonality descriptors were again calculated under Excel with appropriate formulas. Finally, MatLab (MathWorks, Natick, MA, U.S.A.) was used to create a script for automatic calculations of all descriptors and plot of 2D apex plots starting from retention data. The script (“Orthogonality” MatLab script) is available in the Supporting Information.

Selection of Stationary Phases and Retention Data. Using retention data for 40 anthraquinone derivatives (listed in the Supporting Information, Table S-1) obtained on several stationary phases we studied formerly,64,65 we tried some preliminary 2D simulation plots for various pairs of phases. So, we used a bulk correlation method for detection of a potentially orthogonal combination of stationary phases. For that purpose we used the examination of the dissimilarity of a whole set of stationary phases by PCA, using the retention coefficient values of all anthraquinoid standards.64,65 The correlation matrix obtained in PCA (results not shown) was inspected to find the phases the least correlated with any others.
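The bulk correlation screening described above can be illustrated with a minimal Python sketch. It assumes that each phase is summarized by the average Pearson correlation of its retention coefficients with those of every other phase; the phase labels (`P1` to `P3`), the retention values, and the helper names `pearson` and `average_correlations` are hypothetical and are not taken from the study, which relied on the correlation matrix produced by PCA in XLStat.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def average_correlations(retention):
    """Average correlation of each phase with all the other phases."""
    averages = {}
    for name, values in retention.items():
        others = [pearson(values, v) for other, v in retention.items() if other != name]
        averages[name] = sum(others) / len(others)
    return averages

# Hypothetical retention coefficients (0 to 1) of six compounds on three phases.
retention = {
    "P1": [0.00, 0.20, 0.40, 0.60, 0.80, 1.00],
    "P2": [0.00, 0.25, 0.38, 0.62, 0.79, 1.00],
    "P3": [0.90, 0.10, 1.00, 0.00, 0.50, 0.30],
}

averages = average_correlations(retention)
# The phase with the lowest average correlation is the best candidate
# for an orthogonal coupling in this toy data set.
least_correlated = min(averages, key=averages.get)
for name, value in sorted(averages.items(), key=lambda item: item[1]):
    print(f"{name}: average r = {value:.3f}")
```

In this toy data set `P1` and `P2` behave almost identically, so the phase singled out by the lowest average correlation is `P3`.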

Average correlation coefficients, extracted from the correlation matrix, indicate that only four stationary phases show values below 0.9: ChiraDex Gamma (underivatised γ-cyclodextrin bonded silica, abbreviated CDG), Discovery HS-PEG (poly(ethylene glycol) bonded silica, abbreviated PEG), Cosmosil 5PYE (2-(1-pyrenyl)ethyl bonded silica, abbreviated PYE) and Fluorosep-RP (pentafluorophenyl bonded silica, abbreviated FSP). To these selected functionalized phases we added the ODS reference phase Uptisphere NEC (monolayer octadecyl bonded silica, non end-capped, abbreviated NEC)64 as well as another ODS phase of slightly different selectivity, Hypersil Gold (monolayer octadecyl bonded silica, end-capped, abbreviated Gold) to complete the set submitted for further evaluations. The characteristics of the selected phases are given in Table S-2 (see the Supporting Information).

Practical Implementation of NND Methodology. The calculation of distances connecting nearest neighbors may be seen as the construction of the matrix n × n of distances di between n points. The first n − 1 lowest values extracted from half of the obtained matrix, except the diagonal, correspond to the foreseen linkage di of all n points. To calculate nearest-neighbor Euclidean distances from their x, y coordinates several algorithms can be used. We simply used the functionality allowing the calculation of heights of aggregation distances in the HCA in XLStat software. The series of coordinates (retention parameters) describing the positions of n objects (in our case peak apexes) were introduced as primary data. The distances di to calculate were defined as “Euclidean”, and the linkage between successive clusters was set as “single linkage”. Single linkage defines the shortest distance (Δ) between two clusters A and B as the shortest distance (di) between elements I and J from each cluster:

$$\Delta(A, B) = \min_{I \in A} \min_{J \in B} d(I, J) = d_i(I, J) \tag{6}$$

This concerns also the monoelemental “clusters”, actually, elements. Examining all possible links between all elements, n − 1 successive clusters which correspond to n − 1 distances di are obtained. The software calculated the heights of knots (aggregation distances), which were the values di(o) of NND. They could be recovered (copied) from the result sheet of the software used into the Excel sheet. This intermediary data was used for further calculations of descriptors A̅(o), G̅(o), and H̅(o) under the Excel environment. Any other clustering software performing HCA, e.g., Statistica, or several Internet-accessible freeware programs can be employed for the purpose of delivering the aggregation distances data.

System Dimensionality Evaluation Methodology. System dimensionality (DF) can be calculated by dividing the separation area into square boxes and by performing computation of filled boxes only.53 We adopted this method, dividing synchronically each dimension by k = 1, 2, 3, 4, 5, 6, 8, 10, 12, 16, 20, and 32 to obtain successively k² square boxes of side size ε = 1/k. The obtained diagrams representing the number of filled boxes (as log N(ε)) in function of box size (as log ε) displayed quasi-linear dependence of both variables in the part corresponding to the large box size. The slope of the linear regression over this part, together with its Pearson determination coefficient (R2), was used in the system dimensionality evaluation according to the original publication.53

RESULTS AND DISCUSSION

NND Orthogonality. The retention coefficients (χi), calculated from raw retention data for six selected stationary phases, are also given in the Supporting Information (Table S-3). The two-dimensional diagrams of normalized 2D retention of standards obtained with various combinations of stationary phases are reported in Figure 2. The values of orthogonality descriptors, calculated with eqs 2−4, ranged according to decreasing H̅(o), are presented in Table 1. The system classifications obtained with various descriptors are different, but the most and least orthogonal systems are the same for all NND-based descriptors. The column set classified the most orthogonal is PYE−PEG (Figure 2a). Indeed, the peak scattering in 2D normalized space is the most extended (A̅(o) = 0.084) and the least clustered (G̅(o) = 0.067, H̅(o) = 0.054). The system found second using homogeneity descriptors (G̅(o) and H̅(o)) is NEC−PYE (Figure 2b). Although the arithmetic mean of distances di(o) is not very high (A̅(o) = 0.077), the harmonic mean and the geometric mean are relatively good (H̅(o) = 0.047 and G̅(o) = 0.064) compared to the next stationary phase combinations.

Figure 2. Apex plots on 2D normalized surfaces obtained with various combinations of stationary phases: (a) PYE−PEG; (b) NEC−PYE; (c) FSP−PYE; (d) Gold−PEG; (e) Gold−PYE; (f) NEC−PEG; (g) PYE−CDG; (h) FSP−PEG; (i) Gold−CDG; (j) FSP−CDG; (k) Gold−FSP; (l) NEC−FSP; (m) NEC−CDG; (n) PEG−CDG; (o) NEC−Gold.
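The ranking statements in the NND Orthogonality discussion can be checked directly against the values reported in Table 1. The sketch below is given as an illustration rather than as part of the original workflow: it transcribes the Table 1 values into a Python dictionary and compares the orderings produced by each descriptor. The names `TABLE_1`, `DESCRIPTORS`, and `ranking` are hypothetical.

```python
# Descriptor values transcribed from Table 1: (A, G, H, DF) per phase combination.
TABLE_1 = {
    "PYE-PEG":  (0.084, 0.067, 0.054, 1.65),
    "NEC-PYE":  (0.077, 0.064, 0.047, 1.64),
    "FSP-PYE":  (0.077, 0.061, 0.042, 1.62),
    "Gold-PEG": (0.072, 0.051, 0.039, 1.59),
    "Gold-PYE": (0.075, 0.059, 0.039, 1.63),
    "NEC-PEG":  (0.073, 0.051, 0.038, 1.60),
    "PYE-CDG":  (0.080, 0.053, 0.036, 1.57),
    "FSP-PEG":  (0.077, 0.054, 0.036, 1.61),
    "Gold-CDG": (0.070, 0.045, 0.031, 1.50),
    "FSP-CDG":  (0.078, 0.047, 0.030, 1.50),
    "NEC-CDG":  (0.073, 0.047, 0.030, 1.50),
    "Gold-FSP": (0.056, 0.041, 0.027, 1.44),
    "NEC-FSP":  (0.058, 0.043, 0.026, 1.41),
    "PEG-CDG":  (0.064, 0.038, 0.018, 1.44),
    "NEC-Gold": (0.040, 0.026, 0.017, 1.28),
}
DESCRIPTORS = ("A", "G", "H", "DF")

def ranking(index):
    """Phase combinations ordered from the highest to the lowest descriptor value."""
    return sorted(TABLE_1, key=lambda name: TABLE_1[name][index], reverse=True)

rankings = {label: ranking(i) for i, label in enumerate(DESCRIPTORS)}
for label, order in rankings.items():
    print(f"{label}: best = {order[0]}, worst = {order[-1]}")
```

With these values the first and last places agree for the three NND descriptors, while the intermediate orderings given by the arithmetic and harmonic means differ.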

The third system FSP−PYE (Figure 2c) as well as the eighth system FSP−PEG (Figure 2h) have the same surface coverage (A̅(o) = 0.077) as the NEC−PYE, but FSP−PYE is slightly more clustered (G̅(o) = 0.061 and H̅(o) = 0.042) and FSP−PEG is much more clustered (G̅(o) = 0.054 and H̅(o) = 0.036). The PYE−CDG column set (Figure 2g) has the second best spreading extension descriptor (A̅(o) = 0.080). However, the clustering descriptors G̅(o) = 0.053 and H̅(o) = 0.036 are low, showing important aggregation of peaks. The combination of phases FSP−PEG (Figure 2h) has the same harmonic mean value as PYE−CDG (Figure 2g) and a very close geometric mean value, but a smaller extension descriptor (A̅(o) = 0.077). From the point of view of arithmetic mean the better classified system is PYE−CDG, and from the point of view of geometric mean the better one is FSP−PEG. The strong clustering of peak apexes, present in the PYE−CDG system, was observed also in other CDG-based systems.

As could be observed, the set of columns NEC−PEG (Figure 2f) gives more homogeneous peak scattering than the couple of phases PYE−CDG (Figure 2g), because the contribution of small di(o) in the population of NNDs is less important (H̅(o) = 0.038) for the first of them than for the second system (H̅(o) = 0.036). However, the arithmetic mean ranges them in the inverse order, because of wider normalized area coverage in the case of PYE−CDG.

The harmonic mean of the PEG−CDG (Figure 2n) phases combination, H̅(o) = 0.018, points out a dramatic case of clustering. It is noteworthy that the retention mechanism for both phases composing the system is strongly different: only four compounds are located close to the diagonal. Indeed, the majority of compounds are retained on the CDG phase in a very narrow zone. The system of phases NEC−Gold (Figure 2o) is practically nonorthogonal, and the harmonic mean is low (H̅(o) = 0.017) and very close to the former system PEG−CDG. Both systems are very different from the point of view of space occupation (A̅(o), respectively, 0.040 and 0.064), but the very inhomogeneous peak spreading in PEG−CDG gives to this system poor orthogonality, even if some compounds are responsible for its good normalized surface coverage.

When comparing the systems classification of the descriptors used it could be observed that the arithmetic mean is a good descriptor of surface coverage, while the geometric mean and the harmonic mean give information on the importance of clustering. The values of geometric mean, being intermediate between arithmetic and harmonic ones, seem to be less sensitive to the clustering effect than harmonic mean.

Table 1. Estimation of Orthogonality

                      nearest-neighbor distances                          dimensionality
phases combination    arithmetic mean   geometric mean   harmonic mean    fractal function
                      (A̅(o))           (G̅(o))          (H̅(o))          (DF)
PYE−PEG               0.084             0.067            0.054            1.65
NEC−PYE               0.077             0.064            0.047            1.64
FSP−PYE               0.077             0.061            0.042            1.62
Gold−PEG              0.072             0.051            0.039            1.59
Gold−PYE              0.075             0.059            0.039            1.63
NEC−PEG               0.073             0.051            0.038            1.60
PYE−CDG               0.080             0.053            0.036            1.57
FSP−PEG               0.077             0.054            0.036            1.61
Gold−CDG              0.070             0.045            0.031            1.50
FSP−CDG               0.078             0.047            0.030            1.50
NEC−CDG(a)            0.073             0.047            0.030            1.50
Gold−FSP              0.056             0.041            0.027            1.44
NEC−FSP               0.058             0.043            0.026            1.41
PEG−CDG               0.064             0.038            0.018            1.44
NEC−Gold              0.040             0.026            0.017            1.28

(a) Only 39 peaks out of 40 compounds are distinguishable. The distance di(o) = 0 is excluded from G̅(o) and H̅(o) calculation.

Comparison with the System Dimensionality (DF) Approach. The sets of columns PYE−PEG, NEC−PYE, Gold−PYE, and FSP−PYE give in decreasing order the best system dimensionalities (DF, Table 1). These combinations of stationary phases are the same as those found to be the most orthogonal using the harmonic mean of NNDs (H̅(o)), but Gold−PYE and FSP−PYE are in inverse order. Also the next Gold−PEG and NEC−PEG systems are ordered in another way than was done by H̅(o). These differences in classification order between DF and H̅(o) are numerous as observed in Table 1. The correlation diagram between the harmonic mean (H̅(o)), from the nearest-neighbor approach, and system dimensionality (DF), from the fractal approach, is drawn in Figure 3. Little correlation could be observed between both orthogonality descriptors (R2 = 0.812). The reason is that the system dimensionality calculation, based on the number of filled boxes in function of box size (eq 5), depends much on general scatter and little on local clusters, as the correlation slope is determined for the boxes of the largest size.53 On the contrary, the harmonic average of the distances between the nearest neighbors (H̅(o)) is strongly influenced by the shortest distances and sensitive to the presence of clusters. From the point of view of dispersion quality this descriptor is more suitable to the classification of 2D apex plots.

Figure 3. Correlation diagram between system dimensionality (DF) and harmonic mean of di(o) (H̅(o)).

Influence of Apex Distribution Change on Descriptors.
To check the influence of changes in distribution on the values of the descriptors, we generated four additional plots by moving or removing peaks in one of the obtained apex plots. Using the distribution of peak apexes of the column combination PEG−CDG (Figure 2n), we first generated two systems in which we changed the position of one point. We selected this column combination because it contains very short distances as well as very long distances, so the alterations we made highlight the differences between the compared descriptors. In the first series of data (“long-distance influenced”) we moved one point (x = 0.757, y = 0.234) from the gravity center of the whole population (Figure 2n) to an extremely remote position (x = 0, y = 1) in the upper left corner of the apex plot (Figure 4a). This was done to verify the sensitivity of the descriptors to the presence of outliers. In the second series of data (“short-distance influenced”), we modified very slightly the position of one of the points from the closest pair of neighbors (x = 0.718, y = 0.154, Figure 2n) by shifting it along the X axis by 0.001 (x = 0.719, y = 0.154, Figure 4b). This operation doubled the shortest distance of the whole population and allowed us to check the sensitivity of the descriptors to changes in the shortest and most critical distances.
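The direction of these two effects can be illustrated on a small, hypothetical set of nearest-neighbor distances. The values below are invented for the illustration and are not the PEG−CDG data; the sketch only applies eqs 2−4 to a list of distances before and after each alteration, and the names `means` and `relative_changes` are assumptions of this example.

```python
import math

def means(distances):
    """Arithmetic, geometric, and harmonic means of a list of positive distances."""
    count = len(distances)
    arithmetic = sum(distances) / count
    geometric = math.exp(sum(math.log(d) for d in distances) / count)
    harmonic = count / sum(1.0 / d for d in distances)
    return arithmetic, geometric, harmonic

def relative_changes(before, after):
    """Percentage change of each mean between two sets of distances."""
    return [100.0 * (new - old) / old for old, new in zip(means(before), means(after))]

# Hypothetical nearest-neighbor distances in normalized space.
base = [0.001, 0.02, 0.03, 0.05, 0.08, 0.10]

# "Short-distance influence": the shortest distance is doubled.
short_case = [0.002] + base[1:]
# "Long-distance influence": one average distance becomes a long one.
long_case = [0.5 if d == 0.05 else d for d in base]

short_a, short_g, short_h = relative_changes(base, short_case)
long_a, long_g, long_h = relative_changes(base, long_case)

print(f"short-distance change (%): A={short_a:.1f} G={short_g:.1f} H={short_h:.1f}")
print(f"long-distance change (%):  A={long_a:.1f} G={long_g:.1f} H={long_h:.1f}")
```

For this toy set the harmonic mean reacts most strongly when the shortest distance is doubled, whereas the arithmetic mean reacts most strongly to the added long distance.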

Figure 4. Dispersion of apexes in a modified apex plot of PEG−CDG: ○, apexes; ×, removed apexes; ●, moved apexes. (a) Change of one average distance to one long distance (“long-distance influence”). (b) Doubling of the smallest distance (“short-distance influence”); the former peak position is not indicated (too close to be clearly seen). (c) Removal of five “internal” peaks (“internal peak number influence”). (d) Removal of five “external” peaks (“external peak number influence”). The dotted line rectangle corresponds to the new space normalization.

Further changes were made by cancellation of five points from the initial PEG−CDG scatter plot for the third and fourth series of data. A third series of data (“internal peak number influenced”) contained thus only 35 points. The points were removed randomly from central positions in the peak apex (Figure 4c). This operation implied at the same time a side effect consisting in removing five average distances from the distances population. The last series of data (“external peak number influenced”) also contained 35 points, but the points removed were the most remote ones from the mass center of the apex plot (Figure 4d). Removal of these points implied the renormalization of apexes coordinates, because they delimited a former normalization. Also, this change eliminates a couple of the longest distances di from distances population. The two above-described apex number modifications have in fact a mixed impact on the change of values of tested descriptors. The real changes of values of descriptors between original and modified distributions, as well as relative changes effects, in percentage (%), are presented in Table 2. In agreement with the assumptions of our approach (Definition of Orthogonality from the Point of View of 2D Separation Quality section), we looked for the longest and most similar distances between all nearest neighbors. We considered the crucial role of the distribution of length of critical (the shortest) distances. Therefore, the most suitable system descriptors should be those which are sensitive to 9455

dx.doi.org/10.1021/ac4012705 | Anal. Chem. 2013, 85, 9449−9458

Analytical Chemistry

Article

Table 2. Sensitivity of Descriptors to the Change of Peak Distribution and Peak Numbera descriptor value

descriptor change (%)

descriptor

PEG−CDG

long-distance influenced

short-distance influenced

internal peak no. influenced

external peak no. influenced

long-distance influence

shortdistance influence

A̅ (o) G̅ (o) H̅ (o) DF

0.063556 0.0376 0.0176 1.44

0.082015 0.0403 0.0178 1.52

0.063559 0.0381 0.0216 1.44

0.069523 0.0397 0.0173 1.41

0.113466 0.0859 0.0321 1.79

29.0 7.2 0.8 5.6

0.004 1.3 22.8 0

a

Relative best scores for each type of influence are written in bold font.

changes in the shortest distances and relatively insensitive to the longest distances as well as the peak number. The descriptor least sensitive to the influence of long distances is the harmonic mean (H̅ (o)) which grows 0.8%. The geometric mean (G̅ (o)) is influenced more, 7.2%, but this change remains quite low and comparable with 5.6% displayed by system dimensionality (ΔDF). The same descriptors are also slightly variable with “internal” peak number change. This time, the harmonic mean and system dimensionality changes are very similar, respectively, −2.0% and −2.1%. The geometric mean varies in an opposite, increasing direction: 5.7%. The apparition of long distances strongly impacts the arithmetic mean (A̅ (o)) with 29.0% of change in our example. Arithmetic mean and geometric mean as well as system dimensionality are practically insensitive to the influence of short distances, whereas harmonic mean displays sensitivity toward this factor. The impact of internal peak number is noticeable practically only for arithmetic mean. The alteration of peak number by removing the external ones greatly influences all descriptors, which was expected because of the redefinition of limits of the normalized separation space. The least sensitive is the system dimensionality descriptor (DF) which grows by 24.3%. The arithmetic and harmonic means change in practically the same way. But the observed effects are not due to the same reasons, as the sensitivity of both descriptors to various factors is different and the renormalization introduces complex changes. The most convenient descriptor appears thus to be the harmonic mean (H̅ (o)), which varies greatly with the smalldistances length change. This change in our example reach 22.8% for doubling of distance between the closest positioned peaks (from di(o) = 0.001 to di(o) = 0.002). 
The other changes (the replacement of one average-length distance by a long one connecting a remote point, or the lowering of the peak number by extracting five peaks from inside the cloud of peak apexes) do not have a large impact on the harmonic mean. The respective changes are 0.8% and −2.0%. If the normalized separation space is identical for different peak apex plots, the system dimensionality (DF) can be replaced in a fairly satisfactory way by the geometric mean (G̅ (o)). Both descriptors have very similar and low sensitivity to short- and long-distance influences. In the presented example, reducing the number of peaks inside the normalized separation space produces opposite reactions (G̅ (o) increases and DF decreases), but the absolute changes are small, 5.7% and −2.1%, respectively. The arithmetic mean (A̅ (o)) of the distances (di(o)) could be used to express the extent of coverage of the normalized surface, if necessary.
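The contrasting sensitivities discussed above follow from how each mean weights small values. The following is a minimal sketch, assuming the descriptors are the plain arithmetic, geometric, and harmonic means of the distances di(o); the distance values and the helper names (`percent_changes`, `base`, `doubled`) are hypothetical and are not taken from the example in the text.

```python
import math

def arithmetic_mean(d):
    return sum(d) / len(d)

def geometric_mean(d):
    # Exponential of the mean logarithm; requires strictly positive distances.
    return math.exp(sum(math.log(x) for x in d) / len(d))

def harmonic_mean(d):
    # Reciprocal of the mean reciprocal; dominated by the smallest distances.
    return len(d) / sum(1.0 / x for x in d)

def percent_changes(before, after):
    """Relative change (%) of each mean when the distance set changes."""
    means = {"A": arithmetic_mean, "G": geometric_mean, "H": harmonic_mean}
    return {k: 100.0 * (f(after) - f(before)) / f(before)
            for k, f in means.items()}

# Hypothetical distances: 19 "average" distances plus one critical short one,
# which is then doubled (0.001 -> 0.002) as in the scenario described above.
base = [0.05] * 19 + [0.001]
doubled = [0.05] * 19 + [0.002]

changes = percent_changes(base, doubled)
for name, value in changes.items():
    print(f"{name}: {value:+.2f}%")
```

With such data the harmonic mean reacts most strongly to the doubling of the shortest distance, the geometric mean reacts moderately, and the arithmetic mean hardly moves, which is the qualitative ordering reported in the discussion.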



Change (%) in A̅ (o), G̅ (o), H̅ (o), and DF, respectively:
internal peak no. influence: 9.4, 5.7, −2.0, −2.1
external peak no. influence: 78.5, 128.6, 82.2, 24.3

CONCLUSION

The information extracted from the NND approach using the proposed descriptors allows the orthogonality of 2D systems to be described according to dispersion criteria. When the number of peaks in the compared apex plots is identical, an appropriate and precise description of systems in NND orthogonality evaluation can be made with the harmonic mean (H̅ (o)), a descriptor sensitive to the shortest distances between the nearest neighbors. The NND approach describes the distribution of peak apexes from a point of view different from that of the fractal dimensionality DF. The harmonic mean can then be used as the sole descriptor for orthogonality estimation with the NND approach.

The evaluation of orthogonality is done sequentially, following these steps: (1) extraction of the 2D retention time coordinates of peaks 1tR, 2tR; (2) conversion of retention time coordinates to retention coefficients 1χ, 2χ in normalized space; (3) calculation of nearest-neighbor distances (NND) in normalized space di(o); (4) calculation of the descriptors arithmetic mean (A̅ (o)) and harmonic mean (H̅ (o)) of the distances di(o); (5) ranking of systems in order of decreasing clustering descriptor (H̅ (o)).

Although nearly all steps can be done with simple calculations in an Excel sheet, the NNDs need to be computed with software that performs HCA. This clustering should be performed on the retention coefficients, with Euclidean distances and single linkage. The “heights of knots” calculated by the software correspond to the distances di(o). The whole calculation chain can be automated in MATLAB. The script provided in the Supporting Information (S-5) takes retention times as input, calculates all descriptors, and shows the peak spreading in normalized space.

The nearest-neighbor distances approach, with the calculation of simple orthogonality descriptors based on such distances, is a very attractive method for comparing 2D chromatograms.
Although we demonstrated here the application to two-dimensional separations, the proposed approach works in the same way for systems of any number of dimensions. It can also be applied to any multidimensional coupling of GC, LC, supercritical fluid chromatography (SFC), capillary electrophoresis (CE), and mass spectrometry (MS) giving analytical data described by multiple coordinates. Finally, the calculation of nearest-neighbor distances, as well as of all descriptors, is accurate and independent of any supplementary conditions. It can easily be performed using basic statistical and mathematical software.
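The five evaluation steps listed in this Conclusion can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the MATLAB script of the Supporting Information: the normalization is assumed to be a min-max scaling of each dimension to [0, 1], the single-linkage “heights of knots” are obtained as the edge lengths of the Euclidean minimum spanning tree (to which single-linkage merge heights are equal), and the two systems and all function names are hypothetical.

```python
import math

def normalize(points):
    """Step 2 (assumed form): min-max scale every dimension to [0, 1].
    Assumes each dimension has a nonzero retention range."""
    columns = list(zip(*points))
    lows = [min(c) for c in columns]
    spans = [max(c) - min(c) for c in columns]
    return [tuple((v - lo) / sp for v, lo, sp in zip(p, lows, spans))
            for p in points]

def single_linkage_heights(points):
    """Step 3: merge heights of single-linkage clustering with Euclidean
    distances, computed here as minimum-spanning-tree edges (Prim)."""
    n = len(points)
    in_tree = [False] * n
    in_tree[0] = True
    best = [math.dist(points[0], p) for p in points]
    heights = []
    for _ in range(n - 1):
        j = min((k for k in range(n) if not in_tree[k]), key=lambda k: best[k])
        heights.append(best[j])
        in_tree[j] = True
        for k in range(n):
            if not in_tree[k]:
                best[k] = min(best[k], math.dist(points[j], points[k]))
    return sorted(heights)

def descriptors(retention_times):
    """Step 4: arithmetic (A) and harmonic (H) means of the distances.
    Coincident apexes (zero distance) are not handled in this sketch."""
    d = single_linkage_heights(normalize(retention_times))
    return {"A": sum(d) / len(d), "H": len(d) / sum(1.0 / x for x in d)}

def rank_systems(systems):
    """Step 5: order systems by decreasing harmonic mean."""
    scores = {name: descriptors(pts)["H"] for name, pts in systems.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Step 1 stand-in: hypothetical retention time coordinates (1tR, 2tR).
systems = {
    "diagonal": [(t, t) for t in range(1, 10)],             # correlated
    "grid": [(x, y) for x in (1, 5, 9) for y in (1, 5, 9)],  # well spread
}
print(rank_systems(systems))
```

Because `math.dist` accepts points of any dimensionality, the same functions apply unchanged to coordinates with more than two dimensions.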

dx.doi.org/10.1021/ac4012705 | Anal. Chem. 2013, 85, 9449−9458



ASSOCIATED CONTENT

Supporting Information

Additional information as noted in text. This material is available free of charge via the Internet at http://pubs.acs.org.



AUTHOR INFORMATION

Corresponding Author

*Phone: +33 0140205651. Fax: +33 0147033246. E-mail: [email protected].

Present Address

⊥Centre de Recherche et de Restauration des Musées de France (C2RMF), 14 quai François Mitterrand, 75001 Paris, France.

Notes

The authors declare no competing financial interest.


ACKNOWLEDGMENTS

We greatly acknowledge Michel Martin (ESPCI, Paris, France) for critical and extremely useful comments on the manuscript.


