
Oct 24, 2018 - A Hierarchical Multivariate Curve Resolution Methodology To Identify and Map Compounds in Spectral Images. Clémence Fauteux-Lefebvre...
Article Cite This: Anal. Chem. XXXX, XXX, XXX−XXX

pubs.acs.org/ac

A Hierarchical Multivariate Curve Resolution Methodology To Identify and Map Compounds in Spectral Images

Clémence Fauteux-Lefebvre,† Francis Lavoie,‡ and Ryan Gosselin*,‡

†Department of Chemical and Biological Engineering, University of Ottawa, Ottawa, Ontario K1N 6N5, Canada
‡Department of Chemical and Biotechnological Engineering, Université de Sherbrooke, Sherbrooke, Québec J1K 2R1, Canada





ABSTRACT: The use of spectroscopic methods, such as near-infrared or Raman, for quality control applications combined with the constant search for finer details leads to the acquisition of increasingly complex data sets. This should not prevent the user from characterizing a sample by identifying and mapping its chemical compounds. Multivariate data analysis methods make it possible to obtain qualitative and quantitative information from such data sets. However, samples containing a large (and/or unknown) number of species, segregated trace compounds (present in few pixels), low signal-to-noise ratios (SNR), and often insufficient spatial resolutions still represent significant hurdles for the analyst. In this work, we present a hierarchical multivariate curve resolution (MCR) method intended to compute physically meaningful spectra and to map them onto the sample. This method is specifically tailored to samples with low SNR in the presence of pure and mixed pixels. The methodology first extracts large numbers of mathematical components through a hierarchical use of MCR algorithms and uses them to identify the purest spectral components as well as their blends. The hierarchical methodology is presented and compared to nonhierarchical approaches via two MCR methods: alternating least-squares (MCR-ALS) and log-likelihood maximization (MCR-LLM). Results show that the hierarchical forms outperform the nonhierarchical forms for both the quality of the spectral signatures and the chemical map produced, leading to a more intuitive analysis of samples for the user.

Received: October 9, 2018 Accepted: October 18, 2018

© XXXX American Chemical Society

DOI: 10.1021/acs.analchem.8b04626 Anal. Chem. XXXX, XXX, XXX−XXX

The ability to draw chemical maps of segregated samples using spectral images is of growing interest for many research and industrial applications. Applications include near-infrared (NIR) and Raman analysis in the food and pharmaceutical industries,1−4 imaging mass spectrometry (MS) for biological samples,5 and magnetic resonance imaging (MRI) for medical purposes.6 The use of multispectral, instead of single-wavelength, acquisitions typically leads to the creation of more robust models. These acquired spectral data sets have, however, a high level of complexity dependent both on probe and sample characteristics. This complexity influences the information content of a pixel. Let us consider pure pixels as those containing only information from a single phase, as may occur when analyzing a single-compound sample or a highly segregated multicompound sample. For the present discussion, a phase can represent either a single chemical species or a blend species if (1) they form a miscible blend or (2) they form an immiscible blend with domain sizes inferior to the spatial resolution of the probe. In contrast, mixed pixels contain information from two or more species present as pure pixels elsewhere in the spectral image. Many factors can lead to the presence of mixed pixels. The most common is the presence of boundaries between homogeneous domains of pure compounds. This can occur at the surface of a sample, at the edge of two large domains, or inside the sample if the depth of penetration of the analysis is sufficient to acquire underlying layers of materials.1,7,8

The analysis of complex samples can present data analysis challenges. When the identity and the number of compounds are unknown, or when the data set is characterized by a low SNR, identifying the present species and calculating their chemical map become nontrivial tasks.9−11 Multivariate methods such as principal component analysis (PCA) and multivariate curve resolution (MCR)12,13 have been used to map and identify compounds in spectral images. MCR is of particular interest as it extracts physically meaningful spectra without requiring any internal or external references.12 By including certain user-defined constraints into the model (e.g., spectral non-negativity, concentration closure) and the number of components, MCR can decompose data similarly to PCA (further explained in MCR Algorithms), but it produces results which can be more easily interpreted by an analyst. The most common MCR method is the alternating least-squares (MCR-ALS) algorithm.7,12,14 The spectra and related contributions are calculated sequentially in an iterative algorithm via least-squares regressions.

While MCR may be well-suited to analyze spectral images, there are still challenges to extract representative spectra and to obtain coherent chemical maps for more complex samples.15−17 MCR results are strongly influenced by the number of components, especially in the presence of mixed pixels, and determining the number of components is not trivial.11,18 Li et al. presented a promising sequential method to estimate the appropriate number of components K. The method was applied to small data sets characterized by a high signal-to-noise ratio and did not seek to quantify composition.11 Moreover, each component may represent a blend of chemical compounds always found together. They are not necessarily related to a single chemical compound. In addition, the application of user-defined constraints appears to cause numerical stability issues when working with low-SNR data, which can render the extracted spectra physically meaningless.17 For many types of spectroscopy, including NIR, Raman, energy-dispersive X-ray spectroscopy (EDXS), and X-ray photoelectron spectroscopy (XPS), physically meaningful spectra have the following characteristics: positive numbers of counts, coherent peak intensity ratios, and continuous local spectral derivatives. In previous work, multivariate curve resolution by log-likelihood maximization (MCR-LLM) was shown to outperform MCR-ALS for low-count and low-SNR data.17 While robust for many types of spectral imaging data sets, including XPS, electron energy loss spectroscopy (EELS), EDX spectroscopy, and Raman spectroscopy, these results hold only when a limited number of compounds are present in an image (1000, depending on image characteristics) and to identify the purest spectra.

HIERARCHICAL MULTIVARIATE CURVE RESOLUTION

MCR Algorithms. MCR is a latent variable method used to find pure spectra and the relative contribution maps from spectral data sets. This decomposition is expressed as

$$X = CS^{T} + E \qquad (2.1)$$

where X (M × N) is the raw data matrix and S (K × N) represents the spectra of the components; C (M × K) contains their relative contributions, and K is the number of components; E (M × N) contains the residuals. Because MCR is unsupervised, the final spectra (S) may not necessarily represent true chemical compounds, but the “purest” form present in the data set. For example, if two chemicals are always present together, but in varying proportions, a two-component model would compute a spectrum with a large proportion of compound 1 and another with a large proportion of compound 2. However, neither spectrum in S would represent a pure chemical species.

Two variants of MCR are used in this work. The first is MCR-ALS, in which contributions (C) are calculated using multilinear regressions.19 The second is MCR-LLM,17 in which noise characteristics of the signal are considered when calculating C. In this second method, a constrained likelihood maximization is calculated from each acquired observation x, and the resulting spectrum is calculated from optimized contribution values $x^{*} = cS^{T}$, where c is a single row of C and represents the component contributions of a single pixel. In both algorithms, all spectra were individually normalized as

$$X^{*}_{m,n} = \frac{X_{m,n}}{\sum_{l=1}^{N} X_{m,l}} \qquad (2.2)$$

Although it is common to identify component clusters in MCR,7 the presented method seeks to extract a much greater number of spectra to resolve complex samples containing mixed and pure pixels. The following discussion details the algorithm. At the initial hierarchical level (Figure 1), two components are extracted from all available pixels (all rows of X0*) using MCR either with ALS or LLM variants (as described in MCR Algorithms). The X0* matrix is then divided into two new matrices (X1* and X2*) for level 1. Each pixel from X0* is placed in either X1* or X2* according to the highest contribution (C0) value. Based on closure constraints, C values fall in the range [0−1]. Therefore, a given pixel with a contribution c = [0.6 0.4], which is 60% of component 1 and 40% of component 2, would be attributed to component 1. At the next hierarchical level, both matrices (X1* and X2*) are analyzed individually with a two-component MCR, followed by pixel segmentation as performed in the first hierarchical level, yielding four matrices for the next level. The analysis proceeds accordingly until one of two user-defined stopping criteria is reached: (1) the maximum allowed number of hierarchical levels or (2) the minimum size of a new X*i,j,···,n matrix. If the minimum number of pixels is reached for one group of pixels (X*i,j,···,n) at level n, the hierarchical process ends for this group only. The final output of this hierarchical algorithm is the matrix S* containing the output spectra of each final step, i.e., when one of the two stopping criteria is reached (Figure 1), with the associated calculated contributions in C*.
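The decomposition, the row normalization, and the hierarchical splitting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the in-house implementation: the function names (`normalize_rows`, `mcr_als_two`, `hierarchical_mcr`) are hypothetical, the two-component solver is a plain ALS with clipping for non-negativity and rescaling for closure, the initial spectra are two randomly chosen pixels, and a branch is assumed to stop as soon as either child group would fall below the minimum size. With S stored as K × N, as in the text, the model product is written `C @ S`.

```python
import numpy as np

def normalize_rows(X):
    """Divide each spectrum (row) by its summed intensity."""
    return X / X.sum(axis=1, keepdims=True)

def mcr_als_two(X, n_iter=30, seed=0):
    """Two-component ALS with non-negativity and closure (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Assumption: initialise the two spectra from two randomly chosen pixels.
    S = X[rng.choice(X.shape[0], size=2, replace=False)].copy()
    for _ in range(n_iter):
        # Least-squares contributions for fixed spectra, clipped at zero.
        C = np.clip(X @ np.linalg.pinv(S), 0.0, None)
        # Closure: rescale each row of C so that it sums to 1.
        C /= np.maximum(C.sum(axis=1, keepdims=True), 1e-12)
        # Least-squares spectra for fixed contributions, clipped at zero.
        S = np.clip(np.linalg.pinv(C) @ X, 0.0, None)
    return C, S

def hierarchical_mcr(X, max_levels=10, min_pixels=100):
    """Stack the spectra of every terminal group into one matrix (S*)."""
    final_spectra = []
    groups = [np.arange(X.shape[0])]            # level 0: all pixels
    for level in range(max_levels):
        next_groups = []
        for idx in groups:
            C, S = mcr_als_two(X[idx])
            labels = C.argmax(axis=1)           # pixel goes to its largest contribution
            children = [idx[labels == k] for k in (0, 1)]
            is_last_level = level == max_levels - 1
            too_small = min(len(c) for c in children) < min_pixels
            if is_last_level or too_small:
                final_spectra.append(S)         # this branch stops here
            else:
                next_groups.extend(children)
        groups = next_groups
    return np.vstack(final_spectra)
```

Calling `hierarchical_mcr(normalize_rows(X), max_levels=10, min_pixels=100)` on an M × N matrix of raw spectra returns a matrix with N columns and two rows per terminal group. A likelihood-based variant would replace `mcr_als_two` while leaving the splitting logic unchanged.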


Figure 1. Hierarchical MCR method: S is for spectra, C for calculated contributions, and X* for pretreated data used to compute C and S (X0* is for initial pretreated data set). The subscript indicates the hierarchy between groups of pixels. The subscripts (n) indicate the decomposition level.

The user-defined stopping criteria depend on the raw data attributes [number of pixels, signal-to-noise ratio (SNR), number and concentration of species], computation time, and the purpose of the analysis, because they influence the final results.

Application of the Hierarchical MCR. The hierarchical MCR was applied using in-house Matlab codes. Figure 2 illustrates the overall methodology to obtain the final contribution and spectral results (C and S) from raw X data through hierarchical MCR. This process is detailed below.

Figure 2. Schematic of the steps required to apply hierarchical MCR, from raw data to the identification of final component spectra and the contributions. (a) Data pretreatment with spectral intensity normalization, (b) hierarchical application of MCR-LLM or MCR-ALS, (c) classification of extracted spectra to obtain K final components, and (d) final contribution calculation using (i) regression (CReg) or (ii) likelihood maximization (CLM).

After normalization (Figure 2a), the hierarchical MCR can be applied using both LLM and ALS variants denoted as HLLM and HALS, respectively (Figure 2b). A hierarchical analysis yields several spectra in S* (typically many more than the number of species in the sample). Contribution values C* linked to all these spectra cannot be directly used because they are associated with each of the hierarchical levels and not to the entire data set. The spectra represent the full range of spectra present in the sample. If only two chemical species are present in a granular sample, spectra will be divided into two main groups, with some spectra between these two associated with the mixed pixels at grain interfaces. There will be some variability within each group due to acquisition noise, but clusters should be clearly discernible. For a multicomponent sample with many mixed pixels, there should be a continuum between groups of spectra, with many intermediate spectra. In addition, low-concentration species with clearly distinguishable spectral features, or simply pixels subject to acquisition errors, may stand out as sample outliers.

To obtain a final contribution map and associated spectra, it is necessary to reduce the number of spectra. This is achieved by identifying the purest components. The first step of this identification is the classification of extracted spectra from HALS or HLLM (S*). Many authors propose the use of principal component analysis (PCA) to retrieve pure observations. Their methods assumed that (K − 1) PCA factors are required to retrieve K pure observations from a multivariate data set.21,22 From this, we propose to use the same method to select the purest spectra from all spectra resulting from HLLM or HALS methodologies. After centering S* (thus removing its mean $\bar{s}$ [1 × N]), a PCA is computed on the spectra. By plotting PCA scores (T), K clusters should be apparent (Figure 2c). These clusters represent similar spectra. Selected coordinates from these clusters represent score values related to the purest spectra. Then, multiplying these K score coordinates (contained in To [K × (K − 1)]) with loading vectors (P [(K − 1) × N]) leads to the reconstruction of the K purest spectra (Sfinal):

$$S_{final}\,[K \times N] = T_{o} P + \bar{s} \qquad (2.3)$$
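As a rough illustration of eq 2.3, the sketch below centers a toy S*, computes a PCA with K − 1 factors through a singular value decomposition, and rebuilds spectra from chosen score coordinates. The names `pca` and `reconstruct`, the toy spectra, and the choice of the first three rows as the "selected" scores are all hypothetical. In the methodology the score coordinates are picked by the analyst on the score plot.

```python
import numpy as np

def pca(S_star, n_factors):
    """PCA of the mean-centred spectra via SVD: returns scores, loadings, mean."""
    s_bar = S_star.mean(axis=0)
    U, sing, Vt = np.linalg.svd(S_star - s_bar, full_matrices=False)
    T = U[:, :n_factors] * sing[:n_factors]    # scores, one row per spectrum
    P = Vt[:n_factors]                         # loadings, (K - 1) x N
    return T, P, s_bar

def reconstruct(T_o, P, s_bar):
    """Eq 2.3: S_final = T_o P + s_bar, the mean being added to every row."""
    return T_o @ P + s_bar

# Toy S*: three hypothetical pure spectra followed by 20 mixtures of them.
pure = np.array([[4.0, 1.0, 0.0, 0.0, 1.0],
                 [0.0, 3.0, 2.0, 0.0, 1.0],
                 [1.0, 0.0, 0.0, 5.0, 0.0]])
rng = np.random.default_rng(0)
weights = rng.dirichlet(np.ones(3), size=20)   # mixing proportions, rows sum to 1
S_star = np.vstack([pure, weights @ pure])

K = 3
T, P, s_bar = pca(S_star, n_factors=K - 1)
T_o = T[[0, 1, 2]]                             # stand-in for the analyst's selection
S_final = reconstruct(T_o, P, s_bar)
```

Because every row of this noise-free toy S* is a mixture of the same three spectra, K − 1 = 2 factors describe it exactly and `S_final` reproduces the three pure spectra. With noisy spectra the reconstruction is only approximate.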

The last step consists of calculating the final contributions for each observation in X based on the Sfinal matrix (Figure 2d). Two options are discussed. In the first option (Figure 2d-i), a multilinear regression between the Sfinal matrix (obtained after PCA) and the initial X0* matrix is used. The final contributions are denoted CReg for this option. In the second option (Figure 2d-ii), the log-likelihood maximization is used to determine the final contributions from Sfinal and X0* and is designated as CLM. Considering that the final step can be done either by regression or by likelihood maximization, both for HLLM and HALS, we obtain four different application cases for the hierarchical MCR: HLLM-CLM, HLLM-CReg, HALS-CLM, and HALS-CReg.
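The regression option (CReg) can be sketched as an ordinary least-squares fit of every pixel spectrum onto the rows of Sfinal. This is a minimal sketch assuming an unconstrained multilinear regression. The function name `contributions_by_regression` and the toy matrices are hypothetical, and any non-negativity or closure handling used in practice is not shown.

```python
import numpy as np

def contributions_by_regression(X, S_final):
    """Least-squares C such that X is approximated by C @ S_final (S_final is K x N)."""
    # lstsq solves S_final.T @ C.T ~ X.T for all pixels in one call.
    C_T, *_ = np.linalg.lstsq(S_final.T, X.T, rcond=None)
    return C_T.T

# Toy check with two hypothetical spectra and known contributions.
S_final = np.array([[1.0, 2.0, 0.0, 1.0],
                    [0.0, 1.0, 3.0, 1.0]])
C_true = np.array([[1.0, 0.0],
                   [0.6, 0.4],
                   [0.2, 0.8]])
X = C_true @ S_final
C_reg = contributions_by_regression(X, S_final)
```

On noise-free data the known contributions are recovered. The likelihood-maximization option (CLM) is not sketched here because it depends on the noise model of MCR-LLM, which is described in ref 17 rather than in this section.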



DATA SET FOR ALGORITHM APPLICATION AND VALIDATION

Energy-Dispersive X-ray Spectroscopy Data Set. The methodology was tested on simulated energy-dispersive X-ray spectroscopy (EDXS) maps using MC Xray, a Monte Carlo simulation-based program.23 Three chemical species were used for the simulations: cobalt, nickel, and iron. These chemical species were chosen because of their overlapping spectral features, complicating the spectral unmixing. Figure 3a illustrates the map composed of three regions (1, Fe−Co; 2, Co−Ni; 3, Fe−Ni−Co). Each region is subdivided into three subregions for a total of nine subregions. In each, a chemical species is present in relatively low concentrations (1, 2, or 5% w/w): region 1, Co; region 2, Ni; region 3, Co. The sample was simulated for eight different acquisition dwell times (0.05, 0.1, 0.5, 1, 5, 10, 50, and 100 s), creating eight maps with different SNR values. The resulting images are 128 × 128 pixels.2 Simulations included typical sources of noise for EDXS: electron gun shot noise, noise on emitted X-rays, and detection noise. The resulting spectra at 100 s are presented in Figure 3b, while typical spectra taken from subregion 3.2 at different dwell times (0.1, 1, and 100 s) are presented in Figure 3c−e.

Performance Indexes to Evaluate Algorithms. We applied three performance criteria to compare Sfinal and Cfinal results obtained from the various data sets and methodologies. The first performance index (spectral performance index, IS) quantifies the quality of the final component spectra obtained (Sfinal) compared to reference spectra. The second performance index (contribution performance index, IC) evaluates the calculated contributions (i.e., how well the final contributions, CReg or CLM, correspond to the expected concentration values). These two performance indexes were previously presented,17 and their application to the present algorithm and data set is detailed in the Supporting Information.
A third performance index is proposed in order to analyze samples known to contain uniform regions, such as may be found in microelectronic samples. This index (regional performance index, IR) evaluates the extent to which uniform regions of an image, each with different compositions, can be distinguished. Figure 4 seeks to illustrate this index, which is based on ANOVA computations. Contrary to IS and IC, this performance index does not rely on reference information. However, the analyst must specify which spatial areas to compare (yellow regions in Figure 4). These MCR methodologies convert a spectral image into K composition maps. For a given component k, contribution values of all the pixels present in a selected area (yellow regions in Figure 4) are considered to be replicates of each other. We now seek to determine if pixels in expected regions G1, G2, ..., GQ differ from one another via ANOVA. An ANOVA can therefore be carried out for each of the K components. Selected regions are not necessarily distinguishable in all K components. It may be possible that regions are statistically distinguishable in only one component, particularly in the case of highly heterogeneous samples where pixel contributions are attributed to only one component in one of the selected areas (one yellow region in Figure 4). The IR is then calculated by summing the number of regions that can be differentiated by at least one of the K components, with p-values obtained using the honest significant difference (HSD) test (p < 0.05).

Figure 3. (a) Map of the EDXS data set divided into nine subregions with associated chemical species concentrations (Fe/Co/Ni), with three regions (1, Fe−Co; 2, Co−Ni; 3, Fe−Ni−Co). Interfaces between groups are shown in gray. (b) Average spectra of six subregions (extreme concentrations of each region), from 100 s count data set. (c) Single spectrum of Fe−Ni−2Co (subregion 3.2) at 100 s. (d) Single spectrum of Fe−Ni−2Co at 1 s. (e) Single spectrum of Fe−Ni−2Co at 100 ms.

Figure 4. IR index calculation. The analyst first spatially delimits groups of pixels in different expected subregions (yellow area). Similar contribution values included in the different groups are quantified with multiple ANOVA calculations. A p-value is calculated for each of the K components. IR represents the number of statistically differentiated regions.

DATA ANALYSIS USING HIERARCHICAL ALGORITHMS

MCR-LLM and MCR-ALS for the Highest SNR Data Set. MCR-LLM and MCR-ALS (without the hierarchical methodology) were tested on the data with the highest dwell time (100 s). The aim was to identify the nine subregions and to obtain physically meaningful spectra for each. Because it is an unsupervised method, the objective was not to extract spectra representing pure chemical species of Ni, Co, and Fe (as no pixel containing only one of these species is present in the sample) but to extract spectra that represent the blends of various concentrations present and to differentiate the nine subregions. Neither variant of MCR was able to successfully resolve the sample into nine expected subregions (Figure 3a). Figure 5 shows the resulting map from MCR-LLM (a) and MCR-ALS (c). Only 5/9 and 3/9 regions were identified using IR, respectively. These results match the visual inspection of Figure 5. In both cases, the best results were obtained with six components, because the intermediate regions (1.2, 2.2, and 3.2 in Figure 3a) are blends of extreme concentrations. For example, subregion 1.2, which is 98% Fe and 2% Co, is a combination of the 99:1 and the 95:5 compositions. When increasing the number of components from six to nine, to identify the nine subregions, interfaces and random noise patterns were identified rather than nine subregions. Therefore, another method was required to successfully analyze this type of sample.

MCR-LLM spectral results (Figure 5b) are consistent with expected EDXS spectra (Figure 3b). This is supported by the spectral performance index IS = 0.985, close to 1, indicating a high correlation between the extracted spectra and the reference (the spectra from each of the extreme concentration subregions). MCR-LLM yielded a clear map with the three regions and well-defined interfaces. However, differences related to the relative concentrations of species in the extracted spectra within each group are not sufficient to clearly separate the three subregions. In MCR-ALS results (Figure 5c,d), spectra are not physically meaningful as they present small downward peaks, meaning that the beginning of the peak has a negative slope (e.g., component 6 (red) in Figure 5d at 6.4 keV) and IS = 0.763. Moreover, such spectra could not be identified easily by an analyst using reference spectra. For example, MCR-ALS component 4 (green curve in Figure 5d) does not correspond to any of the expected spectral signatures in Figure 3b. The three regions found on the resulting map therefore do not represent the three expected regions (from Figure 3a), and boundary effects at interfaces are not found.

Figure 5. Contribution maps and spectra from 100 s EDXS data set with six components and 10 iterations using (a and b) MCR-LLM or (c and d) MCR-ALS. Downward peaks are visible in the inset in panel d. The associated species are Comp. 1 (orange) to Fe−5Co; Comp. 2 (black) to Fe−1Co; Comp. 3 (purple) to Fe−Ni−5Co; Comp. 4 (green) to Fe−Ni−1Co; Comp. 5 (blue) to Co−1Ni; and Comp. 6 (red) to Co−5Ni.

Hierarchical MCR-LLM (HLLM) and MCR-ALS (HALS) for the High SNR Data Set. The high SNR data set (dwell time of 100 s) was analyzed using the HLLM and HALS (Figures 1 and 2). The stopping criteria (maximum number of levels and minimum number of pixels) were fixed for all analyses to ensure consistency; 10 levels with a minimum of 100 pixels in Xi,j were used throughout. Final spectra (Sfinal) were extracted from a PCA performed on all spectra (S*) calculated from HLLM or HALS.
Figure 6 shows the PCA scores along with the user-selected score values for each component for HLLM (Figure 6a), as well as the respective final component spectra (Figure 6b). Indeed, three clusters can be easily identified and then can be correlated to the presence of three regions (Figure 3a). In each cluster, the small variation in score values is attributed to the change in concentrations of each chemical species (the three subregions within each region). Moreover, observations nearest to other clusters correspond to mixed pixels, which are associated with interfaces between regions in this data set (gray lines in Figure 3a). Therefore, final spectra (Sfinal) were selected for the three regions as indicated in Figure 6a, using two centroids per cluster. The final spectra resemble average spectra from corresponding concentration regions of the original data set (shown in Figure 3b) and can be easily attributed to each extreme subregion (1% and 5% of the low-concentration species). The score plots and spectra obtained using HALS were similar to those of HLLM. Indeed, the calculated spectral performance indexes from HLLM and HALS for the 100 s EDXS data set were IS = 0.993 and IS = 0.992, respectively. These results illustrate a high correlation between the final spectra and the references and similar results using both methods.

Figure 6. T-plots of PCA results (a) and corresponding chosen final spectra (b) from HLLM, for 100 s EDXS data set, with 10 levels maximum and 100 pixels minimum. The associated species are Comp. 1 (orange) to Fe−5Co; Comp. 2 (black) to Fe−1Co; Comp. 3 (purple) to Fe−Ni−5Co; Comp. 4 (green) to Fe−Ni−1Co; Comp. 5 (blue) to Co−1Ni; Comp. 6 (red) to Co−5Ni.

Final spectra Sfinal were used for the final contribution calculation, using regression (CReg) or likelihood maximization (CLM), as explained in Application of the Hierarchical MCR. In the resulting map for HLLM followed by likelihood maximization (Figure 7), nine clear subregions corresponding to the nine expected ones (see Figure 3a for comparison) can be visually distinguished and were identified via the regional performance index. The nine subregions were distinguished only when likelihood maximization (CLM) was used as a final step (HLLM-CLM or HALS-CLM), while only three regions were identified when the final step was the regression (HLLM-CReg or HALS-CReg). The maps for the four cases are presented in the Supporting Information, Figure S1.

Figure 7. Final contribution map along with contribution calculation per region using HLLM-CReg, for 100 s EDXS data set, with 10 levels maximum and 100 pixels minimum. The associated species are Comp. 1 (orange) to Fe−5Co; Comp. 2 (black) to Fe−1Co; Comp. 3 (purple) to Fe−Ni−5Co; Comp. 4 (green) to Fe−Ni−1Co; Comp. 5 (blue) to Co−1Ni; Comp. 6 (red) to Co−5Ni.

When using the likelihood maximization to calculate the final contributions, there is practically no mixing between regions. For example, in Figure 7 the two components attributed to region 1 (Fe−1Co and Fe−5Co) are found only in this region, and calculated contributions are 0.0 in regions 2 and 3. Likewise, the four other components are found only in their respective regions. The adequate component attribution to the corresponding regions and subregions was further assessed using the contribution performance index, obtaining 0.939 for HLLM-CLM and 0.943 for HALS-CLM.

When using the regression as a final step (CReg), mixing of components between the groups is present. For example, Fe−1Co and Fe−5Co associated components are not found only in the first region, where they belong. They have significant calculated contributions in regions 2 and 3, where they should not. The contribution performance indexes were 0.773 for HLLM-CReg and 0.796 for HALS-CReg, lower than those obtained using likelihood maximization at the final step. Therefore, even if the HALS makes the extraction of spectra possible with the same quality as HLLM (for high counts data), the final step (contribution calculation) method plays a major role.

Applicability and Influence of the Dwell Time on Results from HLLM and HALS. The applicability of hierarchical MCR models to low SNR data sets was also tested. For this, EDXS data sets with lower dwell times were analyzed using the same hierarchical methodologies and parameters discussed above. It was shown in the previous section, from IR and IC values assessment, that there are significant differences between all the subregions for both HLLM and HALS when using the likelihood maximization at the final step with the 100 s dwell time data set. Consequently, only this method was used for lower dwell time data sets. Figure 8 illustrates the variation in the three performance indexes for the various dwell times after application of HLLM and HALS. For comparison, the performance indexes for MCR-LLM and MCR-ALS are also shown. The spectral performance indexes (Figure 8a) decreased with decreasing dwell time (and consequently the SNR). The gradient is more pronounced in the HALS curve than in the HLLM curve. However, in both cases, the performance is higher than for MCR-LLM and MCR-ALS. We can also observe the difference in performance due to the final step choice (regression or likelihood maximization) by looking at regional and contribution performance indexes (Figure 8b). Lowering the total number of counts decreases the number of differentiable subregions in the image, as can be observed with the contribution and regional indexes (Figure 8b).
The nine subregions can be differentiated with a minimum dwell time of 500 ms with the use of HLLM-CLM (Figure 8b). This minimum is found at an order of magnitude higher (5 s) with the use of HALS-CLM. The calculated contribution index similarly decreases, indicating that calculated contributions differ from exact values. The performance of MCR-LLM is comparable to HALS-CLM for IC and IR. However, because the IS was much lower for MCR-LLM, this means that the subregions are differentiated at the same level but that the extracted spectra do not represent the composition as well as they do for HALS-CLM.

Figure 8. (a) Spectral performance indexes (IS) as a function of dwell time for HLLM and HALS. IS is calculated using extracted spectra, before contribution calculation. IS for MCR-ALS and MCR-LLM are shown for comparison purposes. (b) Contribution performance indexes (IC) as a function of dwell time calculated from HLLM-CLM, HALS-CLM, MCR-LLM, and number of regions differentiated, evaluated using IR for the different components and alpha = 0.05. Indexes for MCR-ALS, HLLM-CReg, and HALS-CReg for the 100 s data set are also shown.

There are differences between contribution (IC) and regional performance (IR) index results, which is expected because they do not assess the mapping results in the same way. It is then possible that the nine subregions are well-distinguished because the contribution averages are different (IR will be 9 or close to 9), but that at the same time they are not similar to expected contribution values (IC is low), and vice versa. This result shows that the hierarchical methodology combined with the final likelihood maximization step gives the best results. Differences between HLLM-CLM and HALS-CLM appear at low dwell times. The corresponding final spectra are shown in Figure S2.
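For readers who want to experiment with the regional performance index IR defined in Performance Indexes to Evaluate Algorithms, the sketch below shows one possible reading of it using SciPy. Several points are assumptions rather than details given in the text: the function name `regional_index` is hypothetical, a one-way ANOVA is used as a gate before Tukey's HSD for each component, and a region is counted only if, against every other region, at least one component gives a pairwise HSD p-value below alpha. `scipy.stats.tukey_hsd` requires SciPy 1.8 or later.

```python
import numpy as np
from scipy.stats import f_oneway, tukey_hsd

def regional_index(C, region_pixels, alpha=0.05):
    """Count regions separated from all others by at least one component.

    C is the (M x K) contribution matrix; region_pixels is a list of index
    arrays, one per analyst-selected area.
    """
    Q, K = len(region_pixels), C.shape[1]
    separated = np.zeros((Q, Q), dtype=bool)
    for k in range(K):
        groups = [C[idx, k] for idx in region_pixels]
        # Assumption: skip components whose overall ANOVA is not significant.
        if not f_oneway(*groups).pvalue < alpha:
            continue
        # Pairwise p-values from Tukey's honest significant difference test.
        separated |= tukey_hsd(*groups).pvalue < alpha
    np.fill_diagonal(separated, True)      # a region is not compared with itself
    return int(separated.all(axis=1).sum())
```

With three selected areas whose mean contributions differ clearly in one component, the function returns 3. If two of the areas have identical contributions in every component, only the remaining area is counted.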



CONCLUSION

In this work, we propose a method to analyze multicomponent spectral data sets characterized by low signal-to-noise ratios. This new method is based on the well-known MCR-ALS algorithm as well as on the more recent MCR-LLM. Our proposed method is in fact an extension of these two algorithms in which components are extracted hierarchically, leading to the hierarchical MCR-ALS and MCR-LLM algorithms (termed HALS and HLLM).

We compared the original MCR-ALS and MCR-LLM to their hierarchical extensions using simulated EDX data sets characterized by a range of SNRs. From these analyses, we showed that the hierarchical methodologies made it possible to retrieve the nine expected spatial regions, while this was not the case with regular MCR-LLM and MCR-ALS algorithms. For data sets simulated with high SNRs, performances obtained from HALS were similar to the ones obtained from HLLM. However, we show that HLLM is more advantageous than HALS for analyzing data sets characterized with relatively low SNRs. The hierarchical methodologies enable the analysis of unknown spectral image data sets without requiring known spectral references and can do so even when working with very low SNRs. However, HALS and HLLM require some analyst inputs, such as the parametrization of the hierarchical process and the selection of the final components.

ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.analchem.8b04626.

Description of the spectral performance index IS and contribution performance index IC and how they are applied to evaluate and compare the results of the hierarchical and nonhierarchical algorithms; mapping results of the four possible cases for the hierarchical MCR-LLM (HLLM) and MCR-ALS (HALS) for the high SNR data set; presentation and analysis of the resulting spectra from HLLM and HALS for data set at low dwell times (PDF)

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

ORCID

Clémence Fauteux-Lefebvre: 0000-0002-0572-250X

Notes

The authors declare no competing financial interest.

ACKNOWLEDGMENTS

This work was supported by the Mitacs Elevate program (IT06238), the FRQNT Industrial Innovation Scholarship (196489) as well as a matching contribution from Pfizer Canada. The authors are indebted to Raynald Gauvin, Hendrix Demers, and Philippe Kikongi for their valuable help and for providing the permission to use the simulated EDXS dataset.

REFERENCES

(1) Wei, M.; Geladi, P.; Xiong, S. Anal. Bioanal. Chem. 2017, 409, 2449−2460.
(2) Huang, H.; Yu, H.; Xu, H.; Ying, Y. J. Food Eng. 2008, 87, 303−313.
(3) Roggo, Y.; Chalus, P.; Maurer, L.; Lema-Martinez, C.; Edmond, A.; Jent, N. J. Pharm. Biomed. Anal. 2007, 44, 683−700.
(4) Katewongsa, P.; Terada, K.; Phaechamud, T. J. Pharm. Invest. 2017, 47, 249−262.
(5) Bodzon-Kulakowska, A.; Suder, P. Mass Spectrom. Rev. 2016, 35, 147−169.
(6) Kaya, I. E.; Pehlivanlı, A. Ç.; Sekizkardeş, E. G.; Ibrikci, T. Comput. Methods Programs Biomed. 2017, 140, 19−28.
(7) Felten, J.; Hall, H.; Jaumot, J.; Tauler, R.; de Juan, A.; Gorzsás, A. Nat. Protoc. 2015, 10, 217−240.
(8) Dobigeon, N.; Brun, N. Ultramicroscopy 2012, 120, 25−34.
(9) Boiret, M.; Gorretta, N.; Ginot, Y.-M.; Roger, J.-M. J. Pharm. Biomed. Anal. 2016, 120, 342−351.
(10) Lichtert, S.; Verbeeck, J. Ultramicroscopy 2013, 125, 35−42.
(11) Li, Q.; Tang, Y.; Yan, Z.; Zhang, P. Spectrochim. Acta, Part A 2017, 180, 154−160.
(12) Noothalapati, H.; Iwasaki, K.; Yamamoto, T. Anal. Sci. 2017, 33, 15−22.
(13) Malli, B.; Birlutiu, A.; Natschläger, T. Chemom. Intell. Lab. Syst. 2017, 161, 49−60.
(14) Farkas, A.; Vajna, B.; Sóti, P. L.; Nagy, Z. K.; Pataki, H.; Van der Gucht, F.; Marosi, G. J. Raman Spectrosc. 2015, 46, 566−576.
(15) Ruckebusch, C.; Blanchet, L. Anal. Chim. Acta 2013, 765, 28−36.
(16) Azzouz, T.; Tauler, R. Talanta 2008, 74, 1201−1210.
(17) Lavoie, F. B.; Braidy, N.; Gosselin, R. Chemom. Intell. Lab. Syst. 2016, 153, 40−50.
(18) Farkas, A.; Nagy, B.; Démuth, B.; Balogh, A.; Pataki, H.; Nagy, Z. K.; Marosi, G. J. Chemom. 2017, 31, e2861.
(19) De Juan, A.; Jaumot, J.; Tauler, R. Anal. Methods 2014, 6, 4964−4976.
(20) MacQueen, J. Some Methods for Classification and Analysis of Multivariate Observations. In Proc. Fifth Berkeley Symp. Math. Stat. Probab.; 1967; Vol. 1, pp 281−297.
(21) Veganzones, M. A.; Graña, M. Endmember Extraction Methods: A Short Review. In Knowledge-Based Intelligent Information and Engineering Systems; Lovrek, I., Howlett, R. J., Jain, L. C., Eds.; Springer: Berlin, 2008; pp 400−407.
(22) Nascimento, J. M. P.; Dias, J. M. B. IEEE Trans. Geosci. Remote Sens. 2005, 43 (4), 898−910.
(23) Gauvin, R.; Michaud, P. Microsc. Microanal. 2009, 15 (S2), 488−489.