
Handling different spatial resolutions in image fusion by Multivariate Curve Resolution-Alternating Least Squares for incomplete image multisets.
Sara Piqueras, Carmen Bedia, Claudia Beleites, Christoph Krafft, Jürgen Popp, Marcel Maeder, Romà Tauler, and Anna De Juan
Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.8b00630 • Publication Date (Web): 26 Apr 2018
Downloaded from http://pubs.acs.org on April 27, 2018


[Figure 4. Panels a) and b); spectral axes: Wavenumbers (cm-1) and Raman shifts (cm-1).]

[Figure 5. Panel a): CLR and CHR. Panel b): ST IR (axis: Wavenumbers, cm-1) and ST Raman (axis: Raman shifts, cm-1). Component labels: Protein (with epithelium), Connective tissue, Protein (hemoglobin).]


Handling different spatial resolutions in image fusion by Multivariate Curve Resolution-Alternating Least Squares for incomplete image multisets.

S. Piqueras1,† (*), C. Bedia2, C. Beleites3, C. Krafft4, J. Popp4, M. Maeder5, R. Tauler2, A. de Juan1 (*).

1. Chemometrics group. Universitat de Barcelona. Diagonal, 645. 08028 Barcelona. [email protected]; [email protected]
2. IDAEA-CSIC. Barcelona.
3. Chemometric Consulting and Chemometrix GmbH, Södeler Weg 19, 61200 Wölfersheim, Germany.
4. Leibniz Institute of Photonic Technologies, Jena (Germany).
5. Dept. Chemistry. The University of Newcastle. Newcastle (Australia).

ABSTRACT: Data fusion of different imaging techniques allows a comprehensive description of chemical and biological systems. Yet, joining images acquired with different spectroscopic platforms is complex because sample orientation and image spatial resolution differ among platforms. Whereas matching sample orientation is often solved by performing suitable affine transformations (rotation, translation and scaling) among images, the main difficulty in image fusion is preserving the spatial detail of the image with the highest spatial resolution during multitechnique image analysis. In this work, a special variant of the unmixing algorithm Multivariate Curve Resolution Alternating Least Squares (MCR-ALS) for incomplete multisets is proposed to cope with this kind of problem. This algorithm allows the simultaneous analysis of images collected with different spectroscopic platforms without losing spatial resolution and while ensuring spatial coherence among the images treated. The incomplete multiset structure concatenates the images of the two platforms at the lowest spatial resolution with the image acquired with the highest spatial resolution. As a result, the constituents of the sample analyzed are defined by a single set of distribution maps, common to all platforms used and with the highest spatial resolution, and by their related extended spectral signatures, covering the signals provided by each of the fused techniques. We demonstrate the potential of the new variant of MCR-ALS for multitechnique analysis on three case studies: i) a model example of MIR and Raman images of a pharmaceutical mixture, ii) FT-IR and Raman images of palatine tonsil tissue and iii) mass spectrometry and Raman images of bean tissue.

Keywords: image fusion, incomplete multiset analysis, multivariate curve resolution alternating least squares (MCR-ALS), Raman images, FT-IR images, Mass Spectrometry images.

INTRODUCTION

Hyperspectral imaging techniques based on Raman, infrared and fluorescence spectroscopy and on mass spectrometry are useful in many research areas, since they provide structural and spatial information on samples [1-3]. However, it is not always possible to describe a chemical or biological sample appropriately with a single imaging technique. Combining the information from different spectroscopic or spectrometric imaging platforms provides deeper insight into the nature of samples and improves the capacity to differentiate among sample constituents. Image fusion designates the combination of images collected from different platforms in order to provide a more accurate description of the sample. It was first used in remote sensing to integrate information acquired from sensors mounted on satellites, aircraft and ground platforms with different spatial and spectral resolutions [4]. There is now increasing interest in image fusion in the biomedical [5,6] and pharmaceutical [7] fields, since it is beneficial for diagnostic purposes and for solving production-related issues. Nevertheless, joining images from different imaging platforms is complex because sample orientation and spatial resolution differ among them.

Multivariate Curve Resolution Alternating Least Squares (MCR-ALS) has been shown to adapt particularly well to hyperspectral image analysis, because external spectral and spatial information about the image is easy to introduce and because it can work with single and multiset image arrangements [1,8,9,10]. For a single image, the output of MCR-ALS consists of the distribution maps and the spectral signatures of the image constituents. MCR-ALS has been successfully used to analyze image multisets formed by images collected with the same technique in very diverse and challenging contexts, such as superresolution [8] or the characterization of populations of biological images [1,11,12]. Recently, MCR-ALS has also been applied to fused images from different platforms with the same spatial resolution, after solving the problem of differences in sample spatial orientation [13]. In this scenario, MCR-ALS provides a single set of distribution maps, common to all platforms used, and the related extended spectral signatures of the different spectroscopic/spectrometric techniques coupled.

When the spatial resolution of the images differs, two approaches are frequently used to perform image fusion: a) binning the image with the highest spatial resolution (downsampling), which loses spatial detail, or b) interpolating the image with the lowest spatial resolution (oversampling), which may produce mathematically interpolated pixel spectra that are not necessarily faithful to the true image behavior. Few image fusion approaches have been proposed that preserve the spatial detail of the image with the highest spatial resolution and avoid oversampling by interpolation. Recently, multivariate regression models have been used to capture the relationship among images with different spatial resolution. This kind of strategy first establishes a regression model between the two images at the lowest spatial resolution, i.e., after downsampling the image with the highest resolution. Once the model is built, a high resolution version of the image with the worse instrumental resolution is predicted by using the measurements performed in the platform with the highest resolution and the calculated regression model [5,6,14]. When using this approach, it has to be taken into account that the prediction of the regression model will be satisfactory only for spectral channels that show a good correlation between the two techniques and may fail for uncorrelated information perceived uniquely by the technique with the lowest spatial resolution.

In this work, image fusion among images of different spatial resolution is addressed for the first time with multivariate resolution analysis. To do so, a new MCR-ALS variant is used that works with an incomplete multiset data arrangement. This arrangement combines the complete, spatially matched multiset built from the lowest spatial resolution versions of the images of the two techniques with the appended, unaltered image of the technique with the highest spatial resolution.
This kind of image multiset structure can be resolved (unmixed) with the recent version of the MCR-ALS algorithm for incomplete multisets [15,16]. As a result, spectral signatures joining information from the different spectroscopic techniques are obtained and a single set of distribution maps with the highest spatial resolution is retrieved. The case studies presented to test this approach include a model example, consisting of a simple real pharmaceutical mixture used to simulate different image fusion scenarios, and two real data sets of biological samples. Image fusion is tested on images coming from different combinations of platforms, i.e., MIR and Raman, and MSI and Raman. In this way, we demonstrate that the methodology described is broadly applicable and suitable for diverse combinations of imaging techniques and samples.

EXPERIMENTAL

Pharmaceutical images. The analyzed sample is a mixture of caffeine, acetylsalicylic acid (ASA) and starch. The sample was imaged by FT-IR using a Nicolet iN10 MX (Thermo) and by Raman using a HR800 LabRAM (Horiba Jobin Yvon, Kyoto, Japan). Both images were collected in point mapping mode with a pixel size of 15×15 μm2. Sample preparation and instrumental conditions are described in a related paper [13].

Tonsil tissue images. Tissue preparation and instrumental FT-IR parameters are described in reference [1]. Raman spectra were collected at 785 nm laser excitation using a Raman spectrometer from Kaiser Optical Systems. The Raman imaging system was coupled to a Leica DMLM microscope. A thermoelectrically cooled charge coupled device (CCD) camera was used for detection. The laser radiation of 100 mW intensity was focused onto the sample with a 100× objective. Raman images were recorded sequentially by moving the sample with a motorized stage at a step size of 100 µm. Spectra were recorded with a 7 s acquisition time in the spectral range from 200 to 1800 cm-1. The final FT-IR tonsil image size was (160×144) pixels with a pixel size of 44×44 μm2, while the Raman image size was (61×53) pixels with a pixel size of 100×100 μm2.

Bean tissue example

Sample preparation. The green bean (Phaseolus vulgaris) used for analysis was first humidified for 4 hours wrapped in wet cotton. Then, the bean was flash frozen in liquid nitrogen and stored at -80 °C. The tissue was mounted in a cutting block using Optimal Cutting Temperature compound (OCT, Tissue-Tek) on the base of the tissue, sliced at 15 µm thickness with a cryostat (Leica CM 3050S) and placed directly onto indium tin oxide-coated glass slides (Bruker Daltonik GmbH, Bremen, Germany).

Raman imaging. Raman spectra were acquired using a HR800 LabRAM (Horiba Jobin Yvon, Kyoto, Japan). A 532 nm laser was the light source and Raman spectra were recorded with a 5 s acquisition time in the spectral range from 100 to 1800 cm-1. Raman hyperspectral images were acquired by point mapping with a pixel size of 100×100 μm2.

Mass spectrometry imaging. Spectra were acquired using an Autoflex III MALDI-TOF/TOF instrument (Bruker Daltonik GmbH) equipped with a Smartbeam laser operated at a 200 Hz laser repetition rate at the “large focus” setting. Spectra were obtained in positive reflector ion mode in the 400 to 2000 m/z range. The laser raster was set to 200 μm along both the x- and y-axes. External calibration was performed using Bruker Peptide Calibration Standard (Bruker Daltonik GmbH).
Then, raw data were loaded into the SCiLS Lab software (version 2014b, SCiLS GmbH, Bremen, Germany) and exported into the standard imzML format for MSI data. Finally, the resulting imzML file was imported into the MATLAB® environment using the imzML converter tool. The final mass spectrometry image (MSI) has a pixel size of 200×200 μm2.

DATA ANALYSIS

Image spectra preprocessing

Pharmaceutical Raman and MIR image. Signal preprocessing consisted of removal of anomalous pixels and baseline correction by Asymmetric Least Squares (AsLS) [17,18], which removed the contributions of fluorescence in the Raman image and of Mie scattering in the MIR image. More details can be found in a previous work [13].

Tonsil tissue Raman and FT-IR images. Data pretreatment in both images included detection of anomalous pixels, baseline correction by AsLS and smoothing by the Savitzky–Golay method [19] (for more detail, see reference [1]).

Bean tissue Raman and MSI images. Data pretreatment of the Raman image included detection of anomalous pixels, baseline correction and smoothing, as done for the Raman image of tonsil tissue. Mass spectrometry image preprocessing required a specific step to compress the number of m/z channels by detecting spectral regions of interest (ROI), as proposed by


Bedia et al. [20]. To do so, relevant m/z channels were selected as those presenting at least five pixel spectra with a signal equal to or higher than a threshold of 0.1% of the maximum signal. All channels within +0.001 a.m.u. were pooled and considered the same m/z channel. Detection of anomalous pixels and baseline correction by AsLS were carried out after data compression. Afterwards, the MS spectra in the image were aligned by correlation optimized warping and normalized (2-norm). Finally, each selected m/z variable was scaled by dividing its values by the standard deviation plus 0.1% of the maximum signal of the normalized data set, in order to balance the importance of the relevant values of the variables of interest to the detriment of noisy values.

Image matching procedure

Building the incomplete image multiset described in the introduction requires concatenating the images of the two platforms at the lowest spatial resolution. To do so, the image with the highest spatial resolution is downsampled by binning so that the pixel sizes of the images to be fused become equal. Once this is done, spatial transformations, i.e., pixel translation in the x and y directions and rotation, are performed between the images to be matched. The image matching procedure is described in reference [13] and uses information from all available pixels in the pair of images to be matched. In all cases, the starting information for image matching is the binarized sample contours of the two images, since the pattern of presence (coded with 1) and absence (coded with 0) of sample is independent of the imaging technique used [13].

Hyperspectral image resolution by MCR-ALS for complete and incomplete multiset structures

The information in a hyperspectral image is structured as a data cube in which two dimensions designate the pixel coordinates (x and y) and the third designates the spectral channels.
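As a minimal sketch of this data layout, the Python/NumPy snippet below arranges a toy data cube as a matrix with one pixel spectrum per row and folds a per-pixel column back into a two-dimensional map. The function names `unfold` and `refold`, the row-major pixel ordering and the cube sizes are illustrative assumptions, not the routines used in this work.

```python
import numpy as np

def unfold(cube):
    """Reshape an (ny, nx, n_channels) cube into an (ny * nx, n_channels) matrix."""
    ny, nx, n_channels = cube.shape
    return cube.reshape(ny * nx, n_channels)

def refold(values, ny, nx):
    """Reshape one value per pixel (e.g. a column of C) into an (ny, nx) map."""
    return np.asarray(values).reshape(ny, nx)

# Toy cube with hypothetical sizes: 4 x 3 pixels, 5 spectral channels.
cube = np.arange(4 * 3 * 5, dtype=float).reshape(4, 3, 5)
D = unfold(cube)                      # shape (12, 5): one pixel spectrum per row
channel_map = refold(D[:, 0], 4, 3)   # image of the first spectral channel
```

Any pixel ordering works as long as the same ordering is used for unfolding and refolding.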
To be analyzed, the data cube needs to be unfolded into a data matrix that contains all pixel spectra in its rows. The goal of MCR-ALS is the decomposition of the original raw image matrix into meaningful distribution maps and pure spectra of the sample constituents [1,8-13,21,22], according to the bilinear model shown in eq. 2:

$$\mathbf{D} = \mathbf{C}\mathbf{S}^{T} + \mathbf{E} \qquad (2)$$

where D is the raw image matrix that contains all pixel spectra, C is the matrix of concentration profiles of the image constituents and S^T is the matrix of the related pure spectra. E accounts for the experimental error contained in the raw measurement. The distribution map of each image constituent is obtained by folding back the corresponding column of the C matrix to recover the original two-dimensional (2-D) configuration of the sample surface [1,10-12,21].

MCR-ALS performs an alternating least-squares iterative optimization of the concentration profiles (C) and spectral signatures (S^T) of the image constituents under the action of constraints. To do so, the number of image constituents in D is first estimated by, e.g., Singular Value Decomposition (SVD), and the optimization starts from initial spectral estimates obtained by selecting the purest pixel spectra in the image, e.g., with a SIMPLISMA-based method [23]. The operations $\mathbf{C} = \mathbf{D}\mathbf{S}(\mathbf{S}^{T}\mathbf{S})^{-1}$ and $\mathbf{S}^{T} = (\mathbf{C}^{T}\mathbf{C})^{-1}\mathbf{C}^{T}\mathbf{D}$ are carried out in each iterative cycle of the optimization until convergence is achieved. The parameter used to assess the fit quality of the final MCR model is the lack of fit, defined as follows:

$$\mathrm{lof}\,(\%) = 100 \times \sqrt{\frac{\sum_{i,j} e_{i,j}^{2}}{\sum_{i,j} d_{i,j}^{2}}} \qquad (3)$$

where d_ij is the element of the original data matrix in row i and column j and e_ij is the residual, i.e., the difference between the element d_ij of the original data set and the analogous element reproduced by the MCR-ALS model.

Constraints are used in MCR-ALS to model the C and S^T profiles and to decrease the ambiguity in the final results [9,10,21,22,24-28]. In all images analyzed, the non-negativity constraint has been applied in the concentration and the spectral directions, because the concentrations of the constituents in the image and the spectroscopic readings of the baseline-corrected IR, Raman and mass spectra are positive. 2-norm normalization of the pure spectra in S^T has been used to avoid scaling fluctuations in the profiles during optimization.

MCR-ALS can also be applied to the simultaneous analysis of several images using multiset structures D formed by submatrices D_i, linked to different individual images [1,8,11-13,21]. In the fusion of images obtained with different techniques and with the same pixel size, a row-wise augmented data matrix can be built by putting the spectra of each image next to each other. The construction of this complete image multiset structure (see Figure 1a) requires that the pixel mode be common to the fused images, which means that the images (D_i) must have been correctly matched, as described in the section above [13]. The decomposition D = CS^T now provides a single C matrix, valid for all images analyzed, and an augmented matrix S^T, formed by as many submatrices as techniques used in the images forming the multiset.

Figure 1

When image fusion involves images from different platforms and with different spatial resolution, an incomplete multiset should be built to handle this situation. Incomplete multiset structures are column- and row-wise augmented data matrices formed by data sets in which some blocks of measurements (related to particular experiments or techniques) are missing [15,16].
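The alternating least-squares updates, the two constraints and the lack of fit described above can be sketched as follows. This is an illustrative Python/NumPy sketch under stated assumptions, not the MATLAB code used in this work: non-negativity is imposed by simply clipping negative values to zero, the least-squares steps are solved with `numpy.linalg.lstsq`, and the function names, iteration limit and tolerance are hypothetical.

```python
import numpy as np

def lack_of_fit(D, C, St):
    """Lack of fit (%) of the model C @ St with respect to D (eq. 3)."""
    E = D - C @ St
    return 100.0 * np.sqrt(np.sum(E ** 2) / np.sum(D ** 2))

def update_C(D, St):
    """Least-squares C for fixed St, with negative values clipped to zero."""
    C = np.linalg.lstsq(St.T, D.T, rcond=None)[0].T
    return np.clip(C, 0.0, None)

def update_St(D, C):
    """Least-squares St for fixed C, clipped to zero and scaled to unit 2-norm rows."""
    St = np.clip(np.linalg.lstsq(C, D, rcond=None)[0], 0.0, None)
    norms = np.linalg.norm(St, axis=1, keepdims=True)
    return St / np.where(norms > 0, norms, 1.0)

def mcr_als(D, St0, max_iter=200, tol=1e-6):
    """Alternate the two least-squares steps until the lack of fit stabilizes."""
    St = np.asarray(St0, dtype=float)
    C = update_C(D, St)
    lof = lack_of_fit(D, C, St)
    for _ in range(max_iter):
        St = update_St(D, C)
        C = update_C(D, St)
        new_lof = lack_of_fit(D, C, St)
        converged = abs(lof - new_lof) < tol
        lof = new_lof
        if converged:
            break
    return C, St, lof
```

Here `St0` stands for the initial spectral estimates (for instance, the purest pixel spectra). For a complete multiset, the same loop applies unchanged to the row-wise augmented matrix, with the columns of `St0` covering the channels of all techniques.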
A recent variant of the MCR-ALS algorithm, originally designed for the study of environmental data sets, has been developed for the resolution of this special kind of multiset structure [15,16]. For the first time, this variant is adapted and used in image multiset analysis to solve the problem of fusing images from different techniques with different spatial resolution. The incomplete multiset for image fusion is shown in Figure 1b and is formed by two connected complete multisets: D_A, a row-wise augmented data matrix formed by the lowest spatial resolution (LR) versions of the matched images obtained with the different techniques, and D_B, a column-wise augmented data matrix formed by appending the image with the highest spatial resolution (HR) to its spatially binned LR version. The bilinear decomposition of these two complete multisets under suitable constraints is performed simultaneously in each iterative cycle of MCR-ALS and provides the bilinear models in equations 4 and 5. These equations refer to a particular example in which a Raman image of higher spatial resolution is fused with an IR image of lower resolution, but they can be generalized to the fusion of any other kind of platforms with the same multiset structure.

$$\mathbf{D}_{A} = \mathbf{C}_{LR}\,[\mathbf{S}^{T}_{Raman}, \mathbf{S}^{T}_{IR}] \qquad (4)$$

$$\mathbf{D}_{B} = [\mathbf{C}_{LR}; \mathbf{C}_{HR}]\,\mathbf{S}^{T}_{Raman} \qquad (5)$$

On the one hand, the decomposition of D_A provides a row-wise augmented matrix [S^T_Raman, S^T_IR], which contains pure spectral information coming from the different coupled techniques, and a single matrix, C_LR, which contains a common set of concentration profiles for the images with the lowest spatial resolution (see equation 4). On the other hand, D_B is decomposed into a single matrix of pure spectra of the technique with the highest spatial resolution, S^T_Raman, and a column-wise augmented matrix [C_LR; C_HR] formed by two submatrices, which contain the concentration profiles of the images collected with the same technique at the lowest and at the original highest spatial resolution, respectively (see equation 5). The analysis of these two complete multisets gives two least-squares error functions to be minimized, one per ALS optimization problem. In the proposed ALS algorithm, these two functions are combined into a single total error function, expressed in equation 6, which is optimized in each iterative cycle.

$$\min\left(\mathrm{ssq}\left(\mathbf{D}_{A} - \mathbf{C}_{LR}\,[\mathbf{S}^{T}_{Raman}, \mathbf{S}^{T}_{IR}]\right) + \mathrm{ssq}\left(\mathbf{D}_{B} - [\mathbf{C}_{LR}; \mathbf{C}_{HR}]\,\mathbf{S}^{T}_{Raman}\right)\right) \qquad (6)$$

As can be seen in equations 4 and 5, some submatrices can be calculated from both multisets, D_A and D_B. In this work, the submatrices adopted as final results in each iteration are C_LR, S^T_Raman and S^T_IR coming from the D_A multiset (the complete multitechnique multiset) and C_HR coming from the D_B multiset (which joins the LR and HR versions of the image with the highest original spatial resolution). Whenever possible, the solutions coming from the multitechnique multiset have been adopted because of the richer and more diverse information contained in this structure, although other options apply to different kinds of problems [15,16].
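One possible reading of this scheme is sketched below in Python/NumPy. It is a simplified illustration, not the in-house MATLAB routine: it assumes that, in each cycle, C_LR and the augmented spectra are updated from D_A, that C_HR is then obtained from the HR block of D_B with the current Raman spectra, and that non-negativity is imposed by clipping. All function and argument names are hypothetical.

```python
import numpy as np

def nn_lstsq(A, B):
    """Solve A @ X ~ B by least squares and clip negatives to zero."""
    return np.clip(np.linalg.lstsq(A, B, rcond=None)[0], 0.0, None)

def total_ssq(D_A, D_B, C_lr, C_hr, St_raman, St_ir):
    """Total error of eq. 6 for the two connected multisets."""
    res_A = D_A - C_lr @ np.hstack([St_raman, St_ir])
    res_B = D_B - np.vstack([C_lr, C_hr]) @ St_raman
    return np.sum(res_A ** 2) + np.sum(res_B ** 2)

def mcr_als_incomplete(D_raman_lr, D_ir_lr, D_raman_hr, St0, n_iter=50):
    """St0: initial spectra, rows = constituents, columns = [Raman | IR] channels.

    n_iter must be at least 1.
    """
    n_raman = D_raman_lr.shape[1]
    D_A = np.hstack([D_raman_lr, D_ir_lr])      # LR Raman next to LR IR
    D_B = np.vstack([D_raman_lr, D_raman_hr])   # LR Raman on top of HR Raman
    St = np.asarray(St0, dtype=float)
    C_lr = nn_lstsq(St.T, D_A.T).T
    errors = []
    for _ in range(n_iter):
        # Spectra of both techniques are taken from D_A and 2-norm normalized.
        St = nn_lstsq(C_lr, D_A)
        norms = np.linalg.norm(St, axis=1, keepdims=True)
        St = St / np.where(norms > 0, norms, 1.0)
        St_raman, St_ir = St[:, :n_raman], St[:, n_raman:]
        # C_LR is taken from D_A, C_HR from the HR block of D_B.
        C_lr = nn_lstsq(St.T, D_A.T).T
        C_hr = nn_lstsq(St_raman.T, D_raman_hr.T).T
        errors.append(total_ssq(D_A, D_B, C_lr, C_hr, St_raman, St_ir))
    return C_lr, C_hr, St_raman, St_ir, errors
```

In this sketch a constituent with a null Raman spectrum receives a null column in `C_hr`, since the HR block holds no signal from which its high-resolution profile could be estimated.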
The final results from this multitechnique image analysis are the pure spectral signatures of both spectroscopic techniques and a set of distribution maps (coming from the refolded profiles of C_HR) with the highest possible spatial resolution, obtained without using oversampling procedures. The approach presented works as long as all the compounds detected by the imaging technique with the lowest spatial resolution are also detected by the imaging technique with the highest spatial resolution. When this is not the case, image fusion should be done via image matching at the lowest spatial resolution scale. Otherwise, when incomplete multiset structures are analyzed, the user should be aware that only compounds common to both the LR and HR imaging techniques will provide reliable distribution maps with the highest possible spatial resolution.

Software

Image matching and MCR-ALS routines for incomplete data sets are in-house written MATLAB routines. MCR-ALS for single image analysis and complete multisets has been performed with the GUI freely downloadable at www.mcrals.info [9].

RESULTS AND DISCUSSION

Before beginning image fusion, MCR-ALS must be carried out individually on the images to be coupled. This preliminary MCR-ALS analysis is performed to compare the information obtained from the images related to the same sample. The main aspect to be taken into account is whether the different platforms are able to provide information on the same sample compounds. Common compounds among images are easily


identified because the morphology of the resolved distribution maps is very similar. Whenever differences exist, they can be due to the lack of signal of a particular compound for a certain technique or, simply, to the fact that some spectroscopic techniques may not differentiate certain compounds because their spectral signatures are highly similar. Depending on the correspondence among the compounds resolved in the images to be fused, different image fusion strategies should be carried out, as shown in detail in the first model example presented. Individual MCR-ALS analysis was applied to all image examples following the steps explained in the Data Analysis section. Table 1 provides the number of resolved compounds in each individual image analysis and the related lack of fit. Resolved distribution maps and spectral signatures for the analysis of the pharmaceutical and the bean examples are displayed in the supplementary information (see figures S1 and S2).

Table 1. Number of components resolved (NC) and lack of fit (lof, %) for MCR-ALS analysis of the images studied.

SAMPLE            RAMAN            IR               MSI
                  NC    %lof       NC    %lof       NC    %lof
Pharmaceutical    3     3.35       3     5.81       --    --
Bean tissue       3     11.56      --    --         4     20.72
Tonsil tissue     3     9.86       4     2.65       --    --

Model example: Raman and MIR images of a pharmaceutical mixture

The original Raman and MIR images of the pharmaceutical mixture described in the experimental section have the same pixel size; therefore, image fusion can be performed via a regular complete multiset after image matching. This real example was presented and analyzed in reference [13], and the results are included in the supplementary material for ease of comparison and as a reference for the following related image fusion scenarios (see figure S3). Based on these real Raman and MIR images, other image data sets have been generated to mimic situations of fusion of images with differences in spatial resolution and/or in correspondence among compounds. Table 2 describes the data sets created for the different image fusion scenarios that may be encountered in practice.

Table 2. Image data sets representing different image fusion scenarios.

Image fusion    HR Raman image (15 µm × 15 µm)    LR MIR image (30 µm × 30 µm)
                Compounds                         Compounds
                ASA    Caf    Starch              ASA    Caf    Starch
1               x      x      x                   x      x      x
2               x      x      x                   -      x      x
3               -      x      x                   x      x      x

To simulate differences in spatial resolution, we have kept the original high spatial resolution (HR) Raman image and we have generated a low spatial resolution (LR) version of the


MIR image by binning the original MIR image by a factor of two (2×2). Thus, all data sets are formed by an LR MIR image with a pixel size of (30×30) µm2 and an HR Raman image with a pixel size of (15×15) µm2. To consider differences in correspondence among compounds, some images have been generated from the product of the real concentration profiles and spectral signatures of only some of the original compounds in the pharmaceutical image, plus added random noise. In this way, HR Raman images and LR MIR images lacking compounds have been obtained.

Image fusion scenarios 1 and 2 represent the situation in which all compounds in the LR MIR image are present in the high spatial resolution (HR) Raman image. Case 1 shows a perfect correspondence among the compounds of the MIR and Raman images, whereas in case 2 the HR Raman image has the three compounds and one compound (ASA) is missing in the LR MIR image. In these two situations, working with an incomplete multiset, as shown in Figure 1b, is the best option, and the variant of MCR-ALS for incomplete multisets has been applied. In both cases, initial estimates were obtained with the SIMPLISMA-based method on the low spatial resolution complete multiset (i.e., the D_A structure in Figure 1b) and the analysis was carried out under the same constraints used in the individual image analysis. MCR-ALS resolved three constituents in both image fusion scenarios and the lack of fit was 10.5% and 8.31% for cases 1 and 2, respectively. When the same compounds are present in the HR Raman and LR MIR images (scenario 1), the resolved HR distribution maps (coming from C_HR) and the pure spectra (see figure S4 in the supplementary material) are virtually identical to those obtained on the original complete multiset with both images at the highest spatial resolution (figure S3) [13].
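The generation of images lacking a compound, as the product of the profiles of the retained compounds plus noise, can be illustrated with the short Python/NumPy sketch below. The Gaussian noise model, the noise level, the random placeholder profiles and the function name `simulate_image` are assumptions made for illustration; the text only states that random noise was added.

```python
import numpy as np

def simulate_image(C, St, present, noise_sd=0.0, seed=None):
    """Rebuild pixel spectra from the listed compounds only and add Gaussian noise."""
    present = list(present)
    D = C[:, present] @ St[present, :]
    rng = np.random.default_rng(seed)
    return D + rng.normal(0.0, noise_sd, size=D.shape)

# Hypothetical profiles: 3 compounds (columns of C, rows of St),
# ordered ASA, caffeine, starch.
rng = np.random.default_rng(0)
C = rng.random((100, 3))
St = rng.random((3, 40))
D_without_asa = simulate_image(C, St, present=[1, 2], noise_sd=0.01, seed=1)
```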
This fact confirms the robustness of the results and the suitability of the new version of MCR-ALS for incomplete multisets to treat images with different spatial resolution. The results from scenario 2 (with the ASA compound missing in the LR MIR image) are shown in Figure 2a. The resolved spectral signatures are correct and equal to the reference results of figure S3, except for the null MIR signal related to ASA, the absent compound in the generated MIR image. HR distribution maps are still correct for all compounds in the sample because of the modus operandi of the MCR-ALS algorithm for incomplete multisets and the fact that the compound missing in the LR MIR image is present in the HR Raman image.

Image fusion scenario 3 is the most complex case, since all compounds are present in the LR MIR image, but the ASA compound is absent in the HR Raman image (see related results in Figure 2b). The spectral signatures retrieved are correct and similar to the reference results in figure S3, except for the null Raman signal associated with the ASA compound, missing in the related image. However, whereas the LR distribution maps (coming from CLR) are correct because all compounds are represented in the part of the incomplete structure related to the LR fused images (DA, see Figure 1b), HR distribution maps (coming from CHR) can only be correctly retrieved for the compounds present in the HR Raman image. The HR distribution map of the missing Raman component cannot be resolved satisfactorily, since the Raman image does not provide information on the signal of the ASA compound and only the Raman-related column-wise augmented multiset (DB, see Figure 1b) is used to recover these maps.

Having studied all possible image fusion scenarios, we can conclude that the construction and analysis of incomplete multiset structures is beneficial when all compounds to be resolved are present in the image with the highest spatial resolution, even if some are missing in the fused LR image. In these situations, the benefit of the complementary spectroscopic techniques is taken into account in the multiset analysis framework and correct distribution maps with the highest spatial resolution are obtained. However, when compounds present in the image with the lowest spatial resolution are missing in the image with the highest spatial resolution, the construction and analysis of an incomplete multiset structure does not provide fully correct results. Although the spectral signatures of the compounds are well recovered, incorrect high spatial resolution maps are recovered for the compounds missing in the HR image. In this situation, the safest option is working with a complete multiset structure formed by the low spatial resolution version of the fused images, since the complementary spectral information of the coupled techniques is obtained and will improve the results of the individual image analysis. In situations where samples contain compounds with very distinct spectral signatures and mild overlap in the distribution maps (as in the model example), the analysis of incomplete structures may still be of use. In this case, the high spatial resolution maps obtained for compounds common to LR and HR images can be considered correct, whereas low spatial resolution maps should be taken as the correct representation of compounds absent in HR images.
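The different behaviour of the three scenarios can be traced back to the bilinear models of the two blocks of the incomplete multiset. The equations below are a sketch in an assumed notation, chosen to agree with the labels DA, DB, CLR and CHR used in the text; the symbols S_R and S_M (pure Raman and MIR spectra), the superscripts LR and HR, and the residual matrices E_A and E_B are introduced here only for illustration.

```latex
\begin{align}
\mathbf{D}_A &=
  \begin{bmatrix} \mathbf{D}_{R}^{LR} & \mathbf{D}_{M}^{LR} \end{bmatrix}
  = \mathbf{C}_{LR}
  \begin{bmatrix} \mathbf{S}_{R}^{T} & \mathbf{S}_{M}^{T} \end{bmatrix}
  + \mathbf{E}_A \\
\mathbf{D}_B &=
  \begin{bmatrix} \mathbf{D}_{R}^{HR} \\ \mathbf{D}_{R}^{LR} \end{bmatrix}
  = \begin{bmatrix} \mathbf{C}_{HR} \\ \mathbf{C}_{LR} \end{bmatrix}
  \mathbf{S}_{R}^{T}
  + \mathbf{E}_B
\end{align}
```

In this notation C_HR appears only in the model of D_B, so it is supported by the Raman spectra alone. A compound with a null Raman signature (ASA in scenario 3) leaves its column of C_HR without any supporting signal, whereas a null MIR signature (ASA in scenario 2) only removes the MIR contribution of that compound from D_A and does not affect C_HR.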
The model example presented will help to identify which image fusion scenario best describes the real examples shown below and to select the data analysis approach accordingly.

Raman and MSI bean tissue images

Bean images were acquired by Raman and mass spectrometry imaging systems. Individual MCR-ALS resolution analysis was performed on each of the bean images, as described in previous sections. The HR Raman image was described by three MCR-ALS contributions, whereas the LR mass spectrometry image needed four MCR-ALS contributions to be modelled (see figure S2 in supplementary material). In this case, the number of image components in the Raman image with the highest resolution is lower than the number of constituents of the low spatial resolution MSI image. Given the distribution maps of the individual analyses, which show few clear morphological features, and the high spectral overlap among compounds, the safest option in this case is building a complete multiset structure formed by the two images at the lowest spatial resolution. Thus, a complete row-wise augmented multiset is formed by the originally LR mass spectrometry image and the Raman image, binned by a factor of 2×2 to equal the pixel size of MSI. Spatial image matching was performed as mentioned in the Data Treatment section. MCR-ALS analysis was applied on the complete multiset structure built after the images were spatially matched.
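The construction of such a complete row-wise augmented multiset can be sketched as follows. This is a minimal illustration under stated assumptions: binning is taken to be the average of non-overlapping 2×2 pixel blocks, the images are assumed to be already spatially matched, and the array sizes and the names `bin_image` and `unfold` are hypothetical.

```python
import numpy as np

def bin_image(cube, factor=2):
    """Average non-overlapping factor x factor pixel blocks of a (rows, cols, channels) cube."""
    rows, cols, channels = cube.shape
    rows -= rows % factor   # drop edge pixels that do not fill a complete block
    cols -= cols % factor
    trimmed = cube[:rows, :cols, :]
    blocks = trimmed.reshape(rows // factor, factor, cols // factor, factor, channels)
    return blocks.mean(axis=(1, 3))

def unfold(cube):
    """Unfold a (rows, cols, channels) cube into a (pixels, channels) matrix."""
    return cube.reshape(-1, cube.shape[-1])

rng = np.random.default_rng(1)
raman_hr = rng.random((20, 30, 100))   # hypothetical HR Raman cube
msi_lr = rng.random((10, 15, 250))     # hypothetical LR MSI cube, same field of view

# Bring the Raman image to the MSI pixel size, then append the two techniques
# side by side: rows are shared pixels, columns are Raman shifts followed by m/z channels.
raman_lr = bin_image(raman_hr, factor=2)
D_complete = np.hstack([unfold(raman_lr), unfold(msi_lr)])
print(D_complete.shape)  # (150, 350)
```

Each row of `D_complete` holds the extended (Raman plus MS) spectrum of one low-resolution pixel, which is what allows a single set of distribution maps to be resolved for both techniques.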


Four components were selected to explain the variation of the system and the lack of fit was 13.68%. Figure 3 shows the low spatial resolution maps and the Raman and mass spectrometry signatures of the bean tissue components. Although there is a loss of spatial resolution, since the Raman pixel size is downsampled, the higher specificity of mass spectrometry helps to recover the Raman spectral signatures for the four constituents, a task unattainable by single Raman image analysis. Although the interpretation of the compounds obtained is not the main goal of the work, bean seeds tend to present different tissues in the seed coat and inner part, as can be seen in the resolved maps. Some of the salient masses in the related MS signatures are around m/z 740–820, related to phospholipids and triacylglycerols; between m/z 900 and 1000, attributed to triglycerides; and between m/z 1000 and 1100, linked to triglycerides with longer hydrocarbon chains.

Raman and FT-IR tonsil tissue images

This example aims at the data fusion analysis of Raman and FT-IR tonsil tissue images. The two images differ in translation, rotation and pixel size. Results from individual MCR-ALS analysis of the Raman and FT-IR images revealed that the Raman image, with the lowest spatial resolution, could be described by three constituents, whereas the FT-IR image, with the highest spatial resolution, could be described by four components. In this case, and looking at the distribution maps obtained in the individual analyses, all compounds in the LR Raman image can be identified in the HR FT-IR image. This is a clear example where image fusion can be analyzed by building an incomplete multiset structure in order to recover all tissue contributions and to preserve the spatial detail of the image with the highest resolution. To do so, a complete multiset was first built with matched images of the two techniques at the same pixel size, after downsampling the HR FT-IR image.
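As a rough illustration of the incomplete multiset of Figure 1b for this example, the sketch below assembles its two complete blocks from synthetic arrays. It assumes that FT-IR is the HR technique and Raman the LR one (as in Figure 5), that binning averages 2×2 pixel blocks, and that the images are already spatially matched. All array sizes and names are hypothetical, and the NaN-filled layout is only a way of visualising the missing block, not necessarily how the MCR-ALS variant stores the data.

```python
import numpy as np

def bin_and_unfold(cube, factor=2):
    """Average factor x factor pixel blocks, then unfold to (pixels, channels).

    Assumes the numbers of rows and columns are multiples of `factor`.
    """
    r, c, ch = cube.shape
    binned = cube.reshape(r // factor, factor, c // factor, factor, ch).mean(axis=(1, 3))
    return binned.reshape(-1, ch)

rng = np.random.default_rng(2)
ftir_hr = rng.random((16, 16, 60))   # hypothetical HR FT-IR cube
raman_lr = rng.random((8, 8, 90))    # hypothetical LR Raman cube, same field of view

ftir_hr_unfolded = ftir_hr.reshape(-1, 60)      # (256, 60) HR pixels
ftir_lr_unfolded = bin_and_unfold(ftir_hr)      # (64, 60)  binned FT-IR pixels
raman_lr_unfolded = raman_lr.reshape(-1, 90)    # (64, 90)  LR Raman pixels

# D_A: complete row-wise augmented multiset (same LR pixels, both techniques)
D_A = np.hstack([ftir_lr_unfolded, raman_lr_unfolded])   # (64, 150)
# D_B: complete column-wise augmented multiset (same FT-IR channels, HR and LR pixels)
D_B = np.vstack([ftir_hr_unfolded, ftir_lr_unfolded])    # (320, 60)

# Full layout with the unmeasured HR Raman block marked as NaN (display only)
D_incomplete = np.full((D_B.shape[0], D_A.shape[1]), np.nan)
D_incomplete[:, :60] = D_B
D_incomplete[256:, 60:] = raman_lr_unfolded
```

The only empty block corresponds to a Raman image at the FT-IR pixel size, which was never measured; the binned FT-IR block is shared by `D_A` and `D_B` and links the two complete multisets.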
Then, the incomplete multiset structure was completed by appending the high spatial resolution FT-IR image to its binned version, as in Figure 1b, and analyzed with the new variant of MCR-ALS designed for this data set typology15,16. Four components described the variation in the multiset structure. The number of components was set to the largest number of components found for any of the coupled techniques (four components were needed in the individual FT-IR resolution analysis, see figure 4a, whereas only three components were needed in the individual MCR analysis of the Raman image, see figure 4b). A higher number of components was tested but did not provide meaningful results. The lack of fit was 2.08% for the incomplete multiset structure. At the left of figure 5, the lowest and highest spatial resolution distribution maps of the tonsil tissue are displayed, and at the right of figure 5 their related Raman and FT-IR spectral signatures. As expected, the new variant of MCR-ALS provides a common set of HR distribution maps for all image components, which clearly matches the morphology of the LR ones. The HR FT-IR image helps to preserve the spatial detail and allows the missing Raman signature to be differentiated, because of the complementary spectral information provided by the multitechnique multiset. As a whole, the quality of the results obtained from the new variant of MCR-ALS is significantly better than the results


recovered from individual MCR-ALS image analysis in terms of structural and spatial definition of the components. In this example, we could characterize the most salient biological contributions of the tonsil tissue (hemoglobin protein, epithelium protein and connective tissue) according to the two techniques used, and a complete description of the biological tissue at the highest possible spatial resolution was obtained. For more details on the biological interpretation of the components resolved, please refer to a previous work1 and references therein.

CONCLUSIONS

A new variant of MCR-ALS for incomplete multisets is applied to solve problems of image fusion when the spatial resolution of the spectroscopic platforms is not the same. This variant takes advantage of the complementary information of the spectroscopic techniques to be merged while preserving the highest possible spatial resolution in the resolved maps of the image constituents. Unlike other approaches, it requires neither dubious oversampling of the image with the lowest resolution nor a regression model between the images with the highest and lowest resolution. Instead, the original images can be treated as such by building an incomplete multiset formed by the connection of a complete row-wise augmented multiset, containing matched images of the two techniques with the same pixel size (after downsampling the image with the highest resolution), with a complete column-wise augmented multiset, formed by the image with the highest spatial resolution and its binned version. This provides a bilinear model with extended spectral signatures of both merged techniques and high spatial resolution maps for each image component. This new MCR-ALS variant provides satisfactory results as long as all compounds of the image with the lowest spatial resolution are present or can be distinguished with the technique with the highest resolution.
If this is not the case, there is not enough information to retrieve maps of the components missing in the image of highest resolution with the same spatial detail, although the rest of the common compounds can still be modelled with the highest possible resolution. The approach presented is valid for any kind of image fusion, independently of the spectroscopic techniques to be merged or the kind of sample under analysis. It is also a valid approach to handle other problems in image data fusion, such as the fact that the sample area scanned may not necessarily be the same in both platforms. In this case, the missing blocks would be associated with sample areas imaged by only one of the imaging techniques.

ASSOCIATED CONTENT

Supporting Information

A file called supplementary material.pdf is attached with additional explanations and graphical material mentioned in the text. The Supporting Information is available free of charge on the ACS Publications website.

AUTHOR INFORMATION

Corresponding Authors


* Sara Piqueras. Anna de Juan. Universitat de Barcelona. Dept. of Chemical Engineering and Analytical Chemistry. Diagonal, 645. 08028 Barcelona (Spain).

Present Address

†Sara Piqueras. Department of Geosciences and Natural Resource Management. University of Copenhagen. Rolighedsvej 23. 1958 Frederiksberg. Denmark.

ACKNOWLEDGEMENTS

A. de Juan and R. Tauler acknowledge financial support from the European Research Council under the European Union's Seventh Framework Program (FP/2007-2013) / ERC Grant Agreement n. 32073 (CHEMAGEB project). They also belong to the network of research groups recognized by the Catalan government (2014 SGR 1106). S. Piqueras acknowledges financial support from the Spanish government project CTQ2015-66254-C2-2-P.

REFERENCES

(1) Piqueras, S.; Krafft, C.; Beleites, C.; Egodage, K.; von Eggeling, F.; Guntinas-Lichius, O.; Popp, J.; Tauler, R.; de Juan, A. Combining multiset resolution and segmentation for hyperspectral image analysis of biological tissues. Anal. Chim. Acta 2015, 881, 24–36.
(2) Infrared and Raman Spectroscopic Imaging, 2nd ed.; Salzer, R., Siesler, H., Eds.; Wiley-VCH: Weinheim, Germany, 2014.
(3) Gowen, A.; Marini, F.; Esquerre, C.; O’Donnell, C.; Downey, G.; Burger, J. Time series hyperspectral chemical imaging data: challenges, solutions and applications. Anal. Chim. Acta 2011, 705 (1–2), 272–282.
(4) Zhang, J. Multi-source remote sensing data fusion: status and trends. Int. J. Image Data Fusion 2010, 1, 5–24.
(5) Bocklitz, T. W.; Crecelius, A. C.; Tarcea, N.; Schmitt, M.; Schubert, U. S.; Popp, J. Deeper understanding of biological tissue: quantitative correlation of MALDI-TOF and Raman imaging. Anal. Chem. 2013, 85, 10829–10834.
(6) Van de Plas, R.; Yang, J.; Spraggins, J.; Caprioli, R. M. Image fusion of mass spectrometry and microscopy: a multimodality paradigm for molecular tissue mapping. Nat. Methods 2015, 12 (4), 366–372.
(7) Clarke, F. C.; Jamieson, M. J.; Clark, D. A.; Hammond, S. V.; Jee, R. D.; Moffat, A. C. Chemical image fusion. The synergy of FT-NIR and Raman mapping microscopy to enable a more complete visualization of pharmaceutical formulations. Anal. Chem. 2001, 73 (10), 2213–2220.
(8) Piqueras, S.; Duponchel, L.; Offroy, M.; Jamme, F.; Tauler, R.; de Juan, A. Chemometric strategies to unmix information and increase the spatial description of hyperspectral images: a single-cell case study. Anal. Chem. 2013, 85 (13), 6303–6311.
(9) Jaumot, J.; de Juan, A.; Tauler, R. MCR-ALS GUI 2.0: new features and applications. Chemom. Intell. Lab. Syst. 2015, 140, 1–12.
(10) Felten, J.; Hall, H.; Jaumot, J.; Tauler, R.; de Juan, A.; Gorzsás, A. Vibrational spectroscopic image analysis of biological material using multivariate curve resolution–alternating least squares (MCR-ALS). Nat. Protoc. 2015, 10 (2), 217.
(11) Olmos, V.; Benítez, L.; Marro, M.; Loza-Alvarez, P.; Piña, B.; Tauler, R.; de Juan, A. Relevant aspects of unmixing/resolution analysis for the interpretation of biological vibrational hyperspectral images. TrAC, Trends Anal. Chem. 2017, 94, 130–140.
(12) Olmos, V.; Marro, M.; Loza-Alvarez, P.; Raldúa, D.; Prats, E.; Padrós, F.; Tauler, R.; de Juan, A. Combining hyperspectral imaging and chemometrics to assess and interpret the effects of environmental stressors on the organism at tissue level. J. Biophotonics 2018.
(13) Piqueras, S.; Maeder, M.; Tauler, R.; de Juan, A. A new matching image preprocessing for image data fusion. Chemom. Intell. Lab. Syst. 2017, 15, 32–42.
(14) Gowen, A.; Dorrepaal, R. Multivariate chemical image fusion of vibrational spectroscopic imaging modalities. Molecules 2016, 21 (7), 870.
(15) Alier, M.; Tauler, R. Multivariate curve resolution of incomplete data multisets. Chemom. Intell. Lab. Syst. 2013, 127, 17–28.
(16) De Luca, M.; Ragno, G.; Ioele, G.; Tauler, R. Multivariate curve resolution of incomplete fused multiset data from chromatographic and spectrophotometric analyses for drug photostability studies. Anal. Chim. Acta 2014, 837, 31–37.
(17) Eilers, P. H. C. Parametric time warping. Anal. Chem. 2004, 76 (2), 404–411.
(18) Eilers, P. H. C. A perfect smoother. Anal. Chem. 2003, 75 (14), 3631–3636.
(19) Savitzky, A.; Golay, M. J. E. Smoothing and differentiation of data by simplified least squares procedures. Anal. Chem. 1964, 36 (8), 1627–1639.
(20) Bedia, C.; Tauler, R.; Jaumot, J. Compression strategies for the chemometric analysis of mass spectrometry imaging data. J. Chemom. 2016, 30, 575–588.
(21) de Juan, A.; Piqueras, S.; Maeder, M.; Hancewicz, T.; Duponchel, L.; Tauler, R. In Infrared and Raman Spectroscopic Imaging; Salzer, R., Siesler, H. W., Eds.; Wiley-VCH Verlag GmbH and Co. KGaA: Weinheim, Germany, 2014; pp 57–110.
(22) Tauler, R. Multivariate curve resolution applied to second order data. Chemom. Intell. Lab. Syst. 1995, 30 (1), 133–146.
(23) Windig, W.; Guilment, J. Interactive self-modeling mixture analysis. Anal. Chem. 1991, 63 (14), 1425–1432.
(24) Bro, R.; de Jong, S. A fast non-negativity-constrained least squares algorithm. J. Chemom. 1997, 11, 393–401.
(25) de Juan, A.; Tauler, R. Multivariate curve resolution (MCR) from 2000: progress in concepts and applications. Crit. Rev. Anal. Chem. 2006, 36 (3–4), 163–176.
(26) de Juan, A.; Maeder, M.; Hancewicz, T.; Tauler, R. Use of local rank-based spatial information for resolution of spectroscopic images. J. Chemom. 2008, 22 (5), 291–298.
(27) Hugelier, S.; Devos, O.; Ruckebusch, C. On the implementation of spatial constraints in multivariate curve resolution alternating least squares for hyperspectral image analysis. J. Chemom. 2015, 29 (10), 557–561.
(28) Hugelier, S.; Piqueras, S.; Bedia, C.; de Juan, A.; Ruckebusch, C. Application of a sparseness constraint in multivariate curve resolution–alternating least squares. Anal. Chim. Acta 2018, 1000, 100–108.


FIGURES

Figure 1. MCR-ALS multitechnique decomposition analysis a) for a complete multiset structure, b) for an incomplete multiset structure.

Figure 2. a) MCR-ALS results on the incomplete multiset of pharmaceutical mixtures (scenario 2, ASA component missing in the LR FT-IR image). b) MCR-ALS results on the incomplete multiset of pharmaceutical mixtures (scenario 3, ASA component missing in the HR Raman image). Left plots: low (CLR) and high (CHR) spatial resolution distribution maps of caffeine, ASA and starch obtained from the new variant of MCR-ALS. Right plots: extended Raman and MIR spectral signatures of caffeine, ASA and starch.


Figure 3. MCR-ALS multitechnique results of the complete multiset composed of the lowest spatial resolution Raman and MSI bean tissue images. a) Left plots: common set of low spatial resolution distribution maps of bean tissue. b) Right plots: Raman and MS spectral signatures of the bean image contributions.

Figure 4. Individual MCR-ALS analysis of palatine tonsil tissue. a) Distribution maps and pure spectra of FT-IR image compounds. Squared red maps correspond to epithelium distribution on palatine tonsil Raman and FT-IR images. b) Distribution maps and pure spectra of Raman image compounds.


Figure 5. MCR-ALS multitechnique results of the incomplete multiset composed of the HR FT-IR image, its LR version and the LR Raman image of palatine tonsil tissue. a) Left plots: low and high spatial resolution distribution maps of tonsil tissues. b) Right plots: FT-IR and Raman spectral signatures of tonsil tissue images.
