Unsupervised Analysis of Big ToF-SIMS Data Sets: a Statistical Pattern Recognition Approach

Jan 23, 2018 - Figure 1 shows the statistical analysis of the single-pixel ToF-SIMS spectra of two polymer samples, that is, PMMA (left-hand side colu...

Article

Unsupervised analysis of big ToF-SIMS datasets: a statistical pattern recognition approach
Nunzio Tuccitto, Giacomo Capizzi, Alberto Torrisi, and Antonino Licciardello
Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.7b05003 • Publication Date (Web): 23 Jan 2018


Analytical Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Analytical Chemistry

Unsupervised analysis of big ToF-SIMS datasets: a statistical pattern recognition approach

Nunzio Tuccitto,a Giacomo Capizzi,b Alberto Torrisi,a and Antonino Licciardello*,a

a Dipartimento di Scienze Chimiche, Università di Catania, viale A. Doria, 6 - 95125 Catania, Italy

b Dipartimento di Ingegneria Elettrica, Elettronica e Informatica, Università di Catania, viale A. Doria, 6 - 95125 Catania, Italy

*correspondence to: alicciardello@unict.it

KEYWORDS: ToF-SIMS imaging, statistical feature extraction, automated data treatment

ABSTRACT: We present a new method, fast and undemanding in terms of CPU performance, that extracts latent chemical information from big ToF-SIMS datasets, such as those arising from chemical imaging, by working directly on the unbinned raw data files. The method evaluates the similarity or dissimilarity of very low intensity spectra, such as those arising from a single pixel, in terms of symmetry and asymmetry relationships of the count distribution in the Fourier transform domain. Tests performed so far on model samples show that, without sacrificing mass or spatial resolution, the method supplies results at least equivalent to those achievable by an experienced ToF-SIMS user applying PCA techniques.


Introduction

Secondary ion mass spectrometry (SIMS) is one of the major techniques for spatially resolved chemical characterization of surfaces and thin films, and it is widely applied in a large variety of fields in science and technology. State-of-the-art SIMS is used to obtain chemical information from inorganic as well as molecular solids, including biological samples.1-3 The chemical information is carried by the atomic and molecular ionized fragments emitted from a surface bombarded with a primary ion beam. Mass analysis of these so-called secondary ions supplies direct information on their nature and therefore on the chemical composition of the uppermost layers of the bombarded surface area. Nowadays, especially in the case of complex materials (such as organics, hybrid systems or biological samples), the most widely used SIMS instrumentation is based on time-of-flight analysis of secondary ions (ToF-SIMS), thanks to its favorable performance in terms of mass range, resolution and transmission, coupled with parallel detection.

Modern ToF-SIMS spectrometers generate hyperspectral image datasets usually containing 128×128, 256×256 or even more pixels. For each pixel a high-resolution mass spectrum is stored, composed of up to several million time-of-flight channels. Moreover, by alternating imaging cycles with erosion cycles, it is possible to obtain a 3D description of the sample, essentially composed of a stack of 2D images. This leads to a rapid growth in the size of the datasets that must be handled. The amount of stored data can be even larger in recently introduced instrumentation that combines different types of mass analysis, e.g. ToF and Orbitrap.4

Currently, the most popular approaches for handling and interpreting ToF-SIMS data are based on multivariate analysis methods.
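To give a rough sense of the dataset sizes implied by the image dimensions quoted above, the following back-of-the-envelope sketch computes the memory a hypothetical dense array would need if every time-of-flight channel of every pixel were stored explicitly. The function name, the channel count of one million (the text only says "up to several millions") and the 2 bytes per count are illustrative assumptions, not instrument specifications, and the result is not the size of an actual raw file.

```python
def dense_size_bytes(pixels_per_side, channels, layers=1, bytes_per_count=2):
    """Bytes needed to hold every channel of every pixel as a dense array.

    All parameters are illustrative assumptions, not instrument values.
    """
    return pixels_per_side ** 2 * channels * layers * bytes_per_count


# Assumed channel count; the text only states "up to several millions".
CHANNELS = 1_000_000

for side in (128, 256):
    size = dense_size_bytes(side, CHANNELS)
    print(f"{side}x{side} pixels: {size / 1e9:.1f} GB per 2D image")
# prints:
# 128x128 pixels: 32.8 GB per 2D image
# 256x256 pixels: 131.1 GB per 2D image
```

Passing `layers=10` to the same function scales the 256×256 case to about 1.3 TB, which illustrates why a depth profile made of stacked 2D images quickly becomes impractical to handle as a dense matrix.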
Among multivariate methods, principal component analysis5 (PCA) and related techniques have increasingly become a sort of standard approach in the ToF-SIMS community.6-9 Generally, due to computational limitations, PCA treatment requires pre-processing of the dataset in order to decrease its size substantially. Typical pre-treatments include unit mass binning or peak picking and integration (either manual or automated), often combined with pixel binning.10 Such pre-treatments can lead to a loss of resolution (mass resolution, spatial resolution, or both) and, moreover, they can be biased by the analyst's subjective decisions and/or by errors of automated peak search and integration routines. A few methods for unattended multivariate analysis of raw, unbinned datasets have recently been reported, aiming to overcome at least some of the above-mentioned drawbacks.11, 12 However, the large size of the datasets still required data reduction, such as subsampling13 or wavelet-based data compression as a pre-processing step.14 Moreover, in many cases the ToF-SIMS data pertaining to a single pixel, although almost free of noise, exhibit very low intensity, with typically up to 99% zero-signal channels.15 Accordingly, methods that take advantage of such scarcity of signal dispersed in a big dataset are very useful for coping with the continuous increase of information collected by modern instrumentation. In this paper, we propose a new method able to extract latent chemical information from ToF-SIMS data, working on the uncompressed and unbinned raw dataset. The method is fast and undemanding in terms of CPU performance.

Materials and methods

All the datasets used in this work were acquired with a reflector-type instrument (ToF-SIMS IV, IONTOF GmbH, Muenster) using a pulsed Bi3+ beam produced by an LMIG focused ion source. The analysed materials, polystyrene (PS), poly(methyl methacrylate) (PMMA), human serum albumin (HSA) and lactoferrin (LF), all supplied by Aldrich (Milan), were prepared as thin films on silicon substrates by spin casting from solutions.16, 17 Further details on the preparation of protein-based samples can be found in references 18 and 19. Measurements were performed by rastering the beam over the area of interest (typical size between 50×50 µm² and 500×500 µm²) with a digital size of 128×128 or 256×256 pixels. The experimental conditions were set in order to comply with static SIMS conditions (primary ion fluence