Article

Distinguishing chemically similar polyamide materials with ToF-SIMS using self-organising maps and a universal data matrix
Robert M.T. Madiona, Sarah E. Bamford, David A. Winkler, Benjamin W. Muir, and Paul J. Pigram
Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.8b01951 • Publication Date (Web): 27 Sep 2018

Analytical Chemistry

Distinguishing chemically similar polyamide materials with ToF-SIMS using self-organising maps and a universal data matrix

Robert M. T. Madiona1,3, Sarah E. Bamford1, David A. Winkler2-5, Benjamin W. Muir3, Paul J. Pigram1*

1) Centre for Materials and Surface Science and Department of Chemistry and Physics, School of Molecular Sciences, La Trobe University, Melbourne, VIC 3086, Australia
2) La Trobe Institute for Molecular Sciences, School of Molecular Sciences, La Trobe University, Melbourne, VIC 3086, Australia
3) CSIRO Manufacturing, Clayton, VIC 3168, Australia
4) Monash Institute of Pharmaceutical Sciences, Monash University, Parkville 3052, Australia
5) School of Pharmacy, University of Nottingham, Nottingham NG7 2RD, UK

*Electronic Mail:

[email protected]

Keywords: ToF-SIMS, Multivariate Analysis, Artificial Neural Networks, Principal Component Analysis, Self-Organising Maps, Machine Learning, Polyamide Samples

Abstract

Time-of-flight secondary ion mass spectrometry (ToF-SIMS) is advancing rapidly, providing instruments with growing capabilities and resolution. The data sets generated by these instruments are likewise increasing dramatically in size and complexity. Paradoxically, methods for efficient analysis of these large, rich data sets have not improved at the same rate. Clearly, more effective computational methods for analysis of ToF-SIMS data are becoming essential. Several research groups are customising standard multivariate analytical tools to decrease computational demands, provide user-friendly interfaces, and simplify identification of trends and features in large ToF-SIMS data sets. We previously applied mass segmented peak lists to data from PMMA, PTFE, PET, and LDPE. Self-organising maps (SOMs), a type of artificial neural network (ANN), classified the polymers based on their molecular composition and primary ion probe type more effectively than simple principal component analysis (PCA). The effectiveness of this approach led us to question whether it would be useful in distinguishing polymers that were very similar. How sensitive is the technique to changes in polymer chemical structure and composition? To address this question, we generated ToF-SIMS ion peak signatures for seven nylon polymers with similar chemistries and used our up-binning and SOM approach to classify and cluster the polymers. The widely used linear PCA method failed to separate the samples. Supervised and unsupervised training of SOMs using positive or negative ion mass spectra resulted in effective classification and separation of the seven nylon polymers. Our SOM classification method has proven to be tolerant of minor sample irregularities, sample-to-sample variations, and inherent data limitations including spectral resolution and noise. We have demonstrated the potential of machine learning methods to analyse ToF-SIMS data more effectively than traditional methods. Such methods are critically important for future complex data analysis and provide a pipeline for rapid classification and identification of features and similarities in large data sets.

Introduction

Time-of-flight secondary ion mass spectrometry (ToF-SIMS) is a widely employed surface analytical technique that provides detailed molecular information on many material types. Polymeric materials feature frequently in ToF-SIMS studies in both research and industrial contexts. Mass spectra, ion images (or maps) describing the spatial distribution of molecular species, and depth profiles may be produced on depth scales ranging from a single molecular layer to several micrometres and surface areas typically several hundred micrometres square. Larger surface areas may be examined by tessellating multiple ion maps. Each ToF-SIMS data set may comprise mass spectra containing thousands of discrete peaks, collected from thousands of discrete locations (pixels) at the surface, and, in some cases, across multiple depths. Very large and complex data sets are produced for multiple sample replicates and surface positions. The review by Fletcher and Vickerman provides an excellent overview of the field, developments over time, and key applications1. Most ToF-SIMS analysis consists of manual assignment of peaks and the use of relatively simple, linear clustering or dimensional reduction methods like principal components analysis (PCA). Beyond being restricted to linear relationships, PCA has also been shown to perform poorly for data sets containing many low intensity features of questionable significance, making it difficult to discriminate samples and features in a given data set2. This is a scenario that commonly occurs with large, high resolution ToF-SIMS data sets. Researchers typically interrogate ToF-SIMS data sets through the selection and investigation of a small number of high intensity peaks. ToF-SIMS laboratories are building peak lists containing hundreds or thousands of peaks and use methods such as PCA to study variance in the data, the peaks of greatest analytical significance, and relationships between groups of samples and groups of mass spectral peaks.
More recently, papers have appeared that analyse large ToF-SIMS data sets using non-negative matrix factorisation3, or random vector algorithms and GPU calculations4-5, and modifications and customisations of standard multivariate analytical tools6-8. Tuccitto9 has reported the development of a wavelet-based approach for data compression and noise removal with very large ToF-SIMS data sets prior to PCA. Rapid advances in the technology of ToF-SIMS are therefore driving an urgent need for more efficient and robust large-scale data analysis methods using more sophisticated machine learning and statistical approaches. Artificial neural networks (ANNs) provide a broad framework for classifying and clustering large scale, multivariate data sets. Self-organising maps (SOMs), a type of artificial neural network, have been used to analyse the ToF-SIMS data sets of minerals10 and several other types of data11. An early initiative by the Surface Analysis Research Centre at the University of Manchester (previously UMIST) created ‘NeuroSpectraNet’, a novel approach for analysing static SIMS (SSIMS) data with neural networks, in particular, using a self-organizing procedure similar to the ART2 adaptive resonance architecture12. They used it to compare experimental results with a large library of ToF-SIMS spectra13 and identify samples. Spectra were discretised to 1 m/z mass segments and normalised by total ion intensity. Positive and negative data were combined to form composite spectra, and the resulting processed data set was analysed using the ART2 algorithm13. A model was constructed based on the similarities between input spectra and categorised by chemical similarities. The resulting model was adaptive in that it was able to model features as the library grew over time. Identification of unknown materials was demonstrated and different types of polymers were discriminated. The approach was successfully extended to the classification of adsorbed protein films using SOMs14.

Our group has explored the application of SOMs to biodiagnostic assays (sandwich ELISA format), characterising antibodies bound to tailored surfaces15-19. Identification of preferential antibody orientation when attached to surfaces via a metal affinity binding approach used in the ELISA assays has been demonstrated by ToF-SIMS and self-organising maps20. This work has been extended to consider limits of detection of surface bound antibodies21 and the binding mechanism for antibodies at different polymeric surfaces22. The group recently reported a new approach to the analysis of ToF-SIMS data, in which very large ToF-SIMS data matrices are generated and analysed by multivariate analysis using SOMs23. Mass spectra are represented by a series of discrete mass segments, each 0.01 m/z in width; we refer to this process as up-binning. The discretization of the data is rapid; peak assignment is unnecessary, and no peak overlaps are resolved. The same series of mass segments is overlaid on all mass spectra comprising the data set. The only additional steps required are the careful and consistent mass calibration of all spectra and normalisation of the data by total ion intensity. The up-binning proof of concept study23 used four polymers (PET, PTFE, PMMA and LDPE), each of substantially different composition and structure. The polymers were successfully classified and discriminated using self-organising maps. Significantly, spectra collected from the same polymer type with six different primary ions were also successfully distinguished. The success of the up-binning / SOM approach in discriminating between polymers that were chemically different immediately raised the question of how sensitive the analysis method was to polymer composition. To address this question, we now report results of the application of this approach to ToF-SIMS data from industrially relevant polymers of very similar composition. A group of seven polyamide materials from the nylon family was selected. These are commonly used nylons with similar molecular structures, wide applications, and significant economic importance in the manufacturing sector. Our goal is to accelerate ToF-SIMS data analysis using automated workflows and fast, robust computational analysis methods that minimise analyst bias and have broad applications. We aim to develop extensible reference libraries that can be automatically interrogated and interpreted by bespoke artificial neural networks. These libraries will form a core asset for research and development, production support, quality control and forensic investigation, especially for the global polymer sector.

Experimental

Nylon Sample Preparation

Polyamide samples (commercial nylon types) were obtained from Scientific Polymer Products, Inc. (Sp2), Ontario, NY, USA (Polymer Sample Kit, Catalogue No. 205). The following materials were selected for this study: poly(caprolactam) (Nylon 6: CAS# 25038-54-4, Cat. No. 034), poly(undecanoamide) (Nylon 11: CAS# 25587-80-8, Cat. No. 006), poly(lauryllactam) (Nylon 12: CAS# 25038-74-8, Cat. No. 044), poly(trimethylhexamethylene terephthalamide) (Nylon 6(3)T: CAS# 25497-66-9, Cat. No. 331), poly(hexamethylene adipamide) (Nylon 6/6: CAS# 32131-17-2, Cat. No. 033), poly(hexamethylene azelamide) (Nylon 6/9: CAS# 27136-65-8, Cat. No. 156) and poly(hexamethylene dodecanediamide) (Nylon 6/12: CAS# 26098-55-5, Cat. No. 313). Sample information and molecular structures are shown in Table 1. For simplicity, the samples will be referred to by their Nylon descriptors.

Table 1: Summary of the 7 different nylon samples used in this study.

Sample Name                                     Nylon No.   Molecular Formula      Monomer Molecular Weight (g/mol)
poly(caprolactam)                               N6          (C6H11NO)n             113.16
poly(undecanoamide)                             N11         (C11H21NO)n            183.30
poly(lauryllactam)                              N12         (C12H23NO)n            197.32
poly(trimethylhexamethylene terephthalamide)    N6(3)T      (C8H4O2.C9H20N2)n      288.39
poly(hexamethylene adipamide)                   N6/6        (C12H22N2O2)n          226.32
poly(hexamethylene azelamide)                   N6/9        (C15H28O2N2)n          268.40
poly(hexamethylene dodecanediamide)             N6/12       (C18H34O2N2)n          310.48

[Polymer Structure column: structure drawings not recoverable from the source]

The samples were supplied in pellet form (globular, cylindrical or cube). Pellets were cut using clean scalpel blades to create relatively flat surfaces for mounting and freshly cleaved surfaces for ToF-SIMS analysis. Samples were secured to the mounting plate using double sided Scotch tape (Cat. No. 665).

Time-of-Flight Secondary Ion Mass Spectrometry (ToF-SIMS) Acquisition

ToF-SIMS data were acquired with an Ion-Tof TOF.SIMS V instrument (Ion-Tof GmbH, Münster, Germany) equipped with a Bi/Mn liquid metal primary ion source and time-of-flight mass analyser; pulsed 30 keV Bi3+ primary ions were used in bunched mode for acquisition. Spectra were collected using both positive and negative secondary ion polarities, from five separate 100 µm × 100 µm areas on each sample surface (128 × 128 pixel density). Low energy electron flooding, surface potential bias adjustment, random raster and adjustment of cycle time for primary ion pulses (between 100 – 150 µs cycle time) were employed to optimise charge compensation conditions. Main chamber pressure was maintained below 1.6 × 10⁻⁷ mbar during each acquisition. The Bi3+ primary ion current was in the range 1.00 – 1.50 pA. There were some variations in primary ion dose density between samples due to differences in pellet topography. Positive ion mass spectra were calibrated using mass spectral peaks assigned to C+, CH3+, C3H2+, C4H3+, C5H5+ and C7H7+, with mass resolution (m/∆m) at C4H5+ (53.04 m/z) greater than 2400. Similarly, negative ion mass spectra were calibrated using mass spectral peaks assigned to CH2-, C3-, C3H-, C4-, C4H- and C5H-, with mass resolution at C3- (36.00 m/z) greater than 2000. Sample charging and sample topography resulted in moderately low mass resolutions. Supplementary Figures 1 and 2 show peak overlays at several masses, highlighting the level of consistency achieved in spectral calibration and the practical extent of peak overlaps in the spectrum. An average deviation of 30 ppm was achieved for each mass spectral peak chosen for spectral calibration. A total of 280 ToF-SIMS spectra were acquired; 140 positive ion spectra and 140 negative ion spectra. The data set comprised 20 spectra for each of 7 Nylon sample types for both positive and negative secondary ion polarities. Four physical replicates and five spectral replicates were employed for each sample type, resulting in 20 spectra per sample type for each polarity.
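
The calibration figures quoted above follow from the usual definitions of mass deviation (in ppm) and mass resolution (m/∆m). The following minimal Python sketch illustrates those two definitions only; the function names and the example measured value are hypothetical and are not taken from the study.

```python
def ppm_deviation(measured_mz: float, theoretical_mz: float) -> float:
    """Signed mass deviation in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def peak_width_from_resolution(mz: float, resolution: float) -> float:
    """Peak width (delta m) implied by a mass resolution m/delta_m at a given m/z."""
    return mz / resolution


# Hypothetical example: a peak expected at 53.04 m/z that is observed at 53.0416 m/z.
dev = ppm_deviation(53.0416, 53.04)
# Width implied by the quoted lower bound m/delta_m = 2400 at 53.04 m/z.
width = peak_width_from_resolution(53.04, 2400)
print(f"deviation = {dev:.1f} ppm, implied peak width = {width:.4f} m/z")
```

Under these definitions, a resolution of 2400 at 53.04 m/z corresponds to a peak roughly 0.02 m/z wide.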

Discretisation of mass spectra

The mass axis of each mass spectral data set was segmented into 0.01 m/z mass intervals over the 1 – 500 m/z mass range, as described previously23. This resulted in each spectrum being described by the same universal set of 50,000 mass segments. The data matrix formation process comprised only the following steps: spectral calibration, normalisation by total ion intensity, peak list creation by mass segmentation within the Ion-Tof data system (SurfaceLab 6), and assembly of the data matrix for PCA and ANN analysis. Universal data matrices for positive ion spectra and negative ion spectra were formed, each containing 7 million entries. Supplementary Figure 3 shows an example of the segmentation of a peak cluster for the group of samples.
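
The segmentation and matrix assembly described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the SurfaceLab 6 procedure used in the study: the function names are hypothetical, the lower bin edge is assumed to be 0 m/z so that exactly 50,000 segments result, normalisation is applied to the binned intensities, and the spectra are random stand-ins rather than ToF-SIMS data.

```python
import numpy as np

BIN_WIDTH = 0.01  # m/z segment width, as stated in the text
MZ_MAX = 500.0    # upper mass limit, as stated in the text
MZ_MIN = 0.0      # assumption: lower edge chosen so that 50,000 segments result
N_BINS = int(round((MZ_MAX - MZ_MIN) / BIN_WIDTH))  # 50,000


def up_bin(mz: np.ndarray, intensity: np.ndarray) -> np.ndarray:
    """Sum intensities into fixed-width mass segments, then normalise by total intensity."""
    edges = np.linspace(MZ_MIN, MZ_MAX, N_BINS + 1)
    binned, _ = np.histogram(mz, bins=edges, weights=intensity)
    total = binned.sum()
    return binned / total if total > 0 else binned


def build_matrix(spectra) -> np.ndarray:
    """Stack binned spectra (one row per spectrum) into a universal data matrix."""
    return np.vstack([up_bin(mz, inten) for mz, inten in spectra])


# Synthetic stand-in data: 140 random spectra (7 sample types x 20 spectra each).
rng = np.random.default_rng(0)
spectra = [(rng.uniform(1.0, 500.0, 2000), rng.random(2000)) for _ in range(140)]
X = build_matrix(spectra)
print(X.shape, X.size)
```

With 140 spectra and 50,000 segments, the matrix holds 140 × 50,000 = 7,000,000 entries, which matches the 7 million entries quoted for each polarity.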

Multivariate Analysis

Multivariate analyses were conducted on each of the data matrices using PCA and via the creation of unsupervised and supervised SOMs using approaches previously reported by this group17,23. A self-organising map generates a model using the input data matrix and any classification used. Samples are clustered and classified based on their similarities, with the map providing a visual representation of the input data matrix. PCA was performed using PLS_Toolbox (Version 8.5) (Eigenvector Research, Manson, WA) via MATLAB R2017a (The MathWorks Inc., USA). The pre-processing steps performed comprised normalisation and mean centring. The first four principal components (PCs) were selected for each analysis. Artificial neural networks, in the form of SOMs, were constructed, trained and tested using the Kohonen and CP-ANN Toolbox (Version 3.8) (Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Italy)24-25, via MATLAB R2016b. Unsupervised Kohonen Networks (UKNs) and Supervised Kohonen Networks (SKNs) were employed. All networks were generated using the same conditions with the following network characteristics: 6 × 6, 8 × 8, and 10 × 10 neuron networks, hexagonal topology, toroidal boundary conditions, batch training, and random initialisation with 100,000 epochs. SOMs were calculated using Dell Precision T1700 workstation PCs, featuring Intel Xeon 8 Core Processors and 16 GB of RAM, under Windows 7 (64 bit).
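
To make the network characteristics listed above more concrete, the following NumPy sketch shows batch training of an unsupervised map with random initialisation and toroidal boundary conditions. It is not the Kohonen and CP-ANN Toolbox and makes several simplifying assumptions that should be read as such: a square rather than hexagonal grid, a Gaussian neighbourhood that shrinks geometrically, far fewer epochs than 100,000, and synthetic input data. All function and parameter names are hypothetical.

```python
import numpy as np


def toroidal_grid_distances(n: int) -> np.ndarray:
    """Pairwise squared grid distances between neurons on an n x n map with wrap-around edges."""
    rows, cols = np.divmod(np.arange(n * n), n)
    dr = np.abs(rows[:, None] - rows[None, :])
    dc = np.abs(cols[:, None] - cols[None, :])
    dr = np.minimum(dr, n - dr)  # top joins bottom
    dc = np.minimum(dc, n - dc)  # left joins right
    return dr**2 + dc**2


def best_matching_units(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Index of the closest neuron (squared Euclidean distance) for each sample."""
    d = (X**2).sum(axis=1)[:, None] - 2 * X @ W.T + (W**2).sum(axis=1)[None, :]
    return d.argmin(axis=1)


def train_batch_som(X, n=10, epochs=50, sigma_end=0.5, seed=0):
    """Batch-train an n x n map; returns neuron weights of shape (n*n, n_features)."""
    rng = np.random.default_rng(seed)
    sigma_start = n / 2
    # Random initialisation within the per-feature range of the data.
    W = rng.uniform(X.min(axis=0), X.max(axis=0), size=(n * n, X.shape[1]))
    d2 = toroidal_grid_distances(n)
    for epoch in range(epochs):
        frac = epoch / max(epochs - 1, 1)
        sigma = sigma_start * (sigma_end / sigma_start) ** frac  # shrinking neighbourhood
        bmu = best_matching_units(X, W)
        H = np.exp(-d2[:, bmu] / (2 * sigma**2))  # neighbourhood weights (neurons x samples)
        W = (H @ X) / H.sum(axis=1, keepdims=True)  # batch update: weighted mean of samples
    return W


# Synthetic stand-in data: three well-separated groups of 20 samples each.
rng = np.random.default_rng(1)
centres = rng.normal(size=(3, 20)) * 5
X = np.vstack([c + rng.normal(scale=0.3, size=(20, 20)) for c in centres])
W = train_batch_som(X, n=6, epochs=50)
bmu = best_matching_units(X, W)
print(W.shape, bmu.shape)
```

In a batch scheme every neuron is updated once per epoch from all samples at the same time, which is why no learning rate appears in the sketch.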

Results and Discussion

The seven polyamide materials under investigation are similar in composition and structure (Table 1). The major differences between the nylon polymers are the presence of aromatic and additional amide moieties and monomer and polymer molecular weights. We compared the abilities of the nonlinear up-binning / SOM method and traditional linear PCA methods to discriminate between these polymers of similar type.

Principal Components Analysis

Figures 1 and 2 summarize the results of PCA of the positive and negative ion spectra of the 7 nylon samples. The most obvious observation is that PCA does a poor job of discriminating between the nylon samples, with the possible exception of N6 and N63T. Figure 1 shows PCA for the positive ion data matrix (7 million entries). The first four principal components (PCs) account for 72.38% of the variance in the data set. Considering the PC1 vs PC2 scores plot, considerable scatter in individual scores is observed. This scatter arises from spectral noise and variance introduced by the mass segmentation process. Indeed, a number of sample types show a distribution of both positive and negative scores. The N63T sample type is most strongly clustered, most likely arising from the distinct features of the monomer molecular structure in comparison with the other materials. The scores highlight surface composition variance, including for example silicon-containing surface contaminants, for a single sample pellet (5 spectra), resulting in two clusters of scores. The PC3 vs PC4 scores plot again is characterised by poor discrimination of species. N6 and N63T are separated from the remaining species in PC3 and PC4, respectively. Variance within sample sets is highlighted more significantly than variance between sample sets.
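
For readers unfamiliar with how scores and explained variance are obtained, the sketch below computes a mean-centred PCA through a singular value decomposition in NumPy. It is not PLS_Toolbox and is run on random stand-in data, so the numbers it prints bear no relation to the percentages reported here; the function name and the matrix dimensions are hypothetical.

```python
import numpy as np


def pca_scores(X: np.ndarray, n_components: int = 4):
    """Mean-centre X and return (scores, loadings, percent variance explained)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = 100 * s**2 / np.sum(s**2)
    scores = U[:, :n_components] * s[:n_components]
    return scores, Vt[:n_components], explained[:n_components]


# Synthetic stand-in: 140 "spectra" with 500 variables, normalised so each row sums to 1.
rng = np.random.default_rng(0)
X = rng.random((140, 500))
X = X / X.sum(axis=1, keepdims=True)
scores, loadings, explained = pca_scores(X)
print(scores.shape, explained.round(2))
```

Each row of `scores` gives the position of one spectrum in a scores plot, and the entries of `explained` are the per-component percentages of the kind quoted on the plot axes.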

Figure 1. PCA of discretised ToF-SIMS positive ion mass spectra, displaying a) PC1 (27.65%) vs PC2 (23.14%) scores plot and b) PC3 (11.63%) vs PC4 (9.96%) scores plot. The sample groups are labelled as: red diamonds – N6 (Nylon 6), green squares – N11 (Nylon 11), navy triangles – N12 (Nylon 12), cyan inverted triangles – N63T (Nylon 6(3)T), pink stars – N66 (Nylon 6/6), yellow circles – N69 (Nylon 6/9) and dark green diamonds – N612 (Nylon 6/12) with 20 data points representing each sample group. Singular PC plots (positive ions and negative ions) are shown in Supplementary Figures 4 and 5, respectively. The top 50 positive and negative loadings for PC1 – PC4 (positive ions and negative ions; assigned mass bins) are shown in Supplementary Tables 1 – 8, respectively.

Figure 2 shows the equivalent PCA for the negative ion matrix (7 million data entries). In this case, the first four principal components (PCs) account for 91.52% of the variance in the data set. Irrespective of the particular analytical perspective applied to the PCA results, no clear trends or discrimination of sample types are evident.

Figure 2. PCA of discretised ToF-SIMS negative ion mass spectra, displaying a) PC1 (63.48%) vs PC2 (14.50%) scores plot and b) PC3 (7.85%) vs PC4 (5.69%) scores plot. The sample groups are labelled as: red diamonds – N6 (Nylon 6), green squares – N11 (Nylon 11), navy triangles – N12 (Nylon 12), cyan inverted triangles – N63T (Nylon 6(3)T), pink stars – N66 (Nylon 6/6), yellow circles – N69 (Nylon 6/9) and dark green diamonds – N612 (Nylon 6/12) with 20 data points representing each sample group. Singular PC plots (positive ions and negative ions) are shown in Supplementary Figures 4 and 5, respectively. The top 50 positive and negative loadings for PC1 – PC4 (positive ions and negative ions; assigned mass bins) are shown in Supplementary Tables 1 – 8, respectively.

Figure 1 and Figure 2 highlight the differences between PCA using conventional peak lists and PCA using a very large mass segmented data list. PCA is inherently sensitive to variance. The distribution of variance across the PCs for both data matrices indicates that PCA is responding to random variations present in ToF-SIMS spectra and those introduced by the arbitrary nature of the mass segmentation process. In contrast with conventional PCA applied to user-defined peak lists where a single variable in the data matrix represents an entire mass spectral peak, our approach analyses the entirety of the mass spectrum, discrete mass spectral peaks and spectral noise or background alike. Variables arising from spectral peaks and from background are not distinguished or prioritised. Variance associated with background therefore may have a significant impact on PCA results. There is some sensitivity to monomer structure but this is a second order effect. Limitations in the accuracy of spectral calibration are also amplified by the segmentation process that takes no account of peaks, peak integrity or peak overlap. Some level of variation between replicates is inevitable. It is evident from Figure 1 and Figure 2 that, when applied to the large mass segmented data matrices, PCA fails to provide a satisfactory discrimination or classification of the sample types.

Self-Organising Maps – Unsupervised Kohonen Networks (UKNs)

In contrast to linear PCA methods, self-organising maps are constructed by examining and accounting for similarities between samples and replicates. This approach is much more tolerant of variations between samples. Figure 3 shows a sequence of 6 × 6, 8 × 8 and 10 × 10 sized neural networks calculated from the positive ion data matrix. These self-organising maps were created using an unlabelled (unsupervised) approach in which the map is seeded randomly and allowed to converge over 100,000 calculation epochs. Thus, there is no prior classification or intervention in the formation or initiation of the map structure, minimising external bias. Given the difficulty of the discrimination task, a sequence of map sizes was used to determine the most appropriate conditions that best classify the data sets. A total of 140 samples were used in the sample set, comprising 7 different types of nylon (sample groups) each represented by 20 spectra (4 physical replicates (different physical samples of the same polymer) and 5 spectral replicates (spectra from different points on the same physical sample)). An optimum solution is the identification of a map size that allows the 7 sample groups to be uniquely represented with zero or minimal overlap. In addition, the map must be sized to accommodate and represent the inherent variability in each sample group arising from the underlying spectra and the variance introduced by the mass segmentation process. A sample group will be represented by a cluster of neurons.

Figure 3 shows the assignment of sample types to each neuron on the basis of greatest weighting. For example, three neurons are coloured red in the 6 × 6 UKN and are assigned to N6. Recalling the toroidal topology of the network map (left joins right and top joins bottom), these neurons are adjacent to one another and form a cluster. The trailing number in the neuron label, for example “N6-5” for the upper left neuron, indicates the number of replicates assigned to this location. Adding the trailing numbers for the three red neurons gives a total of 20 replicates, accounting for all N6 ToF-SIMS spectra in the matrix.

The 6 × 6 UKN shows several overlaps between sample types, including overlaps between N12 and N11, overlaps between N66 and N69, and N63T and N69. While this is not unexpected given the molecular similarity between the monomer species, the map is likely too small to allow complete discrimination of the sample types. Approximately 14% of neurons have more than one sample type assigned with greatest weighting. The 8 × 8 UKN (64 neurons in total) shows an improved discrimination of sample types, with clusters of neurons better defined and better separated. The number of neurons with multiple sample type assignments has fallen to 6%. The 10 × 10 UKN (100 neurons in total) shows only one neuron (1%) with multiple sample type assignments. Sample groups are generally well separated and discrete.

The clean UKN classification outcome is in strong contrast with the limited capacity of PCA to discriminate sample types (Figure 1). In the case of PCA, the analysis may have been overwhelmed by variance introduced from the segmentation of each spectrum to form the data matrix. This segmentation process incorporates all signal intensity over the range 0 – 500 m/z, including all noise signal and all parts of the spectrum with little or no information content, for example, flat or noisy background regions. The UKN is calculated on the basis of similarities between replicates rather than differences. The UKN algorithm is shown to be remarkably tolerant to noise and random variance. This tolerance allows the data matrix to be constructed with minimal attention to de-noising and data curation. Good calibration of mass spectra (30 ppm or better) remains critical to the success of the analysis.

Examining the 10 × 10 UKN in more detail, a number of observations may be made in relation to molecular structure. N11 and N12 neurons are separated but collectively occupy the same portion of the map. The positioning of a number of N11 and N12 samples in immediately adjacent neurons indicates that the neural network has judged these samples to be very similar. This is indeed the case with the molecular structure of N11 and N12 differing only by a single CH2 group.

It is evident that the N63T sample type has a unique structure compared to the remaining nylon materials. A group of 15 replicates (3 physical samples) is located on a single neuron discrete from the other materials. A second group of 5 replicates (a single physical sample) is located in another part of the map adjacent to the N69 sample type, characterised by a molecular structure including a greater proportion of hydrocarbon chains not present on the other N63T samples. This result was also identified in positive PCA results. This neatly indicates both the capacity of the UKN to identify and tightly classify a significantly different molecular structure and the impact of sample-to-sample variation, in this case highlighted as a single Nylon pellet. The sample in question likely held some surface contamination. This was consistently detected and classified via the five replicate spectra collected from the pellet.


Sample types N66, N69 and N612 again have very similar molecular structures, differing only in the number of CH2 groups in the monomer. The UKN discriminates and clusters these groups successfully with only 2 neurons showing classification of one type with or adjacent to another (neurons labelled N69-5 and N66-1 / N612-3). The 10 × 10 UKN separated all the samples into discrete groups across the map and was selected for all successive SKN calculations.
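The clustering behaviour described above, where similar spectra land on the same or adjacent neurons, can be illustrated with a minimal unsupervised Kohonen map. This is a sketch under stated assumptions and not the toolbox implementation used in the study: the function names, learning rate, neighbourhood schedule and the use of a flat (non-toroidal) grid are all illustrative choices, and the toy vectors stand in for rows of the data matrix.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position of the neuron whose weights are closest to x."""
    return min(
        weights,
        key=lambda pos: sum((w - v) ** 2 for w, v in zip(weights[pos], x)),
    )


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small unsupervised Kohonen map (hypothetical helper).

    data is a list of equal-length feature vectors. All hyperparameters
    here are assumptions chosen for illustration only.
    """
    rng = random.Random(seed)
    # Initialise every neuron with a copy of a randomly chosen sample.
    weights = {
        (r, c): list(rng.choice(data))
        for r in range(rows)
        for c in range(cols)
    }
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        lr = lr0 * decay                   # learning rate shrinks linearly
        sigma = max(sigma0 * decay, 0.5)   # neighbourhood radius shrinks too
        for x in rng.sample(data, len(data)):
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                # Gaussian neighbourhood on a flat grid (no wrap-around).
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))
                for i, xi in enumerate(x):
                    w[i] += lr * h * (xi - w[i])
    return weights
```

After training, calling `best_matching_unit(weights, spectrum)` for each replicate gives the neuron on which that replicate is placed; counting replicates per neuron yields labels of the form "N66-4" used in the figures.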

[Figure 3 panels: Positive Ion Spectra 6 × 6 UKN; Positive Ion Spectra 8 × 8 UKN; Positive Ion Spectra 10 × 10 UKN]

Figure 3. 6 × 6, 8 × 8 and 10 × 10 Unsupervised Kohonen Networks (UKNs) calculated separately for the positive ion data matrix. The UKNs are false-coloured to identify where the samples are located on each network and colours refer to: red – N6 (Nylon 6), green – N11 (Nylon 11), navy – N12 (Nylon 12), cyan – N63T (Nylon 6(3)T), pink – N66 (Nylon 6/6), yellow – N69 (Nylon 6/9) and dark green – N612 (Nylon 6/12). The text labels are displayed as sample name and number of hits (for example, N66-4 displayed on a neuron shows 4 Nylon 6/6 samples are located on this neuron) and dashed neurons (no text label) show no samples located on this neuron. An example of the top 50 most weighted mass segments for each neuron assigned to Nylon 6 for the 10 × 10 positive ion UKN is shown in Supplementary Table 9.

Figure 4 shows an equivalent sequence of three UKNs calculated using the negative ion data matrix, in the form of 6 × 6, 8 × 8 and 10 × 10 sized neural networks. Lower mass resolutions were achieved for negative ion mass spectra and spectral quality was impacted by the highly insulating nature of the samples. As with the UKNs calculated for the positive ion data matrix, the 6 × 6 UKN was completed rapidly but shows considerable overlaps between sample groups. The map is too small to cluster and discriminate sample types completely. More than 30% of the neurons contained two or in some cases three sample type assignments. The 8 × 8 UKN provides better discrimination of sample types; however, many neuron overlaps remain. Approximately 19% of neurons show two or more assignments. The 10 × 10 UKN provides the best discrimination of sample types, with overlaps reduced to 5% of neurons. The larger map size spreads out the sample data and provides a better opportunity for sample types to cluster. Again, the 10 × 10 UKN has been demonstrated to be the optimum map size for this system. A larger neural network will not necessarily provide further discrimination, as the same sample overlaps may occur. Inspection of the 10 × 10 UKN indicates that the discrimination and clustering of sample types is noticeably weaker than the equivalent analysis for the positive ion data matrix. This arises principally from the inherently poorer mass resolution encountered with negative ion ToF-SIMS data and the structural specificity of the population of molecular ions generated. Overlaps are observed for groups of sample types such as N11 and N12, N6 and N66, and N66, N69 and N612. These overlaps are a product of the similarities between each sample type. The groups N11 and N12, and N66, N69 and N612, for example, are distinguished only by small changes in the number of CH2 groups in the monomer backbone. Even N63T, with a unique phenyl structure present in the monomer unit, has assignments to neurons immediately adjacent to N66 with less distinct clustering.
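The overlap percentages quoted for each map size can be reproduced from a list of neuron assignments. The sketch below is illustrative only: `overlap_fraction` is a hypothetical helper, and the text does not state whether the percentage is taken over all neurons or only over occupied neurons, so both denominators are supported.

```python
from collections import defaultdict


def overlap_fraction(assignments, n_neurons=None):
    """Fraction of neurons hosting two or more different sample types.

    assignments: iterable of (neuron, sample_type) pairs, one per spectrum.
    n_neurons:   total neurons on the map (e.g. 100 for a 10 x 10 map).
                 If omitted, only occupied neurons are counted (assumption).
    """
    types_on_neuron = defaultdict(set)
    for neuron, sample_type in assignments:
        types_on_neuron[neuron].add(sample_type)
    # A neuron is "overlapping" when more than one sample type hits it.
    mixed = sum(1 for types in types_on_neuron.values() if len(types) >= 2)
    denominator = n_neurons if n_neurons is not None else len(types_on_neuron)
    return mixed / denominator
```

For example, five mixed neurons on a 10 × 10 map give `5 / 100 = 0.05`, matching the form of the 5% figure quoted for the 10 × 10 negative ion UKN.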

[Figure 4 panels: Negative Ion Spectra 6 × 6 UKN; Negative Ion Spectra 8 × 8 UKN; Negative Ion Spectra 10 × 10 UKN]

Figure 4. 6 × 6, 8 × 8 and 10 × 10 Unsupervised Kohonen Networks (UKNs) calculated separately for the negative ion data matrix. The UKNs are false-coloured to identify where the samples are located on each network and colours refer to: red – N6 (Nylon 6), green – N11 (Nylon 11), navy – N12 (Nylon 12), cyan – N63T (Nylon 6(3)T), pink – N66 (Nylon 6/6), yellow – N69 (Nylon 6/9) and dark green – N612 (Nylon 6/12). The text labels are displayed as sample name and number of hits (for example, N66-4 displayed on a neuron shows 4 Nylon 6/6 samples are located on this neuron) and dashed neurons (no text label) show no samples located on this neuron. An example of the top 50 most weighted mass segments for each neuron assigned to Nylon 6 for the 10 × 10 negative ion UKN is shown in Supplementary Table 10.

Self-organising Maps - Supervised Kohonen Networks (SKNs)

An additional analytical approach was pursued: clustering and discrimination of sample types through the use of SKNs. While generally similar to UKNs, SKNs are provided with a set of polymer classes or labels, but then are free to create an optimised classification. Figure 5 shows a 10 × 10 SKN calculated using the positive ion data matrix, using 7 classes corresponding to the 7 sample types. Also shown are the class weights for each of the sample types. As with the equivalent UKN, all sample types were discriminated with strong clustering. N11 and N12 overlap at two neurons and the N11 and N12 domains on the map are closely associated. This is consistent with the very similar molecular structure of the two monomers. The maps of class weights provide additional insights into the information presented in the SOMs. The sample type assignments on the coloured maps are made by assigning each sample to the neuron of highest class weight. However, the maps of class weight highlight that the weighting for each spectrum and for each class is not typically assigned to a single neuron. A spectrum may be weighted across a group of neurons, with lower weightings surrounding the winning neuron. Taking the constituent spectra together, the class weight is then applied to a domain on the map. Taking Class 7 (N612) as an example, the class weight overlays onto a tight cluster of neurons, with those at the perimeter of the cluster showing a tapering class weight. Classes 1, 3, 4, 6, and 7 all show tight clustering and therefore discrimination, with only minor exceptions. Classes 2 and 5 are less well defined but nevertheless classified at the neuron level.


The class weights of Class 2 (N11) are closely adjacent to those of Class 3 (N12), undoubtedly due to the molecular similarity discussed above. Again, recall that the map has a toroidal topology (left joins right and top joins bottom). It is notable that N6, N612 and N63T appear in particularly tight clusters or contiguous regions on the map. N63T again shows spectral assignment to just a handful of neurons, emphasising the dissimilarity of the material to the remaining types. The outlying neuron from the UKN now appears much closer to the main N63T group and the class weights form a tight and continuous domain. This outcome reinforces the conclusion that SOMs can adequately discriminate and classify chemically similar materials using a very large data matrix derived from a mass segmentation approach.
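The toroidal topology matters when judging whether two neurons are neighbours: neurons on opposite edges of the printed map are adjacent. A minimal sketch of a wrap-around grid distance is given below. The function name and the choice of a Euclidean metric are assumptions made for illustration; the toolbox used in the study may define neighbourhoods differently.

```python
def toroidal_distance(a, b, rows=10, cols=10):
    """Euclidean distance between grid positions a and b on a torus.

    a and b are (row, column) tuples. On a torus the left edge joins the
    right edge and the top joins the bottom, so along each axis the
    shorter of the direct and the wrapped separation is used.
    """
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    dr = min(dr, rows - dr)  # wrap vertically
    dc = min(dc, cols - dc)  # wrap horizontally
    return (dr * dr + dc * dc) ** 0.5
```

On a 10 × 10 map, the neurons at (0, 0) and (9, 0) are one step apart under this definition, even though they sit on opposite edges of the figure.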

Figure 5. 10 × 10 Supervised Kohonen Network (SKN) calculated for the positive ion data matrix. The output SKN has been false-coloured for clarification of the sample locations on the network according to their sample group and the colours refer to: red – N6 (Nylon 6), green – N11 (Nylon 11), navy – N12 (Nylon 12), cyan – N63T (Nylon 6(3)T), pink – N66 (Nylon 6/6), yellow – N69 (Nylon 6/9) and dark green – N612 (Nylon 6/12). The calculated class weightings for each of the 7 classified sample groups are displayed on the right, showing the class weighting on each neuron with darkest neurons most strongly weighted and lightest neurons least weighted for each class. The text labels are displayed as sample name and number of hits (for example, N66-4 displayed on a neuron shows 4 Nylon 6/6 samples are located on this neuron) and dashed neurons (no text label) show no samples located on this neuron. The misclassification rate was 2% for this SKN, with 98% accuracy in the class assignment (3 samples were incorrectly assigned to the wrong class) for the samples located on each neuron. The top 50 highest weighted mass segments for each class and their most probable identification are shown in Supplementary Table 11.

Figure 6 shows a 10 × 10 SKN calculated using the negative ion data matrix, using 7 classes corresponding to the 7 sample types. In comparison with the positive ion SKN, the sample domains are not completely separated from each other, with numerous overlaps in evidence. These overlaps are similar to those seen for the 10 × 10 negative UKN in Figure 4, again highlighting the complexity and comparative limitations of the negative ion mass spectra. Best clustering is observed for Class 2 (N11), in contrast with the results for the positive ion SKN. N12 occurs in the same domain as N11 with weighting distributed around the N11 neurons. This is again consistent with the similarity between these materials. N63T is moderately well defined in this instance. Notwithstanding the multiple sources of variance and uncertainty present, the neural network has classified the samples, with only 10 samples out of 140 incorrectly clustered on the map. Using classification aids in discriminating the samples but detrimental surface topography, surface charging and moderate spectral resolution introduce limitations.
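The misclassification rates quoted in the captions of Figures 5 and 6 follow directly from the counts of wrongly assigned samples. The short check below is a sketch: the total of 140 samples is stated in the text for the negative ion SKN and is assumed here to apply to the positive ion SKN as well.

```python
def misclassification_rate(n_wrong, n_total):
    """Fraction of samples assigned to the wrong class."""
    return n_wrong / n_total


TOTAL_SAMPLES = 140  # stated for the negative ion SKN; assumed for positive

for label, n_wrong in (("positive ion SKN", 3), ("negative ion SKN", 10)):
    rate = misclassification_rate(n_wrong, TOTAL_SAMPLES)
    print(f"{label}: {rate:.1%} misclassified, {1 - rate:.1%} correct")
```

Rounded to whole percentages, these values agree with the 2% and 7% misclassification rates given in the figure captions.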


Figure 6. 10 × 10 Supervised Kohonen Network (SKN) calculated for the negative ion data matrix. The output SKN has been false-coloured for clarification of the sample locations on the network according to their sample group and the colours refer to: red – N6 (Nylon 6), green – N11 (Nylon 11), navy – N12 (Nylon 12), cyan – N63T (Nylon 6(3)T), pink – N66 (Nylon 6/6), yellow – N69 (Nylon 6/9) and dark green – N612 (Nylon 6/12). The calculated class weightings for each of the 7 classified sample groups are displayed on the right, showing the class weighting on each neuron with darkest neurons most strongly weighted and lightest neurons least weighted for each class. The text labels are displayed as sample name and number of hits (for example, N66-4 displayed on a neuron shows 4 Nylon 6/6 samples are located on this neuron) and dashed neurons (no text label) show no samples located on this neuron. The misclassification rate was 7% for this SKN, with 93% accuracy in the class assignment (10 samples were incorrectly assigned to the wrong class) for the samples located on each neuron. The top 50 highest weighted mass segments for each class and their most probable identification are not included in the Supplementary Information due to insufficient alignment with identifiable mass spectral peaks.

Comparing Figures 3 and 5 (10 × 10 positive ion UKN and SKN) and Figures 4 and 6 (10 × 10 negative ion UKN and SKN), the use of classification adds an additional layer of information to improve the discrimination of the nylon samples. The UKNs provide the initial classification and the SKNs further build on these models to provide the sample class weightings to discern the degree of overlap between the sample groups. This study focussed on classifying a group of chemically similar polyamides based on three different map sizes. The use of a 10 × 10 map and both unsupervised and supervised clustering produced a good classification result from the positive ion data matrix. Clearly, the negative ion data presents additional challenges. The degree and uniqueness of classification is weaker under these conditions. On balance, the map size and number of iterations employed in this work have delivered a positive analytical outcome. A number of opportunities for future work can be identified, including increasing the map size to explore whether this improves classification (albeit at greater computational cost) and identifying the optimum number of iterations for the SOM calculation. Increasing either of these parameters will probably require a high-performance computing environment. Bayesian regularization of the self-organized clustering algorithms may allow automatic optimization of these parameters, as has been shown for other machine learning and classification methods (26-28). While such resources are accessible, our aim was to develop rapid, high-throughput data analysis regimes for ToF-SIMS. Alternatively, adjustments can be made to the data collection parameters, including increasing the number of replicates for negative ion spectra.

PCA provides a certain level of discrimination between the different types of polyamide material but struggles to separate all the samples, especially N66, N69 and N612. Scores arising from the different samples overlap in complex ways, with the highlighted variance apparently unrelated to chemical differences. This is not unexpected, as the up-binning approach generates a very large peak list, with the generation of variance inherent in the mass segmentation process. Large data sets that contain low relevance features are known to degrade PCA performance. The up-binning approach has the advantage that 100% of the mass spectral peak intensity between 0 m/z and 500 m/z is considered in the data evaluation, assisting in the optimal recovery of information content. Many mass segments may correspond with regions of the spectrum not containing peaks of significance or regions containing only background signal. Results in this study show that self-organising maps are remarkably tolerant to such inputs and retain an excellent ability to classify and discriminate important characteristics in the sample set. Advances in data storage and data processing capacity have largely removed the need for preprocessing or peak selection steps to reduce computational overhead. This greatly simplifies experimental workflows and eliminates potential bias. The up-binning approach does not seek to identify peaks and so the resulting “peak list” is simply a list of mass segments or mass channels with associated intensity values. ToF-SIMS data matrices are formed through the assembly of these lists for multiple samples and replicate analyses. The mass segment list (“peak list”) for all spectra and all samples is inherently identical and universal. Any sample type or replicate can be rapidly integrated into the data matrix provided the analytical conditions are broadly similar and a careful and consistent mass calibration has been completed (as illustrated in Supplementary Figure 1). This characteristic paves the way for the construction of large reference libraries, interrogated by SOMs, containing multiple polymer types. Libraries, in principle, could be extended to include a variety of materials.
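The idea of a universal data matrix can be sketched as follows: every spectrum is summed into the same fixed set of mass segments, so any new spectrum yields a row of identical length that can be appended directly. This is an illustration under stated assumptions, not the authors' processing code. The function names are hypothetical and the segment width of 0.5 m/z is an arbitrary placeholder, since the width used in the study is not given in this section.

```python
def up_bin(mz_values, intensities, width=0.5, mz_max=500.0):
    """Sum a spectrum's intensities into fixed-width mass segments.

    width is a placeholder value (assumption). Every spectrum is mapped
    onto the same segments, so all output rows have the same length.
    """
    n_segments = int(round(mz_max / width))
    segments = [0.0] * n_segments
    for mz, intensity in zip(mz_values, intensities):
        if 0.0 <= mz < mz_max:
            # Guard against floating-point edge cases at the top of the range.
            index = min(int(mz // width), n_segments - 1)
            segments[index] += intensity
    return segments


def build_data_matrix(spectra, width=0.5, mz_max=500.0):
    """Stack binned spectra into a matrix with one row per spectrum.

    spectra is a list of (mz_values, intensities) pairs.
    """
    return [up_bin(mz, inten, width, mz_max) for mz, inten in spectra]
```

Because the segment boundaries are fixed in advance, no peak picking is required, and all intensity within the mass range is retained in the row sums. A peak that straddles a segment boundary contributes to both neighbouring segments, which is one source of the variance discussed above.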


Current practice sees peak lists constructed for each sample type. Different samples or classes of samples will, almost inevitably, be described by different peak lists unless a universal peak list has been predetermined. Formation or extension of the data matrix using up-binning is not limited in this way.

Conclusion

In this study, we have applied the up-binning approach to seven nylon samples that each have similar molecular compositions and structure. We have demonstrated clustering and classification of the samples using SOMs, in an unsupervised and supervised manner, directly from a large mass segmented ToF-SIMS data matrix. The universal data matrix was formed by overlaying the same mass segments across each carefully calibrated ToF-SIMS spectrum. These results validate our previous study, involving an ideal data set comprising polymeric materials with very different molecular compositions and structures. The mass range from 1 – 500 m/z encompasses an abundance of features characterising the samples presented and was able to separate each sample set effectively via 10 × 10 UKNs and SKNs. The resulting models are customisable and the neural networks can be reconstructed when new samples are added to the input data matrix. In this study, self-organising maps have been shown to be efficient in classifying different sample types and yet very tolerant of variance in the data matrix resulting, for example, from spectral noise or assignment of intensity from a single peak to multiple mass segments. It is helpful to consider PCA and SOMs side-by-side. With our challenging data sets, PCA can resolve the most obvious differences between the samples (N63T from the rest of the nylon samples) but struggles to resolve other sample types as the burden of noise in the data matrix increases. PCA may also be impacted by spectral issues such as saturated peaks. SOMs, in contrast, are far less susceptible to aberrant data and background, clustering by similarity. SOMs do not provide a perfect classification but do, under our approach, take account of the entire mass spectrum and provide a compelling visualisation of relationships within and between sample types. Our work to date has shown the excellent promise of self-organising maps to support the interpretation of complex and large scale ToF-SIMS data sets. Important questions remain regarding the optimal mass segment size, self-organising map size, and computational convergence. These are the subject of an upcoming study. Consideration must also be given to the information content of ToF-SIMS spectra to inform the choice of efficient data reduction conditions and to avoid information loss. Ultimately, the ToF-SIMS community must evolve efficient analytical workflows, inevitably impacted by machine learning concepts, to extract maximum information content from extremely large and complex data volumes.

Supporting Information

Singular principal component analysis (PCA) scores plots (PC1-PC4) for positive and negative ion spectra; top 50 positive and negative loadings for PC1-PC4 for positive and negative ion spectra; spectral overlays of specific calibration points; spectral overlays of the overlap between spectra at low, mid and high mass units, comparative overlay between actual spectra and binned mass spectra; examples of the top 50 weightings for Nylon 6 on the 10 × 10 positive and negative unsupervised Kohonen networks (UKNs) and top 50 class weightings for each sample set on the positive 10 × 10 supervised Kohonen network (SKN).

The positive and negative ion raw data matrices are available via the La Trobe University institutional repository via the following link, subject to publication embargo: http://dx.doi.org/10.26181/5bab07442d807


Acknowledgments

This work was performed in part at the Australian National Fabrication Facility (ANFF), a company established under the National Collaborative Research Infrastructure Strategy, through the La Trobe University Centre for Materials and Surface Science. The authors acknowledge the Milano Chemometrics and QSAR Research Group for the development of the Kohonen and CPANN Toolbox for MATLAB (24, 25).

References

(1) Fletcher, J. S.; Vickerman, J. C. Anal Chem 2013, 85, 610-639.
(2) Clark, M.; Cramer, R. D. Quantitative Structure-Activity Relationships 1993, 12, 137-145.
(3) Trindade, G. F.; Abel, M. L.; Watts, J. F. Chemometr Intell Lab 2017, 163, 76-85.
(4) Cumpson, P. J.; Sano, N.; Fletcher, I. W.; Portoles, J. F.; Bravo-Sanchez, M.; Barlow, A. J. Surface and Interface Analysis 2015, 47, 986-993.
(5) Cumpson, P. J.; Fletcher, I. W.; Sano, N.; Barlow, A. J. Surface and Interface Analysis 2016, 48, 1328-1336.
(6) Hook, A. L.; Williams, P. M.; Alexander, M. R.; Scurr, D. J. Biointerphases 2015, 10, 019005.
(7) Keenan, M. R.; Windig, W.; Arlinghaus, H. J Vac Sci Technol A 2015, 33, 05E123.
(8) Konicek, A. R.; Lefman, J.; Szakal, C. Analyst 2012, 137, 3479-3487.
(9) Tuccitto, N. Journal of Chemometrics 2018, 32, e2698.
(10) Kalegowda, Y.; Harmer, S. L. Analytica Chimica Acta 2013, 759, 21-27.
(11) Oyabu, M.; Tokutaka, H.; Ohkita, M.; Seno, M.; Ohki, M. In 2014 Joint 7th International Conference on Soft Computing and Intelligent Systems (SCIS) and 15th International Symposium on Advanced Intelligent Systems (ISIS), 2014, pp 1114-1119.
(12) Carpenter, G. A.; Grossberg, S. Appl. Opt. 1987, 26, 4919-4930.


(13) Sanni, O. D.; Henderson, A.; Briggs, D.; Vickerman, J. C. In 12th International Conference on Secondary Ion Mass Spectrometry, A. Benninghoven, P. B., H.-N. Migeon and H. W. Werner, Ed.; Elsevier Science B.V.: Brussels, Belgium, 1999, pp 805-808.
(14) Sanni, O. D.; Wagner, M. S.; Briggs, D.; Castner, D. G.; Vickerman, J. C. Surface and Interface Analysis 2002, 33, 715-728.
(15) Welch, N. G.; Easton, C. D.; Scoble, J. A.; Williams, C. C.; Pigram, P. J.; Muir, B. W. J Immunol Methods 2016, 438, 59-66.
(16) Welch, N. G.; Madiona, R. M.; Easton, C. D.; Scoble, J. A.; Jones, R. T.; Muir, B. W.; Pigram, P. J. Biointerphases 2016, 11, 041004.
(17) Welch, N. G.; Madiona, R. M.; Payten, T. B.; Jones, R. T.; Brack, N.; Muir, B. W.; Pigram, P. J. Langmuir 2016, 32, 8717-8728.
(18) Welch, N. G.; Madiona, R. M.; Scoble, J. A.; Muir, B. W.; Pigram, P. J. Langmuir 2016, 32, 10824-10834.
(19) Welch, N. G.; Scoble, J. A.; Easton, C. D.; Williams, C. C.; Bradford, B. J.; Mamedova, L. K.; Pigram, P. J.; Muir, B. W. Anal Chem 2016, 88, 10102-10110.
(20) Welch, N. G.; Madiona, R. M. T.; Payten, T. B.; Easton, C. D.; Pontes-Braz, L.; Brack, N.; Scoble, J. A.; Muir, B. W.; Pigram, P. J. Acta Biomater 2017, 55, 172-182.
(21) Madiona, R. M. T.; Welch, N. G.; Scoble, J. A.; Muir, B. W.; Pigram, P. J. Biointerphases 2017, 12, 031007.
(22) Welch, N. G.; Lebot, C. J.; Easton, C. D.; Scoble, J. A.; Pigram, P. J.; Muir, B. W. J Immunol Methods 2017, 446, 70-73.
(23) Madiona, R. M. T.; Welch, N. G.; Russell, S. B.; Winkler, D. A.; Scoble, J. A.; Muir, B. W.; Pigram, P. J. Surface and Interface Analysis 2018, 50, 713-728.
(24) Ballabio, D.; Consonni, V.; Todeschini, R. Chemometr Intell Lab 2009, 98, 115-122.
(25) Ballabio, D.; Vasighi, M. Chemometr Intell Lab 2012, 118, 24-32.


(26) Burden, F. R.; Winkler, D. A. Qsar & Combinatorial Science 2009, 28, 1092-1097.
(27) Burden, F. R.; Winkler, D. A. Journal of Medicinal Chemistry 1999, 42, 3183-3187.
(28) Burden, F. R.; Winkler, D. A. Journal of Chemical Information and Modeling 2015, 55, 1529-1534.
