
Article

A New Chemometric Approach for Automatic Identification of Microplastics from Environmental Compartments Based on FT-IR Spectroscopy Gerrit Renner, Torsten C Schmidt, and Jürgen Schram Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.7b02472 • Publication Date (Web): 19 Oct 2017 Downloaded from http://pubs.acs.org on October 21, 2017


Analytical Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Analytical Chemistry

A New Chemometric Approach for Automatic Identification of Microplastics from Environmental Compartments Based on FT-IR Spectroscopy

Gerrit Renner,†,‡ Torsten C. Schmidt,‡ and Jürgen Schram∗,†

†Instrumental Analytical and Environmental Chemistry, Faculty of Chemistry, Niederrhein University of Applied Sciences, Frankenring 20, D-47798 Krefeld, Germany
‡Instrumental Analytical Chemistry and Centre for Water and Environmental Research (ZWU), University of Duisburg-Essen, Universitätsstr. 5, D-45141 Essen, Germany

E-mail: [email protected]
Phone: +49 (0)179 1043284

Abstract

One key step in studying the interactions of microplastics with our ecological system is to identify plastics within environmental samples. Ageing processes and surface contamination, especially with biofilms, impede this characterisation. A complex and time-consuming cleaning procedure is a common solution to this problem. However, it implies an artificial change of sample composition, with a risk of losing important information or even damaging microplastic particles. In the present work, we introduce a new chemometric approach to identify heavily weathered and contaminated microplastics without any cleaning. The main idea of this concept is based on an automated curve fitting of the most relevant vibrational bands in order to calculate a highly characteristic fingerprint that contains all vibrational band area ratios. This new data set is used to estimate the similarity of samples and reference standards for identification. A total of 300 individual naturally weathered plastic particles were measured with Fourier transform infrared spectroscopy in attenuated total reflection mode (FT-IR ATR) and identified successfully with the new method. To that end, all samples were compared with a selection of common reference plastics and biopolymers. As it turns out, the accuracy of identification rises significantly from 76 % with conventional library searching algorithms to 96 % with our new method. Therefore, the new approach can be a useful tool to compare and describe similarities of FT-IR spectra of microplastics, which may improve further research studies on this topic.

Introduction

Infrared (IR) spectroscopy is one of the most common and robust analytical techniques to identify and characterise synthetic polymers, as it is fast, relatively cheap and highly selective. 1–3 Normally, IR spectra of unknown polymer samples are evaluated by comparing them to a library of known reference polymer spectra in order to find matches. For this purpose, large IR spectral databases for polymers are available. 4 It is, therefore, not surprising that IR spectroscopy techniques are also used to identify microplastics within environmental samples. 5–10 In this context, a library searching algorithm calculates a similarity value called the Hit Quality Index for each spectrum within the database and generates a ranked list of the most likely matching substances. It is recommended to double-check this list and identify the unknown samples based on a visual comparison. 11–13 However, analysis of environmental microplastics is a high sample throughput task, and it is not feasible to evaluate every single FT-IR spectrum manually. In practice, microplastics are identified automatically by simply choosing the top matching candidates. 14,15 However, the uncertainty of this classification rises significantly compared with pure virgin plastics due to degradation, 16 (bio)chemical contamination 17,18 or physical scattering effects. 8 For these reasons, complex and time-consuming isolation and cleaning procedures are common solutions, 5,17–22 although this implies an artificial change of sample composition with a risk of damaging or even losing microplastic particles. 19,23–25 Furthermore, there is no guarantee that spectral quality is sufficient to find adequate matches with Hit Quality Indices greater than 0.7 in a reference spectra library, which is one recommended criterion for library searching. 14,26 In such cases, the evaluation method of choice is manually supported interpretation, which requires much experience and is very time consuming. This problem is aggravated at the µm scale, where the number of samples rises even more 7,27–29 and spectral quality is significantly worse. 30,31

In order to overcome these limitations, we present an alternative, very robust method called “Microplastics Identification” (µIDENT), which allows heavily weathered microplastics to be identified and characterised without any complex cleaning procedure. This concept is based on an accurate and fully automated evaluation of almost every individual vibrational band in an FT-IR spectrum, considering that the sample spectra consist of polymer signals interfered with by a highly variable matrix. Therefore, the analytical task is to identify and filter those vibrational bands that belong to microplastics and compare these highly characteristic data sets to a microplastic database. Using only vibrational bands of microplastics for identification instead of the complete spectrum is much more robust, since uncharacteristic information like noise, water content and physical baseline effects is not taken into account. Analogously to conventional library searching, our evaluation algorithm calculates a dissimilarity value, which can be considered an equivalent of a Hit Quality Index.

Materials and Methods

References and Samples

As this study presents a new identification algorithm for FT-IR spectra of microplastics, common and relevant polymers were analysed in a first step to create a customised, representative database of reference standards. For this purpose, pre-consumer plastic granules, powders and films were measured, as listed in Tab. 1. In addition, milled shrimp shells were analysed in order to take typical natural polymers into account as possible matrix components or contaminants. Furthermore, about 2000 individual microplastic particles were collected at two different beach areas, in Texel, northern Netherlands, and Fehmarn, northern Germany. In Texel, sampling was performed after flooding using a net with a mesh size of 0.33 mm, while in Fehmarn all particles were picked up individually along a 100 m stretch of beach, considering that these fragments represent larger microplastics with a diameter > 1 mm. As this study focusses on automatic microplastics identification, we did not separate the analysis results but treated all samples as one analytical task.

Sample Preparation and Measurement Setup

One key aspect of an analytical method is practicability, which includes especially sample pretreatment. Therefore, all references were measured directly without any further preparation step, knowing well that environmental microplastics can be covered with biofilms or other organic contaminants. This risk was deliberately accepted as it also tested the robustness of the new identification algorithm. The one exception was the shrimp shells, which were milled with a RETSCH Ultra Centrifugal Mill ZM 200 with a stainless steel ring sieve (size = 0.5 mm) at 18000 rpm. In addition, all environmental samples were gently cleaned with pure water and dried at 60 °C in a drying cabinet for at least 24 h. To ensure representative results, 300 of the 2000 microplastic particles were chosen at random and analysed. All references and samples were measured with FT-IR ATR at a spectral resolution of 2 cm−1 with 60 scans. All measurements were performed with a Shimadzu IRTracer-100 combined with a Specac Quest ATR Diamond Accessory. Considering that this study focusses on the principles of a new identification algorithm, the very complex analysis of microplastic fibres was not taken into account, as this would also require the use of an FT-IR microscope.

Table 1: Overview of the analysed polymer reference standards.

| Polymer | Label | Shape | Diameter | Source |
| Polyethylene | LDPE | granule | 5 mm | Carat GmbH |
| Polypropylene | PP | granule | 5 mm | Carat GmbH |
| Polyamide | PA-6 | granule | 5 mm | Carat GmbH |
| Polyvinyl chloride | PVC | powder | 0.1 mm | Solvay |
| Polycarbonate | PC | granule | 5 mm | Covestro AG |
| Polyurethane | PU | flakes | 20 mm | Carl Bernh. Hoffmann GmbH & Co. KG |
| Polystyrene | PS | film | 0.1 mm | Thermo Fisher Scientific Inc |
| Polyethylene terephthalate | PET | granule | 5 mm | Carat GmbH |
| Chitin | Chitin | powder | 0.5 mm | commercially available shrimps (shells milled) |

The µIDENT Method

General Approach

The information content of an infrared spectrum is not evenly distributed, as most of the significant characteristics are connected to vibrational bands. The largest part of the recorded data therefore contains nothing but noise or scattering effects. However, a conventional library searching algorithm uses complete spectra, no matter whether these contain mostly irrelevant data points. This entails the risk that very different references, e.g., polypropylene and chitin, will match one sample with very similar Hit Quality Indices, 11 and moreover, the proposed identification result could be wrong, as shown by way of example in Tab. 2. In this case, an experienced user has to evaluate the spectrum manually, which is a very time-consuming process, considering that analysis of microplastics is a high sample throughput task.

Table 2: Effect of weathering on library searching results by means of Hit Quality Indices. Two polypropylene samples are compared with polypropylene and chitin references. These samples represent microplastics with a thin and a thick biofilm on their surface, which affects the identification process.

| Weathering Grade | Manual Evaluation | 1st Hit (HQI) | 2nd Hit (HQI) |
| slightly | PP | PP (0.93) | Chitin (0.80) |
| heavily | PP | Chitin (0.92) | PP (0.83) |

The main idea of the new evaluation method is to eliminate all irrelevant data and focus on vibrational bands only, as this will improve library searching. On the one hand, the new method requires slightly more computing time than many conventional library searching algorithms, as those perform less complex data preprocessing. On the other hand, the accuracy of automated identification rises significantly, which reduces the number of spectra that have to be evaluated manually. Assuming that the latter step is the most time consuming, the new µIDENT algorithm should be more practicable.
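The weakness of pure top-hit selection in Tab. 2 can be made concrete with a short sketch. It contrasts "take the best hit" with an acceptance rule that requires a minimum Hit Quality Index and a minimum gap to the second-best candidate. The thresholds 0.7 and 0.1 are the criteria the paper names later for its evaluation; the function names `top_hit` and `identify` are hypothetical and not part of the published software.

```python
def top_hit(hqi_by_reference):
    """Return the reference label with the highest Hit Quality Index."""
    return max(hqi_by_reference, key=hqi_by_reference.get)


def identify(hqi_by_reference, min_hqi=0.7, min_gap=0.1):
    """Accept the best hit only if it is good enough and clearly ahead.

    min_hqi and min_gap follow the criteria quoted in the text; the rule
    itself is an illustrative sketch, not the authors' implementation.
    """
    ranked = sorted(hqi_by_reference.values(), reverse=True)
    if not ranked:
        return "not identified"
    best = ranked[0]
    second = ranked[1] if len(ranked) > 1 else 0.0
    if best >= min_hqi and best - second >= min_gap:
        return top_hit(hqi_by_reference)
    return "not identified"


# Conventional library search results from Tab. 2 (both samples are PP).
slightly = {"PP": 0.93, "Chitin": 0.80}
heavily = {"Chitin": 0.92, "PP": 0.83}

print(top_hit(heavily))    # top-hit selection proposes chitin for a PP sample
print(identify(slightly))  # gap of 0.13 passes the acceptance rule
print(identify(heavily))   # gap of 0.09 is too small, so no assignment
```

With the Tab. 2 values, the acceptance rule at least flags the heavily weathered sample for manual inspection instead of silently reporting chitin.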


Development of a Data Preprocessing Routine

Baseline Correction

In a first step of data preprocessing, baseline effects have to be compensated. These interferences commonly originate from physical scattering of the infrared light at the highly variable sample surface and can hardly be avoided by sample pretreatment. Furthermore, baselines of environmental microplastic FT-IR spectra vary strongly with sample geometry, which can be observed over very broad wavenumber ranges. 12,32–34 In comparison to sharp vibrational bands, the derivatives of those broad baselines are often negligible. This fact can be used to reduce baseline interferences to an insignificant level while maximising the signal intensities of sharp vibrational bands by transforming the FT-IR spectra into their first derivatives. 35–37 This preprocessing step is realised with the Savitzky–Golay algorithm, 38 which is a commonly used technique in FT-IR spectroscopy 39 to differentiate and smooth data numerically.
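A minimal sketch of a Savitzky–Golay first derivative is given below. It assumes a local quadratic fit on equally spaced data, for which the derivative at the window centre reduces to a weighted sum with weights proportional to the offset from the centre. The window size and polynomial order used by the authors are not stated in this section, so `half_window=5` is only a placeholder, and `savgol_first_derivative` is a hypothetical name.

```python
def savgol_first_derivative(y, half_window=5, spacing=1.0):
    """First derivative by a Savitzky-Golay filter with a quadratic fit.

    For a symmetric window of 2*half_window + 1 equally spaced points, the
    slope of the least-squares parabola at the centre point is
    sum(i * y[j + i]) / (spacing * sum(i**2)), i = -half_window..half_window.
    Only the interior points are returned (no edge padding).
    """
    m = half_window
    norm = spacing * sum(i * i for i in range(-m, m + 1))
    return [
        sum(i * y[j + i] for i in range(-m, m + 1)) / norm
        for j in range(m, len(y) - m)
    ]


# A sloped baseline becomes a constant offset, and a flat one vanishes.
baseline = [0.002 * i + 0.1 for i in range(40)]
print(savgol_first_derivative(baseline)[:3])
```

Because a broad baseline changes slowly, its derivative stays small, whereas a sharp band produces a pronounced maximum/minimum pair, which is what the subsequent band detection relies on.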

Figure 1: Moving correlation 40 (moving window = 11) of A) slightly weathered polypropylene microplastics and B) heavily weathered polypropylene microplastics with a polypropylene reference (dotted line). Wavenumber ranges that show a Pearson correlation coefficient > 0.95 are highlighted. It can be observed that only vibrational bands correlate with the chosen reference. However, the degree of weathering has no significant effect.
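The moving correlation in Fig. 1 can be sketched as a Pearson correlation coefficient computed in a sliding window (11 points, as stated in the caption) between a sample spectrum and a reference spectrum on the same wavenumber grid. The function names are hypothetical, and returning 0 for a window with zero variance is an assumption made here only to keep the example total.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # assumption: flat windows carry no correlation information
    return cov / math.sqrt(var_a * var_b)


def moving_correlation(sample, reference, window=11):
    """Correlation of sample and reference inside a centred moving window."""
    half = window // 2
    return [
        pearson(sample[i - half:i + half + 1], reference[i - half:i + half + 1])
        for i in range(half, len(sample) - half)
    ]


reference = [math.sin(0.3 * i) for i in range(50)]
sample = [2.0 * r + 5.0 for r in reference]  # same band shape, other scale/offset
r = moving_correlation(sample, reference)
print(min(r) > 0.95)  # the whole range would be highlighted in Fig. 1
```

Since the Pearson coefficient is insensitive to scale and offset, regions where sample and reference share a band shape correlate strongly regardless of layer thickness or a locally constant baseline.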

Vibrational Band Detection

In contrast to conventional library searching processes, not the complete dataset but only vibrational bands are used for identification, as the major part of an FT-IR spectrum contains nothing but noise and matrix signals. This can be seen in Fig. 1, where slightly and heavily weathered polypropylene microplastics are correlated to a polypropylene reference with a moving correlation algorithm. 40 The two degrees of weathering correspond to the biofilm interference within the measured infrared spectra, and they cover the two extreme levels of a degradation-based interference during the identification process. Therefore, one essential step of the evaluation method is automatic vibrational band detection, and considering that all spectra are transformed into their first derivative, every further step is based on those differentiated data. From a mathematical point of view, a vibrational band maximum corresponds to a zero in its first derivative, while the inflection points correspond to a local maximum and minimum, respectively. At this point, it has to be mentioned that not every local maximum or minimum originates from vibrational bands, as the signal-to-noise ratio decreases during the differentiation process. For this reason, a threshold has to be defined by analysing the noise level of the entire differentiated spectrum to distinguish vibrational bands from noise spikes. In this context, a histogram analysis and distribution fitting of each differentiated spectrum is performed, making use of the relation between noise level and peak widths of the distribution curves, as can be observed in Fig. 2(A). With respect to the estimated threshold, a change of sign between local maximum and minimum defines the vibrational band positions, as shown in Fig. 2(B). However, besides these positions, the corresponding signal intensity values, i.e., peak amplitudes or areas, are required to perform comparisons with IR reference spectra. In some cases, it would be possible to use the local maxima of the differentiated spectra, but commonly this approach is not suitable, as IR spectra are very complex and consist of many overlapping vibrational bands. In consequence, all spectra are fitted with theoretical functions in order to characterise each vibrational band individually.

Figure 2: Workflow of peak detection: The IR spectrum A) is differentiated B). Using these data, a histogram analysis C) is performed in order to estimate the noise distribution of the entire IR spectrum. The width of this Lorentzian distribution, which encloses 90 % of the corresponding distribution area, is used as threshold to find local extrema in B). In a last step, all zeros between these extrema are determined, as they describe the vibrational band positions. For peak identification, one local extremum has to exceed the estimated threshold.

Curve Fitting

Within this study, an asymmetric pseudo Voigt function is used to fit single vibrational bands, as Stancik and Brauns demonstrated that this is very suitable for curve fitting based on complex and overlapped IR spectra. 41 The function contains five parameters and describes one individual vibrational band (Eq. 1).

\[
\begin{aligned}
\text{pseudo Voigt:}\quad I(\tilde\nu) &= f \cdot \text{Lorentzian} + (1 - f) \cdot \text{Gaussian} \\
\text{Lorentzian:}\quad I(\tilde\nu) &= \frac{2A/(\pi\gamma)}{1 + 4\left[(\tilde\nu - \tilde\nu_0)/\gamma\right]^2} \\
\text{Gaussian:}\quad I(\tilde\nu) &= \frac{A}{\gamma}\sqrt{\frac{4\ln 2}{\pi}}\,\exp\left[-4\ln 2\left(\frac{\tilde\nu - \tilde\nu_0}{\gamma}\right)^2\right] \\
\text{Asymmetry modulation:}\quad \gamma(\tilde\nu, \tilde\nu_0, a) &= \frac{2\gamma_0}{1 + \exp\left[a\,(\tilde\nu - \tilde\nu_0)\right]}
\end{aligned}
\tag{1}
\]

Where I(ν̃) describes the absorbance at a certain wavenumber ν̃, while ν̃0 defines the vibrational band position. The parameters A and γ are the vibrational band area and the dynamic full width at half maximum, respectively, where the latter is a logistic function of the asymmetry factor a and the static full width at half maximum γ0. The form factor f describes the mixing ratio of Lorentzian and Gaussian shape. As already mentioned above, all data are differentiated, resulting in a derivative variant of the asymmetric pseudo Voigt function. In addition, one key step of curve fitting is to estimate the initial fit parameters, and in this context the function is customised for an optimal starting point estimation, as shown in Eqs. 2-3.

\[
\text{Lorentzian}^{*} = \frac{A^{*}}{1 + \frac{1}{3}\left(\frac{\tilde\nu - \tilde\nu_0}{\gamma^{*}}\right)^2}
\tag{2}
\]

\[
\text{Gaussian}^{*} = A^{*}\exp\left[-\frac{1}{2}\left(\frac{\tilde\nu - \tilde\nu_0}{\gamma^{*}}\right)^2\right]
\tag{3}
\]

Where A∗ represents the vibrational band height and γ∗ characterises the width from the peak centre to the peak inflection point. The basic idea of this customisation is to derive the main fit parameters ν̃0, A∗, γ0∗ and a by analysing the inflection points of each vibrational band, which can be observed as a pair of local maximum and minimum within the differentiated spectra. These characteristic local points can be detected fast and easily, and they are already used within this method to detect vibrational bands, as shown in Fig. 2(B). In consequence, the parameter γ0∗ can be estimated from the distance between local maximum and minimum, and the vibrational band height A∗ is given by the intensity of the corresponding local maximum and γ0∗, as shown in Fig. 3. The asymmetry factor a can be deduced indirectly with an empirical approach, which is given by Eqs. 4-5. A more detailed analysis of this function is shown in the supplementary information.

a = … p1 − 1 · p3 − ln R · exp[p2 · γ0∗] + 1 … (4)

\[
R = \frac{I_{\max} + I_{\min}}{I_{\max} - I_{\min}}
\tag{5}
\]

Figure 3: By determination of local maximum, minimum and zero of differentiated infrared spectra, all key parameters of vibrational bands can be estimated.

As a result, all significant vibrational bands within the IR spectrum of microplastics or polymer references can be estimated very well, which is shown in Fig. 4. Additionally, this process runs at an adequate speed of ≈ 6000 spectra per minute, assuming that each spectrum consists of 30 vibrational bands.
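The detection and starting-point estimation described above can be sketched as follows: find a local maximum followed by a local minimum in the differentiated spectrum, require one of them to exceed the noise threshold, take the zero crossing between them as band position, and take half their distance as the centre-to-inflection width γ∗. The threshold is passed in as a plain number here (the histogram-based estimate is not reproduced), an ascending wavenumber axis is assumed, and the height estimate assumes a purely Gaussian band. All names are hypothetical.

```python
import math


def detect_bands(x, dy, threshold):
    """Locate bands in a first-derivative spectrum dy sampled at ascending x."""
    extrema = []
    for i in range(1, len(dy) - 1):
        if dy[i] > dy[i - 1] and dy[i] >= dy[i + 1]:
            extrema.append((i, "max"))
        elif dy[i] < dy[i - 1] and dy[i] <= dy[i + 1]:
            extrema.append((i, "min"))

    bands = []
    for (i, kind_i), (j, kind_j) in zip(extrema, extrema[1:]):
        is_pair = kind_i == "max" and kind_j == "min" and dy[i] > 0.0 > dy[j]
        if not is_pair or max(dy[i], -dy[j]) <= threshold:
            continue
        # Zero crossing between the two inflection points (linear interpolation).
        for k in range(i, j):
            if dy[k] > 0.0 >= dy[k + 1]:
                t = dy[k] / (dy[k] - dy[k + 1])
                centre = x[k] + t * (x[k + 1] - x[k])
                break
        gamma_star = 0.5 * (x[j] - x[i])
        # Assumption: Gaussian band, whose derivative maximum equals
        # A* / gamma* * exp(-1/2); this is not necessarily the paper's formula.
        a_star = dy[i] * gamma_star * math.exp(0.5)
        bands.append({"centre": centre, "gamma_star": gamma_star, "a_star": a_star})
    return bands


# Synthetic derivative of a Gaussian band (height 1, centre 50, sigma 4).
x = [0.5 * i for i in range(201)]
dy = [-(v - 50.0) / 16.0 * math.exp(-((v - 50.0) ** 2) / 32.0) for v in x]
print(detect_bands(x, dy, threshold=0.01))
```

For overlapping bands these estimates are only starting values, which is why the paper refines them by curve fitting afterwards.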

Figure 4: Comparison of a raw polypropylene IR spectrum and calculated IR spectrum. The calculation is based on the estimated initial fit parameters. A) illustrates this comparison based on the corresponding absorbance spectra while B) shows the first derivative of these data.

In a further step, all parameters are fitted using the trust-region algorithm, 42 considering that each vibrational band adds a set of five parameters to the fit function. As a result, every vibrational band can be isolated and reconstructed (Fig. 5) using its parameter set in order to calculate the vibrational band areas, which are required for further data comparison. At this point, the incoming untreated FT-IR spectrum, which contains approximately 2000 data pairs of wavenumbers and absorbances, has been compressed into a compact dataset with 5 to 50 pairs of vibrational band positions and areas. However, no characteristics relevant for polymer identification are lost, as that information is conserved within the vibrational bands. One important aspect is that not every isolated vibrational band can be associated with a polymer vibration; some stem from the matrix, additives or contaminants, e.g., biofilms, sand, wood or other (in)organic matter. This circumstance has to be considered during further data comparison and microplastics identification.

Figure 5: A) Comparison of raw polypropylene IR spectra with different weathering degree. B) Comparison of calculated polypropylene IR spectra. The calculation is based on the corresponding raw data from A). It can be observed that baseline effects in B) are minimised. In addition, similarities and dissimilarities of vibrational bands are better recognisable. In the case of heavily weathered microplastics, some small vibrational bands cannot be observed, whereas the spectrum contains additional signals which originate from matrix components.
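Reconstructing a single band from its five fitted parameters can be sketched by evaluating the asymmetric pseudo Voigt function of Eq. 1 directly. The implementation below follows the Stancik and Brauns form (Lorentzian and Gaussian normalised to area A, width modulated by a logistic function of the asymmetry factor). The function and argument names are hypothetical, and the fitting step itself is not shown.

```python
import math


def asymmetric_pseudo_voigt(nu, nu0, area, gamma0, a, f):
    """Absorbance of one band at wavenumber nu (asymmetric pseudo Voigt).

    nu0: band position, area: band area A, gamma0: static full width at half
    maximum, a: asymmetry factor, f: Lorentzian fraction (0 = pure Gaussian).
    """
    gamma = 2.0 * gamma0 / (1.0 + math.exp(a * (nu - nu0)))
    u = (nu - nu0) / gamma
    lorentzian = (2.0 * area / (math.pi * gamma)) / (1.0 + 4.0 * u * u)
    gaussian = (
        (area / gamma)
        * math.sqrt(4.0 * math.log(2.0) / math.pi)
        * math.exp(-4.0 * math.log(2.0) * u * u)
    )
    return f * lorentzian + (1.0 - f) * gaussian


# Symmetric case (a = 0): half the peak height is reached at nu0 +/- gamma0 / 2.
peak = asymmetric_pseudo_voigt(1000.0, 1000.0, 1.0, 8.0, 0.0, 0.3)
half = asymmetric_pseudo_voigt(1004.0, 1000.0, 1.0, 8.0, 0.0, 0.3)
print(round(half / peak, 6))
```

For a = 0 the function reduces to the ordinary pseudo Voigt profile, and for a > 0 the band becomes broader on the low-wavenumber side, which is one way to describe tailing bands in overlapped spectra.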

Development of a Library Searching Algorithm

Characterisation and Internal Normalisation of Data

The presented data preprocessing routine condenses the characteristic information of a microplastic sample contained in an infrared spectrum, as only vibrational bands are taken into account for comparison analysis. However, this procedure requires many customisations during further evaluation, considering that the data structure has changed from a complete absorbance vector to a fragmentary set of characteristic positions and areas of vibrational bands, as illustrated in Eq. 6.

\[
\underbrace{\begin{bmatrix}
\tilde\nu_1 & I_1 \\
\tilde\nu_2 & I_2 \\
\tilde\nu_3 & I_3 \\
\vdots & \vdots \\
\tilde\nu_n & I_n
\end{bmatrix}}_{\text{continuous vector}}
\Rightarrow
\underbrace{\begin{bmatrix}
\tilde\nu_{27} & \Lambda_1 \\
\tilde\nu_{102} & \Lambda_2 \\
\tilde\nu_{266} & \Lambda_3 \\
\tilde\nu_{815} & \Lambda_4
\end{bmatrix}}_{\text{discrete vector}}
\tag{6}
\]

Where ν̃i describes the wavenumber or vibrational band position, and Ii and Λi characterise absorbance and vibrational band area, respectively. In addition, all calculated vibrational band areas have to be normalised to ensure that the highly variable sample geometry and layer thickness have no effect on the characterisation process. In this context, one common normalisation algorithm is Min-Max-Scaling, which is defined by Eq. 7. 43

\[
x_{\text{normalized}} = \frac{x - \min(x)}{\max(x) - \min(x)}
\tag{7}
\]

However, environmental microplastics are not homogeneous substances but complex mixtures of variable components, i.e., additives, sand, wood, biofilm etc. 21,23,44 As a result, normalisation can be adversely affected by these disturbances if max(x) does not belong to a polymer vibrational band. To that end, instead of using only the two extremes of a data set, all unique pairwise combinations of the data should be taken into account for normalisation, as shown in Eq. 8. This combination of vibrational band pairs is derived from the Cartesian product.

\[
\begin{bmatrix}
\tilde\nu_{27} & \Lambda_1 \\
\tilde\nu_{102} & \Lambda_2 \\
\tilde\nu_{266} & \Lambda_3 \\
\tilde\nu_{815} & \Lambda_4
\end{bmatrix}
\rightarrow
\begin{array}{c|ccc}
 & \tilde\nu_{27} & \tilde\nu_{102} & \tilde\nu_{266} \\
\hline
\tilde\nu_{27} & & & \\
\tilde\nu_{102} & [\Lambda_2, \Lambda_1] & & \\
\tilde\nu_{266} & [\Lambda_3, \Lambda_1] & [\Lambda_3, \Lambda_2] & \\
\tilde\nu_{815} & [\Lambda_4, \Lambda_1] & [\Lambda_4, \Lambda_2] & [\Lambda_4, \Lambda_3]
\end{array}
\tag{8}
\]

This procedure leads to a lower triangular matrix which can be understood as a highly characteristic fingerprint of the infrared spectrum that contains all normalised vibrational band areas. In this context, a suitable normalisation function to calculate all [Λi, Λj] entries has to be evaluated. Considering that a simple ratio-based normalisation has many obvious disadvantages, like its open range of ]0, ∞[, a modified variant of the Canberra distance, or relative difference, can be used for normalisation, which is shown in Eq. 9. 45

\[
[\Lambda_i, \Lambda_j] \rightarrow \tilde\Lambda_{i,j} = \frac{\Lambda_i - \Lambda_j}{\Lambda_i + \Lambda_j}
\tag{9}
\]

Where Λ̃i,j is a normalised vibrational band area. The range of this approach is limited to [−1, +1], and the discrimination of extreme values is less pronounced. As a result, all normalised entries of the highly characteristic fingerprint matrix have the same scaling, which can be observed in Fig. 6.

Figure 6: Comparison of range and distribution of normalised vibrational band areas of polypropylene by means of a Box-and-Whisker Plot. A) Ratio based normalisation leads to a broad range with many extreme values. B) Canberra distance based normalisation leads to a small range without any extreme values.

Development of a Distance Function

The infrared spectrum of microplastics has been summarised as a highly characteristic triangular matrix, which contains all normalised vibrational band areas. In a next step, several of these matrices have to be compared to calculate a kind of Hit Quality Index that characterises the similarity of the initial samples and references. Classical data comparison algorithms, e.g., Euclidean distance or the Cauchy–Schwarz inequality, do not work with fragmentary or missing data due to the pairwise correlation approach of common distance functions. 46,47 Therefore, a new distance algorithm had to be developed in order to calculate the similarities of this kind of data. The central problem is that every sample could have an individual set of properties or vibrational bands, respectively, but a chemometric similarity analysis can only be realised for


data sets with common properties. If two spectra have only one common vibrational band, the corresponding similarity can only be calculated by means of this property. In infrared spectroscopy, vibrational band positions cannot be determined with certainty, as frequency shifts or overlapping effects of multiple vibrational bands can occur. Therefore, it seems useful to compare two vibrational bands with slightly different positions, with the restriction that similarity decreases with increasing distance between the corresponding vibrational band positions. To implement this approach, a logistic function was chosen, as this allows two discrete thresholds of statistical relevance to be set (Eq. 10).

\[
d_i = \left(e^{-\left(k_1 \cdot \Delta\tilde\nu_i - k_2\right)} + 1\right)^{-1}
\quad \text{with } k_1 = 1.2,\; k_2 = 9
\tag{10}
\]

Where d describes the dissimilarity of two vibrational band positions from two different infrared spectra within the comparison algorithm. The empirical parameters k1 and k2 lead to a dissimilarity of 5 % for ∆ν̃ = 5 cm−1 and 95 % for ∆ν̃ = 10 cm−1. Using this function, a list of paired, similar vibrational band positions of two different infrared spectra can be compiled. In a further step, vibrational band areas are compared, and therefore, the distance function (Eq. 10) is customised as follows:

\[
d_i = 1 - \left(e^{k_1 \cdot \Delta\tilde\nu_i - k_2} + 1\right)^{-1} \cdot \left(e^{k_3 \cdot \Delta\tilde\Lambda_i - k_4} + 1\right)^{-1}
\quad \text{with } k_1 = 1.2,\; k_2 = 9,\; k_3 = 4.7,\; k_4 = 5.3
\tag{11}
\]

Where ∆Λ̃ is the difference of two normalised vibrational band areas from two different infrared spectra. Analogously to the empirical parameters k1 and k2, k3 and k4 are set to define statistical thresholds at an area difference of 0.5 and 1.75, respectively. A detailed development and estimation of all k-parameters is shown in the supplementary information.

Weighted Distance and HQI

The presented distance function (Eq. 11) returns partial dissimilarities di, which can be averaged to characterise a mean dissimilarity of two different data sets, e.g., an unknown microplastic sample and a polypropylene reference. However, considering that every vibrational band area has an individual result uncertainty, it seems useful to weight the partial dissimilarities di. For this purpose, the key parameters height A∗ and width γ0∗ are combined in order to weight the partial dissimilarities di, as shown in Eq. 12.

\[
w_i = \frac{\Lambda_i}{\mathrm{RRSE}_i}
\tag{12}
\]

Where wi is a weighting factor which can be calculated for each fitted vibrational band, taking into account the corresponding vibrational band area Λi and the partial root relative squared error RRSEi, 48 which encloses 90 % of the vibrational band. This ensures that a comparison of two data sets is mostly influenced by well-fitted vibrational bands. As mentioned above, only vibrational bands with common positions are used to characterise the dissimilarity of two IR spectra. However, every vibrational band is characterised by a certain significance, which can be calculated with Eq. 12. This could have the consequence that only a low percentage of the overall significance is taken into account for identification. Hence, it seems suitable to consider this aspect by implementing a penalisation system that increases the dissimilarity depending on the relative amount of the used significance, which is shown in Eq. 13.

\[
W = \frac{w_{\text{used}}}{w_{\text{all}}}
\tag{13}
\]

Where W is the fraction of the used significance of all data. By combining the partial dissimilarities di from Eq. 11 and the weighting factors wi from Eq. 12, an overall dissimilarity D of two IR spectra can be calculated as follows:

\[
D = \left(\frac{\sum_i \left[d_i \cdot w_i\right]}{\sum_i w_i}\right)^{W}
\tag{14}
\]

Finally, Eq. 14 can be used to calculate a Hit Quality Index as follows:

\[
\mathrm{HQI} = 1 - D
\tag{15}
\]

Analogously to conventional search algorithms, this index will be calculated for each


sample combined with every reference, but in contrast, the determination is only based on relevant data points. This should lead to higher HQI values for reference spectra with similar vibrational bands and lower HQI values for reference spectra with dissimilar vibrational bands. As a result, identification of environmental microplastics is much more robust and assignment certainty rises due to a big gap between best and second best HQI value, as shown in Fig. 7.
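The distance function and the resulting index can be sketched with the empirical parameters quoted in the text (k1 = 1.2, k2 = 9, k3 = 4.7, k4 = 5.3). Several points are assumptions of this sketch rather than confirmed details of the method: differences are taken as absolute values, one weight is used per matched band pair, and the significance fraction W enters the overall dissimilarity as an exponent. All function names are hypothetical.

```python
import math

K1, K2, K3, K4 = 1.2, 9.0, 4.7, 5.3  # empirical parameters from the text


def logistic_similarity(delta, slope, offset):
    """Logistic factor that falls from about 1 to about 0 as delta grows."""
    return 1.0 / (math.exp(slope * delta - offset) + 1.0)


def position_dissimilarity(d_nu):
    """Dissimilarity of two band positions (position-only distance)."""
    return 1.0 - logistic_similarity(abs(d_nu), K1, K2)


def partial_dissimilarity(d_nu, d_area):
    """Bivariate dissimilarity of one matched band pair (position and area)."""
    similar = logistic_similarity(abs(d_nu), K1, K2)
    similar *= logistic_similarity(abs(d_area), K3, K4)
    return 1.0 - similar


def hit_quality_index(pairs, w_all):
    """HQI from matched pairs [(d_nu, d_area, weight), ...].

    w_all is the total weight of all bands, matched or not. Assumption:
    the penalty W = w_used / w_all is applied as an exponent.
    """
    w_used = sum(w for _, _, w in pairs)
    if w_used == 0.0:
        return 0.0
    mean_d = sum(partial_dissimilarity(dn, da) * w for dn, da, w in pairs) / w_used
    return 1.0 - mean_d ** (w_used / w_all)


print(round(position_dissimilarity(5.0), 3))   # close to 5 %
print(round(position_dissimilarity(10.0), 3))  # close to 95 %
pairs = [(0.0, 0.0, 2.0), (1.0, 0.1, 1.0)]
print(hit_quality_index(pairs, w_all=3.0) > hit_quality_index(pairs, w_all=12.0))
```

Under these assumptions, two spectra that agree on only a small share of their weighted bands are penalised even if the matched bands agree well, which mirrors the penalisation described in the text.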

Figure 7: A) and B) show IR spectra of weathered polypropylene microplastics. C) and D) show the three best results of a spectral library search. The conventional library searching results (light grey bars) are very close to each other, and in the case of heavily weathered microplastics (D) the assignment suggestion is incorrect. In contrast, µIDENT allows much better predictions, and the difference between the best and second-best hit is much larger, allowing unequivocal assignment despite the severe weathering.

µIDENT in a Nutshell

The new microplastics identification method can be summarised in four fundamental steps:

1. Baseline Correction: derivative, Savitzky–Golay smoothing
2. Vibrational Band Integration: curve fitting of the differentiated spectrum, asymmetric pseudo-Voigt function
3. Internal Vibrational Band Area Normalisation: Canberra distance for all unique permutations of every vibrational band area pair
4. Calculation of Hit Quality Index: comparison based on weighted bivariate logistic distance

The whole procedure was realised in MATLAB as a fully automated software application, and the source code of the core functions is published in the supplement.

Examination & Discussion of µIDENT

Identification of Microplastics from Beach Sampling

All measured environmental microplastic samples (n = 300) were identified manually to allow the accuracy of the new and the conventional library searching algorithms to be evaluated. To ensure the accuracy of these reference results, two experts identified all samples independently by manual comparison of the measured spectra with a reference database, and they agreed in all examined cases. These reference results are summarised in Tab. 3. Most microplastic particles at the two sampled beach areas were identified as polyethylene (48.8 %) and polypropylene (24.2 %). This seems plausible, considering that these polymers are most commonly produced 49,50 and have a low density, which enables floating on the water surface. Consequently, it is not surprising that many other researchers have reported similar results. 7,44,51–53

Table 3: Characterisation of the identification results using µIDENT based on reference results by visual spectra evaluation. All “Not Identified” samples are counted as “False Negative” (Type II Error). Total amount of analysed particles: n = 300. All values in % of particles.

| Polymer | Reference Identification | µIDENT Results | Success Rate (“Correct Identified”) | Type I Error (“Incorrect Identified”) | Type II Error (“Not Identified”) |
|---|---|---|---|---|---|
| Polyethylene | 48.8 | 48.1 | 98.6 | 0.0 | 1.4 |
| Polypropylene | 24.2 | 24.2 | 100.0 | 0.0 | 0.0 |
| PE/PP Copolymer/Blend | 4.2 | 3.2 | 75.0 | 0.0 | 25.0 |
| Polyamide | 2.8 | 2.5 | 87.5 | 0.0 | 12.5 |
| Polyurethane | 0.7 | 0.4 | 50.0 | 0.0 | 50.0 |
| Polystyrene | 14.0 | 14.0 | 100.0 | 0.0 | 0.0 |
| Polyvinyl chloride | 0.0 | 0.0 | n.A. | 0.0 | n.A. |
| Polycarbonate | 0.4 | 0.4 | 100.0 | 0.0 | 0.0 |
| Polyethylene terephthalate | 0.0 | 0.0 | n.A. | 0.0 | n.A. |
| Chitin | 4.9 | 3.9 | 71.4 | 7.1 | 28.6 |
| Total | 100.0 | 96.5 | 96.1 | 0.4 | 3.5 |

All FT-IR spectra were evaluated fully automatically with our new µIDENT method. For identification, we chose 0.7 as the minimal Hit Quality Index and 0.1 as the minimal Hit Quality Index difference to the second-best candidate, which are typical requirements for library searching evaluation. 12–14,34 All samples that did not fulfil this statistical certainty were marked as “not identified”. In order to identify copolymers or polymer mixtures, the corresponding HQI values of each reference polymer have to be greater than 0.7 and the HQI difference of these candidates has to be lower than 10 %. In a next step, the quality of the identification algorithm was characterised by determining the success rate (“true positive” or “correct identified”) and the rates for type I (“false positive” or “incorrect identified”) and type II errors (“false negative” or “not identified”) 54,55 for every involved polymer reference, based on the results from manual identification. Tab. 3 gives an overview of the corresponding classification results. It indicates that 96.1 % of all samples could be identified correctly with the new µIDENT method. Moreover, a closer look at every single search result shows that the new search algorithm is highly robust (HQImean = 0.92 ± 0.15) and selective (∆HQImean = 0.67 ± 0.21), which is shown in Fig. 8.
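The three rates can be computed from paired manual and automatic labels. The sketch below is a hypothetical helper, not the authors' code. It assumes that unidentified samples are passed as `None` and that every rate is expressed relative to the number of samples of the manually assigned polymer, which is only one possible reading of Tab. 3.

```python
from collections import Counter


def classification_rates(reference, predicted):
    """Per-polymer success, type I and type II rates in percent.

    reference: manually assigned polymer name per sample.
    predicted: automatically assigned name, or None for "not identified"
               (counted as type II error, as in Tab. 3).
    ASSUMPTION: rates are relative to the manual count of each polymer.
    """
    totals, correct, wrong, missed = Counter(), Counter(), Counter(), Counter()
    for ref, pred in zip(reference, predicted):
        totals[ref] += 1
        if pred is None:
            missed[ref] += 1      # not identified -> false negative
        elif pred == ref:
            correct[ref] += 1     # correct identified -> true positive
        else:
            wrong[ref] += 1       # incorrect identified -> false positive
    return {
        polymer: {
            "success": 100.0 * correct[polymer] / n,
            "type_I": 100.0 * wrong[polymer] / n,
            "type_II": 100.0 * missed[polymer] / n,
        }
        for polymer, n in totals.items()
    }


# Made-up labels for four particles
manual = ["PE", "PE", "PP", "PP"]
automatic = ["PE", None, "PP", "PE"]
print(classification_rates(manual, automatic))
```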

Figure 8: Scatter plot of every calculated HQI – ∆HQI – pair using the new search algorithm. 96.5 % of all data points (dark grey circles) are within the required identification zone (light grey). The plotted ▽ represent samples that could not be identified.
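The identification zone can be expressed as a small decision rule on the ranked HQI values of one sample. The following Python sketch is an interpretation of the stated thresholds, not the published routine. It assumes that the 10 % criterion for copolymers and mixtures means an absolute HQI difference below 0.1, that threshold comparisons are inclusive, and that any group of close candidates above the minimal HQI is reported as a mixture candidate. The function name and return format are hypothetical.

```python
def assign(hqi_by_reference, min_hqi=0.7, min_gap=0.1):
    """Classify one sample from its HQI values against all references.

    Returns (status, names) where status is "identified", "mixture" or
    "not identified".
    ASSUMPTION: thresholds 0.7 and 0.1 are applied as absolute HQI values.
    """
    ranked = sorted(hqi_by_reference.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] < min_hqi:
        return ("not identified", [])

    best_name, best = ranked[0]
    if len(ranked) == 1 or best - ranked[1][1] >= min_gap:
        return ("identified", [best_name])

    # Gap too small: check for several strong candidates close to the best hit.
    close = [name for name, h in ranked if h >= min_hqi and best - h < min_gap]
    if len(close) > 1:
        return ("mixture", close)
    return ("not identified", [])


# Made-up HQI values for three samples
print(assign({"PP": 0.93, "PE": 0.30, "PS": 0.12}))
print(assign({"PE": 0.85, "PP": 0.82, "PS": 0.20}))
print(assign({"PE": 0.60, "PP": 0.20}))
```

The difference between the best and second-best value used in the rule is the ∆HQI plotted against HQI in Fig. 8.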

Comparison with Classical Library Searching

Unfortunately, the performance for automatic microplastics identification reported in the section “Identification of Microplastics from Beach Sampling” cannot be ranked directly. Normally, the accuracy of library searching results is not relevant, as the HQI is often only a tool to support manual identification. For this reason, only a comparison with various common library searching algorithms was performed. The analysed search algorithms are representative and are used by common infrared spectrometer software. All algorithms involved in this comparison are listed in Tab. 4. Analogously to the new µIDENT method, all samples were evaluated with the conventional search algorithms from Tab. 4. The corresponding HQI values were calculated, and based on these data, “true positive”, “false positive” and “false negative” rates were determined. In a next step, the performance of every algorithm was plotted in a Receiver Operating Characteristic (ROC) space, 54,55 which is shown in Fig. 9. However, we decided to use the “false negative” rate as the x-axis because most “false positive” rates are negligible. In this regard, it has to be mentioned that samples that were not identified are classified as “false negative”. As can be observed in Fig. 9, only two of the analysed conventional search algorithms produce reliable data. Accordingly, the Pearson Correlation Coefficient of both normalised and differentiated spectra are suitable identification methods. However, this does not apply to the second-order derivative because of its worse signal-to-noise ratio. Dot Product and Euclidean Distance differ significantly from the diagonal, which indicates a significant “false positive” rate. In these cases, the statistical criteria for automatic microplastics identification are fulfilled but the identification results are wrong. This error is of high concern and should be avoided at all costs; therefore, these algorithms cannot be recommended. The performed algorithm comparison based on real environmental microplastics confirmed the robustness and accuracy of the novel µIDENT method. As predicted, the performance of library searching increases significantly by focussing on relevant data and separating these from irrelevant noise and matrix signals in a preceding preprocessing step.

Figure 9: Correctly identified vs. falsely negative or not identified microplastics. The ROC space can be divided into three parts. Not Working: < 50 % of identifications are correct; Working: > 50 % of identifications are correct; and Working well: > 75 % of identifications are correct.

Table 4: Overview of the analysed library search algorithms in order to classify the performance of the new identification algorithm. s and r are normalised sample and reference spectra.

| Name | Formula |
|---|---|
| Absolute Distance | $\left[1 + \frac{\|s - r\|}{\|r\|}\right]^{-1}$ |
| Euclidean Distance | $\left[1 + \frac{\|s - r\|_2}{\|r\|_2}\right]^{-1}$ |
| Dot Product | $\frac{s \cdot r}{\|s\| \, \|r\|}$ |
| Pearson Correlation Coefficient | $\hat{s} \cdot \hat{r}$ |
| Euclidean Distance 1st Derivative | $\left[1 + \frac{\|\partial s - \partial r\|_2}{\|\partial r\|_2}\right]^{-1}$ |
| Pearson Correlation Coefficient 1st Derivative | $\frac{\partial s \cdot \partial r}{\|\partial s\| \, \|\partial r\|}$ |
| Euclidean Distance 2nd Derivative | $\left[1 + \frac{\|\partial^2 s - \partial^2 r\|_2}{\|\partial^2 r\|_2}\right]^{-1}$ |
| Pearson Correlation Coefficient 2nd Derivative | $\frac{\partial^2 s \cdot \partial^2 r}{\|\partial^2 s\| \, \|\partial^2 r\|}$ |
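A few of the conventional measures in Tab. 4 can be sketched in plain Python. The functions are illustrative only: the norm is taken to be the Euclidean norm, the Pearson coefficient is computed by mean-centring both spectra before taking the normalised dot product, and a simple first difference stands in for the derivative, whereas spectrometer software may smooth and differentiate differently. All names are hypothetical.

```python
import math


def norm(v):
    """Euclidean norm of a sequence of numbers."""
    return math.sqrt(sum(x * x for x in v))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def euclidean_hqi(s, r):
    """[1 + ||s - r|| / ||r||]^-1, equal to 1 for identical spectra."""
    diff = [x - y for x, y in zip(s, r)]
    return 1.0 / (1.0 + norm(diff) / norm(r))


def dot_product_hqi(s, r):
    """Normalised dot product (cosine of the angle between s and r)."""
    return dot(s, r) / (norm(s) * norm(r))


def pearson_hqi(s, r):
    """Pearson correlation: dot product of the mean-centred spectra."""
    mean_s, mean_r = sum(s) / len(s), sum(r) / len(r)
    return dot_product_hqi([x - mean_s for x in s], [y - mean_r for y in r])


def first_difference(v):
    """ASSUMPTION: crude stand-in for a first derivative (no smoothing)."""
    return [b - a for a, b in zip(v, v[1:])]


# Made-up spectra: the sample equals the reference plus a constant offset.
reference = [1.0, 2.0, 3.0, 5.0]
sample = [11.0, 12.0, 13.0, 15.0]
print(dot_product_hqi(sample, reference))
print(pearson_hqi(sample, reference))
print(euclidean_hqi(first_difference(sample), first_difference(reference)))
```

In this toy case a constant baseline offset lowers the dot product but leaves the Pearson coefficient and the difference-based measure unaffected, which illustrates why centring or differentiation matters for spectra with baseline shifts.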

Strengths, Opportunities and Limits of µIDENT

The presented µIDENT method works robustly and accurately, with a success rate of 96.1 %, and is therefore a suitable identification technique for microplastics in environmental samples. As demonstrated, no complex sample preprocessing is required, which underlines the practicability of µIDENT. The rate of the critical type I error is very low (0.4 %), as a similar pattern of vibrational bands is the basic prerequisite for a positive assignment. Moreover, the whole evaluation process is fully automatable, which makes this method interesting for chemical imaging applications. One advantage of the µIDENT algorithm is its high sensitivity to changes in chemical structure. This originates from using ratios of vibrational bands instead of absolute signals. Therefore, properties like crystallinity or degree of oxidation can be determined, as these parameters have an effect on vibrational band ratios. 56–58 Although these changes decrease the similarity to the corresponding polymer reference standard, the HQI order is very robust because similarities to other references will not change. In this regard, monitoring of ageing processes in order to study the influence of natural weathering on microplastics is a possible application. Theoretically, the µIDENT algorithm is not limited to FT-IR spectroscopy and can be customised to handle any kind of data in order to characterise similarities of objects. In this context, it seems possible to apply µIDENT to Raman data without any major modifications. Finally, the data compression from approximately 4000 data points per spectrum to 100 data points significantly reduces memory and computing power requirements. This allows the use of miniaturised mobile systems like handhelds, or the evaluation of Big Data applications like chemical imaging or similar multidimensional analysis.

In its current state, the µIDENT algorithm is restricted in some aspects. The automatic vibrational band detection is one of those weak points if peaks overlap too much. In these cases, the multiplet is treated as one vibrational band, and this may cause difficulties in the further comparison, as some vibrational bands are missing. For polyethylene, this problem is of high concern, as there are only a few vibrational bands anyhow. Therefore, spectral resolution is restricted to a minimum of 2 cm⁻¹. Furthermore, ratios of vibrational bands can vary due to geometrical dependency, 59 which limits the sensitivity for very small changes. Further research has already started, focussing on solving these problems by studying the effects of weathering processes on the infrared spectrum of microplastics. Thereby, we hope to further improve the sensitivity and robustness of µIDENT.

Supporting Information Available: The following files are available free of charge:

renner_esi.pdf: Detailed derivation of the empirical functions and parameters; Source code with additional notifications.
mIDENT_specFit.txt: Source code of the presented spectral preprocessing.
mIDENT_dissimilarity.txt: Source code of the presented identification routine.
ffunc9.txt: Source code of the fit function.

This material is available free of charge via the Internet at http://pubs.acs.org/.

References

(1) Musto, P.; Tavone, S.; Guerra, G.; Rosa, C. D. J. Polym. Sci., Part B: Polym. Phys. 1997, 35, 1055–1066.
(2) Gulmine, J.; Janissek, P.; Heise, H.; Akcelrud, L. Polym. Test. 2002, 21, 557–563.
(3) Saviello, D.; Toniolo, L.; Goidanich, S.; Casadio, F. Microchem. J. 2016, 124, 868–877.
(4) Hummel, D. O. Macromol. Symp. 1997, 119, 65–77.
(5) Thompson, R. C. Science 2004, 304, 838–838.
(6) Cole, M.; Lindeque, P.; Halsband, C.; Galloway, T. S. Mar. Pollut. Bull. 2011, 62, 2588–2597.
(7) Hidalgo-Ruz, V.; Gutow, L.; Thompson, R. C.; Thiel, M. Environ. Sci. Technol. 2012, 46, 3060–3075.
(8) Harrison, J. P.; Ojeda, J. J.; Romero-González, M. E. Sci Total Environ 2012, 416, 455–463.
(9) Rocha-Santos, T.; Duarte, A. C. TrAC, Trends Anal Chem 2015, 65, 47–53.
(10) Song, Y. K.; Hong, S. H.; Jang, M.; Han, G. M.; Rani, M.; Lee, J.; Shim, W. J. Mar. Pollut. Bull. 2015, 93, 202–209.
(11) ASTM International, Standard Guide for Use of Spectral Searching by Curve Matching Algorithms with Data Recorded Using Mid-Infrared Spectroscopy; ASTM E2310-04, ASTM International, 2015.
(12) Smith, B. Fundamentals of Fourier Transform Infrared Spectroscopy, Second Edition; CRC Press, 2011.
(13) Boruta, M. Spectroscopy 2012, 27, s26–s33.
(14) Yang, D.; Shi, H.; Li, L.; Li, J.; Jabeen, K.; Kolandhasamy, P. Environ. Sci. Technol. 2015, 49, 13622–13627.
(15) Primpke, S.; Lorenz, C.; Rascher-Friesenhausen, R.; Gerdts, G. Anal. Methods 2017, 9, 1499–1511.
(16) Corcoran, P. L.; Biesinger, M. C.; Grifi, M. Mar. Pollut. Bull. 2009, 58, 80–84.
(17) Cole, M.; Webb, H.; Lindeque, P. K.; Fileman, E. S.; Halsband, C.; Galloway, T. S. Sci. Rep. 2014, 4.
(18) Catarino, A. I.; Thompson, R.; Sanderson, W.; Henry, T. B. Environ. Toxicol. Chem. 2016,
(19) Ng, K.; Obbard, J. Mar. Pollut. Bull. 2006, 52, 761–767.
(20) Browne, M. A.; Galloway, T. S.; Thompson, R. C. Environ. Sci. Technol. 2010, 44, 3404–3409.
(21) Claessens, M.; Cauwenberghe, L. V.; Vandegehuchte, M. B.; Janssen, C. R. Mar. Pollut. Bull. 2013, 70, 227–233.
(22) Witte, B. D.; Devriese, L.; Bekaert, K.; Hoffman, S.; Vandermeersch, G.; Cooreman, K.; Robbens, J. Mar. Pollut. Bull. 2014, 85, 146–155.
(23) Nuelle, M.-T.; Dekiff, J. H.; Remy, D.; Fries, E. Environ. Pollut. 2014, 184, 161–169.
(24) Imhof, H. K.; Schmid, J.; Niessner, R.; Ivleva, N. P.; Laforsch, C. Limnol. Oceanogr. Methods 2012, 10, 524–537.
(25) Nor, N. H. M.; Obbard, J. P. Mar Pollut Bull 2014, 79, 278–283.
(26) Woodall, L. C.; Sanchez-Vidal, A.; Canals, M.; Paterson, G. L. J.; Coppock, R.; Sleight, V.; Calafat, A.; Rogers, A. D.; Narayanaswamy, B. E.; Thompson, R. C. R. Soc. Open Sci. 2014, 1, 140317–140317.
(27) Levin, I. W.; Bhargava, R. Annu. Rev. Phys. Chem. 2005, 56, 429–474.
(28) Kusui, T.; Noda, M. Mar. Pollut. Bull. 2003, 47, 175–179.
(29) Doyle, M. J.; Watson, W.; Bowlin, N. M.; Sheavly, S. B. Mar. Environ. Res. 2011, 71, 41–52.
(30) Löder, M. G.; Gerdts, G. Marine anthropogenic litter; Springer, 2015; pp 201–227.
(31) Filella, M. Environ. Chem. 2015, 12, 527–538.
(32) Johnson, C.; Chavka, N. Structural composites: design and processing technologies: proceedings of the sixth annual ASM/ESD advanced composites conference, Detroit, Michigan, USA, 8-11 October 1990; Materials Park: ASM International, 1990.
(33) Chalmers, J. M.; Edwards, H. G.; Hargreaves, M. D. Infrared and Raman spectroscopy in forensic science; John Wiley & Sons: Chichester, West Sussex, UK, 2012.
(34) Renner, G.; Schmidt, T. C.; Schram, J. Characterization and Analysis of Microplastics; Comprehensive Analytical Chemistry; Elsevier, 2017; Vol. 75; pp 67–115.
(35) Dietrich, W.; Rüdel, C. H.; Neumann, M. J. Magn. Reson. (1969-1992) 1991, 91, 1–11.
(36) Rinnan, Å.; van den Berg, F.; Engelsen, S. B. TrAC, Trends Anal Chem 2009, 28, 1201–1222.
(37) Rinnan, Å. Anal. Methods 2014, 6, 7124–7129.
(38) Savitzky, A.; Golay, M. J. E. Anal. Chem. 1964, 36, 1627–1639.
(39) Mariey, L.; Signolle, J.; Amiel, C.; Travert, J. Vib Spectrosc 2001, 26, 151–159.
(40) Chu, X.-L.; Xu, Y.-P.; Tian, S.-B.; Wang, J.; Lu, W.-Z. Chemom. Intell. Lab. Syst. 2011, 107, 44–49.
(41) Stancik, A. L.; Brauns, E. B. Vib. Spectrosc. 2008, 47, 66–69.
(42) Sorensen, D. C. SIAM J. Numer. Anal. 1982, 19, 409–426.
(43) Diem, M. Modern vibrational spectroscopy and micro-spectroscopy: theory, instrumentation and biomedical applications; John Wiley & Sons, 2015.
(44) Andrady, A. L. Mar. Pollut. Bull. 2011, 62, 1596–1605.
(45) Lance, G. N.; Williams, W. T. The Computer Journal 1966, 9, 60–64.
(46) Little, R. J.; Rubin, D. B. Statistical analysis with missing data; John Wiley & Sons, 2014.
(47) Eirola, E.; Doquire, G.; Verleysen, M.; Lendasse, A. Inf. Sci. 2013, 240, 115–128.
(48) Witten, I.; Frank, E.; Hall, M.; Pal, C. Data Mining: Practical Machine Learning Tools and Techniques; The Morgan Kaufmann Series in Data Management Systems; Elsevier Science, 2016.
(49) PlasticsEurope, Published on the occasion of the special presentation of K 2016 2016,
(50) Brien, S. Presentation at the World Vinyl Forum 2007 2007,
(51) Imhof, H. K.; Ivleva, N. P.; Schmid, J.; Niessner, R.; Laforsch, C. Curr. Biol. 2013, 23, R867–R868.
(52) Kaberi, H.; Tsangaris, C.; Zeri, C.; Mousdis, G.; Papadopoulos, A.; Streftaris, N. Microplastics along the shoreline of a Greek island (Kea isl., Aegean Sea): types and densities in relation to beach orientation, characteristics and proximity to sources. 4th International Conference on Environmental Management, Engineering, Planning and Economics (CEMEPE) and SECOTOX Conference, Mykonos Island, Greece. 2013; pp 197–202.
(53) Dekiff, J. H.; Remy, D.; Klasmeier, J.; Fries, E. Environ. Pollut. 2014, 186, 248–256.
(54) Metz, C. E. Basic principles of ROC analysis. Seminars in nuclear medicine. 1978; pp 283–298.
(55) Fawcett, T. Pattern Recognit Lett 2006, 27, 861–874.
(56) Blais, P.; Carlsson, D.; Wiles, D. J. Polym. Sci., Part A-1: Polym. Chem. 1972, 10, 1077–1092.
(57) Maria, R.; Rode, K.; Brüll, R.; Dorbath, F.; Baudrit, B.; Bastian, M.; Brendlé, E. Polym Degrad Stab 2011, 96, 1901–1910.
(58) ASTM International, Standard Practice for Determination of Structural Features in Polyolefins and Polyolefin Copolymers by Infrared Spectrophotometry; ASTM D5576-00, 2000.
(59) Luongo, J. J. Polym. Sci., Part C: Polym. Lett. 1964, 2, 75–79.
