
Signal smoothing with PLS regression Vitaly V. Panchuk, Valentin G. Semenov, Andrey Legin, and Dmitry Kirsanov Anal. Chem., Just Accepted Manuscript • Publication Date (Web): 05 Apr 2018 Downloaded from http://pubs.acs.org on April 5, 2018


Analytical Chemistry

Signal smoothing with PLS regression

Vitaly Panchuk 1,2,3, Valentin Semenov 1,3, Andrey Legin 1,2, Dmitry Kirsanov 1,2,*

1 Institute of Chemistry, St. Petersburg State University, St. Petersburg, Russia
2 Laboratory of artificial sensory systems, ITMO University, St. Petersburg, Russia
3 Institute for Analytical Instrumentation RAS, St. Petersburg, Russia
* Corresponding author, [email protected]

Abstract

Smoothing of instrumental signals is an important prerequisite in data processing. Various smoothing methods have been suggested over the last decades, each with its own benefits and drawbacks. Most filtering methods are based either on averaging within a certain window (e.g., Savitzky-Golay) or on a frequency-domain representation (e.g., Fourier filtering). The present study introduces a novel approach to signal filtering based on signal variance, using PLS (projections on latent structures) regression. The influence of the filtering parameters on the smoothed spectrum is explained, and real-world examples are shown.

Keywords: signal smoothing, PLS regression, filtering


1. Introduction

Signal smoothing is one of the most important steps in data preprocessing. The purpose of smoothing is to improve the quality of the acquired data in order to achieve better precision in qualitative and quantitative analysis. The number of smoothing methods is large and still growing. This is because every method distorts the signal parameters to some extent, and the greater the signal-to-noise improvement, the more distorted the smoothed line becomes. Depending on the analytical task, different requirements may be imposed on a smoothing method. In qualitative analysis it is important to preserve the position and width of the signal; otherwise the analyte may be misidentified. In quantitative analysis, the signal intensity and the area under the peak are crucial for accurate quantification of the target substance.

Smoothing methods can be based on various principles. A number of methods use signal averaging within a certain spectral window (the family of moving-average and median filters [1], Savitzky-Golay [2], etc.). Other methods are based on a frequency-domain representation (Fourier [1] and Wiener filtering [3], wavelets [1], etc.). Another group approximates the signal with an appropriate mathematical function (e.g., penalized least squares [4]). Signal smoothing (noise filtering) is applied to almost every type of analytical data: in chromatography [5, 6], molecular spectroscopy [7, 8], atomic spectroscopy [9, 10], etc. Notably, smoothing procedures can be applied not only to raw analytical signals but also as an intermediate step in chemometric modelling, e.g., for smoothing of loading weight vectors in multivariate regression to improve model performance [11], or for smoothing of regression coefficients [12] for multivariate calibration transfer.
Despite its successful use in various analytical domains, signal smoothing should be handled with care in multivariate data processing, since it may introduce correlated structure and signal distortion, leading to deterioration of model quality [13]. Partial least squares (PLS) regression is one of the most popular tools for multivariate calibration in chemometrics [14]. It is typically used to relate the response of multichannel analytical instruments to certain features of samples, such as analyte concentrations or integral quality parameters. The mathematics behind PLS is based on analysis of the variance structure of the data, and as such it provides a way of sorting signals according to their contribution to the total data variance. Analytical signals typically have higher variance than the baseline, and this feature can be employed for data smoothing. The present study explores the potential of PLS for signal smoothing and addresses several real cases to demonstrate the feasibility of the approach. The paper is organized as follows: the theory section explains the idea and the mathematics of the method; the method is then applied to qualitative analysis in Mössbauer spectrometry; finally, its applicability to quantitative analysis in X-ray fluorescence spectrometry is demonstrated.

2. Theory

PLS regression is well documented in the chemometric literature, and a detailed description can be found elsewhere [14]. The method decomposes both the matrix of independent predictors X (analytical signals collected from the calibration sample set) and the matrix of target parameters Y (e.g., concentrations of target analytes in the calibration sample set) into a new latent variable (LV) space. The LVs are drawn in the directions of variance in the data. The vector B of regression coefficients converts the independent predictors into the target parameters, and it depends on the number of LVs. The first LVs are associated with the largest sources of variance (correlated with Y), and the subsequent LVs with smaller contributions. For spectral data, typical measurement noise can be considered a source of minor variance compared with that of meaningful spectral bands. Thus, by varying the number of LVs used to calculate B, one can separate spectral noise from the signal. The suggested PLS smoother approach is illustrated in Fig. 1.
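The decomposition described above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' C# implementation: PLS1 is fitted with NIPALS on mean-centred data (the text does not say whether centring was used), the model-line width is taken as a full width at half maximum (FWHM) in channels, and the names `model_line_basis`, `pls1_nipals` and `pls_smooth` are hypothetical. Column j of the basis holds a model line centred at channel j, so the spectral channels play the role of observations; for symmetric line shapes the matrix is symmetric, so its rows and columns are interchangeable.

```python
import numpy as np


def model_line_basis(n_channels, width, shape="gaussian"):
    """Decomposition basis X: column j holds one unit-height model line centred at channel j.

    `width` is assumed to be the FWHM in channels (hypothetical convention).
    """
    ch = np.arange(n_channels)
    d = ch[:, None] - ch[None, :]  # distance of channel i from the centre of line j
    if shape == "gaussian":
        return np.exp(-4.0 * np.log(2.0) * (d / width) ** 2)
    if shape == "lorentzian":
        return 1.0 / (1.0 + (2.0 * d / width) ** 2)
    raise ValueError(f"unknown shape: {shape!r}")


def pls1_nipals(X, y, n_lv):
    """PLS1 regression of y on X with n_lv latent variables (NIPALS, mean-centred).

    Returns the coefficient vector B and the intercept b0, so that y ~ X @ B + b0.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xa, ya = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xa.T @ ya
        norm = np.linalg.norm(w)
        if norm < 1e-12:  # y already fully explained; stop adding components
            break
        w /= norm
        t = Xa @ w                # scores of this latent variable
        tt = t @ t
        p = Xa.T @ t / tt         # X loadings
        q_a = ya @ t / tt         # y loading
        Xa = Xa - np.outer(t, p)  # deflate X
        ya = ya - q_a * t         # deflate y
        W.append(w)
        P.append(p)
        q.append(q_a)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)  # B = W (P^T W)^-1 q
    return B, y_mean - x_mean @ B


def pls_smooth(y, width=4.0, n_lv=3, shape="gaussian"):
    """Smoothed spectrum X @ B + b0 together with the coefficient vector B."""
    y = np.asarray(y, dtype=float)
    X = model_line_basis(len(y), width, shape)
    B, b0 = pls1_nipals(X, y, n_lv)
    return X @ B + b0, B


# Synthetic example: one Lorentzian dip (depth 0.03, FWHM 5) plus white noise.
rng = np.random.default_rng(0)
ch = np.arange(300)
clean = 1.0 - 0.03 / (1.0 + (2.0 * (ch - 150) / 5.0) ** 2)
raw = clean + rng.normal(0.0, 0.01, ch.size)
smoothed, B = pls_smooth(raw, width=4.0, n_lv=3)
```

Rescaling the basis (the model-line amplitude) only rescales B by the inverse factor and leaves the fitted spectrum unchanged, which is consistent with the remark on model-line amplitude in the text that follows.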

Figure 1. PLS smoother concept.

In Figure 1, vector Y is the raw spectrum to be filtered. Matrix X is composed of single pure model line signals: each row of X contains only one line, and the position of this line shifts along the matrix. Matrix X is thus the basis for decomposing Y into single lines. The weight of each single line is given by the corresponding regression coefficient in the vector B_LV, which is found through the PLS regression procedure. The product of X and B_LV yields the smoothed spectrum. Several points at the beginning and at the end of the smoothed spectrum can be distorted, because the first and the last lines in the decomposition basis X contain only parts of the model line shape. B_LV (and consequently the smoothed spectrum) depends on the number of LVs and on the shape and parameters of the single model lines employed in X. The choice of these parameters, including the number of LVs, depends on the particular preprocessing task. If the aim is complete noise suppression, a small number of LVs is recommended. If the aim is exact reconstruction of the peak shape, a higher number of LVs should be employed. The validation methods commonly used to determine the optimal number of LVs in PLS models do not appear to provide useful information here, since X is composed of individual model lines positioned along the wavelength scale: validation would amount to predicting a particular spectral point from data that lack its own model line. The shape of the model peak also depends on the task at hand. The narrower it is, the less noise is removed from the smoothed data; at the same time, narrow model lines leave spectral line shapes unaltered in the smoothed data. The broader the model line, the stronger the noise suppression (at the price of spectral line distortion). As an initial estimate, the model line width can be set equal to the real width of the spectral line. The amplitude of the model line peak affects only the scale of the regression coefficients.

3. Experimental

The suggested approach was tested with Mössbauer and X-ray fluorescence (XRF) data. In the first case, we studied the influence of the parameters (number of LVs and model line shape) on the resulting smoothed spectrum. In the second case, the applicability of the PLS filter for improving quantitative calibration models is demonstrated and compared with that of the Savitzky-Golay filter. To implement the proposed smoothing procedure, dedicated program code was written in C# using the standard NIPALS (nonlinear iterative partial least squares) algorithm [14].

3.1. Mössbauer data

Mössbauer spectra were chosen for the following reasons: 1) spectra with a predefined signal-to-noise ratio (SNR) can be acquired by varying the accumulation time; 2) the line shape is known (Lorentzian peak). Together, these allow possible distortion of line parameters in smoothed spectra to be estimated. Mössbauer spectra were acquired with a WissEl (Wissenschaftliche Elektronik GmbH) spectrometer in constant acceleration mode with a 57Co(Rh) Mössbauer source at room temperature. The spectra were processed by fitting them with a set of single Lorentzian peaks using the Levenberg-Marquardt algorithm. As a result, the following peak parameters were extracted: amplitude (in relative units), width and position (both in number of channels). The measured samples were 6 µm α-Fe foil, magnetite (Fe3O4) and an iron-containing ore.
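The line-parameter extraction step can be illustrated with SciPy's `curve_fit`, which runs a Levenberg-Marquardt solver when `method="lm"` is requested. This is a sketch for one isolated line, not the authors' fitting program: the transmission-dip line model, the FWHM width convention and the helper names `lorentzian_dip` and `fit_line` are assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit


def lorentzian_dip(x, amplitude, width, position, baseline):
    """Single Lorentzian absorption line in transmission geometry.

    `width` is taken as the FWHM in channels and `amplitude` as the depth of the dip
    below `baseline` (both are assumed conventions).
    """
    return baseline - amplitude / (1.0 + (2.0 * (x - position) / width) ** 2)


def fit_line(channels, counts, p0):
    """Levenberg-Marquardt fit of one line; returns best-fit parameters and their standard errors."""
    popt, pcov = curve_fit(lorentzian_dip, channels, counts, p0=p0, method="lm")
    return popt, np.sqrt(np.diag(pcov))
```

Fitting a full sextet would use a sum of six such terms sharing one baseline; the single-line version keeps the sketch short.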
The sample amount did not exceed 10 mg/cm² in order to avoid spectral shape distortion due to saturation effects. Different accumulation times were employed to obtain different SNR values for the least intense peak (3, 8, 14, 27 and >30). Fig. 2 shows Mössbauer spectra of the three samples acquired with different accumulation times and thus having different SNR: 3, 8 and >30. The spectra with SNR > 30 were used to estimate the parameters of the least intense reference line: line amplitude, line width and line position. Table 1 shows the calculated values of these "ideal" parameters.

[Figure 2: Mössbauer spectra of Fe3O4, α-Fe and ore at S/N = 3, S/N = 8 and S/N > 30; intensity (a.u.) vs. channel number (0-400).]

Figure 2. Mössbauer spectra of 6 µm α-Fe foil, magnetite (Fe3O4) and iron-containing ore acquired with different SNR.

Table 1. The reference parameters of the Mössbauer spectral line.

Sample   Amplitude, A (r.u.)   Width, w (channel)   Position, IS (channel)
α-Fe     0.0236 ± 0.0002       4.776 ± 0.005        185.184 ± 0.002
Fe3O4    0.039 ± 0.0001        5.628 ± 0.004        156.536 ± 0.003
Ore      0.1338 ± 0.0104       26.843 ± 2.328       230.286 ± 0.834

The α-Fe Mössbauer spectrum consists of six well-separated, non-overlapping lines (a sextet). The Fe3O4 spectrum is a superposition of two sextets, and all of its lines overlap to some degree. The spectrum of the iron-containing ore is a superposition of three strongly overlapping doublets. The peak width for the ore is larger than in the two other cases (Table 1).

3.2. XRF data

To demonstrate the potential of PLS smoothing in quantitative analysis, noisy data from 40 lanthanide mixtures, obtained with a Shimadzu EDX800HS energy-dispersive X-ray fluorescence spectrometer, were employed. The mixtures contained six lanthanides: Ce, Pr, Nd, Sm, Eu and Gd. Lanthanide concentrations in the mixtures were varied from 10−6 to 10−3 mol/L. The dataset was taken from an earlier study [15]. The lanthanide concentrations were quite low, and the SNR was rather low. Moreover, the spectra of the individual lanthanides, with the exception of cerium, overlapped significantly. Direct quantification of individual lanthanides from these data was therefore cumbersome. In this study, PLS smoothing was applied to the spectra, and the concentration of cerium was then considered as the target parameter for quantification. The Supplementary Data for this report contain an MS Excel file with several Mössbauer spectra and the whole EDX data set, together with their decomposition matrices, to encourage readers to study the filter performance in detail.

4. Results and discussion

4.1. Mössbauer spectroscopy data

The main parameters of the PLS smoothing procedure that determine the quality of the final spectrum are the number of LVs in the PLS decomposition and the shape of the basis line (the single pure model line in X). To illustrate this, consider the Mössbauer spectrum of α-Fe (SNR = 3) smoothed with various numbers of LVs and various basis line parameters (Fig. 3). The basis line was a normalized Lorentzian, so the only parameter to tune is the Lorentzian width. With LV = 1 the amplitudes are much lower than they should be and the spectral lines are much broader, while the SNR is the highest among all numbers of LVs. As the number of LVs increases (LV = 2, 5), the distortion of spectral parameters decreases, but so does the degree of noise suppression. With LV = 20 the smoothed spectrum is almost identical to the raw one. This trend arises because the first few PLS components are connected with the highest variance in the data (normally associated with the analytical signal), while higher LVs are connected with minor variance, mainly associated with spectral noise. A single LV clearly cannot capture all the features of the analytical signal, so signal shape distortion is observed. The opposite trend is seen for the basis line width: the larger it is, the more distorted the smoothed signal. The narrowest basis line yields a smoothed spectrum almost identical to the raw one. Each particular task therefore requires a careful choice of the PLS smoothing parameters.
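The two trends seen in Fig. 3 can be explored on a synthetic stand-in. This sketch assumes mean-centred PLS1 fitted by NIPALS, a unit-height Lorentzian basis whose width is taken as FWHM, and a synthetic single-line spectrum whose dip depth is three times the noise standard deviation, rather than the measured α-Fe data; `basis`, `pls_fitted` and `rms` are hypothetical names. The fitted spectrum is built directly from the PLS scores, ŷ = ȳ + Σ q_a t_a. Because each added LV removes the projection of the remaining residual onto a new score vector, the distance between the smoothed and raw spectra can only shrink as LVs are added.

```python
import numpy as np


def basis(n, width, shape="lorentzian"):
    # Column j is a unit-height model line of the given FWHM centred at channel j (assumed convention).
    d = np.arange(n)[:, None] - np.arange(n)[None, :]
    if shape == "gaussian":
        return np.exp(-4.0 * np.log(2.0) * (d / width) ** 2)
    return 1.0 / (1.0 + (2.0 * d / width) ** 2)


def pls_fitted(X, y, n_lv):
    """Fitted values of a mean-centred PLS1 model: y_mean + sum_a q_a * t_a (NIPALS)."""
    Xa, ya = X - X.mean(axis=0), y - y.mean()
    fit = np.full(y.shape, y.mean())
    for _ in range(n_lv):
        w = Xa.T @ ya
        norm = np.linalg.norm(w)
        if norm < 1e-12:  # nothing left to explain
            break
        t = Xa @ (w / norm)
        tt = t @ t
        q = ya @ t / tt
        Xa = Xa - np.outer(t, Xa.T @ t / tt)  # deflate X
        ya = ya - q * t                       # deflate y
        fit += q * t
    return fit


def rms(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))


rng = np.random.default_rng(1)
ch = np.arange(300)
raw = 1.0 - 0.03 / (1.0 + (2.0 * (ch - 150) / 5.0) ** 2) + rng.normal(0.0, 0.01, ch.size)

# Number of LVs at a fixed basis width of 5 channels
lv_rms = [rms(pls_fitted(basis(ch.size, 5.0), raw, k), raw) for k in (1, 2, 5, 20)]
# Basis width at a fixed LV = 2
width_rms = [rms(pls_fitted(basis(ch.size, w), raw, 2), raw) for w in (1.0, 2.0, 5.0, 20.0)]
print("distance from raw vs LV:   ", np.round(lv_rms, 5))
print("distance from raw vs width:", np.round(width_rms, 5))
```

With a near-delta basis (a Gaussian of width 0.01 channels) the basis matrix is numerically the identity, and a single LV already reproduces the raw spectrum exactly, in line with the observation that the narrowest basis line leaves the spectrum almost unchanged.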
In Mössbauer spectroscopy it is important to keep spectral line parameters (amplitude, width and position) unaltered throughout all data modifications, in order to draw unbiased conclusions on sample composition. The PLS smoothing parameters were optimized to yield minimal distortion of amplitude, width and position while keeping the SNR maximal. As a case study, the Mössbauer spectrum of α-Fe with SNR = 8 was considered. Lorentzian and Gaussian basis line shapes were applied for PLS smoothing. The line position was unaltered regardless of the filtering parameters, while both amplitude and width depend strongly on these parameters. With a single LV, significant broadening of the filtered line and suppression of the line amplitude were observed. These effects are more pronounced when the basis line is wider, and they reach their maximum when the basis line is broader than the spectral line (here the real width is 4.77 channels; Table 1). A Gaussian basis line causes less broadening than a Lorentzian one, because the Gaussian line is sharper and fits the spectrum better. Taking more LVs into account leads to smaller distortion of line parameters. For a Gaussian basis line with a width of 2.5 or 4, LV = 3 already gives a distortion below 10% and an SNR twice the initial value. With a Lorentzian basis, the same low distortion is reached only with more LVs and a small basis line width, and the SNR then stays close to its initial value. Table S1 (Supplementary materials) contains the detailed results of this part of the study.
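One way to read the statement that a Gaussian basis line is "sharper" than a Lorentzian one: at equal full width at half maximum (the FWHM convention is an assumption), the Gaussian wings decay far faster, so each basis column spreads less intensity into distant channels. The comparison below uses hypothetical helper names and is offered as an illustration, not as the authors' explanation.

```python
import numpy as np


def gaussian(d, fwhm):
    """Unit-height Gaussian evaluated at distance d from its centre."""
    return np.exp(-4.0 * np.log(2.0) * (d / fwhm) ** 2)


def lorentzian(d, fwhm):
    """Unit-height Lorentzian evaluated at distance d from its centre."""
    return 1.0 / (1.0 + (2.0 * d / fwhm) ** 2)


fwhm = 4.0  # channels, e.g. the Gaussian width found optimal for alpha-Fe
for k in (0.5, 1.0, 2.0, 3.0):
    d = k * fwhm
    print(f"{k:.1f} x FWHM from centre: Gaussian {gaussian(d, fwhm):.2e}, "
          f"Lorentzian {lorentzian(d, fwhm):.2e}")
```

Both profiles equal 0.5 at half the FWHM from the centre, but at two FWHM the Gaussian has fallen to 2^-16 (about 1.5×10−5) of its peak while the Lorentzian is still at 1/17.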

[Figure 3: panels for LV = 1, 2, 5, 20 and for basis width w = 1, 2, 5, 20; intensity (a.u.) vs. channel.]

Figure 3. Mössbauer spectra of α-Fe (SNR = 3) smoothed with various numbers of LVs (width = 5 channels) and various widths of the basis Lorentzian line (LV = 2).

The optimized filter parameters (Gaussian width = 4, LV = 3) were applied to smooth all acquired Mössbauer spectra except those of the iron-containing ore (Gaussian width = 20, LV = 3). The SNR values of the filtered data are given in Table 2.

Table 2. SNR values of Mössbauer spectra after PLS smoothing.

Initial SNR            3     8     14    27
SNR after smoothing:
  α-Fe                 5     17    23    52
  Fe3O4                8     14    25    60
  Ore                  6     19    29    55

Modifying the regression coefficients B obtained from PLS modelling allows for additional denoising. As an example, Fig. S1 (Supplementary) shows PLS-smoothed spectra of α-Fe in which the regression coefficients for spectral variables not attributed to spectral lines (with values below the noise value) were all set to zero. This type of modification naturally leads to some distortion of the peak amplitude and width, but the peak position, which is important for qualitative analysis, remains unaltered. The regression coefficients themselves can also be used as a smoothed spectrum. Although the noise suppression effect is not clearly visible in this case, the spectral resolution can be improved by using more latent variables in the model. This approach was applied to improve resolution in the Fe3O4 and ore spectra. The optimized filter parameters were employed, but the number of LVs was set to 6 in order to keep the noise low. Spectral resolution was estimated as the ratio of the distance between two neighboring spectral lines to the line width. For Fe3O4 the resolution improved by more than 20%, and for the iron-containing ore by more than 40%. Fig. S2 (Supplementary) illustrates this for the Fe3O4 spectrum.
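The coefficient thresholding and the resolution metric described above reduce to a few lines. In this sketch the criterion "below the noise value" is assumed to apply to the magnitude of each coefficient, a single common line width is assumed for the resolution ratio, and the helper names and numbers are hypothetical illustrations, not data from this study. After thresholding, the smoothed spectrum would be recomputed as X @ B plus any intercept of the model.

```python
import numpy as np


def zero_small_coefficients(B, noise_level):
    """Return a copy of B with coefficients of magnitude below noise_level set to zero."""
    B = np.asarray(B, dtype=float)
    return np.where(np.abs(B) < noise_level, 0.0, B)


def resolution(pos_a, pos_b, width):
    """Distance between two neighbouring lines divided by the line width (same units)."""
    return abs(pos_b - pos_a) / width


def relative_improvement(before, after):
    """Fractional change, e.g. 0.2 for a 20% improvement."""
    return (after - before) / before


# Hypothetical numbers for illustration only:
r_before = resolution(150.0, 160.0, 5.0)  # lines 10 channels apart, width 5
r_after = resolution(150.0, 160.0, 4.0)   # same lines, narrowed to width 4
print(relative_improvement(r_before, r_after))  # 0.25
```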

4.2. EDX spectrometry data

To illustrate the potential of PLS smoothing in quantitative analysis, we addressed the EDX data from lanthanide mixtures, in which cerium content was determined. For comparison, the most widely applied filtering methods were also employed: Savitzky-Golay (SG), Fast Fourier Transform (FFT) and penalized least squares (PenLS). The natural spectral line width for the EDX data is about eight channels. For the PLS filter, the following parameters were found to be optimal: Gaussian basis line width = 5 channels and LV = 3. For Savitzky-Golay, a second-order polynomial with two window widths (5 and 11) was tested. The FFT filter was a "low pass" filter with a cutoff frequency of 8 Hz; penalized least squares [4] was used with λ = 2. Figure 4 shows the raw spectra and the spectra smoothed with SG and PLS; the latter gives visually better results. The other filters are shown in Fig. S3. Figure 5 shows the difference spectra between the filtered and raw data.
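For readers reproducing this comparison, minimal NumPy versions of the three reference filters are sketched below. They are assumptions, not the implementations used in the study: the Savitzky-Golay weights come from a local polynomial least-squares fit with reflected edges; the FFT filter zeroes all real-FFT bins from `keep_bins` upward (how the quoted 8 Hz cutoff maps onto bins is not stated, so `keep_bins=8` is a placeholder); and PenLS is written as Eilers' Whittaker smoother with an assumed second-order difference penalty, solved with dense matrices for clarity. All function names are hypothetical.

```python
import numpy as np


def savitzky_golay(y, window=5, order=2):
    """Savitzky-Golay smoothing via a local least-squares polynomial fit (reflected edges)."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    A = np.vander(offsets, order + 1, increasing=True)  # columns: 1, x, x^2, ...
    coeffs = np.linalg.pinv(A)[0]  # weights that return the fitted value at the window centre
    padded = np.pad(y, half, mode="reflect")
    return np.convolve(padded, coeffs[::-1], mode="valid")


def fft_lowpass(y, keep_bins=8):
    """Ideal low-pass filter: keep only the first keep_bins real-FFT frequency bins."""
    spec = np.fft.rfft(y)
    spec[keep_bins:] = 0.0
    return np.fft.irfft(spec, n=len(y))


def whittaker(y, lam=2.0):
    """Penalized least squares: minimize |y - z|^2 + lam * |D2 z|^2 (dense solve)."""
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)  # (n-2) x n second-difference matrix
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)


# Synthetic noisy spectrum standing in for the EDX data:
rng = np.random.default_rng(2)
x = np.linspace(4.5, 7.5, 600)
raw = 0.02 + 0.01 * np.exp(-0.5 * ((x - 4.84) / 0.02) ** 2) + rng.normal(0.0, 0.001, x.size)
sg5, fft8, penls = savitzky_golay(raw, 5, 2), fft_lowpass(raw, 8), whittaker(raw, 2.0)
```

The difference spectra of Fig. 5 then correspond to expressions such as `sg5 - raw`.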

[Figure 4: three stacked panels (1-3) of EDX spectra; intensity (a.u.) vs. energy (keV), 4.5-7.5 keV.]

Figure 4. EDX spectra of lanthanide mixtures: 1 – raw data; 2 – smoothed with Savitzky-Golay (window width 5); 3 – smoothed with PLS filter.

[Figure 5: four stacked panels (1-4) of difference spectra; ∆I (a.u.) vs. energy (keV), 4.5-7.5 keV.]

Figure 5. Difference spectra between the raw data and: 1 – Savitzky-Golay, window = 5; 2 – FFT filter ("low pass", cutoff frequency = 8 Hz); 3 – PenLS filter, λ = 2; 4 – PLS filter.

The difference spectra show that the SG filter with window = 5 distorts the line shape, and a certain spectral structure can be seen. FFT, PenLS and PLS do not distort the spectra significantly; however, with FFT a periodic structure appears at the ends of the spectra (Fig. S3). PenLS and PLS smoothing yield rather uniform difference spectra with small amplitude. The smoothed and raw spectra were employed for quantitative determination of cerium. The intensity of the cerium Lα line (4.84 keV) was related to the cerium concentration using ordinary least squares (OLS) regression in two concentration ranges separately. The first model covered the whole studied concentration range of cerium (10−6 to 10−3 mol/L). The second model addressed only the low concentration range (10−6 to 1.7×10−4 mol/L), near the detection limit of the instrument. Figure 6 shows the calibration plots for these two ranges derived with SG and PLS.

[Figure 6: calibration plots for PLS and SG5, intensity (a.u.) vs. C(Ce) (mmol/L); panel a) 0-1.0 mmol/L, panel b) 0-0.16 mmol/L.]

Figure 6. OLS models for determination of cerium: a) in the whole concentration range; b) in the low concentration range.

Table 3. r-Pearson and RSS values for calibration plots obtained with raw and smoothed data.

                       Raw        SG5        FFT        PenLS      PLS
Whole concentration range
  r-Pearson            0.96       0.96       0.98       0.97       0.98
  RSS                  6.7×10−6   5.0×10−6   2.8×10−6   3.2×10−6   3.1×10−6
Low concentration range
  r-Pearson            0.38       0.25       0.38       0.29       0.39
  RSS                  4.3×10−6   3.9×10−6   2.3×10−6   2.6×10−6   2.4×10−6

The PLS calibration lines lie above those for the SG filter because of the intensity suppression in the latter case. Fig. S4 shows calibration plots for all studied filters in the whole concentration range. The FFT calibration line is the closest to that obtained from the raw data. The largest intensity suppression is observed for the SG filter, while PenLS and PLS yield similar results. However, with PenLS a certain deviation in the slope of the line can be observed (Fig. S4). The overall quality of fit was estimated using the Pearson correlation coefficient (r-Pearson) and the residual sum of squares (RSS). These parameters for the studied filters are given in Table 3. When the OLS model was constructed for the whole concentration range, all studied filters improved the calibration model quality. The smallest improvement in RSS was observed for SG filtering and the largest for FFT. The results of PenLS and PLS were similar. The RSS improvement at low concentrations followed the same trend. The r-Pearson values at low concentrations varied significantly between the filters; however, the correlation itself is quite low, since the determined Ce concentrations in this case are near the detection limit of the method. The PLS filter yielded an RSS value comparable to that of FFT and the highest r-Pearson.
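The two figures of merit in Table 3 can be computed as sketched below, assuming a straight-line OLS fit of line intensity against concentration (`np.polyfit` with degree 1) and RSS taken on the intensity residuals; `calibration_metrics` is a hypothetical helper. It would be applied once per filter and per concentration range to the Ce Lα intensities.

```python
import numpy as np


def calibration_metrics(conc, intensity):
    """Straight-line OLS calibration: returns slope, intercept, Pearson r and RSS."""
    conc = np.asarray(conc, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    slope, intercept = np.polyfit(conc, intensity, 1)
    residuals = intensity - (slope * conc + intercept)
    rss = float(residuals @ residuals)
    r = float(np.corrcoef(conc, intensity)[0, 1])
    return slope, intercept, r, rss
```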

Conclusion

A signal smoothing procedure based on PLS regression was suggested. The overall idea behind this type of filtering is to sort signals according to the variance they carry. It is shown that PLS smoothing allows for a significant improvement of the signal-to-noise ratio without serious distortion of line parameters (width, position, amplitude). Moreover, PLS smoothing allows for improvement of spectral resolution. The examples from Mössbauer spectrometry illustrate the selection of filtering parameters and show that the suggested procedure can be successfully applied for spectral preprocessing. The features of PLS smoothing in quantitative analysis were demonstrated with EDX spectrometry data and compared with other popular smoothers. The basis line employed for the PLS decomposition can be of any complexity (e.g., asymmetric or multiplet), so the smoother performance can be adapted to a wide variety of real-world applications.

Acknowledgements

This work was partially financially supported by the Government of the Russian Federation, Grant 074-U01. VP, AL and DK acknowledge partial financial support from St. Petersburg State University project # 12.37.216.2016.

References

1. Brereton, R. G. Chemometrics. Data Analysis for the Laboratory and Chemical Plant; John Wiley & Sons: Chichester, England, 2003.
2. Savitzky, A.; Golay, M. J. E. Smoothing and Differentiation of Data by Simplified Least Squares Procedures. Anal. Chem. 1964, 36, 1627-1639.
3. Brown, R. G.; Hwang, P. Y. C. Introduction to Random Signals and Applied Kalman Filtering, 3rd ed.; John Wiley & Sons: New York, USA, 1996.
4. Eilers, P. H. C. A perfect smoother. Anal. Chem. 2003, 75, 3631-3636.
5. Vivo-Truyols, G.; Torres-Lapasio, J. R.; van Nederkassel, A. M.; Vander Heyden, Y.; Massart, D. L. Automatic program for peak detection and deconvolution of multi-overlapped chromatographic signals part I: peak detection. J. Chromatogr. A 2005, 1096, 133-145.
6. Fu, H. Y.; Guo, X. M.; Zhang, Y. M.; Song, J. J.; Zheng, Q. X.; Liu, P. P.; Lu, P.; Chen, Q. S.; Yu, Y. J.; She, Y. AntDAS: Automatic Data Analysis Strategy for UPLC-QTOF-Based Nontargeted Metabolic Profiling Analysis. Anal. Chem. 2017, 89, 11083-11090.
7. Douglas, R. K.; Nawar, S.; Alamar, M. C.; Mouazena, A. M.; Coulon, F. Rapid prediction of total petroleum hydrocarbons concentration in contaminated soil using vis-NIR spectroscopy and regression techniques. Sci. Total Environ. 2018, 616, 147-155.
8. Byrne, H. J.; Knief, P.; Keating, M. E.; Bonnier, F. Spectral pre and post processing for infrared and Raman spectroscopy of biological tissues and cells. Chem. Soc. Rev. 2016, 45, 1865-1878.
9. Mir-Marqués, A.; Garrigues, S.; Cervera, M. L.; de la Guardia, M. Direct determination of minerals in human diets by infrared spectroscopy and X-ray fluorescence. Microchem. J. 2014, 117, 156-163.
10. Leani, J. J.; Sánchez, H. J.; Valentinuzzi, M. C.; Pérez, C.; Grenón, M. C. Qualitative microanalysis of calcium local structure in tooth layers by means of micro-RRS. J. Microsc. 2013, 250, 111-115.
11. Rutledge, D.; Barros, A.; Delgadillo, I. PoLiSh - Smoothed PLS regression. Anal. Chim. Acta 2001, 446, 279-294.
12. Camacho, J. D.; Lennox, B.; Escabias, M.; Valderrama, M. Evaluation of smoothing techniques in the run to run optimization of fed-batch processes with u-PLS. J. Chemom. 2015, 29, 338-348.
13. Brown, C. D.; Wentzell, P. D. Hazards of digital smoothing filters as a preprocessing tool in multivariate calibration. J. Chemom. 1999, 13, 133-152.
14. Wold, S.; Sjöström, M.; Eriksson, L. PLS-regression: a basic tool of chemometrics. Chemom. Intell. Lab. Syst. 2001, 58, 109-130.
15. Kirsanov, D.; Panchuk, V.; Goydenko, A.; Khaydukova, M.; Semenov, V.; Legin, A. Improving precision of X-ray fluorescence analysis of lanthanide mixtures using partial least squares regression. Spectrochim. Acta B 2015, 113, 126-131.