
General Research

Analysis of Multicomponent Ionic Mixtures using Blind Source Separation - a Processing Case Study Giovanni Maria Maggioni, Stefani Kocevska, Martha A. Grover, and Ronald W. Rousseau Ind. Eng. Chem. Res., Just Accepted Manuscript • DOI: 10.1021/acs.iecr.9b03214 • Publication Date (Web): 27 Aug 2019 Downloaded from pubs.acs.org on August 30, 2019


Industrial & Engineering Chemistry Research is published by the American Chemical Society.


Industrial & Engineering Chemistry Research

Analysis of Multicomponent Ionic Mixtures using Blind Source Separation - a Processing Case Study

Giovanni Maria Maggioni, Stefani Kocevska, Martha A. Grover,∗ and Ronald W. Rousseau∗

Georgia Institute of Technology

E-mail: [email protected]; [email protected]


August 13, 2019


Abstract

Management and remediation of complex nuclear waste solutions require identification and quantification of multiple species. Some of the species in the solution are unknown and can differ from vessel to vessel, thus limiting the utility of standard calibration approaches. To cope with such limited information, we propose a procedure based on blind source separation (BSS) techniques, in particular independent component analysis and multivariate curve resolution, combined with a one-point calibration library. Here we show the applicability and reliability of our procedure for on-line measurements of aqueous ionic solutions by proposing an automatic procedure to identify the number of species in the mixture, estimate the spectra of the pure species, and label the spectra with respect to a library of reference components. We test our procedure against simulated and experimental data for mixtures of six species (water plus five sodium salts), using Raman and ATR-FTIR spectroscopy.


1 Introduction

The low-level radioactive waste at the Hanford Site in Washington State (USA) is to be vitrified to achieve long-term, safe, and environmentally sustainable storage. The process selected to achieve this aim significantly reduces the total volume of waste by separating most of the water contained in the waste from the dissolved species. The process comprises several unit operations and is expected to run continuously for about fifty years (from its scheduled start in 2023) to complete treatment of the whole mass of waste [1,2].

Process safety, efficiency, and stability require that operating conditions remain within a relatively narrow range of values. The key variables are the temperature, the identity of the species present in the feed, and their relative concentrations.

Spectroscopic techniques such as infrared (IR) spectroscopy, in the form of attenuated total reflection Fourier-transform IR (ATR-FTIR), and Raman spectroscopy are commonly used to analyze and monitor the composition of solutions and slurries. The standard approach to obtaining quantitative information from these techniques usually relies on time-consuming calibration procedures [3–6], which also need carefully designed sets of experiments with known species and concentrations to estimate model parameters. Additionally, if the species present in the mixture change, a new calibration typically becomes necessary, which may halt or delay processing. In nuclear-waste treatment this is clearly undesirable, since the process is meant to run continuously for several decades [1].

The waste at Hanford originated from various processes and treatments [1,2]. Because of the history of the tank farms, the waste is not homogeneous: each tank may contain different species and would require its own calibration for analysis. Therefore, in the present work we have developed a protocol based on Blind Source Separation (BSS) techniques and on a library that stores a single reference spectrum for each species, in order (1) to identify the spectra of the pure major species and (2) to compute a reliable estimate of their relative concentrations. The protocol builds upon two well-established techniques, namely Independent Component Analysis (ICA) and Multivariate Curve Resolution - Alternating Least Squares (MCR-ALS). In particular, we have investigated the application of our protocol to Raman and ATR-FTIR spectroscopy, using a simulant of the actual low-level radioactive waste.

The paper is organized as follows. In Section 2, we review the basic principles of Raman and IR spectroscopy, the structure of the algorithms used, and the main data pre-processing techniques. In Section 3, we examine the results obtained with multi-component mixtures. First, we briefly discuss the details of the simulant mixture used in this study; second, we test our procedure against synthetic, simulated data (Section 3.2); third, we test the procedure on actual measurements (Section 3.3). Finally, in Section 4, we summarize our findings.

2 Modelling

In this contribution, we consider only two spectroscopic techniques, namely ATR-FTIR and Raman; the mathematical treatment developed in this section applies equally to both. We assume that the intensity of the measured spectroscopic signal is linearly proportional to the concentration (Beer-Lambert law). Additionally, we assume that the total intensity at any wavenumber is the linear superposition of the intensities of the individual species. Mathematically, these relationships can be written as a linear system:

$$X = CL \tag{1}$$

where $X \in \mathbb{R}^{n_N \times n_L}$ is the matrix of measured spectra, $C \in \mathbb{R}^{n_N \times n_K}$ the matrix of concentrations, and $L \in \mathbb{R}^{n_K \times n_L}$ the matrix containing the spectra of the pure species; $n_K$ is the number of species, $n_N$ the number of measurements, and $n_L$ the number of sampled points in wavenumber space. Standard calibration approaches rely on various forms of supervised learning, such as Partial Least Squares (PLS), Principal Component Analysis (PCA), or Support Vector Machines (SVM) [5–10]. In the context of nuclear waste processing, extensive investigations of the use and reliability of calibration techniques, PLS in particular, have been conducted by Bryan and co-workers at the Pacific Northwest National Laboratory [8–10]. Bryan et al. investigated several anionic systems at various pH and temperature conditions, including some non-linear modifications to account for particularly high concentrations, where the Beer-Lambert law was found to be inaccurate. PLS and related methods can determine the concentrations of samples within the range of the training values with high accuracy and robustness. However, the main disadvantage of this approach is that a significant amount of prior information is required to design an appropriate calibration set. During the calibration phase, PLS requires both X and C (or an extended version of C that also contains the temperature and pH) to be known. Thus, building a robust and accurate PLS model for a multi-component system may require preparing and measuring tens or hundreds of different samples. More generally, accurate and robust predictions at the price of lengthy calibrations are typical of supervised learning approaches, not only of PLS. For example, the SVM approach investigated by Griffin et al. [6] required appropriate training with tens of samples taken at different conditions. Finally, PLS does not allow direct inference of the spectra of the pure species from the data, since the technique is designed to exploit so-called latent variables that best explain the covariance between the input and output data.
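As a minimal sketch of Eq. (1) and of what a supervised calibration needs, the NumPy snippet below builds X = CL from hypothetical Gaussian bands (not the measured spectra of this work) and then recovers L from known X and C by ordinary least squares. This classical-least-squares fit is only the simplest supervised calibration and is shown for illustration; it is not PLS, and all sizes and values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_N, n_K, n_L = 15, 3, 200          # measurements, species, wavenumber points (illustrative)
wavenumber = np.linspace(0.0, 1.0, n_L)

def gaussian_band(center, width):
    """Hypothetical pure-component band shape (placeholder, not a real spectrum)."""
    return np.exp(-0.5 * ((wavenumber - center) / width) ** 2)

# L: one pure-component spectrum per row, shape (n_K, n_L)
L = np.vstack([gaussian_band(0.2, 0.03),
               gaussian_band(0.5, 0.05),
               gaussian_band(0.8, 0.04)])

# C: concentrations, shape (n_N, n_K); X: measured spectra, shape (n_N, n_L)
C = rng.uniform(0.5, 2.0, size=(n_N, n_K))
X = C @ L

# A supervised calibration needs BOTH X and C: solve C @ L_hat = X for L_hat
L_hat, *_ = np.linalg.lstsq(C, X, rcond=None)
print(X.shape, np.allclose(L_hat, L))
```

Both X and C enter the fit, which is exactly the prior information that is unavailable in the blind setting.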

2.1 Blind Source Separation

As described earlier, standard calibration approaches may not be feasible during nuclear-waste treatment operations, since only limited information may be available and/or model recalibration could be too lengthy. In that case, one must extract from the data themselves the number of species present in the system, their identity, and their concentrations, with minimal or no prior knowledge of the system. Because of this blindness, the methods developed to meet these conditions are known in signal analysis as Blind Source Separation techniques. Two of the several available approaches have gained popularity in analytical spectroscopy: Independent Component Analysis (ICA) [11] and Multivariate Curve Resolution, particularly its Alternating Least Squares variant (MCR-ALS) [12]. We now briefly review their essential features.

2.1.1 Independent Component Analysis

ICA is based on the assumption that a signal can be decomposed into a linear combination of statistically independent, non-Gaussian components (or sources) that correspond to the spectra of the pure species. ICA aims to find an approximate solution to Eq. (1) by identifying two matrices A and S such that:

$$X = AS \tag{2}$$

with the matrices A and S related to the original concentration and spectra matrices, respectively:

$$A \longleftrightarrow C \tag{3}$$

$$S \longleftrightarrow L \tag{4}$$

where $A \in \mathbb{R}^{n_N \times \hat{n}_K}$ is the mixing matrix, $S \in \mathbb{R}^{\hat{n}_K \times n_L}$ is the sources matrix, and $\hat{n}_K$ is the estimated number of species, obtained from the analysis of the data and used instead of $n_K$, which is unknown. However, because ICA inherently suffers from permutation, rotation, and scaling ambiguity [11], Eqs. (3) and (4) are equivalences, not identities: the related matrices are the same only up to a scaling and a permutation of their columns (A) or rows (S). Consequently, the actual spectra of the pure species and the independent components computed by ICA have the same shape, and after normalization they should overlap when the algorithm converges to the correct solution. Logically, ICA can be broken down into two main steps: first, decorrelate the data (a process usually called whitening) and reduce their dimensionality (to find $\hat{n}_K$); second, rotate the data in the reduced space to find the independent components [11,13].
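The permutation and scaling ambiguity can be checked numerically. In the sketch below (random placeholder matrices, NumPy assumed), any permutation matrix P and invertible diagonal matrix D give a second factorization $(APD)(D^{-1}P^{T}S) = AS$ that reproduces the data exactly; normalizing each row to unit norm and positive sum removes the scale and sign, so only the permutation remains.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.uniform(0.1, 1.0, size=(15, 3))   # mixing matrix (n_N x n_K), placeholder values
S = rng.uniform(0.0, 1.0, size=(3, 200))  # sources (n_K x n_L), placeholder values

P = np.eye(3)[[2, 0, 1]]                  # a permutation matrix
D = np.diag([0.5, -3.0, 10.0])            # arbitrary non-zero scalings, sign included

A_alt = A @ P @ D
S_alt = np.linalg.inv(D) @ P.T @ S

# Both factorizations reproduce the same data
same_data = np.allclose(A @ S, A_alt @ S_alt)

def normalize(rows):
    """Scale each row to unit Euclidean norm and positive sum (removes scale and sign)."""
    rows = rows / np.linalg.norm(rows, axis=1, keepdims=True)
    return rows * np.sign(rows.sum(axis=1, keepdims=True))

# After normalization, S_alt is only a row permutation of S
same_shapes = np.allclose(normalize(S_alt), normalize(P.T @ S))
print(same_data, same_shapes)
```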

There are several alternative algorithms to compute the whitening matrix W and the rotation matrix R, divided into two main classes: those minimizing the mutual dependence of the sources (entropy maximization, mutual information minimization) and those maximizing their non-Gaussianity (kurtosis, higher-order cumulants). A popular and efficient algorithm based on maximization of non-Gaussianity is FastICA, developed by Hyvärinen and co-workers [14]; another is MILCA, based on minimization of mutual information, developed by Stögbauer and co-workers [15]. Details of both algorithms can be found in the dedicated literature [11,15,16]. Note that the number of independent components used by ICA algorithms is a decision variable provided by the user.
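The two logical steps, whitening with a matrix W and rotation with a matrix R, can be sketched in NumPy as a bare-bones symmetric FastICA with the tanh (log-cosh) contrast. This is a simplified illustration of the FastICA fixed-point iteration, not the implementation used in this work and not MILCA; the test signals and mixing matrix are made up, and `n_components` plays the role of the user-supplied number of independent components.

```python
import numpy as np

def fast_ica(X, n_components, max_iter=500, tol=1e-6, seed=0):
    """Minimal symmetric FastICA (tanh contrast).
    Rows of X are mixtures, columns are samples (e.g. wavenumbers).
    Returns the estimated sources, shape (n_components, n_samples)."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=1, keepdims=True)

    # Step 1: whitening and dimensionality reduction (matrix W in the text)
    cov = Xc @ Xc.T / Xc.shape[1]
    d, E = np.linalg.eigh(cov)
    keep = np.argsort(d)[::-1][:n_components]       # largest eigenvalues
    W = np.diag(d[keep] ** -0.5) @ E[:, keep].T
    Z = W @ Xc                                      # whitened: identity covariance

    def sym_decorrelate(R):
        # R <- (R R^T)^(-1/2) R keeps the rows of R orthonormal
        s, u = np.linalg.eigh(R @ R.T)
        return u @ np.diag(s ** -0.5) @ u.T @ R

    # Step 2: rotation R maximizing non-Gaussianity (fixed-point update, g = tanh)
    R = sym_decorrelate(rng.standard_normal((n_components, n_components)))
    for _ in range(max_iter):
        g = np.tanh(R @ Z)
        g_prime = 1.0 - g ** 2
        R_new = sym_decorrelate(g @ Z.T / Z.shape[1]
                                - np.diag(g_prime.mean(axis=1)) @ R)
        # converged when each new row points along the old one (up to sign)
        done = np.max(np.abs(np.abs(np.diag(R_new @ R.T)) - 1.0)) < tol
        R = R_new
        if done:
            break
    return R @ Z

# Usage: two non-Gaussian signals, linearly mixed
t = np.linspace(0, 8, 2000)
S_true = np.vstack([np.sin(2 * t), np.sign(np.sin(3 * t))])
A_true = np.array([[1.0, 0.5], [0.5, 2.0]])
S_est = fast_ica(A_true @ S_true, n_components=2)

# Each true source should correlate strongly with one estimate (up to sign and scale)
corr = np.abs(np.corrcoef(np.vstack([S_true, S_est]))[:2, 2:])
print(corr.max(axis=1))
```

The recovered rows come back in arbitrary order and scale, which is why the comparison uses absolute correlations.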

The use of ICA for analytical spectroscopy was first proposed in 2001 by Chen et al. [17] in the context of near-IR spectroscopy, to study a ternary system (starch-protein-water). More recently, the technique has also been applied to NMR, IR, UV, Raman, and fluorescence spectroscopy [18–20], in particular to investigate the possibility of using a one-point calibration, as shown by Monakhova et al. for UV and IR [21]. Nevertheless, these studies have mainly been limited to three- or four-component systems of organic substances and have focused on analytical rather than processing applications.

2.1.2 Multivariate Curve Resolution

Multivariate Curve Resolution - Alternating Least Squares (MCR-ALS) is a well-established chemometric technique [12,22–25] specifically aimed at retrieving the mixing and the sources matrices. From a mathematical perspective, MCR-ALS solves the same problem as ICA, but it does so without relying on the independence of the sources. MCR-ALS seeks matrices A and S by alternately solving a least-squares problem for one matrix while keeping the other fixed, so as to achieve:

$$\min_{A,\,S} \lVert X - AS \rVert \tag{5}$$

where an initial guess for either A or S must be provided. Note that, as with ICA, MCR-ALS suffers from permutation, rotation, and scaling ambiguity, and that the number of species, $n_K$, is a degree of freedom of the algorithm, albeit one embedded in the initial guess.

The MCR-ALS formulation can also take advantage of physical properties of the spectra, such as non-negativity and mass-balance closure, to constrain the space of solutions. However, when nothing is known about the structure of the sought matrices, i.e. when no explicit search directions are provided to the ALS algorithm, MCR-ALS may fail to converge or may converge only very slowly; additionally, in some cases the solution may not be unique. Therefore, the initial guess for either the mixing matrix or the sources matrix is crucial in determining the quality of the decomposition. When several species are present, derivative spectroscopy may be better suited to estimate the spectra of the individual species [26–28], even though MCR-ALS then cannot exploit the non-negativity constraint on the spectra, because derivative spectra take negative values.
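A bare-bones version of the alternating least-squares iteration behind Eq. (5) is sketched below, assuming NumPy. Non-negativity is imposed by clipping negative entries after each unconstrained least-squares update, a simple heuristic chosen here for brevity; dedicated MCR-ALS software offers proper non-negative least squares and further constraints such as closure. The synthetic bands, concentrations, and perturbed initial guess are made up to show how a starting estimate of S is consumed.

```python
import numpy as np

def mcr_als(X, S0, n_iter=200):
    """Bare-bones MCR-ALS for X ~ A @ S.
    S0: initial guess for the sources, shape (n_components, n_L).
    Non-negativity is imposed by clipping after each least-squares update."""
    S = np.clip(S0, 0.0, None)
    for _ in range(n_iter):
        # A-step: solve S^T A^T = X^T in the least-squares sense, then clip
        A = np.clip(np.linalg.lstsq(S.T, X.T, rcond=None)[0].T, 0.0, None)
        # S-step: solve A S = X in the least-squares sense, then clip
        S = np.clip(np.linalg.lstsq(A, X, rcond=None)[0], 0.0, None)
    return A, S

# Usage on synthetic, non-negative data (hypothetical bands and concentrations)
rng = np.random.default_rng(2)
wn = np.linspace(0, 1, 300)
S_true = np.vstack([np.exp(-0.5 * ((wn - c) / 0.04) ** 2) for c in (0.25, 0.5, 0.75)])
A_true = rng.uniform(0.2, 2.0, size=(15, 3))
X = A_true @ S_true

S0 = S_true + 0.05 * rng.standard_normal(S_true.shape)   # rough initial guess
A_hat, S_hat = mcr_als(X, S0)
rel_residual = np.linalg.norm(X - A_hat @ S_hat) / np.linalg.norm(X)
print(f"relative residual: {rel_residual:.2e}")
```

A small residual only shows that the data are reproduced; as noted above, the factors themselves are still subject to permutation and scaling ambiguity.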

MCR-ALS has mainly been applied to systems undergoing kinetic reactions, where the concentrations of the species and their absorbance/scattering may be unknown, but their identity is known, or at least their evolution in time is constrained by kinetics. For example, Chen and co-workers have recently used MCR-ALS to estimate both the kinetic parameters and the unknown absorbance profiles [29–31]. However, in the case of interest here, no underlying kinetic reaction constrains the system and no a priori information on which species are in solution is available.

2.2 Three-step Procedure

We propose a three-step procedure to analyze spectroscopic data sets: the first two steps focus on determining the spectra of the pure species, while the third estimates the composition using a one-point calibration. We suggest exploiting ICA and MCR-ALS sequentially rather than individually; Valdemara et al. [32] proposed a similar approach in the context of IR spectroscopy for food processing, although they neither thoroughly investigated the robustness of the method nor explored the feasibility and reliability of one-point calibration.

2.2.1 Step One: Determination of $\hat{n}_K$

In the first step, one must determine the number of species in the system, $\hat{n}_K$, since this is the only degree of freedom in BSS algorithms. Several methods have been suggested, following two main approaches: one based on inspection of the eigenvalues of $XX^T$ (or of the data covariance matrix); the other based on a trial-and-error procedure. Here, we have adopted a variant of the former, looking at the singular values.

To this aim, we perform a singular value decomposition (SVD) of the data matrix, i.e. $X = USV^T$. The diagonal elements of $S \in \mathbb{R}^{n_N \times n_L}$ (here the matrix of singular values, not the sources matrix) are the singular values in decreasing magnitude, i.e. $\mathrm{diag}(S) = [S_1, S_2, \ldots, S_{n_N}]$ with $S_1 \ge S_2 \ge \cdots \ge S_{n_N}$ (assuming $n_N \le n_L$). It is well known from linear algebra that the rank of X equals the number of its non-zero singular values: based on this property, linear superposition, and Eq. (1), one sees that in the absence of noise the rank of X equals the number of species (provided C and L have full rank), which is the quantity $\hat{n}_K$ estimates. In real systems, though, noise is present and the singular values are usually not exactly zero; hence we assume that $\hat{n}_K$ equals the number of relevant singular values, i.e. we look at the so-called effective rank of X. Three main criteria can be used to determine the effective rank. The first is based on the relationship between the singular values of X and the eigenvalues of its covariance matrix, and looks at the explained variance [12]; the criterion to determine $\hat{n}_K$ is:

$$\hat{n}_K \ \text{s.t.}\ V = \frac{\sum_{i=1}^{\hat{n}_K} S_i}{\sum_{i=1}^{n_N} S_i} \ge \alpha_1 \tag{6}$$

where $\alpha_1 \in [0, 1)$ is a constant sufficiently close to one, e.g. 0.99. The second criterion is based on the distance, measured by a p-norm, between the original data set and the one reconstructed using only the first $\hat{n}_K$ singular values:

$$\hat{n}_K \ \text{s.t.}\ \varepsilon_p = \frac{\lVert X - (USV^T)_{\hat{n}_K} \rVert_p}{\lVert X \rVert_p} \le \alpha_2 \tag{7}$$

where $\alpha_2$ is a constant, the subscript $\hat{n}_K$ indicates that only the first $\hat{n}_K$ singular values of the SVD have been used, and $p \ge 1$. The third criterion looks at the rate of variation of $\varepsilon_p$ and can be written as:

$$\hat{n}_K \ \text{s.t.}\ \lvert \varepsilon_p(\hat{n}_K) - \varepsilon_p(\hat{n}_K - 1) \rvert \le \alpha_3 \tag{8}$$

where $\alpha_3$ should be close to zero. The choice of $\alpha_1$, $\alpha_2$, and $\alpha_3$ depends on the level of noise corrupting the data, as we discuss in Section 3. In the following, we use the second and third criteria together.
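The singular-value criteria can be sketched in a few lines of NumPy. The function below is a hypothetical helper, not the authors' code: it applies Eq. (6) exactly as written (a share of the singular values themselves; a variance-based variant would use their squares) and Eq. (7) with the Frobenius norm as the assumed choice of p-norm. It also returns the whole error curve so that the rate criterion of Eq. (8) can be applied to its differences. The thresholds and the test matrix are illustrative assumptions.

```python
import numpy as np

def effective_rank(X, alpha1=0.99, alpha2=0.01):
    """Estimate n_K from the singular values of X.
    Returns (n_eq6, n_eq7, eps); eps[k-1] is the relative error of the
    reconstruction with the first k singular values (Frobenius norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)

    # Eq. (6) as written: cumulative share of the singular values themselves
    share = np.cumsum(s) / s.sum()
    n_eq6 = int(np.argmax(share >= alpha1)) + 1

    # Eq. (7): relative error of the truncated reconstruction (U S V^T)_k
    norm_X = np.linalg.norm(X)
    eps = np.array([np.linalg.norm(X - (U[:, :k] * s[:k]) @ Vt[:k]) / norm_X
                    for k in range(1, len(s) + 1)])
    n_eq7 = int(np.argmax(eps <= alpha2)) + 1
    return n_eq6, n_eq7, eps

# Usage: a rank-3 matrix plus weak white noise
rng = np.random.default_rng(3)
X_clean = rng.uniform(0, 1, (15, 3)) @ rng.uniform(0, 1, (3, 400))
X = X_clean + 1e-3 * rng.standard_normal(X_clean.shape)
n6, n7, eps = effective_rank(X)
rate = np.abs(np.diff(eps))   # |eps(k) - eps(k-1)|, compared with alpha3 in Eq. (8)
print(n6, n7)
```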

2.2.2 Step Two: Species Identification

After determining $\hat{n}_K$, in the second step we remove the blindness about the system. We compute the first (n = 1) or second (n = 2) derivative of X with Savitzky-Golay differentiation [33,34] and estimate the mixing matrix $A_I^{(n)}$ with the ICA algorithm:

$$\frac{d^{n} X}{d\lambda^{n}} = A_I^{(n)}\,\frac{d^{n} S_I}{d\lambda^{n}} \tag{9}$$

Because of the linearity of Eq. (1) and of the derivative operator, one can directly compute the sources matrix, $S_I$, associated with the original data matrix X by means of the pseudo-inverse of the mixing matrix, denoted $A_I^{-1}$ (the mixing matrix is in general not square):

$$S_I = A_I^{-1} X \tag{10}$$

The element-wise square of $S_I$ is used as the initial guess for MCR-ALS, for two reasons. First, ICA is not constrained to provide non-negative solutions [11,15,17], hence some (or all) sources retrieved by ICA may have negative entries; such entries slow down the convergence of MCR-ALS, which, if the solution is not unique, can even converge to an incorrect solution. Second, taking the element-wise square usually improves the strength of the signal over the noise. The input to the MCR-ALS algorithm is the actual spectra, X, rather than their derivative, which allows non-negativity constraints to be enforced; the outputs are the final mixing matrix, $A_M$, and its associated sources matrix, $S_M$.
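The linearity argument behind Eqs. (9) and (10) can be checked with the short NumPy sketch below. It implements a Savitzky-Golay derivative from its least-squares definition (unit sample spacing and zero-padded edges are simplifying assumptions), verifies that differentiating the mixtures equals mixing the differentiated sources, and then forms $S_I$ and its element-wise square. To keep the block self-contained, the true mixing matrix stands in for the ICA estimate $A_I$; the bands and concentrations are hypothetical.

```python
import numpy as np
from math import factorial

def savgol_coefficients(half_window, polyorder, deriv):
    """Savitzky-Golay weights for the deriv-th derivative at the window centre
    (unit sample spacing), from a least-squares polynomial fit."""
    offsets = np.arange(-half_window, half_window + 1)
    J = np.vander(offsets, polyorder + 1, increasing=True)   # columns: 1, k, k^2, ...
    return factorial(deriv) * np.linalg.pinv(J)[deriv]

def savgol_derivative(X, half_window=7, polyorder=3, deriv=1):
    """Apply the filter to each row (spectrum) of X; edges are zero-padded."""
    h = savgol_coefficients(half_window, polyorder, deriv)[::-1]  # flip so convolve == correlate
    return np.vstack([np.convolve(row, h, mode="same") for row in X])

rng = np.random.default_rng(4)
wn = np.linspace(0, 1, 500)
S_true = np.vstack([np.exp(-0.5 * ((wn - c) / 0.03) ** 2) for c in (0.3, 0.6)])
A_true = rng.uniform(0.5, 2.0, size=(10, 2))
X = A_true @ S_true

# Eq. (9): the derivative commutes with mixing, d^n X = A (d^n S)
dX = savgol_derivative(X, deriv=2)
commutes = np.allclose(dX, A_true @ savgol_derivative(S_true, deriv=2))

# Eq. (10): a mixing matrix valid for derivative data also unmixes the raw data
S_I = np.linalg.pinv(A_true) @ X
S_init = S_I ** 2          # element-wise square: non-negative initial guess for MCR-ALS
print(commutes, np.allclose(S_I, S_true))
```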

To proceed with species identification, one must create a library of spectra of pure species. To construct the library, we measured the spectra of pure analytes in water at a known molar concentration. From this single point, we compute the linear relation between spectrum and concentration, enforcing that a species not present in the mixture has zero molar concentration, i.e. that the calibration line passes through the origin. The calibration line obtained in this manner uses a single experimental concentration (hence one-point calibration) and will be used in Step 3. Note that, to obtain the spectrum at 1 M, we divide the measured spectrum by its associated known concentration, relying on the Beer-Lambert law. The robustness of one-point calibration is improved by the fact that we use the whole spectrum rather than a single value (e.g. the peak maximum), thereby mitigating minor deviations from the regime of validity of the Beer-Lambert law.
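The one-point calibration amounts to a single division per spectrum. The sketch below is illustrative only: `build_library`, the species names, and the intensity values are hypothetical, and intensity is assumed proportional to concentration with the calibration line through the origin.

```python
import numpy as np

def build_library(measured, concentrations):
    """One-point calibration: rescale each pure-species spectrum to 1 M.
    Assumes intensity proportional to concentration, with the calibration
    line passing through the origin (zero signal at zero concentration)."""
    return {name: np.asarray(spectrum, dtype=float) / concentrations[name]
            for name, spectrum in measured.items()}

# Hypothetical pure-species measurements (arbitrary units) and their molarities
measured = {"nitrate": [0.0, 1.2, 3.0, 1.2, 0.0],
            "sulfate": [0.5, 2.0, 0.5, 0.0, 0.0]}
concentrations = {"nitrate": 1.5, "sulfate": 0.5}   # mol/L

library = build_library(measured, concentrations)

# The single calibration point predicts the spectrum at any other concentration
predicted_nitrate = 0.75 * library["nitrate"]
print(predicted_nitrate)
```

Because every wavenumber is rescaled by the same factor, the shape of the whole spectrum, not just one peak height, carries the calibration.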

In theory, the estimated sources and their associated actual spectra are equivalent up to an arbitrary scaling and should overlap when the separation has been correctly carried out. Given the library, if one regards the concentration of each pure species as a random variable, then the intensities at the different wavenumbers in its spectral response correspond to realizations of that random variable; the same holds for the sources. Therefore, one can compute the correlation coefficient, γ, between each normalized library spectrum and each normalized source spectrum and assemble a correlation matrix with as many rows as pure species and as many columns as sources (from MCR-ALS). In each row, the highest value of γ identifies the source that matches that pure species [18].
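The labeling step can be sketched with NumPy's Pearson correlation. Because the Pearson coefficient is invariant to positive scaling and offsets, the explicit normalization is implicit here. The function name, the Gaussian "library" bands, and the permuted, rescaled, noisy "sources" are assumptions made for illustration.

```python
import numpy as np

def match_sources(library, sources):
    """Correlation matrix gamma (rows: library species, columns: sources)
    and, for each species, the index of the best-matching source."""
    n_lib = library.shape[0]
    gamma = np.corrcoef(np.vstack([library, sources]))[:n_lib, n_lib:]
    return gamma, gamma.argmax(axis=1)

wn = np.linspace(0, 1, 400)

def band(center):
    return np.exp(-0.5 * ((wn - center) / 0.04) ** 2)

library = np.vstack([band(0.2), band(0.5), band(0.8)])        # hypothetical 1 M spectra

# Sources as a separation might return them: permuted, rescaled, slightly noisy
rng = np.random.default_rng(5)
sources = np.vstack([3.0 * band(0.8), 0.2 * band(0.2), 7.0 * band(0.5)])
sources += 0.01 * rng.standard_normal(sources.shape)

gamma, best = match_sources(library, sources)
print(best)   # [1 2 0]
```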

Note that this procedure may not associate each source with an actual species, either because the library does not contain the species or because the identified source is not a real chemical species. The latter case occurs, for instance, when ICA splits the contribution of one species into two or more sources because of high noise levels, resulting in $\hat{n}_K > n_K$; the opposite scenario ($\hat{n}_K < n_K$) may occur because of peak overlap. Spurious sources may also be identified when a change in temperature and/or pH takes place, or when the addition of a new species causes nonlinear interactions with a species already present. In all these cases, the actual spectra of the pure species may undergo changes in intensity and/or drifts, i.e. nonlinear behaviors. However, since the model is forced to be linear, these changes are interpreted as the appearance of a new independent source.

2.2.3 Step Three: Composition Estimation

We now estimate the species compositions. Ideally, BSS methods yield both the sources and the mixing matrix, but because of the inherent ambiguity (see Section 2.1) this is typically not the case. Most algorithms are constructed in such a way that the mixing matrix A retrieved from either ICA or MCR-ALS does not even retain the relative proportions among the species, because only the product of the two matrices matters in reconstructing the signal, not their individual entries. To alleviate this problem, Chen et al. [17] proposed a calibration step, during which one estimates a matrix B such that C = BA, where C is the concentration matrix and A is the mixing matrix. However, such a calibration is effective only in an off-line laboratory setting, not for on-line process control: the appearance of a new species or of a concentration outside the calibration range would require a new calibration campaign. We have instead exploited the pre-constructed library used in Step 2, where spectra are expressed at 1 M. With the species identified in Step 2, one constructs the matrix L containing the spectra of the identified species and solves the inverse problem of Eq. (1):

$$X L^{-1} = G \propto C \tag{11}$$

where $L^{-1}$ denotes the pseudo-inverse of L (which is not square) and, more precisely, each row of G is a multiple of the corresponding row of C. The mole fractions then follow as:

$$\chi_{ik} = \frac{G_{ik}}{\sum_{j=1}^{\hat{n}_K} G_{ij}} \qquad \forall\, i = 1, \ldots, n_N \tag{12}$$

where $\chi_{ik}$ is the mole fraction of species k in measurement i. Note that, because of linearity, any error or uncertainty in X propagates linearly to the estimate of C, and from there to χ, with $L^{-1}$ acting as the relative local sensitivity. Note also that G is an estimate of the concentration matrix, but there are two limitations to its direct use. First, the solution is affected by the noise in X. Second, it assumes that the one-point calibration is valid for every measuring device, regardless of whether the reference spectra and the measurements were obtained with different instruments, thus neglecting the device-specific bias. By bias, we mean here the dependence of the spectrum on the intensity of the excitation source: since this relationship is a property of the specific instrument, so are the absolute values of a spectrum and the one-point calibration. Suppose that all the spectra measured with the same device are affected by the same type and amount of bias. Then the relative intensities, i.e. the ratios between characteristic peaks of pure species (or between the areas underneath their spectra), are inherent properties of the materials and should not be changed by the bias. For these reasons, the estimate of the mole fractions should be more robust than that of the molar concentrations.
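Eqs. (11) and (12), together with the argument about device bias, can be illustrated with the NumPy sketch below. The library spectra and concentrations are made up, and the device bias is modelled, as an assumption, as a single intensity factor b common to all spectra from one instrument: b rescales G but cancels in the mole fractions.

```python
import numpy as np

def mole_fractions(X, L):
    """Eq. (11)-(12): G = X L^+ (pseudo-inverse, since L is not square),
    then normalize each row of G to obtain mole fractions."""
    G = X @ np.linalg.pinv(L)
    return G, G / G.sum(axis=1, keepdims=True)

wn = np.linspace(0, 1, 300)

def band(center, width):
    return np.exp(-0.5 * ((wn - center) / width) ** 2)

L = np.vstack([band(0.2, 0.03), band(0.45, 0.05), band(0.75, 0.08)])  # hypothetical 1 M spectra
C = np.array([[0.6, 1.85, 55.0],
              [0.5, 2.00, 54.0]])                                      # mol/L, illustrative
chi_true = C / C.sum(axis=1, keepdims=True)

# Device-specific bias modelled as a common intensity factor b on all spectra
b = 0.7
G, chi = mole_fractions(b * (C @ L), L)
print(np.allclose(G, b * C), np.allclose(chi, chi_true))
```

Under this simple bias model the concentrations come out scaled by b, while the mole fractions are unaffected.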

3 Results

3.1 Simulant and Experimental Conditions

Low-level nuclear waste is composed of more than 20 known species [1]. However, most of its mass consists of water and a limited number of sodium and potassium salts.

In typical laboratory studies, actual radioactive waste is replaced with non-radioactive simulant mixtures that contain the relevant ions in proportions such that the chemical and physical properties are very similar to those of the actual material (see for example Nassif et al. [35]). In this work, we have chosen to study a simulant formed by aqueous solutions of five sodium salts [36], namely sodium phosphate (Na₃PO₄), sulfate (Na₂SO₄), nitrite (NaNO₂), carbonate (Na₂CO₃), and nitrate (NaNO₃). Raman and IR spectra of the anions and water are reported in Figure 1; the sodium ion is neither Raman- nor IR-active. The simultaneous use of IR and Raman spectroscopy for on-line in situ monitoring could also offer several advantages. First, since some species have different Raman and IR activities, their combined use allows a larger number of species to be monitored; e.g. PO₄³⁻ is weakly Raman-active but strongly IR-active. Second, they are independent methods, providing an effective way to cross-check the results of BSS for species identification and composition estimates. Third, Raman spectroscopy can also detect solid material, thus allowing possible identification of the onset of precipitation, which may be problematic in waste processing.

Figure 1: The Raman (left) and IR (right) spectra of the pure species measured at 1 M.

3.2 Simulated Data

3.2.1 Data Generation

We tested the three-step procedure on data generated via computer simulations, which by construction comply with the hypotheses of linearity and linear superposition. The spectra of these simulated mixtures were produced using the measured Raman and IR spectra of each of the five sodium salts and of water (see Figure 1) constituting the simulant. For each species in each mixture, the concentration is a random number drawn from a Gaussian distribution with mean $\mu$ (see Table 1) and variance $\sigma^2 = (\kappa\mu)^2$, where $\kappa = \sigma/\mu$ is the coefficient of variation. Each set of synthetic data forms an $(n_N \times n_L)$ matrix, $X_c$.

To study the effect of inherent sample variability on the decomposition performance, we simulated two types of mixtures, one with $\kappa_1 = 0.10$ and another with $\kappa_2 = 0.01$. We also performed simulations with $\kappa > 0.10$, namely 0.25, 0.50, and 0.70, which are representative of the values often used during calibration. The results (reported in Section SI-1.4 of the Supplementary Information) did not differ qualitatively from those reported for $\kappa_1$, indicating that once the dispersion of the data (or, equivalently, their information content) is sufficiently large, the algorithm's performance does not improve further.

Each set of simulated data consisted of 15 mixtures (i.e. $n_N = 15$); here we consider only the simulated Raman spectra. Figure 2 illustrates typical examples of data with the chosen values of κ; the insets in each plot show a magnification of one characteristic peak of NO₂⁻ to illustrate how the data sets change with κ.
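The data-generation step can be illustrated with a short Python sketch. It is not the authors' code: the reference spectra are hypothetical Gaussian bands standing in for the measured spectra of Figure 1, and the 500-point wavenumber grid and band positions are illustrative assumptions; only the means µ (Table 1), the coefficient of variation κ, and $n_N = 15$ come from the text.

```python
import numpy as np


def simulate_mixtures(ref_spectra, mu, kappa, n_mix=15, seed=0):
    """Return concentrations C (n_N x n_K) and noise-free spectra Xc (n_N x n_L).

    Each concentration is drawn from a Gaussian with mean mu[k] and standard
    deviation kappa * mu[k], i.e. variance (kappa * mu[k])**2.
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    C = rng.normal(loc=mu, scale=kappa * mu, size=(n_mix, mu.size))
    # Linear superposition: every mixture spectrum is a weighted sum of pure spectra.
    Xc = C @ ref_spectra
    return C, Xc


# Hypothetical stand-in spectra: one Gaussian band per species (illustrative positions).
wavenumber = np.linspace(400.0, 1800.0, 500)
centers = [940.0, 980.0, 1330.0, 1065.0, 1049.0, 1640.0]  # PO4, SO4, NO2, CO3, NO3, H2O
ref = np.array([np.exp(-0.5 * ((wavenumber - c) / 8.0) ** 2) for c in centers])

mu = [0.6, 0.6, 1.85, 1.25, 1.85, 55.0]  # mean concentrations from Table 1, mol/L
C, Xc = simulate_mixtures(ref, mu, kappa=0.10)
print(Xc.shape)  # (15, 500)
```

Setting `kappa=0.01` instead reproduces the low-variability case; with `kappa=0` every mixture collapses onto the mean composition, which makes the matrix rank-deficient and is a quick way to see why some dispersion is needed for BSS.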

3.2.2 Noise

Actual data are affected by noise, which we assume to be additive to $X_c$:

$$X = X_c + \eta \tag{13}$$

where $\eta \in \mathbb{R}^{n_N \times n_L}$ is the noise matrix and $X$ is the input matrix for the three-step procedure. The noise, acting at each wavenumber, is white and Gaussian, i.e. generated from a multivariate Gaussian distribution $\mathcal{G}(0, \sigma_\eta^2 I)$, where $I$ is the identity matrix and the constant $\sigma_\eta^2$ is the noise variance. Implicit in this formulation is the assumption that the noise is a stationary property of the measurement device and of the system of interest, so that the average noise intensity does not change over time or between samples. Furthermore, the noise intensity does not depend on the wavenumber.

Table 1: The mean value, µ, and the coefficients of variation, κ, used for the creation of simulated data.

Species    µ [mol L⁻¹]    κ1      κ2
PO₄³⁻      0.6            0.10    0.01
SO₄²⁻      0.6            0.10    0.01
NO₂⁻       1.85           0.10    0.01
CO₃²⁻      1.25           0.10    0.01
NO₃⁻       1.85           0.10    0.01
H₂O        55             0.10    0.01

Figure 2: Examples of typical sets of simulated Raman spectra based on the references in Figure 1. Each set contains 15 random mixtures: the set on the left has κ = 0.10 and the one on the right κ = 0.01. Na⁺ is the common counter-ion. The insets show a magnification of a nitrite peak to illustrate the variability of the spectra for different values of κ.

We validated our assumptions on the noise against ad hoc experimental data for our system. The importance of noise with respect to the signal of interest is usually measured by the signal-to-noise ratio, SNR. We use two types of signal-to-noise ratio in the discussion of our results. The first is a local signal-to-noise ratio, $\mathrm{SNR}_l$, which measures the ratio of the squared intensity at a specific wavenumber $\lambda_l$ to the noise variance:

$$\mathrm{SNR}_l = \frac{X_{il}^2}{\sigma_\eta^2} \tag{14}$$

The second is the average signal-to-noise ratio, SNR, defined as:

$$\mathrm{SNR} = \frac{1}{n_L}\sum_{l=1}^{n_L} \mathrm{SNR}_l = \frac{1}{n_L\,\sigma_\eta^2}\sum_{l=1}^{n_L} X_{il}^2 = \langle \mathrm{SNR}_l \rangle \tag{15}$$

where $\langle \mathrm{SNR}_l \rangle$ indicates the arithmetic average of the local signal-to-noise ratio. By specifying the value of SNR, one can compute the noise covariance $\sigma_\eta^2 I$ for each set of simulated data and generate an appropriate noise matrix η. Since SNR can span several orders of magnitude, we report its value (and similarly that of $\mathrm{SNR}_l$) in decibels (dB), defined as $10\log_{10}(\mathrm{SNR})$.
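Given the definition of the average SNR, the noise variance for a target value follows directly as $\sigma_\eta^2 = \langle X_c^2 \rangle / 10^{\mathrm{SNR}_{\mathrm{dB}}/10}$. The sketch below assumes that the average is taken over all entries of a simulated set (samples and wavenumbers alike), which is our reading of "for each set of simulated data"; the function names and the uniform placeholder data are hypothetical.

```python
import numpy as np


def add_noise(Xc, target_snr_db, seed=0):
    """Return X = Xc + eta with white Gaussian noise at the requested average SNR.

    SNR = mean(Xc**2) / sigma_eta**2, so sigma_eta**2 = mean(Xc**2) / 10**(dB / 10).
    """
    rng = np.random.default_rng(seed)
    sigma2 = np.mean(Xc ** 2) / 10.0 ** (target_snr_db / 10.0)
    eta = rng.normal(0.0, np.sqrt(sigma2), size=Xc.shape)  # covariance sigma2 * I
    return Xc + eta, sigma2


def average_snr_db(Xc, sigma2):
    """Average SNR of the noise-free data for a given noise variance, in decibels."""
    return 10.0 * np.log10(np.mean(Xc ** 2) / sigma2)


rng = np.random.default_rng(1)
Xc = rng.uniform(0.0, 1.0, size=(15, 500))  # placeholder noise-free data
X, sigma2 = add_noise(Xc, target_snr_db=30.0)
print(round(average_snr_db(Xc, sigma2), 6))  # 30.0
```

Because the variance is fixed per data set rather than per sample or per wavenumber, this matches the stationary, wavenumber-independent noise model stated above.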

3.2.3 Analysis of Simulations

Our investigations focus on the performance of the three-step procedure under progressively noisier conditions. It is well known (and rather intuitive) that noise deteriorates the performance of ICA and MCR-ALS algorithms, 11,15 and therefore it is important to determine the level of noise above which the results of Blind Source Separation are no longer reliable. Concerning the algorithms, we chose FastICA, developed by Hyvarinen and co-workers 11 and known to be efficient and robust, for ICA, and pyMCR, developed by Camp and freely downloadable from the PyPI project website (https://pypi.org/project/pyMCR), for MCR-ALS. The noisy data are first centered and scaled via Pareto scaling, 37 then the Savitzky-Golay filter is used to compute the derivative of the spectra from the simulated data (to which noise was added). Note that the Savitzky-Golay filter not only yields an estimate of the derivative, but also mitigates the effect of noise thanks to its smoothing properties. The smoothing effect increases with the window size, i.e. with the number of points $S_w$ used by the algorithm. Unfortunately, values of $S_w$ that are too large distort the signal shape by broadening the peaks, reducing their intensity, and causing the maxima to drift 34,38,39: an optimal trade-off exists between noise removal and signal distortion. Additionally, when peaks partially or totally overlap, windows that are too broad hinder complete peak resolution. Note that the sources obtained from ICA are corrected to account for the scaling and centering step prior to applying MCR-ALS. We investigated a broad range of SNR values, from 110 dB (a practically noise-free system) to 20 dB (a very noisy one), and of $S_w$ values, from 3 (the minimum window for fitting a polynomial of degree two, as needed to compute a second derivative) to 39 (roughly the width of the Raman nitrate peak). For each set of parameters, we generated 100 simulations.
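The two pre-processing operations named above can be sketched with NumPy and SciPy. Pareto scaling is implemented here as column-wise (per-wavenumber) mean-centering followed by division by the square root of the column standard deviation; the derivative uses `scipy.signal.savgol_filter` with `deriv=2`. The polynomial order of 2 and the default window of 11 points (the $S_w$ used for Figures 5 and 6) are assumptions of this sketch, which applies the steps in the order stated in the text.

```python
import numpy as np
from scipy.signal import savgol_filter


def pareto_scale(X):
    """Center each column (wavenumber) and divide it by the square root of its std.

    The mean and scale are returned so that recovered sources can be mapped back.
    """
    mean = X.mean(axis=0)
    scale = np.sqrt(X.std(axis=0, ddof=1))
    scale[scale == 0] = 1.0  # leave constant columns unscaled
    return (X - mean) / scale, mean, scale


def sg_second_derivative(X, window=11, polyorder=2):
    """Savitzky-Golay second derivative of each spectrum (row) along the wavenumber axis.

    The window must exceed polyorder; window=3 with polyorder=2 is the smallest
    choice that still yields a second derivative.
    """
    return savgol_filter(X, window_length=window, polyorder=polyorder, deriv=2, axis=1)


rng = np.random.default_rng(0)
X = rng.normal(size=(15, 200)) + 5.0  # placeholder noisy spectra
Z, mean, scale = pareto_scale(X)
D2 = sg_second_derivative(Z, window=11)
print(Z.shape, D2.shape)  # (15, 200) (15, 200)
```

Larger `window` values smooth more aggressively, which is exactly the noise-versus-distortion trade-off in $S_w$ discussed above.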

As discussed in Section 2.2, we first need to determine the number of species, $\hat{n}_K$, which we do by inspecting the singular values of X for the system with κ = 0.10. We applied Criterion 1 in Eq. (6) and plotted on the left side of Figure 3 the logarithms of the singular values, $S_k$, and, on the right side, the associated fraction of explained variance, V, as functions of the singular-value index k. The color shades from black to red indicate that SNR decreases from 110 dB to 20 dB, while the dashed vertical lines mark the condition k = 6, i.e. the actual number of species in the system. If we set $\alpha_1 = 0.99$, the algorithm determines that $\hat{n}_K = 6$ for SNR ∈ [50, 110] dB, while $\hat{n}_K > 6$ for SNR < 50 dB. Inspecting directly the singular values for SNR ≥ 50 dB, one sees that for k > 6 they are almost constant and much smaller than those for k ≤ 6, suggesting that they describe the (low level of) noise. Conversely, when the noise becomes more important (SNR < 50 dB) and overshadows the information of interest, the number of singular values necessary to describe the system increases: since the noise is random and independent from one measurement to another, it raises the effective rank of X.


Figure 3: On the left, the base-10 logarithm of the singular values, $S_k$, for a typical matrix X; on the right, the explained variance, V; both $S_k$ and V are given as functions of k. The color from black to red indicates an increasingly noisy system, with SNR decreasing from 110 dB to 20 dB, while the dashed vertical lines mark the condition k = 6, i.e. the actual number of species in the system.

The use of Criterion 2 in Eq. (7), shown in Figure 4 (left) with $\alpha_2 = 10^{-3}$, leads to conclusions qualitatively similar to those from Criterion 1. Figure 4 (right) shows the difference $\Delta\varepsilon_1$ between two consecutive norms (i.e. Criterion 3 in Eq. (8)) for p = 1 and $\alpha_3 = 10^{-4}$; the color shades from black to blue indicate decreasing SNR. For high values of SNR, it is apparent that setting $\hat{n}_K > 6$ only marginally improves the reconstruction of the original data matrix ($\Delta\varepsilon_1$ almost zero). Conversely, for SNR < 35 dB, consistent with the results provided by Criteria 1 and 2, the noise covers the actual signal and the number of singular values needed to correctly reproduce the original data increases. Therefore, based on these criteria, we chose to set $\hat{n}_K = 6$. Note that at intermediate noise levels (SNR between about 35 and 50 dB), Criterion 3 provides the best guidance for the selection of $\hat{n}_K$, while Criteria 1 and 2 can lead to an erroneous selection of $\hat{n}_K$.
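The three rank-selection rules can be prototyped from a single SVD of X. Since Eqs. (6) to (8) are not reproduced in this section, the forms below are assumptions: Criterion 1 as the smallest k whose cumulative explained variance reaches $\alpha_1$, Criterion 2 as the smallest k whose relative entrywise $L_1$ reconstruction error $\varepsilon_1$ falls below $\alpha_2$, and Criterion 3 as the smallest k beyond which one more singular value improves $\varepsilon_1$ by less than $\alpha_3$. The test data (six well-separated bands plus very weak noise) are placeholders.

```python
import numpy as np


def rank_criteria(X, alpha1=0.99, alpha2=1e-3, alpha3=1e-4, k_max=15):
    """Candidate numbers of species from three SVD-based rules (assumed forms)."""
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    k_max = min(k_max, S.size)

    # Criterion 1 (assumed): cumulative explained variance V(k) >= alpha1.
    V = np.cumsum(S ** 2) / np.sum(S ** 2)
    k1 = int(np.argmax(V >= alpha1)) + 1

    # eps[k-1]: relative entrywise L1 error of the rank-k truncated SVD.
    norm1 = np.abs(X).sum()
    eps = np.array([np.abs(X - (U[:, :k] * S[:k]) @ Vt[:k]).sum() / norm1
                    for k in range(1, k_max + 1)])

    # Criterion 2 (assumed): first k with eps_1(k) <= alpha2.
    hits = np.flatnonzero(eps <= alpha2)
    k2 = int(hits[0]) + 1 if hits.size else k_max

    # Criterion 3 (assumed): first k for which adding singular value k+1
    # lowers eps_1 by less than alpha3.
    gain = eps[:-1] - eps[1:]
    hits = np.flatnonzero(gain <= alpha3)
    k3 = int(hits[0]) + 1 if hits.size else k_max
    return k1, k2, k3


# Placeholder rank-6 data: six well-separated bands plus very weak white noise.
rng = np.random.default_rng(0)
grid = np.arange(200.0)
bands = np.array([np.exp(-0.5 * ((grid - c) / 5.0) ** 2) for c in range(20, 200, 30)])
C = rng.uniform(0.5, 1.5, size=(20, 6))
X = C @ bands + 1e-6 * rng.standard_normal((20, 200))
print(rank_criteria(X))
```

Raising the noise amplitude in the last line pushes the error-based estimates above six, mirroring the trend in Figures 3 and 4.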


Figure 4: On the left, the reconstruction error $\varepsilon_1$, measured by the $L_1$-norm, as a function of SNR, for an increasing number of singular values (from black, k = 1, to blue, k = 15). The horizontal dashed black line indicates the threshold $\alpha_2$. On the right, the rate of change of the reconstruction error, $\Delta\varepsilon_1$, measured by the $L_1$-norm relative to the $L_1$-norm of the original data set, as a function of the number k of singular values used.

After determining $\hat{n}_K$, we can run the BSS algorithm and proceed towards spectra identification. We set $\hat{n}_K = 6$ for all levels of noise and allow the system to retrieve the spectra of all species. Note, though, that for high levels of noise the algorithm would select $\hat{n}_K > 6$ to satisfy Criteria 2 and 3.
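For readers without pyMCR at hand, the core of MCR-ALS can be sketched in a few lines of NumPy. This is a simplified stand-in, not pyMCR's API or the authors' implementation: nonnegativity is imposed by clipping each unconstrained least-squares update at zero, and the initial spectra `S0` play the role of the (rescaled) ICA sources that seed the refinement. The three-band test problem and the 5% perturbation of the starting spectra are hypothetical.

```python
import numpy as np


def mcr_als(X, S0, n_iter=200, tol=1e-10):
    """Minimal MCR-ALS: alternate C and S updates so that X ~= C @ S with C, S >= 0.

    S0 (n_K x n_L) is the initial guess for the pure-component spectra.
    Returns C, S and the final relative Frobenius reconstruction error.
    """
    S = np.clip(np.asarray(S0, dtype=float), 0.0, None)
    err, prev = np.inf, np.inf
    for _ in range(n_iter):
        # Concentration step: least squares for C in X ~= C @ S, then clip at zero.
        C = np.clip(np.linalg.lstsq(S.T, X.T, rcond=None)[0].T, 0.0, None)
        # Spectra step: least squares for S given C, then clip at zero.
        S = np.clip(np.linalg.lstsq(C, X, rcond=None)[0], 0.0, None)
        err = np.linalg.norm(X - C @ S) / np.linalg.norm(X)
        if abs(prev - err) < tol:
            break
        prev = err
    return C, S, err


# Placeholder problem: three nonnegative bands, 15 mixtures, 5 % perturbed start.
rng = np.random.default_rng(0)
grid = np.arange(200.0)
S_true = np.array([np.exp(-0.5 * ((grid - c) / 8.0) ** 2) for c in (50.0, 100.0, 150.0)])
C_true = rng.uniform(0.5, 1.5, size=(15, 3))
X = C_true @ S_true
S0 = S_true * (1.0 + 0.05 * rng.standard_normal(S_true.shape))
C, S, err = mcr_als(X, S0)
print(f"relative reconstruction error: {err:.2e}")
```

Seeding ALS with good initial spectra matters because ALS alone converges to one of many rotationally ambiguous solutions; this is the motivation for running ICA first.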

We first inspect the identifiability of the pure species, using the correlation coefficient, γ, between the reference spectrum and the spectrum produced by the algorithm. For phosphate, nitrite, and water, γ > 0.90 from the noiseless region (105 dB) down to about 50 dB, for all values of $S_w$. When SNR decreases below 50 dB, γ rapidly falls below 0.50, eventually dropping to zero for the lowest values of SNR, where the sources corresponding to these three species incorporate features from the noise. In contrast, sulfate (which has an inherently stronger signal, but low concentration) and carbonate (weaker than sulfate, but in greater concentration) are retrieved better overall: their values of γ decrease only for SNR below 30 dB, and never reach zero. Finally, nitrate is the only component that is always retrievable, and the values of γ associated with it are above 0.90 in all conditions explored. Contour maps of the average γ in the (SNR, $S_w$)-plane for all the species in the mixture (namely phosphate, sulfate, nitrite, carbonate, nitrate, and water) are reported in the Supplementary Information (Figure SI-2).
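Species identification by the correlation coefficient γ can be sketched as a best-match search between recovered sources and library spectra on a common wavenumber grid. We assume here that γ is the Pearson correlation coefficient; the function name, species labels, and toy spectra are hypothetical.

```python
import numpy as np


def match_sources(sources, references, names):
    """Pair each recovered source with its best-correlated reference spectrum.

    sources: (n_src x n_L), references: (n_ref x n_L), on the same grid.
    Returns a list of (species name, gamma) tuples, one per source.
    """
    n_src = sources.shape[0]
    # Full Pearson correlation matrix of all rows; keep the source-vs-reference block.
    R = np.corrcoef(np.vstack([sources, references]))[:n_src, n_src:]
    best = R.argmax(axis=1)
    return [(names[j], float(R[i, j])) for i, j in enumerate(best)]


grid = np.arange(200.0)
ref_a = np.exp(-0.5 * ((grid - 50.0) / 6.0) ** 2)
ref_b = np.exp(-0.5 * ((grid - 150.0) / 6.0) ** 2)
references = np.vstack([ref_a, ref_b])

# Recovered sources come out in arbitrary order and scale; gamma ignores both.
sources = np.vstack([3.0 * ref_b + 0.2, 0.5 * ref_a])
print(match_sources(sources, references, ["nitrate", "sulfate"]))
```

Invariance to scale and offset is what makes γ suitable here, since BSS recovers sources only up to such factors.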

The identifiability of the different species is easily rationalized in terms of $\mathrm{SNR}_l$ rather than SNR. In the simulant, nitrate and sulfate are very Raman-active, nitrite, carbonate, and water are moderately Raman-active, and phosphate is only weakly active; additionally, the (average) concentrations of phosphate and sulfate are much lower than those of the other species. For these reasons, the peak contribution of phosphate is much smaller than that of either nitrate or sulfate: the variations in the phosphate signal can be close to the noise level and thus lost, even though its peak does not overlap with the signals of nitrate and sulfate. A similar analysis holds for water: although its concentration is high, its inherent Raman activity (in the region accessible to our device) is small, so that its peak is much smaller than those of the other species. Although the average SNR seems quite high (30 dB means that the mean squared signal is 10³ times the noise variance, i.e. an amplitude ratio of roughly 30), it is actually an average of extremely strong contributions (due to nitrate and, to a lesser extent, sulfate, carbonate, and nitrite) and weak ones (due to phosphate and water, whose signal is comparable to the noise, i.e. $\mathrm{SNR}_l \approx 5$ dB even though SNR = 30 dB).
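Because SNR is defined on squared intensities (Eq. 14), a value in decibels converts to a power ratio $10^{\mathrm{dB}/10}$ and to an amplitude ratio (signal over noise standard deviation) $10^{\mathrm{dB}/20}$. The short check below applies this to the values quoted in this section; the helper names are hypothetical.

```python
import math


def db_to_power_ratio(db):
    """Ratio of mean squared signal to noise variance."""
    return 10.0 ** (db / 10.0)


def db_to_amplitude_ratio(db):
    """Ratio of signal amplitude to noise standard deviation (square root of the power ratio)."""
    return math.sqrt(db_to_power_ratio(db))


for db in (50, 30, 5):
    print(db, round(db_to_power_ratio(db), 1), round(db_to_amplitude_ratio(db), 2))
# 50 100000.0 316.23
# 30 1000.0 31.62
# 5 3.2 1.78
```

At 5 dB the peak amplitude is less than twice the noise standard deviation, which is why the weak phosphate and water features are so easily lost.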

It is important to recall that the tolerable level of noise is also determined by the inherent variability of each data set, measured by κ. Intuitively, for data sets in which κ is sufficiently small, the variability of the spectra due to actual differences in concentration may be overshadowed by the variability due to noise. To illustrate this issue, Figure 5 compares the reference spectra with the sources recovered with our procedure at SNR = 50 dB, for κ = 0.10 (left) and κ = 0.01 (right). The reference spectra are shown as solid black lines and the estimated sources as dashed colored lines. While for κ = 0.10 the match is almost perfect (as expected from the high values of γ), for κ = 0.01 the spectra of phosphate and water are no longer recognized (the best matches between the sources and the actual spectra have γ < 0.10). Sulfate and nitrite are also more affected by noise: spurious peaks belonging to nitrate and sulfate appear in the source associated with nitrite, while a spurious peak from nitrate appears in the source associated with sulfate. However, in both cases the correlation coefficient with the correct spectrum is still larger than 0.80.

Figure 5: Comparison between the reference spectra of the pure species (black solid lines) and their corresponding estimates from ICA/MCR-ALS (dashed colored lines), for simulations at κ = 0.10 and κ = 0.01, in the left and right plots, respectively. The SNR was set to 50 dB and the $S_w$ parameter to 11. The effect of noise is clear for the data in the right plot. One sees spurious bumps in the BSS spectra of sulfate and carbonate, due to their incomplete separation from each other and from nitrite. Moreover, the spectra of nitrite and phosphate are no longer identified by comparing the BSS spectra with the reference spectra; hence no dashed line is reported for either species.

After discussing the performance in terms of species identifiability, we turn to the estimates of compositions, using Eq. (12). We consider κ = 0.10, where all species can be correctly identified, and compare three values of SNR, namely 100, 70, and 50 dB, with $S_w$ set to 11 in all cases. Figure 6 shows the parity plots with the actual compositions used to generate the spectra, $\chi_o$, along the abscissa and their associated estimates, $\chi_m$, along the ordinate. The dashed black line in the $(\chi_o, \chi_m)$-plane represents a perfect estimate, the dashed blue lines an estimate within ±10% error, and the dashed red lines within ±20% error. The estimates are clearly rather good, even in the presence of moderate noise (50 dB), when all species are identified.
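The ±10% and ±20% bands in the parity plots correspond to the condition $|\chi_m - \chi_o| \le \mathrm{tol}\cdot\chi_o$. A hypothetical helper that counts the fraction of estimates inside a band, a convenient scalar summary of a parity plot, could look like this (the values are placeholders, not data from Figure 6):

```python
import numpy as np


def fraction_within(chi_o, chi_m, tol):
    """Fraction of estimates chi_m within +/- tol relative error of the actual chi_o."""
    chi_o = np.asarray(chi_o, dtype=float)
    chi_m = np.asarray(chi_m, dtype=float)
    rel_err = np.abs(chi_m - chi_o) / np.abs(chi_o)
    return float(np.mean(rel_err <= tol))


chi_o = [0.10, 0.20, 0.30, 0.40]   # hypothetical actual values
chi_m = [0.105, 0.23, 0.25, 0.40]  # hypothetical estimates
print(fraction_within(chi_o, chi_m, 0.10), fraction_within(chi_o, chi_m, 0.20))  # 0.5 1.0
```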

We also examined the influence of a set's inherent variability by simulating the case with κ = 0.01. The results are quite similar to those shown in Figure 6; the corresponding figure for κ = 0.01 can be found in the Supplementary Information (Figure SI-6). The main qualitative difference for κ = 0.01 is observed at the highest level of noise (SNR = 50 dB), where PO₄³⁻ and NO₂⁻ are no longer correctly identified; consequently, the fractions of the remaining species deviate from their actual values.


Figure 6: Estimation of the composition for a set of 15 mixtures, with κ = 0.10 and SNR = 100, 70, and 50 dB, from left to right; the values have been computed enforcing the nonnegativity of the spectra and choosing a Savitzky-Golay window of 11 for all values of SNR. The black, blue, and red dashed lines in each plot indicate a perfect match, the ±10% boundaries, and the ±20% boundaries, respectively. H₂O, NO₃⁻, NO₂⁻, CO₃²⁻, SO₄²⁻, and PO₄³⁻ are reported as red, black, light blue, violet, orange, and green symbols, respectively.

3.3 Experimental Data

We tested the performance of the three-step procedure with simulant solutions made of the five sodium salts used to obtain the simulated data, namely sodium phosphate (Na₃PO₄), sulfate (Na₂SO₄), nitrite (NaNO₂), carbonate (Na₂CO₃), and nitrate (NaNO₃), plus water, i.e. $n_K = 6$. Raman and IR spectra of simulant solutions at different concentrations were obtained at a constant temperature T = 298 K and were collected simultaneously, thus providing complementary but independent information. Pre-processing consisted of removing the effect of cosmic rays (de-spiking) from the Raman spectra, followed by baseline correction for both Raman and IR data; the pre-processed Raman (left) and IR (right) data sets are reported in Figure 7. The data fed to the ICA algorithm were also pre-processed with Pareto scaling. Further details about pre-processing are reported in the Supplementary Information.

The three-step procedure was carried out separately on the Raman and IR measurements. All Raman spectra were collected using a MarqMetrix All-in-one Raman System, with a 785 nm laser, 300 mW laser power, and 30 s integration time. ATR-FTIR spectra were collected using a ReactIR 10 ATR-FTIR instrument from Mettler Toledo. The experimental measurements were conducted in a 250-mL vessel stirred at 400 rpm to ensure well-mixed conditions. The data set used to run the algorithm comprises 18 samples, of which 14 had different compositions and 4 were pure water; the values of the mole fractions are reported in the Supplementary Information (Table SI-1). Note that the values of κ for the experimental data (excluding the measurements with only water) vary between 0.6 and 0.8.

Figure 7: The pre-processed spectra used in the three-step procedure from Raman (left) and IR (right) measurements.

3.3.1 Species Identification and Composition Estimation

We applied the three-step procedure as with the simulated data. Inspection of the singular values, using either Criterion 2 or 3 of Section 2.2.1, suggests setting $\hat{n}_K = 6$ for both Raman and IR spectra (see also Figure 8, where the vertical dashed lines indicate the condition k = 6); values of $\hat{n}_K > 6$ do not significantly improve the reconstruction of the original spectra.

Figure 8: The error norm $\varepsilon_1$ as a function of the number of singular values, k, used to determine $\hat{n}_K$, for the data set measured with Raman (left) and IR (right). The vertical dashed lines correspond to k = 6.

Analysis of the IR spectra associates each source with a different species used to produce the simulant, with γ > 0.95 for every species. However, the analysis of the Raman data did not identify any source associated with phosphate; instead, two distinct sources were both identified as belonging to nitrate. To understand this behavior, we examined the SNR, since noise is a major factor hindering identifiability (see Section 3.2): the SNR for the IR data is about 50 dB, whereas for Raman it is only 30 dB. Moreover, the maximum value of $\mathrm{SNR}_l$ corresponding to the phosphate peak is about 45 dB for the IR data, but only about 5 dB for the Raman data. This indicates that the contribution of phosphate to the whole spectrum, and its variability, are essentially covered by the noise, to the point that they are lost in the Raman measurements. After identifying all species by cross-checking the Raman and IR results, we can use Eqs. (11) and (12) to compute the composition with the spectra from the library. The estimates of the mole fractions, for both Raman and IR data, are reported in Figure 9 in the form of parity plots. The actual numerical values can be found in the Supplementary Information, Tables SI-1, SI-2, and SI-3.
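Eqs. (11) and (12) are not reproduced in this section; a plausible reading, assumed here, is a nonnegative least-squares fit $X \approx GL$ of each measured spectrum onto the library spectra L, followed by normalization of each row of G to mole fractions. The normalization presumes that all library spectra correspond to the same reference concentration (Figure 1 reports spectra at 1 M), so that the rows of G are proportional to molar concentrations. The function name and toy library are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls


def library_composition(X, L):
    """Nonnegative coefficients G with X ~= G @ L, and row-normalized mole fractions.

    X: (n_N x n_L) measured spectra; L: (n_K x n_L) library spectra on the same grid.
    """
    # scipy.optimize.nnls solves min ||A g - b||_2 subject to g >= 0, one spectrum at a time.
    G = np.array([nnls(L.T, spectrum)[0] for spectrum in X])
    fractions = G / G.sum(axis=1, keepdims=True)
    return G, fractions


grid = np.arange(100.0)
L = np.array([np.exp(-0.5 * ((grid - c) / 5.0) ** 2) for c in (20.0, 50.0, 80.0)])
G_true = np.array([[1.0, 2.0, 3.0], [0.5, 0.5, 1.0]])  # hypothetical concentrations
G, fractions = library_composition(G_true @ L, L)
print(np.round(fractions, 3))
```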

Figure 9: Parity plot showing the estimates of our three-step procedure (vertical axis) against the actual composition of a simulant mixture (horizontal axis); the blue and red dashed lines represent ±10% and ±20% deviations from the experimental values. The plots in the left column show the data from Raman measurements, while those in the right column show the data from IR measurements. The number of species identified is set to $\hat{n}_K = 6$. The colors for water, phosphate, sulfate, nitrite, carbonate, and nitrate are red, orange, green, light blue, violet, and black, respectively. Water (upper row) is reported separately from the other components (lower row) for clarity, since it makes up most of the mixture on a molar basis.

3.3.2 Error Analysis

Even though the estimates in Figure 9 mainly lie within a ±20% interval (red dashed lines) of the actual concentrations, it is important to verify our explanation for the incorrect identification obtained from the Raman measurements alone. With the matrix G identified using Eq. (11), the sampled data set can be reconstructed and compared with the measured set, and the error can be computed as the distance between the actual spectra and their reconstructions. Apart from noise, the reconstructed and the actual sets match perfectly only under two conditions: first, the library spectra truly represent the spectra of the pure species, and, second, linear superposition and the Beer-Lambert law hold.

We adopt the simplest possible metric to quantify the distance between the reconstructed and the measured data, namely the difference between each entry of X and the corresponding entry of GL, and analyze this element-wise error for both the Raman and IR measurements.
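The element-wise error is simply $E = X - GL$; averaging it over the samples at each wavenumber makes systematic, localized deviations easy to locate. A minimal sketch with hypothetical names; the single library band at 1049 cm⁻¹ and the two unmodeled features (a negative band at 1000 cm⁻¹, a positive one at 1060 cm⁻¹) are placeholders, not data from the paper.

```python
import numpy as np


def elementwise_error(X, G, L):
    """E = X - G @ L: positive where the measurement exceeds the reconstruction."""
    return X - G @ L


def error_extremes(E, wavenumber):
    """Wavenumbers of the most negative and most positive sample-averaged error."""
    e_mean = E.mean(axis=0)
    return float(wavenumber[e_mean.argmin()]), float(wavenumber[e_mean.argmax()])


wavenumber = np.linspace(900.0, 1100.0, 401)
L = np.exp(-0.5 * ((wavenumber - 1049.0) / 4.0) ** 2)[None, :]  # one library band
G = np.array([[1.0], [2.0]])                                      # two samples
extra = (0.1 * np.exp(-0.5 * ((wavenumber - 1060.0) / 4.0) ** 2)
         - 0.1 * np.exp(-0.5 * ((wavenumber - 1000.0) / 4.0) ** 2))
X = G @ L + extra
E = elementwise_error(X, G, L)
print(error_extremes(E, wavenumber))  # (1000.0, 1060.0)
```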

Overall, the error for the IR spectra (reported in Figure SI-7 in the Supplementary Information) is very small, and it follows a pseudo-sinusoidal pattern along the wavenumber coordinate, which is likely due to the baseline correction during pre-processing. The error for the Raman measurements is reported in Figure 10: on the left side, we show the error when the reconstruction does not account for phosphate (i.e. using only the results of the Raman measurements); on the right side, the error accounts for phosphate, whose presence was identified by IR. The insets highlight the region where the phosphate peak is located (about 920 cm⁻¹). Note that the peaks clearly visible in the left plots disappear on the right; nevertheless, the overall improvement is minimal. Only two deviations, one negative and one positive, are very significant. These two deviations are of about the same magnitude: the negative deviation corresponds to the peak of the reference nitrate (about 1049 cm⁻¹), while the positive one corresponds to the spurious source identified by MCR-ALS (about 1053 cm⁻¹). The correlation coefficient analysis attributes the spurious source to nitrate (γ = 0.89).


Figure 10: The error between the measured and the reconstructed spectra for the Raman measurements, computed using the element-wise difference. On the left, we report the case where phosphate is not used for the reconstruction, using only the results from the Raman measurements. On the right, we show the case where phosphate is included, taking into account the results from the IR measurements.

A closer analysis of the spectra revealed that the error spikes in the nitrate region are correlated with the presence of nitrite, which induced a drift in the nitrate peak. This drift violates one of the assumptions underlying the Beer-Lambert law (a fixed spectral shape for each species), which holds for all the other species. Drifts in the peaks are typically associated with complex interactions among ionic species. For example, Sun Qin 40 reported such an effect in carbonate-water solutions, while Ahmed et al. 41 have shown that in aqueous solution the stretch band of the hydroxide anion located at 3400 cm⁻¹ changes in the presence of nitrate, sulfate, and phosphate. Previous works 42–44 have shown that the nitrate peak at 1048 cm⁻¹ shifts towards higher wavenumbers with increasing temperature and with increasing nitrate concentration. We ruled out effects due to temperature and pH, since both were monitored and did not change in our experiments. We also found that the shift persisted when the experiment was repeated at an overall lower concentration. A detailed investigation of the cause of this peak shift is beyond the scope of this work and will be addressed separately, since this phenomenon may require the use of nonlinear BSS to improve the composition estimates.
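Why a drifting peak cannot be absorbed by a linear model with fixed library spectra can be seen in a toy calculation: fitting a fixed reference band to the same band shifted by a few cm⁻¹ leaves a residual with a negative lobe on the reference side and a positive lobe on the shifted side, qualitatively like the paired deviations in Figure 10. The Gaussian band shape, 4 cm⁻¹ width, and 4 cm⁻¹ shift are assumptions for illustration only.

```python
import numpy as np

x = np.arange(1000.0, 1100.0, 0.5)  # wavenumber grid in cm^-1 (illustrative)


def band(center, width=4.0):
    """Gaussian band used as a stand-in for a Raman peak (hypothetical shape)."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)


reference = band(1049.0)  # library nitrate band, position taken from the text
measured = band(1053.0)   # same band after a hypothetical 4 cm^-1 drift

# Best one-component linear model: scale the fixed reference by least squares.
a = (reference @ measured) / (reference @ reference)
residual = measured - a * reference

print(f"scale a = {a:.3f}")
print("negative lobe near", x[residual.argmin()], "cm^-1;",
      "positive lobe near", x[residual.argmax()], "cm^-1")
```

No choice of the scale factor `a` removes both lobes, which is why a linear method either leaves this structured residual or, as MCR-ALS did here, spends an extra (spurious) source on it.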

3.3.3 BSS versus Alternative Approaches

One could question the need for BSS techniques when a library providing one-point calibration is available. Indeed, using the library, the inverse problem of Eq. (1) could be solved directly with classical least squares (CLS) or the least absolute shrinkage and selection operator (LASSO), thus obtaining the concentration matrix C. However, our procedure is more advantageous than CLS and LASSO because it determines the number of species and the shape of their spectra using only the experimental measurements, and it does so at an insignificant additional computational cost. In contrast, CLS and LASSO can determine the number of relevant components in the system only after estimating the concentration matrix, i.e. in a post-processing step. Theoretically, all species not present in the mixture yield zero entries in the concentration matrix C, which should thus be sparse. In reality, because of noise, the entries of the CLS estimate that correspond to absent species are small but non-zero, and sometimes negative. To select the components actually present, a criterion is required to discriminate between entries due to noise and actual low concentrations. LASSO partly alleviates these issues via regularization, which yields a sparse C and thus also performs species determination. Even with LASSO, however, one must still determine the level of regularization, e.g. by Bayesian inference. Beyond these issues, our procedure is superior to CLS and LASSO when one or more species are not included in the library. CLS and LASSO cannot determine the shape of spectra missing from the library: this information has to be inferred by inspecting the residuals in post-processing. BSS techniques, by contrast, can estimate all relevant spectra regardless of whether the library exists or is correct.
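The contrast between CLS and LASSO can be reproduced on a toy problem in which one library species is absent from the mixture. CLS is solved with `numpy.linalg.lstsq`; LASSO is solved here with a plain proximal-gradient (ISTA) loop rather than a library routine. The spectra, noise level, and regularization weight `lam` are hypothetical choices, and in practice `lam` has to be tuned, as noted above.

```python
import numpy as np


def cls(x, L):
    """Classical least squares: c = argmin ||L.T @ c - x||_2 (no sign constraint)."""
    return np.linalg.lstsq(L.T, x, rcond=None)[0]


def lasso_ista(x, L, lam, n_iter=5000):
    """LASSO via ISTA: minimize 0.5 * ||L.T @ c - x||^2 + lam * ||c||_1."""
    A = L.T
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    c = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = c - step * (A.T @ (A @ c - x))                        # gradient step
        c = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-thresholding
    return c


rng = np.random.default_rng(0)
grid = np.arange(200.0)
L = np.array([np.exp(-0.5 * ((grid - m) / 5.0) ** 2) for m in (30.0, 80.0, 130.0, 180.0)])
c_true = np.array([1.0, 0.5, 0.0, 0.8])  # third library species absent
x = L.T @ c_true + 0.01 * rng.standard_normal(grid.size)

c_cls = cls(x, L)
c_lasso = lasso_ista(x, L, lam=0.5)
print("CLS:  ", np.round(c_cls, 4))
print("LASSO:", np.round(c_lasso, 4))
```

In this nearly orthogonal toy setting, CLS returns a small non-zero (possibly negative) coefficient for the absent species, whereas the soft-thresholding step of LASSO sets it exactly to zero, at the cost of shrinking the other coefficients by roughly `lam` divided by the squared norm of each band. Neither method says anything about the shape of a spectrum that is missing from `L`.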

As a proof of concept of the potential of BSS techniques, let us briefly look at the results obtained by re-running our procedure on the experimental Raman data discussed in Section 3.3.1, but removing the reference carbonate spectrum from the library. Since the BSS part of the algorithm does not depend on the library, the spectra reconstructed by the BSS procedure are not affected by the missing carbonate, as shown in Figure 11, where the errors between the measured and reconstructed spectra are reported. CLS and LASSO do not provide any conclusive information about the shape of the missing spectrum, since the error due to nonlinearities overlaps with the signal associated with carbonate, whereas BSS still performs very well and its residual error corresponds to noise. Qualitatively similar results are obtained by removing nitrite or sulfate from the library; the corresponding Figures SI-9 and SI-10 are provided in the Supplementary Information.

Figure 11: The left column reports the element-wise error between the measured Raman spectra and those reconstructed using BSS, CLS, or LASSO. The BSS residual is essentially background noise, whereas CLS and LASSO capture neither the peak drift of nitrate nor the peak of carbonate.

Additionally, the BSS part of the algorithm provides clearer insight into how the spectra of the pure species should look, as shown in Figure 12, where the vertical dashed lines mark the locations of the characteristic peaks of the actual species in the mixture. Note that the color code matches the one used for the reference spectra in Figure 1. The dashed spectrum S4, which does not match any real species, represents the spurious component associated with the nitrate peak drift. Interestingly, the BSS spectrum S3 associated with nitrite also presents a small bump at the location of the spurious component peak (about 1051 cm⁻¹): this result suggests that the nitrate peak drift is linked with the presence of nitrite in solution. We also recall that no BSS source is associated with phosphate (see Section 3.3.1); nevertheless, the BSS spectrum S1 (associated with sulfate) shows a small bump at the location of the phosphate peak (about 940 cm⁻¹).

Figure 12: Spectra of the independent relevant species recovered by the BSS algorithms from the experimental data. The dashed lines indicate one characteristic peak for each substance, using the same color code as in Figure 1. The dashed spectrum labeled S4 represents the spurious component associated with the shift of the nitrate peak. Note that the spectrum S3 associated with nitrite exhibits a small peak at the location of the spurious component, suggesting a correlation between the shift and the presence of nitrite.
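The observations above amount to asking whether a recovered spectrum has a local maximum near a known peak position (about 1051 cm⁻¹ for the spurious component and about 940 cm⁻¹ for phosphate). The sketch below implements that check with a simple neighbour comparison on a synthetic spectrum; the function has_bump_near, its tolerance, and the test spectrum are hypothetical, and a real analysis would also need to judge whether a bump rises above the noise.

```python
import numpy as np


def has_bump_near(wavenumber, spectrum, center, tol=10.0):
    """Return True if `spectrum` has a local maximum within +/- tol of `center`.

    A local maximum is a sample strictly larger than both neighbours. This is a
    simple heuristic and says nothing about whether the bump is significant.
    """
    w = np.asarray(wavenumber, dtype=float)
    s = np.asarray(spectrum, dtype=float)
    i = np.arange(1, s.size - 1)                     # interior sample indices
    is_max = (s[i] > s[i - 1]) & (s[i] > s[i + 1])
    return bool(np.any(np.abs(w[i[is_max]] - center) <= tol))


# Synthetic recovered spectrum: a hypothetical main band at 1300 cm^-1
# plus a small bump at 1051 cm^-1
w = np.linspace(900.0, 1400.0, 501)
s = np.exp(-0.5 * ((w - 1300.0) / 10.0) ** 2) + 0.05 * np.exp(-0.5 * ((w - 1051.0) / 5.0) ** 2)

print(has_bump_near(w, s, 1051.0), has_bump_near(w, s, 940.0))
```

Using strict inequalities avoids reporting flat plateaus as peaks; on noisy data the spectrum would normally be smoothed first.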

4 Summary and Conclusions

Estimating solution composition, including the identity and concentration of solutes, is of great importance in managing and processing low-level radioactive waste at the Hanford site. Here we demonstrate the potential use of IR and Raman spectroscopy to achieve these goals. However, the unique features of radioactive wastes make standard calibration methods difficult to implement, which led us to develop a three-step procedure based on blind source separation (BSS) techniques. The methodology assumes the validity of the Beer-Lambert law and the linear superposition of the spectra of pure species.
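The modelling assumption in the last sentence can be written as a linear mixing model. The sketch below uses the matrices defined in the List of Symbols; the orientation (rows of X index the nN measurements, columns the nL sampling points) and the permutation matrix P and diagonal scaling matrix D are our assumptions for illustration and may differ from the conventions used elsewhere in the paper.

```latex
\begin{aligned}
\mathbf{X} &= \mathbf{C}\,\mathbf{L} + \boldsymbol{\eta},
  & X_{ij} &= \sum_{k=1}^{n_K} C_{ik}\,L_{kj} + \eta_{ij},
  \quad \mathbf{X}\in\mathbb{R}^{n_N\times n_L},\;
  \mathbf{C}\in\mathbb{R}^{n_N\times n_K},\;
  \mathbf{L}\in\mathbb{R}^{n_K\times n_L}, \\
\mathbf{X} &= \mathbf{A}\,\mathbf{S} + \boldsymbol{\eta},
  & \mathbf{S} &\approx \mathbf{D}\,\mathbf{P}\,\mathbf{L},
  \quad \mathbf{A} \approx \mathbf{C}\,\mathbf{P}^{\top}\mathbf{D}^{-1}.
\end{aligned}
```

Since P^T P = I, the product A S equals C L for any permutation P and nonsingular diagonal D, so BSS alone cannot fix the order or absolute scale of the sources; additional information, such as a reference library, is needed to label the sources and turn the mixing matrix into concentration estimates.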

The results obtained with simulated and experimental data demonstrate that the proposed procedure is efficient and robust in identifying the species and estimating their relative concentrations, even in the presence of noise and/or moderate deviations from linearity. Moreover, BSS techniques are useful for detecting unexpected species (i.e., species not in the library) and for estimating their spectra, a task not easily achieved with other methods. The estimated BSS spectra can then be used to scan larger databases and identify the best candidate species for expanding the library. Therefore, it is possible to gain deeper insight into the system under analysis in a data-driven and computationally efficient manner, starting from only limited prior knowledge of the system.
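One simple way to scan a larger database with an estimated BSS spectrum is to rank candidate reference spectra by the correlation coefficient γ. The sketch below is an illustration under stated assumptions, not the authors' procedure: all spectra are assumed to share one wavenumber grid, the ranking uses |γ| because ICA-type sources have arbitrary sign and scale, and the function rank_candidates and the three-entry database are hypothetical.

```python
import numpy as np


def rank_candidates(estimated, database):
    """Rank reference spectra by Pearson correlation with an estimated spectrum.

    estimated : 1-D array (e.g. one recovered BSS source), length n_L
    database  : dict {species name: 1-D array of length n_L}; all spectra are
                assumed to share the same wavenumber grid
    Returns a list of (name, gamma) pairs, best match first.
    """
    est = np.asarray(estimated, dtype=float)
    scores = []
    for name, ref in database.items():
        gamma = np.corrcoef(est, np.asarray(ref, dtype=float))[0, 1]
        scores.append((name, float(gamma)))
    # ICA-type sources carry an arbitrary sign and scale, so rank by |gamma|
    scores.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return scores


def band(w, center, width):
    """Hypothetical single-band reference spectrum (unit-height Gaussian)."""
    return np.exp(-0.5 * ((w - center) / width) ** 2)


# Hypothetical database of three single-band reference spectra
w = np.linspace(800.0, 1200.0, 401)
database = {"A": band(w, 980.0, 8.0), "B": band(w, 1050.0, 8.0), "C": band(w, 1100.0, 8.0)}

# An "estimated" spectrum: species B with flipped sign, rescaled, plus noise
rng = np.random.default_rng(1)
estimated = -2.5 * database["B"] + 0.01 * rng.standard_normal(w.size)

ranking = rank_candidates(estimated, database)
print(ranking)
```

Correlation is insensitive to the unknown scale of a BSS source, which is why it is a natural first filter; a negative γ with large magnitude simply indicates a sign-flipped match.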


List of Symbols

Symbol    Meaning
X         Spectral intensity of a mixture
C         Molar concentration
a         Absorbance
T         Temperature
S_w       Number of window points of the Savitzky-Golay filter
S_p       Degree of the polynomial of the Savitzky-Golay filter
λ         Wavenumber
χ         Mole fraction
η         Gaussian white noise
σ²        Variance
µ         Expected value, mean
κ         Coefficient of variation
γ         Correlation coefficient
ε_p       Relative p-norm between congruent matrices
          Element-wise error between congruent matrices
SNR       Signal-to-noise ratio
SNR_l     Local signal-to-noise ratio
X         Matrix of spectral intensities of a set of mixtures
C         Matrix of concentrations of a set of mixtures
L         Matrix of spectra of pure species
A         Mixing matrix of a BSS
S         Source matrix of a BSS
η         Noise matrix
S         Singular values
n_N       Number of measurements of a set of mixtures
n_K       Number of species in a set of mixtures
n_K̂       Estimated number of species in a set of mixtures
n_L       Number of sampling points of the discretized spectra

Acknowledgments

Financial support from the Consortium for Risk Evaluation with Stakeholder Participation (CRESP) is gratefully acknowledged. The authors are also thankful to Michael Stone, Richard Wyrwas, and the Real-Time, in Line Monitoring Program Group at Savannah River National Laboratory for providing the simulant recipe and useful discussions.


References

(1) Holt, M. Civilian nuclear waste disposal. Congr. Res. Serv. 2018.

(2) Program-wide strategy and better reporting needed to address growing environmental cleanup liability. 2019.

(3) Lewiner, F.; Klein, J. P.; Puel, F.; Févotte, G. On-line ATR FTIR measurement of supersaturation during solution crystallization processes. Calibration and applications on three solute/solvent systems. Chem. Eng. Sci. 2001, 56, 2069–2084.

(4) Togkalidou, T.; Fujiwara, M.; Patel, S.; Braatz, R. D. Solute concentration prediction using chemometrics and ATR-FTIR spectroscopy. J. Cryst. Growth 2001, 231, 534–543.

(5) Cornel, J.; Lindenberg, C.; Mazzotti, M. Quantitative application of in situ ATR-FTIR and Raman spectroscopy in crystallization processes. Ind. Eng. Chem. Res. 2008, 47, 4870–4882.

(6) Griffin, D. J.; Grover, M. A.; Kawajiri, Y.; Rousseau, R. W. Robust multicomponent IR-to-concentration model regression. Chem. Eng. Sci. 2014, 116, 77–90.

(7) Siesler, H. W.; Ozaki, Y.; Kawata, S. Wiley-VCH; 2002.

(8) Bryan, S.; Levitskaia, T.; Schlahta, S. Raman based process monitor for continuous real-time analysis of high level radioactive waste components. HLW, TRU, LLW/ILW, Mix. Hazard. Wastes Environ. Manag. 2008, 1–14.

(9) Lumetta, G. J.; Braley, J. C.; Peterson, J. M.; Bryan, S. A.; Levitskaia, T. G. Separating and stabilizing phosphate from high-level radioactive waste: Process development and spectroscopic monitoring. Environ. Sci. Technol. 2012, 46, 6190–6197.

(10) Lines, A. M.; Adami, S. R.; Sinkov, S. I.; Lumetta, G. J.; Bryan, S. A. Multivariate Analysis for Quantification of Plutonium(IV) in Nitric Acid Based on Absorption Spectra. Anal. Chem. 2017, 89, 9354–9359.

(11) Hyvärinen, A.; Oja, E. Independent component analysis: Algorithms and applications. Neural Networks 2000, 13, 411–430.

(12) Ruckebusch, C., Ed. Data Handling in Science and Technology, 1st ed.; Elsevier: Oxford, 2016.

(13) Naik, G. R.; Kumar, D. K. An overview of independent component analysis and its applications. Informatica 2011, 35, 63–81.

(14) Hyvärinen, A. Fast and robust fixed-point algorithms for independent component analysis. IEEE Trans. Neural Networks 1999, 10, 626–634.

(15) Stögbauer, H.; Kraskov, A.; Astakhov, S. A.; Grassberger, P. Least-dependent-component analysis based on mutual information. Phys. Rev. E - Stat. Nonlinear, Soft Matter Phys. 2004, 70, 1–17.

(16) Kraskov, A.; Stögbauer, H.; Grassberger, P. Estimating mutual information. Phys. Rev. E - Stat. Physics, Plasmas, Fluids, Relat. Interdiscip. Top. 2004, 69, 16.

(17) Chen, J.; Wang, X. Z. A new approach to near-infrared spectral data analysis using independent component analysis. J. Chem. Inf. Comput. Sci. 2001, 41, 992–1001.

(18) Monakhova, Y. B.; Astakhov, S. A.; Kraskov, A.; Mushtakova, S. P. Independent components in spectroscopic analysis of complex mixtures. Chemom. Intell. Lab. Syst. 2010, 103, 108–115.

(19) Monakhova, Y. B.; Kuballa, T.; Leitz, J.; Lachenmeier, D. W. Determination of diethyl phthalate and polyhexamethylene guanidine in surrogate alcohol from Russia. Int. J. Anal. Chem. 2011, 2011, 1–7.

(20) Monakhova, Y. B.; Tsikin, A. M.; Mushtakova, S. P. Independent components analysis as an alternative to principal component analysis and discriminant analysis algorithms in the processing of spectrometric data. J. Anal. Chem. 2015, 70, 1055–1061.

(21) Monakhova, Y. B.; Mushtakova, S. P. Multicomponent quantitative spectroscopic analysis without reference substances based on ICA modelling. Anal. Bioanal. Chem. 2017, 409, 3319–3327.

(22) Lawton, W. H.; Sylvestre, E. A. Self modeling curve resolution. Technometrics 1971, 13, 617–633.

(23) Neymeyr, K.; Sawall, M.; Hess, D. Pure component spectral recovery and constrained matrix factorizations: Concepts and applications. J. Chemom. 2010, 24, 67–74.

(24) Tauler, R. Some surprising properties of multivariate curve resolution-alternating least squares (MCR-ALS) algorithms. J. Chemom. 2009, 24, n/a–n/a.

(25) De Juan, A.; Jaumot, J.; Tauler, R. Multivariate Curve Resolution (MCR). Solving the mixture analysis problem. Anal. Methods 2014, 6, 4964–4976.

(26) O’Haver, T. C.; Fell, A. F.; Smith, G.; Gans, P.; Sneddon, J.; Bezur, L.; Michel, R. G.; Ottaway, J. M.; Miller, J. N.; Ahmad, T. A.; Fell, A. F.; Chadburn, B. P.; Cottrell, C. T. Derivative spectroscopy and its applications in analysis. Anal. Proc. 1982, 19, 22.

(27) Anderssen, R. S.; Hegland, M. Derivative spectroscopy - an enhanced role for numerical differentiation. J. Integr. Equations Appl. 2010, 22, 355–367.

(28) Shao, X.; Cui, X.; Wang, M.; Cai, W. High order derivative to investigate the complexity of the near infrared spectra of aqueous solutions. Spectrochim. Acta - Part A Mol. Biomol. Spectrosc. 2019, 213, 83–89.

(29) Chen, K. et al. Direct growth of single-crystalline III-V semiconductors on amorphous substrates. Nat. Commun. 2016, 7, 1–6.

(30) Chen, W.; Biegler, L. T.; Muñoz, S. G. Kinetic parameter estimation based on spectroscopic data with unknown absorbing species. AIChE J. 2018, 64, 3595–3613.

(31) Chen, W.; Biegler, L. T.; Garcia-Munoz, S.; García, S. A unified framework for kinetic parameter estimation based on spectroscopic data w/ or w/o unwanted contributions. Ind. Eng. Chem. Res. 2019, acs.iecr.8b05273.

(32) Valderrama, L.; Gonçalves, R. P.; Março, P. H.; Rutledge, D. N.; Valderrama, P. Independent components analysis as a means to have initial estimates for multivariate curve resolution-alternating least squares. J. Adv. Res. 2016, 7, 795–802.

(33) Savitzky, A.; Golay, M. J. Smoothing and differentiation of data by simplified least squares procedures. Anal. Chem. 1964, 36, 1627–1639.

(34) Vivó-Truyols, G.; Schoenmakers, P. J. Automatic selection of optimal Savitzky-Golay smoothing. Anal. Chem. 2006, 78, 4598–4608.

(35) Nassif, L.; Dumont, G.; Alysouri, H.; Rousseau, R. W. Pretreatment of Hanford Medium-Curie Wastes by Fractional Crystallization. Environ. Sci. Technol. 2008, 42, 4940–4945.

(36) Russell, R. L.; Schonewill, P. P.; Burns, C. A. Simulant Development for LAWPS Testing; 2017.

(37) van den Berg, R. A.; Hoefsloot, H. C. J.; Westerhuis, J. A.; Smilde, A. K.; van der Werf, M. J. Centering, scaling, and transformations: improving the biological information content of metabolomics data. BMC Genomics 2006, 7, 142.

(38) Ziegler, H. Properties of digital smoothing polynomial (Dispo) filters. Appl. Spectrosc. 1981, 35, 88–92.

(39) O’Haver, T. C.; Begley, T. Signal-to-noise ratio in higher order derivative spectrometry. Anal. Chem. 1981, 53, 1876–1878.

(40) Sun, Q.; Qin, C. Raman OH stretching band of water as an internal standard to determine carbonate concentrations. Chem. Geol. 2011, 283, 274–278.

(41) Ahmed, M.; Namboodiri, V.; Singh, A. K.; Mondal, J. A.; Sarkar, S. K. How ions affect the structure of water: a combined Raman spectroscopy and multivariate curve resolution study. J. Phys. Chem. B 2013, 117, 16479–16485.

(42) Miller, A. G.; Macklin, J. A. Matrix Effects on the Raman Analytical Lines of Oxyanions. Anal. Chem. 1980, 52, 807–812.

(43) Frost, R. L.; James, D. W. Ion–ion–solvent interactions in solution. Part 5. Influence of added halide, change in temperature and solvent deuteration on ion association in aqueous solutions of nitrate salts. J. Chem. Soc. Faraday Trans. 1 Phys. Chem. Condens. Phases 1982, 78, 3249.

(44) Yu, J.-Y.; Zhang, Y.; Tan, S.-H.; Liu, Y.; Zhang, Y.-H. Observation on the Ion Association Equilibria in NaNO3 Droplets Using Micro-Raman Spectroscopy. J. Phys. Chem. B 2012, 116, 12581–12589.

For Table of Contents Use Only

Title: Analysis of Multicomponent Ionic Mixtures using Blind Source Separation - a Processing Case Study

Authors: Giovanni Maria Maggioni, Stefani Kocevska, Ronald W. Rousseau, and Martha A. Grover

Synopsis: We have developed a blind source separation procedure for low-level nuclear waste processing that identifies the number of species in an aqueous mixture, labels them with respect to a reference library, and determines their relative concentrations. We have tested our procedure against simulated and experimental data for a mixture of water plus five sodium salts using both Raman and ATR-FTIR measurements.