Vibrational Analysis of Lung Tumor Cell Lines: Implementation of an


Article

Vibrational analysis of lung tumor cell lines: implementation of an invasiveness scale based on the cell infrared signatures

Vincent Daniel Gaydou, Myriam Polette, Cyril Gobinet, Claire Kileztky, Jean-François Angiboust, Michel Manfait, Philippe Birembaut, and Olivier Piot

Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.6b00590 • Publication Date (Web): 02 Aug 2016

Downloaded from http://pubs.acs.org on August 5, 2016

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Analytical Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Analytical Chemistry

Vibrational analysis of lung tumor cell lines: implementation of an invasiveness scale based on the cell infrared signatures

Short title: Infrared-based invasiveness scale

Vincent Gaydou1,2, Myriam Polette3,4,5, Cyril Gobinet1,2,5, Claire Kileztky3,4, Jean-François Angiboust1,2, Michel Manfait1,2, Philippe Birembaut3,4, Olivier Piot1,2,5*

1 Equipe MéDIAN - Biophotonique et Technologies pour la Santé, Université de Reims Champagne-Ardenne, UFR de Pharmacie, 51 rue Cognacq-Jay, 51096 Reims, France.

2 CNRS UMR 7369 MEDyC, SFR Cap-Santé, 51 rue Cognacq-Jay, 51096 Reims, France.

3 INSERM UMR-S 903, SFR CAP-Santé, University of Reims-Champagne-Ardenne, 45 rue Cognacq-Jay, 51092 Reims, France.

4 Biopathology Laboratory, Centre Hospitalier et Universitaire de Reims, 45 rue Cognacq-Jay, 51092 Reims, France.

5 Platform of Cellular and Tissular Imaging (PICT), Université de Reims Champagne-Ardenne, 51 rue Cognacq-Jay, 51096 Reims, France.

* [email protected]


1 Abstract

Assessing tumor invasiveness is a paramount diagnostic step for improving patient care. Infrared spectroscopy probes the chemical composition of samples and, in combination with multivariate statistical processing, can highlight subtle molecular alterations associated with the development of malignancy. Our investigation demonstrated that the infrared signatures of cell lines presenting various invasiveness phenotypes contain discriminant spectral features, which are informative signals for implementing an objective invasiveness scale. This development reflects the interest of the vibrational approach as a candidate label-free biophotonic technique, usable in routine clinical practice, to characterize tumor aggressiveness quantitatively. In addition, the methodology can reveal the heterogeneity of cancer cells, opening the way to further research in cancer science.


2 Introduction

Metastatic progression is a multistep process involving basement membrane disruption, tumor cell dispersion from the primary tumor cluster, invasion of the stromal connective tissue by tumor cells, neo-angiogenesis, intravasation and extravasation, and finally colonization of a target organ. Estimating the invasive properties of tumor cells is of paramount interest for a more precise diagnosis, opening the way to personalized therapy. A methodology dedicated to assessing tumor invasiveness would help prognosis and could then guide patient management. In addition, in a retrospective approach, invasiveness scoring on biopsy or surgical specimens could help to classify tissues in order to further investigate the biochemical mechanisms involved.

This need is particularly relevant in bronchial cancers, and more precisely for non-small cell lung carcinomas (NSCLC). Indeed, the histological examination of bronchial carcinomas distinguishes two main types. The first type, representing only 15% of cases, corresponds to small cell carcinoma. This tumor has a worse prognosis and rarely requires surgical treatment.1 NSCLC are also very aggressive lesions, frequently discovered at advanced stages. Their treatment, according to the degree of extension, includes surgery, radiotherapy and/or chemotherapy. The present study concerned these aggressive bronchial NSCLC, particularly the squamous cell carcinoma that develops from bronchial epithelial cells.2 Presently, diagnostic imaging techniques such as X-ray, MRI (Magnetic Resonance Imaging), thoracic computed tomography or PET (Positron Emission Tomography) do not give access to information on tumor invasiveness.3 Recovering such information requires complex biological tests performed on cancer tissues, such as genetic analysis to highlight the associated mutations, or immunohistochemical studies.4 These analyses improve diagnostic reliability by classifying the cancer type as well as its stage and level of invasiveness. Nevertheless, they require stringent protocols involving chemical reagents. Their sensitivity and reproducibility also rely on the expertise of the operator and of the pathologist interpreting the data.5

In this context, vibrational spectroscopy appears as a candidate label-free technique to classify tumor samples according to their invasiveness level. The combination of infrared (IR) absorption spectroscopy and microscopy for tissue analysis, also called Spectral HistoPathology (SHP), has proved well suited for differentiating various histological structures and for identifying pathology.6-9 Raman spectroscopy, often presented as the twin technique of the IR modality, has also proved to be an effective diagnostic tool.10,11 Since SHP is a computer-based digital technique, tissue evaluation can be automated independently of operator subjectivity.12 The large sets of spectroscopic data are processed by chemometric and statistical methods, which can be standardized to deliver reproducible and objectively interpretable results.13-16

In addition, some fairly recent studies report that tumor samples with different degrees of invasiveness present specific IR markers. Baker et al. employed IR spectroscopy to investigate prostate cancer epithelial cell lines and to discriminate them with regard to their invasiveness phenotype.17 Sabbatini et al. presented the potential of IR spectroscopy to discriminate tissues of oral squamous cell carcinoma corresponding to well, moderately and poorly differentiated tumor cells.18 More recently, Diem et al. demonstrated the state of advancement of SHP in a study dedicated to malignant and benign lung tumors.19 The majority of SHP studies on human carcinoma show both the discriminant power and the limitations of infrared spectroscopy. The efficiency of the SHP approach relies on the constitution of the sample set, which has to reflect tissue diversity. Moreover, these studies highlighted the discriminant potential of the IR approach, but without exploring its ability to quantify tumor invasiveness. The objective of our work was therefore to develop a methodology based on IR spectroscopy for scoring the invasiveness phenotype of tumor cells.

More precisely, our research concerned broncho-epithelial squamous cell carcinoma, as argued above. Our aim was to establish an invasiveness scale based on the IR signatures of cell lines (16HBE, BEAS-2B, BZR and BZR-T33) selected because they present different invasive phenotypes. To achieve this objective, chemometric algorithms were developed specifically for the IR data collected on these cellular specimens. First, a spectral preprocessing protocol based on EMSC (Extended Multiplicative Signal Correction) was optimized to produce a homogeneous IR data bank, by mathematically correcting the spectral interferences originating from the paraffin used as embedding material and by normalizing the cell signal independently of the cyto-block preparation. Second, PLS-DA (Partial Least Squares - Discriminant Analysis) and PLS (Partial Least Squares) were run to establish qualitative (i.e. discriminative) and quantitative models, respectively. The PLS approach aimed at determining a multivariate regression curve that associates an “invasiveness score” with each broncho-epithelial cell line. These models were developed and optimized by an image-based cross-validation, in which all pixels/spectra of one image constituted a single independent sample.


3 Materials and Methods


3.1 Cell culture

First, the IR prediction model was constructed from 4 different cell lines. Human lung 16HBE 14o-, BEAS-2B and BZR cell lines were obtained from the American Type Culture Collection (Rockville, MD, USA). BZR-T33 lung cells were provided by Curtis C. Harris (National Cancer Institute, Bethesda, MD, USA). In a second step of external validation, 2 additional cell lines, CALU3 and NCIH1299, were included.

The 6 cell lines were cultured in DMEM (Gibco, Invitrogen, Carlsbad, CA, USA) containing 10% fetal calf serum (FCS) (Gibco). The invasive phenotype of these lung cancer cells, assessed by a modified Boyden chamber assay, has been extensively characterized previously.20 Based on the number of invading cells, 16HBE and CALU3 were considered non-invasive cell lines, BEAS-2B cells display moderate invasive capacities, and BZR and BZR-T33 present highly invasive properties. The invasiveness of NCIH1299 cells is intermediate between that of BEAS-2B and BZR. For each of these cell lines, several samples were cultured independently. For each of the calibration cell lines (16HBE, BEAS-2B, BZR and BZR-T33), 4 passages were cultured. For external validation (CALU3 and NCIH1299), 2 passages were cultured. Cell cultures fixed in formalin were centrifuged and then embedded in paraffin, and 8 µm-thick sections were deposited on CaF2 windows suitable for IR measurements.

All the cell lines used in these experiments are widely used in the literature, and most of them are provided by ATCC. Thus, there is no ethical limitation on their use.


3.2 IR imaging


The IR acquisitions were performed with a Hyperion imager (Bruker, Germany) equipped with a Focal Plane Array (FPA) detector, which allows large tissue areas to be imaged. The FPA used is a matrix of 64 x 64 pixels, each 2.7 x 2.7 µm² in size. Each pixel can be considered an individual IR detector. This device collects a large number of spectra in a limited time, which is required for the statistical processing of the data.

Spectra were collected with a spectral resolution of 2 cm-1, with 32 accumulations for the cell spectra and 240 for the background signal. The background was recorded on a very clean CaF2 area. The paraffin signal was also collected for each sample, on an area surrounding the cell spots. Cell samples were analyzed over an area of approximately 0.15 mm² (around 15000 spectra).


3.3 Preprocessing of infrared data

Each cellular sample was associated with 2 spectral images: one image of paraffin and one image of the cell culture. A precise preprocessing protocol was developed, as described in figure 1. This methodology aims at building a homogeneous imaging data bank in several steps.

The first step was a smoothing using the Savitzky-Golay method, with a window of 6 points and a polynomial of degree 1.21

Then the spectral images were processed by EMSC (Extended Multiplicative Signal Correction). This correction was first proposed by Martens and Stark in 1991. A major advantage of EMSC is that prior chemical or physical knowledge can be integrated into the preprocessing model.22 The multivariate signals were normalized with respect to a reference signal by decomposing each original spectrum $x$ according to equation (1):

$$x = a\,\hat{x} + I\,b + P\,c + e \qquad (1)$$

with $a$ (scalar), $b$ and $c$ (both vectors) the regression coefficients, $\hat{x}$ an estimate of the data set (also called “target”), $I$ the matrix of known interference spectra (interference matrix), $P$ a polynomial modeling the background and offsets due to scatter effects, and finally $e$ the model error (also called “residual”). If no good estimate of $\hat{x}$ is available, one can choose the average of the whole data set for $\hat{x}$. In our case, $\hat{x}$ was selected following specific rules, as explained later. The $a$, $b$ and $c$ parameters were estimated by minimizing the weighted sum of squares of the residual $e$. The EMSC-corrected spectrum $x_{\mathrm{corr}}$ was then calculated according to equation (2):

$$x_{\mathrm{corr}} = \frac{e}{a} + \hat{x} \qquad (2)$$
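
The decomposition of equation (1) and the correction of equation (2) can be sketched as an ordinary least-squares fit. This is only a minimal illustration under stated assumptions, not the authors' Matlab implementation: the function name `emsc_correct`, the polynomial order and the optional `weights` argument are hypothetical choices, and the weighting actually used in the study is not specified here.

```python
import numpy as np

def emsc_correct(x, target, interferences, poly_order=2, weights=None):
    """Fit x = a*target + I b + P c + e, then return (target + e/a, a).

    All names and defaults are illustrative assumptions.
    """
    n = x.size
    v = np.linspace(-1.0, 1.0, n)                      # scaled abscissa for the polynomial
    P = np.vander(v, poly_order + 1, increasing=True)  # columns 1, v, v^2, ...
    I = np.atleast_2d(interferences).T                 # one interference spectrum per column
    M = np.column_stack([target, I, P])                # design matrix [target | I | P]
    w = np.ones(n) if weights is None else np.sqrt(np.asarray(weights, dtype=float))
    # Weighted least squares: scale rows of the system by sqrt(weight).
    coef, *_ = np.linalg.lstsq(M * w[:, None], x * w, rcond=None)
    a = coef[0]
    e = x - M @ coef                                   # residual e of equation (1)
    return target + e / a, a                           # equation (2)

# Synthetic example: a "cell" band, a narrow "paraffin" band and a sloped baseline.
v = np.linspace(-1.0, 1.0, 200)
target = np.exp(-((v - 0.2) / 0.1) ** 2)
paraffin = np.exp(-((v + 0.5) / 0.03) ** 2)
x = 2.0 * target + 0.7 * paraffin + 0.1 + 0.05 * v
corrected, a = emsc_correct(x, target, paraffin)
```

In this synthetic case the spectrum lies exactly in the model, so the residual vanishes and the corrected spectrum coincides with the target; real spectra keep their sample-specific residual, rescaled by 1/a.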

In this study, the main parasitic signal was that of paraffin, modeled in the form of an interference matrix, as has already been done for the IR analysis of paraffin-embedded samples.23 The interference matrix was established from the first principal components of a PCA (Principal Component Analysis) computed on the non-centered paraffin spectra collected around the cells.
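
A PCA on non-centered data amounts to a singular value decomposition of the raw spectra matrix. The sketch below shows one way the interference matrix could be built; the function name `interference_matrix` and the number of retained components are assumptions for illustration (the values used in the study are those of table 1).

```python
import numpy as np

def interference_matrix(paraffin_spectra, n_components):
    """Return the leading principal directions of non-centered spectra (one per row)."""
    data = np.asarray(paraffin_spectra, dtype=float)   # shape: (n_spectra, n_wavenumbers)
    # No mean subtraction: the first component then captures the mean paraffin shape.
    _, _, vt = np.linalg.svd(data, full_matrices=False)
    return vt[:n_components]

# Toy example: three paraffin spectra that are multiples of one common shape.
shape = np.array([0.0, 1.0, 0.0, 2.0])
spectra = np.outer([1.0, 2.0, 3.0], shape)
I = interference_matrix(spectra, 1)
```

Because the data are not centered, the first component here is (up to sign) the normalized common paraffin shape.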

In our investigation, a double EMSC was carried out: the first to correct each spectral image independently of the others, and the second to make these images comparable with one another, as illustrated in figure 1. First, every sample was treated independently, with the aim of obtaining high-quality spectra with neutralized paraffin interferences. The cell culture spectra that were computed and passed the quality test were gathered in a non-homogeneous multiple-image data matrix. For each cell image, the paraffin signal was corrected by the EMSC pretreatment, but the baseline correction and the proportion between paraffin and cell signals were not the same from one cell image to another.

So, in order to transform this non-homogeneous multiple-image data matrix into a homogeneous one, a second EMSC was used. The interference matrix was constructed with PCA components computed from the set of paraffin spectra of all samples. Performing these EMSC preprocessing steps requires a large number of parameters to be controlled. These parameters are listed in table 1. They were optimized on the basis of the PLS-DA results, as described below. One of the main parameters is the “target”, which has to be defined as the model spectrum most representative of the biological sample. The computation of this target is explained in the Results section.


3.4 Chemometric algorithms


PLS

PLS-based algorithms were used to construct a prediction model allowing data quantification, with the additional advantage of highlighting the discriminant spectral features.24

The PLS algorithm is based on a multivariate regression principle. In particular, it maximizes the covariance between 2 matrices by means of multidimensional and orthonormal regression vectors. The following 2 matrix equations describe the PLS principle; the aim is to solve for T, with X and Y as known variables.

$$X = T P + R \qquad (3)$$

$$Y = T q + f \qquad (4)$$

with:

X the data matrix (n × m) (spectral data);

Y the quantitative reference values (n × l) (invasiveness score);

T the score matrix (n × k);

P the loadings (or regression vectors) matrix (k × m) associated with the prediction of X from T;

R the residual associated with the prediction of X (n × m);

q the loadings associated with Y (k × l);

f the residual associated with the prediction of Y (n × l);

k the number of computed latent variables (corresponding to the PLS model dimension).

Here l, m and n are respectively the number of quantitative reference values to predict for each sample (invasiveness value), the number of experimental values for each sample (which depends on the spectral resolution) and the number of samples (number of spectra).

The vector space established by the regression vectors links the spectra (here X) to values, scores or quantitative reference variables (here Y). Thus, PLS regression makes it possible to quantify data on the basis of quantitative features. It is particularly suited to large data banks with covarying variables. However, this processing can easily be undermined by underfitting or overfitting. To prevent this bias, the number of latent variables used for the model must be carefully controlled.25
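
For a single reference variable (l = 1, as with one invasiveness score per spectrum), the PLS principle can be illustrated with the classical NIPALS PLS1 algorithm. This is a generic textbook sketch, not the “saisir” toolbox code used by the authors; the function names `pls1_fit` and `pls1_predict` are hypothetical, and the loadings are stored here as an m × k array, i.e. the transpose of P in equation (3).

```python
import numpy as np

def pls1_fit(X, y, k):
    """NIPALS PLS1 with k latent variables; returns (beta, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean            # centered copies, deflated below
    W, P, q = [], [], []
    for _ in range(k):
        w = Xr.T @ yr                          # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xr @ w                             # scores (one column of T)
        tt = t @ t
        p = Xr.T @ t / tt                      # X loadings
        qa = yr @ t / tt                       # y loading
        Xr = Xr - np.outer(t, p)               # deflate X
        yr = yr - qa * t                       # deflate y
        W.append(w); P.append(p); q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)     # regression vector in spectral space
    return beta, x_mean, y_mean

def pls1_predict(X, beta, x_mean, y_mean):
    return (X - x_mean) @ beta + y_mean

# Synthetic check: y is an exact linear function of 5 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
true_coef = np.array([1.0, 2.0, 0.0, -1.0, 0.5])
y = X @ true_coef + 3.0
beta, xm, ym = pls1_fit(X, y, 5)
pred = pls1_predict(X, beta, xm, ym)
```

With as many latent variables as independent columns, PLS1 reduces to ordinary least squares, which is why the synthetic example is recovered exactly; with fewer latent variables the model is regularized, which is the lever discussed above for avoiding overfitting.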

The accuracy of the PLS models was assessed by computing the RMSE (Root Mean Square Error) on both the calibration and validation data sets, according to the following formula:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}} \qquad (5)$$

with $y_i$ the reference value of spectrum i, $\hat{y}_i$ the model-predicted value of the ith spectrum, and n the number of spectra used.
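
As a small worked illustration of the RMSE definition, assuming the standard root-mean-square form, the function below (hypothetical name `rmse`) returns the error in the units of the reference scale; any conversion to a percentage is not shown here.

```python
import math

def rmse(reference, predicted):
    """Root mean square error between reference and predicted values."""
    n = len(reference)
    return math.sqrt(sum((y - yh) ** 2 for y, yh in zip(reference, predicted)) / n)

# Three spectra, one of them mispredicted by 2 units: sqrt(4 / 3).
example = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```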


PLS-DA

The PLS-DA (Partial Least Squares - Discriminant Analysis) algorithm was employed to implement qualitative classification models. Each qualitative variable is associated with a binary code (0 or 1). A multivariate PLS regression is then performed between the spectral data matrix and the binary matrix. The PLS-DA model then predicts multivariate values for the spectra present in the data set, and these values are related to the corresponding qualitative groups. This classification method is widely used for the exploitation of vibrational data.26

The performance of the PLS-DA models was evaluated from the PGP (percentage of good prediction), on both the calibration and validation data sets, calculated as follows:

$$\mathrm{PGP} = \frac{g}{n} \times 100 \qquad (6)$$

with g the number of correctly predicted spectra and n the number of spectra used.
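
The binary coding and the PGP can be sketched as follows. The helper names are hypothetical, and the rule used in `assign_class` (each spectrum goes to the class with the largest predicted value) is an assumption: the text only states that the predicted values are related to the qualitative groups.

```python
def one_hot(labels, classes):
    """Binary (0/1) coding of qualitative labels, one column per class."""
    return [[1 if lab == c else 0 for c in classes] for lab in labels]

def assign_class(predicted_rows, classes):
    """Assumed decision rule: pick the class with the largest predicted value."""
    return [classes[max(range(len(classes)), key=lambda j: row[j])]
            for row in predicted_rows]

def pgp(true_labels, predicted_labels):
    """Percentage of good prediction: 100 * g / n."""
    g = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * g / len(true_labels)

classes = ["16HBE", "BEAS-2B", "BZR"]
coded = one_hot(["BZR", "16HBE"], classes)
assigned = assign_class([[0.9, 0.2, -0.1], [0.1, 0.3, 0.7]], classes)
score = pgp(["a", "b", "a", "c"], ["a", "b", "c", "c"])
```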


Cross validation

In our study, the performance of the prediction models was estimated by a cross-validation method applied to the 4 cell lines of the calibration set. More precisely, it was carried out at the level of the spectral images rather than at the pixel scale, to avoid overfitting. While the total number of pixels is high, the number of cytoblock sections is quite limited, which justifies this approach, which we call “partial cross validation”. The principle of cross validation (also called internal prediction) is to predict a sample (an image in our case) that was removed beforehand from the modeling step. This process is repeated for all the samples, and an average prediction accuracy is then obtained. This average value gives the best performance that can be expected from the model.27
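
The image-level splitting can be sketched as a leave-one-image-out loop: every spectrum of the held-out image goes to the validation fold, so pixels of a single image never appear on both sides. The generator name `leave_one_image_out` and the image identifiers are illustrative only.

```python
def leave_one_image_out(image_ids):
    """Yield (held_out_image, train_indices, test_indices) for each image."""
    for held_out in sorted(set(image_ids)):
        train = [i for i, img in enumerate(image_ids) if img != held_out]
        test = [i for i, img in enumerate(image_ids) if img == held_out]
        yield held_out, train, test

# Six spectra coming from three images "A", "B" and "C".
folds = list(leave_one_image_out(["A", "A", "B", "C", "C", "C"]))
```

A model would be fitted on the `train` indices of each fold and scored on the `test` indices, and the scores averaged over the folds.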

All the computing steps were run in Matlab R2013a (32 bit) (MathWorks, USA); the PLS algorithm originates from the “saisir” toolbox developed by Bertrand and Cordella, INRA, France.28


3 Results

The development and optimization of the preprocessing protocols are summarized in figure 1 and table 1. Building an objective and homogeneous spectral data matrix is complex because of the strong variability between paraffin-embedded cell cultures. First, the cell culture thickness may vary within each sample section. This variability is due to the material and mechanical properties associated with sample preparation. To limit this variance, the EMSC pretreatment was applied in two steps, as indicated in the Materials and Methods section.

During the first EMSC, performed at the level of each spectral image, the influence of the “target” selected as reference spectrum is noticeable. If the mean spectrum is chosen as target (the usual choice), a marked variability between the targets of the different spectral images is observed. This target variability mainly originates from the number of pixels per image that actually present a cellular signal. To avoid this problem, it was decided to compute the targets as a function of the cell signal of each image. The integrated intensity under the amide I and II bands (1450–1750 cm-1) was computed and sorted in descending order. The spectra that corresponded to the second quartile of this ordering were then used to compute the target. For each spectral image, the target was determined by this quartile method, which also provided fairly similar targets, representative of the cellular signal, across the set of spectral images. The first, third and fourth quartiles were also tested, but gave worse results.
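
The quartile method can be sketched as below. Several points are assumptions made for illustration: the function name `quartile_target`, the use of a plain sum as the integrated intensity (adequate on a uniform wavenumber grid), the reading of “second quartile” as the spectra ranked between 25% and 50% from the top, and the averaging of those spectra to form the target.

```python
import numpy as np

def quartile_target(spectra, wavenumbers, band=(1450.0, 1750.0)):
    """Average the spectra ranked in the second quartile of amide-band intensity."""
    spectra = np.asarray(spectra, dtype=float)        # shape: (n_spectra, n_wavenumbers)
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    in_band = (wavenumbers >= band[0]) & (wavenumbers <= band[1])
    intensity = spectra[:, in_band].sum(axis=1)       # integrated amide I + II intensity
    order = np.argsort(-intensity)                    # strongest cell signal first
    n = len(order)
    second_quartile = order[n // 4 : n // 2]          # ranks 25% to 50%
    return spectra[second_quartile].mean(axis=0)

# Toy image: 8 flat spectra of heights 1..8 on a 5-point wavenumber grid.
wn = np.array([1400.0, 1500.0, 1600.0, 1700.0, 1800.0])
toy = np.array([[float(k)] * 5 for k in range(1, 9)])
target = quartile_target(toy, wn)
```

In the toy image the second quartile contains the spectra of heights 6 and 5, so the target is a flat spectrum of height 5.5.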

In addition, the selection of the target strongly influences the EMSC quality test. Computing the target on the second quartile ensures a representative cell signal and leads to the rejection of nearly 80% of the pixels after the first EMSC. The rejected spectra were discarded because of a poor signal and/or a high paraffin contribution. This quality test depends on the a and b parameters, as indicated in table 1. After the second EMSC, for all spectra of each image, the baseline was homogeneous and the paraffin contribution identical. Figure 2 displays the mean and the minimum-maximum values of all spectra of the set of images, for the raw data and after the first and second EMSC steps. The proportion between cellular material and paraffin was so variable from one sample to another that the second EMSC could not entirely correct this variance. Consequently, to avoid taking into account this signal variability originating from the sample preparation, the spectral ranges were reduced to 920-1350 cm-1 and 1500-1850 cm-1, and a min-max normalization was computed on the 1670 cm-1 wavenumber, corresponding to the amide I band associated with vibrations of the peptide bonds of the protein content.
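
The range reduction and normalization could look like the sketch below. The exact normalization formula is not given in the text, so the one used here is an assumption: the minimum of the retained spectrum is mapped to 0 and the intensity at the wavenumber closest to 1670 cm-1 is mapped to 1. The function name `reduce_and_normalize` is hypothetical.

```python
import numpy as np

def reduce_and_normalize(spectrum, wavenumbers,
                         ranges=((920.0, 1350.0), (1500.0, 1850.0)), ref=1670.0):
    """Keep the two spectral windows, then scale so min -> 0 and value at ref -> 1."""
    spectrum = np.asarray(spectrum, dtype=float)
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    keep = np.zeros(wavenumbers.shape, dtype=bool)
    for low, high in ranges:
        keep |= (wavenumbers >= low) & (wavenumbers <= high)
    wn, y = wavenumbers[keep], spectrum[keep]
    i_ref = np.argmin(np.abs(wn - ref))               # point closest to the amide I reference
    y_min = y.min()
    return wn, (y - y_min) / (y[i_ref] - y_min)

# Synthetic spectrum: an offset plus one band centered at 1670 cm-1.
wn_full = np.arange(900.0, 1901.0, 10.0)
raw = 0.2 + np.exp(-((wn_full - 1670.0) / 30.0) ** 2)
wn_kept, normalized = reduce_and_normalize(raw, wn_full)
```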

Figure 3 presents the results of the PLS-DA performed on the 4 cell lines of the calibration sample set. For this classification model, 3 reference groups were chosen: 16HBE as normal cells, BEAS-2B as moderately invasive cells, and BZR and BZR-T33 as invasive cells. BZR and BZR-T33 were grouped together because of the phenotypic similarity between the two cell types.

Figure 3.a shows the exploration of the number of latent variables based on the partial cross-validation method. It displays the evolution of the PGP as a function of the number of latent variables (up to 30) for both the calibration and validation steps. The calibration PGP tends to 100% as the number of latent variables increases, reflecting a convergence of the PLS-DA processing. From the evolution of the validation PGP, 10 latent variables were determined to be optimal, because the PGP maximum, around 90%, is reached for this number, which prevents both underfitting and overfitting. Table 2 indicates the prediction results for the 3 groups of cellular specimens (16HBE, BEAS-2B, and a common group for BZR and BZR-T33) corresponding to various invasiveness phenotypes. The PGP values are over 90% for the extreme invasiveness groups, while the BEAS-2B cells present a value close to 77%, reflecting the intermediate phenotype of this cell line. To visualize the discriminant potential of our approach, a 3D representation is given (figure 3.b). The latent variables chosen to draw this representation were the first, the 4th and the 7th, since they gave a good visual rendering of the data projection. While the borders between the groups were not perfectly sharp, with some spectra overlapping, a distinction between these 3 levels of invasiveness can be outlined.

The results of the PLS-DA processing show the ability to discriminate bronchial squamous cell carcinoma (SCC) cells of different invasiveness phenotypes from their infrared spectral signatures. Based on these encouraging results, our next objective was to investigate the possibility of ordering cells quantitatively by constructing a spectral scale of cell invasiveness. For this purpose, PLS models were developed.

While PLS-DA gives qualitative memberships to a restricted number of classes (3 in our study), PLS is a more graded processing that makes it possible to consider continuous, quantitative scores.29 Thus, PLS was run with the 4 levels of cell invasiveness as reference inputs, by distinguishing the 2 highest invasiveness levels, i.e. the BZR and BZR-T33 samples. First, various PLS models were tested by simultaneously changing the reference scale of invasiveness and the number of latent variables. The best results were obtained with a scale ranging from 1 to 2.3, as indicated in table 3, and an optimal number of 10 latent variables. The number of latent variables was determined by plotting the RMSE against the number of variables, similarly to the first step of the PLS-DA explained previously (figure 4.a). The calibration and validation RMSE tend to 6% and 15%, respectively, as the number of latent variables increases. The optimal number, corresponding to the first minimum, is the most appropriate to avoid underfitting and overfitting of the PLS models.
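
The “first minimum” rule can be sketched as a scan of the validation RMSE curve that stops as soon as the error starts to rise again. The function name `first_minimum` and the numerical values are illustrative only; they are not the RMSE values of figure 4.a.

```python
def first_minimum(validation_rmse):
    """Return the 1-based number of latent variables at the first local minimum."""
    for k in range(len(validation_rmse) - 1):
        if validation_rmse[k + 1] > validation_rmse[k]:
            return k + 1          # error increases after k + 1 latent variables
    return len(validation_rmse)   # curve never increases: keep the last value

# Illustrative curve: the error first bottoms out at 3 latent variables.
n_latent = first_minimum([0.30, 0.25, 0.20, 0.22, 0.18])
```

Stopping at the first minimum rather than at the global one favors the smaller model, which is the conservative choice with respect to overfitting.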

The regression results calculated under these conditions are given in table 3, with the corresponding regression curve depicted in figure 4.b. The average validation RMSE was 16.6%, with 11% and 23.8% as the minimal and maximal prediction errors, obtained for BZR and BZR-T33 respectively. A significant linear regression was found between the reference and the predicted invasiveness scales, with a correlation coefficient R² equal to 0.76 (R² > rα, with rα=0.01 = 0.6226 for n-2 = 14 degrees of freedom).30 Interestingly, the reference scale highlights invasiveness levels that are not regularly distributed between 1 for 16HBE and 2.3 for BZR-T33 cells. Indeed, BEAS-2B and BZR cells were positioned at 1.3 and 2.1 respectively, reflecting a slight proximity of BEAS-2B to 16HBE and very close invasiveness phenotypes for BZR and BZR-T33.
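
The threshold quoted above can be recovered from the Student t distribution, assuming it is the usual two-sided critical value of the Pearson correlation coefficient at α = 0.01. The value t ≈ 2.977 is the standard tabulated quantile for 14 degrees of freedom; n = 16 would be consistent with 4 calibration cell lines × 4 passages, although this correspondence is an inference rather than something stated in the text.

```latex
r_{\alpha} = \frac{t_{\alpha/2,\,\nu}}{\sqrt{t_{\alpha/2,\,\nu}^{2} + \nu}},
\qquad \nu = n - 2 = 14, \quad t_{0.005,\,14} \approx 2.977
\;\Longrightarrow\;
r_{0.01} \approx \frac{2.977}{\sqrt{2.977^{2} + 14}}
        = \frac{2.977}{\sqrt{22.86}} \approx 0.6226
```

Both R² = 0.76 and the corresponding correlation coefficient, √0.76 ≈ 0.87, exceed this threshold.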

In a further step to demonstrate the validity of our approach, 2 additional cell lines, considered as an external validation set, were projected onto the PLS model previously calibrated with the first 4 cell lines. The prediction results for CALU3 and NCIH1299 cells appear consistent with their invasiveness phenotypes, as shown in table 4.


4 Discussion

The main applications of FT-IR spectrometry in cancer science consist in demonstrating the diagnostic potential of this label-free biophotonic technique. In our study, we sought to exploit further the wealth of the vibrational approach by highlighting and quantifying the cell invasiveness phenotype using distinct human lung cell lines.

In this work, a double EMSC preprocessing proved its capacity to perform a quality test and to homogenize the data set in an automated manner. In paraffin-embedded tissue sections, a one-step EMSC is sufficient;31 on cell samples, it was necessary to implement a more sophisticated preprocessing, using a specific way to compute the target spectrum and a drastic selection of the retained spectra: nearly 80% of the raw spectra were rejected. Here, the numerous preprocessing parameters were optimized with regard to the classification results, since preprocessing strongly affects the later statistical processing of the data.

The advantage of our approach is that it makes it possible to work with several spectral images recorded from paraffin-embedded cell culture samples. As observed in Figure 2, the great variability of the raw spectra prevents data interpretation. The double EMSC can correct a large part of the parasitic variability due to the paraffin embedding and also to the sample preparation. Nevertheless, some residual variability is still present. This particularly concerns the high-wavenumber range, where it is impossible to distinguish the CH3 and CH2 IR absorption signals of paraffin from those of the cell material. To circumvent this problem, we had to add to the preprocessing protocol steps such as wavenumber range selection and normalization on the amide I band. Another advantage of EMSC is the possibility to perform a quality test to eliminate outlier spectra and retain exploitable cell spectra.
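The additional steps mentioned above (wavenumber range selection, removal of the paraffin region and normalization on the amide I band) could be sketched as follows. The numerical limits (920-1870 cm-1, 1350-1500 cm-1, 1670 cm-1) are those listed in Table 1; the function name `preprocess`, the toy spectrum and the exact form of the min-max normalization (minimum mapped to 0, absorbance nearest 1670 cm-1 mapped to 1) are assumptions made for this illustration only.

```python
def preprocess(wavenumbers, absorbances,
               keep_range=(920.0, 1870.0),
               paraffin_band=(1350.0, 1500.0),
               amide_i=1670.0):
    """Keep the selected spectral range, drop the paraffin band and apply a
    min-max normalization referenced to the amide I band."""
    kept = [(w, a) for w, a in zip(wavenumbers, absorbances)
            if keep_range[0] <= w <= keep_range[1]
            and not (paraffin_band[0] <= w <= paraffin_band[1])]
    # Absorbance at the retained point closest to the amide I position.
    _, reference = min(kept, key=lambda point: abs(point[0] - amide_i))
    floor = min(a for _, a in kept)
    scale = reference - floor
    if scale == 0:
        raise ValueError("flat spectrum: cannot normalize on amide I")
    return [(w, (a - floor) / scale) for w, a in kept]


# Toy spectrum (hypothetical values, not measured data).
wavenumbers = [900.0, 1000.0, 1400.0, 1550.0, 1670.0, 1740.0, 1900.0]
absorbances = [0.05, 0.20, 0.90, 0.40, 0.60, 0.30, 0.02]
normalized = preprocess(wavenumbers, absorbances)
print(normalized)
```

In this toy case the points at 900 and 1900 cm-1 fall outside the retained range and the point at 1400 cm-1 falls in the paraffin band, so four points remain.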


One possibility for the statistical processing would have been to use cross-validation at the image pixel level. In our study, the number of spectra per image was very high, close to ten thousand. This method was not relevant considering the high risk of overfitting.32 To avoid such a methodological bias, we opted for cross-validation at the image level, although the number of images was limited. Indeed, a total of 16 samples corresponding to 4 distinct invasive phenotypes were included to develop our approach. In addition, the PLS model of the invasiveness scale was further validated by projecting 2 additional human lung cell lines (CALU3 and NCIH1299).
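Cross-validation at the image level means that all spectra of a given image are held out together, so that pixels of the same image never appear in both the calibration and the validation subsets. A minimal sketch is given below, assuming a leave-one-image-out scheme; the text does not state which image-level scheme was used, and the function name and image labels are hypothetical.

```python
def leave_one_image_out(image_ids):
    """Yield (held_out_image, train_indices, test_indices), holding out all
    spectra of one image at a time."""
    for held_out in sorted(set(image_ids)):
        train = [i for i, img in enumerate(image_ids) if img != held_out]
        test = [i for i, img in enumerate(image_ids) if img == held_out]
        yield held_out, train, test


# One label per spectrum, giving the image it belongs to (toy example).
image_ids = ["img_A", "img_A", "img_B", "img_B", "img_B", "img_C"]
folds = list(leave_one_image_out(image_ids))
for held_out, train, test in folds:
    print(held_out, train, test)
```

With pixel-level splitting, neighboring and highly correlated spectra of the same image would be shared between subsets, which is the overfitting risk that the paragraph above refers to.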

The retained PLS model, which led to an RMSEP prediction error of less than 17%, required 10 latent variables, reflecting the fact that the useful spectral information associated with tumor invasiveness is quite subtle and is not contained only in the first latent variables, which carry the main covariance between the infrared signals and the reference invasiveness scale.33 Visualizing these latent variables reveals the infrared molecular vibrations implicated in the discrimination process. Figure 5, which presents the 10 PLS latent variables, highlights the discriminant wavenumbers. In particular, owing to their strong loading coefficients, the following vibrations appear to be involved in the invasiveness scoring:

- 1015 and 1117 cm-1, assigned to C-O stretching (νC-O) vibrations in carbohydrate molecules,

- 1585 cm-1, of the amide II band specific to the protein content,

- 1618 cm-1, attributed to νC=O bonds, probably due to the β-turn secondary structure of proteins,

- 1740 cm-1, attributed to νC=O bonds in lipid compounds (triglycerides or phospholipids).

The PLS regression model was built by determining the optimal invasiveness scale. We decided to fix the origin at 1, firstly for algorithmic reasons, since predicting a null or negative value can generate a strong computing bias, and secondly because 16HBE cells present slight invasiveness characteristics. Then, many arbitrary values were tested to quantify the invasiveness of the cell lines.

Based on this reference scale, the lowest invasiveness level was overestimated, whereas the highest level was underestimated. This particularity emphasizes the difficulty of precisely predicting the samples corresponding to the extreme invasiveness degrees. The reference invasiveness values were optimized to limit this effect.

As indicated in Table 3, the invasiveness values of the cell lines were not regularly spaced; nevertheless, the corresponding regression curve is linear (Figure 4.b). This observation is consistent with the linear nature of the PLS principle and also with the Beer-Lambert law, which relies on a linear relationship between the molecular concentration and the infrared absorbance.

A marked dispersion of the data was observed. Indeed, as shown in Figure 4.b, the spread of the predicted invasiveness for spectral data sharing the same invasiveness reference score is large. This is also illustrated by the value of the standard deviation in the PLS model (Tables 3 and 4), which clearly reflects the spectral variability within each cell culture. In our methodological approach, each infrared image corresponded to a separate culture of a cell line. Indeed, the fact that the analyzed cell samples were obtained at different passages of cell culture may be a supplementary source of variability. Furthermore, for each culture, the probed cells may be at various stages of invasiveness, which may influence the spectral signal. Thus, our approach is able to assess such variable biological features and could find valuable application in the investigation of tumor heterogeneity.

The vibrational signals associated with the different invasive properties of the cells may be further applied to human tissue sections of lung cancers, particularly in pre-invasive lesions. Indeed, in these conditions, classical histology cannot detect the potential aggressiveness of such heterogeneous lesions. Thus, this original infrared spectral approach, which can be used directly on formalin-fixed paraffin-embedded tissue sections, may bring new information on the behavior of these tumor cells.


5 Conclusion and perspectives

This study, conducted on cell samples of various invasiveness phenotypes, demonstrated the possibility of establishing an invasiveness scale on the basis of the cell infrared signature. This quantitative and label-free assessment of invasiveness relies exclusively on the intrinsic molecular composition of the samples, without requiring any particular preparation and while preserving cellular integrity. In addition, within a given invasiveness level, spectral heterogeneity can be highlighted among the cell population. This demonstrates the value of the vibrational spectroscopic approach for revealing and analyzing tumor heterogeneity in future investigations in cancer science.

Further developments will concern the incorporation of feature extraction methods into the statistical data processing. Indeed, algorithms such as Sparse PLS, or the combination with a Genetic Algorithm processing, appear to be an innovative way to improve the prediction model, with the advantage of simultaneously identifying the molecular vibrations involved in the discriminant process.34-36

Also, the methodology implemented and the results obtained on cellular samples will be transferred to the tissue scale, in order to take into account the influence of the complex ecosystem on the functionality of the tumor cells.


6 Acknowledgements

This work was carried out with financial support from ITMO Cancer AVIESAN (National Alliance for Life Sciences & Health) within the framework of the Cancer Plan. The CNRS UMR7369 MEDyC (Matrice Extracellulaire et Dynamique Cellulaire) laboratory and the technological platform PICT “Imagerie Cellulaire et Tissulaire” are gratefully acknowledged for instrumental support. INSERM (SFR CAP-Santé, University of Reims-Champagne-Ardenne) and the Pol Bouin laboratory (CHU of Reims) are also gratefully acknowledged for sample supply and scientific recommendations.


7 References

(1) Wissler, M.-P. Bilan de l’analyse du statut mutationnel EGFR de 1000 patients atteints d’adenocarcinomes pulmonaires pris en charge par la plateforme d’oncologie moleculaire du Chu-Cav de Nancy, Thesis, Université de Lorraine, Faculté de Médecine de Nancy, 2012.
(2) Bonastre, E.; Brambilla, E.; Sanchez-Cespedes, M. J. Pathol., 2016, 238, 606-616.
(3) Sato, M.; Sakurada, A.; Sagawa, M.; Minowa, M.; Takahashi, H.; Oyaizu, T.; Okada, Y.; Matsumura, Y.; Tanita, T.; Kondo, T. Lung Cancer, 2001, 32, 247-253.
(4) © Recommandations professionnelles Cancer du poumon non à petites cellules, ed. Institut National du cancer, Boulogne-Billancourt, France, 2010 (www.e-cancer.fr).
(5) Travis, W. D.; Brambilla, E.; Noguchi, M.; Nicholson, A. G.; Geisinger, K.; Yatabe, Y.; Powell, C. A.; Beer, D.; Riely, G.; Garg, K.; Austin, J. H. M.; Rusch, V. W.; Hirsch, F. R.; Jett, J.; Yang, P.-C.; Gould, M. J. Thorac. Oncol., 2011, 6, 244-85.
(6) Ellis, D. I.; Goodacre, R. Analyst, 2006, 131, 875-885.
(7) Travo, A.; Piot, O.; Wolthuis, R.; Gobinet, C.; Manfait, M.; Bara, J.; Forgue-Lafitte, M.-E.; Jeannesson, P. Histopathology, 2010, 56, 921-931.
(8) Diem, M.; Miljkovic, M.; Bird, B.; Chernenko, T.; Schubert, J.; Marcsisin, E.; Mazur, A.; Kingston, E.; Zuser, E.; Papamarkakis, K.; Laver, N. Spectroscopy, 2012, 27, 463-496.
(9) Kumar, S.; Shabi, T. S.; Goormaghtigh, E. PLoS One, 2014, 9, 1-8.
(10) Depciuch, J.; Kaznowska, E.; Szmuc, K.; Zawlik, I.; Cholewa, M.; Heraud, P.; Cebulski, J. Infrared Phys. Techn., 2016, 76, 217-226.
(11) Austin, L. A.; Osseiran, S.; Evans, C. L. Analyst, 2016, 141, 476-503.
(12) Lasch, P.; Haensch, W.; Naumann, D.; Diem, M. BBA, Mol. Basis Dis., 2004, 1688, 176-186.
(13) Yano, K.; Ohoshima, S.; Gotou, Y.; Kumaido, K.; Moriguchi, T.; Katayama, H. Anal. Biochem., 2000, 287, 218-225.
(14) Petibois, C.; Drogat, B.; Bikfalvi, A.; Déléris, G.; Moenner, M. FEBS Lett., 2007, 581, 5469-5474.
(15) Nallala, J.; Diebold, M.-D.; Gobinet, C.; Bouché, O.; Sockalingum, G. D.; Piot, O.; Manfait, M. Analyst, 2014, 139, 4005-4015.
(16) Mu, X.; Kon, M.; Ergin, A.; Remiszewski, S.; Akalin, A.; Thompson, C. M.; Diem, M. Analyst, 2015, 140, 2449-2464.
(17) Baker, M. J.; Clarke, C.; Démoulin, D.; Nicholson, J. M.; Lyng, F. M.; Byrne, H. J.; Hart, C. A.; Brown, M. D.; Clarke, N. W.; Gardner, P. Analyst, 2010, 135, 887-894.
(18) Sabbatini, S.; Conti, C.; Rubini, C.; Librando, V.; Tosi, G.; Giorgini, E. Vib. Spectrosc., 2013, 68, 196-203.
(19) Akalin, A.; Mu, X.; Kon, M. A.; Ergin, A.; Remiszewski, S. H.; Thompson, C. M.; Raz, D. J.; Diem, M. Lab. Invest., 2015, 95, 406-421.
(20) Nawrocki-Raby, B.; Polette, M.; Gilles, C.; Clavel, C.; Strumane, K.; Matos, M.; Zahm, J. M.; Van Roy, F.; Bonnet, N.; Birembaut, P. Int. J. Cancer, 2001, 93, 644-652.
(21) Lasch, P. Chemom. Intell. Lab. Syst., 2012, 117, 100-114.
(22) Kohler, A.; Böcker, U.; Warringer, J.; Blomberg, A.; Omholt, S. W.; Stark, E.; Martens, H. Appl. Spectrosc., 2009, 63, 296-305.
(23) Wolthuis, R.; Travo, A.; Nicolet, C.; Neuville, A.; Gaub, M.-P.; Guenot, D.; Ly, E.; Manfait, M.; Jeannesson, P.; Piot, O. Anal. Chem., 2008, 80, 8461-8469.
(24) Walker, H.; Land, Jr.; William, F.; Park, J.-W.; Mathur, R.; Hotchkiss, N.; Heine, J.; Eschrich, S.; Qiao, X.; Yeatman, T. Procedia Comput. Sci., 2011, 6, 273-278.
(25) Bastien, P.; Vinzi, V. E.; Tenenhaus, M. Comput. Stat. Data Anal., 2005, 48, 17-46.
(26) Preisner, O.; Lopes, J. A.; Menezes, J. C. Chemom. Intell. Lab. Syst., 2008, 94, 33-42.
(27) Arlot, S.; Celisse, A. Stat. Surv., 2010, 4, 40-79.
(28) Cordella, C. B. Y.; Bertrand, D. Trends Anal. Chem., 2014, 54, 75-82.
(29) Roggo, Y.; Duponchel, L.; Ruckebusch, C.; Huvenne, J.-P. J. Mol. Struct., 2003, 654, 253-262.
(30) Fisher, R. A.; Yates, F. Statistical Tables for Biological, Agricultural and Medical Research, 6th Ed.; Oliver & Boyd, Edinburgh and London, 1963.
(31) D’inca, H.; Namur, J.; Ghegediban, S. H.; Wassef, M.; Pascale, F.; Laurent, A.; Manfait, M. Am. J. Pathol., 2015, 185, 1877-1888.
(32) Celisse, A.; Robin, S. Comput. Stat. Data Anal., 2008, 52, 2350-2368.
(33) Gaydou, V.; Lecellier, A.; Toubas, D.; Mounier, J.; Castrec, L.; Barbier, G.; Ablain, W.; Manfait, M.; Sockalingum, G. D. Anal. Methods, 2015, 7, 766-778.
(34) Hyonho, C.; Sündüz, K. J. R. Statist. Soc., 2010, 7, 3-25.
(35) Karaman, I.; Qannari, E. M.; Martens, H.; Hedemann, M. S.; Bach Knudsen, K. E.; Kohler, A. Chemom. Intell. Lab. Syst., 2013, 122, 65-77.
(36) Grosmaire, L.; Reynès, C.; Sabatier, R. J. Soc. Fr. Statistique, 2013, 154, 80-94.


8 Tables and figures

Table 1: Glossary of tested parameters

| Step | Parameter category | Description | Parameters tested by cross-validation | Optimal parameters |
|---|---|---|---|---|
| First step: individual paraffin normalization (sample by sample) | Pretreatment 01 | Several possibilities (baseline correction, normalization, derivatives, SNV…) | * | CO2 wavelengths erased, smoothing (Savitzky-Golay) |
| | Paraffin 01 | Number of PCs computed during PCA on paraffin spectra | [5,10,15,20] | 5 |
| | Paraffin 01 | Maximum fitting error between spectra and the mean spectrum of the paraffin spectral image | [0.5,1,1.5,2,2.5,3,3.5,4] | 2.5 |
| | EMSC 01 | Polynomial order for baseline estimation | [1,2,3,4,5,6,7] | 5 |
| | EMSC 01 | Maximum residual after EMSC decomposition (ai parameter) | [1,1.1,1.2,1.3,1.4,1.5,1.6,1.8,2,2.5,3] | 1.3 |
| | EMSC 01 | Minimum bi parameter accepted after EMSC | [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1] | 0.9 |
| | EMSC 01 | Maximum bi parameter accepted after EMSC | [1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2] | 1.1 |
| | Target 01 | Target spectrum for EMSC | * | Mean spectrum of the second quartile computed on the amide signal areas (1450-1750 cm-1) |
| Second step: global paraffin normalization (all samples in one data matrix) | Paraffin 02 | Number of PCs computed during PCA on paraffin spectra | [5,10,15,20] | 5 |
| | Paraffin 02 | Maximum fitting error between spectra and the mean spectrum of the paraffin spectral image | [0.5,1,1.5,2,2.5,3,3.5,4] | 2.5 |
| | EMSC 02 | Polynomial order for baseline estimation | [1,2,3,4,5,6,7] | 5 |
| | Target 02 | Target spectrum for EMSC | * | Mean spectrum |
| Third step: final chemometrics process (modeling) | Pretreatment 02 | Several possibilities (baseline correction, normalization, derivatives, SNV…) | * | Reduction of the spectral range (920-1870 cm-1), removal of the paraffin signal (1350-1500 cm-1), normalization (min-max) at 1670 cm-1 (amide I signal) |
| | PLS_DA/PLS | Number of computed latent variables | [1 to 30] (one by one) | 10** |

*: various non-algebraic parameter possibilities. Parameters are ordered by their chronological use. The "Description" column describes (when possible) each tested parameter. The "Parameters tested by cross-validation" column gives the range of values tested when computing the models. The "Optimal parameters" column presents the values selected as optimal for modeling. Paraffin spectra were excluded when the computed Euclidean distance between a spectrum and the mean spectrum exceeded the value “max_paraffin_error”. The number of PCA components used is set by the “number_Paraffin_PC” parameter. The baseline was modeled by means of a polynomial function whose order is defined by the “mscOrder” parameter (“mscOrder_2” for the second EMSC). A quality test was applied to the spectral image by means of the weighted coefficient matrix, using the “mscMaxResidue”, “mscMinCoef” and “mscMaxCoef” parameters. The “target 01” was the mean of a specific part of the spectral image described in the Results section. For the second EMSC, the computed “target02” was the average of the non-homogeneous multiple-imaging data matrix. “max_paraffin_error_2” makes it possible to perform a quality test on the second interference matrix, constituted by the “number_Paraffin_PC_2” first principal components of the PCA of the multiple-imaging paraffin matrix.
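The quality test described in this legend can be sketched as a simple threshold filter on the outputs of the EMSC fit. The default thresholds below are the optimal values listed in Table 1 (maximum residual 1.3, bi coefficient between 0.9 and 1.1), mapped here to the “mscMaxResidue”, “mscMinCoef” and “mscMaxCoef” names of the legend. The function name, the inclusive comparisons and the toy fit results are assumptions, and the way the residual itself is computed is not specified here.

```python
def passes_quality_test(residual, b_coef,
                        max_residue=1.3, min_coef=0.9, max_coef=1.1):
    """Return True when a fitted spectrum satisfies the three thresholds
    (residual and multiplicative coefficient b of the EMSC model)."""
    return residual <= max_residue and min_coef <= b_coef <= max_coef


# Hypothetical EMSC fit results for four spectra (not measured data).
fits = [
    {"residual": 0.8, "b": 1.00},  # kept
    {"residual": 1.5, "b": 1.00},  # rejected: residual too high
    {"residual": 0.9, "b": 0.70},  # rejected: b below the minimum
    {"residual": 1.2, "b": 1.10},  # kept: on the upper bound
]
retained = [i for i, fit in enumerate(fits)
            if passes_quality_test(fit["residual"], fit["b"])]
print(retained)
```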


Table 2: Results of PLS-DA discrimination

| Cell culture type | Number of spectra | Reference group | Number of spectra predicted as group 01 | Number of spectra predicted as group 02 | Number of spectra predicted as group 03 | Percentage of Good Prediction (%) | Global PGP, BZR and BZR-T33 (%) |
|---|---|---|---|---|---|---|---|
| 16HBE | 3346 | 01 | 3077 | 195 | 74 | 92.0 | |
| BEAS-2B | 2142 | 02 | 279 | 1643 | 220 | 76.7 | |
| BZR | 3428 | 03 | 24 | 190 | 3214 | 93.8 | 96.3 |
| BZR-T33 | 4082 | 03 | 8 | 56 | 4018 | 98.4 | 96.3 |
| Total spectra: 12998 | | | | | Mean: | 90.2 | 88.3 |

The first 3 columns give the cell culture name, the number of spectra (pixels) taken into account in the computation and the chosen reference group designation (qualitative variable). Columns 4, 5 and 6 show the total number of spectra predicted in each reference group for each cell culture. The Percentage of Good Prediction (PGP) summarizes these results for each cell culture (a global PGP was added for the BZR and BZR-T33 cell cultures together).
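The PGP values of Table 2 can be recomputed from the confusion counts, as in the sketch below. The counts are transcribed from the table; the variable and function names are illustrative. Reading the two reported means as an average over the four cell cultures and an average over the three reference groups is an interpretation that happens to reproduce the reported figures.

```python
# Reference group (1-based) and number of spectra predicted as
# group 01, 02 and 03, transcribed from Table 2.
table2 = {
    "16HBE": (1, [3077, 195, 74]),
    "BEAS-2B": (2, [279, 1643, 220]),
    "BZR": (3, [24, 190, 3214]),
    "BZR-T33": (3, [8, 56, 4018]),
}


def pgp(rows):
    """Percentage of spectra assigned to their own reference group."""
    correct = sum(counts[group - 1] for group, counts in rows)
    total = sum(sum(counts) for _, counts in rows)
    return 100.0 * correct / total


per_culture = {name: pgp([row]) for name, row in table2.items()}
global_group3 = pgp([table2["BZR"], table2["BZR-T33"]])
mean_per_culture = sum(per_culture.values()) / len(per_culture)
mean_per_group = (per_culture["16HBE"] + per_culture["BEAS-2B"]
                  + global_group3) / 3
total_spectra = sum(sum(counts) for _, counts in table2.values())

for name, value in per_culture.items():
    print(name, round(value, 1))
print(round(global_group3, 1), round(mean_per_culture, 1),
      round(mean_per_group, 1), total_spectra)
```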


Table 3: Results of PLS regression

| Calibration cell lines | Reference invasiveness | PLS predicted invasiveness | RMSE of validation (%) | Bias | Standard deviation |
|---|---|---|---|---|---|
| 16HBE | 1 | 1.14 | 17.0 | 0.14 | 0.286 |
| BEAS-2B | 1.3 | 1.42 | 14.7 | 0.12 | 0.225 |
| BZR | 2.1 | 2.02 | 11.0 | -0.08 | 0.195 |
| BZR-T33 | 2.3 | 2.09 | 23.8 | -0.21 | 0.186 |
| Mean | | | 16.6 | -0.01 | 0.22 |

The first and second columns present, respectively, each studied cell culture and the chosen reference invasiveness. The third column presents the mean predicted invasiveness. The 4th, 5th and 6th columns show, for each cell culture, the detailed statistical results of the PLS regression. The Root Mean Square Error of Prediction expresses the mean error relative to the invasiveness scale during the validation step of the cross-validation. The bias is the mean difference between the predicted and the reference invasiveness. The last column contains the standard deviation of the prediction for each cell culture.
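As a consistency check on Table 3, the bias column and the mean row can be recomputed from the tabulated values. The sketch below assumes that the bias is the predicted minus the reference invasiveness, which is what the tabulated signs suggest; the variable names are illustrative.

```python
# (reference, mean predicted, validation RMSE in %, standard deviation),
# transcribed from Table 3.
table3 = {
    "16HBE": (1.0, 1.14, 17.0, 0.286),
    "BEAS-2B": (1.3, 1.42, 14.7, 0.225),
    "BZR": (2.1, 2.02, 11.0, 0.195),
    "BZR-T33": (2.3, 2.09, 23.8, 0.186),
}

n = len(table3)
# Assumption: bias = predicted - reference.
bias = {name: round(pred - ref, 2)
        for name, (ref, pred, _, _) in table3.items()}
mean_bias = round(sum(pred - ref for ref, pred, _, _ in table3.values()) / n, 2)
mean_rmse = round(sum(rmse for _, _, rmse, _ in table3.values()) / n, 1)
mean_sd = round(sum(sd for _, _, _, sd in table3.values()) / n, 2)

print(bias, mean_bias, mean_rmse, mean_sd)
```

The positive bias for 16HBE and the negative bias for BZR-T33 correspond to the overestimation of the lowest level and the underestimation of the highest level discussed in the text.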


Table 4: Results of external validation of the PLS model

| Validation cell lines | Number of IR pixels (7.3 µm²/pixel) | Mean of external prediction | Standard deviation |
|---|---|---|---|
| CALU3 | 4239 | 1.2 | 0.24 |
| NCIH1299 | 3715 | 1.7 | 0.35 |

The first and second columns indicate the cell lines used as the external validation set and the number of qualified IR pixels used for prediction, respectively. The third column presents the invasiveness score predicted by the calibrated PLS model and the last column presents the corresponding standard deviations.


Figure 1: Preprocessing flowchart in three steps: first, the collection of the samples; second, the correction of the paraffin signal by EMSC preprocessing; and third, the building of a homogeneous data bank linked to biological variables (second EMSC).


Figure 2: Evolution of the infrared spectra of paraffin-embedded bronchial squamous cell lines during the pretreatment steps. Full lines correspond to the mean of the spectral data matrix after recording and after the first and second EMSC processing steps; colored areas represent the minimum-to-maximum range of variability of these data matrices.


Figure 3: Discrimination with the PLS-DA method. a- Optimization by cross-validation of the number of PLS-DA latent variables; b- three-dimensional discrimination volume with latent variables 1, 4 and 7. In black (blue), the 16HBE; in light grey (green), the BEAS-2B; and in dark grey (red), the BZR and BZR-T33 cell spectra.

[Figure 4 graphic. Panel a: PLS cross-validation result, calibration and validation RMSE (%) versus number of latent variables; 10 is the optimal number of latent variables. Panel b: linear regression curve, predicted invasivity versus reference invasivity, y = 0.7371x + 0.428, R² = 0.7635; data series: 16HBE, BEAS-2B, BZR and BZR-T33 pixels.]

Figure 4: Regression with the PLS method. a- Optimization by cross-validation of the number of PLS latent variables; b- reference versus predicted invasiveness: distribution of the spectra on the invasiveness scale, with the linear correlation curve. Circles, squares, diamonds and triangles correspond to the 16HBE, BEAS-2B, BZR and BZR-T33 cell spectra, respectively.

Figure 5: PLS latent variables. The full line is the mean of the first ten selected PLS latent variables and the colored area represents the minimum-to-maximum range of variability of these PLS latent variables; the dotted line corresponds to the mean of the final spectral data matrix (the target of the second EMSC).


Table of Contents (TOC)/Abstract (ABS) Graphic
