Vibrational analysis of lung tumor cell lines: implementation of an invasiveness scale based on the cell infrared signatures. Vincent Daniel Gaydou, Myriam Polette, Cyril Gobinet, Claire Kileztky, Jean-François Angiboust, Michel Manfait, Philippe Birembaut, and Olivier Piot. Anal. Chem., Just Accepted Manuscript. DOI: 10.1021/acs.analchem.6b00590. Publication Date (Web): 02 Aug 2016.
Analytical Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Analytical Chemistry
Vibrational analysis of lung tumor cell lines: implementation of an invasiveness scale based on the cell infrared signatures
Short title: Infrared-based invasiveness scale
Vincent Gaydou1,2, Myriam Polette3,4,5, Cyril Gobinet1,2,5, Claire Kileztky3,4, Jean-François Angiboust1,2, Michel Manfait1,2, Philippe Birembaut3,4, Olivier Piot1,2,5*
1 Equipe MéDIAN - Biophotonique et Technologies pour la Santé, Université de Reims Champagne-Ardenne, UFR de Pharmacie, 51 rue Cognacq-Jay, 51096 Reims, France.
2 CNRS UMR7369 MEDyC, SFR Cap-Santé, 51 rue Cognacq-Jay, 51096 Reims, France.
3 INSERM UMR-S 903, SFR CAP-Santé, University of Reims-Champagne-Ardenne, 45, rue Cognacq-Jay, 51092 Reims, France.
4 Biopathology Laboratory, Centre Hospitalier et Universitaire de Reims, 45 Rue Cognacq-Jay, 51092 Reims, France.
5 Platform of Cellular and Tissular Imaging (PICT), Université de Reims Champagne-Ardenne, 51 rue Cognacq-Jay, 51096 Reims, France.
* [email protected]
1 Abstract
Assessing tumor invasiveness is a paramount diagnostic step for improving patient care. Infrared spectroscopy probes the chemical composition of samples and, in combination with multivariate statistical processing, can highlight subtle molecular alterations associated with malignancy development. Our investigation demonstrated that the infrared signatures of cell lines presenting various invasiveness phenotypes contain discriminant spectral features, which are informative signals for implementing an objective invasiveness scale. This development reflects the interest of the vibrational approach as a candidate label-free biophotonic technique, usable in routine clinical practice, to characterize tumor aggressiveness quantitatively. In addition, the methodology can reveal the heterogeneity of cancer cells, opening the way to further research in cancer science.
2 Introduction
Metastatic progression is a multistep process involving basement membrane disruption, tumor cell dispersion from the primary tumor cluster, stromal connective tissue invasion by tumor cells, neo-angiogenesis, intravasation and extravasation, and finally colonization of a target organ. Estimating the invasive properties of tumor cells is of paramount interest for a more precise diagnosis, opening the way to personalized therapy. A methodology dedicated to the assessment of tumor invasiveness would help prognosis and could then guide the management of patients. In addition, in a retrospective approach, invasiveness scoring on biopsy or surgical specimens could help to classify tissues in order to further investigate the biochemical mechanisms involved.
This need is particularly relevant in bronchial cancers, and more precisely for non-small cell lung carcinomas (NSCLC). Indeed, the histological examination of bronchial carcinomas distinguishes two main types. The first type, representing only 15% of cases, corresponds to small cell carcinoma. This tumor has a worse prognosis and rarely requires surgical treatment.1 NSCLC are also very aggressive lesions, frequently discovered at advanced stages. Their treatment, according to the degree of extension, includes surgery, radiotherapy and/or chemotherapy. The present study concerned these aggressive bronchial NSCLC, particularly the squamous cell carcinoma developed from bronchial epithelial cells.2 Presently, diagnostic imaging techniques such as X-ray, MRI (Magnetic Resonance Imaging), thoracic tomodensitometry or PET (Positron Emission Tomography) do not give access to information on tumor invasiveness.3 Recovering such information requires complex biological tests performed on cancer tissues, such as genetic analysis to highlight the associated mutations, or immunohistochemical studies.4 These analyses improve diagnostic reliability by classifying the cancer type as well as its staging and level of invasiveness. Nevertheless, they require rigorous protocols involving chemical reagents. Their sensitivity and reproducibility also rely on the expertise of the operator and of the pathologist interpreting the data.5
In this context, vibrational spectroscopy appears as a candidate label-free technique to classify tumor samples according to their invasiveness level. The combination of infrared (IR) absorption spectroscopy and microscopy for tissue analysis, also called Spectral HistoPathology (SHP), has proved well suited for differentiating various histological structures and for identifying pathology.6-9 Raman spectroscopy, often presented as the twin technique of the IR modality, has also proved an effective tool for diagnostic purposes.10,11 Since SHP is a computer-based digital technique, the procedure of tissue evaluation can be automated, independently of operator subjectivity.12 The large set of spectroscopic data is treated by means of chemometric and statistical methods, which can be standardized to deliver reproducible and objectively interpretable results.13-16
In addition, some fairly recent studies report that tumor samples of different degrees of invasiveness present specific IR markers. Baker et al. employed IR spectroscopy to investigate prostate cancer epithelial cell lines and to discriminate them with regard to their invasiveness phenotype.17 Also, Sabbatini et al. presented the potential of IR spectroscopy to discriminate tissues of oral squamous cell carcinoma corresponding to well, moderately and poorly differentiated tumor cells.18 More recently, Diem et al. demonstrated the state of advancement of SHP in a study dedicated to malignant and benign lung tumors.19 Most SHP studies of human carcinoma show both the discriminant power and the limitations of infrared spectroscopy. The efficiency of the SHP approach relies on the constitution of the sample set, which has to reflect tissue diversity. Also, these studies highlighted the discriminant potential of the IR approach, but without exploring its ability to quantify tumor invasiveness. The objective of our work was therefore to develop a methodology based on IR spectroscopy for scoring the invasiveness phenotype of tumor cells.
More precisely, our research concerned broncho-epithelial squamous cell carcinoma, as argued above. Our aim was to establish an invasiveness scale based on the IR signatures of cell lines (16HBE, BEAS-2B, BZR and BZR-T33) selected because they present different invasive phenotypes. To achieve this objective, chemometric algorithms were developed specifically for the IR data collected on these cellular specimens. Firstly, a spectral preprocessing protocol based on EMSC (Extended Multiplicative Signal Correction) was optimized to produce a homogeneous IR data bank by mathematically correcting the spectral interferences originating from the paraffin used as embedding material, and by normalizing the cell signal independently of the cyto-block preparation. Secondly, PLS-DA (Partial Least Squares - Discriminant Analysis) and PLS (Partial Least Squares) were run to establish qualitative (i.e. discriminative) and quantitative models, respectively. The PLS approach aimed at determining a multivariate regression curve associating an “invasiveness score” with each broncho-epithelial cell line. These models were developed and optimized by an image-based cross validation, in which all pixels/spectra of one image constituted a single independent sample.
3 Materials and Methods
3.1 Cell culture
First, the IR prediction model was constructed from 4 different cell lines. Human lung 16HBE 14o-, BEAS-2B and BZR cell lines were obtained from the American Type Culture Collection (Rockville, MD, USA). BZR-T33 lung cells were provided by Curtis C. Harris (National Cancer Institute, Bethesda, MD, USA). In a second step of external validation, 2 additional cell lines, CALU3 and NCIH1299, were included.
The 6 cell lines were cultured in DMEM (Gibco, Invitrogen, Carlsbad, CA, USA) containing 10% fetal calf serum (FCS) (Gibco). The invasive phenotype of these lung cancer cells, assessed by a modified Boyden chamber assay, has been extensively characterized previously.20 Considering the number of invading cells, 16HBE and CALU3 were considered non-invasive cell lines, BEAS-2B cells display moderate invasive capacities, and BZR and BZR-T33 present highly invasive properties. The invasiveness of NCIH1299 cells is intermediate between those of BEAS-2B and BZR. For each of these cell lines, several samples were cultured independently: 4 passages for each of the calibration cell lines (16HBE, BEAS-2B, BZR and BZR-T33) and 2 passages for each of the external validation cell lines (CALU3 and NCIH1299). Cell cultures fixed in formalin were centrifuged, then embedded in paraffin, and 8 µm-thick sections were deposited on CaF2 windows suitable for IR measurements.
All the cell lines used in these experiments are widely used in the literature, and most of them are provided by ATCC. Thus, there is no ethical limitation on their use.
3.2 IR imaging
The IR acquisitions were performed with a Hyperion imager (Bruker, Germany) equipped with a Focal Plane Array (FPA) detector, which allows large tissue areas to be imaged. The FPA consists of a matrix of 64 × 64 pixels, each 2.7 × 2.7 µm² in size. Each pixel can be considered as an individual IR detector. This device allows a large number of spectra to be collected in a limited time, which is required for the statistical processing of the data.
Spectra were collected at a spectral resolution of 2 cm-1, with 32 accumulations for the cell spectra and 240 for the background signal. The background was recorded on a very clean CaF2 area. The paraffin signal was also collected for each sample, on an area surrounding the cell spots. Cell samples were analyzed over an area of approximately 0.15 mm² (around 15000 spectra).
3.3 Preprocessing of infrared data
Each cellular sample was associated with 2 spectral images: one image of paraffin and one image of the cell culture. A precise preprocessing protocol was developed, as described in figure 1. This methodology aims at building a homogeneous imaging data bank in several steps.
The first step was a smoothing using the Savitzky-Golay method, with a window of 6 points and a polynomial of degree 1.21
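
As a rough illustration of the smoothing step described above, the sketch below fits a first-degree polynomial in a sliding window and evaluates it at the window centre, which is the principle of a Savitzky-Golay filter of degree 1. It is not the authors' Matlab code: the helper name `smooth_linear`, the synthetic band and the symmetric 7-point window (the text states 6 points, which cannot be centred on a data point) are assumptions made for the example.

```python
import numpy as np

def smooth_linear(y, half_width=3):
    """Degree-1 local fit in a sliding window (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    out = np.empty(n)
    for i in range(n):
        lo = max(0, i - half_width)          # window is truncated at the edges
        hi = min(n, i + half_width + 1)
        x = np.arange(lo, hi)
        slope, intercept = np.polyfit(x, y[lo:hi], 1)
        out[i] = slope * i + intercept       # fitted line evaluated at point i
    return out

# Synthetic noisy absorption band, for illustration only.
rng = np.random.default_rng(0)
wavenumbers = np.linspace(900.0, 1800.0, 200)
clean = np.exp(-((wavenumbers - 1650.0) / 60.0) ** 2)
noisy = clean + rng.normal(scale=0.05, size=clean.size)
smoothed = smooth_linear(noisy)
```

A degree-1 fit leaves any straight line unchanged, so the filter removes high-frequency noise without distorting linear baselines; it does, however, flatten sharp band maxima slightly when the window is wide compared with the band.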
Then the spectral images were processed by EMSC (Extended Multiplicative Signal Correction). This correction was first proposed by Martens and Stark in 1991. A major advantage of EMSC is that prior chemical or physical knowledge can be integrated into the preprocessing model.22 The multivariate signals were normalized with respect to a reference signal by decomposing each original spectrum according to equation (1):

$$ s = a\,\hat{s} + b\,I + c\,P + e \qquad (1) $$

with $a$ (scalar), $b$ and $c$ (both vectors) the regression coefficients, $\hat{s}$ an estimate of the data set (also called the “target”), $I$ the matrix of known interference spectra (interference matrix), $P$ a polynomial modeling the background and offsets due to scatter effects, and finally $e$ the model error (also called the “residual”). If no good estimate of $\hat{s}$ is available, one could choose the average of the whole data set for $\hat{s}$. In our case, $\hat{s}$ was selected following specific rules, as explained later. The $a$, $b$ and $c$ parameters were estimated by minimizing the weighted sum-of-squares of the residual $e$. Thus, $s_{corr}$ corresponded to the EMSC-corrected spectrum and was calculated according to equation (2):

$$ s_{corr} = \frac{e}{a} + \hat{s} \qquad (2) $$
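
The EMSC decomposition and correction described above can be sketched as a linear least-squares problem. The code below is a minimal illustration under stated assumptions, not the authors' implementation: it uses ordinary (unweighted) least squares where the text mentions a weighted criterion, the function name `emsc`, the polynomial order and the synthetic spectra are all hypothetical.

```python
import numpy as np

def emsc(spectra, target, interferents, poly_order=2):
    """Unweighted least-squares EMSC sketch (hypothetical helper).

    spectra      : (n_spectra, n_points) raw spectra
    target       : (n_points,) reference spectrum (the "target")
    interferents : (n_interf, n_points) known interference spectra
    """
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    target = np.asarray(target, dtype=float)
    interferents = np.atleast_2d(np.asarray(interferents, dtype=float))
    x = np.linspace(-1.0, 1.0, target.size)
    poly = np.vander(x, poly_order + 1, increasing=True)    # columns 1, x, x^2, ...
    design = np.column_stack([target, interferents.T, poly])
    coefs, *_ = np.linalg.lstsq(design, spectra.T, rcond=None)
    a = coefs[0]                                  # multiplicative factor of each spectrum
    residual = spectra.T - design @ coefs         # model error e
    corrected = target[:, None] + residual / a    # target + e / a
    return corrected.T, a

# Synthetic check: a scaled target plus a paraffin-like band and a sloped baseline.
x = np.linspace(-1.0, 1.0, 300)
target = np.exp(-(x / 0.1) ** 2)
paraffin = np.exp(-((x - 0.5) / 0.05) ** 2)
raw = 2.0 * target + 0.7 * paraffin + 0.1 + 0.3 * x
corrected, a = emsc(raw, target, paraffin[None, :])
```

In this noise-free example the interference and baseline terms are fully absorbed by the model, so the corrected spectrum coincides with the target and the estimated factor is 2. With real data, the residual carries the sample-specific chemical information that is kept after correction.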
In this study, the main parasitic signal, that of paraffin, was taken into account in the form of an interference matrix, as has already been done for the IR analysis of paraffin-embedded samples.23 The interference matrix was built from the first principal components of a PCA (Principal Component Analysis) computed on the non-centered paraffin spectra collected around the cells.
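
One way to obtain such an interference matrix is a singular value decomposition of the paraffin spectra without mean-centering. The sketch below assumes this reading; the function name `interference_matrix`, the number of retained components and the synthetic two-band "paraffin" data are hypothetical choices for illustration.

```python
import numpy as np

def interference_matrix(paraffin_spectra, n_components=3):
    """First principal directions of non-centered spectra (hypothetical helper)."""
    data = np.asarray(paraffin_spectra, dtype=float)   # (n_spectra, n_points), no centering
    _, singular_values, vt = np.linalg.svd(data, full_matrices=False)
    return vt[:n_components], singular_values[:n_components]

# Synthetic paraffin spectra: random mixtures of two bands, for illustration only.
rng = np.random.default_rng(1)
axis = np.linspace(0.0, 1.0, 200)
bands = np.vstack([np.exp(-((axis - 0.3) / 0.02) ** 2),
                   np.exp(-((axis - 0.7) / 0.03) ** 2)])
paraffin = rng.uniform(0.5, 1.5, size=(50, 2)) @ bands
components, strengths = interference_matrix(paraffin, n_components=2)
```

Skipping the centering step means that the first component captures the average paraffin spectrum itself, which is what an EMSC interference term needs to model.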
In our investigation, a double EMSC was carried out: the first to correct each spectral image independently of the others, and the second to make these images comparable with one another, as illustrated in figure 1. First, every sample was treated independently in order to obtain high-quality spectra with neutralized paraffin interferences. The cell culture spectra that were computed and passed the quality test were gathered in a non-homogeneous multi-image data matrix. For each cell image, the paraffin signal was corrected by the EMSC pretreatment, but the baseline correction and the proportion between paraffin and cell signals were not the same for all cell images.
So, in order to transform this non-homogeneous multi-image data matrix into a homogeneous one, a second EMSC was used. The interference matrix was constructed from PCA components computed on the set of paraffin spectra of all samples. Performing these EMSC preprocessing steps requires a large number of parameters to be set. These parameters are listed in table 1. They were optimized on the basis of the PLS-DA results, described below. One of the main parameters is the “target”, which has to be defined as the model spectrum most representative of the biological sample. The computation of this target is explained in the Results section.
3.4 Chemometric algorithms
PLS
PLS-based algorithms were used to construct a prediction model allowing data quantification, with the additional advantage of highlighting the discriminant spectral features.24
The PLS algorithm is based on a multivariate regression principle. In particular, it maximizes the covariance between 2 matrices by means of multidimensional and orthonormal regression vectors. The 2 following matrix equations describe the PLS principle; the aim is to solve for T, with X and Y as known variables.

$$ X = T P + R \qquad (3) $$

$$ Y = T q + f \qquad (4) $$

With X the data matrix (n × m) (spectral data);
Y the quantitative reference values (n × l) (invasiveness score);
T the score matrix (n × k);
P the loadings (or regression vectors) matrix (k × m) associated with the prediction of X from T;
R the residual associated with the prediction of X (n × m);
q the loading associated with Y (k × l);
f the residual associated with the prediction of Y (n × l);
k the number of computed latent variables (corresponding to the PLS model dimension).
Here l, m and n are respectively the number of quantitative reference values to predict for each sample (invasiveness value), the number of experimental values for each sample (which depends on the spectral resolution) and the number of samples (number of spectra).
The vector space established by the regression vectors links the spectra (here X) to values, scores or quantitative reference variables (here Y). Thus, PLS regression makes it possible to quantify data on the basis of quantitative features. It is particularly suited to large and covariant data banks. However, this processing can easily be compromised by under- or overfitting. To prevent this bias, the number of latent variables used for the model must be carefully controlled.25
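
To make the roles of the scores, loadings and number of latent variables concrete, here is a minimal sketch of PLS regression for a single response (PLS1) using the classical NIPALS deflation scheme. It is a generic textbook variant, not the “saisir” toolbox routine used in the study; the function names and the synthetic data are hypothetical, and the loadings are stored column-wise (m × k) rather than as the k × m matrix of equation (3).

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 by NIPALS deflation (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                    # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                       # scores of the current latent variable
        tt = t @ t
        p = Xk.T @ t / tt                # X loading
        c = (yk @ t) / tt                # y loading
        Xk = Xk - np.outer(t, p)         # deflate X
        yk = yk - c * t                  # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression vector in the original X space
    return x_mean, y_mean, coef

def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean

# Synthetic data with an exactly linear response, for illustration only.
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 5))
y = X @ np.array([0.5, -1.0, 2.0, 0.0, 1.5]) + 1.0
full_model = pls1_fit(X, y, n_components=5)
small_model = pls1_fit(X, y, n_components=2)
```

With as many latent variables as there are independent X variables, PLS1 reproduces ordinary least squares, which is why the full model fits this noise-free example exactly. On real spectra, where noise is present, keeping fewer latent variables is what limits overfitting.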
The accuracy of the PLS models was assessed by computing the RMSE (Root Mean Square Error) on both calibration and validation data sets, according to the following formula:

$$ \mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}} \qquad (5) $$

with $y_i$ the reference value of the i-th spectrum, $\hat{y}_i$ the value predicted by the model for the i-th spectrum, and n the number of spectra used.
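
The RMSE formula translates directly into a few lines of code. The sketch below is illustrative only; the function name `rmse` and the numerical values are made up for the example.

```python
import math

def rmse(reference, predicted):
    """Root mean square error between reference and predicted values."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((y - y_hat) ** 2 for y, y_hat in zip(reference, predicted))
    return math.sqrt(squared / len(reference))

# Hypothetical reference scores and model predictions.
reference = [1.0, 1.3, 2.1, 2.3]
predicted = [1.1, 1.2, 2.0, 2.5]
error = rmse(reference, predicted)   # sqrt(0.07 / 4), about 0.132
```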
PLS-DA
The PLS-DA (Partial Least Squares - Discriminant Analysis) algorithm was employed to implement qualitative classification models. A binary code (0 or 1) is associated with every qualitative variable. A multivariate PLS regression is then performed between the spectral data matrix and the binary matrix. The PLS-DA model then predicts multivariate values for the spectra present in the data set, and these values are matched to the corresponding qualitative groups. This classification method is widely used for the exploitation of vibrational data.26
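
The binary coding and the final class assignment can be sketched as follows. The decision rule used here (assigning each spectrum to the class with the largest predicted value) is a common convention and an assumption of this example; the text does not specify the rule applied in the study. The function names and the predicted values are hypothetical.

```python
def one_hot(labels, classes):
    """Binary (0/1) coding of class labels, one column per class."""
    return [[1 if label == c else 0 for c in classes] for label in labels]

def assign_class(predicted_row, classes):
    """Assign the class whose predicted value is the largest (assumed rule)."""
    best = max(range(len(classes)), key=lambda j: predicted_row[j])
    return classes[best]

classes = ["16HBE", "BEAS-2B", "BZR/BZR-T33"]
coded = one_hot(["BEAS-2B", "16HBE"], classes)
# Hypothetical PLS output for one spectrum: one continuous value per class column.
decision = assign_class([0.1, 0.3, 0.8], classes)
```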
The performance of the PLS-DA models was evaluated from the PGP (percentage of good prediction), on both calibration and validation data sets, calculated as follows:

$$ \mathrm{PGP} = \frac{g}{n} \times 100 \qquad (6) $$

with g the number of correctly predicted spectra.
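
As a small worked example of the percentage of good prediction, the sketch below counts the correctly predicted spectra and divides by the total. The function name `pgp` and the labels are hypothetical.

```python
def pgp(true_labels, predicted_labels):
    """Percentage of good prediction: 100 * g / n."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    g = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return 100.0 * g / len(true_labels)

truth = ["16HBE", "BEAS-2B", "BZR", "BZR"]
guess = ["16HBE", "BZR", "BZR", "BZR"]
score = pgp(truth, guess)   # 3 of 4 spectra correct
```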
Cross validation
In our study, the performance of the prediction models was estimated by a cross validation method applied to the 4 cell lines of the calibration set. More precisely, it was carried out at the level of the spectral images rather than at the pixel scale, to avoid overfitting. While the total number of pixels is high, the number of cytoblock sections is quite limited, which justifies this approach, which we refer to as “partial cross validation”. The principle of cross validation (also called internal prediction) is to predict a sample (an image in our case) that was removed beforehand from the modeling step. This process is repeated for all the samples, and an average prediction accuracy is then obtained. This average value gives the performance that can be expected from the model.27
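
The image-level splitting described above can be sketched as a leave-one-image-out loop: all spectra of one image are held out together, so that pixels from the same section never appear on both sides of the split. The generator name and the image identifiers are hypothetical.

```python
def leave_one_image_out(image_ids):
    """Yield (held_out_image, train_indices, test_indices) for each image."""
    for held_out in sorted(set(image_ids)):
        train = [i for i, img in enumerate(image_ids) if img != held_out]
        test = [i for i, img in enumerate(image_ids) if img == held_out]
        yield held_out, train, test

# One identifier per spectrum, naming the image it comes from.
image_ids = ["img1", "img1", "img2", "img2", "img2", "img3"]
folds = list(leave_one_image_out(image_ids))
```

Splitting at the pixel level instead would place spectra from the same image, which share the same preparation artefacts, in both the training and test sets, and would tend to give optimistic performance estimates.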
All computing steps were performed with Matlab R2013a (32 bit) (MathWorks, USA); the PLS algorithm originates from the “saisir” toolbox developed by Bertrand and Cordella, INRA, France.28
3 Results
The development and optimization of the preprocessing protocols are summarized in figure 1 and table 1. Building an objective and homogeneous spectral data matrix is complex because there is a strong variability between paraffin-embedded cell cultures. First, the cell culture thickness may vary within each sample section. This variability is due to material and mechanical properties associated with sample preparation. To limit these sources of variance, the EMSC pretreatment was applied in two steps, as indicated in the Materials and Methods section.
During the first EMSC, performed at the level of each spectral image, the influence of the “target” selected as reference spectrum is noticeable. If the mean spectrum is chosen as target (the usual choice), a marked variability is observed between the targets of the different spectral images. This target variability mainly originates from the number of pixels per image that actually present a cellular signal. To avoid this problem, it was decided to compute the targets as a function of the cell signal of each image. The integrated intensity under the amide I and II bands (1450-1750 cm-1) was computed and sorted in descending order. The spectra that corresponded to the second quartile of this ordering were then used to compute the target. For each spectral image, the target was determined by this quartile method, which also yielded fairly similar targets, representative of the cellular signal, across the set of spectral images. The first, third and fourth quartiles were also tested, but gave worse results.
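
The quartile-based target computation can be sketched as below. This is one possible reading of the procedure, with assumptions stated in the comments: the integral is approximated by a plain sum over the band (uniform wavenumber spacing), "second quartile" is taken as ranks 25% to 50% of the descending ordering, and the target is the mean of the selected spectra. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def quartile_target(spectra, wavenumbers, band=(1450.0, 1750.0)):
    """Mean of the spectra ranked in the second quartile of amide-band intensity."""
    spectra = np.asarray(spectra, dtype=float)
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    in_band = (wavenumbers >= band[0]) & (wavenumbers <= band[1])
    area = spectra[:, in_band].sum(axis=1)      # assumed: sum as a proxy for the integral
    order = np.argsort(area)[::-1]              # strongest amide signal first
    quarter = len(order) // 4
    selected = order[quarter:2 * quarter]       # assumed: ranks 25% to 50%
    return spectra[selected].mean(axis=0), selected

# Eight flat synthetic "spectra" with intensities 1 to 8, for illustration only.
wavenumbers = np.arange(900.0, 1900.0, 4.0)
spectra = np.outer(np.arange(1.0, 9.0), np.ones(wavenumbers.size))
target, selected = quartile_target(spectra, wavenumbers)
```

In this toy case the descending ranking is 8, 7, 6, 5, 4, 3, 2, 1, so the second quartile contains the spectra of intensity 6 and 5, and the target is flat at 5.5.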
In addition, the selection of the target strongly influences the EMSC quality test. Computing the target on the second quartile ensures a representative cell signal and leads to the rejection of nearly 80% of the pixels after the first EMSC. The rejected spectra were discarded because of a poor signal and/or a high paraffin contribution. This quality test depends on the a and b parameters, as indicated in table 1. After the second EMSC, for all spectra of each image, the baseline was homogeneous and the paraffin contribution identical. Figure 2 displays the mean and the minimum-maximum values of all spectra of the set of images, for the raw data and after the first and the second EMSC steps. The proportion between cellular material and paraffin was so variable from one sample to another that the second EMSC could not entirely correct this variance. Consequently, to avoid taking into account this signal variability originating from the sample preparation, the spectral ranges were reduced to 920-1350 cm-1 and 1500-1850 cm-1, and a min-max normalization was computed on the 1670 cm-1 wavenumber, corresponding to the amide I band associated with vibrations of the peptide bonds of the protein content.
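
The range reduction and normalization step can be illustrated as follows. The exact normalization formula is not given in the text; the sketch assumes one plausible reading, in which each spectrum is shifted so that its minimum is 0 and scaled so that its intensity at 1670 cm-1 is 1. The function name and the synthetic spectra are hypothetical.

```python
import numpy as np

def select_and_normalize(spectra, wavenumbers,
                         ranges=((920.0, 1350.0), (1500.0, 1850.0)), ref=1670.0):
    """Keep the given ranges, then set minimum to 0 and intensity at `ref` to 1."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    keep = np.zeros(wavenumbers.shape, dtype=bool)
    for lo, hi in ranges:
        keep |= (wavenumbers >= lo) & (wavenumbers <= hi)
    kept_wn = wavenumbers[keep]
    kept = spectra[:, keep]
    ref_idx = int(np.argmin(np.abs(kept_wn - ref)))     # point closest to 1670 cm-1
    minimum = kept.min(axis=1, keepdims=True)
    scale = kept[:, [ref_idx]] - minimum                # assumed non-zero
    return (kept - minimum) / scale, kept_wn

# Two synthetic spectra differing only by a gain and an offset.
wavenumbers = np.arange(900.0, 1901.0, 2.0)
base = 0.2 + 3.0 * np.exp(-((wavenumbers - 1670.0) / 20.0) ** 2)
normalized, kept_wn = select_and_normalize(np.vstack([base, 2.0 * base + 1.0]), wavenumbers)
```

Under this reading, spectra that differ only by a multiplicative gain and a constant offset become identical after normalization, which is the kind of preparation-related variability the step is meant to remove.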
Figure 3 presents the results of the PLS-DA performed on the 4 cell lines of the calibration sample set. For this classification model, 3 reference groups were chosen: 16HBE as normal cells, BEAS-2B as moderately invasive cells, and BZR and BZR-T33 as invasive cells. BZR and BZR-T33 were grouped together because of the phenotypic similarity between the two cell types.
Figure 3.a shows the exploration of the number of latent variables based on the partial cross-validation method. It displays the evolution of the PGP as a function of the number of latent variables (up to 30) for both the calibration and validation steps. The calibration PGP tends to 100% as the number of latent variables increases, reflecting a convergence of the PLS-DA processing. From the evolution of the validation PGP, 10 latent variables were determined to be optimal, since the PGP maximum, around 90%, is reached for this number, which prevents both under- and overfitting. Table 2 indicates the prediction results for the 3 groups of cellular specimens (16HBE, BEAS-2B, and a common group for BZR and BZR-T33) corresponding to various invasiveness phenotypes. The PGP values are over 90% for the extreme groups of invasiveness, while the BEAS-2B cells present a value close to 77%, reflecting the intermediate phenotype of this cell line. In order to visualize the discriminant potential of our approach, a 3D representation is given (figure 3.b). The latent variables chosen for this representation were the 1st, the 4th and the 7th ones, since they gave a good visual rendering of the data projection. While the borders of the groups were not perfectly sharp, with some spectra overlapping, a distinction between these 3 levels of invasiveness can be outlined.
The results of the PLS-DA processing show the ability to discriminate bronchial squamous cell carcinoma (SCC) cells of different invasiveness phenotypes from their infrared spectral signatures. Based on these encouraging results, our objective was then to investigate the possibility of ordering cells quantitatively by constructing a spectral scale of cell invasiveness. For this purpose, PLS models were developed.
While PLS-DA gives qualitative memberships to a restricted number of classes (3 in our study), PLS is a more graded processing, offering the possibility to consider continuous and quantitative scores.29 Thus, PLS was run with the 4 levels of cell invasiveness as reference inputs, by distinguishing the 2 highest invasiveness levels, i.e. the BZR and BZR-T33 samples. First, various PLS models were tested by simultaneously changing the reference scale of invasiveness and the number of latent variables. The best results were obtained with a scale ranging from 1 to 2.3, as indicated in table 3, and an optimal number of 10 latent variables. The number of latent variables was determined by plotting the RMSE for a varying number of variables, similarly to the first step of the PLS-DA explained previously (figure 4.a). The calibration and validation RMSE tend to 6% and 15%, respectively, as the number of latent variables increases. The optimal number, corresponding to the first minimum, is the most appropriate to avoid under- and overfitting of the PLS models.
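
The "first minimum" rule for choosing the number of latent variables can be sketched as a scan of the validation RMSE curve. The function name and the curve values below are hypothetical and do not reproduce figure 4.a.

```python
def first_local_minimum(validation_rmse):
    """Return the 1-based number of latent variables at the first local minimum."""
    for k in range(len(validation_rmse) - 1):
        if validation_rmse[k + 1] > validation_rmse[k]:
            return k + 1            # the error rises after k + 1 latent variables
    return len(validation_rmse)     # the curve never rises: keep all variables

# Hypothetical validation RMSE (%) for 1 to 8 latent variables.
curve = [30.0, 24.0, 19.0, 17.0, 16.5, 17.2, 16.0, 18.0]
n_latent = first_local_minimum(curve)
```

Here the curve first rises after 5 latent variables, so 5 is retained even though a slightly lower error appears at 7. Stopping at the first minimum favours the more parsimonious model, which is less prone to overfitting.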
The regression results calculated under these conditions are given in table 3, with the corresponding regression curve depicted in figure 4.b. The average validation RMSE was 16.6%, with minimal and maximal prediction errors of 11% and 23.8% obtained for BZR and BZR-T33, respectively. A significant linear regression was found between the reference and the predicted invasiveness scales, with a correlation coefficient R² equal to 0.76 (R² > rα, with rα=0.01 = 0.6226 for n-2 = 14 degrees of freedom).30 Interestingly, the reference scale highlights invasiveness levels that are not regularly distributed between 1 for 16HBE and 2.3 for BZR-T33 cells. Indeed, BEAS-2B and BZR cells were positioned at 1.3 and 2.1, respectively, reflecting a slight proximity of BEAS-2B to 16HBE and very close invasiveness phenotypes for BZR and BZR-T33.
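
The critical value quoted above can be recovered from Student's t distribution. The short derivation below assumes a two-sided test at α = 0.01 with ν = n - 2 = 14 degrees of freedom, for which standard tables give t ≈ 2.977 (this table value is not stated in the text):

$$ r_{\alpha} = \frac{t_{\alpha/2,\,\nu}}{\sqrt{t_{\alpha/2,\,\nu}^{2} + \nu}} = \frac{2.977}{\sqrt{2.977^{2} + 14}} \approx 0.6226 $$

With R² = 0.76, the correlation coefficient itself is r = √0.76 ≈ 0.87, which also exceeds this threshold. The value ν = 14 corresponds to n = 16, which is consistent with 4 calibration cell lines cultured in 4 passages each.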
In a further step to demonstrate the validity of our approach, 2 additional cell lines, considered as an external validation set, were projected on the PLS model previously calibrated with the first 4 cell lines. The prediction results for CALU3 and NCIH1299 cells appear consistent with their invasiveness phenotypes, as shown in table 4.
4 Discussion
344 345
Main applications of FT-IR spectrometry in cancer science consist in demonstrating the
346
diagnostic potential of this label-free biophotonic technique. In our study, we tried to exploit
347
further the wealth of the vibrational approach by highlighting and quantifying cell
348
invasiveness phenotype by using distinct human lung cell lines.
In this work, a double EMSC preprocessing proved capable of providing a quality test and homogenizing the data set in an automated manner. Whereas a one-step EMSC is sufficient for paraffin-embedded tissue sections,31 cell samples required a more sophisticated preprocessing, using a specific way to compute the target spectrum and a drastic selection of the retained spectra; nearly 80% of the raw spectra were rejected. Here, the numerous preprocessing parameters were optimized with regard to the classification results, since preprocessing strongly affects the subsequent statistical processing of the data.
The advantage of our approach is that it makes it possible to work with several spectral images recorded from paraffin-embedded cell culture samples. As observed in Figure 2, the great variability of the raw spectra prevents data interpretation. The double EMSC can correct a large part of the parasitic variability due to the paraffin embedding and also to sample preparation. Nevertheless, some residual variability is still present. This particularly concerns the high wavenumber range, where the CH3 and CH2 IR absorption signals of paraffin and of cell material cannot be distinguished. To circumvent this problem, we had to add steps to the preprocessing protocol, such as wavenumber range selection and normalization on the amide I band. Another advantage of EMSC is the possibility of performing a quality test to eliminate outlier spectra and retain exploitable cell spectra.
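The additional steps mentioned above (range selection and amide I normalization) can be illustrated with a short sketch. It uses the values listed in Table 1 (920-1870 cm-1 range, 1350-1500 cm-1 paraffin band, amide I at 1670 cm-1). The exact normalization formula is not given in the text, so the min-max scaling below, which sets the spectrum minimum to 0 and the amide I intensity to 1, is an assumption, as are the function name `preprocess` and the toy spectrum.

```python
def preprocess(wavenumbers, absorbances,
               keep=(920.0, 1870.0), paraffin=(1350.0, 1500.0), amide_i=1670.0):
    """Select the spectral range, drop the paraffin band, scale to amide I."""
    kept = [(w, a) for w, a in zip(wavenumbers, absorbances)
            if keep[0] <= w <= keep[1] and not (paraffin[0] <= w <= paraffin[1])]
    if not kept:
        raise ValueError("no data point left after range selection")
    low = min(a for _, a in kept)
    # Intensity at the retained point closest to the amide I position.
    _, ref = min(kept, key=lambda wa: abs(wa[0] - amide_i))
    span = ref - low
    if span <= 0:
        raise ValueError("amide I intensity must exceed the spectrum minimum")
    # Assumed min-max form: minimum mapped to 0, amide I mapped to 1.
    return [(w, (a - low) / span) for w, a in kept]


# Toy spectrum (wavenumber in cm-1, absorbance), invented for illustration.
wn = [900.0, 1000.0, 1240.0, 1400.0, 1460.0, 1550.0, 1670.0, 1740.0, 2900.0]
ab = [0.05, 0.10, 0.20, 0.60, 0.90, 0.45, 0.70, 0.15, 1.20]
result = preprocess(wn, ab)
for w, a in result:
    print(w, round(a, 3))
```

In this toy case the points at 1400 and 1460 cm-1 (paraffin band) and those outside 920-1870 cm-1 are discarded before scaling, so the paraffin signal cannot drive the normalization.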
One possibility for the statistical processing would have been to use cross-validation at the image pixel level. In our study, the number of spectra per image was very high, close to ten thousand. This method was therefore not relevant, considering the high risk of overfitting.32 To avoid such a methodological bias, we opted for cross-validation at the image level, although the number of images was limited. Indeed, a total of 16 samples corresponding to 4 distinct invasive phenotypes was included to develop our approach. In addition, the PLS model of the invasiveness scale was further validated by projecting 2 additional human lung cell lines (CALU3 and NCIH1299).
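The difference between pixel-level and image-level cross-validation can be made concrete with a small sketch. The text does not specify the exact fold scheme, so the leave-one-image-out split below is an assumption; `image_level_folds` and the image labels are hypothetical names.

```python
from collections import defaultdict


def image_level_folds(image_ids):
    """Yield (held_out_image, train_indices, test_indices), one fold per image.

    image_ids[i] is the image that pixel (spectrum) i belongs to, so all
    pixels of one image always fall on the same side of the split.
    """
    by_image = defaultdict(list)
    for idx, image in enumerate(image_ids):
        by_image[image].append(idx)
    for held_out, test_idx in by_image.items():
        train_idx = [i for img, idxs in by_image.items()
                     if img != held_out for i in idxs]
        yield held_out, train_idx, test_idx


# Toy example: six pixels coming from three images.
pixel_image = ["img_A", "img_A", "img_B", "img_B", "img_B", "img_C"]
folds = list(image_level_folds(pixel_image))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

Because neighbouring pixels of one image are strongly correlated, keeping them together prevents near-duplicates of the test spectra from appearing in the calibration set, which is the overfitting risk raised above.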
The retained PLS model, which led to an RMSEP prediction error below 17%, required 10 latent variables, reflecting the fact that the useful spectral information associated with tumor invasiveness is quite subtle and is not confined to the first latent variables, which carry the main covariance between the infrared signals and the reference invasiveness scale.33 Visualizing these latent variables reveals the infrared molecular vibrations involved in the discrimination process. Figure 5, which presents the 10 PLS latent variables, highlights the discriminant wavenumbers. In particular, owing to their strong loading coefficients, the following vibrations appear to be involved in the invasiveness scoring:
- 1015 and 1117 cm-1, assigned to C-O stretching (νC-O) vibrations in carbohydrate molecules,
- 1585 cm-1, of the amide II band, specific to the protein content,
- 1618 cm-1, attributed to νC=O bonds, probably due to the β-turn secondary structure of proteins,
- 1740 cm-1, attributed to νC=O bonds in lipid compounds (triglycerides or phospholipids).
The PLS regression model was built by determining the optimal invasiveness scale. We decided to fix the origin at 1, firstly for algorithmic reasons, since predicting a null or negative value can generate a strong computational bias, and secondly because 16HBE cells present slight invasiveness characteristics. Then, many arbitrary values were tested to quantify the invasiveness of the cell lines.
Based on this reference scale, the lowest invasiveness level was overestimated, whereas the highest level was underestimated. This particularity emphasizes the difficulty of precisely predicting the samples corresponding to the extreme invasiveness degrees. The reference invasiveness values were optimized to limit this effect.
As indicated in Table 3, the invasiveness values of the cell lines were not regularly spaced; nevertheless, the corresponding regression curve is linear (Figure 4.b). This observation is consistent with the linear nature of the PLS principle and with the Beer-Lambert law, which rests on a linear relationship between the molecular concentration and the infrared absorbance.
A marked dispersion of the data was observed. Indeed, as shown in Figure 4.b, the spread of the predicted invasiveness for spectral data sharing the same reference invasiveness score is large. This is also illustrated by the standard deviation values of the PLS model (Tables 3 and 4), which clearly reflect the spectral variability within each cell culture. In our methodological approach, each infrared image corresponded to a separate culture of a cell line. Indeed, the fact that the analyzed cell samples were obtained at different passages of cell culture may be a supplementary source of variability. Furthermore, for each culture, the probed cells may be at various stages of invasiveness, which may influence the spectral signal. Thus, our approach is able to assess such variable biological features and could find valuable application in the investigation of tumor heterogeneity.
The vibrational signals associated with different invasive properties of the cells may be further applied to human tissue sections of lung cancers, particularly in pre-invasive lesions. Indeed, in these conditions, classical histology cannot detect the potential aggressiveness of such heterogeneous lesions. Thus, this original infrared spectral approach, which can be used directly on formalin-fixed paraffin-embedded tissue sections, may bring new information on the behavior of these tumor cells.
5 Conclusion and perspectives
This study, conducted on cell samples of various invasiveness phenotypes, demonstrated the possibility of establishing an invasiveness scale on the basis of the cell infrared signature. This quantitative and label-free assessment of invasiveness relies exclusively on the intrinsic molecular composition of the samples, without requiring any particular preparation and while preserving cellular integrity. In addition, within the same invasiveness level, spectral heterogeneity can be highlighted among the cell population. This demonstrates the value of the vibrational spectroscopic approach for revealing and analyzing tumor heterogeneity in future investigations in cancer science.
Further developments will concern the incorporation of feature extraction methods into the statistical processing of the data. Indeed, algorithms such as Sparse PLS, or the combination with processing by a Genetic Algorithm, appear to be an innovative way to improve the prediction model, with the advantage of simultaneously identifying the molecular vibrations involved in the discriminant process.34-36
Also, the methodology implemented and the results obtained on cellular samples will be transferred to the tissue scale, in order to take into account the influence of the complex ecosystem on the functionality of the tumor cells.
6 Acknowledgements
With financial support from ITMO Cancer AVIESAN (National Alliance for Life Sciences & Health) within the framework of the Cancer Plan. The CNRS UMR7369 MEDyC (Matrice Extracellulaire et Dynamique Cellulaire) laboratory and the technological platform PICT “Imagerie Cellulaire et Tissulaire” are gratefully acknowledged for instrumental support. INSERM (SFR CAP-Santé, University of Reims-Champagne-Ardenne) and the Pol Bouin laboratory (CHU of Reims) are also gratefully acknowledged for sample supply and scientific recommendations.
7 References
(1) Wissler, M.-P. Bilan de l’analyse du statut mutationnel EGFR de 1000 patients atteints d’adenocarcinomes pulmonaires pris en charge par la plateforme d’oncologie moleculaire du Chu-Cav de Nancy, Thesis, Université de Lorraine, Faculté de Médecine de Nancy, 2012.
(2) Bonastre, E.; Brambilla, E.; Sanchez-Cespedes, M. J. Pathol., 2016, 238, 606-616.
(3) Sato, M.; Sakurada, A.; Sagawa, M.; Minowa, M.; Takahashi, H.; Oyaizu, T.; Okada, Y.; Matsumura, Y.; Tanita, T.; Kondo, T. Lung Cancer, 2001, 32, 247-253.
(4) ©Recommandations professionnelles Cancer du poumon non à petites cellules, ed. Institut National du cancer, Boulogne-Billancourt, France, 2010 (www.e-cancer.fr).
(5) Travis, W. D.; Brambilla, E.; Noguchi, M.; Nicholson, A. G.; Geisinger, K.; Yatabe, Y.; Powell, C. A.; Beer, D.; Riely, G.; Garg, K.; Austin, J. H. M.; Rusch, V. W.; Hirsch, F. R.; Jett, J.; Yang, P.-C.; Gould, M. J. Thorac. Oncol., 2011, 6, 244-85.
(6) Ellis, D. I.; Goodacre, R. Analyst, 2006, 131, 875-885.
(7) Travo, A.; Piot, O.; Wolthuis, R.; Gobinet, C.; Manfait, M.; Bara, J.; Forgue-Lafitte, M.-E.; Jeannesson, P. Histopathology, 2010, 56, 921-931.
(8) Diem, M.; Miljkovic, M.; Bird, B.; Chernenko, T.; Schubert, J.; Marcsisin, E.; Mazur, A.; Kingston, E.; Zuser, E.; Papamarkakis, K.; Laver, N. Spectroscopy, 2012, 27, 463-496.
(9) Kumar, S.; Shabi, T. S.; Goormaghtigh, E. PLoS One, 2014, 9, 1-8.
(10) Depciuch, J.; Kaznowska, E.; Szmuc, K.; Zawlik, I.; Cholewa, M.; Heraud, P.; Cebulski, J. Infrared Phys. Techn., 2016, 76, 217-226.
(11) Austin, L. A.; Osseiran, S.; Evans, C. L. Analyst, 2016, 141, 476-503.
(12) Lasch, P.; Haensch, W.; Naumann, D.; Diem, M. BBA, Mol. Basis Dis., 2004, 1688, 176-186.
(13) Yano, K.; Ohoshima, S.; Gotou, Y.; Kumaido, K.; Moriguchi, T.; Katayama, H. Anal. Biochem., 2000, 287, 218-225.
(14) Petibois, C.; Drogat, B.; Bikfalvi, A.; Déléris, G.; Moenner, M. FEBS Lett., 2007, 581, 5469-5474.
(15) Nallala, J.; Diebold, M.-D.; Gobinet, C.; Bouché, O.; Sockalingum, G. D.; Piot, O.; Manfait, M. Analyst, 2014, 139, 4005-4015.
(16) Mu, X.; Kon, M.; Ergin, A.; Remiszewski, S.; Akalin, A.; Thompson, C. M.; Diem, M. Analyst, 2015, 140, 2449-2464.
(17) Baker, M. J.; Clarke, C.; Démoulin, D.; Nicholson, J. M.; Lyng, F. M.; Byrne, H. J.; Hart, C. A.; Brown, M. D.; Clarke, N. W.; Gardner, P. Analyst, 2010, 135, 887-894.
(18) Sabbatini, S.; Conti, C.; Rubini, C.; Librando, V.; Tosi, G.; Giorgini, E. Vib. Spectrosc., 2013, 68, 196-203.
(19) Akalin, A.; Mu, X.; Kon, M. A.; Ergin, A.; Remiszewski, S. H.; Thompson, C. M.; Raz, D. J.; Diem, M. Lab. Invest., 2015, 95, 406-421.
(20) Nawrocki-Raby, B.; Polette, M.; Gilles, C.; Clavel, C.; Strumane, K.; Matos, M.; Zahm, J. M.; Van Roy, F.; Bonnet, N.; Birembaut, P. Int. J. Cancer, 2001, 93, 644-652.
(21) Lasch, P. Chemom. Intell. Lab. Syst., 2012, 117, 100-114.
(22) Kohler, A.; Böcker, U.; Warringer, J.; Blomberg, A.; Omholt, S. W.; Stark, E.; Martens, H. Appl. Spectrosc., 2009, 63, 296-305.
(23) Wolthuis, R.; Travo, A.; Nicolet, C.; Neuville, A.; Gaub, M.-P.; Guenot, D.; Ly, E.; Manfait, M.; Jeannesson, P.; Piot, O. Anal. Chem., 2008, 80, 8461-8469.
(24) Walker, H.; Land, Jr.; William, F.; Park, J.-W.; Mathur, R.; Hotchkiss, N.; Heine, J.; Eschrich, S.; Qiao, X.; Yeatman, T. Procedia Comput. Sci., 2011, 6, 273-278.
(25) Bastien, P.; Vinzi, V. E.; Tenenhaus, M. Comput. Stat. Data Anal., 2005, 48, 17-46.
(26) Preisner, O.; Lopes, J. A.; Menezes, J. C. Chemom. Intell. Lab. Syst., 2008, 94, 33-42.
(27) Arlot, S.; Celisse, A. Stat. Surv., 2010, 4, 40-79.
(28) Cordella, C. B. Y.; Bertrand, D. Trends Anal. Chem., 2014, 54, 75-82.
(29) Roggo, Y.; Duponchel, L.; Ruckebusch, C.; Huvenne, J.-P. J. Mol. Struct., 2003, 654, 253-262.
(30) Fisher, R. A.; Yates, F. Statistical Tables for Biological, Agricultural and Medical Research. 6th Ed. Oliver & Boyd, Edinburgh and London, 1963.
(31) D’inca, H.; Namur, J.; Ghegediban, S. H.; Wassef, M.; Pascale, F.; Laurent, A.; Manfait, M. Am. J. Pathol., 2015, 185, 1877-1888.
(32) Celisse, A.; Robin, S. Comput. Stat. Data Anal., 2008, 52, 2350-2368.
(33) Gaydou, V.; Lecellier, A.; Toubas, D.; Mounier, J.; Castrec, L.; Barbier, G.; Ablain, W.; Manfait, M.; Sockalingum, G. D. Anal. Methods, 2015, 7, 766-778.
(34) Hyonho, C.; Sündüz, K. J. R. Statist. Soc., 2010, 7, 3-25.
(35) Karaman, I.; Qannari, E. M.; Martens, H.; Hedemann, M. S.; Bach Knudsen, K. E.; Kohler, A. Chemom. Intell. Lab. Syst., 2013, 122, 65-77.
(36) Grosmaire, L.; Reynès, C.; Sabatier, R. J. Soc. Fr. Statistique, 2013, 154, 80-94.
8 Tables and figures
Table 1: Glossary of tested parameters

| Step | Parameter category | Description | Parameters tested by cross-validation | Optimal parameters |
|---|---|---|---|---|
| First step: individual paraffin normalization (sample by sample) | Pretreatment 01 | Several possibilities (baseline correction, normalization, derivation, SNV…) | * | CO2 wavelength erased, smoothing (Savitzky-Golay) |
| | Paraffin 01 | Number of PCs computed during PCA on paraffin spectra | [5,10,15,20] | 5 |
| | Paraffin 01 | Maximum fitting error between spectra and the mean spectrum of the paraffin spectral image | [0.5,1,1.5,2,2.5,3,3.5,4] | 2.5 |
| | EMSC 01 | Polynomial order for baseline estimation | [1,2,3,4,5,6,7] | 5 |
| | EMSC 01 | Maximum residual after EMSC decomposition (ai parameter) | [1,1.1,1.2,1.3,1.4,1.5,1.6,1.8,2,2.5,3] | 1.3 |
| | EMSC 01 | Minimum bi parameter accepted after EMSC | [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1] | 0.9 |
| | EMSC 01 | Maximum bi parameter accepted after EMSC | [1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2] | 1.1 |
| | Target 01 | Target spectrum for EMSC | * | Mean spectrum of second quartile computed on amide signal areas (1450-1750 cm-1) |
| Second step: global paraffin normalization (all samples in one data matrix) | Paraffin 02 | Number of PCs computed during PCA on paraffin spectra | [5,10,15,20] | 5 |
| | Paraffin 02 | Maximum fitting error between spectra and the mean spectrum of the paraffin spectral image | [0.5,1,1.5,2,2.5,3,3.5,4] | 2.5 |
| | EMSC 02 | Polynomial order for baseline estimation | [1,2,3,4,5,6,7] | 5 |
| | Target 02 | Target spectrum for EMSC | * | Mean spectrum |
| Third step: final chemometrics process (modeling) | Pretreatment 02 | Several possibilities (baseline correction, normalization, derivation, SNV…) | * | Reduction of spectral range (920-1870 cm-1), erasure of paraffin signal (1350-1500 cm-1), normalization (min-max) at 1670 cm-1 (amide I signal) |
| | PLS_DA/PLS | Number of computed latent variable vectors | [1 to 30] (one by one) | 10** |
*: various non-algebraic parameter possibilities. Parameters are ordered by their chronological use. The description column describes (when possible) each tested parameter, the next column gives the range of values tested when computing the models, and the last column presents the values selected as optimal for modeling. Paraffin spectra were excluded when the computed Euclidean distance between a spectrum and the mean spectrum exceeded the value “max_paraffin_error”. The number of PCA components used is set by the “number_Paraffin_PC” parameter. The baseline was modeled by means of a polynomial function whose order is defined by the “mscOrder” parameter (“mscOrder_2” for the second EMSC). A quality test was applied to each spectral image by means of the weighted coefficient matrix, using the “mscMaxResidue”, “mscMinCoef” and “mscMaxCoef” parameters. “Target 01” was the mean of a specific part of the spectral image, described in the results section. For the second EMSC, the computed “target02” was the average of the non-homogeneous multiple-image data matrix. “max_paraffin_error_2” allows a quality test on the second interference matrix, which consists of the first “number_Paraffin_PC_2” principal components of the PCA of the multiple-image paraffin matrix.
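The quality test described in this caption can be sketched as a simple filter on the fitted EMSC coefficients. The thresholds are the optimal values of Table 1 (1.3, 0.9 and 1.1). How the residual is measured and how the coefficients are fitted is not detailed here, so the acceptance rule below is an assumption, and `passes_quality_test` and the example fits are hypothetical.

```python
def passes_quality_test(b_coef, residual,
                        msc_min_coef=0.9, msc_max_coef=1.1, msc_max_residue=1.3):
    """Accept a spectrum whose EMSC fit stays within the three thresholds.

    b_coef   : multiplicative coefficient of the target spectrum (bi)
    residual : size of the unexplained part of the fit (assumed scalar)
    """
    return msc_min_coef <= b_coef <= msc_max_coef and residual <= msc_max_residue


# Hypothetical EMSC fit results for four pixels of one image.
fits = [
    {"pixel": 0, "b": 1.02, "residual": 0.8},   # accepted
    {"pixel": 1, "b": 0.45, "residual": 0.9},   # too little cell signal
    {"pixel": 2, "b": 1.05, "residual": 2.4},   # poorly explained by the model
    {"pixel": 3, "b": 0.95, "residual": 1.1},   # accepted
]
retained = [f["pixel"] for f in fits if passes_quality_test(f["b"], f["residual"])]
print(retained)
```

A narrow acceptance window on the target coefficient, such as 0.9 to 1.1, is consistent with the drastic selection reported in the discussion, where nearly 80% of the raw spectra were rejected.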
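Table 2: Results of PLS-DA discrimination

| Cell culture type | Number of spectra | Reference group | Number of spectra predicted as group 01 | Number of spectra predicted as group 02 | Number of spectra predicted as group 03 | Percentage of Good Prediction (%) |
|---|---|---|---|---|---|---|
| 16HBE | 3346 | 01 | 3077 | 195 | 74 | 92.0 |
| BEAS-2B | 2142 | 02 | 279 | 1643 | 220 | 76.7 |
| BZR | 3428 | 03 | 24 | 190 | 3214 | 93.8 |
| BZR-T33 | 4082 | 03 | 8 | 56 | 4018 | 98.4 |
| BZR and BZR-T33 (global) | | 03 | | | | 96.3 |
| Total spectra: 12998 | | | | | | Mean: 90.2 (per cell culture); 88.3 (per reference group) |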
The first 3 columns give the cell culture name, the number of spectra (pixels) taken into account in the computation, and the chosen reference group (qualitative variable). Columns 4, 5 and 6 show the number of spectra predicted in each reference group for each cell culture. The last column summarizes these results by giving the percentage of good prediction (PGP) for each cell culture (a global PGP was added for the BZR and BZR-T33 cell cultures taken together).
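The percentages in the last column follow directly from the counts. The short sketch below recomputes them from the values of Table 2; the names `TABLE_2` and `pgp` are illustrative, and pooling the BZR and BZR-T33 counts for the global group 03 value is an assumption that happens to reproduce the reported 96.3%.

```python
# Counts copied from Table 2: (reference group, spectra predicted as group 01, 02, 03).
TABLE_2 = {
    "16HBE":   (1, [3077, 195, 74]),
    "BEAS-2B": (2, [279, 1643, 220]),
    "BZR":     (3, [24, 190, 3214]),
    "BZR-T33": (3, [8, 56, 4018]),
}


def pgp(reference_group, predicted_counts):
    """Percentage of spectra assigned to their own reference group."""
    return 100.0 * predicted_counts[reference_group - 1] / sum(predicted_counts)


per_culture = {name: pgp(ref, counts) for name, (ref, counts) in TABLE_2.items()}

# Global PGP for group 03: pool the BZR and BZR-T33 counts before dividing.
pooled_03 = [a + b for a, b in zip(TABLE_2["BZR"][1], TABLE_2["BZR-T33"][1])]
global_03 = pgp(3, pooled_03)

mean_per_culture = sum(per_culture.values()) / len(per_culture)
mean_per_group = (per_culture["16HBE"] + per_culture["BEAS-2B"] + global_03) / 3

for name, value in per_culture.items():
    print(f"{name}: {value:.1f}%")
print(f"group 03 pooled: {global_03:.1f}%")
print(f"means: {mean_per_culture:.1f}% / {mean_per_group:.1f}%")
```

The two reported means are thus unweighted averages, over the four cell cultures and over the three reference groups respectively.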
Table 3: Results of PLS regression

| Calibration cell lines | Reference invasiveness | PLS predicted invasiveness | RMSE of validation (%) | Bias | Standard deviation |
|---|---|---|---|---|---|
| 16HBE | 1 | 1.14 | 17.0 | 0.14 | 0.286 |
| BEAS-2B | 1.3 | 1.42 | 14.7 | 0.12 | 0.225 |
| BZR | 2.1 | 2.02 | 11.0 | -0.08 | 0.195 |
| BZR-T33 | 2.3 | 2.09 | 23.8 | -0.21 | 0.186 |
| Mean | | | 16.6 | -0.01 | 0.22 |
The first and second columns present, respectively, each studied cell culture and the chosen reference invasiveness. The third column presents the mean predicted invasiveness. The 4th, 5th and 6th columns show, for each cell culture, the detailed statistical results of the PLS regression. The Root Mean Square Error of Prediction expresses the mean error relative to the invasiveness scale during the validation step of the cross-validation. The bias is the mean difference between the predicted and the reference invasiveness. The last column contains the standard deviation of the prediction for each cell culture.
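As a consistency check, the bias column and the mean row can be recomputed from the other values of Table 3. The sketch below takes the bias as the mean prediction minus the reference, which reproduces the tabulated values; the name `TABLE_3` is illustrative. The RMSE values are copied as reported, since their normalization is not specified.

```python
# (reference, mean prediction, RMSE of validation in %, standard deviation) from Table 3.
TABLE_3 = {
    "16HBE":   (1.0, 1.14, 17.0, 0.286),
    "BEAS-2B": (1.3, 1.42, 14.7, 0.225),
    "BZR":     (2.1, 2.02, 11.0, 0.195),
    "BZR-T33": (2.3, 2.09, 23.8, 0.186),
}

# Bias = mean predicted invasiveness minus reference invasiveness.
bias = {name: round(pred - ref, 2) for name, (ref, pred, _, _) in TABLE_3.items()}

n = len(TABLE_3)
mean_rmse = sum(row[2] for row in TABLE_3.values()) / n
mean_bias = sum(bias.values()) / n
mean_sd = sum(row[3] for row in TABLE_3.values()) / n

print(bias)
print(round(mean_rmse, 1), round(mean_bias, 2), round(mean_sd, 2))
```

The positive bias at the low end of the scale and the negative bias at the high end correspond to the over- and underestimation of the extreme invasiveness levels noted in the discussion.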
Table 4: Results of external validation of the PLS model

| Validation cell lines | Number of IR pixels (7.3 µm²/pixel) | Mean of external prediction | Standard deviation |
|---|---|---|---|
| CALU3 | 4239 | 1.2 | 0.24 |
| NCIH1299 | 3715 | 1.7 | 0.35 |
The first and second columns indicate the cell lines used as the external validation set and the number of qualified IR pixels used for prediction, respectively. The third column presents the invasiveness score predicted by the calibrated PLS model, and the last column presents the corresponding standard deviations.
Figure 1: Preprocessing flowchart in three steps: firstly, sample collection; secondly, correction of the paraffin contribution by EMSC preprocessing; and thirdly, building of a homogeneous data bank linked to biological variables (second EMSC).
Figure 2: Evolution of the infrared spectra of paraffin-embedded bronchial squamous cell lines during the pretreatment steps. Full lines correspond to the mean of the spectral data matrix after recording and after the first and second EMSC processing steps; colored areas represent the minimum-to-maximum range of variability of these data matrices.
Figure 3: Discrimination with the PLS-DA method. a- Optimization by cross-validation of the number of PLS-DA latent variables; b- 3-dimensional discrimination volume with latent variables 1, 4 and 7. In black (blue), the 16HBE; in light grey (green), the BEAS-2B; and in dark grey (red), the BZR and BZR-T33 cell spectra.
[Figure 4 panels. a- PLS cross-validation result: RMSE (%) of calibration and validation versus number of latent variables; 10 is the optimal number of latent variables. b- Linear regression curve: predicted invasivity versus reference invasivity, y = 0.7371x + 0.428, R² = 0.7635, with 16HBE, BEAS-2B, BZR and BZR-T33 pixels.]
Figure 4: Regression with the PLS method. a- Optimization by cross-validation of the number of PLS latent variables; b- reference versus predicted invasiveness: spectral distribution on the invasiveness scale with the linear correlation curve. Circles, squares, diamonds and triangles correspond to 16HBE, BEAS-2B, BZR and BZR-T33 cell spectra, respectively.
Figure 5: PLS latent variables. The full line is the mean of the first ten selected PLS latent variables and the colored area represents the minimum-to-maximum range of variability of these PLS latent variables; the dotted line corresponds to the mean of the final spectral data matrix (the target of the second EMSC).
Table of Contents (TOC)/Abstract (ABS) Graphic