A soil-bacterium compatibility model as a decision-making tool for soil bioremediation Benjamin Horemans, Philip Breugelmans, Wouter Saeys, and Dirk Springael Environ. Sci. Technol., Just Accepted Manuscript • DOI: 10.1021/acs.est.6b04956 • Publication Date (Web): 21 Dec 2016 Downloaded from http://pubs.acs.org on December 26, 2016
Environmental Science & Technology is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Environmental Science & Technology
A soil-bacterium compatibility model as a decision-making tool for soil bioremediation
Running title: Soil-bacterium compatibility model for soil bioremediation
Authors: Benjamin Horemans1*, Philip Breugelmans1, Wouter Saeys2, Dirk Springael1

Affiliation:
1 KU Leuven, Division of Soil and Water Management, Kasteelpark Arenberg 20, 3001 Heverlee, Belgium
2 KU Leuven, Department of Biosystems, MeBioS, Kasteelpark Arenberg 30, 3001 Leuven, Belgium

* Correspondence:
Dr. Ir. Benjamin Horemans
Division of Soil and Water Management, KU Leuven, Kasteelpark Arenberg 20, 3001 Heverlee, Belgium.
Phone: +32(0)16329675; fax: +32(0)16321997
e-mail: [email protected]

Keywords: soil bioremediation, bioaugmentation, physico-chemical soil properties, survival, biodegradation activity, multivariate regression
ACS Paragon Plus Environment
Abstract
Bioremediation of soil contaminated with organic pollutants through bioaugmentation with dedicated bacteria specialized in degrading the pollutant has been suggested as a green and economically sound alternative to physico-chemical treatment. However, intrinsic soil characteristics affect the success of bioaugmentation. We examined the feasibility of using partial least squares regression (PLSR) to predict the success of bioaugmentation in contaminated soil from intrinsic physico-chemical soil characteristics, and hence to improve that success. As a proof of principle, PLSR was used to build soil-bacterium compatibility models that predict the bioaugmentation success of the phenanthrene-degrading Novosphingobium sp. LH128. The survival and biodegradation activity of strain LH128 were measured in 20 soils and correlated with the soil characteristics. PLSR predicted the strain's survival using 12 or fewer variables, while the PAH-degrading activity of strain LH128 in soils that showed survival was predicted using 9 variables. A three-step approach using the developed soil-bacterium compatibility models is proposed as a decision-making tool and first estimate to select compatible soils and organisms and to increase the chance of successful bioaugmentation.
Introduction
The development and availability of technologies for the remediation of soils contaminated by hazardous and recalcitrant organic pollutants remain an important issue in environmental technology. The number of sites in Europe with major soil pollution, including compounds such as polycyclic aromatic hydrocarbons (PAHs), pesticides and oil, is currently estimated at 2.5 million1. Soils are treated either physico-chemically or biologically, in both ex situ and in situ approaches. Biological approaches exploit either the intrinsic pollutant-degrading capacity of the soil microbial populations or, in a bioaugmentation approach, that of cultured pollutant-degrading isolates. In the latter case, the added strains often harbour catabolic gene functions that allow them to use the compound as a source of energy and/or carbon or of another essential element. Bioremediation aims to degrade the pollutants to innocuous products that are funnelled into the biogeochemical carbon cycle, and to reduce soil toxicity to acceptable levels. It has a lower environmental impact than physico-chemical treatment and is considered more cost-effective when time is not critical2-4.
Bioaugmentation is used when the intrinsic biodegradation capacity is insufficient, and relies on the successful introduction and activity of bacterial strains or consortia in a foreign contaminated environment. In that respect, compatibility between the selected strain and the contaminated environment is crucial. However, the most efficient degraders are not necessarily the organisms best adapted to the environment that needs treatment. Although several studies have reported successful bioaugmentation of soils polluted with organic compounds under both aerobic and anaerobic conditions5, its outcome remains unpredictable, and several studies have reported the failure of microbial inoculants to stimulate pollutant degradation in soils6-9. Biodegradation of organic pollutants in soil is influenced by biotic and abiotic environmental factors10. Apart from pollutant composition and concentration and the engineering approach, intrinsic abiotic soil factors such as pH, temperature, oxygen, nutrient availability, geochemistry and soil texture influence the survival and activity of the inoculum2. However, the effect of these parameters on bioaugmentation has not been quantified or classified, and will be organism-dependent11.
Higher bioremediation success could therefore be obtained if strains were selected based on a trade-off between their compatibility with the contaminated environment and their catabolic efficiency. Determining the compatibility of a set of bacterial strains for every new contaminated environment is tedious and costly. It would therefore be useful to predict the biodegradation activity of these strains from environmental variables, and such predictive models could be of major help in assessing bioaugmentation success. To study the feasibility of this approach, the survival and degradation activity of the phenanthrene-degrading soil isolate Novosphingobium sp. LH128 were examined in a wide variety of soils differing in physico-chemistry, and the strain's survival and catabolic activity were correlated with the soil physico-chemical variables using partial least squares regression (PLSR) in order to develop models that predict the strain's behavior in soil. PAHs such as phenanthrene are ubiquitous hazardous environmental soil pollutants12, while sphingomonad bacteria like Novosphingobium sp. LH128 are known degraders of multiple PAHs such as fluoranthene, pyrene and phenanthrene13-17, and of other pollutants such as pesticides, dibenzofurans, azo dyes and chlorinated phenols18-25. They are considered key players in the natural attenuation of organic pollutants and important biocatalysts in soil remediation14. To clearly examine the role of soil physico-chemistry in inoculant fate and activity, sterile soils were used. Soils were air-dried and sterilized by γ-irradiation, a sterilization method chosen to minimize physico-chemical changes26.
Materials and methods
Soils
Twenty uncontaminated soils with different physico-chemical characteristics, originating from different locations across Europe, were selected from the soil collection of the Division of Soil and Water Management at KU Leuven (Belgium). The soils, their classification and their basic properties are listed in Table 1. Major soil types were weakly developed soils such as cambisols, regosols, fluvisols and leptosols, and more developed soils such as spodosols, alfisols, luvisols and histosols. The soils represent most of the soil types found on the European continent, types that together cover about 50% of the Earth's glacier-free surface area27. Based on texture analysis and the US Department of Agriculture (USDA) textural classification, all textural classes except silt, clay loam and silty clay were included in this study (SI Figure S1). Upon arrival, soils were ground in a mortar, sieved at 2 mm and air-dried at 60 °C. The soil variables pH, texture, CaCO3 content, organic carbon (OC) content, total C content, total N content, cation exchange capacity (CEC), water holding capacity (WHC), exchangeable cations, oxalate-extractable Al, Fe and Mn contents and total metal contents were determined as described previously28. Cation concentrations in pore water extracts were determined by ICP-OES28, while anion concentrations were determined with a Dionex ICS2000 system equipped with an AG15 250 mm guard column, an AS15 2 × 250 mm analytical column and a conductivity detector (CD25) preceded by an anion self-regenerating suppressor (ASRS300, 2 mm)29. The concentration of ortho-phosphate in pore water extracts was measured colorimetrically with the malachite green method29.
Bacterial strain and culture conditions
Novosphingobium sp. LH128 was isolated from a PAH-contaminated soil and uses phenanthrene as sole source of carbon and energy14. Strain LH128 was grown on R2 agar (R2A) plates and in suspension in R2 broth (R2B), a broth version of R2A30, at 25 °C on a rotary shaker (125 rpm) until an optical density at 600 nm (OD600) of 0.4 to 0.7 was reached. Solid media contained 1.5% (w/v) agar. All materials and media were sterilized by autoclaving (121 °C, 20 min).
Soil microcosm setup and monitoring
Soil microcosms consisted of sterile 20 mL glass vials (Pyrex®) containing 1.00 ± 0.02 g of soil sterilized by γ-irradiation (dose of 29.9 kGy; Sterigenics, Belgium). Sterility of the soil samples was checked by plating on R2A: 1 g of soil was suspended in 5 mL of 10 µM MgSO4 solution in a sterile 10 mL tube and shaken head-over-end for 30 min, and after the soil particles had settled for 15 min, 50 µL of the suspension was plated in triplicate on R2A to determine CFU numbers after 7 days of incubation at 25 °C. For each soil, nine replicate microcosms were prepared. Phenanthrene was added to the soil from a 10 g L-1 stock solution in acetone at a nominal concentration of 500 mg phenanthrene kg-1 soil. After mixing by vortexing, the acetone was left to evaporate overnight at 70 °C. Sterile ultrapure water (Millipore®) was then added to obtain a moisture content corresponding to 100% of the soil water holding capacity (WHC) (Table 1). An R2B-grown culture of Novosphingobium sp. LH128 was harvested by centrifugation (7000 × g; 10 min), and the bacterial pellet was washed three times and suspended in 10 mM MgSO4 solution at an OD600 of 0.5. Viable cell numbers in the suspension were determined by plating ten-fold dilution series in triplicate on R2A and counting colonies after three days of incubation at 25 °C. Twenty µL of the bacterial suspension was added to the soil in each microcosm, resulting in 8.4 ± 2.2 × 10⁷ cells/g soil, and the tubes were tightly closed to prevent evaporation of water and phenanthrene. The amount of oxygen in the tube headspace was 4- to 5-fold higher than the amount needed to convert all 0.5 mg of phenanthrene present in the tube to CO2, and oxygen was therefore considered not limiting. Microcosms were vortexed and incubated statically at 20 °C in the dark. For two soils, n° 151 and 152, with a relatively high (4.4% OC) and a relatively low (1% OC) organic C content respectively, non-inoculated controls amended with phenanthrene were included to determine abiotic phenanthrene removal and phenanthrene extraction efficiency. Viable cell numbers and residual phenanthrene concentrations in the microcosms were determined at day 1, 10 and 20 after inoculation.
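The oxygen argument above can be checked with a short stoichiometric calculation. This is a minimal sketch, assuming complete mineralization (C14H10 + 16.5 O2 → 14 CO2 + 5 H2O), ideal-gas behaviour at 20 °C and 1 atm, and a hypothetical free headspace volume of 18 mL, since the text does not report the actual free volume; the function names are illustrative only.

```python
# Illustrative O2 budget for one closed microcosm (assumptions noted inline).

O2_PER_PHE = 16.5                  # C14H10 + 16.5 O2 -> 14 CO2 + 5 H2O
M_PHE = 14 * 12.011 + 10 * 1.008   # molar mass of phenanthrene, g/mol
R = 8.314                          # gas constant, J mol-1 K-1
X_O2_AIR = 0.2095                  # mole fraction of O2 in dry air


def o2_demand_umol(phe_mg: float) -> float:
    """µmol O2 needed to mineralize phe_mg milligrams of phenanthrene."""
    phe_umol = phe_mg * 1e-3 / M_PHE * 1e6
    return O2_PER_PHE * phe_umol


def headspace_o2_umol(headspace_ml: float, temp_c: float = 20.0,
                      pressure_pa: float = 101_325.0) -> float:
    """µmol O2 in an air-filled headspace (ideal gas law)."""
    n_air = pressure_pa * headspace_ml * 1e-6 / (R * (temp_c + 273.15))
    return X_O2_AIR * n_air * 1e6


if __name__ == "__main__":
    demand = o2_demand_umol(0.5)       # 0.5 mg phenanthrene per microcosm
    supply = headspace_o2_umol(18.0)   # hypothetical free headspace, mL
    print(f"demand {demand:.1f} µmol, supply {supply:.1f} µmol, "
          f"ratio {supply / demand:.1f}")
```

For 0.5 mg phenanthrene the demand is about 46 µmol O2 (about 1.5 mg). The supply, and hence the supply/demand ratio, scales linearly with the free headspace volume, so the ratio from this sketch is only indicative unless the measured free volume is used.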
Three replicate microcosms per soil were sacrificed at each time point to determine LH128 cell numbers and residual phenanthrene concentrations. To this end, 5 mL of 10 µM MgSO4 solution was added to each tube, followed by head-over-end shaking for 30 min. Soil particles were allowed to settle for 15 min, and 200 µL of the suspension was taken and serially diluted. Five µL of each dilution was spotted in triplicate on square R2A plates to determine CFU numbers. After 3 days of incubation at 25 °C, colonies at the two highest dilutions at which individual colonies were observed were counted and used to calculate the number of CFU/g soil dry weight as reported previously31. In all cases, all CFU showed the typical morphology and yellow color of strain LH128, and no CFU ever developed from the abiotic control tubes.

Residual phenanthrene concentrations in the soil microcosms were determined as follows. After sampling for CFU counting, the tubes were centrifuged (7000 × g; 10 min) and the supernatant was removed. The tubes were filled with 2.5 mL of cold (-20 °C) hexane:acetone (4:1) and vortexed for 4 min. After centrifugation (7000 × g; 10 min), 500 µL of the supernatant was collected in a 1.5 mL glass vial. The remaining hexane:acetone in the tube was discarded and replaced with 2.5 mL of fresh hexane:acetone (4:1). The vortexing and centrifugation procedure was repeated, and 500 µL of the supernatant was collected and pooled with the previously sampled 500 µL. Phenanthrene concentrations in the extracts were determined by HPLC (LaChrom Classic, Hitachi)31.

The phenanthrene removal extent (in %) in soil x at day i was defined as (1 − [Phe]day i / Avg [Phe]day 0,abiotic) × 100. The overall phenanthrene degradation activity APhe,x was calculated according to Eq. 1:

Eq. 1: $$A_{\mathrm{Phe},x} = \left(1 - \frac{I_{\mathrm{Phe},x}}{\mathrm{Avg}\,I_{\mathrm{Phe,abiotic}}}\right) \times 100$$

Eq. 2: $$I_{\mathrm{Phe},i} = \frac{[\mathrm{Phe}]_{\mathrm{day}0,i} + [\mathrm{Phe}]_{\mathrm{day}10,i}}{2}\,\Delta t + \frac{[\mathrm{Phe}]_{\mathrm{day}10,i} + [\mathrm{Phe}]_{\mathrm{day}20,i}}{2}\,\Delta t$$

where IPhe,x, calculated with Eq. 2, is the sum of the average residual phenanthrene concentrations in a replicate of soil x between day 0 and day 10 and between day 10 and day 20, each multiplied by the number of days (∆t = 10 days). Avg IPhe,abiotic is the average IPhe over all replicates of the abiotic controls (soils n° 151 and 152), with i a replicate and ∆t the time between two measurements, i.e., 10 days. APhe simultaneously captures the lag phase, the rate of phenanthrene removal and the final removal extent over the 20 days of incubation, and therefore serves as an indicator of overall phenanthrene removal activity in the microcosms. For instance, APhe discriminates between soils that show the same removal extent at day 20 but differ in removal extent at day 10; it was therefore used for model development.

Statistical analyses
Linear regression analysis of each of the 47 soil variables (independent variables) against CFU numbers at day 1, 10 and 20 and APhe (dependent variables) was performed using SigmaPlot v12.0. Prior to regression analysis, the triplicate values for the 20 soils were mean-centered and divided by the standard deviation to standardize the data. Principal component analysis (PCA) and partial least squares regression (PLSR) were performed using the Solo software package (Eigenvector Research Incorporated, Wenatchee, WA, USA). The soil data matrix (60 rows × 47 columns) consisted of 20 soils for which data on 47 soil variables were collected in triplicate. The biological data matrix (60 rows × 4 columns) consisted of 20 soils for which CFU numbers on day 1, 10 and 20 and APhe were determined in triplicate. Both data matrices were preprocessed by autoscaling (mean centering and dividing by the standard deviation) of each variable. During multivariate analysis, soils with Q residual or Hotelling T² values outside the 95% confidence interval were considered outliers. The calibrated models were cross-validated by leaving out each soil (triplicate data) once. PCA was used to describe the variation between the soils and was performed on the soil data matrix using either all variables (n = 47) or all variables except the pore water variables (n = 26). The number of principal components (PCs) in the PCA models was selected based on the variance captured by each PC (>2%). An unsupervised hierarchical cluster analysis was performed on the soil data using the PCA scores with Euclidean distance and Ward's agglomerative method, in which the clusters merged at each step are those that minimize the increase in within-cluster variance.
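The removal extent and the APhe index defined above (Eq. 1 and Eq. 2) can be computed with a few lines of Python. This is a minimal sketch: the function names and the example concentrations are hypothetical, and day-0, day-10 and day-20 concentrations are assumed to be in the same units (e.g. mg kg-1).

```python
from statistics import mean

DT_DAYS = 10.0  # ∆t: days between sampling points (day 0, 10 and 20)


def removal_extent(phe_day_i: float, abiotic_day0: list[float]) -> float:
    """Removal extent (%) at day i relative to the mean abiotic day-0 level."""
    return (1.0 - phe_day_i / mean(abiotic_day0)) * 100.0


def integrated_phe(c0: float, c10: float, c20: float, dt: float = DT_DAYS) -> float:
    """Eq. 2: trapezoidal area under the residual-phenanthrene curve."""
    return (c0 + c10) / 2.0 * dt + (c10 + c20) / 2.0 * dt


def a_phe(replicate: tuple[float, float, float],
          abiotic: list[tuple[float, float, float]]) -> float:
    """Eq. 1: activity (%) of one replicate relative to the abiotic controls."""
    i_abiotic = mean(integrated_phe(*r) for r in abiotic)
    return (1.0 - integrated_phe(*replicate) / i_abiotic) * 100.0


if __name__ == "__main__":
    abiotic = [(500.0, 480.0, 470.0)]       # hypothetical mg kg-1 values
    early = (500.0, 100.0, 50.0)            # most removal before day 10
    late = (500.0, 500.0, 50.0)             # lag, then removal by day 20
    print(removal_extent(50.0, [500.0]))    # both replicates reach this at day 20
    print(a_phe(early, abiotic), a_phe(late, abiotic))
```

Both hypothetical replicates reach 90% removal at day 20, yet APhe is about 61% for the early degrader and about 20% for the late one, which is the kind of discrimination the text attributes to APhe.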
PLSR models32 were built to correlate and predict LH128 survival and activity, using the soil variables as predictor variables and CFU numbers on day 1, 10 and 20 and APhe as predicted variables. The number of latent variables (LVs) was selected at a local minimum in prediction error (root mean square error of calibration (RMSE) and of cross-validation (RMSECV)), or up to the point at which the prediction error no longer improved substantially […] 0.1 are considered of importance to the PLSR model.
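The selection of the number of LVs by leave-one-soil-out cross-validation described above can be sketched as follows. This is a minimal NIPALS PLS1 implementation in NumPy, not the Solo software used in the study; the function names are hypothetical, and autoscaling is recomputed within each training fold (an assumption made here so that the held-out soil does not influence the scaling).

```python
import numpy as np


def autoscale(X_train, X_test):
    """Mean-centre and scale both sets with the training-set mean and SD."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0                       # guard against constant columns
    return (X_train - mu) / sd, (X_test - mu) / sd


def pls1_fit(X, y, n_lv):
    """PLS1 (NIPALS) on centred X (n x p) and centred y (n,); returns b (p,)."""
    X, y = X.copy(), y.copy()
    p = X.shape[1]
    W, P, q = np.zeros((p, n_lv)), np.zeros((p, n_lv)), np.zeros(n_lv)
    for a in range(n_lv):
        w = X.T @ y
        w /= np.linalg.norm(w)              # weight vector
        t = X @ w                           # scores
        tt = t @ t
        P[:, a] = X.T @ t / tt              # X loadings
        q[a] = (y @ t) / tt                 # y loading
        W[:, a] = w
        X -= np.outer(t, P[:, a])           # deflate X
        y -= q[a] * t                       # deflate y
    return W @ np.linalg.solve(P.T @ W, q)  # b = W (P'W)^-1 q


def rmsecv_by_soil(X, y, groups, max_lv):
    """Leave-one-soil-out RMSECV for 1..max_lv latent variables."""
    groups = np.asarray(groups)
    press = np.zeros(max_lv)
    for g in np.unique(groups):
        test = groups == g                  # all replicates of one soil
        Xtr, Xte = autoscale(X[~test], X[test])
        y_mu = y[~test].mean()
        for k in range(1, max_lv + 1):
            b = pls1_fit(Xtr, y[~test] - y_mu, k)
            press[k - 1] += np.sum((Xte @ b + y_mu - y[test]) ** 2)
    return np.sqrt(press / len(y))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    groups = np.repeat(np.arange(20), 3)    # 20 soils in triplicate
    X = rng.normal(size=(60, 10))           # synthetic soil variables
    y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=60)
    rmse = rmsecv_by_soil(X, y, groups, max_lv=6)
    n_lv = int(np.argmin(rmse)) + 1         # LVs at minimal RMSECV
    print(np.round(rmse, 3), n_lv)
```

Grouping by soil rather than by row matters here: leaving out single replicates would keep the other two replicates of the same soil in the training set and make RMSECV optimistic.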
Partial least squares discriminant analysis (PLSDA)34 was performed to discriminate between soils showing LH128 survival (1) or not (0) directly after inoculation (day 1). The PLSDA models used (i) all soil variables, or the soil variables retained after iPLS variable selection when developing the PLSR models predicting CFU numbers at (ii) day 1 and (iii) day 20. Two additional PLSDA models were tested with the variables retained by iPLS for predicting CFU numbers at (iv) day 1 and (v) day 20, after excluding the variables obtained by pore water analysis.
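PLSDA as used above amounts to a PLS regression on 0/1 class labels followed by a decision threshold. The NumPy sketch below assumes a threshold of 0.5 and two LVs (hypothetical choices, not taken from the study); variable sets (i) to (v) would simply be different column subsets of X. The synthetic data and function names are illustrative only.

```python
import numpy as np


def pls1_coef(X, y, n_lv):
    """PLS1 regression coefficients (NIPALS) for centred X and centred y."""
    X, y = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        tt = t @ t
        p = X.T @ t / tt
        c = (y @ t) / tt
        X -= np.outer(t, p)                 # deflate X
        y -= c * t                          # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P = np.column_stack(W), np.column_stack(P)
    return W @ np.linalg.solve(P.T @ W, np.array(q))


def plsda(X_train, labels, X_new, n_lv=2, threshold=0.5):
    """PLS-DA: PLS regression on 0/1 labels, then a threshold on y_hat."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0
    y = np.asarray(labels, dtype=float)
    b = pls1_coef((X_train - mu) / sd, y - y.mean(), n_lv)
    y_hat = ((X_new - mu) / sd) @ b + y.mean()
    return (y_hat >= threshold).astype(int), y_hat


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    labels_soil = (np.arange(20) < 10).astype(int)  # hypothetical survival (1) / none (0)
    soil = rng.normal(size=(20, 6))
    soil[:, 0] += 3.0 * (2 * labels_soil - 1)       # variable 0 separates the classes
    X = np.repeat(soil, 3, axis=0) + rng.normal(scale=0.1, size=(60, 6))
    labels = np.repeat(labels_soil, 3)
    pred, _ = plsda(X, labels, X)
    print("training accuracy:", (pred == labels).mean())
```

The example reports training accuracy only; to judge a new soil, the model should be fitted without that soil, as in the leave-one-soil-out cross-validation used for PLSR.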
Results
Soil properties
The selection of different soil types resulted in natural variation in often interdependent soil properties. The variation in soil texture explains the variation in WHC (pF 2.0), which ranged from 10 to 55%. Soil pH ranged from 3 to 8 and CEC from 1.8 to 36 cmolc/kg soil (Table 1). Organic carbon content ranged from 0.4 to 12.9% and inorganic carbon content from 0 to 5.7%, while the C to N ratio ranged from 7.0 to 30.4 (SI Figure S2). Detailed results of the elemental analysis of the soils and of the pore water chemistry are given in SI Table S1 and Table S2, respectively. PCA performed on the 20 soils, using either all variables or all variables except those related to pore water chemistry, illustrated the physico-chemical variation among the soils (SI Figure S3). When the pore water variables were excluded, PCA recognized two larger clusters of soils (cluster A with 8 soils and cluster E with 6 soils) and three smaller clusters (clusters B, C and D with 2 soils each) (Table 1). Cluster A grouped the spodosols, which are generally acidic soils with high Fe, Al and organic matter content, while cluster E grouped the alfisols, which are more alkaline soils with high Fe and Al content.
Survival and phenanthrene-degrading activity of strain LH128 in soils
CFU numbers were determined one day after inoculation (day 1) and after incubation at day 10 and day 20 (Figure 1). LH128 CFU dynamics were highly variable among the 20 tested soils. In seven soils, no LH128 CFU were detected at day 1, 10 or 20; these soils were categorized as soils with ‘no survival’. The remaining 13 soils were categorized as soils with ‘survival’. At day 1, soils Souli I (n°145) and Barcelona (n°157) showed a decrease in CFU numbers to below 10⁴ CFU/g soil, while soils Montpellier (n°147), Brécy (n°158), Guadalajara (n°159) and Cordoba (n°283) showed CFU numbers between 10⁵ and 10⁶ CFU/g soil. Except for Souli I (n°145), these soils showed an increase in CFU numbers after day 1, reaching around 10⁸ CFU/g soil at day 10 for soils Montpellier (n°147) and Brécy (n°158) and at day 20 for Barcelona (n°157), Guadalajara (n°159) and Cordoba (n°283). Soil Vault de Lugny (n°153) showed CFU numbers of around 10⁸ CFU/g soil at day 1, which decreased to 10³ CFU/g soil by day 10 and finally reached 10⁸ CFU/g soil at day 20. The remaining six soils showed CFU numbers between 8 × 10⁶ and 2 × 10⁸ CFU/g soil at day 1, which increased to between 3 × 10⁷ and 5 × 10⁹ CFU/g soil at day 10 and remained constant afterwards. No CFU were detected after 20 days in the abiotic controls.
The extent of phenanthrene removal was determined at day 10 and day 20 after inoculation (Figure 2). At day 20, the abiotic control soils showed no significant loss of phenanthrene compared to day 0 (average of 7 ± 7%). Six soils (n° 147, 149, 151, 155, 158 and 278) showed phenanthrene removal of 70% or more by day 10 (Figure 1) and had a final APhe higher than 60%. Soil n°152 showed 59 ± 6% phenanthrene removal by day 10 and 83 ± 3% by day 20, and had an APhe of around 50%. Two other soils (n°141 and 144) showed moderate phenanthrene removal of around 30% at day 10 and reached a final removal of 40%. Soil n°145 showed 50% phenanthrene removal at day 10, which did not change by day 20. Soils n° 141, 144 and 145 showed a similar APhe of around 30%. One soil (n° 156) showed moderate phenanthrene removal of 30% at day 10, which increased to 80% at day 20, resulting in a final APhe of 60%. Nine soils showed no significant phenanthrene removal at day 10. Four of these soils (n° 153, 157, 159 and 283) showed removal of 80% or more at day 20 (Figure 1), resulting in an APhe between 20 and 40%. The remaining five soils (n° 138, 139, 140, 277 and 324) showed no significant removal compared to the abiotic control by day 20 (14 ± 4%) and had an APhe of around 0%.
To test whether the CFU numbers at day 1, 10 and 20 correlated with the phenanthrene-degrading activity APhe, a linear regression of APhe on log10-transformed CFU numbers (APhe [%] = y0 + b × log10(CFU)) was performed. Soils without survival of LH128 at day 1, 10 and 20 were omitted from the regression. R² values of 0.08, 0.50 and 0.12 were obtained for day 1, 10 and 20, respectively, indicating that the correlation between log CFU and APhe is weak on day 1 and 20 and only moderate on day 10 (data not shown). A t-test on the slope b showed that b differed significantly from 0 for CFU determined on day 10 and day 20, but not for CFU determined on day 1; APhe therefore correlated significantly with CFU on day 10 and day 20 only (data not shown). APhe thus correlates mainly with CFU numbers during phenanthrene degradation (day 10 and 20) rather than with CFU numbers immediately after inoculation (day 1).
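The regression of APhe on log10(CFU), its R² and the t-test on the slope b can be reproduced in plain Python. This sketch returns the t statistic of b; the critical value for n − 2 degrees of freedom still has to be looked up, for instance with `scipy.stats.t.ppf(0.975, n - 2)` for a two-sided test at α = 0.05. The function name and the example values are hypothetical.

```python
import math


def linreg_slope_test(x, y):
    """OLS fit y = y0 + b*x; returns (y0, b, r2, t_b) with t_b = b / SE(b)."""
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need equally long x and y with at least 3 points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    y0 = my - b * mx
    sse = sum((yi - (y0 + b * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    r2 = 1.0 - sse / sst
    se_b = math.sqrt(sse / (n - 2) / sxx)   # standard error of the slope
    return y0, b, r2, b / se_b


if __name__ == "__main__":
    # Hypothetical replicate values: log10(CFU/g soil) and APhe (%)
    log_cfu = [7.2, 7.9, 8.1, 8.5, 8.8, 9.1]
    a_phe = [20.0, 35.0, 30.0, 55.0, 60.0, 58.0]
    y0, b, r2, t_b = linreg_slope_test(log_cfu, a_phe)
    print(f"APhe = {y0:.1f} + {b:.1f} x log10(CFU), R2 = {r2:.2f}, t = {t_b:.2f}")
```

A low R² can coexist with a significant slope when many points are available, which is consistent with the text reporting a significant slope on day 20 despite R² = 0.12.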
Prediction of survival of strain LH128 based on soil properties
Significant linear correlations of CFU numbers at day 1, 10 and 20 with individual soil variables were found (SI Table S3). To deal with co-variation of the soil variables, PLSR was performed using all soil variables (n = 47) to predict CFU numbers at day 1, 10 and 20 in the soils (n = 60), with a separate PLSR model for each time point. Two soils (Borris n°278 and Montpellier n°147) were identified as outliers by all three models based on high Hotelling T² (distance from the model center) and Q residual (lack-of-fit statistic) values. In the PCA of the soil data, soils n°278 and 147 were grouped in cluster A, a group that mainly included acidic soils such as spodosols and histosols (Table 1); apart from these two, none of the soils in cluster A showed survival of LH128. The number of latent variables (LVs) was selected at the minimal RMSECV and was set at six LVs for all three models, explaining around 70% of the variance in the soil variables (SI Figure S4). For the calibration set, R² values were 0.92 or higher, so CFU numbers at day 1, 10 and 20 correlated strongly with soil properties. The RMSE of calibration was higher for the PLSR model for day 10 than for the models for day 1 and 20 (SI Figure S4). However, the predictive power of all three PLSR models in cross-validation (i.e., for soils not included in model building) was low: R² values were 0.74 or lower, and RMSECV values were high compared to the RMSE of calibration for all three models, indicating that these PLSR models are not suitable for predicting CFU numbers in a new soil from its properties. This is illustrated in SI Figure S4, where the predicted CFU numbers are plotted against the measured CFU numbers for both the calibration and the cross-validation set.
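The two outlier statistics used above can be illustrated with a PCA model for simplicity (the study applied them to its PLSR models in Solo). Hotelling T² measures how far a sample lies from the model center within the space of retained components, and Q is the squared residual distance of the sample from that space. This NumPy sketch is an assumption-laden illustration: the function name is hypothetical and the 95% confidence limits used in the text are not computed.

```python
import numpy as np


def t2_and_q(X, n_pc):
    """Hotelling T^2 and Q residual of every row of X for an n_pc-component PCA."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # autoscale
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    T = U[:, :n_pc] * s[:n_pc]              # scores on the retained PCs
    P = Vt[:n_pc].T                         # loadings
    score_var = s[:n_pc] ** 2 / (X.shape[0] - 1)
    t2 = np.sum(T ** 2 / score_var, axis=1)  # distance to centre within the model
    E = Xs - T @ P.T                        # part of each row the PCs miss
    q = np.sum(E ** 2, axis=1)              # lack of fit (squared residual)
    return t2, q


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    X = rng.normal(size=(30, 5))
    X[-1] += 4.0                            # one hypothetical atypical sample
    t2, q = t2_and_q(X, n_pc=2)
    print(int(np.argmax(t2)), int(np.argmax(q)))
```

A sample can be extreme in T² (unusual but well described by the model), in Q (poorly described by the model), or both, which is why the two statistics are inspected together.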
As not all soil properties may be equally informative, and some might even be detrimental for CFU prediction, forward interval PLS (iPLS) variable selection was used to search for the most informative soil properties. iPLS selected 9, 6 and 12 soil variables for the PLSR models predicting CFU numbers at day 1, 10 and 20, respectively. This reduced the RMSECV compared to the PLSR models predicting CFU numbers from all soil variables (Figure 2), while the RMSE of calibration remained similar with or without iPLS selection. The predicted CFU numbers are plotted against the measured ones in Figure 2. After iPLS variable selection, high correlations between predicted and measured CFU numbers were found for both calibration and cross-validation. The variables selected for the PLSR models predicting CFU numbers at day 1, 10 and 20 are shown in Table 2 together with their RCV. The RCV of a soil variable indicates the direction and magnitude of the change in the predicted value when that soil variable changes. For the PLSR model predicting LH128 CFU numbers at day 1, nine soil variables were retained after iPLS, six of which had a high |RCV| (pH, exchangeable Mg, oxalate-extractable Fe and Al, total Cr and sand fraction) (Table 2). For prediction of LH128 CFU numbers at day 10, eight soil variables were retained after iPLS, two of which had a negative RCV below -0.2 (oxalate-extractable Al and total Pb) (Table 2). For prediction of LH128 CFU numbers at day 20, 12 soil variables were retained after iPLS, three of which had a high |RCV| (pH, oxalate-extractable Fe and total Ni) (Table 2). The soil variables pH, sand fraction, exchangeable Mg, oxalate-extractable Fe, total Cr and inorganic P were retained by iPLS in at least two of the three PLSR models predicting CFU numbers (Table 2).
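Forward iPLS adds, one at a time, the variable interval that most lowers the cross-validation error. The sketch below is a simplified version, not the Solo implementation: it assumes intervals of a single variable, a fixed number of LVs, leave-one-soil-out cross-validation and a stopping rule of "stop when RMSECV no longer decreases"; all of these choices and the function names are hypothetical.

```python
import numpy as np


def pls1_coef(X, y, n_lv):
    """PLS1 regression coefficients (NIPALS) for centred X and centred y."""
    X, y = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        tt = t @ t
        p = X.T @ t / tt
        c = (y @ t) / tt
        X -= np.outer(t, p)                 # deflate X
        y -= c * t                          # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P = np.column_stack(W), np.column_stack(P)
    return W @ np.linalg.solve(P.T @ W, np.array(q))


def loso_rmsecv(X, y, groups, n_lv):
    """Leave-one-soil-out RMSECV of a PLS1 model (autoscaled within each fold)."""
    groups = np.asarray(groups)
    k = min(n_lv, X.shape[1])               # no more LVs than variables
    press = 0.0
    for g in np.unique(groups):
        te = groups == g
        mu = X[~te].mean(axis=0)
        sd = X[~te].std(axis=0, ddof=1)
        sd[sd == 0] = 1.0
        y_mu = y[~te].mean()
        b = pls1_coef((X[~te] - mu) / sd, y[~te] - y_mu, k)
        press += np.sum((((X[te] - mu) / sd) @ b + y_mu - y[te]) ** 2)
    return float(np.sqrt(press / len(y)))


def forward_select(X, y, groups, n_lv=2):
    """Greedy forward selection of single-variable intervals by RMSECV."""
    groups = np.asarray(groups)
    selected, best = [], np.inf
    remaining = list(range(X.shape[1]))
    while remaining:
        err, j = min((loso_rmsecv(X[:, selected + [c]], y, groups, n_lv), c)
                     for c in remaining)
        if err >= best:                     # no improvement: stop adding
            break
        best = err
        selected.append(j)
        remaining.remove(j)
    return selected, best


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    groups = np.repeat(np.arange(20), 3)    # 20 soils in triplicate
    X = rng.normal(size=(60, 8))            # synthetic soil variables
    y = 3.0 * X[:, 2] - 2.0 * X[:, 5] + rng.normal(scale=0.1, size=60)
    print(forward_select(X, y, groups))
```

Because the search is greedy, the selected set depends on the order in which informative variables emerge; this is one reason different time points can retain partly different variables.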
We also examined whether the soil characteristics could be used to predict whether LH128 would survive. To this end, PLSDA was performed to discriminate between soils with LH128 CFU at day 20, and hence LH128 survival (value 1), and soils with no CFU and hence no survival at day 20 (value 0), using either all soil variables or the variables selected by iPLS during PLSR model building for CFU numbers at day 1 and day 20. In addition, pore water variables were omitted to see whether PLSDA could still predict survival using only data routinely available from soil databases. The performance of all five PLSDA models is shown in SI Table S4. All PLSDA models classified most soils correctly (SI Table S4), the exceptions being soils Montpellier (n° 147) and Borris (n° 278), which had previously been identified as outliers in the PLSR models predicting CFU numbers at day 1, 10 and 20. Soil Borris was correctly classified only by the PLSDA model that used the variables selected by iPLS for the PLSR model for CFU numbers at day 20, with pore water data excluded (SI Table S5).
Prediction of phenanthrene degrading activity of strain LH128 based on soil properties
322
The phenanthrene-degrading activity (APhe) showed significant linear correlations with several
individual soil variables (SI Table S3). Soils with similar APhe did not correspond to the clusters
of soils with similar soil properties identified by PCA. To account for co-variation of the soil
variables, PLSR was performed using all soil variables (n = 47) to predict APhe. Including in the
PLSR analysis the soils in which LH128 did not survive until day 1 resulted in poor prediction
(R² = 0.15). A plausible explanation is that PLSR models assume a linear relation between soil
parameters and APhe. This assumption does not hold for soils without survival at day 1: APhe is
zero in all of them, even though some of these soils may have properties with a more negative
impact on LH128 than others. Therefore, an alternative PLSR model for predicting APhe was built
using only the soils in which LH128 survived on day 1 (13 soils). As before, soils Montpellier and
Borris were identified as outliers based on their Hotelling T² and Q residuals. The number of
latent variables (LVs) was chosen at the minimal RMSE for the calibration and cross-validation
sets and was set at five LVs, explaining around 70% of the variance in the soil variables and 95%
of the variance in APhe (SI Figure S4). However, the R² for the cross-validation set was only 0.18.
The poor predictive performance of this PLSR model for APhe is further illustrated in SI Figure S4,
where the predicted APhe is plotted against the measured APhe. Because some variables might
impair prediction performance, as was the case for predicting survival, variables were selected
by means of iPLS. This resulted in a similar RMSE of 4% for both calibration and cross-validation,
with R² values of 0.96 and 0.95, respectively. The lower prediction error after iPLS variable
selection is illustrated in the predicted versus measured plot (Figure 2). The average measured
APhe values are ranked from high to low in SI Table S5.
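Forward iPLS, used above to select the variables for the APhe model, can be sketched as a greedy search: start from an empty model and repeatedly add the variable interval that most lowers the cross-validated RMSE, re-choosing the number of LVs at the minimal RMSE for every candidate subset. This is a minimal sketch under stated assumptions: leave-one-out cross-validation, one variable per interval, and stopping as soon as no interval lowers the RMSE. The authors' interval definitions, cross-validation scheme and stopping rule are not given here, and all function names (`rmsecv`, `best_n_lv`, `forward_ipls`) and data are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_lv):
    """NIPALS PLS1 on mean-centred data; returns means and regression coefficients."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = E.T @ f
        w = w / np.linalg.norm(w)
        t = E @ w
        tt = t @ t
        p = E.T @ t / tt
        qa = (f @ t) / tt
        E = E - np.outer(t, p)
        f = f - qa * t
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P = np.array(W).T, np.array(P).T
    coef = W @ np.linalg.solve(P.T @ W, np.array(q))
    return {"x_mean": x_mean, "y_mean": y_mean, "coef": coef}


def pls1_predict(model, X):
    return (X - model["x_mean"]) @ model["coef"] + model["y_mean"]


def rmsecv(X, y, n_lv):
    """Leave-one-out RMSE of cross-validation; autoscaling is refitted in every fold."""
    n = len(y)
    residuals = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i
        mu, sd = X[train].mean(axis=0), X[train].std(axis=0, ddof=1)
        model = pls1_fit((X[train] - mu) / sd, y[train], n_lv)
        residuals[i] = pls1_predict(model, (X[i:i + 1] - mu) / sd)[0] - y[i]
    return float(np.sqrt(np.mean(residuals ** 2)))


def best_n_lv(X, y, max_lv):
    """Number of LVs (1..max_lv, capped at the column count) with the lowest RMSECV."""
    scores = {a: rmsecv(X, y, a) for a in range(1, min(max_lv, X.shape[1]) + 1)}
    a = min(scores, key=scores.get)
    return a, scores[a]


def forward_ipls(X, y, intervals, max_lv=5):
    """Greedily add the interval that lowers RMSECV most; stop when none helps."""
    selected, best_lv, best_rmse = [], 0, np.inf
    remaining = list(range(len(intervals)))
    while remaining:
        trials = {}
        for k in remaining:
            cols = [c for j in selected + [k] for c in intervals[j]]
            trials[k] = best_n_lv(X[:, cols], y, max_lv)
        k_best = min(trials, key=lambda k: trials[k][1])
        if trials[k_best][1] >= best_rmse:
            break  # no remaining interval improves the model
        best_lv, best_rmse = trials[k_best]
        selected.append(k_best)
        remaining.remove(k_best)
    return selected, best_lv, best_rmse


# Synthetic data: 30 hypothetical soils x 6 variables; only variables 1 and 4 matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 6))
y = 3.0 * X[:, 1] - 1.0 * X[:, 4] + 0.1 * rng.normal(size=30)
intervals = [[j] for j in range(6)]  # one variable per interval
selected, n_lv, rmse = forward_ipls(X, y, intervals, max_lv=3)
print(selected, n_lv, round(rmse, 3))
```

Wider intervals (several columns per entry in `intervals`) work unchanged; with only 13 soils, as here, leave-one-out is a natural cross-validation choice, but that too is an assumption.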
The nine variables retained after iPLS variable selection are summarized in Table 2. Of these
variables, oxalate-extractable Al had a strongly negative RCV (< -0.1), whereas WHC, total Ni
and pore water Cr had strongly positive RCVs (> 0.1) (Table 2).
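Both the outlier screening (Hotelling T² and Q residuals) and the RCV thresholds of ±0.1 used above are derived from a fitted PLSR model. The sketch below computes them from a NIPALS PLS1 model on autoscaled X and y. It assumes that RCVs are the regression coefficients of such an autoscaled model, and it reports T² and Q without the F- or χ²-based confidence limits that chemometrics software usually adds; the function name `pls_diagnostics`, the variable names, the planted outlier and the synthetic data are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_lv):
    """NIPALS PLS1 on mean-centred data; keeps scores T, final X residuals E and coefficients."""
    E, f = X - X.mean(axis=0), y - y.mean()
    W, P, T, q = [], [], [], []
    for _ in range(n_lv):
        w = E.T @ f
        w = w / np.linalg.norm(w)
        t = E @ w
        tt = t @ t
        p = E.T @ t / tt
        qa = (f @ t) / tt
        E = E - np.outer(t, p)
        f = f - qa * t
        W.append(w)
        P.append(p)
        T.append(t)
        q.append(qa)
    W, P, T = np.array(W).T, np.array(P).T, np.array(T).T
    coef = W @ np.linalg.solve(P.T @ W, np.array(q))
    return {"T": T, "E": E, "coef": coef}


def autoscale(A):
    return (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)


def pls_diagnostics(X, y, n_lv, names, rcv_threshold=0.1):
    """Hotelling T², Q residuals and RCV sign flags for a PLSR model on autoscaled X and y."""
    model = pls1_fit(autoscale(X), autoscale(y), n_lv)
    T = model["T"]
    # T²: squared scores, each LV divided by the sample variance of its scores
    t2 = np.sum(T ** 2 / T.var(axis=0, ddof=1), axis=1)
    # Q: squared X residuals left after n_lv components
    q_res = np.sum(model["E"] ** 2, axis=1)
    flags = {}
    for name, b in zip(names, model["coef"]):
        if b > rcv_threshold:
            flags[name] = "high positive"
        elif b < -rcv_threshold:
            flags[name] = "high negative"
        else:
            flags[name] = "small"
    return t2, q_res, flags


# Synthetic data: 50 hypothetical soils; y rises with var_a and falls with var_b.
# Soil 0 gets an extreme var_d value that is unrelated to y.
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
X[0, 3] = 8.0
y = X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=50)
names = ["var_a", "var_b", "var_c", "var_d"]

t2, q_res, flags = pls_diagnostics(X, y, n_lv=1, names=names)
print("largest Q residual: soil", int(np.argmax(q_res)))
print(flags)
```

A sample far from the others within the model plane shows up in T², whereas a sample with an unusual combination of variables that the LVs do not describe, like soil 0 here, shows up in Q.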
Discussion
In this study, we show that it is feasible to develop soil-strain compatibility models that predict
the behavior and success (survival and activity) of a bioaugmentation inoculum in the treatment
of contaminated soil from intrinsic physico-chemical soil properties, using as a proof of
principle the phenanthrene-degrading Novosphingobium sp. LH128 with phenanthrene as the
target pollutant. Multivariate regression analysis was used to build the models because many of
the soil variables are interdependent and co-vary; e.g., clay fraction correlates with WHC and
CEC. Other studies have investigated effects of soil variables on pollutant degradation, but by
changing the conditions in a single soil or sediment35. To the best of our knowledge, our study is
the first to use multivariate regression analysis to assess and predict organism survival and
activity from intrinsic physico-chemical soil characteristics using data obtained from multiple
soils. To reduce complexity and to estimate the effect of intrinsic physico-chemical factors
unambiguously, sterilized soils were used. As a consequence, biotic factors were not considered,
although they might affect survival and activity of inoculants and hence bioaugmentation
success. For instance, biotic factors such as protozoan grazing36 and competition with the
native microbial community for nutrients and space can negatively influence inoculant
survival or activity37. Conversely, resident bacteria have been shown to improve degradation,
for instance by increasing pollutant bioavailability38, delivering co-factors39 or metabolic
cooperation40. It remains to be seen whether biotic factors affect the predictions and whether
the physico-chemical models need to be extended with biotic factors.
368
Moreover, some variables which might differ between polluted soils and affect the model
369
were kept constant to reduce complexity such as parameters related to the pollution itself
370
like the phenanthrene concentration and the presence of co-pollutants41. Moreover,
371
inoculum size and mode of inoculation as well as soil management might affect survival and
372
activity of the inoculum. Chen et al.35 showed that in addition to salinity, inoculum size
373
determined phenanthrene degradation by a Novosphingobium sp. strain in mangrove
374
sediments while effects of phenanthrene concentrations, nutrient addition and
375
temperatures were insignificant. The optimal inoculum size was 106 cells/g sediment but
376
higher inoculum sizes were not tested. The inoculum size of around 108 cells/g soil used in
377
our study resembles frequently used numbers of inocula in bioaugmentation studies5.
Finally, aging effects were not included in this study. Aging effects are limited in short-term
experiments of less than a month42, but in situ phenanthrene removal rates have been shown to be
affected by aging of phenanthrene in soil43, so aging should be considered in the long term. For
these reasons, the model predictions should be used as a first estimate.
Two types of models that successfully predicted survival of LH128 were optimized. The first
type, based on PLSR and improved by forward iPLS variable selection, predicts LH128 CFU
numbers at days 1, 10 and 20 after inoculation from a few soil variables. The iPLS-based
approach not only ensured the predictive power of the model but also reduces the number of
analytical techniques needed to deliver the input data (from 10 to 7), which saves costs.
Moreover, most of the variables selected by iPLS are basic soil characteristics that are often
included in routine physico-chemical soil characterization and are hence available in databases.
The second model type, based on PLSDA, predicts whether or not LH128 will survive in a soil.
Although this model provides less detail on the number of LH128 cells expected to survive in the
soil, it reduces the number of analytical techniques even further, to five, and no longer requires
pore water analysis. In addition, a model was developed that predicts the crucial PAH-degrading
activity of LH128. This model is built only on soils in which the inoculum survives and can
therefore only be applied after one of the two survival prediction models. As for survival, the
PAH-degrading activity of LH128, expressed as APhe, could be predicted well from the soil
properties by PLSR after iPLS selection of the most informative variables; the resulting model
uses nine soil variables that can be determined with only six analytical techniques. These six
techniques include pore water analysis, which therefore cannot be omitted when predicting the
final success of bioaugmentation with LH128 in a particular soil. However, pore water analysis
can be restricted to those soils in which the survival model(s) predict that LH128 will probably
survive.
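The recommended order of use (a survival model on routine soil variables first, then pore water analysis and the APhe model only for soils predicted to support survival) can be written as a small screening routine. This is a sketch of the workflow only: the three callables are hypothetical stand-ins for the fitted survival model, the laboratory pore water analysis and the fitted APhe model, and the toy rules in the usage example are illustrative, not the published models.

```python
from typing import Callable, Dict, Optional

Soil = Dict[str, float]


def screen_soils(
    soils: Dict[str, Soil],
    predicts_survival: Callable[[Soil], bool],
    measure_pore_water: Callable[[str], Soil],
    predict_aphe: Callable[[Soil], float],
) -> Dict[str, Optional[float]]:
    """Two-stage screening: survival model first, APhe model only where survival is predicted.

    `soils` maps a soil identifier to its routine (non-pore-water) variables.
    Pore water is analysed only for soils that pass the survival stage.
    """
    results: Dict[str, Optional[float]] = {}
    for soil_id, routine in soils.items():
        if not predicts_survival(routine):
            results[soil_id] = None  # no survival expected: APhe model not applicable
            continue
        full = {**routine, **measure_pore_water(soil_id)}
        results[soil_id] = predict_aphe(full)
    return results


# Hypothetical stand-ins for illustration only, NOT the published models.
analysed = []


def toy_survival(s: Soil) -> bool:
    return s["routine_var"] > 0.0


def toy_pore_water(soil_id: str) -> Soil:
    analysed.append(soil_id)  # record which soils needed pore water analysis
    return {"pore_var": 2.0}


def toy_aphe(s: Soil) -> float:
    return 10.0 + s["routine_var"] + s["pore_var"]


soils = {"soil_1": {"routine_var": 1.5}, "soil_2": {"routine_var": -0.5}}
print(screen_soils(soils, toy_survival, toy_pore_water, toy_aphe))  # {'soil_1': 13.5, 'soil_2': None}
print(analysed)  # ['soil_1']
```

Returning `None` rather than an APhe of zero for non-surviving soils keeps them out of the linear activity model, in line with the reason given above for fitting that model only on soils with survival.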
The models developed in this study not only predict the fate and activity of the inoculum strain
but also provide information about soil characteristics that are crucial for bioaugmentation with
LH128. In addition, the regression coefficient values of the PLSR model make it possible to
identify whether a soil property positively or negatively affects the survival and
pollutant-degrading activity of the introduced organism in soil. Caution, however, should be
taken when inferring causal relationships between the soil characteristics and the observed
effects on LH128 survival and activity. These effects can be indirect, or caused simultaneously
by more than one soil characteristic. For instance, decreasing soil pH negatively affected
LH128 survival, but low pH also increases the availability of toxic cations like Al³⁺ and Ni²⁺ 44, 45
survival and oxalate-extractable Al, which represents the amorphous Al in soil that is considered
to be the 'active' form of Al in soil and becomes entirely soluble at low pH (