Soil-Bacterium Compatibility Model as a Decision-Making Tool for Soil Bioremediation


Article

A soil-bacterium compatibility model as a decision-making tool for soil bioremediation

Benjamin Horemans, Philip Breugelmans, Wouter Saeys, and Dirk Springael

Environ. Sci. Technol., Just Accepted Manuscript • DOI: 10.1021/acs.est.6b04956 • Publication Date (Web): 21 Dec 2016

Downloaded from http://pubs.acs.org on December 26, 2016

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Environmental Science & Technology is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 32

Environmental Science & Technology

A soil-bacterium compatibility model as a decision-making tool for soil bioremediation

Running title: Soil-bacterium compatibility model for soil bioremediation

Authors: Benjamin Horemans1*, Philip Breugelmans1, Wouter Saeys2, Dirk Springael1

Affiliation:

1 KU Leuven, Division of Soil and Water Management, Kasteelpark Arenberg 20, 3001 Heverlee, Belgium

2 KU Leuven, Department of Biosystems, MeBioS, Kasteelpark Arenberg 30, 3001 Leuven, Belgium

* Correspondence:

Dr. Ir. Benjamin Horemans

Division of Soil and Water Management, KU Leuven, Kasteelpark Arenberg 20, 3001 Heverlee, Belgium.

Phone: +32(0)16329675; fax: +32(0)16321997

e-mail: [email protected]

Keywords: soil bioremediation, bioaugmentation, physico-chemical soil properties, survival, biodegradation activity, multivariate regression

ACS Paragon Plus Environment


Abstract

Bioremediation of soil contaminated with organic pollutants, involving bioaugmentation with dedicated bacteria specialized in degrading the pollutant, has been suggested as a green and economically sound alternative to physico-chemical treatment. However, intrinsic soil characteristics affect the success of bioaugmentation. The feasibility of using partial least squares regression (PLSR) to predict the success of bioaugmentation in contaminated soil from intrinsic physico-chemical soil characteristics, and hence to improve that success, was examined. As a proof of principle, PLSR was used to build soil-bacterium compatibility models that predict the bioaugmentation success of the phenanthrene-degrading Novosphingobium sp. LH128. The survival and biodegradation activity of strain LH128 were measured in 20 soils and correlated with the soil characteristics. PLSR predicted the strain's survival using 12 variables or fewer, while the PAH-degrading activity of strain LH128 in soils that showed survival was predicted using 9 variables. A three-step approach using the developed soil-bacterium compatibility models is proposed as a decision-making tool and first estimate for selecting compatible soils and organisms, increasing the chance of successful bioaugmentation.


Introduction

The development and availability of technologies for the remediation of soils contaminated by hazardous and recalcitrant organic pollutants remain an important issue in environmental technology. In Europe, the number of sites with major soil pollution, involving compounds such as polycyclic aromatic hydrocarbons (PAHs), pesticides and oil, is currently estimated at 2.5 million1. Contaminated soils are treated either physico-chemically or biologically, using both ex situ and in situ approaches. Biological approaches use the intrinsic pollutant-degrading capacity of the soil microbial populations or, in bioaugmentation, that of cultured pollutant-degrading isolates. In the latter case, the added strains often harbour catabolic gene functions that allow them to use the compound as a source of energy and/or carbon or of another essential element. Bioremediation aims to degrade the pollutants into innocuous products that are funnelled into the biogeochemical carbon cycle, and to reduce soil toxicity to acceptable levels. Bioremediation has a lower environmental impact than physico-chemical treatment and is considered more cost-effective when time is not critical2-4.

Bioaugmentation is used when the intrinsic biodegradation capacity is insufficient. It relies on the successful introduction of bacterial strains or consortia into a foreign contaminated environment and on their activity there. In that respect, compatibility between the selected strain and the contaminated environment is crucial. However, the most efficient degraders are not necessarily the organisms best adapted to the environment that needs treatment. Although several studies have reported the successful application of bioaugmentation to the bioremediation of soils polluted with organic compounds under both aerobic and anaerobic conditions5, its outcome remains unpredictable, and other studies have reported the failure of microbial inoculants to stimulate pollutant degradation in soils6-9. Biodegradation of organic pollutants in soil is influenced by biotic and abiotic environmental factors10. Apart from the pollutant composition and concentration and the engineering approach, intrinsic abiotic soil factors such as pH, temperature, oxygen and nutrient availability, geochemistry and soil texture influence the survival and activity of the inoculum2. However, the effect of these parameters on bioaugmentation has not been quantified or classified and is expected to be organism-dependent11. Bioremediation could therefore be more successful if strains were selected based on a trade-off between their compatibility with the contaminated environment and their catabolic efficiency. Determining experimentally the compatibility of a set of bacterial strains with every new contaminated environment is tedious and costly. It would therefore be useful to predict the biodegradation activity of these strains from environmental variables; such predictive models could greatly help in assessing the likely success of bioaugmentation. To study the feasibility of such an approach, the survival and degradation activity of the phenanthrene-degrading soil isolate Novosphingobium sp. LH128 were examined in a wide variety of soils differing in physico-chemistry, and the strain's survival and catabolic activity were correlated with the soil physico-chemical variables using partial least squares regression (PLSR) to develop models that predict the strain's behavior in soil. PAHs such as phenanthrene are ubiquitous, hazardous environmental soil pollutants12, while sphingomonads such as Novosphingobium sp. LH128 are known to degrade multiple PAHs, including fluoranthene, pyrene and phenanthrene13-17, as well as other pollutants such as pesticides, dibenzofurans, azo dyes and chlorinated phenols18-25. They are considered key players in the natural attenuation of organic pollutants and important biocatalysts in soil remediation14. To isolate the effect of soil physico-chemistry on inoculant fate and activity, sterile soils were used. Soils were air-dried and sterilized by γ-irradiation to minimize physico-chemical changes26.


Materials and methods

Soils

Twenty uncontaminated soils with different physico-chemical characteristics, originating from different geographical locations across Europe, were selected from the soil collection of the Division of Soil and Water Management at KU Leuven (Belgium). The soils, their classification and their basic properties are listed in Table 1. The major soil types included weakly developed soils such as cambisols, regosols, fluvisols and leptosols, and more developed soils such as spodosols, alfisols, luvisols and histosols. The soils represent most of the soil types found on the European continent, types that together cover about 50% of the Earth's glacier-free surface area27. Based on texture analysis and the US Department of Agriculture (USDA) soil texture classification, all texture classes except silt, clay loam and silt clay were included in this study (SI Figure S1). Upon arrival, soils were ground in a mortar, sieved at 2 mm and air-dried at 60°C. The soil variables pH, texture, CaCO3 content, organic carbon (OC) content, total C content, total N content, cation exchange capacity (CEC), water holding capacity (WHC), exchangeable cations, oxalate-extractable Al, Fe and Mn contents, and total metal contents were determined as described28. Cation concentrations in pore water extracts were determined by ICP-OES28, while anion concentrations were determined with a Dionex ICS2000 system equipped with an AG15 250 mm guard column, an AS15 2 × 250 mm analytical column, and a conductivity detector (CD25) preceded by an anion self-regenerating suppressor (ASRS300, 2 mm)29. The concentration of ortho-phosphate in pore water extracts was measured colorimetrically with the malachite green method29.

Bacterial strain and culture conditions

Novosphingobium sp. LH128 was isolated from a PAH-contaminated soil and uses phenanthrene as its sole source of carbon and energy14. Strain LH128 was grown on R2 agar (R2A) plates and cultured in suspension in R2 broth (R2B), a broth version of R2A30, at 25°C on a rotary shaker (125 rpm) until an optical density at 600 nm (OD600) of 0.4 to 0.7 was reached. Solid media contained 1.5% (w/v) agar. All materials and media were sterilized by autoclaving (121°C, 20 min).

Soil microcosm setup and monitoring

Soil microcosms consisted of sterile 20 mL glass vials (Pyrex®) containing 1.00 ± 0.02 g of soil sterilized by γ-ray irradiation (dose of 29.9 kGy; Sterigenics, Belgium). Sterility of the soil samples was checked by plating on R2A: 1 g of soil was suspended in 5 mL of 10 µM MgSO4 solution in sterile 10 mL tubes and shaken head-over-end for 30 min, and after the soil particles had settled for 15 min, 50 µL of the suspension was plated in triplicate on R2A to determine CFU numbers after 7 days of incubation at 25°C. For each soil, nine replicate microcosms were prepared. Phenanthrene was added to the soil from a 10 g L-1 stock solution in acetone to a nominal concentration of 500 mg phenanthrene kg-1 soil. After mixing by vortexing, the acetone was left to evaporate overnight at 70°C. Sterile ultrapure water (Millipore®) was added to obtain a moisture content corresponding to 100% of the soil water holding capacity (WHC) (Table 1). An R2B-grown culture of Novosphingobium sp. LH128 was harvested by centrifugation (7000 × g; 10 min). The bacterial pellet was washed three times and suspended in 10 mM MgSO4 solution at an OD600 of 0.5. Viable cell numbers in the suspension were determined by plating ten-fold dilution series in triplicate on R2A and counting the colonies after three days of incubation at 25°C. Twenty µL of the bacterial suspension was added to the soil in each microcosm, resulting in (8.4 ± 2.2) × 10^7 cells/g soil, and the vials were tightly closed to prevent evaporation of water and phenanthrene. The amount of oxygen in the vial headspace was 4- to 5-fold higher than the amount needed to convert all 0.5 mg of phenanthrene present in the vial to CO2 and was therefore considered not limiting. Microcosms were vortexed and incubated statically at 20°C in the dark. For two soils, i.e., soils n° 151 and 152, representing soils with relatively high (4.4% OC) and low (1% OC) organic C content, uninoculated controls amended with phenanthrene were included to determine abiotic phenanthrene removal and phenanthrene extraction efficiency. Viable cell numbers and residual phenanthrene concentrations in the microcosms were determined at day 1, 10 and 20 after inoculation.
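The spiking and oxygen figures above follow from simple stoichiometry. The Python sketch below recomputes the stock volume needed for the nominal 500 mg kg-1 dose in 1.00 g of soil and the O2 required to mineralize the resulting 0.5 mg of phenanthrene. It assumes complete oxidation (C14H10 + 16.5 O2 → 14 CO2 + 5 H2O) and standard atomic masses. The function names are hypothetical. The headspace volume, which would be needed to check the reported 4- to 5-fold O2 excess, is not given in the text and is not modelled.

```python
# Illustrative stoichiometry for one microcosm (function names are hypothetical).
M_C, M_H, M_O2 = 12.011, 1.008, 31.998      # g/mol, standard atomic/molecular masses
M_PHE = 14 * M_C + 10 * M_H                 # phenanthrene C14H10, about 178.23 g/mol
O2_PER_PHE = 16.5                           # C14H10 + 16.5 O2 -> 14 CO2 + 5 H2O


def spike_volume_ul(target_mg_per_kg, soil_g, stock_g_per_l):
    """Volume of acetone stock (µL) that delivers the nominal dose to soil_g of soil."""
    phe_mg = target_mg_per_kg * soil_g / 1000.0   # mg phenanthrene added to the vial
    return phe_mg / stock_g_per_l * 1000.0        # g/L equals mg/mL; convert mL to µL


def o2_demand(phe_mg):
    """O2 needed to mineralize phe_mg of phenanthrene completely, as (µmol, mg)."""
    phe_umol = phe_mg / M_PHE * 1000.0
    o2_umol = O2_PER_PHE * phe_umol
    return o2_umol, o2_umol * M_O2 / 1000.0


if __name__ == "__main__":
    vol = spike_volume_ul(500.0, 1.00, 10.0)
    o2_umol, o2_mg = o2_demand(0.5)
    print(f"stock volume: {vol:.1f} µL")                          # stock volume: 50.0 µL
    print(f"O2 demand: {o2_umol:.1f} µmol = {o2_mg:.2f} mg")      # O2 demand: 46.3 µmol = 1.48 mg
```

Under these assumptions, 50 µL of stock delivers the 0.5 mg dose, and full mineralization consumes roughly 46 µmol (about 1.5 mg) of O2.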

At each time point, three replicate microcosms per soil were sacrificed to determine LH128 cell numbers and residual phenanthrene concentrations. To this end, 5 mL of 10 µM MgSO4 solution was added to the vials, which were then shaken head-over-end for 30 min. Soil particles were allowed to settle for 15 min, and 200 µL of the suspension was taken and serially diluted. Five µL of each dilution was spotted in triplicate on square R2A plates to determine CFU numbers. After 3 days of incubation at 25°C, the colonies at the two highest dilutions at which individual colonies were observed were counted and used to calculate the number of CFU/g soil dry weight as reported31. In all cases, all colonies showed the typical morphology and yellow color of strain LH128, and no CFU ever developed from the abiotic control vials.

Residual phenanthrene concentrations in the soil microcosms were determined as follows. After the sample for CFU counting had been taken, the vials were centrifuged (7000 × g; 10 min) and the supernatant was removed. The vials were filled with 2.5 mL of a cold (−20°C) hexane:acetone (4:1) solution and vortexed for 4 min. After centrifugation (7000 × g; 10 min), 500 µL of the supernatant was collected in a 1.5 mL glass vial. The remaining hexane:acetone solution was discarded and replaced with 2.5 mL of fresh hexane:acetone (4:1) solution. The vortexing and centrifugation steps were repeated, and 500 µL of the supernatant was collected and pooled with the first 500 µL. Phenanthrene concentrations in the extracts were determined by HPLC (LaChrom Classic, Hitachi)31.

The phenanthrene removal extent (in %) in soil x at day i was defined as (1 − [Phe]day i / Avg [Phe]day 0,abiotic) × 100. The overall phenanthrene degradation activity APhe,x was calculated according to Eq. 1:

Eq. 1:
$$A_{\mathrm{Phe},x} = \left(1 - \frac{I_{\mathrm{Phe},x}}{\overline{I}_{\mathrm{Phe,abiotic}}}\right) \times 100$$

Eq. 2:
$$I_{\mathrm{Phe},i} = \frac{[\mathrm{Phe}]_{\mathrm{day}0,i} + [\mathrm{Phe}]_{\mathrm{day}10,i}}{2} \times \Delta t + \frac{[\mathrm{Phe}]_{\mathrm{day}10,i} + [\mathrm{Phe}]_{\mathrm{day}20,i}}{2} \times \Delta t$$

Here IPhe,x, calculated using Eq. 2, is the sum of the average residual phenanthrene concentrations in a replicate of soil x between day 0 and day 10 and between day 10 and day 20, each multiplied by the number of days (Δt = 10 days). Avg IPhe,abiotic is the average IPhe calculated over all replicates of the abiotic controls (soils n° 151 and 152), i denotes a replicate and Δt the time between two measurements, i.e., 10 days. APhe simultaneously captures the lag phase, the rate of phenanthrene removal and the final removal extent over the 20 days of incubation. It is therefore an indicator of the overall phenanthrene removal activity in the microcosms. For instance, APhe differentiates between soils that show the same removal extent at day 20 but differ in removal extent at day 10, and was therefore used for model development.
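The two quantities derived in this section can be sketched in Python as follows. APhe follows Eq. 1 and Eq. 2 as the surrounding text defines them: a trapezoidal area under the residual-phenanthrene curve with Δt = 10 days, scaled by the mean area of the abiotic controls. For CFU counts the text defers to ref. 31, so the weighted mean over two consecutive ten-fold dilutions used here (an ISO 7218-type formula) is an assumption. The same applies to the liquid volume of 5 mL per g of dry soil, which ignores the pore water present at 100% WHC. All function names and example numbers are hypothetical.

```python
# Hypothetical helpers for CFU counts and the phenanthrene activity APhe.


def cfu_per_ml(counts_d1, counts_d2, d1, spot_ml=0.005):
    """Weighted CFU/mL of the soil suspension from two consecutive ten-fold dilutions.

    counts_d1: colonies in each replicate spot at the less diluted level d1 (e.g. 1e-4)
    counts_d2: colonies in each replicate spot at the next dilution (d1 / 10)
    Assumed ISO 7218-type formula: N = sum(C) / (V * (n1 + 0.1 * n2) * d1)
    """
    total = sum(counts_d1) + sum(counts_d2)
    return total / (spot_ml * (len(counts_d1) + 0.1 * len(counts_d2)) * d1)


def cfu_per_g_soil(cfu_ml, liquid_ml=5.0, dry_soil_g=1.0):
    """Scale suspension counts to CFU per g dry soil (liquid volume is an assumption)."""
    return cfu_ml * liquid_ml / dry_soil_g


def i_phe(c0, c10, c20, dt=10.0):
    """Eq. 2: trapezoidal area under the residual phenanthrene curve (conc. x days)."""
    return (c0 + c10) / 2 * dt + (c10 + c20) / 2 * dt


def a_phe(replicate, abiotic_replicates, dt=10.0):
    """Eq. 1: APhe (%) of one replicate relative to the mean area of the abiotic controls."""
    i_abiotic = sum(i_phe(*r, dt=dt) for r in abiotic_replicates) / len(abiotic_replicates)
    return (1 - i_phe(*replicate, dt=dt) / i_abiotic) * 100


def removal_extent(c_day_i, abiotic_day0):
    """Removal (%) at day i relative to the mean day-0 concentration of the abiotic controls."""
    return (1 - c_day_i / (sum(abiotic_day0) / len(abiotic_day0))) * 100


if __name__ == "__main__":
    n_ml = cfu_per_ml([12, 15, 10], [1, 2, 1], d1=1e-4)
    print(f"{cfu_per_g_soil(n_ml):.3g} CFU/g soil")       # 1.24e+08 CFU/g soil
    abiotic = [(500, 500, 500), (500, 500, 500)]           # mg/kg at day 0, 10 and 20
    print(f"{a_phe((500, 100, 0), abiotic):.1f}")          # 65.0
    print(removal_extent(0, [500, 500]))                   # 100.0
```

APhe can become negative for a replicate whose curve lies above the abiotic controls. This is one reason the area-based definition is read as an overall activity indicator rather than a strict removal percentage.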

Statistical analyses

Linear regression analysis of each of the 47 soil variables (independent variables) against CFU numbers at day 1, 10 and 20 and against APhe (dependent variables) was performed using SigmaPlot v12.0. Prior to regression analysis, the values for the 20 soils in triplicate were mean-centered and divided by the standard deviation to normalize the data.

Principal component analysis (PCA) and partial least squares regression (PLSR) were performed using the Solo software package (Eigenvector Research Incorporated, Wenatchee, WA, USA). The soil data matrix (60 rows × 47 columns) consisted of 20 soils for which data on 47 soil variables were collected in triplicate. The biological data matrix (60 rows × 4 columns) was composed of the same 20 soils, for which CFU numbers at day 1, 10 and 20 and APhe were determined in triplicate. Both data matrices were preprocessed by autoscaling (mean centering and division by the standard deviation) of each variable. During multivariate analysis, soils outside the 95% confidence interval of the Q residual and Hotelling T² statistics were considered outliers. The calibrated models were cross-validated by leaving out each soil (triplicate data) once. PCA was used to describe the variation between the soils and was performed on the soil data matrix using either all variables (n = 47) or all variables except the pore water variables (n = 26). The number of principal components (PCs) in the PCA models was selected based on the variance captured by each PC (>2%). An unsupervised hierarchical cluster analysis was performed on the soil data using the PCA scores, with Euclidean distance and Ward's agglomerative method, which at each step merges the two clusters whose union gives the smallest increase in within-cluster variance.
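The preprocessing and cross-validation scheme described above can be sketched in plain Python. Each column of a (rows × variables) matrix is autoscaled. The cross-validation folds hold out all replicate rows of one soil at a time, so triplicates of the same soil never appear in both the calibration and the validation set. This is an illustrative sketch, not the Solo implementation; the function names and example values are hypothetical. In a stricter scheme, the scaling statistics would be recomputed from the calibration rows of each fold.

```python
import statistics


def autoscale(matrix):
    """Mean-center every column and divide it by its sample standard deviation."""
    columns = list(zip(*matrix))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]
    if any(s == 0 for s in sds):
        raise ValueError("a constant column cannot be autoscaled")
    scaled = [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in matrix]
    return scaled, means, sds


def leave_one_soil_out(soil_ids):
    """Yield (soil, train_rows, test_rows), holding out all replicates of one soil."""
    for soil in sorted(set(soil_ids)):
        test = [i for i, s in enumerate(soil_ids) if s == soil]
        train = [i for i, s in enumerate(soil_ids) if s != soil]
        yield soil, train, test


if __name__ == "__main__":
    # Hypothetical triplicate measurements (pH, OC %) for three soils S1 to S3
    soil_ids = ["S1"] * 3 + ["S2"] * 3 + ["S3"] * 3
    X = [[4.1, 3.2], [4.0, 3.3], [4.2, 3.1],
         [6.5, 1.2], [6.4, 1.1], [6.6, 1.3],
         [7.8, 1.0], [7.9, 0.9], [7.7, 1.1]]
    Xs, means, sds = autoscale(X)
    print([round(statistics.mean(c), 9) for c in zip(*Xs)])   # column means, approximately 0
    for soil, train, test in leave_one_soil_out(soil_ids):
        print(soil, len(train), len(test))
```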

PLSR models32 were built to correlate and predict LH128 survival and activity, using the soil variables as predictor variables and CFU numbers at day 1, 10 and 20 and APhe as predicted variables. The number of latent variables (LVs) was selected at a local minimum of the prediction error (root mean square error of calibration (RMSE) and of cross-validation (RMSECV)), or up to the point at which the prediction error no longer improved substantially […] 0.1 are considered of importance to the PLSR model.
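To make the PLSR and RMSECV terminology concrete, the sketch below implements single-response PLS (PLS1) with the NIPALS algorithm in plain Python. It computes the RMSECV by leaving out all replicates of one soil at a time. It is a minimal illustration under stated assumptions: one response column (such as log CFU or APhe), centering inside each fit, and no outlier handling. It is not the algorithm implemented in Solo, and the function names are hypothetical.

```python
import math


def _center(X, y):
    """Return centered copies of X and y plus the column means and the mean of y."""
    n, p = len(X), len(X[0])
    xm = [sum(row[j] for row in X) / n for j in range(p)]
    ym = sum(y) / n
    Xc = [[row[j] - xm[j] for j in range(p)] for row in X]
    return Xc, [v - ym for v in y], xm, ym


def pls1_fit(X, y, n_lv):
    """NIPALS PLS1: weights W, X-loadings P and y-loadings Q for up to n_lv LVs."""
    E, f, xm, ym = _center(X, y)
    n, p = len(E), len(E[0])
    W, P, Q = [], [], []
    for _ in range(n_lv):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]   # X'y direction
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:                      # y already fully explained
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]   # scores
        tt = sum(v * v for v in t)
        pk = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        qk = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * pk[j] for j in range(p)] for i in range(n)]  # deflate X
        f = [f[i] - t[i] * qk for i in range(n)]                            # deflate y
        W.append(w)
        P.append(pk)
        Q.append(qk)
    return {"W": W, "P": P, "Q": Q, "xm": xm, "ym": ym}


def pls1_predict(model, x):
    """Predict one sample by replaying the deflation steps on its centered vector."""
    e = [xj - mj for xj, mj in zip(x, model["xm"])]
    y_hat = model["ym"]
    for w, pk, qk in zip(model["W"], model["P"], model["Q"]):
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += t * qk
        e = [ej - t * pj for ej, pj in zip(e, pk)]
    return y_hat


def rmsecv(X, y, soil_ids, n_lv):
    """RMSE of cross-validation, leaving out all replicates of one soil at a time."""
    sq = []
    for soil in sorted(set(soil_ids)):
        train = [i for i, s in enumerate(soil_ids) if s != soil]
        test = [i for i, s in enumerate(soil_ids) if s == soil]
        model = pls1_fit([X[i] for i in train], [y[i] for i in train], n_lv)
        sq += [(pls1_predict(model, X[i]) - y[i]) ** 2 for i in test]
    return math.sqrt(sum(sq) / len(sq))


if __name__ == "__main__":
    # Toy data (hypothetical): y is an exact linear function of two predictors
    X = [[1, 2], [2, 1], [3, 5], [4, 2], [5, 1], [6, 4]]
    y = [2 * a - b + 1 for a, b in X]
    soils = ["a", "a", "b", "b", "c", "c"]
    for n_lv in (1, 2):
        print(n_lv, round(rmsecv(X, y, soils, n_lv), 6))
```

The number of LVs can then be chosen as the text describes. For example, scan `rmsecv` over 1, 2, ... LVs and keep the first local minimum, or the point beyond which extra LVs stop improving it substantially.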


Partial least squares discriminant analysis (PLSDA)34 was performed to discriminate between soils showing LH128 survival (1) or not (0) directly after inoculation (day 1). The PLSDA models used (i) all soil variables, or the soil variables retained after iPLS variable selection for the PLSR models predicting CFU numbers at (ii) day 1 and (iii) day 20. Two additional PLSDA models used the iPLS-selected variables for predicting CFU numbers at (iv) day 1 and (v) day 20, but excluded the variables obtained from pore water analysis.

Results

Soil properties

The selection of different soil types resulted in a natural variation in often interdependent soil properties. The variation in soil texture explains the variation in WHC (pF 2.0), which ranged from 10 to 55%. Soil pH ranged from 3 to 8 and CEC from 1.8 to 36 cmolc/kg soil (Table 1). Carbon content ranged from 0.4 to 12.9% OC and from 0 to 5.7% inorganic carbon, and the C to N ratio ranged from 7.0 to 30.4 (SI Figure S2). Detailed results of the elemental analysis of the soil and of the pore water chemistry are given in SI Table S1 and Table S2, respectively. Principal component analysis of the 20 soils, using either all variables or all variables except those related to pore water chemistry, illustrated the physico-chemical variation among the soils (SI Figure S3). When the pore water variables were excluded, two larger clusters (cluster A with 8 soils and cluster E with 6 soils) and three smaller clusters (clusters B, C and D, each with 2 soils) were recognized (Table 1). Cluster A grouped the spodosols, which are generally acidic soils with high Fe, Al and organic matter contents, while cluster E grouped the alfisols, which are more alkaline soils with high Fe and Al contents.

Survival and phenanthrene-degrading activity of strain LH128 in soils

CFU numbers were determined one day after inoculation (day 1) and after incubation for 10 and 20 days (Figure 1). LH128 CFU dynamics were highly variable among the 20 tested soils. In seven soils, no LH128 CFU were detected at day 1, day 10 or day 20; these were categorized as soils with ‘no survival’. The remaining 13 soils were categorized as soils with ‘survival’. At day 1, soils Souli I (n°145) and Barcelona (n°157) showed a decrease in CFU numbers to below 10^4 CFU/g soil, while soils Montpellier (n°147), Brécy (n°158), Guadalajara (n°159) and Cordoba (n°283) showed CFU numbers between 10^5 and 10^6 CFU/g soil. Except for Souli I (n°145), these soils showed an increase in CFU numbers after day 1. They reached around 10^8 CFU/g soil at day 10 for soils Montpellier (n°147) and Brécy (n°158), and at day 20 for Barcelona (n°157), Guadalajara (n°159) and Cordoba (n°283). Soil Vault de Lugny (n°153) showed around 10^8 CFU/g soil at day 1, which decreased to 10^3 CFU/g soil by day 10 before finally reaching 10^8 CFU/g soil at day 20. The remaining six soils showed between 8 × 10^6 and 2 × 10^8 CFU/g soil at day 1, increasing to between 3 × 10^7 and 5 × 10^9 CFU/g soil at day 10, after which the numbers remained constant. No CFU were detected after 20 days in the abiotic controls.

The extent of phenanthrene removal was determined at day 10 and day 20 after inoculation (Figure 2). At day 20, the abiotic control soils showed no significant loss of phenanthrene compared to day 0 (average of 7 ± 7%). Six soils (n° 147, 149, 151, 155, 158 and 278) showed phenanthrene removal of 70% or more by day 10 (Figure 1) and had a final APhe higher than 60%. Soil n°152 showed 59 ± 6% phenanthrene removal by day 10 and 83 ± 3% by day 20, and had an APhe of around 50%. Two other soils (n°141 and 144) showed moderate phenanthrene removal of around 30% at day 10 and reached a final removal of 40%. Soil n°145 showed 50% phenanthrene removal at day 10, which did not change by day 20. Soils n° 141, 144 and 145 showed a similar APhe of around 30%. One soil (n° 156) showed moderate phenanthrene removal of 30% at day 10, which increased to 80% at day 20, resulting in a final APhe of 60%. Nine soils showed no significant phenanthrene removal at day 10. Four of these soils (n° 153, 157, 159 and 283) showed removal of 80% or more at day 20 (Figure 1), resulting in an APhe between 20 and 40%. The remaining five soils (n° 138, 139, 140, 277 and 324) showed no significant removal by day 20 compared to the abiotic control (14 ± 4%) and had an APhe of around 0%.

To determine whether CFU numbers at day 1, 10 and 20 correlated with the phenanthrene-degrading activity APhe, APhe was linearly regressed on the log10-transformed CFU numbers (APhe [%] = y0 + b × log10(CFU)). Soils without survival of LH128 at day 1, 10 and 20 were omitted from the regression. R² values of 0.08, 0.50 and 0.12 were obtained at day 1, 10 and 20, respectively. The correlation between log CFU and APhe is therefore weak at day 1 and day 20 and only moderate at day 10 (data not shown). A t-test showed that the slope b differed significantly from 0 for CFU determined on day 10 and day 20, but not for CFU determined on day 1 (data not shown). APhe thus correlates mainly with CFU numbers during phenanthrene degradation (day 10 and 20) rather than with CFU numbers immediately after inoculation (day 1).
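The regression of APhe on log10 CFU and the t-test on its slope can be sketched with ordinary least squares in plain Python. The example values are hypothetical. The p-value is not computed, because the standard library has no Student t distribution, so |t| is compared with a tabulated critical value instead.

```python
import math


def regress_activity_on_log_cfu(cfu, a_phe):
    """OLS fit of APhe = y0 + b * log10(CFU); returns y0, b, R² and the t statistic of b."""
    x = [math.log10(c) for c in cfu]
    n = len(x)
    mx, my = sum(x) / n, sum(a_phe) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, a_phe))
    b = sxy / sxx
    y0 = my - b * mx
    sse = sum((yi - (y0 + b * xi)) ** 2 for xi, yi in zip(x, a_phe))
    sst = sum((yi - my) ** 2 for yi in a_phe)
    r2 = 1 - sse / sst
    se_b = math.sqrt(sse / (n - 2) / sxx)   # standard error of the slope
    return y0, b, r2, b / se_b              # t statistic for H0: b = 0, df = n - 2


if __name__ == "__main__":
    # Hypothetical CFU/g soil and APhe (%) for four soils with survival
    y0, b, r2, t = regress_activity_on_log_cfu([1e6, 1e7, 1e8, 1e9], [10, 25, 28, 45])
    print(f"y0 = {y0:.1f}, b = {b:.1f}, R2 = {r2:.3f}, t = {t:.2f}")
```

With n − 2 = 2 degrees of freedom, the two-sided 5% critical value of t is about 4.30. The slope in this toy example (t ≈ 5.8) would therefore be judged significantly different from zero even though only four points are used.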

Prediction of survival of strain LH128 based on soil properties

Significant linear correlations of CFU numbers at day 1, 10 and 20 with individual soil variables were found (SI Table S3). To deal with co-variation among the soil variables, PLSR was performed using all soil variables (n = 47) to predict CFU numbers at day 1, 10 and 20 (n = 60 observations: 20 soils in triplicate), with a separate PLSR model for each time point. Two soils (Borris n°278 and Montpellier n°147) were identified as outliers by all three models based on high Hotelling T² (distance from the model center) and Q residual (lack-of-fit statistic) values. Soils n°278 and n°147 were grouped within cluster A by the PCA of the soil data, a group that mainly included acidic soils such as spodosols and histosols (Table 1). Apart from soils n°278 and n°147, none of the soils in this cluster showed survival of LH128. The number of latent variables (LVs) was selected at the minimal RMSE for the cross-validation set. It was set at six LVs for all three models, explaining around 70% of the variance in the soil variables (SI Figure S4). For the calibration set, R² values were 0.92 or higher, so CFU numbers at day 1, 10 and 20 correlated strongly with soil properties. The RMSE of calibration was higher for the day 10 PLSR model than for the day 1 and day 20 models (SI Figure S4). However, all three PLSR models predicted poorly in cross-validation (i.e., for soils not included in model building). R² values were 0.74 or lower, and RMSECV values were high compared with the RMSE of calibration. These PLSR models are therefore not suitable for predicting CFU numbers in a new soil from its soil properties. This is illustrated in SI Figure S4, where the predicted CFU numbers are plotted against the measured CFU numbers for both the calibration set and the cross-validation set.
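The calibration and cross-validation statistics used above can be written as two small functions. One common definition of R² (1 − SSE/SST) is assumed here. Solo may report R² differently, for example as the squared correlation between predicted and measured values, so the sketch is illustrative only. A large gap between RMSE (calibration) and RMSECV, as reported for the all-variable models, usually means that a model fits its calibration soils but does not generalize to new ones.

```python
import math


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured))


def r_squared(measured, predicted):
    """1 - SSE/SST; can be negative when predictions are worse than the mean."""
    mean_m = sum(measured) / len(measured)
    sse = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    sst = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - sse / sst


if __name__ == "__main__":
    # Hypothetical log10 CFU values: measured, calibration fit, cross-validated prediction
    measured = [6.0, 7.0, 8.0, 9.0]
    fitted = [6.1, 6.9, 8.1, 8.9]
    cross_val = [7.0, 6.5, 8.8, 8.0]
    print(f"RMSE = {rmse(measured, fitted):.2f}, RMSECV = {rmse(measured, cross_val):.2f}")
    print(f"R2 cal = {r_squared(measured, fitted):.2f}, R2 CV = {r_squared(measured, cross_val):.2f}")
```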

Not all soil properties may be equally informative, and some might even be detrimental for CFU prediction. Forward interval PLS (iPLS) variable selection was therefore used to search for the most informative soil properties for CFU prediction. iPLS selected 9, 6 and 12 soil variables for the PLSR models predicting CFU numbers at day 1, 10 and 20, respectively. This reduced the RMSECV compared with the PLSR models predicting CFU numbers from all soil variables (Figure 2), while the RMSE of calibration remained similar with or without iPLS selection. The predicted CFU numbers are plotted against the measured ones in Figure 2. After iPLS variable selection, predicted and measured CFU numbers correlated highly for both calibration and cross-validation. The selected variables for the PLSR models predicting CFU numbers at day 1, 10 and 20 are shown in Table 2 together with their RCV. The RCV of a soil variable indicates the direction and extent of change in the predicted value when that soil variable changes. For the PLSR model predicting LH128 CFU numbers at day 1, nine soil variables were retained after iPLS, six of which had a high |RCV| (pH, exchangeable Mg, oxalate-extractable Fe and Al, total Cr and sand fraction) (Table 2). For prediction of LH128 CFU numbers at day 10, eight soil variables were retained after iPLS, two of which had a negative RCV below −0.2 (oxalate-extractable Al and total Pb) (Table 2). For prediction of LH128 CFU numbers at day 20, 12 soil variables were retained after iPLS, three of which had a high |RCV| (pH, oxalate-extractable Fe and total Ni) (Table 2). The soil variables pH, sand fraction, exchangeable Mg, oxalate-extractable Fe, total Cr and inorganic P were retained by iPLS variable selection in at least two of the three PLSR models predicting CFU numbers (Table 2).
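Forward iPLS can be outlined as a greedy loop. Starting from an empty set, it adds at each step the candidate interval that most lowers the cross-validation error, and it stops when no addition improves that error. The sketch below treats each soil variable as its own interval and takes the error function as a parameter; in practice this would be the RMSECV of a PLSR model built on the candidate subset. The toy score function and variable names are hypothetical. Solo's iPLS has further options (interval size, number of intervals) that are not modelled here.

```python
def forward_select(candidates, score, max_vars=None):
    """Greedy forward selection; score(subset) returns an error to minimize (e.g. RMSECV)."""
    selected, best = [], float("inf")
    remaining = list(candidates)
    while remaining and (max_vars is None or len(selected) < max_vars):
        trials = {c: score(selected + [c]) for c in remaining}
        choice = min(trials, key=trials.get)
        if trials[choice] >= best:      # no candidate lowers the error any further
            break
        selected.append(choice)
        remaining.remove(choice)
        best = trials[choice]
    return selected, best


if __name__ == "__main__":
    # Toy error surface (hypothetical): each variable removes some error, each costs 0.05
    gain = {"pH": 0.4, "Fe_ox": 0.3, "sand": 0.1, "Pb_total": 0.0}

    def toy_rmsecv(subset):
        return 1.0 - sum(gain[v] for v in subset) + 0.05 * len(subset)

    selected, err = forward_select(list(gain), toy_rmsecv)
    print(selected, round(err, 2))      # ['pH', 'Fe_ox', 'sand'] 0.35
```

The per-variable penalty in the toy score stands in for the loss of predictive power that uninformative variables cause in cross-validation. Without such a cost, forward selection would never stop early.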

We also examined whether the soil characteristics could predict whether LH128 would survive at all. To this end, PLSDA was performed to discriminate between soils showing LH128 CFU, and hence LH128 survival, at day 20 (value 1) and soils showing no CFU, and hence no survival, at day 20 (value 0). The PLSDA used either all soil variables or the variables selected by iPLS during PLSR model building for CFU numbers at day 1 and day 20. In addition, the pore water variables were omitted to test whether PLSDA still predicted survival successfully using only data regularly available from soil databases. The performance of all five PLSDA models is shown in SI Table S4. All PLSDA models classified most soils correctly (Table S4). The exceptions were soils Montpellier (n° 147) and Borris (n° 278), which had previously been identified as outliers in the PLSR models predicting CFU numbers at day 1, 10 and 20. Soil Borris was correctly classified only by the PLSDA model that used the variables selected by iPLS for the day 20 PLSR model, with pore water data excluded (SI Table S5).
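In PLSDA, the PLS model predicts a continuous value for the 0/1 class code, which is then converted into a class. The sketch below assumes a fixed threshold of 0.5; software such as Solo may instead derive the threshold from the data, which is not reproduced here. It reports sensitivity, specificity and the misclassified soils. The soil labels and predicted values are hypothetical.

```python
def plsda_report(soil_ids, y_true, y_pred, threshold=0.5):
    """Classify PLSDA predictions (1 = survival, 0 = no survival) and summarize them."""
    pred = [1 if v >= threshold else 0 for v in y_pred]
    pairs = list(zip(y_true, pred))
    tp = pairs.count((1, 1))
    tn = pairs.count((0, 0))
    fn = pairs.count((1, 0))
    fp = pairs.count((0, 1))
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "misclassified": [s for s, t, p in zip(soil_ids, y_true, pred) if t != p],
    }


if __name__ == "__main__":
    report = plsda_report(["S1", "S2", "S3", "S4", "S5"],
                          [1, 1, 1, 0, 0],
                          [0.92, 0.71, 0.35, 0.12, 0.58])
    print(report["misclassified"])      # ['S3', 'S5']
```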

Prediction of phenanthrene-degrading activity of strain LH128 based on soil properties

The phenanthrene-degrading activity APhe showed significant linear correlations with several individual soil variables (SI Table S3). Soils with similar APhe did not correspond to the PCA-based clusters of soils with similar properties. To deal with co-variation among the soil variables, PLSR was performed using all soil variables (n = 47) to predict APhe. Including in the PLSR analysis the soils that showed no survival of LH128 at day 1 resulted in poor prediction (R² = 0.15). A plausible explanation is that the PLSR models assume a linear relation between the soil parameters and APhe. This assumption does not hold for soils without survival at day 1. Those soils all show zero degradation, even though one of them may have properties with a more negative impact on LH128 than another. Therefore, an alternative PLSR model for predicting APhe was built using only the 13 soils that showed survival of LH128 on day 1. As before, soils Montpellier and Borris were identified as outliers based on their Hotelling T² and Q residuals. The number of LVs was selected at the minimal RMSE for the calibration and cross-validation sets and was set at five. These LVs explained around 70% of the variance in the soil variables and 95% of the variance in the phenanthrene degradation activity (SI Figure S4). However, for the cross-validation set, the R² value was only 0.18. The poor performance of this PLSR model in predicting APhe is further illustrated in SI Figure S4, where the predicted APhe is plotted against the measured APhe. Some variables might have had a detrimental effect on prediction performance, as was the case for predicting survival, so variable selection was performed by means of iPLS. This resulted in a similar RMSE of 4% for both calibration and cross-validation, with R² values of 0.96 and 0.95, respectively. The lower prediction error after iPLS variable selection is illustrated in the predicted versus measured plot (Figure 2). The average measured APhe values are ranked from high to low in SI Table S5.

345

The nine variables retained after iPLS variable selection are summarized in Table 2. From

346

these variables, oxalate extractable Al had a highly negative RCV (< -0.1), while WHC, total Ni

347

and pore water Cr had a high positive RCV (> 0.1) (Table 2).
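Forward iPLS, as used above, adds variables (or intervals of variables) one at a time and keeps an addition only if it lowers the cross-validated prediction error. The sketch below illustrates that greedy loop under several assumptions that are not taken from the study: one variable per interval, leave-one-out cross-validation, a fixed number of latent variables, and stopping as soon as the cross-validated RMSE no longer decreases. The function names (`fit_pls1`, `predict_pls1`, `loo_rmse`, `forward_ipls`) and the toy data are hypothetical; a compact NIPALS PLS1 is included so that the block runs on its own.

```python
import math
from typing import List, Sequence, Tuple

PLSModel = Tuple[List[float], float, List[List[float]], List[List[float]], List[float]]


def fit_pls1(X: Sequence[Sequence[float]], y: Sequence[float], n_components: int) -> PLSModel:
    """Single-response PLS regression (NIPALS) on mean-centred data."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_means[j] for j in range(p)] for row in X]
    f = [yi - y_mean for yi in y]
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(wj * wj for wj in w))
        if norm == 0.0:
            break
        w = [wj / norm for wj in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(ti * ti for ti in t)
        loading = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * loading[j] for j in range(p)] for i in range(n)]
        f = [f[i] - t[i] * q for i in range(n)]
        W.append(w)
        P.append(loading)
        Q.append(q)
    return x_means, y_mean, W, P, Q


def predict_pls1(model: PLSModel, x: Sequence[float]) -> float:
    x_means, y_mean, W, P, Q = model
    e = [xj - mj for xj, mj in zip(x, x_means)]
    y_hat = y_mean
    for w, loading, q in zip(W, P, Q):
        t = sum(ej * wj for ej, wj in zip(e, w))
        e = [ej - t * lj for ej, lj in zip(e, loading)]
        y_hat += t * q
    return y_hat


def loo_rmse(X, y, cols: List[int], n_components: int) -> float:
    """Leave-one-out cross-validated RMSE of a PLS1 model restricted to the columns in `cols`."""
    sq_errors = []
    for i in range(len(X)):
        X_train = [[row[j] for j in cols] for k, row in enumerate(X) if k != i]
        y_train = [yk for k, yk in enumerate(y) if k != i]
        model = fit_pls1(X_train, y_train, min(n_components, len(cols)))
        y_hat = predict_pls1(model, [X[i][j] for j in cols])
        sq_errors.append((y_hat - y[i]) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))


def forward_ipls(X, y, intervals: List[List[int]], n_components: int = 2):
    """Greedy forward iPLS: add the interval that most lowers the LOO RMSE; stop when none does."""
    selected: List[int] = []
    best_rmse = math.inf
    remaining = list(range(len(intervals)))
    while remaining:
        trials = []
        for k in remaining:
            cols = [j for idx in selected + [k] for j in intervals[idx]]
            trials.append((loo_rmse(X, y, cols, n_components), k))
        rmse, k = min(trials)
        if rmse >= best_rmse:
            break  # no remaining interval improves the cross-validated error
        selected.append(k)
        remaining.remove(k)
        best_rmse = rmse
    return selected, best_rmse


# Invented toy data: y depends mainly on variable 0; variables 1 and 2 are noise.
X = [[1, 5, 0.3], [2, 3, 0.1], [3, 4, 0.4], [4, 1, 0.2], [5, 2, 0.5], [6, 6, 0.1]]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
selected, rmse = forward_ipls(X, y, intervals=[[0], [1], [2]])
print(selected, round(rmse, 3))
```

In practice the interval width, the cross-validation scheme and whether the number of latent variables is re-optimized after each addition all change which variables survive selection, so these choices would need to follow the study's own iPLS configuration.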

Discussion

In this study, we show that it is feasible to develop soil-strain compatibility models that predict the behavior and success (survival and activity) of an inoculant used for bioaugmentation of contaminated soil from intrinsic physico-chemical soil properties, using as a proof of principle the phenanthrene-degrading Novosphingobium sp. LH128, with phenanthrene as the target pollutant. Multivariate regression analysis was used to build the models because many of the soil variables are interdependent and covary; for example, the clay fraction correlates with WHC and CEC. Other studies have investigated the effects of soil variables on pollutant degradation, but did so by changing the conditions within a single soil or sediment.³⁵ To the best of our knowledge, our study is the first to use multivariate regression analysis of data obtained from multiple soils to assess and predict organism survival and activity from intrinsic physico-chemical soil characteristics. To reduce complexity and clearly estimate the effect of intrinsic physico-chemical factors, sterilized soils were used. As a consequence, biotic factors were not considered, although they might affect the survival and activity of inoculants and hence the success of bioaugmentation. For instance, protozoan grazing³⁶ and competition with the native microbial community for nutrients and space can negatively influence inoculant survival or activity.³⁷ Conversely, resident bacteria have been shown to improve degradation, for instance through increased pollutant bioavailability,³⁸ delivery of cofactors³⁹ or metabolic cooperation.⁴⁰ It remains to be seen whether biotic factors affect the predictions and whether the physico-chemical models need to be extended to include biotic factors. In addition, to reduce complexity, some variables that might differ between polluted soils and affect the model were kept constant, such as parameters related to the pollution itself, like the phenanthrene concentration and the presence of co-pollutants.⁴¹ Inoculum size, mode of inoculation and soil management might also affect the survival and activity of the inoculum. Chen et al.³⁵ showed that, in addition to salinity, inoculum size determined phenanthrene degradation by a Novosphingobium sp. strain in mangrove sediments, whereas the effects of phenanthrene concentration, nutrient addition and temperature were insignificant. The optimal inoculum size was 10⁶ cells/g sediment, but higher inoculum sizes were not tested. The inoculum size of around 10⁸ cells/g soil used in our study is similar to inoculum sizes frequently used in bioaugmentation studies.⁵ Finally, aging effects were not included in this study. Aging effects are limited in short-term experiments of less than a month,⁴² but in situ phenanthrene removal rates have been shown to be affected by aging of phenanthrene in soil⁴³ and should be considered in the long term. The predictions made by the models should therefore be used as a first estimate.

Two types of models that successfully predicted LH128 survival were optimized. The first type, based on PLSR and improved by means of forward iPLS variable selection, predicts LH128 CFU numbers at day 1, day 10 and day 20 after inoculation from a few soil variables. The iPLS-mediated approach not only ensured the predictive power of the model but also reduced the number of analytical techniques required to deliver the input data (from 10 to 7), which saves costs. Moreover, most of the variables selected by iPLS are basic soil characteristics that are often included in routine soil physico-chemical characterization and hence available in databases. The second model type, based on PLSDA, predicts whether or not LH128 will survive in a soil. Although this model provides less detail on the expected number of surviving LH128 cells, it reduces the number of analytical techniques even further, to five, and no longer requires pore water analysis. In addition, a model was developed that predicts the crucial PAH-degrading activity of LH128. Because this model was built using only soils in which the inoculum survives, it can only be applied after one of the two survival prediction models. As for survival, the PAH-degrading activity of LH128, expressed as APhe, could be well predicted from the soil properties by PLSR after iPLS selection of the most informative variables, resulting in nine soil variables that can be determined with only six analytical techniques. These six techniques include pore water analysis, which therefore cannot be omitted when predicting the final success of bioaugmentation with LH128 in a particular soil. However, pore water analysis can be restricted to those soils in which the survival model(s) predict that LH128 will probably survive.
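The staged use of the models described above (survival predicted first from routine soil data, with pore water analysis and APhe prediction only for soils where LH128 is expected to survive) can be written as a small decision function. In this sketch the survival and activity models are passed in as plain callables; `assess_soil`, `Assessment`, the variable keys and the stub models are hypothetical placeholders, not the fitted models from the study.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

SoilData = Dict[str, float]


@dataclass
class Assessment:
    survives: bool
    predicted_aphe: Optional[float]  # None when no activity prediction is made


def assess_soil(
    routine_data: SoilData,
    survival_model: Callable[[SoilData], int],
    run_pore_water_analysis: Callable[[], SoilData],
    activity_model: Callable[[SoilData], float],
) -> Assessment:
    """Stage 1: predict survival (0 = no survival, 1 = survival) from routine soil data.
    Stage 2: only if survival is predicted, add pore water data and predict APhe."""
    if survival_model(routine_data) == 0:
        # The activity model was built only on soils with survival, so it is not applied,
        # and the pore water analysis is skipped.
        return Assessment(survives=False, predicted_aphe=None)
    full_data = {**routine_data, **run_pore_water_analysis()}
    return Assessment(survives=True, predicted_aphe=activity_model(full_data))


# Hypothetical stand-ins so the example runs; none of these rules or numbers come from the study.
analyses_run = []


def fake_pore_water_analysis() -> SoilData:
    analyses_run.append("pore water")  # records that the (costly) analysis was requested
    return {"pore_water_Cr": 0.25}


def fake_survival(soil: SoilData) -> int:
    return 1 if soil["pH"] >= 5.5 else 0  # invented rule, not a fitted PLSDA model


def fake_activity(soil: SoilData) -> float:
    return 0.5 * soil["WHC"] + soil["pore_water_Cr"]  # invented, not the fitted PLSR model


acidic = assess_soil({"pH": 4.5, "WHC": 30.0}, fake_survival, fake_pore_water_analysis, fake_activity)
neutral = assess_soil({"pH": 6.8, "WHC": 30.0}, fake_survival, fake_pore_water_analysis, fake_activity)
print(acidic, neutral, len(analyses_run))
```

Passing the pore water analysis as a callable makes the cost saving explicit: it is only invoked for soils that pass the survival stage.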

The models developed in this study not only predict the fate and activity of the inoculant strain but also provide information about soil characteristics that are crucial for bioaugmentation with LH128. In addition, the regression coefficient values of the PLSR models indicate whether a soil property positively or negatively affects the survival and pollutant-degrading activity of the introduced organism in soil. However, caution should be taken when inferring causal relationships between the soil characteristics and the observed effects on LH128 survival and activity. These effects can be indirect, or caused by more than one soil characteristic at the same time. For instance, decreasing soil pH negatively influenced LH128 survival, but low pH also increases the availability of toxic cations such as Al³⁺ and Ni²⁺ ⁴⁴,⁴⁵

survival and oxalate-extractable Al, which represents the amorphous Al in soil; this form is considered to be the ‘active’ form of Al in soil and becomes entirely soluble at low pH (