Correlating Asphaltene Dimerization with Its Molecular Structure by

Apr 11, 2018 - School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University , Shanghai 200240 , China ... The linkage pe...
0 downloads 5 Views 792KB Size
Subscriber access provided by UNIV OF DURHAM

Fossil Fuels

Correlating asphaltene dimerization with its molecular structure by potential of mean force calculation and data mining Xinzhe Zhu, Guozhong Wu, Frederic Coulon, Lvwen Wu, and Daoyi Chen Energy Fuels, Just Accepted Manuscript • DOI: 10.1021/acs.energyfuels.8b00470 • Publication Date (Web): 11 Apr 2018 Downloaded from http://pubs.acs.org on April 11, 2018

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Energy & Fuels

1

Correlating asphaltene dimerization with its molecular structure

2

by potential of mean force calculation and data mining

3

Xinzhe Zhu a, b, Guozhong Wu a, b, Frédéric Coulon c, Lvwen Wu d, Daoyi Chen a, b, *

4 5 6 7 8 9 10 11 12

a

Division of Ocean Science and Technology, Graduate School at Shenzhen, Tsinghua

University, Shenzhen 518055, China b

School of Environment, Tsinghua University, Beijing 100084, China

c

School of Water, Energy and Environment, Cranfield University, Cranfield MK43

0AL, UK d

School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong

University, Shanghai 200240, China

13 14 15

*Corresponding Author:

16

Phone: +86 0755 2603 0544

17

Fax: +86 0755 2603 0544

18

Email: [email protected]

19 20 21

1

ACS Paragon Plus Environment

Energy & Fuels

22

0 in water in toluene

-1

PMF (kJ mol )

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47

Page 2 of 37

Molecular structure

-40 ∆G

-80 0.0

0.5

1.0 1.5 Distance (nm)

MACHINE LEARNING

2

ACS Paragon Plus Environment

2.0

Dimerization prediction

Page 3 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Energy & Fuels

23

Abstract: Asphaltene aggregation affects the entire production chain of

24

petrochemical industry, which also poses environmental challenges for oil pollution

25

remediation. The aggregation process has been investigated for decades, but it

26

remains unclear how the free energy of asphaltene association in solvents is correlated

27

with its molecular structure. In this study, dimerization energy of 28 types of

28

asphaltenes in water and toluene were calculated using the umbrella sampling method.

29

Structural parameters related to the atom types and functional groups were screened to

30

identify the factors most influencing the dimerization energy using multiple linear

31

regression,

32

demonstrated that the influences of molecular structure on asphaltene association in

33

water was nonlinear, while attempts to capture the relationship using linear regression

34

had large error. The linkage per aromatic ring, number of aromatic carbons and

35

aliphatic chains were the top three factors accounting for 52% of the dimerization

36

energy variation in water. Asphaltene dimerization in toluene was dominated by the

37

content of sulfur in aromatic rings and the number of aromatic carbons which

38

contributed to 55% of the energy variation. To the best of our knowledge, this was the

39

first study successfully predicting asphaltene dimerization using molecular structure

40

(R > 0.9) and quantifying simultaneously the relative importance of each structural

41

parameter. The proposed modelling approach supported the decision making on the

42

number of structural parameters to investigate for predicting asphaltene aggregation.

43

Keywords: Asphaltene aggregation; PMF; Multiple linear regression; Multi-layer

44

perceptron; Support vector regression

multi-layer

perceptron

and

support

vector

3

ACS Paragon Plus Environment

regression.

Results

Energy & Fuels 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

45

1. Introduction

46

Asphaltenes are the largest, heaviest and most polar fraction of crude oil, which

47

usually contain poly-condensed aromatic ring linked by aliphatic chains and

48

heteroatoms like nitrogen, sulfur and oxygen. Such structural properties make them

49

easy to aggregate, flocculate and deposit even at low concentrations, which affects the

50

entire production chain of petrochemical industry.1 A large number of studies have

51

been carried out to address the flow-assurance issues arose from asphaltene

52

aggregation.2 Our recent studies also demonstrated the environmental significance of

53

asphaltene aggregation process on the fate and transport of oil pollutants in the

54

contaminated soils. For example, asphaltene aggregates could form porous network in

55

which the small oil molecules are sequestrated and become difficult to remove. 3 They

56

also have the tendency to co-aggregate with the soil organic matter (SOM), a key

57

component of soil responsible for the persistence of recalcitrant residual oils in the

58

oil-soil matrix, at the oil-water interface and in the bulk water.4 The formation of

59

asphaltene-humic acids complex decrease the diffusion coefficient and increases the

60

hydrophilicity of the asphaltenes in the clay pores.5 These findings highlight the

61

potential role of asphaltene aggregation on the extractability and bioavailability of the

62

residual oil in the aged soil.

63

Successful prediction of asphaltene macro-scale behaviour requires an understanding

64

of the aggregation in terms of micro-scale mechanism and aggregation strength.

65

Without this fundamental knowledge, any attempt to model the environmental

66

behaviour of asphaltenes will be incomplete or inaccurate. Previous studies indicated

4

ACS Paragon Plus Environment

Page 4 of 37

Page 5 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Energy & Fuels

67

that asphaltene aggregation is influenced by many factors such as the molecular

68

structure and concentration of asphaltenes,6-8 temperature and pressure,9,

69

solvent types.11, 12 Among these factors, the molecular structure of asphaltenes are of

70

particular interest due to the great diversity and complexity of asphaltene from various

71

sources.13 For instance, it was estimated that the number of rings in a single

72

asphaltene fused ring system can range from a few rings to up to 20 rings.13

73

Several studies have been carried out to investigate the relationship between

74

asphaltene aggregation and its molecular structure using molecular simulation

75

techniques such as classical molecular mechanics, molecular dynamics (MD)

76

simulation and density functional theory method in quantum mechanics. Generally,

77

the π-π interaction between poly-aromatic rings was acknowledged as the main

78

association force.4, 14-16 The lower H/C ratio and the higher molecular weight and

79

aromaticity were also proved to favour the asphaltene aggregation.17, 18 However, the

80

contribution of some other structural parameters to the asphaltene aggregation varied

81

against studies with different simulation conditions. For example, hydrogen bonding

82

has been proposed as one of the driving forces for asphaltene association in toluene or

83

crude oil,

84

observed during the asphaltene dimerization in vacuum condition.20 According to the

85

modified Yen model,21 the steric repulsion between aliphatic side chains was a force

86

limiting asphaltene aggregation in toluene or heptane. In contrast other studies

87

demonstrated that the asphaltene aggregates formation in water was favored by the

88

very long or very short side chains in the asphaltene molecules.22

15, 19

10

and

but in other studies the intermolecular hydrogen bonds were not

5

ACS Paragon Plus Environment

Energy & Fuels 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

89

Another issue that was less taken into account in previous studies was the existence of

90

co-factors effects among the molecular structure parameters. For example, the same

91

backbone structure was used by only changing the side-chain length,22 but actually the

92

varied side-chain length also led to the changes in the molecular weight and the ratio

93

of hydrogen to carbon which might have synergistic or antagonistic effects on the

94

asphaltene aggregation. If the side-chain length was investigated without changing the

95

latter two parameters, then the number of chains had to be altered to keep mass

96

balance. When more than one structural parameter varied simultaneously, the relative

97

contribution of each parameter to the binding energy of two asphaltene molecules

98

remained unelucidated. To the best of our knowledge, this was mainly due to the lack

99

of statistical data. As a way forward it is suggested to develop multi-factor regression

100

to evaluate the importance of concomitant molecular structure parameters on the

101

strength of asphaltene dimerization.

102

Quantitative structure-property relationship (QSPR) modeling is a useful tool to

103

predict the relationships between molecular structures and the properties.23 One

104

challenge is the selection of a number of molecular models for asphaltenes with

105

different structures, which is then used to calculate the free energy for the binding of

106

each pair of asphaltenes and to establish mathematical models correlating these

107

energies with the structural parameters. Although the asphaltenes are complex

108

heterogeneous fractions, a variety of asphaltene model molecules have been proposed

109

by many research groups that can mimic the properties of asphaltenes,24 making it

110

possible to address the above issue. In the present study, 28 asphaltene models from

6

ACS Paragon Plus Environment

Page 6 of 37

Page 7 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Energy & Fuels

111

the literatures were selected to explore the relationship between asphaltene

112

dimerization and its molecular structure using conventional statistical methods (e.g.

113

multiple linear regression) and machine learning methods (e.g. multi-layer perceptron

114

and support vector regression) that have been successfully used in our previous

115

studies for predicting environmental phenomenon.25-27 Specific objectives of this

116

study were to screen structural parameters that were significant to the asphaltene

117

dimerization, predict asphaltene dimerization by structural parameter, and quantify the

118

relative importance of each structural parameter to the asphaltene dimerization.

119 120

2. Methodology

121

2.1 Molecular dynamics simulation

122

MD simulations were performed using the GROMACS 5.1 software,28 while the

123

molecular interactions was calculated by the CHARMM36 force field.29, 30 As shown

124

in Fig. 1, 28 types of asphaltene molecules with different structures were selected

125

from literatures.6, 11, 22, 31-34 Each molecule contained a single polyaromatic core with

126

side chains, because previous experiments evidenced that such “continental model”

127

was the dominant asphaltene architecture, providing the main aspects proposed by the

128

Yen-Mullins model.35 For example, using the Laser desorption laser ionization mass

129

spectra, Sabbah et al.36 demonstrated that all the 23 types of model compounds having

130

one aromatic core showed little or no fragmentation, which well-mimicked the

131

behavior observed for the 2 petroleum asphaltene samples. However, all model

132

compounds with more than one aromatic core showed energy-dependent

7

ACS Paragon Plus Environment

Energy & Fuels 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 37

133

fragmentation, suggesting that the “archipelago models” were not dominant in

134

asphaltenes. Recently, using the atomic force microscopy and scanning tunnelling

135

microscopy, Schuler et al.37 measured the structure of more than 100 asphaltene

136

molecules, which also concluded that a single aromatic core was the dominant

137

asphaltene architecture although in some cases the “archipelago structure” was also

138

present.

139

The free energy of dimerization of asphaltene molecules in water and toluene was

140

obtained using the umbrella sampling method.6,

141

umbrella

142

(http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin/gmx-tutorials/umbrella/in

143

dex.html). Before umbrella sampling, the configuration of each asphaltene dimer was

144

obtained from a 2 ns MD simulation, where each two asphaltene molecules were

145

randomly placed in a simulation box (4 nm × 4 nm × 4 nm) filled with water or

146

toluene. The coordinates of asphaltene dimer at the end of MD simulation was

147

extracted and located in the edge of a new simulation box (4 nm × 4 nm × 7.5 nm)

148

with the aromatic rings parallel to the x-y plane. The simulation box was filled with

149

water and toluene, respectively. It was then equilibrated by running MD simulation at

150

NVT and NPT ensemble for 1 ns, respectively, where the asphaltenes were kept

151

restrained. The center of mass (COM) pulling was employed to generate a series of

152

initial configurations for each umbrella window. The COM of one asphaltene

153

molecules was pulled along the Z-axis at the rate of 2.6 nm ns-1 for 1 ns, while the

154

other one was restrained with a harmonic force constant of 1000 kJ mol-1 nm-2.

sampling

was

available

from

38

the

A step-by-step guide for the Gromacs

8

ACS Paragon Plus Environment

tutorials

website

Page 9 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Energy & Fuels

155

Approximately 35 windows were selected with a space of 0.05 nm between two

156

consecutive distances from the resulted pulling simulation. For each window, separate

157

MD simulation (10 ns) was performed using the biased umbrella potentials to restrain

158

the molecules within the window. The potential of mean force (PMF) was calculated

159

from unbiased probability distributions of the systems with the weighted histogram

160

analysis method.39 The free energy for asphaltene dimerization was obtained from the

161

PMF curves, which was defined as the free energy when two asphaltene monomers

162

moved from very far away (i.e., distance larger than 1.8 nm when the interaction

163

between two monomers was negligible) to the most stable state with the lowest energy.

164

Accordingly to previous studies, the most stable structure of asphaltenes dimers were

165

formed when the two monomers were parallel stacked and the distance between each

166

other was about 0.4 nm.4, 40, 41 The absolute values of the dimerization energy were

167

used for the following data processing and results discussions. A larger value

168

suggested an easier aggregation of asphaltenes. Throughout simulations, the

169

temperature (298 K) and pressure (1 bar) were controlled using the V-rescale

170

thermostat42 and Parrinello-Rahman method43, respectively.

171 172

2.2 Data preprocessing

173

Structural parameters definition: In order to gain insights into the influence of

174

molecular structures on the binding energy of asphaltene dimer, eleven molecular

175

structural parameters were selected as follows: molecular weight (MW), number of

176

aromatic carbons (N Aro-C), number of aliphatic chains (N

9

ACS Paragon Plus Environment

chains),

number of aliphatic

Energy & Fuels 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

177

carbons (N Ali-C), number of carbons in the longest chains (N Ali-CL), mass percentage

178

of heteroatoms in the aliphatic chains (HA%) , mass percentage of N in the aromatic

179

ring (N%), mass percentage of S in the aromatic ring (S%), the number of hydroxyl

180

groups (NOH), the ratio of hydrogen to carbon (H/C), number of carboxyl groups

181

(NCOOH), and the linkage per aromatic ring where a linkage was counted when each

182

two aromatic rings shared one edge (Nlink/ Naro).

183

Normalization: To eliminate the effects from the units of different variables during

184

regression, the data was standardized to the same scale with Z-score normalization

185

according to the following equation: ∗ =

 − ̅

186

where ∗ and  represent the standardized value and initial value of the variable,

187

̅ is the mean value of the variable, is the standard error of the variable. The

188

standardized value of each parameter obeyed a normal distribution.

189

Cluster analysis: Cluster analysis was performed to split the dataset into several

190

classes using the WEKA program package44 with the expectation maximization

191

algorithm.45 The molecular structures were similar in one class and different from that

192

in other classes. Molecules (80%) were randomly selected from each class for

193

regression, while the remaining 20% molecules were used for external validation.

194

Structural parameters screening: when multiple influence factors coexisted, there

195

might be multicollinearity among them, which would impact the accuracy of

196

regression model and should be removed before model development. The first step

197

was to carry out bivariate correlation between each two parameters and between each

10

ACS Paragon Plus Environment

Page 10 of 37

Page 11 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Energy & Fuels

198

parameter and the dimerization free energy. If the correlation coefficient between two

199

parameters exceed 0.95, the one with poorer correlation with the dimerization free

200

energy was removed and the other was retained.46 Next step was to perform stepwise

201

multiple linear regression (SMLR) to identify the significant parameters to the

202

dimerization free energy among the above screened parameters.47, 48 Briefly, partial

203

F-statistics were computed for each parameter and the one with the highest F-value

204

was inserted into the model. The partial F-statistics were computed again for all the

205

remaining parameters and the one with the highest F-value was added to the model if

206

the corresponding F-value exceeded a specified threshold value. These two

207

parameters in the model were evaluated with partial F-test to see if each one was still

208

significant. The parameter was removed from the model if it was no longer significant.

209

This procedure was repeated for the remaining parameters until no more parameters

210

yielded a partial F-value greater than the threshold and all parameters in the model

211

remained significant.49 The parameters screening process was carried out with the

212

SPSS (version 22.0), from which the selected parameters were used for regression

213

models building in the next subsection.

214 215

2. 3 Regression and validation

216

Multiple linear regression (MLR): MLR was employed to predict the relationship

217

between the molecular structure parameters and the binding energy, which could be

218

expressed by the following equation: y =  +   + ⋯ +   +  , where the

219

,  ,…,  were the regression coefficients, and  was the intercede of the

11

ACS Paragon Plus Environment

Energy & Fuels 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

220

regression line.

221

Multi-layer perceptron (MLP): MLP is one of the most commonly used artificial

222

neural network (ANN) models, consisting of an input layer, hidden layers and an

223

output layer. Each layer consists of nodes that are connected with a certain weight to

224

every node in the following layer. Except for the input nodes, each node is a nonlinear

225

activation function that enables the network to compute complex nonlinear problems.

226

The connection weights are changed after each data passing through the nodes in the

227

network using the 'back propagation' technique. By this way, the error in the output

228

compared to the expected result is minimized. This process is termed as training,

229

which stops automatically when no more decrease in the errors of cross-validation

230

samples. To seek for the model with high precision, two parameters are optimized by

231

trial and error during the training process including (i) the learning rate, which

232

controls the degree of changes in the connection weights during each iteration,50 and

233

(ii) the number of nodes in the hidden layer, which ranges from (2n1/2 + m) to (2n + 1)

234

where n and m represented the number of input node and output node, respectively.51

235

Support vector regression (SVR): SVR is a regression technique with excellent

236

performances in regression and time series prediction application, which is capable of

237

rearranging the input data from nonlinear to linear using mathematical functions

238

known as kernels. In this study, the SMOreg with a RBF kernel or poly-kernel

239

function was also used to develop regression model, which could transform nominal

240

attributes into binary ones and had been successfully applied to predict the QSPR

241

models.52, 53 The complexity parameter, which determines the trade-off between the

12

ACS Paragon Plus Environment

Page 12 of 37

Page 13 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Energy & Fuels

242

model complexity and the tolerance of errors, was optimized by a number of trials.

243

Validation: The 10-fold cross-validation was used to test the performance of the

244

models.54 Briefly, the datasets used for training were evenly split into 10 folds. The

245

instances from 9 folds were used for training while the remaining one fold was used

246

for testing. This process was repeated 10 times using a different fold for testing at

247

each cycle. The effectiveness of the model training was assessed by the correlation

248

coefficient (R) and root mean squared error (RMSE). To validate the predictability of

249

the model, these two parameters were also calculated for those validation datasets not

250

used in the training process.

13

ACS Paragon Plus Environment

Energy & Fuels 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

251

3. Results and discussion

252

3.1 Free energy for asphaltene dimerization

253

Fig. 2 shows the changes in the PMF during the dimerization process of 28 asphaltene

254

molecules in water and toluene, respectively. Despite the variances in the shape of the

255

PMF curves, the deepest well of all the PMF curves were observed at 0.3 ~ 0.5 nm,

256

which corresponded to the separation distance between the centers of mass of each

257

two asphaltene molecules parallel stacking due to the π-π interactions in the aromatic

258

regions.4,

259

asphaltene dimerization and therefore offered the stable configuration for the

260

asphaltene dimers in both water and toluene. By a closer examination of the structure

261

of asphaltene dimers at different positions on the PMF curves (two examples are

262

shown in Fig. 3), we noted that the stacking manner between the two monomers

263

changed by rotation during pulling process. In other words, if two monomers were

264

initially located as the T-shape stacking, they would spontaneously transform to the

265

parallel stacking when they moved closer to each other.

266

The free energy of dimerization was determined by the difference between the

267

minimum value of PMF (r = 0.3 ~ 0.5 nm) and the value of PMF at higher distances

268

when it reached a plateau (r > 1.8 nm). The calculated binding energy of asphaltenes

269

with different molecular structures, which ranged from 22.3 to 81.5 kJ mol-1 in water

270

and from 2.5 to 39.0 kJ mol-1 in toluene (Fig. 2), respectively. Particularly, the

271

dimerization free energy for each specific type of asphaltene was 48 - 91% lower in

272

toluene than in water (Fig. 2). This might be attributed to the interactions between the

55, 56

It suggested that such type of stacking required low energy for

14

ACS Paragon Plus Environment

Page 14 of 37

Page 15 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Energy & Fuels

273

benzene ring of toluene molecules with the aromatic sheets of asphaltenes through π -

274

π interaction.57 It suggested a greater tendency for asphaltenes aggregation in water

275

than in toluene, which was consistent with previous studies.4, 56

276 277

3.2 Bivariate correlation and structural parameters screening

278

A preliminary examination of the bivariate correlation indicated that the absolute

279

value of the Pearson correlation coefficient between the free energy of asphaltene

280

dimerization in water and each individual structural parameter ranged from 0.068 to

281

0.459 (Table 1). The mass percentage of nitrogen in the aromatic rings was

282

characterized as the most significant parameter (p < 0.01), while relatively less

283

significance was found for the mass percentage of sulfur in the aromatic rings,

284

molecular weight, number of aromatic carbons, linkage per aromatic ring, and number

285

of aliphatic chains (p < 0.05). The remaining five parameters showed insignificant

286

influence on the asphaltene dimerization in water (p > 0.05). By contrast, only four

287

parameters were identified as significant factors influencing asphaltene dimerization

288

in toluene including the mass percentage of sulfur in the aromatic rings, number of

289

aromatic carbons, ratio of hydrogen to carbon and number of carboxyl groups. It

290

indicated that some insignificant parameters became significant after changing the

291

solvent. For example, the Pearson correlation coefficient between the dimerization

292

free energy in toluene and the number of carboxyl groups was 0.518, which decreased

293

to 0.053 in water (Table 1). It might be attributed to the fact that the hydrogen

294

bonding between the carboxylic groups of two asphaltene molecules was a driving

15

ACS Paragon Plus Environment

Energy & Fuels 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

295

force for its dimerization in toluene, but dimerization driven by this way was less in

296

water because the carboxylic groups in asphaltene would also form hydrogen bonds

297

with water.

298

Results also demonstrated that there were significant correlation among some

299

independent variables. For example, the molecular weight was significantly correlated

300

with all the other parameters except the linkage per aromatic ring, the mass

301

percentage of sulfur in the aromatic rings and the number of carboxyl groups (Table

302

1). The number of aliphatic carbons was highly correlated with the molecular weight,

303

the ratio of hydrogen to carbon and the number of carbon in the longest chains with

304

the corresponding Pearson correlation coefficient of 0.797, 0.883 and 0.824,

305

respectively. This meant that the changes in one structural parameter would result in

306

the changes in some other structural parameter. Therefore, when we tried to interpret

307

the effects of one specific structural parameter on the asphaltene aggregation, we

308

should take into account the multi-factor interactions between this parameter and

309

other parameters. Moreover, the Pearson correlation coefficients among independent

310

variables were not large enough to allow removing any one before input to the

311

regression models. According to the parameter screening protocol (see section 2.2), it

312

remained impossible at this stage to remove any of these structural parameters

313

because the Pearson correlation coefficients between each two structural parameters

314

were all less than 0.95 (Table 1).

315

To reduce the parameters to a suitable number without losing any important

316

information, we further performed SMLR analysis using the dimerization free energy

16

ACS Paragon Plus Environment

Page 16 of 37

Page 17 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Energy & Fuels

317

data in the water and toluene, respectively. After removal of the non-significant

318

parameters, seven and five parameters were selected to correlate with dimerization

319

energy in water and in toluene, respectively (Table 2). The variance inflation factor

320

(VIF) was less than the

321

parameters were of no multicollinearity.58

322

According to the similarity in the molecular structure, the 28 asphaltene molecules

323

were divided into three representative classes by cluster analysis, in which the number

324

of asphaltene molecules were 8, 8 and 12, respectively. From each class, 80%

325

molecules were randomly selected for regression model development and the

326

remaining molecules were used for model validation.

threshold value of 10 suggesting that the remaining

327 328

3.3 Prediction of asphaltene dimerization in water

329

MLR: The free energy of asphaltene dimerization in water was correlated with the

330

screened structural parameters using the MLR model as follows: Ewater = 0.6168

331

NAro-C + 0.5819 Nlink/Naro - 0.3979 Nali-C - 0.2488 HA% - 0.4006 N% + 0.4042 S% +

332

0.695 Nchains - 0.0012. The R and RMSE for the training dataset calculated by 10-fold

333

cross validation was 0.9231 and 0.3624, respectively (Table 3). These two parameters

334

were about 4% lower and 80% higher for the validation dataset, respectively,

335

compared with that during model training. It should be noted that the energy predicted

336

from this equation was dimensionless due to the normalization treatment before

337

modelling. It could be transformed to the energy with units (kJ mol-1) using the mean

338

value and standard error as indicated in Section 2.2. The input and predicted free

17

ACS Paragon Plus Environment

Energy & Fuels 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

339

energies are shown in Fig. 4.

340

MLP: The well trained MLP network consisted of 7, 6 and 1 nodes in the input layer,

341

hidden layer and output layer, respectively. The optimized learning rate was 0.02.

342

Higher or lower learning rate both resulted in larger error in the training and

343

validation datasets. For example, when the learning rate increased from 0.02 to 0.05,

344

the RMSE for the validation data almost doubled (Fig. 5). After training the data with

345

these optimal parameters, the R and RMSE was 0.9247 and 0.3608, respectively (Fig.

346

4b). Little difference was observed in the R between training and validation data, but

347

the RMSE was about 14% higher in the latter.

348

SVR: The optimized SVR model was obtained using the complexity parameter of

349

1200 and the RBF Kernel with gamma of 0.008. The R for the training and validation

350

datasets was 0.9462 and 0.9104, respectively (Table 3). The RMSE for the training

351

and validation datasets was 0.3055 and 0.5173, respectively.

352

Model comparison: The performance of the three models for predicting the

353

dimerization free energy in water was compared by the RMSE for the validation

354

dataset (Table 3). It demonstrated that the MLP model had the best performance with

355

the lowest RMSE. The highest RMSE was observed in the MLR model, which was 13

356

- 32% higher than that in the other two models. This finding suggested that the

357

influences of asphaltene molecular structure on its aggregation in water was more

358

likely nonlinear, while attempts to capture the relationship using a linear regression

359

method might resulted in larger error. Accordingly, the relative contribution of the

360

structure descriptors to the predictive dimerization free energy was calculated using

18

ACS Paragon Plus Environment

Page 18 of 37

Page 19 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Energy & Fuels

361

the hidden-input and hidden-output connection weights resulted from the MLP

362

modelling.25 As shown in Fig. 6a, the linkage per aromatic ring was characterized as

363

the most important factor which accounted for 18% of the changes in the dimerization

364

free energy. Similar importance was found between the number of aromatic carbons

365

and the number of aliphatic chains. The least important factor was the number of

366

aliphatic carbons which contributed to less than 10% of the asphaltene dimerization.

367 368

3.4 Prediction of asphaltene dimerization in toluene

369

MLR: The free energy of asphaltene dimerization in toluene was correlated with the

370

five selected parameters using the MLR model as follows: Etoluene = 0.5192 NAro-C +

371

0.1745 Nlink/Naro + 0.5757 S% + 0.4413 NCOOH + 0.1817 Nchains - 0.0203. The R and

372

RMSE for the training dataset was 0.8931 and 0.4436, respectively (Table 3). For the

373

validation dataset, little change was found in the R while the RMSE increased by

374

about 6%. The input and predicted free energies are shown in Fig. 7.

375

MLP: The well trained MLP network consisted of 5, 6 and 1 nodes in the input layer,

376

hidden layer and output layer, respectively. The optimized learning rate was 0.04. The

377

R and RMSE for the training dataset was 0.8934 and 0.4217, respectively (Table 3).

378

Little difference was observed in the R between training and validation data, but the

379

RMSE was about 10% higher in the latter.

380

SVR: The optimized SVR model was obtained using the polynomial kernel with

381

exponent of 0.72. The R for the training and validation datasets was 0.9114 and

382

0.8989, respectively (Table 3). The RMSE for the training datasets was 0.3663 which

19

ACS Paragon Plus Environment

Energy & Fuels 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

383

increased by 30% for the validation dataset.

384

Model comparison: As we did in the water, the predictability of the three models for

385

dimerization free energy in toluene was compared by the RMSE for the validation

386

dataset (Table 3). The highest RMSE was found in the SVR model, while very little

387

difference (~ 1%) was observed between MLR and MLP. Same tendency was

388

observed in the relative contribution of each structural parameter to the predicted

389

dimerization free energy between MLR and MLP (Fig. 6b). As suggested by both

390

models, the mass percentage of sulfur in the aromatic ring and the number of aromatic

391

carbons were recognized as the two most important factor which contributed to more

392

than 55% for the asphaltene aggregation (Fig. 6b).

393

It was interesting to find that (i) the number of aromatic carbons was strongly

394

correlated with the asphaltene dimerization both in water and toluene, suggesting the

395

attraction between poly-aromatic cores was the main driving force in both cases; (ii)

396

the number of carboxyl groups had ignorable influence on the asphaltene aggregation

397

in water, but it significantly influence the asphaltene aggregation in toluene with

398

about 20% contributions. This was attributed to the hydrogen bonding phenomenon as

399

aforementioned in the bivariate correlation; (iii) the top two most important factors

400

(linkage per aromatic ring and number of aliphatic chains) for asphaltene aggregation

401

in water became the least important in toluene. It suggested that the interactions

402

between aliphatic chains contributed little to the strength of aggregation in toluene,

403

but played very important roles for aggregation in water. This finding was consistent

404

with previous study which demonstrated that the contacts between aliphatic chains in

20

ACS Paragon Plus Environment

Page 20 of 37

Page 21 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Energy & Fuels

405

water was 2.6 - 5.7 folds of that in toluene. 22, 59

406

Overall results indicated that the combination of MD simulation with machine

407

learning provided a potential way to predict asphaltene dimerization using the

408

molecular structure parameters. It is well-acknowledged that asphaltene aggregation

409

highly depend upon its molecular structure and efforts have been made to identify the

410

sensitivity of asphaltene aggregation toward the molecular architecture. However, the

411

correlation between asphaltene binding energy and multiple structure parameters with

412

statistical significance has not been well determined. This study is one of the first

413

attempts to do so. Unlike conventional statistical regression methods, machine

414

learning approaches do not rely on any assumption of thermodynamic equation before

415

modelling. For example, asphaltene aggregation was often hypothesized as a first or

416

pseudo-first order reaction in some reported models.60,

417

approximation was simple enough, the value of the rate constant was uncertain or

418

even lack for asphaltenes with various molecular structures. By contrast, there is not a

419

parameter like rate constant during machine learning prediction. Instead, it is able to

420

distinguish the given data based on their different patterns and make useful decisions

421

in new data. Therefore, machine learning methods do not have the issue of rate

422

constant encountered in statistical methods. Such methods are particularly useful to

423

handle problems with high non-linearity that will increase the difficulty in

424

determining the rate constant or the form of thermodynamic equations for asphaltene

425

aggregation. While not all parameters associated with the molecular architecture are

426

necessary for understanding the asphaltene aggregation phenomenon, its prediction

21

ACS Paragon Plus Environment

61

Even though such

Energy & Fuels 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

427

still requires detailed informative data in order to assess to what extent each parameter

428

contribute to the aggregation. In the approach proposed in this study, the concomitant

429

effects of parameters were taken into account instead of only focusing one or two

430

parameters by fixing the others. The study helps to identify and reduce the number of

431

parameters that need to focus for predicting asphaltene aggregation.

432 433

4. Conclusions

434

Results demonstrated that the dimerization strength for each type of asphaltene was

435

48 - 91% lower in toluene than in water. Asphaltenes association in water was

436

significantly influenced by seven structural parameters including the linkage per

437

aromatic ring, number of aliphatic chains, number of aromatic carbons, number of

438

aliphatic carbons, mass percentage of heteroatoms in the aliphatic chains, mass

439

percentage of N in the aromatic ring and mass percentage of S in the aromatic ring.

440

The first three were characterized as the most important parameters by the MLP

441

model (R = 0.9243) which outperformed the MLR and SVR models. Five structural

442

parameters were screened for predicting asphaltene association in toluene including

443

the mass percentage of S in the aromatic ring, number of aromatic carbons, number of

444

carboxyl groups, linkage per aromatic ring and number of aliphatic chains, while the

445

first two parameters contributed to 55% of the energy variation for dimerization.

446

Comparison between the two solvents highlighted the strong contribution of the

447

aliphatic side chains to the asphaltene aggregation in water, which was insensitive in

448

toluene. Overall results from this study provided a sound basis for improving the

22

ACS Paragon Plus Environment

Page 22 of 37

Page 23 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Energy & Fuels

449

asphaltene phase behavior prediction. Further works are needed to investigate how

450

microscopic predictions can be integrated into macroscopic deposition models to

451

allow easy integration with commercially available multiphase flow prediction tools.

452 453

Acknowledgements

454

This study was financially supported by the Shenzhen Peacock Plan Research Grant

455

(KQJSCX20170330151956264) and the Fundamental Research Project of Shenzhen,

456

China (JCYJ20160513103756736).

457

23

ACS Paragon Plus Environment

Energy & Fuels 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

458

References

459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501

(1) Adams, J. J., Asphaltene Adsorption, a Literature Review. Energy Fuels 2014, 28, 2831-2856. (2) Zhang, X.; Pedrosa, N.; Moorwood, T., Modeling asphaltene phase behavior: comparison of methods for flow assurance studies. Energy Fuels 2012, 26, 2611-2620. (3) Li, X.; He, L.; Wu, G.; Sun, W.; Li, H.; Sui, H., Operational Parameters, Evaluation Methods, And Fundamental Mechanisms: Aspects of Nonaqueous Extraction of Bitumen from Oil Sands. Energy Fuels 2012, 26, 3553-3563. (4) Zhu, X.; Chen, D.; Wu, G., Molecular dynamic simulation of asphaltene co-aggregation with humic acid during oil spill. Chemosphere 2015, 138, 412-421. (5) Zhu, X.; Chen, D.; Wu, G., Insights into asphaltene aggregation in the Na-montmorillonite interlayer. Chemosphere 2016, 160, 62-70. (6) Sedghi, M.; Goual, L.; Welch, W.; Kubelka, J., Effect of asphaltene structure on association and aggregation using molecular dynamics. J. Phys. Chem. B 2013, 117, 5765-5776. (7) Mullins, O. C., Review of the molecular structure and aggregation of asphaltenes and petroleomics. Spe. J. 2008, 13, 48-57. (8) Roux, J. N.; Broseta, D.; Demé, B., SANS study of asphaltene aggregation: concentration and solvent quality effects. Langmuir 2001, 17, 5085-5092. (9) Maqbool, T.; Srikiratiwong, P.; Fogler, H. S., Effect of temperature on the precipitation kinetics of asphaltenes. Energy Fuels 2011, 25, 694-700. (10) Espinat, D.; Fenistein, D.; Barre, L.; Frot, D.; Briolant, Y., Effects of temperature and pressure on asphaltenes agglomeration in toluene. A light, X-ray, and neutron scattering investigation. Energy fuels 2004, 18, 1243-1249. (11) Kuznicki, T.; Masliyah, J. H.; Bhattacharjee, S., Aggregation and partitioning of model asphaltenes at toluene-water interfaces: Molecular dynamics simulations. Energy Fuels 2009, 23, 5027-5035. (12) Rane, J. P.; Harbottle, D.; Pauchard, V.; Couzis, A.; Banerjee, S., Adsorption kinetics of asphaltenes at the oil–water interface and nanoaggregation in the bulk. Langmuir 2012, 28, 9986-9995. (13) Groenzin, H.; Mullins, O. C., Molecular size and structure of asphaltenes from various sources. Energy Fuels 2000, 14, 677-684. (14) da Costa, L. M.; Stoyanov, S. R.; Gusarov, S.; Tan, X.; Gray, M. R.; Stryker, J. M.; Tykwinski, R.; de M. Carneiro, J. W.; Seidl, P. R.; Kovalenko, A., Density Functional Theory Investigation of the Contributions of π–π Stacking and Hydrogen-Bonding Interactions to the Aggregation of Model Asphaltene Compounds. Energy Fuels 2012, 26, 2727-2735. (15) Murgich, J., Intermolecular Forces in Aggregates of Asphaltenes and Resins. Petrol. Sci. Technol. 2002, 20, 983-997. (16) Tan, X.; Fenniri, H.; Gray, M. R., Pyrene derivatives of 2, 2-bipyridine as models for asphaltenes: synthesis, characterization, and supramolecular organization. Energy Fuels 2008, 22, 715-720. (17) Rogel, E., Simulation of interactions in asphaltene aggregates. Energy Fuels 2000, 14, 566-574.. (18) Ungerer, P.; Rigby, D.; Leblanc, B.; Yiannourakou, M., Sensitivity of the aggregation behaviour of asphaltenes to molecular weight and structure using molecular dynamics. Mol. Simulat. 2014, 40, 115-122. (19) Liao, Z.; Zhou, H.; Graciaa, A.; Chrostowska, A.; Creux, P.; Geng, A., Adsorption/occlusion characteristics of asphaltenes: Some implication for asphaltene structural features. Energy fuels 2005, 19, 180-186. (20) Carauta, A. N.; Correia, J. C.; Seidl, P. R.; Silva, D. M., Conformational search and dimerization 24

ACS Paragon Plus Environment

Page 24 of 37

Page 25 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Energy & Fuels

502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545

study of average structures of asphaltenes. J. Mol. Struc. THEOCHEM 2005, 755, 1-8. (21) Mullins, O. C., The modified Yen model. Energy Fuels 2010, 24, 2179-2207. (22) Jian, C.; Tang, T.; Bhattacharjee, S., Probing the Effect of Side-Chain Length on the Aggregation of a Model Asphaltene Using Molecular Dynamics Simulations. Energy Fuels 2013, 27, 2057-2067. (23) Katritzky, A. R.; Maran, U.; Lobanov, V. S.; Karelson, M., Structurally diverse quantitative structure− property relaKonship correlaKons of technologically relevant physical properties. J. Chem. Inf. Comp. Sci. 2000, 40, 1-18. (24) Sjöblom, J.; Simon, S.; Xu, Z., Model molecules mimicking asphaltenes. Adv. Colloid. Interfac. 2015, 218, 1-16. (25) Wu, G.; Kechavarzi, C.; Li, X.; Wu, S.; Pollard, S. J. T.; Sui, H.; Coulon, F., Machine learning models for predicting PAHs bioavailability in compost amended soils. Chem. Eng. J. 2013, 223, 747-754. (26) Sui, H.; Li, L.; Zhu, X.; Chen, D.; Wu, G., Modeling the adsorption of PAH mixture in silica nanopores by molecular dynamic simulation combined with machine learning. Chemosphere 2016, 144, 1950-1959. (27) Wu, G.; Coulon, F., Modelling the Environmental Fate of Petroleum Hydrocarbons During Bioremediation. In: McGenity TJ, Timmis KN, Nogales B, editors. Hydrocarbon and Lipid Microbiology Protocols: Statistics, Data Analysis, Bioinformatics and Modelling, Springer Berlin Heidelberg; 2016, pp 165-180. (28) Van Der Spoel, D.; Lindahl, E.; Hess, B.; Groenhof, G.; Mark, A. E.; Berendsen, H. J., GROMACS: fast, flexible, and free. J. Comput. Chem. 2005, 26, 1701-1718. (29) Vanommeslaeghe, K.; Hatcher, E.; Acharya, C.; Kundu, S.; Zhong, S.; Shim, J.; Darian, E.; Guvench, O.; Lopes, P.; Vorobyov, I.; Mackerell, A. D., Jr., CHARMM general force field: A force field for drug-like molecules compatible with the CHARMM all-atom additive biological force fields. J. Comput. Chem. 2010, 31, 671-90. (30) Rane, K. S.; Kumar, V.; Wierzchowski, S.; Shaik, M.; Errington, J. R., Liquid–Vapor Phase Behavior of Asphaltene-like Molecules. Ind. Eng. Chem. Res. 2014, 53, 17833-17842. (31) Gao, F.; Xu, Z.; Liu, G.; Yuan, S., Molecular Dynamics Simulation: The Behavior of Asphaltene in Crude Oil and at the Oil/Water Interface. Energy Fuels 2014, 28, 7368-7376. (32) Mullins, O. C., The asphaltenes. Ann. Rev. Anal. Chem. 2011, 4, 393-418. (33) Aray, Y.; Hernández-Bravo, R.; Parra, J. G.; Rodríguez, J.; Coll, D. S., Exploring the structure– solubility relationship of asphaltene models in toluene, heptane, and amphiphiles using a molecular dynamic atomistic methodology. J. Phys. Chem. A 2011, 115, 11495-11507. (34) Silva, H. S.; Sodero, A. C.; Bouyssiere, B.; Carrier, H.; Korb, J.-P.; Alfarra, A.; Vallverdu, G.; Bégué, D.; Baraille, I., Molecular Dynamics Study of Nanoaggregation in Asphaltene Mixtures: Effects of the N, O, and S Heteroatoms. Energy Fuels 2016, 30, 5656-5664. (35) Mullins, O. C.; Sabbah, H.; Eyssautier, J.; Pomerantz, A. E.; Barré, L.; Andrews, A. B.; Ruiz-Morales, Y.; Mostowfi, F.; McFarlane, R.; Goual, L., Advances in asphaltene science and the Yen–Mullins model. Energy Fuels 2012, 26, 3986-4003. (36) Sabbah, H.; Morrow, A. L.; Pomerantz, A. E.; Zare, R. N., Evidence for island structures as the dominant architecture of asphaltenes. Energy Fuels 2011, 25, 1597-1604. (37) Schuler, B.; Meyer, G.; Pena, D.; Mullins, O. C.; Gross, L., Unraveling the Molecular Structures of Asphaltenes by Atomic Force Microscopy. J. Am. Chem. Soc. 2015, 137, 9870-6. (38) Kästner, J., Umbrella sampling. WIRES Comput. Mol. Sci. 2011, 1, 932-942. (39) Kumar, S.; Rosenberg, J. M.; Bouzida, D.; Swendsen, R. H.; Kollman, P. A., The weighted histogram

25

ACS Paragon Plus Environment

Energy & Fuels 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589

analysis method for free‐energy calculations on biomolecules. I. The method. J. Comput. Chem. 1992, 13, 1011-1021. (40) Alvarez-Ramirez, F.; Ramirez-Jaramillo, E.; Ruiz-Morales, Y., Calculation of the interaction potential curve between asphaltene-asphaltene, asphaltene-resin, and resin-resin systems using density functional theory. Energy Fuels 2006, 20, 195-204. (41) Pahlavan, F.; Mousavi, M.; Hung, A.; Fini, E. H., Investigating molecular interactions and surface morphology of wax-doped asphaltenes. Phys. Chem. Chem. Phys. 2016, 18, 8840-8854. (42) Bussi, G.; Donadio, D.; Parrinello, M., Canonical sampling through velocity rescaling. J. Chem. Phys. 2007, 126, 014101. (43) Parrinello, M.; Rahman, A., Strain fluctuations and elastic constants. J. Chem. Phys. 1982, 76, 2662-2666. (44) Frank, E.; Hall, M.; Holmes, G.; Kirkby, R.; Pfahringer, B.; Witten, I. H.; Trigg, L., Weka-a machine learning workbench for data mining. In: Maimon O, Rokach L, editors. Data mining and knowledge discovery handbook. Springer, Boston,MA; 2009, pp 1269-1277. (45) Do, C. B.; Batzoglou, S., What is the expectation maximization algorithm. Nat. Biotechnol. 2008, 26, 897-899. (46) Samari, F.; Yousefinejad, S., Quantitative structural modeling on the wavelength interval (Δλ) in synchronous fluorescence spectroscopy. J. Mol. Struct. 2017, 1148, 101-110. (47) Riahi, S.; Mousavi, M.; Shamsipur, M., Prediction of selectivity coefficients of a theophylline-selective electrode using MLR and ANN. Talanta 2006, 69, 736-740. (48) Bordbar, M.; Ghasemi, J.; Faal, A. Y.; Fazaeli, R., Chemometric modeling to predict aquatic toxicity of benzene derivatives using stepwise-multi linear regression and partial least square. Asian J. Chem. 2013, 25, 331. (49) Fatemi, M. H.; Gharaghani, S.; Mohammadkhani, S.; Rezaie, Z., Prediction of selectivity coefficients of univalent anions for anion-selective electrode using support vector machine. Electrochim. Acta 2008, 53, 4276-4282. (50) Khandelia, H.; Mouritsen, O. G., Lipid gymnastics: evidence of complete acyl chain reversal in oxidized phospholipids from molecular simulations. Biophys. J. 2009, 96, 2734-2743. (51) Fletcher, D.; Goss, E., Forecasting with neural networks: an application using bankruptcy data. Inform. Manag. 1993, 24, 159-167. (52) Chauhan, J. S.; Dhanda, S. K.; Singla, D.; Agarwal, S. M.; Raghava, G. P.; Consortium, O. S. D. D., QSAR-based models for designing quinazoline/imidazothiazoles/pyrazolopyrimidines based inhibitors against wild and mutant EGFR. PLoS One 2014, 9, e101079. (53) Singla, D.; Anurag, M.; Dash, D.; Raghava, G. P., A web server for predicting inhibitors against bacterial target GlmU protein. BMC pharmacol. 2011, 11, 5. (54) Refaeilzadeh, P.; Tang, L.; Liu, H., Cross-validation. In Encyclopedia of database systems, Springer: 2009; pp 532-538. (55) Pisula, W.; Tomovic, Z.; Simpson, C.; Kastler, M.; Pakula, T.; Müllen, K., Relationship between core size, side chain length, and the supramolecular organization of polycyclic aromatic hydrocarbons. Chem. Mater. 2005, 17, 4296-4303. (56) Kuznicki, T.; Masliyah, J. H.; Bhattacharjee, S., Molecular dynamics study of model molecules resembling asphaltene-like structures in aqueous organic solvent systems. Energy Fuels 2008, 22, 2379-2389. (57) Headen TF, Boek ES, Skipper NT. Evidence for asphaltene nanoaggregation in toluene and

26

ACS Paragon Plus Environment

Page 26 of 37

Page 27 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Energy & Fuels

590 591 592 593 594 595 596 597 598 599 600

heptane from molecular dynamics simulations. Energy Fuels 2009, 23, 1220-1229. (58) Dormann, C. F.; Elith, J.; Bacher, S.; Buchmann, C.; Carl, G.; Carré, G.; Marquéz, J. R. G.; Gruber, B.; Lafourcade, B.; Leitão, P. J., Collinearity: a review of methods to deal with it and a simulation study evaluating their performance. Ecography 2013, 36, 27-46. (59) Jian, C.; Tang, T.; Bhattacharjee, S., Molecular Dynamics Investigation on the Aggregation of Violanthrone78-based Model Asphaltenes in Toluene. Energy Fuels 2014, 28, 3604-3613. (60) Kurup, A. S.; Vargas, F. M.; Wang, J.; Buckley, J.; Creek, J. L.; Subramani, H., J; Chapman, W. G., Development and application of an asphaltene deposition tool (ADEPT) for well bores. Energy Fuels 2011, 25, 4506-4516. (61) Vargas, F. M.; Creek, J. L.; Chapman, W. G., On the development of an asphaltene deposition simulator. Energy Fuels 2010, 24, 2294-2299.

601

27

ACS Paragon Plus Environment

Energy & Fuels 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47

Page 28 of 37

Table 1 Pearson correlation matrix of different variables (** represents P < 0.01, * represents 0.01 < P < 0.05) MW

N Aro-C

Nlink/ Naro

N Ali-C

HA%

N%

S%

NOH

H/C

NCOOH

N chains

N Ali-CL

MW

1.000

N Aro-C

0.428*

1.000

Nlink/ Naro

0.183

-0.070

1.000

N Ali-C

0.797**

-0.054

0.329*

1.000

HA%

0.461**

0.200

0.039

0.286

1.000

N%

-0.475*

-0.120

-0.178

-0.423*

-0.309

1.000

S%

-0.081

0.096

-0.277

-0.259

-0.283

-0.212

1.000

NOH

-0.166

-0.508**

-0.152

0.083

-0.194

-0.182

0.115

1.000

H/C

0.621**

-0.357*

0.144

0.883**

0.244

-0.287*

-0.250

0.287

1.000

NCOOH

-0.189

0.095

-0.199

-0.430*

-0.176

0.216

0.147

-0.138

-0.515**

1.000

N chains

0.418*

-0.271

-0.128

0.511**

-0.093

-0.099

0.033

0.263

0.673**

-0.154

1.000

N Ali-CL

0.760**

0.057

0.346*

0.824**

0.403*

-0.346*

-0.287

-0.089

0.731**

-0.351*

0.207

1.000

Energy in water

0.387*

0.350*

0.318*

0.170

-0.232

-0.459** 0.425*

-0.139

0.002

0.053

0.346*

0.068

Energy in toluene

0.228

0.561**

-0.053

-0.176

-0.134

-0.263

-0.404*

0.518** 0.009

0.608** -0.212

28

ACS Paragon Plus Environment

-0.234

Page 29 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47

Energy & Fuels

Table 2 Screening of the structural parameters by stepwise multiple linear regression

(√ represents the parameters selected for the regression; X represents the parameters deleted before input to the regression model) Solvent

MW

N Aro-C

Nlink/ Naro

N Ali-C

HA%

N%

S%

H/C

NOH

NCOOH

N chains

N Ali-CL

VIF

Water

X













X

X

X



X

2.554

Toluene

X





X

X

X



X

X





X

1.148

29

ACS Paragon Plus Environment

Energy & Fuels 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 30 of 37

Table 3 Performance comparison between MLR, MLP and SVR models

Solvent

Model

Parameters

Training dataset

Validation dataset

Water

MLR

R

0.9231

0.8907

RMSE

0.3624

0.6506

R

0.9247

0.9243

RMSE

0.3608

0.4975

R

0.9462

0.9104

RMSE

0.3055

0.5173

R

0.8931

0.9061

RMSE

0.4436

0.4687

R

0.8934

0.9048

RMSE

0.4217

0.4640

R

0.9114

0.8989

RMSE

0.3663

0.4769

MLP

SVR

Toluene

MLR

MLP

SVR

30

ACS Paragon Plus Environment

Page 31 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Energy & Fuels

A01

A02

A04

A03

S

S

S N

A05

A06

A10-12

A07-09 O

HO

OH

O X

X X

S

X

A14

X

X = O, N, S

X = O, N, S

A13

X

A16

A15

S

N

S

N N H

N

S

A17

A18

HOOC

O

O

N

N

O

O

OH

A19

A24

A21-23

A20

O

C 7H15 C 7H15

O

O

S

O

O

N

O

O OO

CnH2n+1

CnH2n+1

O

n = 1, 4, 8

A25

A26

O

A27

A28

O

HO

HO

S

N O

N

N

HO

OH

Fig. 1 Molecular structure of the 28 types of asphaltene models used in the study

31

ACS Paragon Plus Environment

Energy & Fuels

-2 0

-1

-1

15

(a ) PMF (kJ mol )

0

PMF (kJ mol )

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47

-4 0 -6 0

Page 32 of 37

(b )

A01 A03 A05 A07 A09 A11 A13 A15 A17 A19 A21 A23 A25 A27

0

-1 5

-3 0

-8 0 0 .0

0 .5

1 .0

1 .5

-4 5 0 .0

2 .0

0 .6

Ewater Etoluene Ewater Etoluene

A02 76.3 19.4 A16 22.3 2.5

A03 57.4 17.2 A17 34.7 17.6

A04 55.0 8.8 A18 49.9 7.9

1 .8

Z (n m )

Z (n m ) A01 72.3 14.9 A15 81.4 28.0

1 .2

A02 A04 A06 A08 A10 A12 A14 A16 A18 A20 A22 A24 A26 A28

A05 49.7 13.3 A19 72.6 16.8

A06 81.5 22.6 A20 65.5 15.8

A07 67.6 28.0 A21 52.6 18.6

A08 45.1 21.5 A22 47.4 17.3

A09 74.3 39.0 A23 39.7 13.4

A10 62.4 18.6 A24 43.0 18.7

A11 32.1 13.7 A25 51.9 21.3

Fig. 2 PMF curves of asphaltene dimers in (a) water and (b) toluene. Data in the table shows the free energy of dimerization (kJ mol-1) for each type of asphaltene.

32

ACS Paragon Plus Environment

A12 77.1 28.5 A26 48.8 13.8

A13 29.9 2.7 A27 40.6 8.1

A14 49.5 16.4 A28 42.8 14.6

Page 33 of 37

10

0 (a) A06

(b) A17 0 PMF (kJ mol )

-20

-1

PMF (kJ mol-1)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47

Energy & Fuels

-40 -60

-10 -20 -30

-80 0.4

0.8

1.2 Z (nm)

1.6

-40 0.0

2.0

0.4

0.8

1.2

1.6

2.0

Z (nm)

Fig. 3 PMF and structure of asphaltene dimers in water (A06 and A17 represent two examples selected from the 28 types of asphaltenes)

33

ACS Paragon Plus Environment

Energy & Fuels

80

(a) MLR

60 40 Training Validation

20 20

Predicted values (kJ mol-1)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 34 of 37

80

40

60

80

(b) MLP

60 40 Training Validation

20 20

40

60

80

80 (c) SVR 60 40 Training Validation

20 20

40

60

80

Input values (kJ mol-1) Fig. 4 Input and predicted values of asphaltene dimerization energy in water

34

ACS Paragon Plus Environment

Page 35 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Energy & Fuels

1.6

R_TR RMSE_TR R_VR RMSE_VR

1.2

0.8

0.4

0.0 0.001 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10

Learning rate

Fig. 5 Variation of R and RMSE for the training (TR) and validation (VR) datasets with different learning rates during MLP training process

35

ACS Paragon Plus Environment

Energy & Fuels 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47

(a)

Nali-C

Page 36 of 37

Nchains

MLP MLR

S% N%

Nlink/Naro

HA%

NCOOH

Nchains

NAro-C

(b)

NAro-C S%

Nlink/ Naro 0

5 10 15 Relative importance (%)

0

20

7

14 21 28 Relative importance (%)

35

Fig. 6 Relative importance of structural parameters to the asphaltene dimerization in (a) water and (b) toluene. In panel A, only the results calculated by MLP model are shown because this model outperforms the other models.

36

ACS Paragon Plus Environment

Page 37 of 37

40 (a) MLR 30 20 10 0

Predicted values (kJ mol-1)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Energy & Fuels

Training Validation

0

10

20

30

40

(b) MLP

40 30 20 10 0

Training Validation

0

10

20

30

40

40 (c) SVR 30 20 10 Training Validation

0

0

10

20

30

40

Input values (kJ mol-1) Fig. 7 Input and predicted values of asphaltene dimerization energy in toluene

37

ACS Paragon Plus Environment