Subscriber access provided by UNIV OF DURHAM
Fossil Fuels
Correlating asphaltene dimerization with its molecular structure by potential of mean force calculation and data mining Xinzhe Zhu, Guozhong Wu, Frederic Coulon, Lvwen Wu, and Daoyi Chen Energy Fuels, Just Accepted Manuscript • DOI: 10.1021/acs.energyfuels.8b00470 • Publication Date (Web): 11 Apr 2018 Downloaded from http://pubs.acs.org on April 11, 2018
Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.
is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Page 1 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Energy & Fuels
1
Correlating asphaltene dimerization with its molecular structure
2
by potential of mean force calculation and data mining
3
Xinzhe Zhu a, b, Guozhong Wu a, b, Frédéric Coulon c, Lvwen Wu d, Daoyi Chen a, b, *
4 5 6 7 8 9 10 11 12
a
Division of Ocean Science and Technology, Graduate School at Shenzhen, Tsinghua
University, Shenzhen 518055, China b
School of Environment, Tsinghua University, Beijing 100084, China
c
School of Water, Energy and Environment, Cranfield University, Cranfield MK43
0AL, UK d
School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong
University, Shanghai 200240, China
13 14 15
*Corresponding Author:
16
Phone: +86 0755 2603 0544
17
Fax: +86 0755 2603 0544
18
Email:
[email protected] 19 20 21
1
ACS Paragon Plus Environment
Energy & Fuels
22
0 in water in toluene
-1
PMF (kJ mol )
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
Page 2 of 37
Molecular structure
-40 ∆G
-80 0.0
0.5
1.0 1.5 Distance (nm)
MACHINE LEARNING
2
ACS Paragon Plus Environment
2.0
Dimerization prediction
Page 3 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Energy & Fuels
23
Abstract: Asphaltene aggregation affects the entire production chain of
24
petrochemical industry, which also poses environmental challenges for oil pollution
25
remediation. The aggregation process has been investigated for decades, but it
26
remains unclear how the free energy of asphaltene association in solvents is correlated
27
with its molecular structure. In this study, dimerization energy of 28 types of
28
asphaltenes in water and toluene were calculated using the umbrella sampling method.
29
Structural parameters related to the atom types and functional groups were screened to
30
identify the factors most influencing the dimerization energy using multiple linear
31
regression,
32
demonstrated that the influences of molecular structure on asphaltene association in
33
water was nonlinear, while attempts to capture the relationship using linear regression
34
had large error. The linkage per aromatic ring, number of aromatic carbons and
35
aliphatic chains were the top three factors accounting for 52% of the dimerization
36
energy variation in water. Asphaltene dimerization in toluene was dominated by the
37
content of sulfur in aromatic rings and the number of aromatic carbons which
38
contributed to 55% of the energy variation. To the best of our knowledge, this was the
39
first study successfully predicting asphaltene dimerization using molecular structure
40
(R > 0.9) and quantifying simultaneously the relative importance of each structural
41
parameter. The proposed modelling approach supported the decision making on the
42
number of structural parameters to investigate for predicting asphaltene aggregation.
43
Keywords: Asphaltene aggregation; PMF; Multiple linear regression; Multi-layer
44
perceptron; Support vector regression
multi-layer
perceptron
and
support
vector
3
ACS Paragon Plus Environment
regression.
Results
Energy & Fuels 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
45
1. Introduction
46
Asphaltenes are the largest, heaviest and most polar fraction of crude oil, which
47
usually contain poly-condensed aromatic ring linked by aliphatic chains and
48
heteroatoms like nitrogen, sulfur and oxygen. Such structural properties make them
49
easy to aggregate, flocculate and deposit even at low concentrations, which affects the
50
entire production chain of petrochemical industry.1 A large number of studies have
51
been carried out to address the flow-assurance issues arose from asphaltene
52
aggregation.2 Our recent studies also demonstrated the environmental significance of
53
asphaltene aggregation process on the fate and transport of oil pollutants in the
54
contaminated soils. For example, asphaltene aggregates could form porous network in
55
which the small oil molecules are sequestrated and become difficult to remove. 3 They
56
also have the tendency to co-aggregate with the soil organic matter (SOM), a key
57
component of soil responsible for the persistence of recalcitrant residual oils in the
58
oil-soil matrix, at the oil-water interface and in the bulk water.4 The formation of
59
asphaltene-humic acids complex decrease the diffusion coefficient and increases the
60
hydrophilicity of the asphaltenes in the clay pores.5 These findings highlight the
61
potential role of asphaltene aggregation on the extractability and bioavailability of the
62
residual oil in the aged soil.
63
Successful prediction of asphaltene macro-scale behaviour requires an understanding
64
of the aggregation in terms of micro-scale mechanism and aggregation strength.
65
Without this fundamental knowledge, any attempt to model the environmental
66
behaviour of asphaltenes will be incomplete or inaccurate. Previous studies indicated
4
ACS Paragon Plus Environment
Page 4 of 37
Page 5 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Energy & Fuels
67
that asphaltene aggregation is influenced by many factors such as the molecular
68
structure and concentration of asphaltenes,6-8 temperature and pressure,9,
69
solvent types.11, 12 Among these factors, the molecular structure of asphaltenes are of
70
particular interest due to the great diversity and complexity of asphaltene from various
71
sources.13 For instance, it was estimated that the number of rings in a single
72
asphaltene fused ring system can range from a few rings to up to 20 rings.13
73
Several studies have been carried out to investigate the relationship between
74
asphaltene aggregation and its molecular structure using molecular simulation
75
techniques such as classical molecular mechanics, molecular dynamics (MD)
76
simulation and density functional theory method in quantum mechanics. Generally,
77
the π-π interaction between poly-aromatic rings was acknowledged as the main
78
association force.4, 14-16 The lower H/C ratio and the higher molecular weight and
79
aromaticity were also proved to favour the asphaltene aggregation.17, 18 However, the
80
contribution of some other structural parameters to the asphaltene aggregation varied
81
against studies with different simulation conditions. For example, hydrogen bonding
82
has been proposed as one of the driving forces for asphaltene association in toluene or
83
crude oil,
84
observed during the asphaltene dimerization in vacuum condition.20 According to the
85
modified Yen model,21 the steric repulsion between aliphatic side chains was a force
86
limiting asphaltene aggregation in toluene or heptane. In contrast other studies
87
demonstrated that the asphaltene aggregates formation in water was favored by the
88
very long or very short side chains in the asphaltene molecules.22
15, 19
10
and
but in other studies the intermolecular hydrogen bonds were not
5
ACS Paragon Plus Environment
Energy & Fuels 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
89
Another issue that was less taken into account in previous studies was the existence of
90
co-factors effects among the molecular structure parameters. For example, the same
91
backbone structure was used by only changing the side-chain length,22 but actually the
92
varied side-chain length also led to the changes in the molecular weight and the ratio
93
of hydrogen to carbon which might have synergistic or antagonistic effects on the
94
asphaltene aggregation. If the side-chain length was investigated without changing the
95
latter two parameters, then the number of chains had to be altered to keep mass
96
balance. When more than one structural parameter varied simultaneously, the relative
97
contribution of each parameter to the binding energy of two asphaltene molecules
98
remained unelucidated. To the best of our knowledge, this was mainly due to the lack
99
of statistical data. As a way forward it is suggested to develop multi-factor regression
100
to evaluate the importance of concomitant molecular structure parameters on the
101
strength of asphaltene dimerization.
102
Quantitative structure-property relationship (QSPR) modeling is a useful tool to
103
predict the relationships between molecular structures and the properties.23 One
104
challenge is the selection of a number of molecular models for asphaltenes with
105
different structures, which is then used to calculate the free energy for the binding of
106
each pair of asphaltenes and to establish mathematical models correlating these
107
energies with the structural parameters. Although the asphaltenes are complex
108
heterogeneous fractions, a variety of asphaltene model molecules have been proposed
109
by many research groups that can mimic the properties of asphaltenes,24 making it
110
possible to address the above issue. In the present study, 28 asphaltene models from
6
ACS Paragon Plus Environment
Page 6 of 37
Page 7 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Energy & Fuels
111
the literatures were selected to explore the relationship between asphaltene
112
dimerization and its molecular structure using conventional statistical methods (e.g.
113
multiple linear regression) and machine learning methods (e.g. multi-layer perceptron
114
and support vector regression) that have been successfully used in our previous
115
studies for predicting environmental phenomenon.25-27 Specific objectives of this
116
study were to screen structural parameters that were significant to the asphaltene
117
dimerization, predict asphaltene dimerization by structural parameter, and quantify the
118
relative importance of each structural parameter to the asphaltene dimerization.
119 120
2. Methodology
121
2.1 Molecular dynamics simulation
122
MD simulations were performed using the GROMACS 5.1 software,28 while the
123
molecular interactions was calculated by the CHARMM36 force field.29, 30 As shown
124
in Fig. 1, 28 types of asphaltene molecules with different structures were selected
125
from literatures.6, 11, 22, 31-34 Each molecule contained a single polyaromatic core with
126
side chains, because previous experiments evidenced that such “continental model”
127
was the dominant asphaltene architecture, providing the main aspects proposed by the
128
Yen-Mullins model.35 For example, using the Laser desorption laser ionization mass
129
spectra, Sabbah et al.36 demonstrated that all the 23 types of model compounds having
130
one aromatic core showed little or no fragmentation, which well-mimicked the
131
behavior observed for the 2 petroleum asphaltene samples. However, all model
132
compounds with more than one aromatic core showed energy-dependent
7
ACS Paragon Plus Environment
Energy & Fuels 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 8 of 37
133
fragmentation, suggesting that the “archipelago models” were not dominant in
134
asphaltenes. Recently, using the atomic force microscopy and scanning tunnelling
135
microscopy, Schuler et al.37 measured the structure of more than 100 asphaltene
136
molecules, which also concluded that a single aromatic core was the dominant
137
asphaltene architecture although in some cases the “archipelago structure” was also
138
present.
139
The free energy of dimerization of asphaltene molecules in water and toluene was
140
obtained using the umbrella sampling method.6,
141
umbrella
142
(http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin/gmx-tutorials/umbrella/in
143
dex.html). Before umbrella sampling, the configuration of each asphaltene dimer was
144
obtained from a 2 ns MD simulation, where each two asphaltene molecules were
145
randomly placed in a simulation box (4 nm × 4 nm × 4 nm) filled with water or
146
toluene. The coordinates of asphaltene dimer at the end of MD simulation was
147
extracted and located in the edge of a new simulation box (4 nm × 4 nm × 7.5 nm)
148
with the aromatic rings parallel to the x-y plane. The simulation box was filled with
149
water and toluene, respectively. It was then equilibrated by running MD simulation at
150
NVT and NPT ensemble for 1 ns, respectively, where the asphaltenes were kept
151
restrained. The center of mass (COM) pulling was employed to generate a series of
152
initial configurations for each umbrella window. The COM of one asphaltene
153
molecules was pulled along the Z-axis at the rate of 2.6 nm ns-1 for 1 ns, while the
154
other one was restrained with a harmonic force constant of 1000 kJ mol-1 nm-2.
sampling
was
available
from
38
the
A step-by-step guide for the Gromacs
8
ACS Paragon Plus Environment
tutorials
website
Page 9 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Energy & Fuels
155
Approximately 35 windows were selected with a space of 0.05 nm between two
156
consecutive distances from the resulted pulling simulation. For each window, separate
157
MD simulation (10 ns) was performed using the biased umbrella potentials to restrain
158
the molecules within the window. The potential of mean force (PMF) was calculated
159
from unbiased probability distributions of the systems with the weighted histogram
160
analysis method.39 The free energy for asphaltene dimerization was obtained from the
161
PMF curves, which was defined as the free energy when two asphaltene monomers
162
moved from very far away (i.e., distance larger than 1.8 nm when the interaction
163
between two monomers was negligible) to the most stable state with the lowest energy.
164
Accordingly to previous studies, the most stable structure of asphaltenes dimers were
165
formed when the two monomers were parallel stacked and the distance between each
166
other was about 0.4 nm.4, 40, 41 The absolute values of the dimerization energy were
167
used for the following data processing and results discussions. A larger value
168
suggested an easier aggregation of asphaltenes. Throughout simulations, the
169
temperature (298 K) and pressure (1 bar) were controlled using the V-rescale
170
thermostat42 and Parrinello-Rahman method43, respectively.
171 172
2.2 Data preprocessing
173
Structural parameters definition: In order to gain insights into the influence of
174
molecular structures on the binding energy of asphaltene dimer, eleven molecular
175
structural parameters were selected as follows: molecular weight (MW), number of
176
aromatic carbons (N Aro-C), number of aliphatic chains (N
9
ACS Paragon Plus Environment
chains),
number of aliphatic
Energy & Fuels 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
177
carbons (N Ali-C), number of carbons in the longest chains (N Ali-CL), mass percentage
178
of heteroatoms in the aliphatic chains (HA%) , mass percentage of N in the aromatic
179
ring (N%), mass percentage of S in the aromatic ring (S%), the number of hydroxyl
180
groups (NOH), the ratio of hydrogen to carbon (H/C), number of carboxyl groups
181
(NCOOH), and the linkage per aromatic ring where a linkage was counted when each
182
two aromatic rings shared one edge (Nlink/ Naro).
183
Normalization: To eliminate the effects from the units of different variables during
184
regression, the data was standardized to the same scale with Z-score normalization
185
according to the following equation: ∗ =
− ̅
186
where ∗ and represent the standardized value and initial value of the variable,
187
̅ is the mean value of the variable, is the standard error of the variable. The
188
standardized value of each parameter obeyed a normal distribution.
189
Cluster analysis: Cluster analysis was performed to split the dataset into several
190
classes using the WEKA program package44 with the expectation maximization
191
algorithm.45 The molecular structures were similar in one class and different from that
192
in other classes. Molecules (80%) were randomly selected from each class for
193
regression, while the remaining 20% molecules were used for external validation.
194
Structural parameters screening: when multiple influence factors coexisted, there
195
might be multicollinearity among them, which would impact the accuracy of
196
regression model and should be removed before model development. The first step
197
was to carry out bivariate correlation between each two parameters and between each
10
ACS Paragon Plus Environment
Page 10 of 37
Page 11 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Energy & Fuels
198
parameter and the dimerization free energy. If the correlation coefficient between two
199
parameters exceed 0.95, the one with poorer correlation with the dimerization free
200
energy was removed and the other was retained.46 Next step was to perform stepwise
201
multiple linear regression (SMLR) to identify the significant parameters to the
202
dimerization free energy among the above screened parameters.47, 48 Briefly, partial
203
F-statistics were computed for each parameter and the one with the highest F-value
204
was inserted into the model. The partial F-statistics were computed again for all the
205
remaining parameters and the one with the highest F-value was added to the model if
206
the corresponding F-value exceeded a specified threshold value. These two
207
parameters in the model were evaluated with partial F-test to see if each one was still
208
significant. The parameter was removed from the model if it was no longer significant.
209
This procedure was repeated for the remaining parameters until no more parameters
210
yielded a partial F-value greater than the threshold and all parameters in the model
211
remained significant.49 The parameters screening process was carried out with the
212
SPSS (version 22.0), from which the selected parameters were used for regression
213
models building in the next subsection.
214 215
2. 3 Regression and validation
216
Multiple linear regression (MLR): MLR was employed to predict the relationship
217
between the molecular structure parameters and the binding energy, which could be
218
expressed by the following equation: y = + + ⋯ + + , where the
219
, ,…, were the regression coefficients, and was the intercede of the
11
ACS Paragon Plus Environment
Energy & Fuels 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
220
regression line.
221
Multi-layer perceptron (MLP): MLP is one of the most commonly used artificial
222
neural network (ANN) models, consisting of an input layer, hidden layers and an
223
output layer. Each layer consists of nodes that are connected with a certain weight to
224
every node in the following layer. Except for the input nodes, each node is a nonlinear
225
activation function that enables the network to compute complex nonlinear problems.
226
The connection weights are changed after each data passing through the nodes in the
227
network using the 'back propagation' technique. By this way, the error in the output
228
compared to the expected result is minimized. This process is termed as training,
229
which stops automatically when no more decrease in the errors of cross-validation
230
samples. To seek for the model with high precision, two parameters are optimized by
231
trial and error during the training process including (i) the learning rate, which
232
controls the degree of changes in the connection weights during each iteration,50 and
233
(ii) the number of nodes in the hidden layer, which ranges from (2n1/2 + m) to (2n + 1)
234
where n and m represented the number of input node and output node, respectively.51
235
Support vector regression (SVR): SVR is a regression technique with excellent
236
performances in regression and time series prediction application, which is capable of
237
rearranging the input data from nonlinear to linear using mathematical functions
238
known as kernels. In this study, the SMOreg with a RBF kernel or poly-kernel
239
function was also used to develop regression model, which could transform nominal
240
attributes into binary ones and had been successfully applied to predict the QSPR
241
models.52, 53 The complexity parameter, which determines the trade-off between the
12
ACS Paragon Plus Environment
Page 12 of 37
Page 13 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Energy & Fuels
242
model complexity and the tolerance of errors, was optimized by a number of trials.
243
Validation: The 10-fold cross-validation was used to test the performance of the
244
models.54 Briefly, the datasets used for training were evenly split into 10 folds. The
245
instances from 9 folds were used for training while the remaining one fold was used
246
for testing. This process was repeated 10 times using a different fold for testing at
247
each cycle. The effectiveness of the model training was assessed by the correlation
248
coefficient (R) and root mean squared error (RMSE). To validate the predictability of
249
the model, these two parameters were also calculated for those validation datasets not
250
used in the training process.
13
ACS Paragon Plus Environment
Energy & Fuels 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
251
3. Results and discussion
252
3.1 Free energy for asphaltene dimerization
253
Fig. 2 shows the changes in the PMF during the dimerization process of 28 asphaltene
254
molecules in water and toluene, respectively. Despite the variances in the shape of the
255
PMF curves, the deepest well of all the PMF curves were observed at 0.3 ~ 0.5 nm,
256
which corresponded to the separation distance between the centers of mass of each
257
two asphaltene molecules parallel stacking due to the π-π interactions in the aromatic
258
regions.4,
259
asphaltene dimerization and therefore offered the stable configuration for the
260
asphaltene dimers in both water and toluene. By a closer examination of the structure
261
of asphaltene dimers at different positions on the PMF curves (two examples are
262
shown in Fig. 3), we noted that the stacking manner between the two monomers
263
changed by rotation during pulling process. In other words, if two monomers were
264
initially located as the T-shape stacking, they would spontaneously transform to the
265
parallel stacking when they moved closer to each other.
266
The free energy of dimerization was determined by the difference between the
267
minimum value of PMF (r = 0.3 ~ 0.5 nm) and the value of PMF at higher distances
268
when it reached a plateau (r > 1.8 nm). The calculated binding energy of asphaltenes
269
with different molecular structures, which ranged from 22.3 to 81.5 kJ mol-1 in water
270
and from 2.5 to 39.0 kJ mol-1 in toluene (Fig. 2), respectively. Particularly, the
271
dimerization free energy for each specific type of asphaltene was 48 - 91% lower in
272
toluene than in water (Fig. 2). This might be attributed to the interactions between the
55, 56
It suggested that such type of stacking required low energy for
14
ACS Paragon Plus Environment
Page 14 of 37
Page 15 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Energy & Fuels
273
benzene ring of toluene molecules with the aromatic sheets of asphaltenes through π -
274
π interaction.57 It suggested a greater tendency for asphaltenes aggregation in water
275
than in toluene, which was consistent with previous studies.4, 56
276 277
3.2 Bivariate correlation and structural parameters screening
278
A preliminary examination of the bivariate correlation indicated that the absolute
279
value of the Pearson correlation coefficient between the free energy of asphaltene
280
dimerization in water and each individual structural parameter ranged from 0.068 to
281
0.459 (Table 1). The mass percentage of nitrogen in the aromatic rings was
282
characterized as the most significant parameter (p < 0.01), while relatively less
283
significance was found for the mass percentage of sulfur in the aromatic rings,
284
molecular weight, number of aromatic carbons, linkage per aromatic ring, and number
285
of aliphatic chains (p < 0.05). The remaining five parameters showed insignificant
286
influence on the asphaltene dimerization in water (p > 0.05). By contrast, only four
287
parameters were identified as significant factors influencing asphaltene dimerization
288
in toluene including the mass percentage of sulfur in the aromatic rings, number of
289
aromatic carbons, ratio of hydrogen to carbon and number of carboxyl groups. It
290
indicated that some insignificant parameters became significant after changing the
291
solvent. For example, the Pearson correlation coefficient between the dimerization
292
free energy in toluene and the number of carboxyl groups was 0.518, which decreased
293
to 0.053 in water (Table 1). It might be attributed to the fact that the hydrogen
294
bonding between the carboxylic groups of two asphaltene molecules was a driving
15
ACS Paragon Plus Environment
Energy & Fuels 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
295
force for its dimerization in toluene, but dimerization driven by this way was less in
296
water because the carboxylic groups in asphaltene would also form hydrogen bonds
297
with water.
298
Results also demonstrated that there were significant correlation among some
299
independent variables. For example, the molecular weight was significantly correlated
300
with all the other parameters except the linkage per aromatic ring, the mass
301
percentage of sulfur in the aromatic rings and the number of carboxyl groups (Table
302
1). The number of aliphatic carbons was highly correlated with the molecular weight,
303
the ratio of hydrogen to carbon and the number of carbon in the longest chains with
304
the corresponding Pearson correlation coefficient of 0.797, 0.883 and 0.824,
305
respectively. This meant that the changes in one structural parameter would result in
306
the changes in some other structural parameter. Therefore, when we tried to interpret
307
the effects of one specific structural parameter on the asphaltene aggregation, we
308
should take into account the multi-factor interactions between this parameter and
309
other parameters. Moreover, the Pearson correlation coefficients among independent
310
variables were not large enough to allow removing any one before input to the
311
regression models. According to the parameter screening protocol (see section 2.2), it
312
remained impossible at this stage to remove any of these structural parameters
313
because the Pearson correlation coefficients between each two structural parameters
314
were all less than 0.95 (Table 1).
315
To reduce the parameters to a suitable number without losing any important
316
information, we further performed SMLR analysis using the dimerization free energy
16
ACS Paragon Plus Environment
Page 16 of 37
Page 17 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Energy & Fuels
317
data in the water and toluene, respectively. After removal of the non-significant
318
parameters, seven and five parameters were selected to correlate with dimerization
319
energy in water and in toluene, respectively (Table 2). The variance inflation factor
320
(VIF) was less than the
321
parameters were of no multicollinearity.58
322
According to the similarity in the molecular structure, the 28 asphaltene molecules
323
were divided into three representative classes by cluster analysis, in which the number
324
of asphaltene molecules were 8, 8 and 12, respectively. From each class, 80%
325
molecules were randomly selected for regression model development and the
326
remaining molecules were used for model validation.
threshold value of 10 suggesting that the remaining
327 328
3.3 Prediction of asphaltene dimerization in water
329
MLR: The free energy of asphaltene dimerization in water was correlated with the
330
screened structural parameters using the MLR model as follows: Ewater = 0.6168
331
NAro-C + 0.5819 Nlink/Naro - 0.3979 Nali-C - 0.2488 HA% - 0.4006 N% + 0.4042 S% +
332
0.695 Nchains - 0.0012. The R and RMSE for the training dataset calculated by 10-fold
333
cross validation was 0.9231 and 0.3624, respectively (Table 3). These two parameters
334
were about 4% lower and 80% higher for the validation dataset, respectively,
335
compared with that during model training. It should be noted that the energy predicted
336
from this equation was dimensionless due to the normalization treatment before
337
modelling. It could be transformed to the energy with units (kJ mol-1) using the mean
338
value and standard error as indicated in Section 2.2. The input and predicted free
17
ACS Paragon Plus Environment
Energy & Fuels 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
339
energies are shown in Fig. 4.
340
MLP: The well trained MLP network consisted of 7, 6 and 1 nodes in the input layer,
341
hidden layer and output layer, respectively. The optimized learning rate was 0.02.
342
Higher or lower learning rate both resulted in larger error in the training and
343
validation datasets. For example, when the learning rate increased from 0.02 to 0.05,
344
the RMSE for the validation data almost doubled (Fig. 5). After training the data with
345
these optimal parameters, the R and RMSE was 0.9247 and 0.3608, respectively (Fig.
346
4b). Little difference was observed in the R between training and validation data, but
347
the RMSE was about 14% higher in the latter.
348
SVR: The optimized SVR model was obtained using the complexity parameter of
349
1200 and the RBF Kernel with gamma of 0.008. The R for the training and validation
350
datasets was 0.9462 and 0.9104, respectively (Table 3). The RMSE for the training
351
and validation datasets was 0.3055 and 0.5173, respectively.
352
Model comparison: The performance of the three models for predicting the
353
dimerization free energy in water was compared by the RMSE for the validation
354
dataset (Table 3). It demonstrated that the MLP model had the best performance with
355
the lowest RMSE. The highest RMSE was observed in the MLR model, which was 13
356
- 32% higher than that in the other two models. This finding suggested that the
357
influences of asphaltene molecular structure on its aggregation in water was more
358
likely nonlinear, while attempts to capture the relationship using a linear regression
359
method might resulted in larger error. Accordingly, the relative contribution of the
360
structure descriptors to the predictive dimerization free energy was calculated using
18
ACS Paragon Plus Environment
Page 18 of 37
Page 19 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Energy & Fuels
361
the hidden-input and hidden-output connection weights resulted from the MLP
362
modelling.25 As shown in Fig. 6a, the linkage per aromatic ring was characterized as
363
the most important factor which accounted for 18% of the changes in the dimerization
364
free energy. Similar importance was found between the number of aromatic carbons
365
and the number of aliphatic chains. The least important factor was the number of
366
aliphatic carbons which contributed to less than 10% of the asphaltene dimerization.
367 368
3.4 Prediction of asphaltene dimerization in toluene
369
MLR: The free energy of asphaltene dimerization in toluene was correlated with the
370
five selected parameters using the MLR model as follows: Etoluene = 0.5192 NAro-C +
371
0.1745 Nlink/Naro + 0.5757 S% + 0.4413 NCOOH + 0.1817 Nchains - 0.0203. The R and
372
RMSE for the training dataset was 0.8931 and 0.4436, respectively (Table 3). For the
373
validation dataset, little change was found in the R while the RMSE increased by
374
about 6%. The input and predicted free energies are shown in Fig. 7.
375
MLP: The well trained MLP network consisted of 5, 6 and 1 nodes in the input layer,
376
hidden layer and output layer, respectively. The optimized learning rate was 0.04. The
377
R and RMSE for the training dataset was 0.8934 and 0.4217, respectively (Table 3).
378
Little difference was observed in the R between training and validation data, but the
379
RMSE was about 10% higher in the latter.
380
SVR: The optimized SVR model was obtained using the polynomial kernel with
381
exponent of 0.72. The R for the training and validation datasets was 0.9114 and
382
0.8989, respectively (Table 3). The RMSE for the training datasets was 0.3663 which
19
ACS Paragon Plus Environment
Energy & Fuels 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
383
increased by 30% for the validation dataset.
384
Model comparison: As we did in the water, the predictability of the three models for
385
dimerization free energy in toluene was compared by the RMSE for the validation
386
dataset (Table 3). The highest RMSE was found in the SVR model, while very little
387
difference (~ 1%) was observed between MLR and MLP. Same tendency was
388
observed in the relative contribution of each structural parameter to the predicted
389
dimerization free energy between MLR and MLP (Fig. 6b). As suggested by both
390
models, the mass percentage of sulfur in the aromatic ring and the number of aromatic
391
carbons were recognized as the two most important factor which contributed to more
392
than 55% for the asphaltene aggregation (Fig. 6b).
393
It was interesting to find that (i) the number of aromatic carbons was strongly
394
correlated with the asphaltene dimerization both in water and toluene, suggesting the
395
attraction between poly-aromatic cores was the main driving force in both cases; (ii)
396
the number of carboxyl groups had ignorable influence on the asphaltene aggregation
397
in water, but it significantly influence the asphaltene aggregation in toluene with
398
about 20% contributions. This was attributed to the hydrogen bonding phenomenon as
399
aforementioned in the bivariate correlation; (iii) the top two most important factors
400
(linkage per aromatic ring and number of aliphatic chains) for asphaltene aggregation
401
in water became the least important in toluene. It suggested that the interactions
402
between aliphatic chains contributed little to the strength of aggregation in toluene,
403
but played very important roles for aggregation in water. This finding was consistent
404
with previous study which demonstrated that the contacts between aliphatic chains in
20
ACS Paragon Plus Environment
Page 20 of 37
Page 21 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Energy & Fuels
405
water was 2.6 - 5.7 folds of that in toluene. 22, 59
406
Overall results indicated that the combination of MD simulation with machine
407
learning provided a potential way to predict asphaltene dimerization using the
408
molecular structure parameters. It is well-acknowledged that asphaltene aggregation
409
highly depend upon its molecular structure and efforts have been made to identify the
410
sensitivity of asphaltene aggregation toward the molecular architecture. However, the
411
correlation between asphaltene binding energy and multiple structure parameters with
412
statistical significance has not been well determined. This study is one of the first
413
attempts to do so. Unlike conventional statistical regression methods, machine
414
learning approaches do not rely on any assumption of thermodynamic equation before
415
modelling. For example, asphaltene aggregation was often hypothesized as a first or
416
pseudo-first order reaction in some reported models.60,
417
approximation was simple enough, the value of the rate constant was uncertain or
418
even lack for asphaltenes with various molecular structures. By contrast, there is not a
419
parameter like rate constant during machine learning prediction. Instead, it is able to
420
distinguish the given data based on their different patterns and make useful decisions
421
in new data. Therefore, machine learning methods do not have the issue of rate
422
constant encountered in statistical methods. Such methods are particularly useful to
423
handle problems with high non-linearity that will increase the difficulty in
424
determining the rate constant or the form of thermodynamic equations for asphaltene
425
aggregation. While not all parameters associated with the molecular architecture are
426
necessary for understanding the asphaltene aggregation phenomenon, its prediction
21
ACS Paragon Plus Environment
61
Even though such
Energy & Fuels 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
427
still requires detailed informative data in order to assess to what extent each parameter
428
contribute to the aggregation. In the approach proposed in this study, the concomitant
429
effects of parameters were taken into account instead of only focusing one or two
430
parameters by fixing the others. The study helps to identify and reduce the number of
431
parameters that need to focus for predicting asphaltene aggregation.
432 433
4. Conclusions
434
Results demonstrated that the dimerization strength for each type of asphaltene was
435
48 - 91% lower in toluene than in water. Asphaltenes association in water was
436
significantly influenced by seven structural parameters including the linkage per
437
aromatic ring, number of aliphatic chains, number of aromatic carbons, number of
438
aliphatic carbons, mass percentage of heteroatoms in the aliphatic chains, mass
439
percentage of N in the aromatic ring and mass percentage of S in the aromatic ring.
440
The first three were characterized as the most important parameters by the MLP
441
model (R = 0.9243) which outperformed the MLR and SVR models. Five structural
442
parameters were screened for predicting asphaltene association in toluene including
443
the mass percentage of S in the aromatic ring, number of aromatic carbons, number of
444
carboxyl groups, linkage per aromatic ring and number of aliphatic chains, while the
445
first two parameters contributed to 55% of the energy variation for dimerization.
446
Comparison between the two solvents highlighted the strong contribution of the
447
aliphatic side chains to the asphaltene aggregation in water, which was insensitive in
448
toluene. Overall results from this study provided a sound basis for improving the
22
ACS Paragon Plus Environment
Page 22 of 37
Page 23 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Energy & Fuels
449
asphaltene phase behavior prediction. Further works are needed to investigate how
450
microscopic predictions can be integrated into macroscopic deposition models to
451
allow easy integration with commercially available multiphase flow prediction tools.
452 453
Acknowledgements
454
This study was financially supported by the Shenzhen Peacock Plan Research Grant
455
(KQJSCX20170330151956264) and the Fundamental Research Project of Shenzhen,
456
China (JCYJ20160513103756736).
457
23
ACS Paragon Plus Environment
Energy & Fuels 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
458
References
459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501
(1) Adams, J. J., Asphaltene Adsorption, a Literature Review. Energy Fuels 2014, 28, 2831-2856. (2) Zhang, X.; Pedrosa, N.; Moorwood, T., Modeling asphaltene phase behavior: comparison of methods for flow assurance studies. Energy Fuels 2012, 26, 2611-2620. (3) Li, X.; He, L.; Wu, G.; Sun, W.; Li, H.; Sui, H., Operational Parameters, Evaluation Methods, And Fundamental Mechanisms: Aspects of Nonaqueous Extraction of Bitumen from Oil Sands. Energy Fuels 2012, 26, 3553-3563. (4) Zhu, X.; Chen, D.; Wu, G., Molecular dynamic simulation of asphaltene co-aggregation with humic acid during oil spill. Chemosphere 2015, 138, 412-421. (5) Zhu, X.; Chen, D.; Wu, G., Insights into asphaltene aggregation in the Na-montmorillonite interlayer. Chemosphere 2016, 160, 62-70. (6) Sedghi, M.; Goual, L.; Welch, W.; Kubelka, J., Effect of asphaltene structure on association and aggregation using molecular dynamics. J. Phys. Chem. B 2013, 117, 5765-5776. (7) Mullins, O. C., Review of the molecular structure and aggregation of asphaltenes and petroleomics. Spe. J. 2008, 13, 48-57. (8) Roux, J. N.; Broseta, D.; Demé, B., SANS study of asphaltene aggregation: concentration and solvent quality effects. Langmuir 2001, 17, 5085-5092. (9) Maqbool, T.; Srikiratiwong, P.; Fogler, H. S., Effect of temperature on the precipitation kinetics of asphaltenes. Energy Fuels 2011, 25, 694-700. (10) Espinat, D.; Fenistein, D.; Barre, L.; Frot, D.; Briolant, Y., Effects of temperature and pressure on asphaltenes agglomeration in toluene. A light, X-ray, and neutron scattering investigation. Energy fuels 2004, 18, 1243-1249. (11) Kuznicki, T.; Masliyah, J. H.; Bhattacharjee, S., Aggregation and partitioning of model asphaltenes at toluene-water interfaces: Molecular dynamics simulations. Energy Fuels 2009, 23, 5027-5035. (12) Rane, J. P.; Harbottle, D.; Pauchard, V.; Couzis, A.; Banerjee, S., Adsorption kinetics of asphaltenes at the oil–water interface and nanoaggregation in the bulk. Langmuir 2012, 28, 9986-9995. (13) Groenzin, H.; Mullins, O. C., Molecular size and structure of asphaltenes from various sources. Energy Fuels 2000, 14, 677-684. (14) da Costa, L. M.; Stoyanov, S. R.; Gusarov, S.; Tan, X.; Gray, M. R.; Stryker, J. M.; Tykwinski, R.; de M. Carneiro, J. W.; Seidl, P. R.; Kovalenko, A., Density Functional Theory Investigation of the Contributions of π–π Stacking and Hydrogen-Bonding Interactions to the Aggregation of Model Asphaltene Compounds. Energy Fuels 2012, 26, 2727-2735. (15) Murgich, J., Intermolecular Forces in Aggregates of Asphaltenes and Resins. Petrol. Sci. Technol. 2002, 20, 983-997. (16) Tan, X.; Fenniri, H.; Gray, M. R., Pyrene derivatives of 2, 2-bipyridine as models for asphaltenes: synthesis, characterization, and supramolecular organization. Energy Fuels 2008, 22, 715-720. (17) Rogel, E., Simulation of interactions in asphaltene aggregates. Energy Fuels 2000, 14, 566-574.. (18) Ungerer, P.; Rigby, D.; Leblanc, B.; Yiannourakou, M., Sensitivity of the aggregation behaviour of asphaltenes to molecular weight and structure using molecular dynamics. Mol. Simulat. 2014, 40, 115-122. (19) Liao, Z.; Zhou, H.; Graciaa, A.; Chrostowska, A.; Creux, P.; Geng, A., Adsorption/occlusion characteristics of asphaltenes: Some implication for asphaltene structural features. Energy fuels 2005, 19, 180-186. (20) Carauta, A. N.; Correia, J. C.; Seidl, P. R.; Silva, D. M., Conformational search and dimerization 24
ACS Paragon Plus Environment
Page 24 of 37
Page 25 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Energy & Fuels
502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545
study of average structures of asphaltenes. J. Mol. Struc. THEOCHEM 2005, 755, 1-8. (21) Mullins, O. C., The modified Yen model. Energy Fuels 2010, 24, 2179-2207. (22) Jian, C.; Tang, T.; Bhattacharjee, S., Probing the Effect of Side-Chain Length on the Aggregation of a Model Asphaltene Using Molecular Dynamics Simulations. Energy Fuels 2013, 27, 2057-2067. (23) Katritzky, A. R.; Maran, U.; Lobanov, V. S.; Karelson, M., Structurally diverse quantitative structure− property relaKonship correlaKons of technologically relevant physical properties. J. Chem. Inf. Comp. Sci. 2000, 40, 1-18. (24) Sjöblom, J.; Simon, S.; Xu, Z., Model molecules mimicking asphaltenes. Adv. Colloid. Interfac. 2015, 218, 1-16. (25) Wu, G.; Kechavarzi, C.; Li, X.; Wu, S.; Pollard, S. J. T.; Sui, H.; Coulon, F., Machine learning models for predicting PAHs bioavailability in compost amended soils. Chem. Eng. J. 2013, 223, 747-754. (26) Sui, H.; Li, L.; Zhu, X.; Chen, D.; Wu, G., Modeling the adsorption of PAH mixture in silica nanopores by molecular dynamic simulation combined with machine learning. Chemosphere 2016, 144, 1950-1959. (27) Wu, G.; Coulon, F., Modelling the Environmental Fate of Petroleum Hydrocarbons During Bioremediation. In: McGenity TJ, Timmis KN, Nogales B, editors. Hydrocarbon and Lipid Microbiology Protocols: Statistics, Data Analysis, Bioinformatics and Modelling, Springer Berlin Heidelberg; 2016, pp 165-180. (28) Van Der Spoel, D.; Lindahl, E.; Hess, B.; Groenhof, G.; Mark, A. E.; Berendsen, H. J., GROMACS: fast, flexible, and free. J. Comput. Chem. 2005, 26, 1701-1718. (29) Vanommeslaeghe, K.; Hatcher, E.; Acharya, C.; Kundu, S.; Zhong, S.; Shim, J.; Darian, E.; Guvench, O.; Lopes, P.; Vorobyov, I.; Mackerell, A. D., Jr., CHARMM general force field: A force field for drug-like molecules compatible with the CHARMM all-atom additive biological force fields. J. Comput. Chem. 2010, 31, 671-90. (30) Rane, K. S.; Kumar, V.; Wierzchowski, S.; Shaik, M.; Errington, J. R., Liquid–Vapor Phase Behavior of Asphaltene-like Molecules. Ind. Eng. Chem. Res. 2014, 53, 17833-17842. (31) Gao, F.; Xu, Z.; Liu, G.; Yuan, S., Molecular Dynamics Simulation: The Behavior of Asphaltene in Crude Oil and at the Oil/Water Interface. Energy Fuels 2014, 28, 7368-7376. (32) Mullins, O. C., The asphaltenes. Ann. Rev. Anal. Chem. 2011, 4, 393-418. (33) Aray, Y.; Hernández-Bravo, R.; Parra, J. G.; Rodríguez, J.; Coll, D. S., Exploring the structure– solubility relationship of asphaltene models in toluene, heptane, and amphiphiles using a molecular dynamic atomistic methodology. J. Phys. Chem. A 2011, 115, 11495-11507. (34) Silva, H. S.; Sodero, A. C.; Bouyssiere, B.; Carrier, H.; Korb, J.-P.; Alfarra, A.; Vallverdu, G.; Bégué, D.; Baraille, I., Molecular Dynamics Study of Nanoaggregation in Asphaltene Mixtures: Effects of the N, O, and S Heteroatoms. Energy Fuels 2016, 30, 5656-5664. (35) Mullins, O. C.; Sabbah, H.; Eyssautier, J.; Pomerantz, A. E.; Barré, L.; Andrews, A. B.; Ruiz-Morales, Y.; Mostowfi, F.; McFarlane, R.; Goual, L., Advances in asphaltene science and the Yen–Mullins model. Energy Fuels 2012, 26, 3986-4003. (36) Sabbah, H.; Morrow, A. L.; Pomerantz, A. E.; Zare, R. N., Evidence for island structures as the dominant architecture of asphaltenes. Energy Fuels 2011, 25, 1597-1604. (37) Schuler, B.; Meyer, G.; Pena, D.; Mullins, O. C.; Gross, L., Unraveling the Molecular Structures of Asphaltenes by Atomic Force Microscopy. J. Am. Chem. Soc. 2015, 137, 9870-6. (38) Kästner, J., Umbrella sampling. WIRES Comput. Mol. Sci. 2011, 1, 932-942. (39) Kumar, S.; Rosenberg, J. M.; Bouzida, D.; Swendsen, R. H.; Kollman, P. A., The weighted histogram
25
ACS Paragon Plus Environment
Energy & Fuels 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589
analysis method for free‐energy calculations on biomolecules. I. The method. J. Comput. Chem. 1992, 13, 1011-1021. (40) Alvarez-Ramirez, F.; Ramirez-Jaramillo, E.; Ruiz-Morales, Y., Calculation of the interaction potential curve between asphaltene-asphaltene, asphaltene-resin, and resin-resin systems using density functional theory. Energy Fuels 2006, 20, 195-204. (41) Pahlavan, F.; Mousavi, M.; Hung, A.; Fini, E. H., Investigating molecular interactions and surface morphology of wax-doped asphaltenes. Phys. Chem. Chem. Phys. 2016, 18, 8840-8854. (42) Bussi, G.; Donadio, D.; Parrinello, M., Canonical sampling through velocity rescaling. J. Chem. Phys. 2007, 126, 014101. (43) Parrinello, M.; Rahman, A., Strain fluctuations and elastic constants. J. Chem. Phys. 1982, 76, 2662-2666. (44) Frank, E.; Hall, M.; Holmes, G.; Kirkby, R.; Pfahringer, B.; Witten, I. H.; Trigg, L., Weka-a machine learning workbench for data mining. In: Maimon O, Rokach L, editors. Data mining and knowledge discovery handbook. Springer, Boston,MA; 2009, pp 1269-1277. (45) Do, C. B.; Batzoglou, S., What is the expectation maximization algorithm. Nat. Biotechnol. 2008, 26, 897-899. (46) Samari, F.; Yousefinejad, S., Quantitative structural modeling on the wavelength interval (Δλ) in synchronous fluorescence spectroscopy. J. Mol. Struct. 2017, 1148, 101-110. (47) Riahi, S.; Mousavi, M.; Shamsipur, M., Prediction of selectivity coefficients of a theophylline-selective electrode using MLR and ANN. Talanta 2006, 69, 736-740. (48) Bordbar, M.; Ghasemi, J.; Faal, A. Y.; Fazaeli, R., Chemometric modeling to predict aquatic toxicity of benzene derivatives using stepwise-multi linear regression and partial least square. Asian J. Chem. 2013, 25, 331. (49) Fatemi, M. H.; Gharaghani, S.; Mohammadkhani, S.; Rezaie, Z., Prediction of selectivity coefficients of univalent anions for anion-selective electrode using support vector machine. Electrochim. Acta 2008, 53, 4276-4282. (50) Khandelia, H.; Mouritsen, O. G., Lipid gymnastics: evidence of complete acyl chain reversal in oxidized phospholipids from molecular simulations. Biophys. J. 2009, 96, 2734-2743. (51) Fletcher, D.; Goss, E., Forecasting with neural networks: an application using bankruptcy data. Inform. Manag. 1993, 24, 159-167. (52) Chauhan, J. S.; Dhanda, S. K.; Singla, D.; Agarwal, S. M.; Raghava, G. P.; Consortium, O. S. D. D., QSAR-based models for designing quinazoline/imidazothiazoles/pyrazolopyrimidines based inhibitors against wild and mutant EGFR. PLoS One 2014, 9, e101079. (53) Singla, D.; Anurag, M.; Dash, D.; Raghava, G. P., A web server for predicting inhibitors against bacterial target GlmU protein. BMC pharmacol. 2011, 11, 5. (54) Refaeilzadeh, P.; Tang, L.; Liu, H., Cross-validation. In Encyclopedia of database systems, Springer: 2009; pp 532-538. (55) Pisula, W.; Tomovic, Z.; Simpson, C.; Kastler, M.; Pakula, T.; Müllen, K., Relationship between core size, side chain length, and the supramolecular organization of polycyclic aromatic hydrocarbons. Chem. Mater. 2005, 17, 4296-4303. (56) Kuznicki, T.; Masliyah, J. H.; Bhattacharjee, S., Molecular dynamics study of model molecules resembling asphaltene-like structures in aqueous organic solvent systems. Energy Fuels 2008, 22, 2379-2389. (57) Headen TF, Boek ES, Skipper NT. Evidence for asphaltene nanoaggregation in toluene and
26
ACS Paragon Plus Environment
Page 26 of 37
Page 27 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Energy & Fuels
590 591 592 593 594 595 596 597 598 599 600
heptane from molecular dynamics simulations. Energy Fuels 2009, 23, 1220-1229. (58) Dormann, C. F.; Elith, J.; Bacher, S.; Buchmann, C.; Carl, G.; Carré, G.; Marquéz, J. R. G.; Gruber, B.; Lafourcade, B.; Leitão, P. J., Collinearity: a review of methods to deal with it and a simulation study evaluating their performance. Ecography 2013, 36, 27-46. (59) Jian, C.; Tang, T.; Bhattacharjee, S., Molecular Dynamics Investigation on the Aggregation of Violanthrone78-based Model Asphaltenes in Toluene. Energy Fuels 2014, 28, 3604-3613. (60) Kurup, A. S.; Vargas, F. M.; Wang, J.; Buckley, J.; Creek, J. L.; Subramani, H., J; Chapman, W. G., Development and application of an asphaltene deposition tool (ADEPT) for well bores. Energy Fuels 2011, 25, 4506-4516. (61) Vargas, F. M.; Creek, J. L.; Chapman, W. G., On the development of an asphaltene deposition simulator. Energy Fuels 2010, 24, 2294-2299.
601
27
ACS Paragon Plus Environment
Energy & Fuels 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
Page 28 of 37
Table 1 Pearson correlation matrix of different variables (** represents P < 0.01, * represents 0.01 < P < 0.05) MW
N Aro-C
Nlink/ Naro
N Ali-C
HA%
N%
S%
NOH
H/C
NCOOH
N chains
N Ali-CL
MW
1.000
N Aro-C
0.428*
1.000
Nlink/ Naro
0.183
-0.070
1.000
N Ali-C
0.797**
-0.054
0.329*
1.000
HA%
0.461**
0.200
0.039
0.286
1.000
N%
-0.475*
-0.120
-0.178
-0.423*
-0.309
1.000
S%
-0.081
0.096
-0.277
-0.259
-0.283
-0.212
1.000
NOH
-0.166
-0.508**
-0.152
0.083
-0.194
-0.182
0.115
1.000
H/C
0.621**
-0.357*
0.144
0.883**
0.244
-0.287*
-0.250
0.287
1.000
NCOOH
-0.189
0.095
-0.199
-0.430*
-0.176
0.216
0.147
-0.138
-0.515**
1.000
N chains
0.418*
-0.271
-0.128
0.511**
-0.093
-0.099
0.033
0.263
0.673**
-0.154
1.000
N Ali-CL
0.760**
0.057
0.346*
0.824**
0.403*
-0.346*
-0.287
-0.089
0.731**
-0.351*
0.207
1.000
Energy in water
0.387*
0.350*
0.318*
0.170
-0.232
-0.459** 0.425*
-0.139
0.002
0.053
0.346*
0.068
Energy in toluene
0.228
0.561**
-0.053
-0.176
-0.134
-0.263
-0.404*
0.518** 0.009
0.608** -0.212
28
ACS Paragon Plus Environment
-0.234
Page 29 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
Energy & Fuels
Table 2 Screening of the structural parameters by stepwise multiple linear regression
(√ represents the parameters selected for the regression; X represents the parameters deleted before input to the regression model) Solvent
MW
N Aro-C
Nlink/ Naro
N Ali-C
HA%
N%
S%
H/C
NOH
NCOOH
N chains
N Ali-CL
VIF
Water
X
√
√
√
√
√
√
X
X
X
√
X
2.554
Toluene
X
√
√
X
X
X
√
X
X
√
√
X
1.148
29
ACS Paragon Plus Environment
Energy & Fuels 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 30 of 37
Table 3 Performance comparison between MLR, MLP and SVR models
Solvent
Model
Parameters
Training dataset
Validation dataset
Water
MLR
R
0.9231
0.8907
RMSE
0.3624
0.6506
R
0.9247
0.9243
RMSE
0.3608
0.4975
R
0.9462
0.9104
RMSE
0.3055
0.5173
R
0.8931
0.9061
RMSE
0.4436
0.4687
R
0.8934
0.9048
RMSE
0.4217
0.4640
R
0.9114
0.8989
RMSE
0.3663
0.4769
MLP
SVR
Toluene
MLR
MLP
SVR
30
ACS Paragon Plus Environment
Page 31 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Energy & Fuels
A01
A02
A04
A03
S
S
S N
A05
A06
A10-12
A07-09 O
HO
OH
O X
X X
S
X
A14
X
X = O, N, S
X = O, N, S
A13
X
A16
A15
S
N
S
N N H
N
S
A17
A18
HOOC
O
O
N
N
O
O
OH
A19
A24
A21-23
A20
O
C 7H15 C 7H15
O
O
S
O
O
N
O
O OO
CnH2n+1
CnH2n+1
O
n = 1, 4, 8
A25
A26
O
A27
A28
O
HO
HO
S
N O
N
N
HO
OH
Fig. 1 Molecular structure of the 28 types of asphaltene models used in the study
31
ACS Paragon Plus Environment
Energy & Fuels
-2 0
-1
-1
15
(a ) PMF (kJ mol )
0
PMF (kJ mol )
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
-4 0 -6 0
Page 32 of 37
(b )
A01 A03 A05 A07 A09 A11 A13 A15 A17 A19 A21 A23 A25 A27
0
-1 5
-3 0
-8 0 0 .0
0 .5
1 .0
1 .5
-4 5 0 .0
2 .0
0 .6
Ewater Etoluene Ewater Etoluene
A02 76.3 19.4 A16 22.3 2.5
A03 57.4 17.2 A17 34.7 17.6
A04 55.0 8.8 A18 49.9 7.9
1 .8
Z (n m )
Z (n m ) A01 72.3 14.9 A15 81.4 28.0
1 .2
A02 A04 A06 A08 A10 A12 A14 A16 A18 A20 A22 A24 A26 A28
A05 49.7 13.3 A19 72.6 16.8
A06 81.5 22.6 A20 65.5 15.8
A07 67.6 28.0 A21 52.6 18.6
A08 45.1 21.5 A22 47.4 17.3
A09 74.3 39.0 A23 39.7 13.4
A10 62.4 18.6 A24 43.0 18.7
A11 32.1 13.7 A25 51.9 21.3
Fig. 2 PMF curves of asphaltene dimers in (a) water and (b) toluene. Data in the table shows the free energy of dimerization (kJ mol-1) for each type of asphaltene.
32
ACS Paragon Plus Environment
A12 77.1 28.5 A26 48.8 13.8
A13 29.9 2.7 A27 40.6 8.1
A14 49.5 16.4 A28 42.8 14.6
Page 33 of 37
10
0 (a) A06
(b) A17 0 PMF (kJ mol )
-20
-1
PMF (kJ mol-1)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
Energy & Fuels
-40 -60
-10 -20 -30
-80 0.4
0.8
1.2 Z (nm)
1.6
-40 0.0
2.0
0.4
0.8
1.2
1.6
2.0
Z (nm)
Fig. 3 PMF and structure of asphaltene dimers in water (A06 and A17 represent two examples selected from the 28 types of asphaltenes)
33
ACS Paragon Plus Environment
Energy & Fuels
80
(a) MLR
60 40 Training Validation
20 20
Predicted values (kJ mol-1)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 34 of 37
80
40
60
80
(b) MLP
60 40 Training Validation
20 20
40
60
80
80 (c) SVR 60 40 Training Validation
20 20
40
60
80
Input values (kJ mol-1) Fig. 4 Input and predicted values of asphaltene dimerization energy in water
34
ACS Paragon Plus Environment
Page 35 of 37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Energy & Fuels
1.6
R_TR RMSE_TR R_VR RMSE_VR
1.2
0.8
0.4
0.0 0.001 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10
Learning rate
Fig. 5 Variation of R and RMSE for the training (TR) and validation (VR) datasets with different learning rates during MLP training process
35
ACS Paragon Plus Environment
Energy & Fuels 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
(a)
Nali-C
Page 36 of 37
Nchains
MLP MLR
S% N%
Nlink/Naro
HA%
NCOOH
Nchains
NAro-C
(b)
NAro-C S%
Nlink/ Naro 0
5 10 15 Relative importance (%)
0
20
7
14 21 28 Relative importance (%)
35
Fig. 6 Relative importance of structural parameters to the asphaltene dimerization in (a) water and (b) toluene. In panel A, only the results calculated by MLP model are shown because this model outperforms the other models.
36
ACS Paragon Plus Environment
Page 37 of 37
40 (a) MLR 30 20 10 0
Predicted values (kJ mol-1)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Energy & Fuels
Training Validation
0
10
20
30
40
(b) MLP
40 30 20 10 0
Training Validation
0
10
20
30
40
40 (c) SVR 30 20 10 Training Validation
0
0
10
20
30
40
Input values (kJ mol-1) Fig. 7 Input and predicted values of asphaltene dimerization energy in toluene
37
ACS Paragon Plus Environment