Molecular-Based Bayesian Regression Model of ... - ACS Publications


Article

Molecular-based Bayesian regression model of petroleum fractions Hua Mei, Zhenlei Wang, and Biao Huang Ind. Eng. Chem. Res., Just Accepted Manuscript • DOI: 10.1021/acs.iecr.7b02905 • Publication Date (Web): 22 Nov 2017 Downloaded from http://pubs.acs.org on December 3, 2017


Industrial & Engineering Chemistry Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Industrial & Engineering Chemistry Research

Molecular-based Bayesian regression model of petroleum fractions

Hua Mei (a, b), Zhenlei Wang (a, 1), Biao Huang (b, 1)

(a. Key Laboratory of Advanced Control and Optimization for Chemical Processes of the Ministry of Education, East China University of Science and Technology, Shanghai 200237, China; b. Department of Chemical & Materials Engineering, University of Alberta, Edmonton, Alberta T6G 1H9, Canada)

Abstract: Molecular reconstruction of petroleum fractions aims to determine the detailed molecular composition of a mixture from a few measured bulk properties, e.g., density, Reid vapor pressure (RVP), molecular weight and ASTM boiling point curves. This is a great challenge because the number of hydrocarbon compounds is much larger than the number of bulk properties. In this paper, a novel molecular reconstruction method is developed that includes two Bayesian regression models, one for bulk property prediction and one for molecular reconstruction. By defining a characteristic function of a bulk property and then establishing its general mixing rule with respect to the compositions, the bulk property is predicted from a linear regression model with sigmoidal basis functions, whose parameters are estimated by maximizing a posterior distribution over a well-determined database containing bulk properties and molecular information of petroleum fraction samples. Furthermore, a prior distribution of the molecular information is developed under the assumption that the compounds in the hydrocarbon mixture follow independent and identically distributed (i.i.d.) Gamma distributions. Combining this prior with the likelihood function used in bulk property prediction, the molecular information is reconstructed by maximizing a new posterior distribution. Case studies of naphtha fractions demonstrate the effectiveness of the proposed method.

Keywords: Molecular reconstruction, petroleum fractions, Bayesian regression model, bulk property's characteristic function, general mixing rules

1 Corresponding authors. Email: [email protected]; [email protected]

ACS Paragon Plus Environment


Nomenclature

List of symbols

$\arg\min f(x)$: the value of $x$ at which the function $f(x)$ attains its minimum

$b_i$: weight of the i-th composition in the characteristic function

$b_0$: bias in the characteristic function

$BI$: blending index

$c_j$: weight of the j-th basis function in the polynomial approximation

$c_0$: bias of the bulk property's approximation

$d$: dimension of the basis functions

$K$: number of the basis fractions

$n$: number of the components in the mixture

$M$: number of the measured bulk properties

567 8 9(:) ℎ< =>?@ A B,,
$$\min_{x_i} \sum_{m=1}^{M} \varepsilon_m^{2} \tag{1}$$

$$\text{s.t.} \quad \hat{y}_m = f_m(x_i, y_i), \ \forall_{m=1}^{M}, \ \forall_{i=1}^{n}; \qquad \sum_{i=1}^{n} x_i = 1; \qquad x_i \ge 0 \tag{2}$$

in which $\mathbf{x} = [x_1, x_2, \cdots, x_n]^T$. Here the predicted bulk properties $\hat{y}_m$, $m = 1, \dots, M$, are functions of the molecular composition $\mathbf{x}$ and the corresponding properties of the pure components $y_i$, $i = 1, \dots, n$. The form of these functions embodies the influence of the mixing rules on the properties of the mixture. For some physical properties, e.g., density, critical properties, molecular weight and refractive index, the so-called Kay's mixing rule [20] is usually applied, which can be expressed as a weighted linear equation

$$y = \sum_{i=1}^{n} x_i y_i \tag{3}$$

in which $x_i$ is the fraction of component $i$ in the mixture. Similarly, for properties that cannot be written as a linear combination of the pure-component properties, certain blending indices (BI) of the properties have been defined so that the mixing rules for these properties are linear in the fractions of the pure components, as shown in Eq. (4)

$$BI(y) = \sum_{i=1}^{n} x_i \, BI(y_i) \tag{4}$$
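As a minimal sketch of the two linear mixing rules in Eqs. (3) and (4), the snippet below blends pure-component properties directly (Kay's rule) and through a blending index. The function names, the numerical values and the logarithmic index in the usage lines are illustrative assumptions and are not taken from Ref. [20].

```python
import math

def kay_mixing(x, y_pure):
    """Eq. (3): bulk property as the fraction-weighted sum of pure-component properties."""
    if abs(sum(x) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to one")
    return sum(xi * yi for xi, yi in zip(x, y_pure))

def blending_index_mixing(x, y_pure, bi, bi_inverse):
    """Eq. (4): mix linearly in blending-index space, then map back to the property."""
    bi_mix = sum(xi * bi(yi) for xi, yi in zip(x, y_pure))
    return bi_inverse(bi_mix)

# Hypothetical three-component mixture (values chosen only for illustration).
x = [0.2, 0.3, 0.5]
density = [0.60, 0.70, 0.80]
print(kay_mixing(x, density))

# Hypothetical index BI(y) = ln(y); real blending indices are property-specific.
print(blending_index_mixing(x, density, math.log, math.exp))
```

With the identity function as the index, `blending_index_mixing` reduces to Kay's rule, which is one way to see that Eq. (3) is a special case of Eq. (4).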

For instance, several blending indices for typical properties are given in Ref. [20]; however, they have no uniform structure and are quite error-prone. Therefore, a general function with a uniform structure and good generalization is desirable for high-accuracy prediction of the bulk properties. Let $y$ be an arbitrary property of the petroleum fraction, which is a function of the molecular composition $\mathbf{x}$. Define a scalar function of $y$, denoted as $g(y)$, which is also a weighted linear function of the compositions, that is,

$$g(y) = \mathbf{b}^T \mathbf{x} \tag{5}$$

in which $\mathbf{x} = [1, x_1, x_2, \cdots, x_n]^T$ and $\mathbf{b} = [b_0, b_1, \cdots, b_n]^T$. Thereby, from Eq. (5), the property $y$ can be predicted by the inverse function of $g$ applied to $\mathbf{b}^T \mathbf{x}$ and approximated by a linear polynomial with basis functions as follows

$$y = g^{-1}(\mathbf{b}^T \mathbf{x}) \approx c_0 + \sum_{j=1}^{d} c_j \varphi_j\{\lambda_j (\mathbf{b}^T \mathbf{x}) + \theta_j\} = \mathbf{c}^T \cdot \boldsymbol{\varphi}\{\boldsymbol{\lambda} (\mathbf{b}^T \mathbf{x}) + \boldsymbol{\theta}\} \tag{6}$$

where $\boldsymbol{\varphi} = [1, \varphi_1, \cdots, \varphi_d]^T$, and $\varphi_j$, $j = 1, \dots, d$, are basis functions that are usually selected to be sigmoidal functions; $\mathbf{c} = [c_0, c_1, \cdots, c_d]^T$, $\boldsymbol{\lambda} = [\lambda_1, \lambda_2, \cdots, \lambda_d]^T$ and $\boldsymbol{\theta} = [\theta_1, \theta_2, \cdots, \theta_d]^T$ are parameters that should be determined by regression from the experimental data. It should be remarked that, considering a special case of Eq. (2), the relationship between the predicted property and the compositions is

$$y = \mathbf{b}^T \mathbf{x} \tag{7}$$

However, noticing that Eq. (6) has the architecture of a four-layer feedforward neural network, shown in Fig. 1, the linear function in Eq. (7) can also be approximated by Eq. (6), in theory to any accuracy. Thus Eq. (6) is a general mixing rule function, and $g(y)$ is named the characteristic function of the bulk property $y$, which characterizes the relationship between the bulk property and the composition of the components in the mixture. Clearly, blending index functions are also special cases of the characteristic function of bulk properties.

[Figure: network with layers labelled input, hidden layer 1, hidden layer 2 and output]

Fig. 1 Network architecture of general mixing rule
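The forward pass of the general mixing rule in Eq. (6) can be sketched as follows. This is only an illustration under stated assumptions: the basis functions are taken to be logistic sigmoids, the argument names (`b`, `c`, `lam`, `theta`) are labels chosen here for the weight, scale and shift parameters of Eqs. (5) and (6), and the numerical values are arbitrary rather than fitted.

```python
import math

def sigmoid(u):
    """Logistic sigmoid, assumed here as the basis function in Eq. (6)."""
    return 1.0 / (1.0 + math.exp(-u))

def characteristic_value(x, b):
    """Eq. (5): b^T [1, x_1, ..., x_n], where b[0] is the bias term."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))

def general_mixing_rule(x, b, c, lam, theta):
    """Eq. (6): c_0 plus a weighted sum of sigmoids of the scaled, shifted characteristic value."""
    g = characteristic_value(x, b)
    return c[0] + sum(cj * sigmoid(lj * g + tj)
                      for cj, lj, tj in zip(c[1:], lam, theta))

# Arbitrary illustrative parameters for n = 3 components and 2 basis functions.
x = [0.2, 0.3, 0.5]
b = [0.1, 0.6, 0.7, 0.8]
c = [0.5, 1.0, -0.4]
lam = [2.0, -1.0]
theta = [0.0, 0.3]
print(general_mixing_rule(x, b, c, lam, theta))
```

Because the composition enters only through the scalar characteristic value, the number of trainable parameters grows linearly with the number of components.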

3. Bayesian model of the molecular reconstruction

Assume that the molecular compositions $X_1, X_2, \cdots, X_n$ and the measured bulk properties $\Theta_1, \Theta_2, \cdots, \Theta_M$ are random variables. It is clear that $X_n$ depends on $X_1, X_2, \cdots, X_{n-1}$ because $\sum_{i=1}^{n} X_i = 1$. Hence the molecular reconstruction becomes an optimization that maximizes the probability that the predicted properties $\hat{\mathbf{y}}$, computed from the compositions $\mathbf{x}$, are equal to the measured properties $\mathbf{y}$. Such a probability can be expressed as a conditional probability density function (pdf) of $\mathbf{x}$ given the measured properties $\mathbf{y}$, which, according to Bayes' theorem, is proportional to the product of the likelihood function $p(\mathbf{y}|\mathbf{x})$ and the prior distribution $p(\mathbf{x})$, as in Eq. (8)

$$p(\mathbf{x}|\mathbf{y}) \propto p(\mathbf{y}|\mathbf{x})\, p(\mathbf{x}) \tag{8}$$

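3.1 Bayesian estimation of the parameters in likelihood

Suppose that we have the molecular compositions of $N$ petroleum fraction samples, denoted as $\mathbf{X} = \{\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \cdots, \mathbf{x}^{(N)}\}$, and samples of their bulk property $y$, denoted as $\mathbf{Y} = \{y^{(k)}\}$, $k = 1, \dots, N$. Furthermore, we suppose that the observations of the measured property have an i.i.d. Gaussian distribution, i.e., $y^{(k)} \sim \mathcal{N}(\hat{y}^{(k)}, \beta^{-1})$, whose mean is equal to the predicted property $\hat{y}^{(k)}$ given in Eq. (9).

$$\hat{y}^{(k)} = \mathbf{c}^T \cdot \boldsymbol{\varphi}\{\boldsymbol{\lambda} \cdot (\mathbf{b}^T \mathbf{x}^{(k)}) + \boldsymbol{\theta}\} \tag{9}$$

Here the parameter $\beta$ is a precision parameter corresponding to the inverse variance of the distribution. Therefore, the likelihood function for $y$ is given by

$$p(\mathbf{Y}|\mathbf{X}, \mathbf{b}, \mathbf{c}, \boldsymbol{\theta}, \boldsymbol{\lambda}, \beta^{-1}) := \prod_{k=1}^{N} \mathcal{N}(y^{(k)}|\hat{y}^{(k)}, \beta^{-1}) \tag{10}$$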

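For simplicity, we introduce the prior distributions of the parameters $\mathbf{b}, \mathbf{c}, \boldsymbol{\theta}, \boldsymbol{\lambda}$ as Gaussian distributions with zero means as

$$p(\mathbf{b}|\alpha_1) := \mathcal{N}(\mathbf{b}|\mathbf{0}, \alpha_1^{-1}\mathbf{I}) \tag{11}$$

$$p(\mathbf{c}|\alpha_2) := \mathcal{N}(\mathbf{c}|\mathbf{0}, \alpha_2^{-1}\mathbf{I}) \tag{12}$$

$$p(\boldsymbol{\theta}|\alpha_3) := \mathcal{N}(\boldsymbol{\theta}|\mathbf{0}, \alpha_3^{-1}\mathbf{I}) \tag{13}$$

$$p(\boldsymbol{\lambda}|\alpha_4) := \mathcal{N}(\boldsymbol{\lambda}|\mathbf{0}, \alpha_4^{-1}\mathbf{I}) \tag{14}$$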

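Here $\alpha_1, \alpha_2, \alpha_3, \alpha_4$ are precision parameters for $\mathbf{b}, \mathbf{c}, \boldsymbol{\theta}, \boldsymbol{\lambda}$ respectively. Using Bayes' theorem, the posterior distribution is proportional to the product of the likelihood and the prior distributions as

$$p(\mathbf{b}, \mathbf{c}, \boldsymbol{\theta}, \boldsymbol{\lambda}|\mathbf{X}, \mathbf{Y}) \propto p(\mathbf{b}|\alpha_1)\, p(\mathbf{c}|\alpha_2)\, p(\boldsymbol{\theta}|\alpha_3)\, p(\boldsymbol{\lambda}|\alpha_4)\, p(\mathbf{Y}|\mathbf{X}, \mathbf{b}, \mathbf{c}, \boldsymbol{\theta}, \boldsymbol{\lambda}, \beta^{-1}) \tag{15}$$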

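Substituting Eqs. (10) to (14) into Eq. (15), we obtain the negative logarithm of the posterior as shown in Eq. (16)

$$-\ln p(\mathbf{b}, \mathbf{c}, \boldsymbol{\theta}, \boldsymbol{\lambda}|\mathbf{X}, \mathbf{Y}) = \frac{\alpha_1}{2}\mathbf{b}^T\mathbf{b} + \frac{\alpha_2}{2}\mathbf{c}^T\mathbf{c} + \frac{\alpha_3}{2}\boldsymbol{\theta}^T\boldsymbol{\theta} + \frac{\alpha_4}{2}\boldsymbol{\lambda}^T\boldsymbol{\lambda} + \frac{\beta}{2}\sum_{k=1}^{N}\left(\hat{y}^{(k)} - y^{(k)}\right)^2 + \text{const} \tag{16}$$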

Since the maximum of the posterior is equivalent to the minimum of its negative logarithm, the regression of $\mathbf{b}, \mathbf{c}, \boldsymbol{\theta}, \boldsymbol{\lambda}$ can be solved by minimizing Eq. (16) with gradient-based iterative algorithms. It is noteworthy that the adjustable parameters $\alpha_1, \alpha_2, \alpha_3, \alpha_4$ and $\beta$ play an important regularization role, achieving a trade-off between the accuracy and the generalization of the regression model.
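To make the structure of Eq. (16) concrete, the following sketch evaluates the regularized objective (without the additive constant) for a given parameter set. It assumes logistic sigmoid basis functions; the function and argument names are hypothetical labels for the weights, scales, shifts and precisions of Eqs. (9) to (16), and the optimizer itself is left out.

```python
import math

def predict(x, b, c, lam, theta):
    """Eq. (9): predicted property for one sample composition x (sigmoid basis assumed)."""
    g = b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))
    return c[0] + sum(cj / (1.0 + math.exp(-(lj * g + tj)))
                      for cj, lj, tj in zip(c[1:], lam, theta))

def squared_norm(v):
    return sum(vi * vi for vi in v)

def neg_log_posterior(samples, targets, b, c, lam, theta, alphas, beta):
    """Eq. (16) up to the additive constant.

    alphas = (alpha_1, alpha_2, alpha_3, alpha_4) are the prior precisions of the
    characteristic-function weights, the basis weights, the shifts and the scales.
    """
    a1, a2, a3, a4 = alphas
    penalty = 0.5 * (a1 * squared_norm(b) + a2 * squared_norm(c)
                     + a3 * squared_norm(theta) + a4 * squared_norm(lam))
    misfit = 0.5 * beta * sum((predict(x, b, c, lam, theta) - y) ** 2
                              for x, y in zip(samples, targets))
    return penalty + misfit

# Two hypothetical samples with two components each (illustrative values only).
samples = [[0.4, 0.6], [0.7, 0.3]]
targets = [1.0, 2.0]
value = neg_log_posterior(samples, targets, b=[0.0, 0.5, -0.5], c=[1.0, 0.8],
                          lam=[1.5], theta=[0.2], alphas=(0.1, 0.1, 0.1, 0.1), beta=2.0)
print(value)
```

Larger prior precisions shrink the parameters towards zero, while a larger noise precision weights the data misfit more heavily, which is the accuracy versus generalization trade-off described in the text.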


3.2 The prior distribution of the molecular compositions

Due to the constraint $\sum_{i=1}^{n} x_i = 1$, the random variables of the molecular compositions $\{X_1, X_2, \cdots, X_{n-1}\}$ are not independently distributed within the closed interval from zero to one. However, following the idea proposed in Ref. [6] that the quantities of the components in the mixture are independent and the molecular compositions are the normalization of these quantities, we can define the quantities of the components to be random variables, denoted by $\{Z_1, Z_2, \cdots, Z_n\}$, taking values from zero to positive infinity. Under this assumption, $Z_1, Z_2, \cdots, Z_n$ are independently and identically distributed, and their joint pdf is the product of the pdfs of the individual variables

$$f_{Z_1, Z_2, \dots, Z_n}(z_1, z_2, \cdots, z_n) = \prod_{i=1}^{n} f_{Z_i}(z_i) \tag{17}$$

Now let us define a new random variable for the total amount of $Z_1, Z_2, \cdots, Z_n$, denoted as $S = Z_1 + Z_2 + \cdots + Z_n$. Then the composition variables $X_1, X_2, \cdots, X_{n-1}$ can be derived from $Z_1, Z_2, \cdots, Z_n$ and $S$ as follows

$$X_i = \frac{Z_i}{S}, \quad (X_i \in [0, 1], \ i = 1, \dots, n-1) \tag{18}$$

and the joint pdf of $X_1, X_2, \cdots, X_{n-1}$ and $S$ is denoted as $f_{X_1, X_2, \dots, X_{n-1}, S}$. From Eq. (18), we also have

$$Z_i = X_i \cdot S, \quad i = 1, \dots, n-1 \tag{19}$$

$$Z_n = S \cdot \left(1 - \sum_{i=1}^{n-1} X_i\right) \tag{20}$$

Thus, the joint pdf of $X_1, X_2, \cdots, X_{n-1}$ and $S$ can be transformed from the joint pdf of $Z_1, Z_2, \cdots, Z_n$ according to Eq. (19) and Eq. (20) as follows

$$f_{X_1, X_2, \dots, X_{n-1}, S}(x_1, x_2, \cdots, x_{n-1}, s) = f_{Z_1, Z_2, \dots, Z_n}(z_1, z_2, \cdots, z_n) \cdot |\det(\mathbf{J})| \tag{21}$$

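where $\mathbf{J}$ is a Jacobian matrix defined as

$$\mathbf{J} = \begin{bmatrix} \dfrac{\partial z_1}{\partial x_1} & \cdots & \dfrac{\partial z_1}{\partial x_{n-1}} & \dfrac{\partial z_1}{\partial s} \\ \vdots & \ddots & \vdots & \vdots \\ \dfrac{\partial z_n}{\partial x_1} & \cdots & \dfrac{\partial z_n}{\partial x_{n-1}} & \dfrac{\partial z_n}{\partial s} \end{bmatrix} \tag{22}$$

From Eq. (19) and Eq. (20), the determinant of $\mathbf{J}$ is equal to

$$\det \mathbf{J} = \det \begin{bmatrix} s & 0 & \cdots & 0 & x_1 \\ 0 & s & \cdots & 0 & x_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & s & x_{n-1} \\ -s & -s & \cdots & -s & 1 - \sum_{i=1}^{n-1} x_i \end{bmatrix} = s^{n-1} \tag{23}$$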
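The determinant in Eq. (23) can be checked numerically. The sketch below builds the Jacobian from the partial derivatives of Eqs. (19) and (20) and evaluates its determinant by Gaussian elimination; the helper names and test values are assumptions made for illustration. Analytically, adding the first n-1 rows to the last row turns it into (0, ..., 0, 1), leaving an upper-triangular matrix whose determinant is the product of its diagonal, i.e. the total amount raised to the power n-1.

```python
def jacobian(x_head, s):
    """Jacobian of (z_1, ..., z_n) with respect to (x_1, ..., x_{n-1}, s), from Eqs. (19)-(20)."""
    m = len(x_head)                      # m = n - 1 free fractions
    rows = []
    for i in range(m):
        row = [0.0] * (m + 1)
        row[i] = s                       # dz_i / dx_i
        row[m] = x_head[i]               # dz_i / ds
        rows.append(row)
    # Last row: derivatives of z_n = s * (1 - sum of the free fractions).
    rows.append([-s] * m + [1.0 - sum(x_head)])
    return rows

def determinant(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    size = len(a)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for k in range(col, size):
                a[r][k] -= factor * a[col][k]
    return det

x_head = [0.1, 0.2, 0.3]                 # n = 4 components, the last fraction is 0.4
s = 2.5
print(determinant(jacobian(x_head, s)), s ** len(x_head))
```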


Substituting Eq. (17) and Eq. (23) into Eq. (21), we have

$$f_{X_1, X_2, \dots, X_{n-1}, S}(x_1, x_2, \cdots, x_{n-1}, s) = s^{n-1} \cdot \prod_{i=1}^{n-1} f_{Z_i}(x_i s) \cdot f_{Z_n}\left\{s \cdot \left(1 - \sum_{i=1}^{n-1} x_i\right)\right\} \tag{24}$$

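Assuming that $Z_i$ has a Gamma distribution with two parameters, the positive real 'shape' $\gamma$ and the positive real 'rate' $\rho$, whose pdf has the form

$$f_{Z_i}(z_i) = \frac{\rho^{\gamma}}{\Gamma(\gamma)} z_i^{\gamma-1} \exp(-\rho z_i) \tag{25}$$

then the prior distribution, $p(\mathbf{x})$, is the marginal pdf of $X_1, X_2, \cdots, X_{n-1}$ as

$$\begin{aligned} p(\mathbf{x}) &= f_{X_1, X_2, \dots, X_{n-1}}(x_1, x_2, \cdots, x_{n-1}) \\ &= \int_0^{\infty} s^{n-1} \cdot \prod_{i=1}^{n-1} f_{Z_i}(x_i s) \cdot f_{Z_n}\left\{s \cdot \left(1 - \sum_{i=1}^{n-1} x_i\right)\right\} ds \\ &= \prod_{i=1}^{n} \frac{\rho^{\gamma}}{\Gamma(\gamma)} \cdot \prod_{i=1}^{n-1} x_i^{\gamma-1} \left(1 - \sum_{i=1}^{n-1} x_i\right)^{\gamma-1} \int_0^{\infty} s^{n\gamma-1} \cdot \exp(-\rho s)\, ds \\ &= \frac{\Gamma(n\gamma)}{(\Gamma(\gamma))^{n}} \prod_{i=1}^{n-1} x_i^{\gamma-1} \left(1 - \sum_{i=1}^{n-1} x_i\right)^{\gamma-1} \end{aligned} \tag{26}$$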
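The closed form in Eq. (26) contains only the shape parameter, so compositions obtained by normalizing Gamma-distributed quantities should have the same statistics whatever the rate. The following sketch checks this by simulation and also evaluates the log of Eq. (26). The function names, sample size and parameter values are illustrative assumptions; note that Python's `random.gammavariate` takes a shape and a scale, so the scale is set to the reciprocal of the rate.

```python
import math
import random

def log_prior(x, shape):
    """Log of Eq. (26) for a full composition x = (x_1, ..., x_n) that sums to one."""
    n = len(x)
    return (math.lgamma(n * shape) - n * math.lgamma(shape)
            + (shape - 1.0) * sum(math.log(xi) for xi in x))

def sample_composition(n, shape, rate, rng):
    """Draw n Gamma quantities and normalise them by their total, as in Eq. (18)."""
    z = [rng.gammavariate(shape, 1.0 / rate) for _ in range(n)]   # (shape, scale)
    total = sum(z)
    return [zi / total for zi in z]

def first_fraction_variance(n, shape, rate, draws=20000, seed=0):
    """Monte Carlo variance of the first fraction x_1."""
    rng = random.Random(seed)
    xs = [sample_composition(n, shape, rate, rng)[0] for _ in range(draws)]
    mean = sum(xs) / draws
    return sum((v - mean) ** 2 for v in xs) / draws

n, shape = 3, 2.0
# Variance of a single fraction implied by Eq. (26) (a symmetric Dirichlet density).
expected = (n - 1) / (n ** 2 * (n * shape + 1))
for rate in (0.1, 1.0, 10.0):
    print(rate, first_fraction_variance(n, shape, rate), expected)
```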

From Eq. (26), an interesting fact follows: the prior distribution depends only on the 'shape' of the distribution of the component quantities, regardless of their 'rate' parameter.

3.3 Bayesian estimation of the molecular compositions

Once both the likelihood and the prior are determined, the molecular reconstruction is equivalent to a Bayesian estimation of the compositions. Now suppose that there is a new petroleum fraction sample within the concerned range and that $M$ bulk properties are known, $y_m$, $m = 1, \dots, M$. Hence the posterior distribution is proportional to

$$p(\mathbf{x}|\mathbf{y}) \propto \prod_{i=1}^{n-1} x_i^{\gamma-1} \left(1 - \sum_{i=1}^{n-1} x_i\right)^{\gamma-1} \prod_{m=1}^{M} \mathcal{N}(y_m|\hat{y}_m, \beta_m^{-1}) \tag{27}$$

and its negative logarithm is then equal to

$$\begin{aligned} -\ln p(\mathbf{x}|\mathbf{y}) &= -(\gamma - 1)\sum_{i=1}^{n-1} \ln x_i - (\gamma - 1)\ln\left(1 - \sum_{i=1}^{n-1} x_i\right) + \sum_{m=1}^{M} \frac{\beta_m}{2}(\hat{y}_m - y_m)^2 + \text{const} \\ &= -(\gamma - 1)\sum_{i=1}^{n} \ln x_i + \sum_{m=1}^{M} \frac{\beta_m}{2}(\hat{y}_m - y_m)^2 + \text{const} \end{aligned} \tag{28}$$

where $x_n = 1 - \sum_{i=1}^{n-1} x_i$. Consequently, the molecular composition $\mathbf{x}$ can be reconstructed by minimizing Eq. (28) via a gradient-based optimization algorithm without constraints.
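A minimal sketch of minimizing Eq. (28) is given below. Several choices here are assumptions for illustration and not the authors' implementation: the fractions are parameterized through a softmax so that plain gradient descent stays on the simplex, gradients are taken by finite differences, and a simple Kay's-rule predictor with made-up pure-component values stands in for the trained general mixing rule.

```python
import math

def softmax(u):
    """Map unconstrained variables to positive fractions that sum to one."""
    top = max(u)
    e = [math.exp(v - top) for v in u]
    total = sum(e)
    return [v / total for v in e]

def neg_log_posterior(x, y_meas, predictors, betas, shape):
    """Eq. (28) without the constant; x holds all n fractions."""
    prior = -(shape - 1.0) * sum(math.log(xi) for xi in x)
    misfit = sum(0.5 * bm * (f(x) - ym) ** 2
                 for f, ym, bm in zip(predictors, y_meas, betas))
    return prior + misfit

def reconstruct(y_meas, predictors, betas, shape, n, steps=2000, lr=0.1, h=1e-6):
    """Gradient descent on the softmax variables with forward-difference gradients."""
    u = [0.0] * n                        # start from the uniform composition
    def objective(v):
        return neg_log_posterior(softmax(v), y_meas, predictors, betas, shape)
    for _ in range(steps):
        base = objective(u)
        grad = []
        for i in range(n):
            shifted = u[:]
            shifted[i] += h
            grad.append((objective(shifted) - base) / h)
        u = [ui - lr * gi for ui, gi in zip(u, grad)]
    return softmax(u)

# Hypothetical example: one measured property and a Kay's-rule stand-in predictor.
pure = [0.60, 0.70, 0.80]
def kay(x):
    return sum(xi * yi for xi, yi in zip(x, pure))

x_rec = reconstruct(y_meas=[0.74], predictors=[kay], betas=[1000.0], shape=2.0, n=3)
print(x_rec, kay(x_rec))
```

With a shape above one the prior pulls the solution towards the uniform composition, so the reconstructed property lies between the uniform-mixture value and the measurement; raising the noise precision moves it closer to the measurement.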

In particular, if the shape parameter is equal to 1, i.e., the component quantities are sampled from an exponential distribution, minimizing Eq. (28) is equivalent to a maximum likelihood estimation.

4. Reconstruction based on the basis fractions

The aforementioned two-step molecular reconstruction of petroleum fractions relies on an experimental data set containing the detailed compositions and therefore suffers from high dimensionality and measurement noise. Fortunately, such experimental compositions can be expressed as linear combinations of the compositions of a few pre-defined petroleum fractions, named basis fractions [8], and the noise can be eliminated in the meantime.

Suppose that ™< = BL,< , BM,< , ⋯ , BO,< , ∀´