Rule-Based Intelligent System for Variable Importance Measurement


Article

A rule-based intelligent system for variable importance measurement and prediction of Ash fusion indexes
Samaneh Yazdani, Esmaeil Hadavandi, and Saeed Chehreh Chelgani
Energy Fuels, Just Accepted Manuscript • DOI: 10.1021/acs.energyfuels.7b03280 • Publication Date (Web): 07 Dec 2017
Downloaded from http://pubs.acs.org on December 8, 2017



Energy & Fuels

A rule-based intelligent system for variable importance measurement and prediction of Ash fusion indexes

S. Yazdani a, E. Hadavandi b,*, S. Chehreh Chelgani c,*

a Department of Computer Engineering, Islamic Azad University, North Tehran Branch, Tehran, Iran
b Department of Industrial Engineering, Birjand University of Technology, Birjand, Iran
c Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA

Abstract

Ash fusion temperatures (AFTs: initial deformation temperature (IDT), softening temperature (ST), and fluid temperature (FT)) are standard indicators for estimating the behavior of ash oxides when coal is used and for controlling slag formation in boilers. In this study, the modeling of AFTs based on ash oxide contents for 6537 US coal samples was investigated with a rule-based intelligent system (RBIS). Variable importance measurements (VIMs) of the RBIS over the database indicated that the Al2O3 content of coal samples has the highest importance for the prediction of AFTs. RBIS models based on various rules were generated for the prediction of IDT, ST, and FT. The RBIS was compared with other typical predictive models (linear regression, genetic algorithm-neural network (GA-NN), and a multi-layer perceptron trained by the backpropagation algorithm (MLP-BP)) to assess the capability of the proposed predictive model. Results indicated that the RBIS can predict AFTs quite satisfactorily: R2 for IDT, ST, and FT in the testing stage of the models was over 0.82, and the differences between actual and RBIS-predicted values were less than 100 °C for over 80% of the data. These comprehensive results indicate that the RBIS method can be used in the industry sector to model the AFTs of coal samples and predict their fouling behavior before feeding them into boilers. Moreover, the outcomes of this investigation introduce the RBIS as a powerful method for modeling other complicated problems in the coal geology, fuel and energy sectors.

Keywords: Initial deformation temperature; softening temperature; fluid temperature; Coal; Rule-based intelligent system

* Corresponding authors: es.hadavandi@birjandut.ac.ir; [email protected]

1. Introduction

During coke making and coal combustion, slagging and fouling may cause the inorganic matter of coal (ash) to deposit on heat transfer surfaces or on the surface of equipment. These phenomena often occur in the hottest part of a boiler (slagging) or in the convection sections (fouling). Ash fusibility is the main factor used to estimate these behaviors [1] and is frequently assessed by the ash fusion temperature (AFT). In other words, AFT is one of the factors used to estimate the melting behavior of the ash particles of coal samples in boilers [2]. AFT is determined by ASTM D1857, which is based on the gradual thermal deformation of a pyramid-shaped ash sample in either an oxidizing or a reducing atmosphere. The test results can be reported as the following temperatures: initial deformation temperature (IDT), softening temperature (ST), and fluid temperature (FT) [3]. AFTs measured in different laboratories for a given coal may differ by ±20–100 °C, which is an acceptable range [4-6]. While coal fusion and the possible relationships between ash content and AFTs are not completely understood, it is important for the industry to formulate their interactions. On the other hand, there are some doubts about errors in the laboratory tests (the AFT experiment is a time-consuming analysis, and sample preparation and temperature control are quite difficult). Therefore, several investigations have been performed to estimate AFTs based on coal ash compositions [2, 6-9].

Intelligence techniques such as artificial neural networks (ANNs) and fuzzy logic are popular prediction models. They can deal with complex non-linear interactions which are difficult to simulate by classical models (e.g. linear regression) [10-12]. These techniques have been successfully developed to model complicated prediction problems in various areas of engineering (coal processing, mining and metallurgy, etc.) [13-17]. However, there are many intelligent tree-based modeling systems, such as random forest, which have several advantages over ANNs. Gene expression programming (GEP), as a tree-based prediction model, has been used to model complex problems with high accuracy in various systems. Shirani Faradonbeh et al. [18] proposed a GEP model for the prediction of flyrock distance. In another investigation, a GEP model was used to estimate ground vibration and its outputs were compared with nonlinear multiple regression (NLMR) [19]. The results indicated that the GEP model performed better than NLMR in the prediction problem. Moreover, it was reported that a genetic programming based prediction model was used to estimate the backbreak phenomenon [20].

While ANN models, as comprehensive intelligence techniques, are powerful in prediction, they have weak explanatory abilities. Thus, in many applications, tree-based or rule-based systems are preferred for decision making. These systems help to understand interactions between variables in a symbolic form (rules) [21]. Rule-based intelligent systems can increase the level of understanding of problem domains through the schematic form of “IF-Then” rules and can facilitate the implementation of a decision support system. The user can examine the provided prediction rules (based on his knowledge of the field) and does not need to understand the underlying method. Some investigations have applied rule-based prediction models to real industrial problems [22], although only a few approaches have developed rule-based prediction models for the fuel and energy sector.

This investigation proposes a rule-based intelligent system called “RBIS” to predict AFTs (IDT, ST, and FT) for US coal samples from nineteen different states. The RBIS is developed based on the M5 tree model proposed by Kuhn et al., 2014 [23, 24]. It was reported that the RBIS has higher accuracy than typical regression-based methods [24]. Regression-based methods are similar to decision trees, but the RBIS leaves are regression functions instead of class labels [23, 24]. One of the main advantages of the RBIS is that its results are very interpretable (in contrast to ANNs) [23]. Like other rule-based systems, M5 recursively partitions the AFT data set into appropriate subsets and predicts continuous values at the terminal leaf nodes. Unlike methods such as classification and regression tree (CART), in which the prediction is a single number (the mean of the subset of data included in the node), M5 uses an optimized linear regression as the predictor function in the consequent part of its rules. To assess the capabilities of the proposed RBIS, other typical prediction methods such as linear regression, genetic algorithm-neural network (GA-NN) and a multi-layer perceptron trained with the back-propagation algorithm (MLP-BP) are examined for the estimation of AFTs. A comparison between the results of these methods could be an appropriate indicator of the potential of the RBIS in modeling. The outcomes of this investigation can be useful for the industry to formulate AFT interactions of coal samples which are fed into a boiler.
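The difference between a CART-style leaf and an M5-style leaf can be illustrated with a minimal Python sketch. It is not the authors' implementation: the five samples are made up, a single input variable is used, and ordinary least squares stands in for the "optimized linear regression" mentioned in the text.

```python
import numpy as np

# Made-up samples assumed to fall into one leaf of the tree:
# x is one input variable (e.g. an oxide content in %), y is a temperature.
x = np.array([20.0, 22.0, 24.0, 26.0, 28.0])
y = np.array([1300.0, 1320.0, 1340.0, 1360.0, 1380.0])

# CART-style leaf: every sample reaching the leaf gets the same number,
# the mean of the training targets in that leaf.
cart_prediction = y.mean()

# M5-style leaf: a linear model y ~ a*x + b fitted on the leaf's samples
# (plain least squares here, as an assumption).
design = np.column_stack([x, np.ones_like(x)])
(a, b), *_ = np.linalg.lstsq(design, y, rcond=None)

x_new = 27.0
m5_prediction = a * x_new + b

print("CART-style leaf:", cart_prediction)
print("M5-style leaf:  ", m5_prediction)
```

For a new sample inside the leaf, the constant leaf ignores where the sample lies within the partition, while the linear leaf follows the local trend.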

2. Materials and methods

2-1. Database

In this study, 6537 coal samples from 19 different American states, provided by the U.S. Geological Survey Coal Quality (COALQUAL) database (open file report 97-134), were used for the prediction of AFTs (ASTM Standard D1857 is the experimental method designed for determining AFTs in an oxidizing atmosphere) [25]. The database includes the determined ash oxides as well as their representative IDT, ST, and FT on an as-received basis. The database is given as the supplementary database file. The frequency of records for each state is shown in Fig. 1.

2-2. Variable importance measurement (VIM)

The determination of the importance of input variables is a critical requirement for many applications. Variable weighting methods, as VIM procedures, are preprocessing techniques in data mining and machine learning tasks which assign a weight to each variable of a data set. These weights show the importance of the variables. The importance of variables can be determined based on the output of the proposed RBIS. The RBIS output is a set of prediction rules; therefore, the importance of each variable is related to the percentage of times that it is used in a condition and/or a linear model of the rules in the RBIS. In other words, VIM is calculated through a linear combination of the usage of a variable in the rule conditions and in the model.
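The usage-based weighting described above can be sketched in Python. This is only an illustration under stated assumptions, not the authors' code: the `Rule` structure, the two toy rules, the weighting of each rule by the number of samples it covers, and the equal 0.5/0.5 weights of the linear combination are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical rule record (not from the paper)."""
    condition_vars: set   # variables tested in the IF part
    model_vars: set       # variables used in the linear model of the THEN part
    coverage: int         # training samples covered by the rule (assumed weighting)


def usage_percent(rules, attr):
    """Percentage of covered samples whose rule uses each variable in `attr`."""
    total = sum(r.coverage for r in rules)
    usage = {}
    for r in rules:
        for v in getattr(r, attr):
            usage[v] = usage.get(v, 0.0) + 100.0 * r.coverage / total
    return usage


def variable_importance(rules, w_cond=0.5, w_model=0.5):
    """Linear combination of condition usage and model usage (weights assumed)."""
    cond = usage_percent(rules, "condition_vars")
    model = usage_percent(rules, "model_vars")
    names = set(cond) | set(model)
    return {v: w_cond * cond.get(v, 0.0) + w_model * model.get(v, 0.0)
            for v in names}


# Two made-up rules, for illustration only.
rules = [
    Rule({"Al2O3"}, {"Al2O3", "Fe2O3", "SiO2"}, coverage=60),
    Rule({"Al2O3", "Fe2O3"}, {"Al2O3", "CaO"}, coverage=40),
]
vim = variable_importance(rules)
for name, score in sorted(vim.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.1f}")
```

With these toy rules, a variable that appears in every condition and every linear model scores 100, while a variable used only in the linear model of one rule scores half of that rule's coverage share.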

2-3. Rule Based Intelligent System (RBIS)

The RBIS is constructed using M5' [26] (which is an improvement of M5 [24]) and consists of the following four steps [27]:

a) Building a complete tree: The RBIS constructs a regression tree by recursively splitting the instance space. To split the instance space, the RBIS applies the standard deviation reduction (SDR) factor, which is calculated using the following equation:

$$\mathrm{SDR} = sd(T) - \sum_{i} \frac{|T_i|}{|T|} \times sd(T_i) \qquad (1)$$

where $sd$ is the standard deviation, $T$ is the set of samples that reach the node, and $T_i$ is the set of samples that compose one subspace produced by splitting at that node [26]. In the RBIS, the input space is split into several regions, and multivariate linear regression models exist at the leaf nodes (Fig. 2) [28]. This splitting process is terminated when the output values of all instances that reach a node vary only slightly or when only a few instances remain [28].

b) Developing a regression model: During the growth of a tree, the RBIS constructs a linear regression model in each subspace. Regression models are created for each inner node by using the data associated with that node and all variables tested in the sub-tree rooted at that node.

c) Pruning the tree: In this step, the tree is pruned to avoid over-fitting. If the SDR value for the linear regression model at the root of a sub-tree is smaller than or equal to the expected error for the sub-tree, some of the leaves of the sub-tree are pruned.

d) Smoothing the tree: Finally, in the last step, the tree is smoothed to compensate for discontinuities between the pruned leaves and adjacent linear models. This process computes the predicted value with the linear model at the leaf and smooths it by combining this predicted value with the estimated one using the following equation:

$$p' = \frac{np + kq}{n + k} \qquad (2)$$

where $p'$ is the prediction passed up to the next higher node, $p$ is the prediction passed to this node from below, $q$ is the value predicted by the model at this node, $n$ is the number of training data that reach the node, and $k$ is a constant [28].
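The splitting criterion of step (a) and the smoothing of step (d) can be sketched as follows. This is a minimal illustration under assumptions: SDR is taken as the standard deviation of the targets at a node minus the size-weighted standard deviations of the subsets, the smoothed prediction as (n·p + k·q)/(n + k), the population standard deviation is used, only binary splits on one variable are scanned, and k = 15 is an assumed value (the text only states that k is a constant). The data are made up.

```python
import numpy as np


def sdr(y_parent, subsets):
    """Standard deviation reduction: sd(T) - sum_i |T_i|/|T| * sd(T_i)."""
    y_parent = np.asarray(y_parent, dtype=float)
    weighted = sum(len(s) / len(y_parent) * np.std(s) for s in subsets)
    return np.std(y_parent) - weighted


def best_split(x, y):
    """Scan midpoints of one input variable; return (threshold, SDR) with the largest SDR."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = (None, -np.inf)
    values = np.unique(x)  # sorted distinct values
    for lo, hi in zip(values[:-1], values[1:]):
        t = (lo + hi) / 2.0
        left, right = y[x <= t], y[x > t]
        gain = sdr(y, [left, right])
        if gain > best[1]:
            best = (t, gain)
    return best


def smooth(p, q, n, k=15.0):
    """Smoothed prediction passed up the tree: (n*p + k*q) / (n + k)."""
    return (n * p + k * q) / (n + k)


# Made-up data: one input variable and a temperature-like target.
x = [10, 12, 14, 30, 32, 34]
y = [1100, 1110, 1105, 1400, 1390, 1410]
threshold, gain = best_split(x, y)
print("best threshold:", threshold, "SDR:", round(gain, 2))

# Prediction from below (p), prediction of the model at this node (q),
# and the number of training samples reaching the node below (n).
smoothed = smooth(p=1200.0, q=1300.0, n=45, k=15.0)
print("smoothed prediction:", smoothed)
```

The larger n is relative to k, the closer the smoothed value stays to the prediction coming from below; a small n pulls it toward the model of the higher node.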

3. Results and discussions

Relationships of various oxides with their representative AFTs are illustrated in Fig. 3. In Fig. 3, red shaded zones show the densest area of the scatterplot, containing high volumes of samples, while the gray shaded zones encompass everything else and are less dense than the red area. The VIMs obtained by the proposed RBIS ranked the variables, based on their interactions and their ability to predict AFTs, from significant to weakly important variables (Fig. 4). Based on these results, Al2O3 has the highest importance for the prediction of AFTs. Fe2O3, SiO2 and CaO ranked after Al2O3 for all three AFTs.

To evaluate the proposed RBIS, 80% of the instances were randomly selected as the training set to extract predictor rules, and the remaining 20% were used as the testing set. The number of prediction rules obtained by the RBIS for IDT, ST and FT was 19, 15 and 20, respectively. As an example of the RBIS output, Tables 1-3 include a set of six rules with their prediction accuracy in terms of Mean Absolute Error (MAE) (Eq. (3)) for AFTs.

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i| \qquad (3)$$

where $y_i$ and $\hat{y}_i$ are the actual values and the values predicted by the RBIS, respectively, and $N$ is the number of samples. The AFT values are predicted from the linear regression in the terminal node to which the RBIS assigns the sample. Tables 1-3 show the MAE values that are associated with the different conditions of the predictor rules.
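The evaluation protocol described above (a random 80%/20% split and the mean absolute error between actual and predicted values) can be sketched in Python. This is an illustration only: the function names, the random seed, and the four actual/predicted values are hypothetical, and no result of the paper is reproduced.

```python
import numpy as np


def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all samples."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))


def train_test_split_indices(n_samples, train_fraction=0.8, seed=0):
    """Randomly assign sample indices to a training and a testing set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train_fraction * n_samples))
    return order[:n_train], order[n_train:]


# 6537 is the number of samples reported in the text; the seed is arbitrary.
train_idx, test_idx = train_test_split_indices(6537)
print(len(train_idx), len(test_idx))

# Made-up temperatures in deg C, for illustration only.
actual = [1200.0, 1350.0, 1275.0, 1410.0]
predicted = [1180.0, 1400.0, 1275.0, 1380.0]
mae = mean_absolute_error(actual, predicted)
print("MAE:", mae)
```

The same function can be applied per rule by restricting `actual` and `predicted` to the samples that satisfy that rule's conditions, which is how a per-rule MAE such as the one listed in Tables 1-3 could be obtained.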

For example, the rule “1” in Table 1, with antecedent

"#$2 18 #I #$2 > 39 #$2