
J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.8b00672 • Publication Date (Web): February 14, 2019


DeepChemStable: Chemical Stability Prediction with an Attention-based Graph Convolution Network

Xiuming Li§1, Xin Yan§*1, Qiong Gu1, Huihao Zhou1, Di Wu1, and Jun Xu*1,2

1 School of Pharmaceutical Sciences & School of Data and Computer Science, Sun Yat-Sen University, 132 East Circle at University City, Guangzhou 510006, China
2 School of Computer Science & Technology, Wuyi University, 99 Yingbin Road, Jiangmen 529020, China

§ Equal contributors. * To whom correspondence should be addressed.

Contact: [email protected]

Abstract

In the drug discovery process, unstable compounds in storage can lead to false positive or false negative bioassay conclusions. Prediction of the chemical stability of a compound by de novo methods is complex, so chemical instability prediction is commonly based on a model derived from empirical data. The COMDECOM (COMpound DECOMposition) project provides the empirical data for prediction of chemical stability. Models based on extended-connectivity fingerprints and atom center fragments were built from the COMDECOM data and used to estimate chemical stability, but deficits in the existing models remain. In this paper, we report DeepChemStable, a model employing an attention-based graph convolution network trained on the COMDECOM data. The main advantage of this method is that DeepChemStable is an end-to-end model: it does not predefine structural fingerprint features but instead dynamically learns structural features and associates them through the learning process of an attention-based graph convolution network. The previous ChemStable program relied on a rule-based method to reduce false negatives; DeepChemStable, on the other hand, reduces the risk of false negatives without using a rule-based method. Because minimizing the rate of false negatives is a greater concern for instability prediction, this feature is a major improvement. The model achieves an AUC value of 84.7%, a recall rate of 79.8%, and a ten-fold stratified cross validation accuracy of 79.1%.


Introduction

High throughput screening requires that compounds be chemically stable in order to avoid false positive or false negative hits1,2. Many factors contribute to the chemical instability of compounds in a repository, including chemical substructures, storage conditions, solvents, and temperature3,4. Chemical substructures, particularly reactive "warheads", are however major contributors to instability per se. It is challenging to devise a systematic method to identify "warhead" substructures among thousands or millions of compounds. It is very difficult to resolve this problem through de novo approaches; consequently, the COMDECOM (COMpound DECOMposition) project5 was completed to provide empirical results concerning the chemical stability of compound libraries. The COMDECOM data contain structurally diverse compounds whose stabilities were measured experimentally in a mixture of DMSO and H2O. Compound purity was monitored at the 0-, 14-, 35-, and 105-day time points, and models were built from these data1,5,6 using global molecular descriptors such as topological polar surface area7. The global descriptors or fingerprints did not work well because they described whole-molecule features but failed to deal with local effects involving, for example, substructures such as reactive "warheads". To better deal with reactive "warheads", reaction site descriptors were introduced to emphasize local effects, but the impact of the chemical environment on the reaction site was not taken into account. To resolve problems of this sort, an Atom Center Fragment (ACF) approach was proposed to supply the features of a naïve Bayesian classifier named ChemStable6. The ACF approach emphasizes the reaction sites and considers the impact of the chemical environment, and in this way achieves improved results8. Additionally, external rules (fragments) for prediction of instability were embedded in ChemStable to strengthen various features. The ACF approach achieves better (76.5%) accuracy in its prediction of stability. However, the power of the resulting ACF descriptors was limited, since all fragments were retained without any filtering. This may result in a large storage requirement and a system with considerable noise, which limits the performance of the predictive model, while rarely occurring ACF features have little effect. Moreover, ACF descriptors only indicate the presence or absence of substructures and fail to capture structural information beyond connectivity.


Recently, deep learning approaches have been introduced into cheminformatics9,10. With these approaches, predefined molecular descriptors are used as input features and multiple hidden layers are designed, producing a deep learning model with which to predict molecular properties11,12. An example of this process is DeepTox13. Deriving descriptors from compound structure data is a crucial step, and a deep learning system, the graph convolution neural network (GCN), was proposed to automatically extract features from compound structural data14-17. Graph convolution layers, standard convolution layers applying local filters, are specialized for learning local features18,19, which are analogous to molecular structure fragments. A GCN is knowledge-free, requires no manual tweaking, and can learn appropriate representations of the original molecular graphs20. A graph convolution operation can capture local features that represent local chemical effects21. A simple global pooling step combines all atom features into a single vector, which is unable to distinguish the different learned features14. An attention mechanism can capture global dependencies among the features of substructures. It is often used in conjunction with a recurrent neural network and has been adopted in many applications22,23. Google24 proposed a simple network architecture that relies entirely on an attention mechanism to derive global dependencies. To further improve on the ACF approach, we propose DeepChemStable, a model built on an attention-based graph convolution network. DeepChemStable takes both local and global structural information into account.

Methods

Data process

The COMDECOM data consist of chemical structures and compound purity data measured at days 0, 14, 35, and 105. Compounds used in this work were tested in DMSO/H2O solution (20% v/v of water). Compounds whose measured concentration was 5% higher than a previous measurement were removed. Salts and duplicate compounds were also removed. The ratio between the purity at day 105 (p105) and the purity at day zero (p0) was used as an index of the instability of the compound. From the data examined, this resulted in 6442 stable compounds (p105/p0 ≥ 0.8) and 3304 unstable compounds (p105/p0 ≤ 0.7) among the compounds tested. Compounds for which 0.7 < p105/p0 < 0.8 were excluded from the analysis. In this study, the same data-filtering criterion used by ChemStable was applied.

The structures of the 9746 compounds were converted into SMILES25 format with OpenBabel26 software and parsed with the RDKit toolbox, an open-source package. Five compounds that raised parsing errors in RDKit were removed. The training, validation, and test sets were created in a ratio of approximately 8:1:1.
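As a concrete illustration of this preprocessing and of the inter-set similarity check described next, a hedged sketch follows. The file and column names are hypothetical, and the Morgan fingerprint choice is an assumption: the paper cites RDKit fingerprints without specifying the type.

```python
# A minimal sketch of the labeling, filtering, and ~8:1:1 stratified split.
# The file name and columns (smiles, p0, p105) are hypothetical; the actual
# COMDECOM data layout is not given in the paper.
import pandas as pd
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from sklearn.model_selection import train_test_split

df = pd.read_csv("comdecom.csv")                  # assumed columns: smiles, p0, p105

ratio = df["p105"] / df["p0"]
df.loc[ratio >= 0.8, "label"] = 0                 # stable
df.loc[ratio <= 0.7, "label"] = 1                 # unstable
df = df.dropna(subset=["label"])                  # drop the 0.7 < ratio < 0.8 gray zone

# Remove compounds that RDKit cannot parse (five such cases in the paper).
df = df[df["smiles"].map(lambda s: Chem.MolFromSmiles(s) is not None)]

# Stratified split preserving the stable/unstable proportions in each set.
train, rest = train_test_split(df, test_size=0.2, stratify=df["label"], random_state=0)
valid, test = train_test_split(rest, test_size=0.5, stratify=rest["label"], random_state=0)

# Average inter-set Tanimoto similarity, used below to gauge generalization.
# Morgan fingerprints are an assumption, not a detail stated in the paper.
def mean_cross_similarity(smiles_a, smiles_b):
    fps = lambda ss: [AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(s), 2, 2048)
                      for s in ss]
    fa, fb = fps(smiles_a), fps(smiles_b)
    return sum(DataStructs.TanimotoSimilarity(a, b) for a in fa for b in fb) / (len(fa) * len(fb))
```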

A stratified sampling strategy was used to maintain similar proportions of unstable and stable compounds in each set; the numbers of compounds in the training, validation, and test sets are 7792, 973, and 976, respectively. To assess the effect of compound similarity among the three sets on the performance of the deep learning models, similarities were calculated with the Tanimoto formula on molecular fingerprints27. Low similarities help to characterize the generalization ability of the model, because the test and validation sets then contain compounds novel relative to the training set. The average similarities between the training and validation sets, the training and test sets, and the validation and test sets were 0.354, 0.353, and 0.353, respectively.

Graph convolution embedding

The DeepChemStable model combines a graph convolution network and an attention mechanism. The model architecture is depicted in Figure 1.

Figure 1. The abstract architecture of the attention-based graph convolution model

As shown in Figure 1, the training data are in SMILES (chemical structure linear notation) format and represent the chemical structure connectivity. The graph convolution operation captures the local structural features, and an attention layer is applied, with the learned graph convolution features as the embedding, to capture global structural dependencies. The attention mechanism identifies the important features which determine chemical stability. DeepChemStable takes both local and global structural information into account and generates appropriate molecular descriptors to further boost the predictive accuracy for chemical stability.

Each compound is represented as an undirected graph with atoms as nodes and bonds as edges. Atoms and bonds are encoded with structural features which preserve attribute information characterizing the surrounding chemical environment. Taking carbon atoms as an example, aromatic carbons and the carbons of carbonyl or alkyl groups have different chemical environments. These atom and bond features, with their relative contributions, are learned by the attention-based graph convolution network. An atom is represented by a feature vector with 38 components, and a bond by another feature vector with 6 components. The attributes of atoms and bonds are calculated with the RDKit toolbox and are listed in Tables 1 and 2.

Table 1. Description of atom features

Attribute          | Description                                                 | Dimension
Atom type          | C, N, O, S, F, Si, P, Cl, Br, I, B, or "Unknown" (one-hot). | 12
Degree             | Number of heavy-atom neighbors (one-hot).                   | 6
#Hydrogens         | Number of hydrogen neighbors (one-hot).                     | 5
Valence            | Implicit valence (one-hot).                                 | 6
Aromaticity        | Whether the atom is in an aromatic system.                  | 1
#Radical electrons | Number of radical electrons.                                | 1
Hybridization      | sp, sp2, sp3, sp3d, or sp3d2 (one-hot).                     | 5
Charge             | Formal charge.                                              | 1
Partial charge     | Gasteiger partial charge28.                                 | 1

Table 2. Description of bond features

Attribute  | Description                                    | Dimension
Bond type  | Single, double, triple, or aromatic (one-hot). | 4
Conjugated | Whether the bond is conjugated.                | 1
Ring       | Whether the bond is in a ring.                 | 1
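To make Tables 1 and 2 concrete, the sketch below assembles the 38-dimensional atom vector and the 6-dimensional bond vector with RDKit. The block ordering, the clipping of out-of-range counts, and the handling of the "Unknown" atom type are our assumptions, not the authors' published routine.

```python
# Assembling the Table 1 atom vector (38-dim) and Table 2 bond vector (6-dim)
# with RDKit. Block order, count clipping, and "Unknown" handling are assumed.
from rdkit import Chem
from rdkit.Chem import AllChem

ATOM_TYPES = ["C", "N", "O", "S", "F", "Si", "P", "Cl", "Br", "I", "B"]
HYBRIDIZATIONS = [Chem.HybridizationType.SP, Chem.HybridizationType.SP2,
                  Chem.HybridizationType.SP3, Chem.HybridizationType.SP3D,
                  Chem.HybridizationType.SP3D2]

def one_hot(value, choices):
    return [1.0 if value == c else 0.0 for c in choices]

def atom_features(atom):
    """type(12) + degree(6) + #H(5) + valence(6) + aromatic(1)
    + radicals(1) + hybridization(5) + charge(1) + partial charge(1) = 38."""
    symbol = atom.GetSymbol()
    type_enc = one_hot(symbol, ATOM_TYPES) + [0.0 if symbol in ATOM_TYPES else 1.0]
    return (type_enc
            + one_hot(min(atom.GetDegree(), 5), range(6))
            + one_hot(min(atom.GetTotalNumHs(), 4), range(5))
            + one_hot(min(atom.GetImplicitValence(), 5), range(6))
            + [1.0 if atom.GetIsAromatic() else 0.0]
            + [float(atom.GetNumRadicalElectrons())]
            + one_hot(atom.GetHybridization(), HYBRIDIZATIONS)
            + [float(atom.GetFormalCharge())]
            + [atom.GetDoubleProp("_GasteigerCharge")])

def bond_features(bond):
    return (one_hot(bond.GetBondType(), [Chem.BondType.SINGLE, Chem.BondType.DOUBLE,
                                         Chem.BondType.TRIPLE, Chem.BondType.AROMATIC])
            + [1.0 if bond.GetIsConjugated() else 0.0]
            + [1.0 if bond.IsInRing() else 0.0])

mol = Chem.MolFromSmiles("c1ccccc1C(=O)O")
AllChem.ComputeGasteigerCharges(mol)  # fills the _GasteigerCharge atom property
assert all(len(atom_features(a)) == 38 for a in mol.GetAtoms())
assert all(len(bond_features(b)) == 6 for b in mol.GetBonds())
```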

The atom features learned by a graph convolution layer are regarded as local fragments. Atom features are iteratively updated based on their own features in the previous layer and those of their neighbors; bond features are held constant. As depicted in Figure 2, in the first graph convolution layer the initial atom features in the molecular graph are passed through a standard hidden layer and an activation function to form the layer-0 atom features. Layer-1 atom features are constructed from the layer-0 atom features and their neighboring features. The neighboring features are calculated by concatenating neighboring atom features with their respective bond features, and are then passed through a standard hidden layer and an activation function. Layer-n atom features are formed by repeating this process, with different network weights applied at different layers. In the top graph convolution layer, all atom features are the learned fragments of the molecular graph and are packed together as the graph embedding for the subsequent attention layer. Pseudocode for this model can be found in the Supporting Information (SI) as Table S1. Instead of simply treating a fragment as a single binary integer (1 for presence in, and 0 for absence from, the molecule), a graph convolution layer can apply learnable functions to the atoms in the molecular graph, taking the local chemical environment into account.
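A minimal sketch of one such update step is given below, assuming that neighbor messages are summed after concatenating atom and bond features; the actual layer (Table S1 in the SI and the released code) may differ in details such as the aggregation and normalization.

```python
# One graph convolution update: a NumPy sketch with summed neighbor messages.
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def graph_conv_layer(atom_feats, bond_feats, neighbors, W_self, W_nbr, b):
    """h_v' = ReLU(W_self @ h_v + sum over neighbors u of W_nbr @ [h_u ; e_uv] + b)

    atom_feats: (n_atoms, d_atom) array of current atom features
    bond_feats: dict mapping a sorted atom-index pair (u, v) to a (d_bond,) vector
    neighbors:  list of neighbor-index lists, one per atom
    W_self: (d_out, d_atom); W_nbr: (d_out, d_atom + d_bond); b: (d_out,)
    """
    new_feats = []
    for v in range(len(atom_feats)):
        msg = np.zeros(W_nbr.shape[0])
        for u in neighbors[v]:
            e_uv = bond_feats[(min(u, v), max(u, v))]
            msg += W_nbr @ np.concatenate([atom_feats[u], e_uv])
        new_feats.append(relu(W_self @ atom_feats[v] + msg + b))
    return np.stack(new_feats)  # becomes the input to the next layer
```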

Figure 2. An example representation of the DeepChemStable architecture: a two-layer graph convolution network produces the molecular graph embedding, followed by an attention layer.

Attention Mechanism

As shown in Figure 2, instead of simply combining all of the learned fragment features with a global pooling step after several graph convolution layers, an attention layer is applied to capture the differing importance of fragments in the determination of stability. The attention mechanism captures the influence of other fragments on each individual fragment. We employed a Scaled Dot-Product Attention24. If G denotes the graph embedding of dimension dG, the attention weights are calculated using equation (1). In the graph embedding, each learned atom feature is assigned an attention weight representing the extent of its contribution to the predicted result.

$$\mathrm{Attention}(G) = \mathrm{softmax}\!\left(\frac{GG^{T}}{\sqrt{d_G}}\right) \tag{1}$$
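Equation (1) can be exercised directly. The following NumPy sketch assumes that the attention-weighted features are summed into a single molecule-level vector before the final classifier; that readout is our reading of Figure 2, not a detail stated in the text.

```python
# Scaled dot-product self-attention over the graph embedding G, eq (1),
# followed by an assumed sum readout into a molecule-level vector.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_readout(G):
    """G: (n_atoms, d_G) graph embedding from the top convolution layer."""
    d_G = G.shape[1]
    weights = softmax(G @ G.T / np.sqrt(d_G))  # (n_atoms, n_atoms), eq (1)
    attended = weights @ G                      # each atom attends to all others
    return attended.sum(axis=0)                 # molecule vector (assumed readout)
```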

Model training and hyperparameter optimization

In our dataset, there are almost twice as many stable compounds as unstable compounds. A weighted sigmoid cross entropy is used as the cost function to alleviate this class imbalance, and L2 regularization is applied to the network weights. The cost function is given by equations (2)-(4):

$$c^{(i)} = \begin{cases} 1, & \text{if } x^{(i)} \text{ is stable} \\ C, & \text{if } x^{(i)} \text{ is unstable} \end{cases} \tag{2}$$

$$L(y^{(i)}, \hat{y}^{(i)}) = -y^{(i)}\log\hat{y}^{(i)} - (1 - y^{(i)})\log(1 - \hat{y}^{(i)}) \tag{3}$$

$$J(w) = \frac{1}{m}\sum_{i=1}^{m} c^{(i)}\,L(y^{(i)}, \hat{y}^{(i)}) + \frac{\lambda}{2m}\|w\|_2^2 \tag{4}$$

where $(x^{(i)}, y^{(i)})$ denotes the $i$th training example, $\hat{y}^{(i)}$ denotes the generated prediction, $c^{(i)}$ denotes the positive weight of the training example, $C$ is a constant, $m$ denotes the number of training examples, $w$ denotes the weights of the neural network, and $\lambda$ denotes the L2 regularization parameter.

Table 3 lists the hyperparameters to be optimized and their search ranges. The model was trained on the training set with different hyperparameter selections and evaluated on the validation set. The best hyperparameters were determined by a random search strategy29 with validation-set accuracy as the criterion. ReLU stands for rectified linear unit and SELU for scaled exponential linear unit.

Table 3. Hyperparameters considered for DeepChemStable

Hyperparameter                     | Values considered
Number of graph convolution layers | 3, 4, 5
Output embedding size              | 100, 150, 200
Learning rate                      | 10⁻³, 10⁻⁴, 10⁻⁵
Positive weight                    | 1.0, 1.5, 2.0
L2 regularization parameter        | 0, 10⁻², 10⁻³, 10⁻⁴
Activation function                | tanh, ReLU, leaky ReLU, SELU

With the best validation accuracy, the DeepChemStable model used 4 graph convolution layers and an output atom embedding size of 200. The best learning rate was 10⁻⁴, the best positive weight was 1.5, and the best L2 regularization parameter was 10⁻⁴. The model uses an initial weight standard deviation of 0.01. The best activation function for this architecture was ReLU with additional batch normalization in each layer. All gradient descent steps were performed with the Adam optimizer30 with β1 = 0.001, β2 = 0.01, and a learning rate of 0.0001. With these best hyperparameters, the model was evaluated on the test set, and ten-fold cross validation was applied across the entire dataset to obtain the final results. The graph convolution network can use different output atom embedding sizes to control model fitting, which makes it more flexible.
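The cost of equations (2)-(4) translates directly into code. Below is a NumPy sketch using the best values found by the search (C = 1.5, λ = 10⁻⁴); the epsilon guard is ours, added for numerical safety.

```python
# The weighted cost of eqs (2)-(4) as a NumPy sketch.
import numpy as np

def cost(y_true, y_prob, weights, C=1.5, lam=1e-4):
    """y_true: (m,) with 1 = unstable, 0 = stable; y_prob: (m,) sigmoid outputs;
    weights: list of network weight arrays subject to L2 regularization."""
    m = y_true.shape[0]
    c = np.where(y_true == 1, C, 1.0)                    # eq (2)
    eps = 1e-12                                          # numerical guard, not in the paper
    ce = (-y_true * np.log(y_prob + eps)
          - (1 - y_true) * np.log(1 - y_prob + eps))     # eq (3)
    l2 = sum(np.sum(w ** 2) for w in weights)
    return (c * ce).mean() + lam * l2 / (2 * m)          # eq (4)
```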

Results

The accuracies of the DeepChemStable model on the validation and test sets were 80.3% and 80.5%, respectively. Recall rates were computed to estimate the risk of false negatives, because it is more serious to predict an actually unstable compound as stable: false negatives can mislead scientists into continuing to work futilely on unstable compounds, wasting time and resources. The recall rates on the validation and test sets were 80.2% and 79.8%; higher recall rates mean a lower risk of false negatives. Additionally, the widely used area under the receiver operating characteristic curve (AUC) was calculated as a complementary criterion. The DeepChemStable model produced AUC scores of 84.4% and 84.7% on the validation and test sets, respectively.

To investigate whether the DeepChemStable model improved performance, we compared DeepChemStable against ChemStable using the same COMDECOM data and a 10-fold stratified cross validation (CV) process. In each validation run, the ratio of unstable to stable compounds was almost the same as in the entire dataset. We also calculated the average similarity between the training set and test set in each run: the average similarities in all runs were less than 0.36, and the cross-validated accuracy of DeepChemStable was 79.1%. Although DeepChemStable and ChemStable have similar AUC and precision rates, DeepChemStable has higher overall accuracy and recall rates, with p < 0.01 (Student's t test). Higher recall rates mean that DeepChemStable reduces the number of false negatives. To demonstrate the contribution of the attention layer, the accuracy of a model with a sum-pooling layer in place of the attention layer after graph convolution was also calculated, at 78.6%. Figure 3 compares the performance of DeepChemStable with the two different layers against that of ChemStable. In the ChemStable model, 30 false negatives were corrected by the 20 embedded rules. To determine the capacity of DeepChemStable to reduce false negatives, we had DeepChemStable predict the compounds containing the fragments of those 20 embedded rules. This resulted in only 5 false negatives, and we conclude that DeepChemStable is able to reduce the risk of false negatives.
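The reported figures use the standard definitions of accuracy, recall, and AUC; a short scikit-learn sketch (array names are illustrative) is:

```python
# Accuracy, recall on the unstable class, and ROC AUC with scikit-learn.
import numpy as np
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score

def evaluate(y_true, y_prob, threshold=0.5):
    """y_true: (m,) with 1 = unstable; y_prob: predicted instability probabilities."""
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    return {"accuracy": accuracy_score(y_true, y_pred),
            "recall": recall_score(y_true, y_pred),   # fraction of unstable caught
            "auc": roc_auc_score(y_true, y_prob)}
```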

Figure 3. Ten-fold cross validation performance of DeepChemStable and ChemStable. Error bars represent standard deviation.

Visualizing unstable fragments

DeepChemStable allows a user to gain insight into the features responsible for a compound's instability. At each graph convolution layer, a softmax function was applied to the atom features, yielding scaled activations of the atom features. An atom feature in each graph convolution layer is a distinct fragment with a scaled activation value. At the attention layer, attention weights are assigned to the vectors of the graph embedding. The model had four graph convolution layers and an attention layer. In a molecule, each vector in the graph embedding is considered a central atom assigned an attention weight, and each central atom corresponds to four fragments in the four graph convolution layers. When a compound is predicted to be unstable, our model selects the central atom with the highest attention weight, then selects that atom's most activated fragment as the fragment responsible for the instability, and visualizes it for the user. If a compound is predicted as unstable by DeepChemStable, the unstable fragment is highlighted in red, as in the examples shown in Figure 4.

Figure 4. Some examples of unstable fragments (highlighted in red) learned by the DeepChemStable model.
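A highlighting step of this kind can be sketched with RDKit's drawing API. Mapping a layer-k fragment to the radius-k bond environment of the selected central atom is our assumption about how the fragment is materialized for display, not the authors' exact procedure.

```python
# Sketch: highlight the atoms of the selected fragment in red with RDKit.
# Treating a layer-k fragment as a radius-k bond environment is assumed.
from rdkit import Chem
from rdkit.Chem import Draw

def draw_unstable_fragment(smiles, center_atom_idx, layer=4, out="unstable.png"):
    mol = Chem.MolFromSmiles(smiles)
    # Bonds within `layer` bonds of the most-attended central atom.
    env = Chem.FindAtomEnvironmentOfRadiusN(mol, layer, center_atom_idx)
    atoms = {center_atom_idx}
    for bond_idx in env:
        bond = mol.GetBondWithIdx(bond_idx)
        atoms.update([bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()])
    img = Draw.MolToImage(mol, highlightAtoms=list(atoms),
                          highlightColor=(1.0, 0.0, 0.0))
    img.save(out)
```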

Discussion

The chemical stability of a compound is related predominantly to a localized substructure. Hence, a model based on global structure descriptors cannot deliver satisfactory prediction of a compound's stability. Fragment-based stability prediction models such as ChemStable, combined with Bayesian learning, can improve the accuracy of predictions. However, ChemStable uses predefined fragments to represent the chemical environment of the "warhead" substructure. The main limitation of this approach is its lack of flexibility in deriving the chemical environment of the "warhead" substructures and, consequently, an increase in false negatives. To resolve this issue, we designed DeepChemStable to identify the chemical environment of the "warhead" substructures flexibly. The advantages of this approach are summarized as follows:

(1) DeepChemStable is an end-to-end model, which does not predefine structural fingerprint features, but learns structural features dynamically through an attention-based graph convolution network and derives the relationship between the features and instability.

(2) DeepChemStable reduces the risk of false negatives by determining which relevant fragments are responsible for the instability. Our previous version, ChemStable, calculated all 13,340 ACF-2 fragments without explicitly defining which was responsible for the instability. These fragments were assigned scores (the probability that they were unstable fragments) based on Bayes' theorem. The previous method could therefore miss important unstable fragments owing to an improper threshold or an improper layer, and had to rely on a rule-based method to "brutally" reduce the false negatives. Since minimizing the false-negative rate is a greater concern for instability prediction, this feature is a major improvement.

DeepChemStable achieved an AUC value of 84.7%, a recall rate of 79.8%, and a ten-fold stratified cross validation accuracy of 79.1%. To compare the unstable fragments generated by the two methods, we pulled the top 20 unstable fragments from the DeepChemStable model (see SI Figure S1). As expected, there were inconsistencies between ChemStable and DeepChemStable. For smaller "warheads" the two methods reached the same conclusions; however, the unstable fragments predicted by DeepChemStable can be larger fragments.

ASSOCIATED CONTENT

Supporting Information

This information is available free of charge via the Internet at http://pubs.acs.org.

Additional Information

The trained DeepChemStable model is free to use, and the code is accessible at https://github.com/MingCPU/DeepChemStable.

Funding


This work is funded in part by the National Key R&D Program of China (2018ZX09735010, 2017YFB02034043), the GD Frontier & Key Techn. Innovation Program (2015B010109004), GD-NSF (2016A030310228), the Natural Science Foundation of China (81473138, U1611261), the Program for GD Introducing Innovative and Entrepreneurial Teams (2016ZT06D211), the GD Provincial Key Laboratory of Construction Foundation (2017B030314030), and the Fundamental Research Funds for the Central Universities (17LGJC23).

References

1. Waterman, K. C.; Adami, R. C., Accelerated aging: prediction of chemical stability of pharmaceuticals. Int. J. Pharm. 2005, 293 (1-2), 101-125.
2. Blaxill, Z.; Holland-Crimmin, S.; Lifely, R., Stability Through the Ages: The GSK Experience. J. Biomol. Screen. 2009, 14 (5), 547-556.
3. Connors, K. A.; Amidon, G. L.; Stella, V. J., Chemical Stability of Pharmaceuticals: A Handbook for Pharmacists. John Wiley & Sons: 1986.
4. Di, L.; Kerns, E. H., Stability challenges in drug discovery. Chem. Biodivers. 2009, 6 (11), 1875-1886.
5. Zitha-Bovens, E.; Maas, P.; Wife, D.; Tijhuis, J.; Hu, Q. N.; Kleinoder, T.; Gasteiger, J., COMDECOM: predicting the lifetime of screening compounds in DMSO solution. J. Biomol. Screen. 2009, 14 (5), 557-565.
6. Liu, Z.; Zheng, M.; Yan, X.; Gu, Q.; Gasteiger, J.; Tijhuis, J.; Maas, P.; Li, J.; Xu, J., ChemStable: a web server for rule-embedded naïve Bayesian learning approach to predict compound stability. J. Comput. Aided Mol. Des. 2014, 28 (9), 941-950.
7. Ertl, P.; Rohde, B.; Selzer, P., Fast calculation of molecular polar surface area as a sum of fragment-based contributions and its application to the prediction of drug transport properties. J. Med. Chem. 2000, 43 (20), 3714-3717.
8. Xu, J., 13C NMR spectral prediction by means of generalized atom center fragment method. Molecules 1997, 2 (8), 114-128.
9. Gawehn, E.; Hiss, J. A.; Schneider, G., Deep Learning in Drug Discovery. Mol. Inform. 2016, 35 (1), 3-14.
10. Unterthiner, T.; Mayr, A.; Klambauer, G.; Steijaert, M.; Wegner, J. K.; Ceulemans, H.; Hochreiter, S., Deep learning as an opportunity in virtual screening. In Proceedings of the Deep Learning Workshop at NIPS, 2014; pp 1-9.
11. Mamoshina, P.; Vieira, A.; Putin, E.; Zhavoronkov, A., Applications of deep learning in biomedicine. Mol. Pharm. 2016, 13 (5), 1445-1454.
12. Lusci, A.; Pollastri, G.; Baldi, P., Deep architectures and deep learning in chemoinformatics: the prediction of aqueous solubility for drug-like molecules. J. Chem. Inf. Model. 2013, 53 (7), 1563-1575.
13. Mayr, A.; Klambauer, G.; Unterthiner, T.; Hochreiter, S., DeepTox: Toxicity Prediction using Deep Learning. Front. Environ. Sci. 2016, 3, 80.
14. Duvenaud, D. K.; Maclaurin, D.; Iparraguirre, J.; Bombarell, R.; Hirzel, T.; Aspuru-Guzik, A.; Adams, R. P., Convolutional networks on graphs for learning molecular fingerprints. In Advances in Neural Information Processing Systems, 2015; pp 2224-2232.
15. Kearnes, S.; McCloskey, K.; Berndl, M.; Pande, V.; Riley, P., Molecular graph convolutions: moving beyond fingerprints. J. Comput. Aided Mol. Des. 2016, 30 (8), 595-608.
16. Coley, C. W.; Barzilay, R.; Green, W. H.; Jaakkola, T. S.; Jensen, K. F., Convolutional embedding of attributed molecular graphs for physical property prediction. J. Chem. Inf. Model. 2017, 57 (8), 1757-1772.
17. Niepert, M.; Ahmed, M.; Kutzkov, K., Learning convolutional neural networks for graphs. In International Conference on Machine Learning, 2016; pp 2014-2023.
18. Defferrard, M.; Bresson, X.; Vandergheynst, P., Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, 2016; pp 3844-3852.
19. Bruna, J.; Zaremba, W.; Szlam, A.; LeCun, Y., Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203, 2013.
20. Kipf, T. N.; Welling, M., Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
21. Xu, Y.; Pei, J.; Lai, L., Molecular Graph Encoding Convolutional Neural Networks for Automatic Chemical Feature Extraction. arXiv preprint arXiv:1704.04718, 2017.
22. Wu, Z.; Ramsundar, B.; Feinberg, E. N.; Gomes, J.; Geniesse, C.; Pappu, A. S.; Leswing, K.; Pande, V., MoleculeNet: a benchmark for molecular machine learning. Chem. Sci. 2018, 9 (2), 513-530.
23. Karimi, M.; Wu, D.; Wang, Z.; Shen, Y., DeepAffinity: Interpretable Deep Learning of Compound-Protein Affinity through Unified Recurrent and Convolutional Neural Networks. arXiv preprint arXiv:1806.07537, 2018.
24. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; Polosukhin, I., Attention is all you need. In Advances in Neural Information Processing Systems, 2017; pp 5998-6008.
25. Weininger, D., SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 1988, 28 (1), 31-36.
26. O'Boyle, N. M.; Banck, M.; James, C. A.; Morley, C.; Vandermeersch, T.; Hutchison, G. R., Open Babel: An open chemical toolbox. J. Cheminformatics 2011, 3 (1), 33.
27. Landrum, G., RDKit: Open-source cheminformatics, version 2017-09-3. http://www.rdkit.org/ (accessed September 30, 2018).
28. Gasteiger, J.; Marsili, M., Iterative partial equalization of orbital electronegativity—a rapid access to atomic charges. Tetrahedron 1980, 36 (22), 3219-3228.
29. Bergstra, J.; Bengio, Y., Random search for hyper-parameter optimization. J. Mach. Learn. Res. 2012, 13, 281-305.
30. Kingma, D. P.; Ba, J., Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.


For Table of Contents use only

DeepChemStable: Chemical Stability Prediction with an Attention-based Graph Convolution Network

Xiuming Li§1, Xin Yan§1, Qiong Gu1, Huihao Zhou1, Di Wu2 and Jun Xu*1
