
Application Note

J. Chem. Inf. Model., Just Accepted Manuscript. DOI: 10.1021/acs.jcim.7b00384. Publication Date (Web): January 10, 2018.


Inner and Outer Recursive Neural Networks for Chemoinformatics Applications

Gregor Urban,*,† Niranjan Subrahmanya,*,‡ and Pierre Baldi*,†

†Department of Computer Science, University of California, Irvine, Irvine, California 92697, United States
‡ExxonMobil Research and Engineering, Annandale, New Jersey 08801, United States

E-mail: [email protected]; [email protected]; [email protected]

Abstract

Deep learning methods applied to problems in chemoinformatics often require the use of recursive neural networks to handle data with graphical structure and variable size. We present a useful classification of recursive neural network approaches into two classes, the Inner and the Outer approach. The inner approach uses recursion inside the underlying graph, to essentially "crawl" its edges, while the outer approach uses recursion outside the underlying graph, to aggregate information over progressively longer distances in an orthogonal direction. We illustrate the inner and outer approaches on several examples. More importantly, we provide open-source implementations (1) for both approaches in Tensorflow, which can be used in combination with training data to produce efficient models for predicting the physical, chemical, and biological properties of small molecules.


1 Introduction

In recent years, neural networks 1 and deep learning have been used to successfully tackle a variety of problems in areas ranging from computer vision 2 and speech recognition 3 to high energy physics 4,5, chemistry 6–8, and biology 9,10. Many of these problems involve data items represented by tensors of fixed size, for instance images in computer vision. In these cases, feedforward architectures expecting input of constant size can be applied. However, in many applications the data items are not of fixed size and often come with a structure represented by a graph. This is the case for sentences or parse trees in natural language processing, molecules or reactions in chemoinformatics, and nucleotide or amino acid sequences or 2D contact maps in bioinformatics. In general, recursive neural network architectures must be used to process variable-size structured data, which raises the issue of how to design such architectures.

Some principled approaches have been developed to design recursive architectures, but they have not been organized systematically. We present a classification of the known approaches into two basic classes: the Inner class and the Outer class. The distinction is helpful both for better understanding and organizing previous approaches, and for developing new ones more systematically. While we give examples of both, the goal of this brief technical note is not to review the entire literature on recursive networks. The goal, rather, is to introduce the inner/outer distinction through a few examples and, most importantly, to deliver open-source software implementations of both approaches for chemoinformatics applications.

Before delving into further details, it is useful to clarify the use of the terms recurrent and recursive networks. We call a neural network recurrent if it is represented by a directed graph containing at least one directed cycle. A recursive network contains connection weights or entire subnetworks that are shared, often in a systematic way (see Figure 1). There is some flexibility in the degree and multiplicity of sharing that is required in order to be called recursive. For instance, at the low end of the spectrum, convolutional networks and siamese networks (which contain only two copies of the same network) could both be considered recursive. At the high end of the spectrum, any recurrent network unfolded in time yields a highly recursive network. Of course, the notion of a "recursive network" becomes most useful when there is a regular pattern of connections.


Figure 1: Left to right: (a) recurrent network with three neurons forming a directed cycle; (b) the same network unfolded in time (recursive representation); (c) sequence of vectors V_t such that V_{t+1} = F_{t+1}(V_t); (d) reparameterization of (c) using neural networks (trapezoids) with V_{t+1} = NN_{t+1}(V_t) (each network can be different and, for instance, include many hidden layers); (e) corresponding recursive network obtained by using identical networks, or weight sharing, with V_{t+1} = NN(V_t).

2 The Inner Approach

The inner approach requires that the underlying structure be represented by a directed acyclic graph (DAG), as in the case of Bayesian networks in probabilistic graphical models 11. In this approach, the variables associated with each node of the DAG are a function of the variables associated with all parent nodes, and this function is parameterized by neural networks (e.g. 12,13). In other words, the same neural network is used to incrementally update and propagate a state vector along the directed edges of the underlying graph. This process can be termed "crawling"; it starts from source nodes, which have no incoming directed edges, and terminates in sink nodes, which have no outgoing directed edges and where outputs are produced. A DAG is required for this to work, as the crawling process would be ambiguous if the graph were undirected, and would not terminate if it had cycles. The motivation behind this crawling process is that it is a natural way of aggregating information throughout molecular graphs using recursive neural networks, a general approach that has been successful in other domains, for instance in dealing with biological or natural language sequences.

Some examples of models that fall into this category of inner approaches are all the recursive neural networks for processing sequences that are associated with the DAGs of various Markov models 11, such as hidden Markov models (HMMs), input-output HMMs, factorial HMMs, and their bidirectional variants. The application of the inner approach to hidden Markov models is illustrated in Figure 2. The variable H_t represents the hidden state at time t and the variable O_t represents the

3

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

output symbol (or distribution over symbols). They can be parameterized using two neural networks, NN_H and NN_O, with the recursive relations

H_t = NN_H(H_{t-1}),    O_t = NN_O(H_t),

where NN_H crawls the graph from one hidden state to the next, and NN_O produces the outputs. These two neural networks can be shared for all values of t.
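As an illustration of this weight sharing, the following minimal TensorFlow sketch applies the same two layers at every time step. It is not the released implementation; the layer sizes, activations, and sequence length are arbitrary choices.

```python
# Minimal sketch (not the released implementation): two shared networks NN_H and
# NN_O applied recursively along a chain, H_t = NN_H(H_{t-1}), O_t = NN_O(H_t).
# The layer objects are created once, so their weights are shared for all t.
import tensorflow as tf

hidden_size, output_size, T = 16, 4, 10          # arbitrary illustrative sizes

nn_h = tf.keras.layers.Dense(hidden_size, activation="relu")     # shared NN_H
nn_o = tf.keras.layers.Dense(output_size, activation="softmax")  # shared NN_O

h = tf.zeros((1, hidden_size))                   # initial hidden state H_0
outputs = []
for t in range(T):                               # crawl the chain of states
    h = nn_h(h)                                  # H_t = NN_H(H_{t-1})
    outputs.append(nn_o(h))                      # O_t = NN_O(H_t)
```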


Figure 2: The inner approach applied to a hidden Markov model parameterized by recursive neural networks.

The inner approach has been successfully applied, for instance, to protein secondary structure or relative solvent accessibility prediction using bidirectional Input-Output HMMs 14,15. It has further been generalized to grids with two or more dimensions 9, e.g. the problem of protein contact map prediction 16, the game of GO 17, sentiment prediction in natural language processing 18, and computer vision problems 19,20. The latter models operate on 2D images, where one way of transforming the highly cyclic grid of pixels into a DAG is to apply recurrent neural networks to rows and columns of pixels separately.

Matters are slightly more complicated in the domain of chemoinformatics, where a typical goal is to predict physical, chemical, or biological properties of small molecules, such as solubility. Small molecules are represented by bond graphs, with atom labels associated with the vertices and bond labels associated with the edges. These graphs are undirected and often contain cycles (e.g. aspirin, benzene). To apply the inner approach to this problem, one has to either find a canonical way of acyclically orienting the edges, sample the possible acyclic orientations in a systematic way, or use all possible acyclic orientations. We follow the latter approach (as described in ref 7) for our Tensorflow implementation of the inner approach. Thus, for a molecule with N_a atoms we generate N_a copies of the bond graph and select a different node to be the sink node in each copy (see Figure 4 or Figure 1 in the supplement). Within each copy, all the edges are directed towards the sink


node in order to produce an acyclic orientation of the corresponding graph. Considering all possible orientations is computationally feasible for small molecules in organic chemistry because the number of vertices is relatively small, and so is the number of edges due to valence constraints. Note that the information about atom and bond types is contained in the vectors that are processed by the neural networks crawling the molecular graphs.

In the implementation, we use three neural networks. NN_H1 is a shared neural network used to crawl the N_a DAGs associated with the molecular graph. The crawling process produces a hidden vector in each of the N_a sink nodes. A second shared neural network, NN_H2, is used to process these N_a hidden vectors; the results are aggregated and finally fed to a single feedforward neural network NN_O that computes the final output, which can be binary in the case of a classification problem (e.g. toxic vs. non-toxic) or numerical in the case of a regression problem (e.g. a solubility measure). One is of course free to specify the architecture of each of these three feedforward neural networks. From experience, we typically use fully connected neural networks with a single hidden layer and one to a few dozen Rectified Linear (ReLU) neurons in all three cases. Alternatively, one could for instance use Long Short-Term Memory (LSTM) units 21 in the crawling networks.

The maximum number of connected partners that an atom can have is typically four in organic chemistry, but can be greater if heavy atoms (e.g. Pd) are allowed. While the implementation requires this value to be fixed for the crawling network NN_H1, it is a parameter p that can be chosen by the user in advance. Thus, the input vector of NN_H1 is of size p × s, where s is the size of the feature vector associated with each parent atom, describing for instance the atom type and the bond type. Zero-padding is used for vertices with fewer than p neighbors. A detailed description of the algorithm in the form of pseudo-code is given in Algorithm 1 in the supplement.

When generating the molecular DAGs, one source of ambiguity arises with molecules that have cyclic connections, as their rings have to be broken in a consistent way. One implementation variant uses "ring contraction", where rings are replaced with a single vertex in the DAG 7. An alternative is to break rings for each molecular sub-DAG in such a way that the distances from all vertices in the sub-DAG to the sink vertex (gray nodes in Figure 4) are minimized. We did not observe significant differences in prediction accuracy between these two cycle-resolution strategies.
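The following simplified numpy sketch illustrates this crawling procedure. It is an illustration only, with toy random data, placeholder parent lists, single-layer networks, and no bond features; it is not the released Tensorflow implementation.

```python
# Simplified numpy sketch of the inner ("crawling") approach: one DAG per atom,
# each crawled with a shared network NN_H1; the sink vectors are processed by
# NN_H2, aggregated, and passed to NN_O.  Toy data and single-layer networks;
# bond features and other details of the released implementation are omitted.
import numpy as np

rng = np.random.default_rng(0)
p, s = 4, 8                     # max. number of parents, per-atom state size

def relu(x):
    return np.maximum(x, 0.0)

W_h1 = rng.normal(scale=0.1, size=(p * s, s))   # NN_H1: crawls each DAG
W_h2 = rng.normal(scale=0.1, size=(s, s))       # NN_H2: processes sink vectors
W_o  = rng.normal(scale=0.1, size=(s, 1))       # NN_O: final prediction

def crawl_dag(parents, atom_feats):
    """parents[i] lists the parents of atom i; atoms are assumed to be indexed
    in topological order (sources first, the sink last)."""
    n = len(atom_feats)
    state = np.zeros((n, s))
    for i in range(n):                           # visit nodes sources -> sink
        padded = np.zeros(p * s)                 # zero-padding for < p parents
        for k, j in enumerate(parents[i]):
            padded[k * s:(k + 1) * s] = state[j]
        # combine the atom's own features with its projected parent states
        state[i] = relu(atom_feats[i] + padded @ W_h1)
    return state[-1]                             # hidden vector at the sink

n_atoms = 5
atom_feats = rng.normal(size=(n_atoms, s))       # toy atom feature vectors
# one DAG per atom, each with a different sink; placeholder chain-shaped DAGs
dags = [[[], [0], [1], [2], [3]] for _ in range(n_atoms)]

sink_vectors = np.stack([crawl_dag(d, atom_feats) for d in dags])
aggregated = relu(sink_vectors @ W_h2).sum(axis=0)    # NN_H2, then aggregation
prediction = aggregated @ W_o                          # NN_O output
print(prediction)
```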


3 The Outer Approach

In the outer approach, the graphs representing the data can be directed or undirected, and no acyclic orientation is required. In the inner approach, the neural network recursion is carried through the underlying DAG associated with the data. In the outer approach, the neural network recursion is carried through a new DAG that is built in a direction "orthogonal" to the graph associated with the data, and thus the latter can be directed or undirected, cyclic or acyclic (see Figure 4 for a diagram juxtaposing both approaches).

Figure 3 illustrates a simple outer recursive network in the case of a sequence problem. At the first level in the direction orthogonal to the sequence, there is a neural network NN_1, shared at all locations, which processes local input windows over the sequence and produces hidden vectors. At the second level, a neural network NN_2 operates in the same fashion over the hidden vectors produced by the first stage, and so forth. At each stage of this orthogonal hierarchy, the hidden vectors produced aggregate information about the original sequence over increasingly long distances. The activity A_i^k of the unit associated with vertex i in layer k is given by

A_i^k = F_i^k(A^{k-1}_{N^{k-1}(i)}),

where N^{k-1}(i) denotes the neighborhood of vertex i in layer k−1 and the function F_i^k can, for example, be parameterized by a neural network. Convolutional neural networks are merely a special case of the outer approach: in particular, the weight sharing within a layer is not necessary, or can be partial, or can extend across layers, and so forth. The outer approach can also be deployed on the edges of the original graph, rather than its vertices.


Figure 3: Outer approach in the case of sequence processing with three layers. In this figure, all the networks in a given layer share the same weights and have the same shape, but are distinct across the different layers. All three conditions can be relaxed.

All convolutional neural networks used for image processing can be viewed as examples of the outer approach, as can the early work on protein secondary structure prediction 22.
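A minimal numpy sketch of this layered update for the sequence case is given below. It is illustrative only; the window size, depth, and layer widths are arbitrary choices.

```python
# Minimal numpy sketch of the outer recursion on a sequence: each layer k applies
# a shared function F^k to a local window of the previous layer, i.e.
# A_i^k = F^k(A^{k-1} restricted to the neighborhood of i).  Window size, depth,
# and widths are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(1)
seq_len, d, window, depth = 12, 8, 3, 3

def relu(x):
    return np.maximum(x, 0.0)

# one shared weight matrix per layer (shared across all positions i)
weights = [rng.normal(scale=0.1, size=(window * d, d)) for _ in range(depth)]

A = rng.normal(size=(seq_len, d))                # layer 0: input feature vectors
for W in weights:
    padded = np.pad(A, ((window // 2, window // 2), (0, 0)))   # zero-pad the ends
    A = np.stack([relu(padded[i:i + window].reshape(-1) @ W)   # F^k over a window
                  for i in range(seq_len)])
# A now aggregates information over increasingly long stretches of the sequence
```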


More recent examples of outer approaches in chemoinformatics are reactivity prediction 23, automatic molecule design 24, and the prediction of chemical properties 8,25. Our software implementation of the outer approach for chemoinformatics applications closely follows Duvenaud et al. 8, which is based on stacking neural networks on top of each atom, an approach inspired by circular fingerprints 26. Figure 4 shows the wiring diagram of the outer approach with two hidden layers, as well as a corresponding graph for the inner approach.

At a high level, the model resembles a convolutional neural network whose layers are stacked on top of each other, with the lowest one connected to the graph of the molecule. However, here the "convolution" operates on a variable number of elements, as the number of connected partners varies from atom to atom. The weights of this convolution are all fixed to one, rather than being learned. Alternatively, one could use a symmetric filter with two learnable values: one value for the current atom at the center of the filter and one value for all the neighboring atoms. Note that, unlike the case of temporal sequences, molecular graphs have no simple left-right ordering of atoms, so any non-symmetric learned filter is likely to produce inconsistent results.

Figure 4: Connectivity graph for both the inner and outer approaches applied to the propionic acid molecule. The outer approach resembles a convolutional network that naturally operates orthogonally to the underlying molecular graph, while the inner approach operates within the DAGs associated with the molecular graph. Model predictions are computed by the NN_O network, whose input is an accumulation of contributions from other layers in the model (blue connections).

A detailed pseudo-code description of the outer algorithm is given in the supplement (Algorithm 2). The essential operations per layer are: accumulating the features of directly neighboring atoms and bonds, combining these into a fixed-size vector by summation, and then using matrices with learned weights to project these into a new set of atom features


(to be used by the next layer of the model) and into a fixed-length fingerprint vector. Feature information is thus propagated through the molecular graph by exactly one neighbor (one step) per layer, so a model with R layers computes molecular features with a radius of R atoms and bonds. The overall fingerprint vector is the sum of each layer's contribution; however, it is also reasonable to concatenate the individual contributions instead. The model's prediction is computed by a final feedforward neural network NN_O.
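The following condensed numpy sketch illustrates these per-layer operations. It is an illustration under simplifying assumptions (bond features are omitted, and the separate self/neighbor weight matrices are just one possible parameterization), not the released implementation.

```python
# Condensed numpy sketch of the outer ("neural fingerprint") layer update:
# sum the features of neighboring atoms (fixed weights of one), project with
# learned matrices into new atom features and into a fingerprint contribution,
# and sum the per-layer contributions.  Bond features are omitted and the
# separate self/neighbor matrices are one possible parameterization.
import numpy as np

rng = np.random.default_rng(2)
n_atoms, d, fp_size, n_layers = 5, 8, 16, 2

def relu(x):
    return np.maximum(x, 0.0)

# toy molecular graph as an adjacency list (undirected, ring-free)
neighbors = [[1], [0, 2], [1, 3, 4], [2], [2]]
atom_feats = rng.normal(size=(n_atoms, d))

W_self  = [rng.normal(scale=0.1, size=(d, d)) for _ in range(n_layers)]
W_neigh = [rng.normal(scale=0.1, size=(d, d)) for _ in range(n_layers)]
W_fp    = [rng.normal(scale=0.1, size=(d, fp_size)) for _ in range(n_layers)]

fingerprint = np.zeros(fp_size)
r = atom_feats
for k in range(n_layers):
    new_r = np.zeros_like(r)
    for a in range(n_atoms):
        neigh_sum = sum(r[b] for b in neighbors[a])   # neighbor features, weight 1
        new_r[a] = relu(r[a] @ W_self[k] + neigh_sum @ W_neigh[k])
    r = new_r
    fingerprint += r.sum(axis=0) @ W_fp[k]            # this layer's contribution

W_out = rng.normal(scale=0.1, size=(fp_size, 1))      # NN_O: here a linear readout
prediction = fingerprint @ W_out
print(prediction)
```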

4 Software Implementations and Benchmark

Open-source software implementations of the inner and outer approaches, with corresponding documentation, can be found at www.github.com/Chemoinformatics/InnerOuterRNN, as well as at the ChemDB chemoinformatics portal (cdb.ics.uci.edu), for anyone to use. The prerequisites are Python, RDKit for parsing data sets, and either Theano or Tensorflow. Both implementations support the use of GPUs via CUDA, although this is optional: a highly optimized multi-threaded CPU GEMM implementation (e.g. numpy with Intel MKL) can outperform a GPU when training uses small batches, small neural networks, and small input graphs, since these conditions cause data-transfer latencies to dominate the computational load, especially on GPUs. Such conditions are typical for data sets in chemoinformatics.

Table 1 shows a comparison of both implementations on a variety of classification and regression tasks in chemoinformatics. All data sets are publicly available and part of the DeepChem/MoleculeNet 27 benchmark. We chose GraphConvolutions 25 as the reference model (using results published in MoleculeNet 27), as it performed best on more data sets in the MoleculeNet benchmark than any other model in it (best on 7 of 17 databases). We optimized hyperparameters for both the inner and outer approaches on each data set using a fixed training/validation split; the final results shown in Table 1 were obtained using ten-fold cross-validation. Please consult the supplementary material for a summary of the tasks, data sets, and further details.

The results in Table 1 suggest that the outer approach performs best in general, reaching higher scores than the other two models on six out of seven data sets. The inner approach is roughly on par with the reference, being better on four out of seven data sets and best on one (FreeSolv).
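For concreteness, a hedged sketch of this evaluation protocol (hyperparameter selection on a fixed training/validation split, final scores from ten-fold cross-validation) is shown below. It uses scikit-learn utilities and a stand-in ridge-regression model, neither of which is part of the released code.

```python
# Hedged sketch of the evaluation protocol described above: hyperparameters are
# chosen on a fixed training/validation split, and the selected setting is then
# scored with ten-fold cross-validation.  scikit-learn and the stand-in ridge
# model are used only for illustration; neither is part of the released code.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))                    # placeholder features
y = X[:, 0] + 0.1 * rng.normal(size=100)          # placeholder regression targets

def build_and_train_model(hparams, X_train, y_train):
    # stand-in for training an inner or outer model with the given hyperparameters
    return Ridge(alpha=hparams["alpha"]).fit(X_train, y_train)

candidate_hparams = [{"alpha": 0.1}, {"alpha": 1.0}]

# 1) hyperparameter selection on a fixed training/validation split
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
best = max(candidate_hparams,
           key=lambda hp: r2_score(y_val,
                                   build_and_train_model(hp, X_tr, y_tr).predict(X_val)))

# 2) final scores from ten-fold cross-validation with the selected setting
scores = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    model = build_and_train_model(best, X[train_idx], y[train_idx])
    scores.append(r2_score(y[test_idx], model.predict(X[test_idx])))
print(f"mean R^2 over 10 folds: {np.mean(scores):.3f}")
```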


Table 1: Test and comparison of the inner and outer approach implementations on public data sets that are part of the MoleculeNet 27 benchmark. Results for the GraphConv model are from Wu et al. 27.

Data Set [Metric]             | Inner Approach | Outer Approach | Reference (GraphConv)
ESOL [R2 / RMSE]              | 0.88 / 0.69    | 0.91 / 0.62    | 0.79 / 0.97
FreeSolv [R2 / RMSE]          | 0.90 / 1.01    | 0.90 / 1.16    | 0.85 / 1.40
Lipophilicity [R2 / RMSE]     | 0.58 / 0.79    | 0.72 / 0.64    | 0.63 / 0.66
PDBBind-refined [R2 / RMSE]   | 0.33 / 1.48    | 0.41 / 1.37    | 0.51 / 1.65
BBBP [ROC-AUC]                | 0.896          | 0.902          | 0.87
Tox21 [ROC-AUC]               | 0.82           | 0.84           | 0.83
Toxcast [ROC-AUC]             | 0.71           | 0.74           | 0.73

5 Discussion

We have organized existing deep learning approaches for designing recursive architectures into two classes and provided flexible open-source software implementations for their application to regression and classification problems on small molecules in chemoinformatics. One may wonder which approach, inner or outer, works best. Theoretically, as a consequence of the universal approximation theorem for neural networks, anything that can be computed with an inner approach can also be obtained with an outer approach, and vice versa. The simulation results reported here suggest that the two implementations reach comparable results on almost all data sets, with the outer approach generally performing better.

Note that the two approaches are not mutually exclusive. For instance, inner and outer predictions could be combined by simple averaging to produce an ensemble, and both approaches can be combined with Long Short-Term Memory recurrent units 21, which have proven useful for handling long-range dependencies. Finally, in the case of small molecules, the inner and outer approaches can be applied to different representations, such as SMILES strings, molecular graphs or contact maps, 3D atom coordinates, or fingerprints. The software implementations we provide are based on the molecular graphs derived from SMILES strings; thus the input data consist of SMILES strings and the corresponding classification or regression targets. Alternatively, one could apply the inner and outer approaches directly to the SMILES strings, although it is unlikely that this would provide systematically better results, as they contain no additional information.
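As a minimal illustration of this input pipeline, the sketch below derives a simple molecular graph from a SMILES string using RDKit (one of the listed prerequisites). The specific atom and bond features shown are assumptions, not the exact featurization used by the released implementations.

```python
# Hedged sketch of the input parsing step: deriving a simple molecular graph
# (atom features and an undirected bond list) from a SMILES string with RDKit.
# The feature choices below are illustrative, not the exact featurization of
# the released implementations.
from rdkit import Chem

smiles = "CCC(=O)O"                        # propionic acid, as in Figure 4
mol = Chem.MolFromSmiles(smiles)

atom_features = [{"symbol": atom.GetSymbol(),
                  "degree": atom.GetDegree(),
                  "aromatic": atom.GetIsAromatic()}
                 for atom in mol.GetAtoms()]

bonds = [(bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(), str(bond.GetBondType()))
         for bond in mol.GetBonds()]

print(atom_features)
print(bonds)    # undirected edge list; the inner approach would orient these edges
```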


Notes

(1) The open-source implementations are available at www.github.com/Chemoinformatics/InnerOuterRNN and at cdb.ics.uci.edu.

Acknowledgement

This work was supported in part by grants NSF IIS-1321053 and NSF IIS-1550705, and by ExxonMobil.

Supporting Information Available

The file "Supplement-InnerOuter.pdf" contains a detailed description of both the inner and outer recursive approach implementations in the form of pseudo-code, as well as an additional illustration (figure) of the inner approach for chemoinformatics.

References

(1) Schmidhuber, J. Deep Learning in Neural Networks: An Overview. Neural Networks 2015, 61, 85–117.
(2) He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. Proc. IEEE Conf. Computer Vision and Pattern Recognition 2016, 770–778.
(3) Graves, A.; Mohamed, A.-r.; Hinton, G. Speech Recognition with Deep Recurrent Neural Networks. 2013 IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP) 2013, 6645–6649.
(4) Baldi, P.; Sadowski, P.; Whiteson, D. Searching for Exotic Particles in High-Energy Physics with Deep Learning. Nat. Commun. 2014, 5.
(5) Guest, D.; Collado, J.; Baldi, P.; Hsu, S.-C.; Urban, G.; Whiteson, D. Jet Flavor Classification in High-Energy Physics with Deep Neural Networks. Phys. Rev. D 2016, 94, 112002.
(6) Kayala, M.; Baldi, P. ReactionPredictor: Prediction of Complex Chemical Reactions at the Mechanistic Level Using Machine Learning. J. Chem. Inf. Model. 2012, 52, 2526–2540.


(7) Lusci, A.; Pollastri, G.; Baldi, P. Deep Architectures and Deep Learning in Chemoinformatics: The Prediction of Aqueous Solubility for Drug-like Molecules. J. Chem. Inf. Model. 2013, 53, 1563–1575.
(8) Duvenaud, D. K.; Maclaurin, D.; Iparraguirre, J.; Bombarell, R.; Hirzel, T.; Aspuru-Guzik, A.; Adams, R. P. Convolutional Networks on Graphs for Learning Molecular Fingerprints. Adv. Neural Inf. Processing Syst. 2015, 2215–2223.
(9) Baldi, P.; Pollastri, G. The Principled Design of Large-Scale Recursive Neural Network Architectures: DAG-RNNs and the Protein Structure Prediction Problem. Journal of Machine Learning Research 2003, 4, 575–602.
(10) Zhou, J.; Troyanskaya, O. G. Predicting Effects of Noncoding Variants with Deep Learning-based Sequence Model. Nat. Methods 2015, 12, 931–934.
(11) Koller, D.; Friedman, N. Probabilistic Graphical Models: Principles and Techniques; The MIT Press, 2009.
(12) Baldi, P.; Chauvin, Y. Hybrid Modeling, HMM/NN Architectures, and Protein Applications. Neural Computation 1996, 8, 1541–1565.
(13) Goller, C.; Kuchler, A. Learning Task-dependent Distributed Representations by Backpropagation Through Structure. IEEE Int. Conf. Neural Networks 1996, 1, 347–352.
(14) Mooney, C.; Pollastri, G. Beyond the Twilight Zone: Automated Prediction of Structural Properties of Proteins by Recursive Neural Networks and Remote Homology Information. Proteins: Struct., Funct., Bioinf. 2009, 77, 181–190.
(15) Magnan, C. N.; Baldi, P. SSpro/ACCpro 5: Almost Perfect Prediction of Protein Secondary Structure and Relative Solvent Accessibility Using Profiles, Machine Learning, and Structural Similarity. Bioinformatics 2014, 30, 2592–2597.
(16) Tegge, A. N.; Wang, Z.; Eickholt, J.; Cheng, J. NNcon: Improved Protein Contact Map Prediction Using 2D-Recursive Neural Networks. Nucleic Acids Res. 2009, 37, W515–W518.
(17) Wu, L.; Baldi, P. Learning to Play Go Using Recursive Neural Networks. Neural Networks 2008, 21, 1392–1400.


(18) Socher, R.; Perelygin, A.; Wu, J. Y.; Chuang, J.; Manning, C. D.; Ng, A. Y.; Potts, C. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. Proc. Conf. Empirical Methods in Natural Language Processing (EMNLP) 2013, 1631–1642.
(19) Graves, A.; Fernández, S.; Schmidhuber, J. Multi-dimensional Recurrent Neural Networks. Proc. 17th Int. Conf. Artificial Neural Networks 2007, 549–558.
(20) Visin, F.; Kastner, K.; Cho, K.; Matteucci, M.; Courville, A.; Bengio, Y. ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks. arXiv preprint arXiv:1505.00393, 2015.
(21) Gers, F. A.; Schmidhuber, J.; Cummins, F. Learning to Forget: Continual Prediction with LSTM. Neural Computation 2000, 12, 2451–2471.
(22) Rost, B.; Sander, C. Improved Prediction of Protein Secondary Structure by Use of Sequence Profiles and Neural Networks. Proc. Natl. Acad. Sci. 1993, 90, 7558–7562.
(23) Hughes, T. B.; Dang, N. L.; Miller, G. P.; Swamidass, S. J. Modeling Reactivity to Biological Macromolecules with a Deep Multitask Network. ACS Cent. Sci. 2016, 2, 529–537.
(24) Gómez-Bombarelli, R.; Duvenaud, D.; Hernández-Lobato, J. M.; Aguilera-Iparraguirre, J.; Hirzel, T. D.; Adams, R. P.; Aspuru-Guzik, A. Automatic Chemical Design Using a Data-driven Continuous Representation of Molecules. arXiv preprint arXiv:1610.02415, 2016.
(25) Kearnes, S.; McCloskey, K.; Berndl, M.; Pande, V.; Riley, P. Molecular Graph Convolutions: Moving Beyond Fingerprints. J. Comput.-Aided Mol. Des. 2016, 30, 595–608.
(26) Glen, R. C.; Bender, A.; Arnby, C. H.; Carlsson, L.; Boyer, S.; Smith, J. Circular Fingerprints: Flexible Molecular Descriptors with Applications from Physical Chemistry to ADME. IDrugs 2006, 9, 199.
(27) Wu, Z.; Ramsundar, B.; Feinberg, E. N.; Gomes, J.; Geniesse, C.; Pappu, A. S.; Leswing, K.; Pande, V. MoleculeNet: A Benchmark for Molecular Machine Learning. arXiv preprint arXiv:1703.00564, 2017.


Graphical TOC Entry: Connectivity graph of inner and outer recursive neural networks applied to the propionic acid molecule.
