
3D Molecular Representations Based on the Wave Transform for Convolutional Neural Networks

Denis Kuzminykh,† Daniil Polykovskiy,†,‡,§ Artur Kadurin,†,∥,⊥,# Alexander Zhebrak,† Ivan Baskov,† Sergey Nikolenko,†,∥,⊥,⊗ Rim Shayakhmetov,† and Alex Zhavoronkov*,†

†Insilico Medicine, Baltimore, Maryland 21218, United States
‡Moscow State University, Moscow 119234, Russia
§National Research University Higher School of Economics, Moscow 125319, Russia
∥Steklov Institute of Mathematics, St. Petersburg 191023, Russia
⊥Kazan Federal University, Kazan 420008, Russia
#Insilico Taiwan, Taipei City 115, Taiwan R.O.C.
⊗Neuromation, St. Petersburg 191025, Russia

ABSTRACT: Convolutional neural networks (CNN) have been successfully used to handle three-dimensional data and are a natural match for data with spatial structure such as 3D molecular structures. However, a direct 3D representation of a molecule with atoms localized at voxels is too sparse, which leads to poor performance of the CNNs. In this work, we present a novel approach where atoms are extended to fill other nearby voxels with a transformation based on the wave transform. Experimenting on 4.5 million molecules from the Zinc database, we show that our proposed representation leads to better performance of CNN-based autoencoders than either the voxel-based representation or the previously used Gaussian blur of atoms, and we then successfully apply the new representation to classification tasks such as MACCS fingerprint prediction.

KEYWORDS: autoencoders, 3D convolutional neural networks, wave transform, wavelets

1. INTRODUCTION

Most traditional drug development pipelines share a common bottleneck: it is hard for chemists to construct new lead molecules for a specific target that would be reasonable to try in the lab. It would be very useful to automate the initial proposal of lead molecules. In particular, if one could use machine learning to mine existing databases for candidate molecules that are likely to possess desired properties, it would result in faster and more efficient drug discovery. Such in silico approaches have been introduced for this and other tasks including drug scoring, biomarker development, and pathway analysis.1−4 In a recent work, Kadurin et al.5 proposed to use generative models based on deep neural networks to discover initial leads for anticancer targets based on Molecular ACCess System (MACCS) fingerprints. In their model, adversarial autoencoders6 (AAE) were trained on a data set of fingerprints for molecules that had been known to be effective against a certain target. The resulting model was able to capture the underlying patterns in fingerprint structure. It was then used to propose new structures that could correspond to other effective molecules, and generated fingerprints were matched against a library of known molecules to select the most relevant molecular structures.

While this general framework seems very promising, it still requires screening of predicted fingerprints against a database of existing molecules; a MACCS fingerprint cannot be directly decoded into a molecular structure. If one could devise a less ambiguous molecular representation, it could eliminate the need to choose final molecules from a known database and allow for fully de novo generation of molecular structures. There are many different ways to encode the structure of a molecule; see, e.g., a comprehensive reference7 that lists several thousand different descriptors. The most popular general classes of such encodings include string, graph, and 3D representations. String representations such as SMILES8,9 or InChI10 and representation in the form of a molecular graph uniquely encode molecular structure. Moreover, recent works have suggested modifications of deep learning architectures specifically suited for graph representations, including convolutional networks on graphs specialized for molecular fingerprints11 and special forms of convolutions that preserve invariants specifically related to molecular structure.12


Figure 1. 3D molecular representations for the same molecule: (a) original molecule; (b) discrete voxel-based representation; (c) Gaussian smoothing, σ = 1; (d) Gaussian smoothing, σ = 4; (e) wave transform smoothing.

However, string and graph representations do not encode bond lengths and mutual orientation of atoms in space, which means that they lose information regarding the conformation of a molecule. In this work, we propose a model based on deep neural networks that trains on molecular 3D representations and aims to capture spatial information contained in these 3D structures. Our main contribution is a novel representation of the three-dimensional molecular structure based on the wave transform.

The paper is organized as follows. In section 2, we survey related work regarding various representations of molecules and machine learning (especially deep learning) models that work with them. Section 3 introduces the two 3D representations that we work with: a voxel-based baseline and the representation based on the wave transform that we present in this work. Section 4 presents our deep learning models and experimental results that show the promise of this wave-based representation. Finally, in section 5 we conclude the paper with a discussion of our results.

2. RELATED WORK

With recent advances in the field of deep learning in general, convolutional neural networks (CNNs) have been widely and successfully used for different tasks in computer vision, including object detection,13,14 image classification,15,16 and semantic segmentation.17,18 However, while understanding 3D shapes is crucial for many fields, from autonomous driving to biomedical image classification, scaling CNNs to 3D representations is not always straightforward due to sparsity in the data and increased complexity of the convolution operations. Until very recently, learning from three-dimensional objects was done with shape descriptors designed by hand, such as light field descriptors,19 mesh DOG,20 spin images,21 heat kernel signatures,22 or spherical harmonics.23 New 3D CAD repositories like ModelNet24 and ShapeNet25 fostered the development of novel 3D representations and model architectures that are capable of learning 3D descriptors by themselves. 3D ShapeNets24 proposed a volumetric representation in the form of binary voxels with a convolutional Deep Belief Network. VoxNet26 introduced the volumetric occupancy grid representation for 3D classification tasks. Different approaches were used to address the sparsity of three-dimensional data: Field Probing Neural Networks27 applied field probing filters to efficiently extract features from volumetric fields, OctNet28 used unbalanced octrees for hierarchical partitioning of the space, Vote3Deep29 proposed "voting" in convolutions for the sparse point cloud input, the DeepPano30 model used panoramic projections as a 3D shape descriptor, and Chaudhuri et al.31 created a probabilistic graphical model that encoded semantic and geometric relationships among shape components.

Since the invention of the two main generative models based on deep neural networks, variational autoencoders (VAE)32 and generative adversarial networks (GAN),33 much research has been focused on various autoencoding architectures and generative models. We highlight the 3D-GAN,34 ShapeVAE,35 and VConv-DAE36 models that modified these approaches to generate 3D structures. While most research has focused on validating new models on toy data sets, many different approaches have also been proposed to apply these methods to biomedical and chemical data.37 AtomNet38 uses a CNN to predict the bioactivity of a molecule. Recently, Gomes et al.39 and Ragoza et al.40 applied deep convolutional neural networks to predict the binding affinity of a protein−ligand complex. EnzyNet41 used a 3D CNN for enzyme classification.

3. MOLECULAR REPRESENTATIONS

3.1. Voxel-Based Representations. In this work, we aim to develop an autoencoder for the 3D structure of a molecular conformation. Hence, in this section we begin by defining the 3D representations of molecules used in what follows. We will compare the baseline voxel-based approach and the approach based on the wave transform, which is the primary new method we introduce in this work.

To render a molecule in three dimensions, we need to select a unique orientation of the molecule in 3D space to avoid ambiguity. To do so, we first apply principal component analysis (PCA) to extract the primary axes of the molecule (with no dimensionality reduction, to get the directions of the axes). We then translate the molecule to the origin and orient it along the extracted directions. Finally, we discretize the 3D space into a regular grid with an element size of 0.5 Å, which ensures that no two atoms fall into the same voxel (3D cell). We then represent the atoms in each cell with a one-hot encoding: each voxel of the grid is represented as a binary vector that has at most one nonzero entry, and each component of this vector corresponds to the presence of some atom type in the corresponding grid element. For our specific task, the data set contained very low amounts of atoms other than the 9 most common: H, C, N, O, F, S, Cl, Br, and I (see also Figure 2 below). Therefore, in what follows we selected molecules that contain only these nine atom types, resulting in a 9-dimensional vector. The resulting representation for a sample molecule is shown in Figure 1a, where different colors show different atoms (channels).
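To make the construction concrete, the following is a minimal sketch of this voxelization in Python/NumPy, assuming the atom coordinates and element symbols are already available as arrays; the function and variable names are illustrative and the fixed grid shape (taken from the input size in Table 1) is an assumption, not a description of the original implementation.

```python
import numpy as np

ATOM_TYPES = ["H", "C", "N", "O", "F", "S", "Cl", "Br", "I"]  # 9 channels
GRID_STEP = 0.5  # Angstrom, as in section 3.1

def voxelize(coords, symbols, grid_shape=(50, 37, 25)):
    """Return a one-hot voxel grid of shape grid_shape + (9,).

    coords  -- (n_atoms, 3) array of Cartesian coordinates in Angstrom
    symbols -- list of n_atoms element symbols
    """
    # Orient the molecule along its principal axes (PCA with no reduction).
    centered = coords - coords.mean(axis=0)
    _, _, axes = np.linalg.svd(centered, full_matrices=False)
    aligned = centered @ axes.T

    # Discretize into a regular grid centered at the origin.
    grid = np.zeros(grid_shape + (len(ATOM_TYPES),), dtype=np.float32)
    center = (np.array(grid_shape) - 1) / 2.0
    for xyz, sym in zip(aligned, symbols):
        idx = np.round(xyz / GRID_STEP + center).astype(int)
        if np.all(idx >= 0) and np.all(idx < grid_shape):
            grid[idx[0], idx[1], idx[2], ATOM_TYPES.index(sym)] = 1.0
    return grid
```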


3.2. Continuous Smoothing. While the discrete voxel-based 3D representation outlined above captures many structural and geometrical properties, it suffers from two major problems that lead to poor performance of neural networks. The first problem is that this representation does not capture the interaction between atoms. While bonds can be reconstructed by comparing distances between atoms, information contained in the 3D space is highly localized: there is a lot of empty space around the atoms. This also leads to the second problem, severe sparsity, exacerbated by the three-dimensional nature of the image. It turns out that less than 0.1% of the voxels contain nonzero vectors. High data sparsity is known to be an important cause of underfitting in neural networks, since the propagated gradients also turn out to be sparse. We have also observed this experimentally when training a convolutional autoencoder on the discrete voxel-based representation (see section 4).

To overcome these two problems, we use and compare modified 3D representations of the molecules that aim to eliminate the sparsity and independence problems, which allows for simpler training of neural networks. The first such modified representation is a Gaussian smoothing (blurring) obtained by convolving the original image with a Gaussian kernel defined as

$$k_{\mathrm{Gauss}}(x, y, z) = \exp\left(-\frac{x^2 + y^2 + z^2}{2\sigma^2}\right)$$

The convolution operation here is defined as

$$\hat{X}_{\mathrm{Gauss}}(x, y, z; c) = \sum_{\delta_x=-h}^{h}\ \sum_{\delta_y=-h}^{h}\ \sum_{\delta_z=-h}^{h} X(x+\delta_x,\, y+\delta_y,\, z+\delta_z;\, c)\, k_{\mathrm{Gauss}}(\delta_x, \delta_y, \delta_z) \qquad (1)$$

where h gives the limits of the window over which the convolution is computed and c corresponds to the channel. This leads to an exponentially decaying "ball" for every atom that fills multiple voxels and "blurs" the picture, reducing sparsity and improving the results of a convolutional network; see Figure 1c,d for an illustration. The main parameter of the Gaussian representation is the kernel's variance σ, which shows how fast it decreases, i.e., how large the "ball" is; compare Figure 1c with σ = 1 and Figure 1d with σ = 4. Gaussian smoothing has been used in previous works,42 but in the next section we propose a novel kind of smoothing with even better properties.

3.3. Smoothing with the Wave Transform. The main idea of our approach is to replace each atom with concentric waves diverging from it, as shown in Figure 1e. Mathematically, this can be done efficiently with the same convolution operation, but with a different kernel. We define the wave transform kernel as

$$k_{\mathrm{wave}}(x, y, z) = \exp\left(-\frac{x^2 + y^2 + z^2}{2\sigma^2}\right) \cos\left(2\pi\omega \sqrt{x^2 + y^2 + z^2}\right) \qquad (2)$$

and define the transformed representation similarly to eq 1 as

$$\hat{X}_{\mathrm{wave}}(x, y, z; c) = \sum_{\delta_x=-h}^{h}\ \sum_{\delta_y=-h}^{h}\ \sum_{\delta_z=-h}^{h} X(x+\delta_x,\, y+\delta_y,\, z+\delta_z;\, c)\, k_{\mathrm{wave}}(\delta_x, \delta_y, \delta_z) \qquad (3)$$

Here, again, the σ parameter controls how quickly the waves fade out: for larger σ the waves spread over larger distances, while small σ generates only local perturbations. The parameter ω corresponds to the frequency of the waves. In the experiments, we chose ω = 1/σ since this frequency is high enough to make a few oscillations before the wave fades out and low enough to avoid aliasing effects. We also clipped points that were further than 4σ away from the origin (effectively setting h = 4σ in eq 3) since the values at more distant points are very close to zero.

The wave transform representation solves the two major problems mentioned above. First of all, the spreading waves allow us to fill up the space and reduce sparsity. Furthermore, unlike a simple Gaussian exponential decay, the interference between waves passes information from one atom to another and allows any point to gain access to more information about the neighboring atoms. Note that although interference occurs independently in different channels, the CNNs used in our experiments mix all channels in the very first layer, allowing for interference between all atoms. Both the Gaussian and wave transform representations are also more redundant: one can corrupt the resulting image and still be able to reconstruct all atom positions.
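As an illustration, a sketch of the two smoothing operations under these definitions might look as follows. It builds the kernel of eq 2 (the Gaussian kernel of eq 1 is the special case without the cosine factor) on a window clipped at 4σ and convolves it with each channel of the voxel grid; σ and ω are taken in voxel units, and the helper names are ours rather than those of the original implementation.

```python
import numpy as np
from scipy.signal import fftconvolve

def make_kernel(sigma, omega=None):
    """Gaussian kernel, or the wave kernel of eq 2 when omega is given.

    sigma and omega are in voxel units; the window is clipped at h = 4*sigma.
    """
    h = int(np.ceil(4 * sigma))
    coords = np.arange(-h, h + 1)
    x, y, z = np.meshgrid(coords, coords, coords, indexing="ij")
    r = np.sqrt(x**2 + y**2 + z**2)
    kernel = np.exp(-r**2 / (2 * sigma**2))
    if omega is not None:                      # wave transform kernel, eq 2
        kernel *= np.cos(2 * np.pi * omega * r)
    return kernel

def smooth(grid, kernel):
    """Apply eq 1 / eq 3: convolve every atom-type channel of an (X, Y, Z, C) grid."""
    out = np.empty(grid.shape, dtype=np.float64)
    for c in range(grid.shape[-1]):
        out[..., c] = fftconvolve(grid[..., c], kernel, mode="same")
    return out

# Gaussian smoothing with sigma = 1, and the wave transform with omega = 1/sigma:
# blurred = smooth(grid, make_kernel(sigma=1.0))
# waved   = smooth(grid, make_kernel(sigma=4.0, omega=1.0 / 4.0))
```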

As we show below in the experiments (section 4), this representation can be used to extract chemical properties from 3D molecules, and in these problems it is used directly as the input for regression and classification. However, it can also be used to generate new 3D structures using adversarial autoencoders (AAE).5,6 Since images generated by an AAE will be in the wave domain, we also need an algorithm that inverts the wave transformation and recovers the centers of the original atoms. Since the wave transform is a form of convolution, it can be inverted in the frequency domain using Wiener deconvolution.43 Let us consider the following model: X̂ = X * k + n, where n is an additive noise that corresponds to the autoencoder's reconstruction error, and * denotes the convolution as in eq 3. In this model, the transformed signal X̂ (the output of the autoencoder) and the convolution kernel k (Gaussian or wave) are known, and the distribution of the reconstruction noise n can be estimated on a validation data set of molecules. Wiener deconvolution searches for an inverse transform in the form of a convolution X′ = X̂ * g that minimizes the expected reconstruction error:

$$g^* = \arg\min_g\ \mathbb{E}\,\|X' - X\|^2 = \arg\min_g\ \mathbb{E}\,\|\hat{X} * g - X\|^2 \qquad (4)$$

This optimization problem can be solved analytically in the Fourier domain, G = K†X̄/(|K|²X̄ + N̄), where G and K are the Fourier spectra of g and k, respectively, X̄ and N̄ are the average squared spectrum amplitudes of the signal and the noise estimated on the validation set, and † denotes complex conjugation; |K| is the amplitude spectrum of K. Equivalently, G can be represented as

$$G = \frac{1}{K}\left[\frac{|K|^2}{|K|^2 + \bar{N}/\bar{X}}\right] = \frac{1}{K}\left[\frac{|K|^2}{|K|^2 + \frac{1}{\mathrm{SNR}}}\right] \qquad (5)$$

Here SNR = X̄/N̄ is the signal-to-noise ratio. An ideal autoencoder, for example, would have an infinitely large SNR, since there would be no noise. Formally, one has to estimate the SNR at every frequency, but in practice it can be approximated by a constant.
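A sketch of this inversion on the regular grid, under the constant-SNR approximation of eq 5, could look like the following; the kernel is evaluated on the full grid and shifted so that its center sits at the origin, and the function and parameter names are illustrative rather than taken from the original code.

```python
import numpy as np

def kernel_on_grid(shape, sigma, omega=None):
    """Evaluate the Gaussian/wave kernel (eq 2) on the full grid, centered for FFT use."""
    axes = [np.arange(n) - n // 2 for n in shape]
    x, y, z = np.meshgrid(*axes, indexing="ij")
    r = np.sqrt(x**2 + y**2 + z**2)
    k = np.exp(-r**2 / (2 * sigma**2))
    if omega is not None:
        k *= np.cos(2 * np.pi * omega * r)
    return np.fft.ifftshift(k)                 # put the kernel center at index (0, 0, 0)

def wiener_deconvolve(x_hat, sigma, omega, snr):
    """Invert the smoothing channel-by-channel using eq 5 with a constant SNR.

    x_hat -- smoothed (and possibly autoencoder-reconstructed) grid, shape (X, Y, Z, C)
    snr   -- scalar signal-to-noise ratio estimated on a validation set
    """
    K = np.fft.fftn(kernel_on_grid(x_hat.shape[:3], sigma, omega))
    G = np.conj(K) / (np.abs(K) ** 2 + 1.0 / snr)   # eq 5 rewritten as G = K† / (|K|² + 1/SNR)
    out = np.empty_like(x_hat, dtype=np.float64)
    for c in range(x_hat.shape[-1]):
        out[..., c] = np.real(np.fft.ifftn(np.fft.fftn(x_hat[..., c]) * G))
    return out

# Atom centers can then be read off as local maxima above a threshold in each channel.
```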


In our experiments, we set SNR to a constant that minimizes the reconstruction error from eq 4. We have chosen the wave transform (and, accordingly, the Wiener deconvolution for inverting it) because it yields a computationally feasible exact formula for deconvolution while also being robust to noise, the two properties that are most important for this kind of problem. In the next section, we show an experimental evaluation of the proposed 3D representation in detail and apply 3D convolutional autoencoders to learn embeddings of spatial molecular structures.

4. EXPERIMENTS

4.1. Data. We conducted our experiments on a data set of clean lead-like molecules from the Zinc database (http://zinc.docking.org/subsets/clean-leads). The data set contains 4.5 million molecules complete with their 3D coordinates. For every molecule, we performed the preprocessing described in section 3.1. We set aside 32000 molecules from the data set as a test set and trained all models on the rest. Figure 2 shows the composition of the data in terms of the total number of atoms in all molecules in the data set. Due to the low amounts of the heavier atoms, in what follows we concentrate on the six most common atoms: H, C, N, O, F, and S.

Figure 2. Shares of different atoms in the data set.

4.2. Comparing Representations with Autoencoder Reconstruction. In the first experiment, we compare the three 3D representations outlined in section 3: naive voxel-based, Gaussian blur, and wave transform. We trained convolutional autoencoders on the different representations of molecules. The autoencoder aims to encode an input into a latent representation in such a way that it can then faithfully reconstruct the original input. The neural network architecture employs units from the well-known Xception network:44 it mainly consists of depth-wise separable convolutions, as introduced in ref 44, in order to reduce the number of parameters and the training time. At the same time, the architecture is much simpler than in ref 44; specifically, we used the convolutional architecture described in Table 1. We tried adding residual blocks in early experiments, but the performance benefits were negligible and the computational costs significant, so we decided to go with a simpler architecture. We used the AdaDelta optimizer45 with learning rate 0.01 and mini-batch size 16, training all networks until convergence.

In Figure 3, we show reconstructions obtained during training after applying the inverse transform; atoms in Figure 3 are colored using the CPK notation: C = gray, O = red, N = blue, and so on. Note that Figure 3b shows the final results after applying the respective deconvolutions, while Figure 3c depicts the actual sample reconstructions of the 3D representations at different stages of the training for the two blurred representations: Gaussian and wave transform. Figure 3b shows that the naive voxel-based representation missed all atoms except carbon, Gaussian blur with σ = 1 was able to capture nitrogen, and the wave transform was additionally able to reconstruct the oxygen atoms. While the Gaussian kernel with σ = 4 captured all atoms, its reconstruction is quite poor since most of the atoms are merged together and the 3D structure is rather distorted. Quantitatively, the wave transform also performed better: Table 2 reports the area under the precision-recall and ROC curves of the reconstructions made with the different representations. Here, for each representation, we passed the data through the autoencoder and then transformed the obtained image back to the voxel representation with Wiener deconvolution. In this space, labels are binary, indicating the presence of specific atoms in different voxels.

Table 1. Neural Network Architecture

Encoder Architecture
no. | type            | channels | blocks | kernel    | stride    | activation | output
0   | input           | 9        |        |           |           |            | 50 × 37 × 25 × 9
1   | conv            | 64       | 1      | 7 × 7 × 7 | 2 × 2 × 2 | relu       | 25 × 19 × 13 × 64
2   | conv            | 128      | 1      | 7 × 7 × 7 | 2 × 2 × 2 | relu       | 13 × 10 × 7 × 128
3   | conv            | 256      | 8      | 7 × 7 × 7 | 2 × 2 × 2 | relu       | 7 × 5 × 4 × 256
4   | conv            | 512      | 8      | 7 × 7 × 7 | 2 × 2 × 2 | relu       | 4 × 3 × 2 × 512
5   | conv            | 128      | 1      | 1 × 1 × 1 | 1 × 1 × 1 | relu       | 4 × 3 × 2 × 128
6   | fully connected | 1024     |        |           |           | relu       | 1024
7   | fully connected | 384      |        |           |           | tanh       | 384

Decoder Architecture
no. | type            | channels | blocks | kernel    | stride    | activation | output
0   | input           | 384      |        |           |           |            | 384
1   | fully connected | 1024     |        |           |           | relu       | 1024
2   | fully connected | 3072     |        |           |           | relu       | 4 × 3 × 2 × 128
3   | conv transposed | 512      | 1      | 1 × 1 × 1 | 1 × 1 × 1 | relu       | 4 × 3 × 2 × 512
4   | conv transposed | 256      | 8      | 7 × 7 × 7 | 2 × 2 × 2 | relu       | 7 × 5 × 4 × 256
5   | conv transposed | 128      | 8      | 7 × 7 × 7 | 2 × 2 × 2 | relu       | 13 × 10 × 7 × 128
6   | conv transposed | 64       | 1      | 7 × 7 × 7 | 2 × 2 × 2 | relu       | 25 × 19 × 13 × 64
7   | conv transposed | 9        | 1      | 7 × 7 × 7 | 2 × 2 × 2 | relu       | 50 × 37 × 25 × 9
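For concreteness, a simplified version of the encoder in Table 1 can be sketched as follows. This is not the original implementation: it assumes PyTorch, uses a single ordinary 3D convolution per stage in place of the repeated depth-wise separable Xception-style blocks, and uses padding 3 so that the spatial sizes match the table.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Simplified encoder following the layer sizes of Table 1."""

    def __init__(self, in_channels=9, latent_dim=384):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 64, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.Conv3d(64, 128, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.Conv3d(128, 256, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.Conv3d(256, 512, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.Conv3d(512, 128, kernel_size=1, stride=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(4 * 3 * 2 * 128, 1024), nn.ReLU(),
            nn.Linear(1024, latent_dim), nn.Tanh(),
        )

    def forward(self, x):            # x: (batch, 9, 50, 37, 25)
        return self.head(self.features(x))

# z = Encoder()(torch.zeros(1, 9, 50, 37, 25))   # z.shape == (1, 384)
```

The decoder in Table 1 mirrors this structure with transposed 3D convolutions, ending in a 9-channel output of the original grid size.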


Figure 3. Sample reconstruction results: (a) original molecule; (b) reconstruction from the autoencoder after the inverse transform; (c) reconstruction from the autoencoder. Reconstructed images are shown at different numbers of updates.

Table 2. Reconstruction Quality of Different Representations

                                  AUC-PR            AUC-ROC
representation                    train    test     train    test
points                            0.163    0.159    0.909    0.905
Gaussian, σ = 1                   0.293    0.291    0.984    0.984
Gaussian, σ = 1, reweighted       0.380    0.378    0.999    0.999
Gaussian, σ = 4                   0.027    0.027    0.994    0.994
wave, σ = 4                       0.652    0.649    0.996    0.996
wave, σ = 4, reweighted           0.640    0.635    0.999    0.999

4.3. Frequent and Rare Atoms: Reweighted Loss Function. Our second experiment was to compare the reconstruction quality of the autoencoders trained on different 3D representations for different atoms. The results are shown in Figure 4, where the bar plots indicate the reconstruction AUC-PR for different atoms in the data set. Note that while the data representation based on the wave transform achieves higher reconstruction quality for frequent atoms (hydrogen, carbon, nitrogen, and oxygen) as well as higher overall quality, it works relatively poorly for rare elements such as fluorine or sulfur. The Gaussian kernel with σ = 4 outperforms the regular wave transform representation by a significant margin for these elements. Therefore, we attempted alternative losses for the model that focus more on rare atoms. The reconstruction quality for such atoms can be greatly improved by increasing their weights in the final loss. The weights for each channel can be computed using the following formula:

$$w[c] = \left(\epsilon + \frac{\|\mathcal{X}_c\|_2^2}{\max_{c'} \|\mathcal{X}_{c'}\|_2^2}\right)^{-1} \qquad (6)$$

Here, $\mathcal{X}_c$ denotes the channel that corresponds to atom type c over all molecules in the data set. Note that the weights are selected globally for each representation and not per molecule. We chose ϵ = 0.05 since it led to the most stable results. In Figure 4, the "Wave, reweighted" bars show the resulting AUC-PR for the model based on the wave transform where reconstruction loss weights are inversely proportional to the number of occurrences of each atom type in the data set, and Figure 5 shows the precision-recall curves themselves for three special cases. This reweighting leads to drastic improvements in reconstruction quality for fluorine and sulfur and also improves the quality for nitrogen and oxygen, while slightly decreasing the overall reconstruction quality, mostly due to the drop in hydrogen reconstruction quality.
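A short sketch of this reweighting is given below, under the assumption that the reconstruction loss is a per-voxel error averaged with the channel weights of eq 6; the exact loss form is not spelled out above, so the second function is illustrative only.

```python
import numpy as np

def channel_weights(dataset, eps=0.05):
    """Compute the per-channel weights of eq 6.

    dataset -- array of voxel grids, shape (n_molecules, X, Y, Z, C)
    """
    # ||X_c||_2^2 summed over all molecules and voxels, one value per channel.
    norms = np.sum(dataset.astype(np.float64) ** 2, axis=(0, 1, 2, 3))
    return 1.0 / (eps + norms / norms.max())

def reweighted_mse(target, reconstruction, weights):
    """Squared error per voxel, weighted per atom-type channel (illustrative loss)."""
    per_channel = np.mean((target - reconstruction) ** 2, axis=(0, 1, 2, 3))
    return np.sum(weights * per_channel)
```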


Figure 4. Reconstruction AUC-PR for different 3D representations.

Figure 5. Sample precision-recall curves for different 3D representations.

4.4. Classification. Finally, we trained a classification model to show how the features learned from the 3D representations translate into an ability to distinguish important substructures. For this purpose, we used MACCS fingerprints46 as targets and trained the encoder model shown in Table 1 with an additional linear layer for classification. Table 3 reports the average accuracy per bit of the MACCS fingerprints and the mean Tanimoto similarity between the target and predicted vectors. While our representation is slightly inferior to a Gaussian blur with σ = 1, it still performs better than the voxel-based representation.

Table 3. Classification Quality of Different Representations

                        avg bit accuracy     mean Tanimoto similarity
representation          train     test       train     test
points                  99.26     99.40      97.93     97.49
Gaussian, σ = 1         99.69     99.7       99.21     98.92
Gaussian, σ = 4         99.13     99.23      97.42     97.10
wave, σ = 4             99.45     99.54      98.41     98.10

The primary difference between training a classifier and training an autoencoder is that the gradients from the labels are much denser than the ones propagated from the autoencoder loss. In our problem, most labels (bits of the MACCS fingerprint) take different values on different molecules, while most input voxels are uninformative and do not produce useful gradient values. This leads us to the hypothesis that the key issue in training an autoencoder is the sparsity of gradients and not the sparsity of the inputs. This hypothesis is also supported by the relatively small performance difference between the Gaussian and wave-based 3D representations on the classification task.
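As a reference for the two metrics in Table 3, the per-bit accuracy and the Tanimoto similarity between binary fingerprint vectors can be computed as follows; this is the standard formulation of these metrics, not code from the original study.

```python
import numpy as np

def bit_accuracy(target, predicted):
    """Average accuracy per bit over a batch of binary fingerprints, in percent."""
    return 100.0 * np.mean(target == predicted)

def mean_tanimoto(target, predicted):
    """Mean Tanimoto (Jaccard) similarity between rows of two binary matrices, in percent."""
    target = target.astype(bool)
    predicted = predicted.astype(bool)
    intersection = np.sum(target & predicted, axis=1)
    union = np.sum(target | predicted, axis=1)
    # Two all-zero fingerprints are treated as identical (similarity 1).
    sims = np.where(union > 0, intersection / np.maximum(union, 1), 1.0)
    return 100.0 * np.mean(sims)
```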

5. DISCUSSION

In this work, we have developed a novel approach to the representation of molecules in three-dimensional space. Our approach, based on the wave transform, serves the same purpose as the previously used Gaussian blur in reducing input sparsity but works better for autoencoder training. The main problem of the naive voxel-based representation is sparsity, and while the Gaussian blur representation also addresses the sparsity, redundancy, and independence problems, we believe that the representation based on the wave transform is even better. We hypothesize that the primary reason for this is that convolving with the Gaussian kernel leads to a much bigger information loss than convolving with the wave kernel: the wave kernel produces a dense representation that does not lead to loss of information (as follows from the Fourier spectral properties of this kernel) and is robust to the noise introduced by the autoencoder that minimizes the reconstruction error. This hypothesis has been supported by our experimental evaluation, where we have established that the wave-based representation performs much better than both the naive voxel-based representation and the Gaussian blur in terms of reconstruction accuracy and information contained in the embedding.

We have also introduced a reweighting of the channels, giving larger weights to rare atoms. This modification, which had previously been used in the training of deep learning models in other domains with imbalanced data sets, has helped us improve reconstruction quality for some rare atoms like fluorine and sulfur. By using the wave transform and channel reweighting, we have achieved significant improvements in reconstruction quality on 3D molecular structures for convolutional autoencoders. Experimental results on predicting MACCS fingerprints indicate that the resulting features can be used for further applications in molecular biology.

AUTHOR INFORMATION

Corresponding Author
*E-mail: alex@insilico.com.

ORCID
Daniil Polykovskiy: 0000-0002-0899-8368

Notes
The authors declare no competing financial interest.

ACKNOWLEDGMENTS

Results on wave transform smoothing, shown in section 3.3, have been obtained by Daniil Polykovskiy and supported by the Russian Science Foundation grant no. 17-71-20072. The work of Artur Kadurin and Sergey Nikolenko shown in section 3.2 was done with the support of the Russian Government Program of Competitive Growth of Kazan Federal University and the Presidium RAS program #01 "Fundamental mathematics and its applications", PRAS-18-01 grant; it was partially done in a laboratory created with the Government of the Russian Federation grant 14.Z50.31.0030.

ABBREVIATIONS

PCA, principal component analysis; t-SNE, t-distributed stochastic neighbor embedding; MACCS, Molecular ACCess System; AE, autoencoder; AAE, adversarial autoencoder; AUC-PR, area under the precision-recall curve; AUC-ROC, area under the receiver operating characteristic curve

REFERENCES

(1) Aliper, A. M.; Plis, S. M.; Artemov, A. V.; Ulloa, A.; Mamoshina, P.; Zhavoronkov, A. Deep Learning Applications for Predicting Pharmacological Properties of Drugs and Drug Repurposing Using Transcriptomic Data. Mol. Pharmaceutics 2016, 13 (7), 2524−30. (2) Chen, L. Deep learning models for modeling cellular transcription systems, 2017. (3) Kadurin, A.; Nikolenko, S. I.; Khrabrov, K.; Aliper, A.; Zhavoronkov, A. druGAN: An Advanced Generative Adversarial Autoencoder Model for de Novo Generation of New Molecules with Desired Molecular Properties in Silico. Mol. Pharmaceutics 2017, 14, 3098−3104. (4) Putin, E.; Mamoshina, P.; Aliper, A. M.; Korzinkin, M.; Moskalev, A.; Kolosov, A.; Ostrovskiy, A.; Cantor, C.; Vijg, J.; Zhavoronkov, A. Deep biomarkers of human aging: Application of deep neural networks to biomarker development. Aging 2016, 8, 1021−33. (5) Kadurin, A.; Aliper, A.; Kazennov, A.; Mamoshina, P.; Vanhaelen, Q.; Khrabrov, K.; Zhavoronkov, A. The cornucopia of meaningful leads: Applying deep adversarial autoencoders for new molecule development in oncology. Oncotarget 2017, 8, 10883. (6) Makhzani, A.; Shlens, J.; Jaitly, N.; Goodfellow, I. Adversarial Autoencoders. International Conference on Learning Representations; San Juan, May 2−4, 2016. (7) Todeschini, R.; Consonni, V.; Mannhold, R.; Kubinyi, H.; Folkers, G. Molecular Descriptors for Chemoinformatics: Vol. I: Alphabetical Listing/Vol. II: Appendices, References; Methods and Principles in Medicinal Chemistry; Wiley, 2009. (8) Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. Proc. Edinburgh Math. SOC 1970, 1−14. (9) Weininger, D.; Weininger, A.; Weininger, J. L. SMILES. 2. Algorithm for generation of unique SMILES notation. J. Chem. Inf. Model. 1989, 29, 97−101. (10) Heller, S. R.; McNaught, A.; Pletnev, I.; Stein, S.; Tchekhovskoi, D. InChI, the IUPAC international chemical identifier. J. Cheminf. 2015, 7, 23. (11) Duvenaud, D. K.; Maclaurin, D.; Aguilera-Iparraguirre, J.; Gómez-Bombarelli, R.; Hirzel, T.; Aspuru-Guzik, A.; Adams, R. P. Convolutional Networks on Graphs for Learning Molecular Fingerprints. CoRR 2015, No. abs/1509.09292. (12) Kearnes, S.; McCloskey, K.; Berndl, M.; Pande, V.; Riley, P. Molecular graph convolutions: moving beyond fingerprints. J. Comput.-Aided Mol. Des. 2016, 30, 595−608. (13) Sermanet, P.; Eigen, D.; Zhang, X.; Mathieu, M.; Fergus, R.; LeCun, Y. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks. CoRR 2013, No. abs/ 1312.6229. (14) Girshick, R. B. Fast R-CNN. 2015 IEEE Int. Conf. Computer Vision (ICCV) 2015, 1440−1448. (15) He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2016, 770−778. (16) Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A. A. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. Proc. 31st AAAI Conf. Art. Intelligence 2017, 4278. (17) Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv:1505.04597v1 2015.



(18) Krähenbühl, P.; Koltun, V. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials. arXiv:1210.5644 [cs.CV] 2011. (19) Pu, J.; Ramani, K. On visual similarity based 2D drawing retrieval. Computer-Aided Design 2006, 38, 249−259. (20) Zaharescu, A.; Boyer, E.; Varanasi, K.; Horaud, R. Surface feature detection and description with applications to mesh matching. 2009 IEEE Conf. Computer Vision and Pattern Recognition 2009, 373−380. (21) Johnson, A. E.; Hebert, M. Using Spin Images for Efficient Object Recognition in Cluttered 3D Scenes. IEEE Trans. Pattern Anal. Mach. Intell. 1999, 21, 433−449. (22) Xiang, Y.; Mottaghi, R.; Savarese, S. Beyond PASCAL: A benchmark for 3D object detection in the wild. IEEE Winter Conference on Applications of Computer Vision, Steamboat Springs, Mar 24−26, 2014; IEEE, 2014. (23) Kazhdan, M. M.; Funkhouser, T. A.; Rusinkiewicz, S. Rotation Invariant Spherical Harmonic Representation of 3D Shape Descriptors. Symposium on Geometry Processing, Aachen, Jun 23−25, 2003. (24) Wu, Z.; Song, S.; Khosla, A.; Yu, F.; Zhang, L.; Tang, X.; Xiao, J. 3D ShapeNets: A deep representation for volumetric shapes. 2015 IEEE Conf. Computer Vision and Pattern Recognition (CVPR) 2015, 1912−1920. (25) Chang, A. X.; Funkhouser, T. A.; Guibas, L. J.; Hanrahan, P.; Huang, Q.-X.; Li, Z.; Savarese, S.; Savva, M.; Song, S.; Su, H.; Xiao, J.; Yi, L.; Yu, F. ShapeNet: An Information-Rich 3D Model Repository. CoRR 2015, No. abs/1512.03012. (26) Maturana, D.; Scherer, S. VoxNet: A 3D Convolutional Neural Network for real-time object recognition. 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2015, 922−928. (27) Li, Y.; Pirk, S.; Su, H.; Qi, C. R.; Guibas, L. J. FPNN: Field Probing Neural Networks for 3D Data. arXiv:1605.06240 [cs.CV] 2016. (28) Riegler, G.; Ulusoy, A. O.; Geiger, A. OctNet: Learning Deep 3D Representations at High Resolutions. CoRR 2016, No. abs/1611.05009. (29) Engelcke, M.; Rao, D.; Wang, D. Z.; Tong, C. H.; Posner, I. Vote3Deep: Fast object detection in 3D point clouds using efficient convolutional neural networks. 2017 IEEE International Conference on Robotics and Automation (ICRA) 2017, 1355−1361. (30) Shi, B.; Bai, S.; Zhou, Z.; Bai, X. DeepPano: Deep Panoramic Representation for 3-D Shape Recognition. IEEE Signal Processing Letters 2015, 22, 2339−2343. (31) Chaudhuri, S.; Kalogerakis, E.; Guibas, L. J.; Koltun, V. Probabilistic reasoning for assembly-based 3D modeling. ACM Trans. Graph. 2011, 30, 1−10. (32) Kingma, D. P.; Welling, M. Auto-Encoding Variational Bayes. CoRR 2013, No. abs/1312.6114. (33) Creswell, A.; White, T.; Dumoulin, V.; Arulkumaran, K.; Sengupta, B.; Bharath, A. A. Generative Adversarial Networks: An Overview. CoRR 2017, No. abs/1710.07035. (34) Wu, J.; Zhang, C.; Xue, T.; Freeman, B.; Tenenbaum, J. B. Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling. arXiv:1610.07584 [cs.CV] 2016. (35) Nash, C.; Williams, C. K. I. The shape variational autoencoder: A deep generative model of part-segmented 3D objects. Comput. Graph. Forum 2017, 36, 1−12. (36) Sharma, A.; Grau, O.; Fritz, M. VConv-DAE: Deep Volumetric Shape Learning Without Object Labels; ECCV Workshops, 2016. (37) Mamoshina, P.; Vieira, A.; Putin, E.; Zhavoronkov, A. Applications of Deep Learning in Biomedicine. Mol. Pharmaceutics 2016, 13, 1445−1454. (38) Wallach, I.; Dzamba, M.; Heifets, A. AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery. CoRR 2015, No. abs/1510.02855. (39) Gomes, J.; Ramsundar, B.; Feinberg, E. N.; Pande, V. S. Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity. CoRR 2017, No. abs/1703.10603.

(40) Ragoza, M.; Hochuli, J.; Idrobo, E.; Sunseri, J.; Koes, D. R. Protein-Ligand Scoring with Convolutional Neural Networks. J. Chem. Inf. Model. 2017, 57 (4), 942−957. (41) Amidi, A.; Amidi, S.; Vlachakis, D.; Megalooikonomou, V.; Paragios, N.; Zacharaki, E. I. EnzyNet: enzyme classification using 3D convolutional neural networks on spatial representation. CoRR 2017, No. abs/1707.06017. (42) Torng, W.; Altman, R. B. 3D deep convolutional neural networks for amino acid environment similarity analysis. BMC Bioinf. 2017, 18, 302. (43) Gonzalez, R. C.; Woods, R. E.; Eddins, S. L. Digital Image Processing Using MATLAB; Prentice Hall, 2003. (44) Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. arXiv preprint arXiv:1610.02357 2016. (45) Zeiler, M. D. ADADELTA: an adaptive learning rate method. arXiv preprint arXiv:1212.5701 2012. (46) Durant, J. L.; Leland, B. A.; Henry, D. R.; Nourse, J. G. Reoptimization of MDL Keys for Use in Drug Discovery. J. Chem. Inf. Comput. Sci. 2002, 42, 1273−1280.

