Artificial Intelligence Approach to Find Lead Compounds for Treating


Biophysical Chemistry, Biomolecules, and Biomaterials; Surfactants and Membranes

Artificial Intelligence Approach to Find Lead Compounds for Treating Tumors JianQiang Chen, Hsin-Yi Chen, Wenjie Dai, Qiu-Jie Lv, and Calvin Yu-Chian Chen J. Phys. Chem. Lett., Just Accepted Manuscript • DOI: 10.1021/acs.jpclett.9b01426 • Publication Date (Web): 13 Jul 2019 Downloaded from pubs.acs.org on July 17, 2019


The Journal of Physical Chemistry Letters

Artificial Intelligence Approach to Find Lead Compounds for Treating Tumors

Jian-Qiang Chen†#, Hsin-Yi Chen†#, Wen-jie Dai‡#, Qiu-Jie Lv†#, Calvin Yu-Chian Chen†,§,¶*

†School of Intelligent Systems Engineering, Artificial Intelligence Medical Center, Sun Yat-sen University, Shenzhen, 510275, China
‡School of Pharmacy, Sun Yat-sen University, Shenzhen, 510275, China
§Department of Medical Research, China Medical University Hospital, Taichung 40447, Taiwan
¶Department of Bioinformatics and Medical Engineering, Asia University, Taichung 41354, Taiwan
#Equal contribution

*Corresponding author: Calvin Yu-Chian Chen, Ph.D., School of Intelligent Systems Engineering, Director of Artificial Intelligence Medical Center, Sun Yat-sen University, Guangzhou 510275, China. TEL: 02039332153. E-mail: [email protected]


ACS Paragon Plus Environment


Abstract

It has been demonstrated that the MMP13 enzyme is related to the cancer cells of most tumors. The world's largest traditional Chinese medicine (TCM) database was screened for structure-based and ligand-based drug design. To predict drug activity, machine learning models (Random Forest (RF), AdaBoost Regressor (ABR), and Gradient Boosting Regressor (GBR)) and a deep learning model were used to validate the docking results. The RF algorithm gave an R² of 0.922 on the training set and 0.804 on the test set. For the deep learning algorithm, R² was 0.90 on the training set and 0.810 on the test set. However, after the docking and quantitative structure-activity relationship (QSAR) process, these TCM compounds flew away during the molecular dynamics (MD) simulation. We therefore sought another route, peptide design. All peptide database candidates were screened by docking. Modified peptides optimized the interaction modes, and the affinities were assessed with the ZDOCK protocol and the Refine Docked protein protocol. A 300 ns MD simulation evaluated the stability of the receptor-peptide complexes. A double site effect appeared for S2, a peptide designed from a known inhibitor, when it was complexed with BCL2. S3, a peptide designed with reference to the endogenous inhibitor p16, competed with cyclin for binding to CDK6. The MDM2 inhibitors S5 and S6 were derived from the p53 structure and bound MDM2 stably. A flexible region of peptides S5 and S6 may have enhanced the binding ability by changing its own conformation, which was not foreseen. These peptides (S2, S3, S5, and S6) are potentially interesting for treating cancer; however, these findings need to be confirmed by biological testing, which will be conducted in the near future.


TOC GRAPHICS

The flowchart reveals the whole protocol, in which we provide a brand-new concept of drug and peptide design.


Tumors are the second most deadly disease worldwide, in both developed and developing countries, and catching a metastatic target remains a challenge¹. Activation of the epidermal growth factor receptor (EGFR)² occurs in many types of tumors and promotes tumor progression. Macrophage migration inhibitory factor (MIF)³ binds EGFR and thereby blocks its activation. However, activation of EGFR by mutation or ligand binding enhances the secretion of MMP13, which degrades extracellular MIF and eliminates the negative regulation of EGFR by MIF⁴. Inhibiting MMP13 could therefore slow the expansion of cancer cells and the deterioration of the disease⁴. Recent research shows that MMP13 is related to colorectal metastases⁵, breast tumors⁶, and knee osteoarthritis⁷. Network pharmacology-based analysis provides a multi-target concept that is relevant to cancer treatment⁸. The combined effects of multiple drugs not only improve the efficacy of treatment but also kill diseased cells before drug resistance emerges⁹. Several related target proteins were therefore studied as well: cyclin-dependent kinase inhibitor 2A (CDKN2A), p53, B-cell lymphoma 2 (BCL-2), cyclin-dependent kinase 6 (CDK6), and E3 ubiquitin-protein ligase Mdm2. CDKN2A causes cell cycle arrest and inhibits tumor cell proliferation in cell culture when it is overexpressed¹⁰. The protein works by inhibiting the activity of cyclin-dependent kinase 4 (cdk4) or cdk6¹¹⁻¹².


The tumor suppressor p16INK4a can bind cdk6 and thereby inhibit tumor growth¹³⁻¹⁴. Mutations in the p53 tumor suppressor are the most common genetic changes in human cancers¹⁵. The inactivation or mutation of tumor suppressor genes and the disruption of the balance that regulates apoptosis can promote tumor development¹⁶. The selection of p53 mutations in the course of tumor occurrence and development may lead to the parallel inactivation of multiple tumor suppressor genes, which may be the main reason for the high frequency of p53 mutations in cancer¹⁷⁻¹⁸. The interaction between angiopoietin and the p53 TAD2 domain in cancer cells could inhibit the tumor-suppressing function of p53 and promote cell survival¹⁹⁻²⁰. Abnormal regulation of Bcl-2 family members allows tumor cells to escape apoptosis and to resist chemotherapy²¹. The literature provides the rationale for testing combined therapies that use C-X-C chemokine receptor type 4 (CXCR4) and Bcl-2 inhibitors to increase the efficacy of these agents²²⁻²³.

Artificial intelligence is realized by a system that combines representation learning with sophisticated reasoning. The multiple processing layers in deep learning can represent multiple levels of abstraction, which greatly improves the technical level of drug discovery²⁴. It has been demonstrated that deep neural nets (DNNs) can be used as a practical quantitative structure-activity relationship (QSAR) method²⁵. The Random Forest model is widely applied in bioinformatics and provides compelling results²⁶. The AdaBoost algorithm yielded good prediction performance in breast cancer analysis²⁷. The Gradient Boosting algorithm aggregates its tree models to form a stronger predictor²⁸.

Since targeting a single protein has yielded few effective treatments for many diseases, the network pharmacology-based multiple targets⁸ were screened against the TCM database²⁹ and applied to cancer treatment. Deep learning and other algorithms were used to find potential drugs. The compounds in this study performed poorly during MD simulation³⁰, so we focused on peptides for drug design. The peptides we designed bound the receptor stably throughout the MD simulation, and there was reason to believe that these peptides could affect the conformation of the receptors. Cancer-targeting peptides can significantly improve the selectivity and efficacy of existing chemotherapy drugs³¹. Peptides have great prospects in the market³².

To identify several targets related to MMP13, the interaction network was constructed from the STITCH database³³; the first and second shells were each set to no more than 20 interactions. Pathways in cancer were highlighted as red points. Several known ligands were displayed as rounded rectangles. Three other targets were selected based on the combined score, which is assessed from several lines of evidence. The relationship between these three targets and cancer is documented in the literature.


The Kyoto Encyclopedia of Genes and Genomes (KEGG) database³⁴ provides protein-protein interactions within a pathway. This approach provided multiple targets for treating tumors. The structures of the four target proteins were obtained from the Protein Data Bank (PDB)³⁵. The crystal structure of MMP13 (PDB: 3ZXH) is a complex with an inhibitor (IC50 = 3 nM) at high resolution (1.3 Å)³⁶. The complete structure of CDKN2A (PDB: 1A5E) was acquired; none of its constraints shows a violation greater than 0.5 Å or a dihedral angle violation greater than 5 degrees¹¹. One of the p53 variants (with mutations at 5 sites) was taken from the PDB (5O1H)³⁷; the mutation sites are in the region that is hypervariable when a tumor occurs. A more rational approach would be to screen after sequencing each individual patient. Here, a multi-site mutation model provides more mutation information, even though different mutation models may differ considerably in practice. The original p53 protein (1TSR)¹⁵ has anti-tumor activity, so its conformation was used as a control. The spatial structure of the fourth target, BCL2, was collected from the PDB (2XA0)³⁸. All proteins were used to screen the TCM database by docking with the LigandFit protocol³⁹. Disorder validation showed that the conformations of the docking proteins were reasonable; the blue areas indicate the key residues. Forty-one MMP13 inhibitors were collected from the literature⁴⁰ to build the machine learning models and the deep learning model. To verify the validity of the QSAR models, 20% of the data was held out as a test set for external validation. The predicted activities were provided as an assessment.

The “Remove cell” module of Accelrys Discovery Studio 2.5.5.9350 (DS 2.5) was used to process these crystal structures before screening: it removes all top-level cells while leaving their constituent contents intact and splits the structures into separate molecules. The “Prepare Protein” module was used to clean the four proteins, remove water, and add hydrogens. The “Calculate Molecular Properties” module was used to compute 204 properties of the inhibitors. The Pearson correlation coefficient matrix (Figure 1) was used to judge the correlation and orthogonality of the features, and principal component analysis (PCA) and Lasso feature selection were applied for data preprocessing (Figure 2). The residual plot (Figure 10) shows the residuals on the vertical axis against the dependent variable on the horizontal axis. For all of these models, the residual is the difference between the observed value of the target variable ($y$) and the predicted value ($\hat{y}$).

AdaBoost Regressor model. In each iteration, the weights of the data misclassified by the previous classifier are increased, while the weights of the correctly classified data are reduced. Finally, AdaBoost²⁷ takes a linear combination of the basic classifiers as the strong classifier, in which basic classifiers with a small classification error rate are given large weights and those with a large classification error rate are given small weights. $M$ is the number of weak classifiers in the boosted model, $G_m(x)$ denotes the $m$-th weak classifier, $\alpha_m$ is the coefficient of the $m$-th weak classifier, and $w_{i,m}$ represents the weight of instance $i$ in round $m$. The algorithmic principle of AdaBoost is as follows:

Algorithm 1: AdaBoost Regressor
Learn the basic classifier $G_1(x)$ from the training set
For $m = 1$ to $M$ do:
    Learn the basic classifier $G_m(x)$ using the training data set weighted by the current distribution $D_m$
    $$D_m = (w_{m+1,1}, \ldots, w_{m+1,i}, \ldots, w_{m+1,N}) \tag{1}$$
    Calculate the classification error rate $e_m$ of the basic classifier $G_m(x)$ on the weighted training data set
    Calculate the coefficient of the basic classifier $G_m(x)$
    $$\alpha_m = \frac{1}{2}\log\frac{1 - e_m}{e_m} \tag{2}$$
    Update the weight distribution of the training data
endFor
A new combination of basic classifiers:
$$f(x) = \sum_{m=1}^{M} \alpha_m G_m(x) \tag{3}$$
end Algorithm
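The steps of Algorithm 1 can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the base learner (a one-dimensional threshold stump), the toy data, and the names `fit_stump`, `adaboost`, and `predict` are assumptions made for illustration. The weight update uses the standard AdaBoost rule (multiply each weight by $\exp(-\alpha_m y_i G_m(x_i))$, then renormalize), which the text above describes only in words. The sketch follows the classifier formulation given above, with labels in {-1, +1}.

```python
import math

def fit_stump(xs, ys, w):
    """Return (weighted error, threshold, polarity) of the best 1-D threshold stump."""
    best = None
    for thr in sorted(set(xs)):
        for pol in (1, -1):
            preds = [pol if x <= thr else -pol for x in xs]
            err = sum(wi for wi, p, y in zip(w, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, thr, pol)
    return best

def adaboost(xs, ys, n_rounds):
    """Fit n_rounds weighted stumps; returns a list of (alpha_m, threshold, polarity)."""
    n = len(xs)
    w = [1.0 / n] * n                                # uniform initial weights
    model = []
    for _ in range(n_rounds):
        e_m, thr, pol = fit_stump(xs, ys, w)
        e_m = min(max(e_m, 1e-12), 1.0 - 1e-12)      # guard against log(0)
        alpha = 0.5 * math.log((1.0 - e_m) / e_m)    # equation (2)
        model.append((alpha, thr, pol))
        # Assumed standard update: misclassified points gain weight, others lose it.
        preds = [pol if x <= thr else -pol for x in xs]
        w = [wi * math.exp(-alpha * y * p) for wi, y, p in zip(w, ys, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, x):
    """Sign of the weighted vote f(x) from equation (3)."""
    f = sum(alpha * (pol if x <= thr else -pol) for alpha, thr, pol in model)
    return 1 if f >= 0 else -1

# Toy data, invented for illustration only.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = adaboost(xs, ys, n_rounds=3)
```

No single stump separates this toy data, so it shows how the weighted vote of several weak learners can succeed where each one alone fails.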

Random Forest model. For a given batch of data, the same algorithm can generate only one tree, whereas the Bagging strategy can generate different data sets. $N_1$ samples are resampled with replacement from the sample set of $N$ data points (so the number of sample data points remains $N$), and the $n$ samples are created based on all samples. The classifier repeats the above two steps $m$ times to obtain $m$ classifiers, which finally determine the class to which the data belong. The forest consists of the classifiers $\{f(a, b),\ b = 1, \ldots, N\}$, where the $\{k\}$ are independent and identically distributed and each tree votes for the best class at input $a$ by using the weight vector (equation 4)⁴¹. The trees in the random forest are represented as $\delta_1, \delta_2, \delta_3, \ldots, \delta_T$, and $w_i(x)$ is the average weight⁴¹. The mean square error (equation 5) is used to evaluate the model.

$$w_i(x) = \frac{1}{N}\sum_{t=1}^{N} w_i(x, \delta_t) \tag{4}$$

$$\mathrm{MSE} = \frac{1}{N}\sum_{t=1}^{N}\left(y_{true} - y_{predict}\right)^2 \tag{5}$$
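The resampling and averaging described above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' code: `fit_mean_model` is a hypothetical stand-in for a decision tree (it only predicts the mean target of its bootstrap sample), the data are invented, and the choice of 7 base models merely echoes the n_estimators value reported later in the text.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement (the Bagging resampling step)."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]

def fit_mean_model(sample):
    """Hypothetical stand-in for a tree: always predicts the sample's mean target."""
    mean_y = sum(y for _, y in sample) / len(sample)
    return lambda x: mean_y

def bagged_predict(models, x):
    """Average the predictions of all base models."""
    return sum(model(x) for model in models) / len(models)

def mse(y_true, y_pred):
    """Mean square error, equation (5)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy (x, y) pairs, invented for illustration only.
data = [(0.1, 5.0), (0.4, 5.5), (0.7, 6.1), (0.9, 6.8), (1.3, 7.2)]
rng = random.Random(1)
models = [fit_mean_model(bootstrap_sample(data, rng)) for _ in range(7)]
predictions = [bagged_predict(models, x) for x, _ in data]
training_mse = mse([y for _, y in data], predictions)
```

Because each base model sees a different bootstrap sample, the individual predictions differ, and averaging them reduces the variance of the ensemble.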


Gradient Boosting Regressor. The pseudo-residuals give the gradient descent direction of the loss function of the current model, which is used to improve the model at each iteration. $x = \{x_1, x_2, x_3, \ldots, x_n\}$ represents the random “input” values, and $y$ is the random “output” value. $h(x_i; a)$ is the base learner, $\varphi(y, F(x))$ is the loss function, and $\tilde{y}_i$ is the current pseudo-residual. The pseudocode of Gradient Boosting²⁸ is as follows:

Algorithm 2: Gradient Boosting Regressor
$$F_0(x) = \arg\min_{\rho} \sum_{i=1}^{N} \varphi(y_i, \rho) \tag{6}$$
For $m = 1$ to $M$ do:
    $$\tilde{y}_i = -\left[\frac{\partial \varphi(y_i, F(x_i))}{\partial F(x_i)}\right]_{F(x) = F_{m-1}(x)}, \quad i = 1, \ldots, N \tag{7}$$
    $$a_m = \arg\min_{a,\beta} \sum_{i=1}^{n} \left[\tilde{y}_i - \beta h(x_i; a)\right]^2 \tag{8}$$
    $$\rho_m = \arg\min_{\rho} \sum_{i=1}^{n} \varphi\left(y_i, F_{m-1}(x_i) + \rho h(x_i; a_m)\right) \tag{9}$$
    $$F_m(x) = F_{m-1}(x) + \rho_m h(x; a_m) \tag{10}$$
endFor
end Algorithm
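For the squared-error loss $\varphi(y, F) = \frac{1}{2}(y - F)^2$, the pseudo-residual in equation (7) is simply $y_i - F_{m-1}(x_i)$, and Algorithm 2 reduces to repeatedly fitting a base learner to the current residuals. The sketch below illustrates that special case in plain Python. The loss, the one-feature regression stump used as base learner, the fixed shrinkage factor `learning_rate` (used in place of the line search of equation (9)), and the toy data are all assumptions for illustration, not the authors' configuration.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump: returns (threshold, left_mean, right_mean).
    Assumes xs contains at least two distinct values."""
    best = None
    for thr in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    return best[1], best[2], best[3]

def gradient_boost(xs, ys, n_rounds, learning_rate):
    """Return the initial constant F_0 and the list of fitted stumps."""
    f0 = sum(ys) / len(ys)            # equation (6): the mean minimizes squared loss
    preds = [f0] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]     # equation (7)
        thr, lm, rm = fit_stump(xs, residuals)             # equation (8)
        stumps.append((thr, lm, rm))
        # Equation (10), with a fixed shrinkage factor instead of the line search (9).
        preds = [p + learning_rate * (lm if x <= thr else rm)
                 for x, p in zip(xs, preds)]
    return f0, stumps

def predict(f0, stumps, learning_rate, x):
    """Evaluate the boosted model F_M at a single input x."""
    return f0 + sum(learning_rate * (lm if x <= thr else rm) for thr, lm, rm in stumps)

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy data, invented for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1]
f0, stumps = gradient_boost(xs, ys, n_rounds=200, learning_rate=0.1)
baseline_mse = mse(ys, [f0] * len(ys))
boosted_mse = mse(ys, [predict(f0, stumps, 0.1, x) for x in xs])
```

A small shrinkage factor needs more rounds to reach the same training error, which is the usual trade-off between `learning_rate` and the number of estimators.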

Deep Learning model. Deep learning has made breakthroughs in image classification, speech recognition, and autonomous driving²⁴. Depth refers to the number of layers in the deep learning model, and these layers represent what the neural network model learns. The transformation performed by each layer is parameterized by the weights of that layer. The loss function is used to control the output of the neural network, and the optimizer fine-tunes the weight values (Figure 3) to reduce the loss value. A deep learning model learns all layers of representation jointly rather than successively: once the model modifies an internal feature, all other features that depend on it adjust and adapt automatically. The model learns these representations by decomposing complex, abstract representations into many intermediate layers, each of which is only a simple transformation of the previous one. The Adam optimizer algorithm is as follows:

Algorithm 3: Adam optimizer
Learning rate $\epsilon = 0.0006$
Exponential decay rates for the moment estimates, $\rho_1, \rho_2 \in [0, 1)$
Small constant for numerical stability, $\delta = 10^{-8}$
Initial parameters $\theta$
Initialize the first and second moment variables $s = 0$, $r = 0$
Initialize the time step $t = 0$
While the stopping criterion is not met do
    Sample $m$ examples $x^{(1)}, \ldots, x^{(m)}$ from the training set, with targets $y^{(i)}$
    Compute the gradient:
    $$g \leftarrow \frac{1}{m}\nabla_{\theta}\sum_{i} L\left(f(x^{(i)}; \theta), y^{(i)}\right) \tag{11}$$
    $$t \leftarrow t + 1 \tag{12}$$
    Update the biased first moment estimate:
    $$s \leftarrow \rho_1 s + (1 - \rho_1) g \tag{13}$$
    Update the biased second moment estimate:
    $$r \leftarrow \rho_2 r + (1 - \rho_2)\, g \odot g \tag{14}$$
    Correct the bias in the first moment:
    $$\hat{s} \leftarrow \frac{s}{1 - \rho_1^{t}} \tag{15}$$
    Correct the bias in the second moment:
    $$\hat{r} \leftarrow \frac{r}{1 - \rho_2^{t}} \tag{16}$$
    Compute the update:
    $$\Delta\theta = -\epsilon \frac{\hat{s}}{\sqrt{\hat{r}} + \delta} \tag{17}$$
    Apply the update:
    $$\theta \leftarrow \theta + \Delta\theta \tag{18}$$
End while
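Algorithm 3 translates almost line by line into code. The following plain-Python sketch applies equations (13) to (18) to a list of parameters. The decay rates `rho1 = 0.9` and `rho2 = 0.999` are the commonly used defaults and are an assumption here, since the text does not state the values used. The quadratic test function and the names `adam` and `quadratic_grad` are likewise illustrative only.

```python
import math

def adam(grad_fn, theta, lr=0.0006, rho1=0.9, rho2=0.999, delta=1e-8, n_steps=1000):
    """Minimize a function from its gradient with the Adam update rule."""
    theta = list(theta)
    s = [0.0] * len(theta)      # first moment estimate
    r = [0.0] * len(theta)      # second moment estimate
    for t in range(1, n_steps + 1):                                     # (12)
        g = grad_fn(theta)                                              # (11)
        s = [rho1 * si + (1 - rho1) * gi for si, gi in zip(s, g)]       # (13)
        r = [rho2 * ri + (1 - rho2) * gi * gi for ri, gi in zip(r, g)]  # (14)
        s_hat = [si / (1 - rho1 ** t) for si in s]                      # (15)
        r_hat = [ri / (1 - rho2 ** t) for ri in r]                      # (16)
        theta = [th - lr * sh / (math.sqrt(rh) + delta)                 # (17), (18)
                 for th, sh, rh in zip(theta, s_hat, r_hat)]
    return theta

def quadratic_grad(theta):
    """Gradient of the toy objective (theta - 3)^2."""
    return [2.0 * (theta[0] - 3.0)]

# A larger learning rate than in the paper is used so the toy problem converges quickly.
theta_final = adam(quadratic_grad, [0.0], lr=0.1, n_steps=2000)
```

Because of the bias correction, the very first step has a magnitude close to the learning rate regardless of the gradient scale, which is one reason Adam is relatively insensitive to the scaling of the loss.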

In addition to the Dock score and the activities predicted by the Random Forest, AdaBoost Regressor, Gradient Boosting Regressor, and deep learning models, network interactions were examined to search for candidates related to multiple targets (Table 2). The top 50 compounds for the different targets (Table 7 and Table 8) were then intersected to identify multifunctional drug candidates (Figure 5). After this cross-screening, the selected candidates influenced as many related targets as possible. The yellow dots represent proteins, and the blue squares represent small molecules. The multi-target compounds are marked with a box.
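The cross-screening step described above amounts to intersecting the per-target hit lists. A minimal sketch, assuming each target's top-ranked compounds are available as a list of names, is shown below; the function name `multi_target_candidates`, the `min_targets` threshold, and the placeholder compound names are hypothetical and do not correspond to entries in the tables.

```python
def multi_target_candidates(top_hits, min_targets=2):
    """Map each compound found in at least `min_targets` hit lists to its targets."""
    targets_by_compound = {}
    for target, compounds in top_hits.items():
        for compound in set(compounds):
            targets_by_compound.setdefault(compound, set()).add(target)
    return {compound: sorted(targets)
            for compound, targets in targets_by_compound.items()
            if len(targets) >= min_targets}

# Placeholder hit lists, invented for illustration only.
top_hits = {
    "MMP13": ["cmpd_A", "cmpd_B", "cmpd_C"],
    "TP53": ["cmpd_B", "cmpd_D"],
    "CDKN2A": ["cmpd_B", "cmpd_C", "cmpd_E"],
}
shared = multi_target_candidates(top_hits)
```

In this toy input, `cmpd_B` appears in all three lists and `cmpd_C` in two, so both would be kept as multi-target candidates.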


Four receptor-ligand complexes were further validated with 300 ns of MD. Four compounds were selected according to their Dock score, H-bonds, and pi-pi interactions, based on the intersection information (Figure 8). The four complexes were Nazlinin-MMP13, Subaphylline-TP53, Adrenaline-CDKN2A, and E41 (control)-MMP13. The explicit TIP3P water model was added before the MD stage, and sodium and chloride ions were added to the system to mimic the humoral environment. Energy minimization, NVT equilibration, and NPT equilibration were then performed. The steepest descent algorithm was used for energy minimization, and all bonds were constrained with the LINCS algorithm in the NVT and NPT stages. The Particle Mesh Ewald (PME) method was used for long-range electrostatics, and temperature coupling used the V-rescale (modified Berendsen) thermostat during NVT and NPT. The MD simulation was run for 300 ns to evaluate the binding stability. Periodic boundary conditions were applied in three dimensions (3D PBC), and the dispersion correction was taken into account.

Since the TCM compound candidates did not achieve the desired effect, screening peptides was proposed instead. For the different targets, peptides from the bioactivity peptide database and the SATPdb database⁴² were screened with the LigandFit protocol in the DS software. The ZDOCK program was used to generate different conformations between the biological macromolecules. The top pose in the top cluster was deemed the most likely binding mode, and the poses with a root mean square deviation (RMSD) below 1 were refined with the RDOCK program. The protein-protein interface was further analyzed to characterize the binding interaction. Inhibitors of cdk6 were screened from SATPdb based on the structure of the tumor suppressor p16. The candidate for the Bcl2 protein, SATPdb26921, was further designed with the Calculate Mutation Energy (Stability) protocol. Saturation mutagenesis provided the full energy change upon mutation. Residues 15 and 47 were mutated to cysteine (Cys) to form a disulfide bond and thereby enhance spatial stability. The aggregation of the peptides was evaluated with the aggregation score and developability indices (DI) to recognize aggregation within 5 Å and 10 Å. Aggregation regions of hydrophobic residues were replaced based on the mutation energy results. After energy minimization and refinement of the loops and side chains, ZDOCK and RDOCK gave the interaction modes and the assessment scores. The ligand of the p53 protein was designed on the basis of the peptide designed for Bcl2 and further optimized. p16 inhibits the activation of cdk6 by binding the catalytic cleft, opposite the cyclin binding site¹³. Interestingly, the BCL2 peptide candidate displayed a synergistic effect by binding at different sites. Binding at both the catalytic cleft and the cyclin binding site of cdk6 inhibited the interaction with cyclin. E3 ubiquitin-protein ligase Mdm2 inhibits the activity of p53, which suggests that an MDM2 inhibitor could be developed. The structure of MDM2-p53 (1YCQ)⁴³ was employed.
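The RMSD criterion used above to select poses for refinement can be written down explicitly. The sketch below computes the RMSD between two sets of atomic coordinates that are already expressed in the same reference frame and keeps the poses under a cutoff. The function names, the toy coordinates, and the assumption that the cutoff of 1 is in ångströms are illustrative; the text does not state the unit, and the sketch performs no structural superposition.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

def poses_within_cutoff(reference, poses, cutoff=1.0):
    """Names of the poses whose RMSD to the reference is below the cutoff."""
    return [name for name, coords in poses.items() if rmsd(reference, coords) < cutoff]

# Toy coordinates, invented for illustration only.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
poses = {
    "pose_1": [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0), (0.5, 1.0, 0.0)],  # shifted 0.5 along x
    "pose_2": [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 1.0, 2.0)],  # shifted 2.0 along z
}
selected = poses_within_cutoff(reference, poses)
```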

300 ns MD simulations were run to evaluate the effect on the complexes. Three proteins, CDKN2A, TP53, and BCL2, were confirmed. The false discovery rate for pathways in cancer was 1.73e-09, which indicates that these proteins are highly related to that pathway. Interestingly, the MMP13 protein was not directly related to cancer. The combined score between MMP13 and TP53 was 0.944, so MMP13 likely inhibits native TP53 and thereby causes cancer (Figure 4). The melanoma pathway (map05218) in the KEGG database displayed the relationship between the selected targets, especially in the tumor suppressor pathways. MITF and TP53 were implicated in further melanoma progression. Data from the literature and databases showed a high correlation between these targets and tumors.

All of the top 50 candidates for the different targets were intersected in a network (Figure 5), and the multi-target compounds were the focus. The yellow points represent the target proteins, and the blue squares represent the different TCM ligands. Several molecules related to multiple targets are shown in the center. The top 10 ligands and the control E41 were assessed with the Dock score, -PMF, -PLP1, and -PLP2. The hydrogen-bond-forming residues and the number of H-bonds are provided (Table 1). The introduction of hydrogen bonds would greatly improve the stability of binding. The Dock scores of the compounds screened from the TCM database were much higher than that of the control, as were the other score functions, so there was a chance that we could discover better binding substrates. The disorder validation result showed a reasonable structure for docking (Figure 6); values lower than 0.5 indicate a more stable, more conserved, and more credible region. The Dock score, H-bond, and pi-pi interaction information together provided a comprehensive assessment.

AdaBoost Regressor model. Most of the predicted activities fell in the proper range (Figure 10a). As can be seen from the histogram, the errors are distributed near zero, which usually indicates that the model is a good fit. The n_estimators value was set to 25. The mean square error (MSE) is 0.020 on the training set and 0.235 on the test set. R² is 0.973 on the training set and 0.781 on the test set. The fitted model was used to predict the activity of the screening candidates; a higher predicted activity gives more confidence in a candidate.

Gradient Boosting Regressor model. The n_estimators value, which is the number of gradient boosting iterations or the number of weak learners, was set to 650, and the learning_rate was 0.003. In general, a smaller learning_rate tends to give a smaller test error. The maximum depth of the decision trees (where the tree depth does not include the root) was set to 4. Some predicted points are scattered (Figure 10b), but the overall fit is good. The mean square error (MSE) is 9.83e-08 on the training set and 0.253 on the test set. R² is 1 on the training set and 0.765 on the test set.
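The R² values reported for the training and test sets follow from the usual coefficient of determination, and the 20% external test set corresponds to a simple random hold-out. The sketch below shows both in plain Python. The helper names, the fixed seed, and the rounding rule used for the test-set size are assumptions; they are not necessarily what the authors' tooling does.

```python
import random

def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle the items and hold out a fraction of them as the test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def r2_score(y_true, y_pred):
    """Coefficient of determination; assumes y_true is not constant."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Indices standing in for the 41 collected inhibitors.
train_ids, test_ids = train_test_split(range(41))
```

A model that always predicts the mean of the observed values has an R² of 0, and a perfect model has an R² of 1, so a large gap between training and test R² is a sign of over-fitting.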


Random Forest model. The model was built using the python and the jupyter notebook tool. The data of 41 datasets are preprocessed. Firstly, principal component analysis (PCA) is used to reduce the dimension of 204 high-dimensional data, and screen out the features with variance greater than 0.05, and obtain 83 features. The data with mean value of 0 and variance of 1 were obtained by standardization . Lasso feature selection is used to obtain features with small correlation coefficient and good orthogonality. The alpha of Lasso was set as 0.171, and 7 features were obtained by Lasso feature selection, aAccording to Pearson correlation coefficient matrix, the selected feature correlation coefficient was small and the orthogonality was good. In the Random Forest model, the n_estimators value was set as 7, and the random_state was 1. The two-dimensional residual distribution is quite random and uniform, and these points are randomly distributed around the horizontal axis (Figure 810c). This seems to indicate that our linear model works well. The mean square error (MSE) of training set is 0.056, and that of test set is 0.211. R2 on training set is 0.922, and R2 on test set is 0.804. The same as docking score, the predicted activities of candidates appeared better than control. Deep Learning model. To predict candidates’ activity value better, a simple 4-layers full connected neural network using ReLu (Rectified Linear Units) function as activation function model was constructed. We used a very small network, which 20

ACS Paragon Plus Environment

Page 20 of 80

Page 21 of 80 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry Letters

contains two hidden layers. The last layer of the network has only one unit and is not activated. It is a linear layer. Generally speaking, the less training data, the more serious the over-fitting. Dropout is one of the most effective and commonly used regularization methods for neural networks. To use dropout for a certain layer is to randomly discard some output features of that layer (set to 0) in the training process. Dropout technique was also applied in the second and third layer (with rate 0.4 for the second and 0.6 for the third) to reduce over-fitting. We use the Adam optimizer (Kingma & Ba, 2014) with learning rate 0.0006 . We did 350 times experiment on CPU. Scatter plot (Figure 911) shows the partial results that R2 is greater than 0 on training set and on test set. In all the experiments, we got one model that R2 on training set is 0.90 and R2 on test set is 0.810.The predicted bioactivity value is shown in Table 41. During all algorithms, R2 on training set is more than 0.9, and R2 on test set is more than 0.7, which show good validation of the predicted results. The Voting system(Table 53)was established from the multiple QSAR (RF, ABR, DBR, DL) and dock-score validations, and we selected the multi-target one from the top ten candidates.NazlininMMP13 complex included the most key residues for pi-pi interactions as well as the highest total score. Subaphylline-TP53 complex had the most quantity (5) of H-bond in all TP53 complex. Adrenaline-CDKN2A complex had 6 H-bonds, and 3 residues (Asp14, Pro41 21


and Asn42) formed the H-bonds (Figure 2). Nazlinin, Subaphylline and Adrenaline (Figure 7) would be further verified by molecular dynamics. The interactions of the complexes in 2D and 3D views included hydrogen bonds, pi interactions and van der Waals forces (Figure 8). Some hydrophobic effects also contributed to the stability of the complexes (Figure 9). However, several unfavorable bumps signaled a risk that the ligands would “fly away” during MD. Given the disappointing results for the small molecules, our research interest turned to developing peptides. Six proteins were screened against a bioactive peptide library. The top 5 peptides scored almost uniformly higher than the small-molecule candidates, which implied potential for development. The Dock score of the lead peptide with MMP13 (207.901) was far higher than that of the control (57.629). Oligopeptides performed poorly during MD, so a long-chain peptide was considered. The saturation mutation results showed which amino acids could be changed. The peptide designed for bcl2 displayed a double binding mechanism, which gave reason to believe it could be developed into a better inhibitor. We wanted to design a peptide targeting the catalytic cleft of cdk6, but this peptide ended up binding at the cyclin binding site. This was a beneficial outcome, since it could directly inhibit the interaction between cdk6 and cyclin. The sequences of the potential peptides are provided (Table 4). Some of the docking function scores and energy terms were used to assess the


stability of complexes. The mutation energy from 26921 to S2 was -18.18 kcal/mol in bcl2, and from S2 to S3 it was -13.65 kcal/mol in p53. MD validation for 300 ns in the different complexes displayed the interactions between the various receptors and ligands during the simulation period. However, all of our potential candidates failed this test: all of the ligands had “flown away” by the end of MD. A representative conformation could be found by cluster analysis of the last 10 ns (Figure 12). The MD results reminded us that candidates may not bind stably even if the docking poses display strong interactions. What, then, was wrong with our candidates? The conformational changes deserve attention, and an effective summary would provide principles for further design. Double site effect. Peptide 26921 was screened out as a candidate to bind Bcl2 in the BH1 domain. An optimized peptide, S2, was found to have a double site effect, also binding the BH3 and BH4 domains (Figure 13a). The two binding modes were named O and T. The RMSD of the complex rose during the first 25 ns because of a conformational turn of T. O went through 3 stages. The N terminal unwound during the first 200 ns; from 200 ns to 230 ns the disordered N terminal searched for a stable structure, for example by cyclizing itself; finally, the N terminal could bind the Bcl2 protein, which improved the interaction (Figure 13d), consistent with the changes in SASA, RMSD and gyrate (Figure 13b-c, cyan). As for T, the main binding region was the N terminal, which differs from O (O only assisted). An “8”


type structure was present for most of the MD time rather than the original “P” type. The N terminal could bind stably, but the cyclic C terminal did not bind well and tended to form a “turn” (Figure 14a). The φ and ψ angles of the node residues validated the “turn”; these two amino acids tended to form β secondary structure (Figure 14b). The cavity of the Bcl2 protein provided an anchor point (Figure 14c). The BH1, BH3 and BH4 domains displayed low RMSF because binding to the ligand made them less flexible. Likewise, O bound through the cyclic C terminal, and these residues had low RMSF values. T bound through the N terminal (Figure 14d). Based on the probability distribution of the RMSD and gyrate values in p53-S3, the Gibbs free energy could be estimated (Figure 15a). The time point of lowest free energy was set as the reference. The C terminal of S3 tended to be disordered, which could influence the structure of p53. The main hydrogen bonds are displayed (Figure 15b). The final structure was similar to the low-energy structure. The MSD of S3 changed significantly at 230 ns because of the dispersal of the C terminal (Figure 15c). From non-competitive to competitive peptide. CDK6 can be inhibited by p16 (CDKN2A). P16 binds at the catalytic cleft, opposite the site where cyclin binds [13]. Two peptides, p16 (yellow) and S4 (pink), are displayed (Figure 16a). S4 was designed from 23678, a peptide screened on the basis of the p16 binding site. However, S4 bound at the cyclin site. The ATP binding site K43 (red), proton acceptor D145 (yellow),


nucleotide binding region IGEGAYGKV (19-27, orange), phosphorylation site T177 (green) and dephosphorylation site Y24 (magenta) were all near the ligand S4 (300 ns). It could therefore block many CDK6 functions, including cyclin binding. However, the ligand could not contact T177 directly, which could be addressed by further modification. The proton acceptor region could also be optimized to obtain a more stable interaction. The S4 binding pathway indicated which amino acids it could reach (Figure 16c). The residue distance matrix is displayed (Figure 16d). Flexible area rebuild. MDM2 and its ligands S5 and S6 were colored by Debye-Waller factor, which shows the flexible regions (red) of the conformation (Figure 17a). Apart from the terminals, residues 20-28 of S6 changed conformation constantly during MD. Like S6, S5 altered its structure, but over a shorter region (20-22). With the two parallel terminals interacting with MDM2, the residues in the middle began to change conformation. The sharp jump of the cyan line at 25 ns corresponds to the loosening of the tight structure (gyrate reflects this intuitively) (Figure 17b-c). The occupancy, maximum distance and minimum distance of the significant hydrogen bonds in the complexes are displayed (Table 5). A high H-bond occupancy indicated binding potential. High-occupancy hydrogen bonds (like the bcl2-O complex H-bonds) reflected a steady interaction mode (Figure 18). In contrast, the acquired H-bonds (like those of MDM2-S6) resulted from conformational change (like the


flexible residue 20-28 mentioned above). Molecular Mechanics/Poisson-Boltzmann Surface Area (MM/PBSA) calculations provided binding free energy information; most of the complexes reached a lower energy after 300 ns of MD (Table 6). The 300 ns MD simulations gave poor results despite favorable docking outcomes, which reminded us of the unreliability of short-time MD. Peptides could be an alternative option because they offer more binding potential. It is hard to explain why our small molecules, which had a higher docking score than the control (E41, IC50 = 3 nM) and might have become nanomolar-level drugs, failed. We put forward, for the first time, de novo peptides in a pathway network against tumors to overcome multidrug resistance. MMP13 has been associated with arthritis, and this is the first time that drugs targeting it have been designed for tumors. The theory of inflammation-induced tumorigenesis has been reported in many studies [44-46]. Inflammation-related proteins could be developed as tumor targets. It is clear that inflammation proteins influence tumor-related proteins in the pathway. Bcl2 inhibitors are classic cancer drugs [47-49], but designing a small-molecule drug for Bcl2 is a challenge because the bcl2 protein acts through protein-protein interaction, much as a finger cannot completely block two palms. The designed peptide could act on the BH1, BH3 and BH4 domains simultaneously and could therefore intervene more effectively. A novel polypeptide was designed based on the endogenous inhibitor p16 to obtain a non-ATP


competitive inhibitor; however, we ended up with a competitive inhibitor. Selecting CDK6 is probably a way to reduce toxicity, which is why we wanted to develop a non-competitive peptide. Our CDK6-inhibiting peptide should be useful, and the next step is to increase its selectivity. Interestingly, MDM2 inhibits the activity of p53, yet the p53 peptide (QETFSDLWKLLP) can block MDM2. Inhibiting MDM2 to release the action of p53 is a great approach, but it should not be the mutated p53 that most researchers emphasize. As for S5 and S6, it was interesting that the flexible area became another binding site, which improved binding ability. Few researchers pin their hopes on flexible regions because they are less controllable. Multidrug resistance (MDR) is a problem that must be considered in the study of cancer drugs. Our foothold was network analysis and de novo peptide design. The use of synergistic drugs reduces both side effects and the occurrence of drug resistance. The peptide segments studied are promising. The findings of this research point to a development direction for MMP13, and in some cases for the related overexpressed proteins, in patients with tumors. We propose two methods to find lead compounds for the tumor target MMP13. First, we provide a novel deep learning approach together with other algorithms for finding the best (optimal) potential drug, and we selected potential inhibitors using computer-aided drug design. However, the 300 ns MD simulations gave poor results. The


candidates “flew away”. We therefore put forward a second approach, peptide design. Peptides were found in few of the tumor cases that we analyzed. Inflammation-cancer alternation represented a potential situation, and some signal pathways indeed displayed an interaction. In this context, it is worth noting that blocking multiple related proteins at the same time meets the clinical therapeutic principle. A double site effect appeared for S2, a peptide designed from a known inhibitor, when complexed with Bcl2. S3, a peptide designed with reference to the endogenous inhibitor p16, competed against cyclin when binding CDK6. The MDM2 inhibitors S5 and S6 were derived from the p53 structure and bound MDM2 stably. The flexible regions of peptides S5 and S6 enhanced binding ability by changing their own conformation, which was unforeseen. Peptides and peptidomimetics have recently attracted attention in the treatment of cancer [32]. Peptide-based therapeutics have many advantages. Peptides readily penetrate deep tissues, have low immunogenicity, and can be synthesized rapidly and efficiently [31]. Compared with large proteins, peptide binding is simpler and highly reproducible. Besides, peptides show greater stability under different storage conditions. Peptides also have some drawbacks, such as rapid renal clearance, poor enzymatic stability, and a secondary structure that is difficult to maintain [31]. In this manuscript, 300 ns MD showed that the selected peptides can bind protein targets with high affinity and specificity. The designed peptides (S2, S3, S5, and S6), which have drug potential, could treat tumors. Also,


we have contacted Shenzhen Institute of Advanced Research, Chinese Academy of Sciences, and are cooperating to test whether these peptides can effectively inhibit MMP13.
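The free-energy estimate mentioned for p53-S3 (Figure 15a) is commonly obtained by Boltzmann inversion of the sampled probability distribution, G = -kB T ln(P/Pmax), so that the most populated bin serves as the zero-energy reference. The sketch below illustrates that idea only. The temperature (300 K), the bin width (0.05 nm) and the function name free_energy_surface are assumptions for illustration; the manuscript does not state the settings actually used.

```python
import math
from collections import Counter

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def free_energy_surface(rmsd, gyrate, bin_width=0.05, temperature=300.0):
    """Estimate a 2D free-energy map from paired RMSD and gyration samples.

    bin_width and temperature are assumed values, not taken from the paper.
    Returns {(rmsd_bin, gyrate_bin): G in kcal/mol}, with G = 0 for the most
    populated bin.
    """
    if len(rmsd) != len(gyrate) or not rmsd:
        raise ValueError("rmsd and gyrate must be non-empty and of equal length")
    # Histogram the trajectory frames on a regular 2D grid.
    counts = Counter(
        (math.floor(r / bin_width), math.floor(g / bin_width))
        for r, g in zip(rmsd, gyrate)
    )
    max_count = max(counts.values())
    # Boltzmann inversion relative to the most populated bin.
    return {
        bin_id: -KB_KCAL * temperature * math.log(count / max_count)
        for bin_id, count in counts.items()
    }


# Made-up samples (nm): three frames in one basin, one frame elsewhere.
example_surface = free_energy_surface(
    [0.11, 0.12, 0.13, 0.31], [1.01, 1.02, 1.03, 1.21]
)
```

Bins that were never visited have no entry in the returned map, which corresponds to an undefined (effectively infinite) free energy at this level of sampling.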

ASSOCIATED CONTENT
Supporting Information
Source code for the algorithms is available in the Supporting Information.

AUTHOR INFORMATION
Corresponding Author
*E-mail: [email protected]
Notes
The authors declare no competing financial interest.


ACKNOWLEDGEMENT
This work was supported by the Guangzhou science and technology fund (Grant No 201803010072), the Science, Technology & Innovation Commission of Shenzhen Municipality (JCYJ20170818165305521), and China Medical University Hospital (DMR-107-110). We also acknowledge the start-up funding from the SYSU “Hundred Talent Program”.


References:
1. Schwitalla, S., Tumor cell plasticity: the challenge to catch a moving target. Journal of gastroenterology 2014, 49 (4), 618-27.
2. Chong, C. R.; Janne, P. A., The quest to overcome resistance to EGFR-targeted therapies in cancer. Nat Med 2013, 19 (11), 1389-400.
3. Calandra, T.; Roger, T., Macrophage migration inhibitory factor: a regulator of innate immunity. Nature reviews. Immunology 2003, 3 (10), 791-800.
4. Zheng, Y.; Li, X.; Qian, X.; Wang, Y.; Lee, J. H.; Xia, Y.; Hawke, D. H.; Zhang, G.; Lyu, J.; Lu, Z., Secreted and O-GlcNAcylated MIF binds to the human EGF receptor and inhibits its activation. Nature cell biology 2015, 17 (10), 1348-55.
5. Mendonsa, A. M.; VanSaun, M. N.; Ustione, A.; Piston, D. W.; Fingleton, B. M.; Gorden, D. L., Host and tumor derived MMP13 regulate extravasation and establishment of colorectal metastases in the liver. Molecular cancer 2015, 14, 49.
6. Dumortier, M.; Ladam, F.; Damour, I.; Vacher, S.; Bieche, I.; Marchand, N.; de Launoit, Y.; Tulasne, D.; Chotteau-Lelievre, A., ETV4 transcription factor and MMP13 metalloprotease are interplaying actors of breast tumorigenesis. Breast cancer research : BCR 2018, 20 (1), 73.
7. Ruan, G.; Xu, J.; Wang, K.; Wu, J.; Zhu, Q.; Ren, J.; Bian, F.; Chang, B.; Bai, X.; Han, W.; Ding, C., Associations between knee structural measures, circulating inflammatory factors and MMP13 in patients with knee osteoarthritis. Osteoarthritis and cartilage 2018, 26 (8), 1063-1069.
8. Hao da, C.; Xiao, P. G., Network pharmacology: a Rosetta Stone for traditional Chinese medicine. Drug development research 2014, 75 (5), 299-312.
9. Wu, Q.; Yang, Z.; Nie, Y.; Shi, Y.; Fan, D., Multi-drug resistance in cancer chemotherapeutics: mechanisms and lab approaches. Cancer letters 2014, 347 (2), 159-66.
10. LaPak, K. M.; Burd, C. E., The molecular balancing act of p16(INK4a) in cancer and aging. Molecular cancer research : MCR 2014, 12 (2), 167-83.
11. Byeon, I. J.; Li, J.; Ericson, K.; Selby, T. L.; Tevelev, A.; Kim, H. J.; O'Maille, P.; Tsai, M. D., Tumor suppressor p16INK4A: determination of solution structure and analyses of its interaction with cyclin-dependent kinase 4. Molecular cell 1998, 1 (3), 421-31.
12. Matsuda, Y.; Ichida, T., p16 and p27 are functionally correlated during the progress of hepatocarcinogenesis. Medical molecular morphology 2006, 39 (4), 169-75.
13. Russo, A. A.; Tong, L.; Lee, J. O.; Jeffrey, P. D.; Pavletich, N. P., Structural basis for inhibition of the cyclin-dependent kinase Cdk6 by the tumour suppressor p16INK4a. Nature 1998, 395 (6699), 237-43.
14. Sherr, C. J.; Beach, D.; Shapiro, G. I., Targeting CDK4 and CDK6: From Discovery to Therapy. Cancer discovery 2016, 6 (4), 353-67.
15. Cho, Y.; Gorina, S.; Jeffrey, P. D.; Pavletich, N. P., Crystal structure of a p53 tumor suppressor-DNA complex: understanding tumorigenic mutations. Science 1994, 265 (5170), 346-55.
16. Conover, C. A., The IGF-p53 connection in cancer. Growth Horm IGF Res 2018, 39, 25-28.
17. Pappas, K.; Xu, J.; Zairis, S.; Resnick-Silverman, L.; Abate, F.; Steinbach, N.; Ozturk, S.; Saal, L. H.; Su, T.; Cheung, P.; Schmidt, H.; Aaronson, S.; Hibshoosh, H.; Manfredi, J.; Rabadan, R.; Parsons, R., p53 Maintains Baseline Expression of Multiple Tumor Suppressor Genes. Molecular cancer research : MCR 2017, 15 (8), 1051-1062.
18. Gurpinar, E.; Vousden, K. H., Hitting cancers' weak spots: vulnerabilities imposed by p53 mutation. Trends in cell biology 2015, 25 (8), 486-95.
19. Yeo, K. J.; Jee, J. G.; Hwang, E.; Kim, E. H.; Jeon, Y. H.; Cheong, H. K., Interaction between human angiogenin and the p53 TAD2 domain and its implication for inhibitor discovery. FEBS letters 2017, 591 (23), 3916-3925.
20. Raj, N.; Attardi, L. D., The Transactivation Domains of the p53 Protein. Cold Spring Harbor perspectives in medicine 2017, 7 (1).
21. Dey, J.; Deckwerth, T. L.; Kerwin, W. S.; Casalini, J. R.; Merrell, A. J.; Grenley, M. O.; Burns, C.; Ditzler, S. H.; Dixon, C. P.; Beirne, E.; Gillespie, K. C.; Kleinman, E. F.; Klinghoffer, R. A., Voruciclib, a clinical stage oral CDK9 inhibitor, represses MCL-1 and sensitizes high-risk Diffuse Large B-cell Lymphoma to BCL2 inhibition. Scientific reports 2017, 7 (1), 18007.
22. Klein, S.; Abraham, M.; Bulvik, B.; Dery, E.; Weiss, I. D.; Barashi, N.; Abramovitch, R.; Wald, H.; Harel, Y.; Olam, D.; Weiss, L.; Beider, K.; Eizenberg, O.; Wald, O.; Galun, E.; Pereg, Y.; Peled, A., CXCR4 Promotes Neuroblastoma Growth and Therapeutic Resistance through miR-15a/16-1-Mediated ERK and BCL2/Cyclin D1 Pathways. Cancer research 2018, 78 (6), 1471-1483.
23. Kremer, K. N.; Peterson, K. L.; Schneider, P. A.; Meng, X. W.; Dai, H.; Hess, A. D.; Smith, B. D.; Rodriguez-Ramirez, C.; Karp, J. E.; Kaufmann, S. H.; Hedin, K. E., CXCR4 chemokine receptor signaling induces apoptosis in acute myeloid leukemia cells via regulation of the Bcl-2 family members Bcl-XL, Noxa, and Bak. The Journal of biological chemistry 2013, 288 (32), 22899-914.
24. LeCun, Y.; Bengio, Y.; Hinton, G., Deep learning. Nature 2015, 521, 436.
25. Ma, J.; Sheridan, R. P.; Liaw, A.; Dahl, G. E.; Svetnik, V., Deep Neural Nets as a Method for Quantitative Structure–Activity Relationships. Journal of Chemical Information and Modeling 2015, 55 (2), 263-274.
26. Van Echelpoel, W.; Goethals, P. L. M., Variable importance for sustaining macrophyte presence via random forests: data imputation and model settings. Scientific Reports 2018, 8 (1), 14557.
27. Huang, Q.; Chen, Y.; Liu, L.; Tao, D.; Li, X., On Combining Biclustering Mining and AdaBoost for Breast Tumor Classification. IEEE Transactions on Knowledge and Data Engineering 2019, 1-1.
28. Zhang, C.; Zhang, Y.; Shi, X.; Almpanidis, G.; Fan, G.; Shen, X., On Incremental Learning for Gradient Boosting Decision Trees. Neural Processing Letters 2019.
29. Chen, Y.-C., Beware of docking! 2014; Vol. 36.
30. Hess, B.; Kutzner, C.; van der Spoel, D.; Lindahl, E., GROMACS 4: algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation. 2008; Vol. 4, p 435-447.
31. Soudy, R.; Byeon, N.; Raghuwanshi, Y.; Ahmed, S.; Lavasanifar, A.; Kaur, K., Engineered Peptides for Applications in Cancer-Targeted Drug Delivery and Tumor Detection. Mini Rev Med Chem 2017, 17 (18), 1696-1712.
32. Ladner, R.; Sato, A.; Gorzelany, J.; de Souza, M., Phage display-derived peptides as therapeutic alternatives to antibodies. 2004; Vol. 9, p 525-9.
33. Szklarczyk, D.; Santos, A.; von Mering, C.; Jensen, L. J.; Bork, P.; Kuhn, M., STITCH 5: augmenting protein–chemical interaction networks with tissue and affinity data. Nucleic Acids Research 2015, 44 (D1), D380-D384.
34. Kanehisa, M., The KEGG database. Novartis Foundation symposium 2002, 247, 91-101; discussion 101-3, 119-28, 244-52.
35. Burley, S. K.; Berman, H. M.; Christie, C.; Duarte, J. M.; Feng, Z.; Westbrook, J.; Young, J.; Zardecki, C., RCSB Protein Data Bank: Sustaining a living digital data resource that enables breakthroughs in scientific research and biomedical education. Protein science : a publication of the Protein Society 2018, 27 (1), 316-330.
36. Tommasi, R. A.; Weiler, S.; McQuire, L. W.; Rogel, O.; Chambers, M.; Clark, K.; Doughty, J.; Fang, J.; Ganu, V.; Grob, J.; Goldberg, R.; Goldstein, R.; Lavoie, S.; Kulathila, R.; Macchia, W.; Melton, R.; Springer, C.; Walker, M.; Zhang, J.; Zhu, L.; Shultz, M., Potent and selective 2-naphthylsulfonamide substituted hydroxamic acid inhibitors of matrix metalloproteinase-13. Bioorganic & medicinal chemistry letters 2011, 21 (21), 6440-5.
37. Baud, M. G. J.; Bauer, M. R.; Verduci, L.; Dingler, F. A.; Patel, K. J.; Horil Roy, D.; Joerger, A. C.; Fersht, A. R., Aminobenzothiazole derivatives stabilize the thermolabile p53 cancer mutant Y220C and show anticancer activity in p53-Y220C cell lines. European journal of medicinal chemistry 2018, 152, 101-114.
38. Ku, B.; Liang, C.; Jung, J. U.; Oh, B. H., Evidence that inhibition of BAX activation by BCL-2 involves its tight and preferential interaction with the BH3 domain of BAX. Cell research 2011, 21 (4), 627-41.
39. Venkatachalam, C. M.; Jiang, X.; Oldfield, T.; Waldman, M., LigandFit: a novel method for the shape-directed rapid docking of ligands to protein active sites. Journal of molecular graphics & modelling 2003, 21 (4), 289-307.
40. Fuerst, R.; Yong Choi, J.; Knapinska, A. M.; Smith, L.; Cameron, M. D.; Ruiz, C.; Fields, G. B.; Roush, W. R., Development of matrix metalloproteinase-13 inhibitors – A structure-activity/structure-property relationship study. Bioorganic & Medicinal Chemistry 2018, 26 (18), 4984-4995.
41. Rahman, R.; Dhruba, S. R.; Ghosh, S.; Pal, R., Functional random forest with applications in dose-response predictions. Scientific Reports 2019, 9 (1), 1628.
42. Singh, S.; Chaudhary, K.; Dhanda, S. K.; Bhalla, S.; Usmani, S. S.; Gautam, A.; Tuknait, A.; Agrawal, P.; Mathur, D.; Raghava, G. P., SATPdb: a database of structurally annotated therapeutic peptides. Nucleic acids research 2016, 44 (D1), D1119-26.
43. Kussie, P. H.; Gorina, S.; Marechal, V.; Elenbaas, B.; Moreau, J.; Levine, A. J.; Pavletich, N. P., Structure of the MDM2 oncoprotein bound to the p53 tumor suppressor transactivation domain. Science 1996, 274 (5289), 948-53.
44. Elinav, E.; Nowarski, R.; Thaiss, C. A.; Hu, B.; Jin, C.; Flavell, R. A., Inflammation-induced cancer: crosstalk between tumours, immune cells and microorganisms. Nature reviews. Cancer 2013, 13 (11), 759-71.
45. Hu, B.; Elinav, E.; Huber, S.; Booth, C. J.; Strowig, T.; Jin, C.; Eisenbarth, S. C.; Flavell, R. A., Inflammation-induced tumorigenesis in the colon is regulated by caspase-1 and NLRC4. Proceedings of the National Academy of Sciences of the United States of America 2010, 107 (50), 21635-40.
46. Zhang, X.; Ai, F.; Li, X.; She, X.; Li, N.; Tang, A.; Qin, Z.; Ye, Q.; Tian, L.; Li, G.; Shen, S.; Ma, J., Inflammation-induced S100A8 activates Id3 and promotes colorectal tumorigenesis. International journal of cancer 2015, 137 (12), 2803-14.
47. Iyer, D.; Vartak, S. V.; Mishra, A.; Goldsmith, G.; Kumar, S.; Srivastava, M.; Hegde, M.; Gopalakrishnan, V.; Glenn, M.; Velusamy, M.; Choudhary, B.; Kalakonda, N.; Karki, S. S.; Surolia, A.; Raghavan, S. C., Identification of a novel BCL2-specific inhibitor that binds predominantly to the BH1 domain. The FEBS journal 2016, 283 (18), 3408-37.
48. Kawashima-Goto, S.; Imamura, T.; Tomoyasu, C.; Yano, M.; Yoshida, H.; Fujiki, A.; Tamura, S.; Osone, S.; Ishida, H.; Morimoto, A.; Kuroda, H.; Hosoi, H., BCL2 Inhibitor (ABT-737): A Restorer of Prednisolone Sensitivity in Early T-Cell Precursor-Acute Lymphoblastic Leukemia with High MEF2C Expression? PloS one 2015, 10 (7), e0132926.
49. Vartak, S. V.; Hegde, M.; Iyer, D.; Gaikwad, S.; Gopalakrishnan, V.; Srivastava, M.; Karki, S. S.; Choudhary, B.; Ray, P.; Santhoshkumar, T. R.; Raghavan, S. C., A novel inhibitor of BCL2, Disarib abrogates tumor growth while sparing platelets, by activating intrinsic pathway of apoptosis. Biochemical pharmacology 2016, 122, 10-22.


Table 1. Dock score and RF, ABR, GBR and DL predicted activities of the top 10 TCM database candidates towards MMP13.
Name Nazlinin

Dock score 135.895

–PMF 63.240

–PLP1 49.490

–PLP2 52.420

Predicted activity*

H-bond quantity

RF*

4

4.430916 67 4.506833

ABR*

GBR*

DL*

4.056

4.304040

4.110227

3.83

09 3.876606

3.204858

8

5

Subaphylline

110.056

50.590

28.800

19.770

9

33

Ochrolifuanine A

108.029

55.470

74.170

75.910

4

N-Methyl tyramine-O-α-L-rhamnopyranoside

107.7070

57.920

52.870

45.740

2

Usambarine

105.303

66.820

77.850

77.610

2

Tubulosine

103.955

39.680

67.200

69.120

3

Emetine

103.495

36.960

55.640

60.170

7

5.018305 4.735444 56 4.977472 44 4.982027 22 4.9735 78

5.058230 4.125 77 5.214 5.09125 5.250062

5.080192 4.294524 55 5.250005 61 4.996268 44 5.278868

5.560901 4.813676 6 5.268369 4 8.218174 9.047226

Alangimarckine

102.011

36.570

66.140

66.370

4

4.979055

5 5.1322

36 5.069089

6.467669

Vitamin B1

98.292

57.010

65.330

52.850

2

Hydrachine A E41*

89.277

63.290

50.410

46.090

1

42.241

42.690

61.880

58.330

2

4.345916 56 4.910305 67 3.955305 56

3.725333 5.279666 33 4.866 67

4.251917 4 5.215940 6 5.104626 21

5.065189 3.978405 5.135390

26

3

*E41:control

56

RF: Random Forest; ABR: AdaBoost Regressor; GBR: Gradient Boosting; DL: Deep Learning
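The DL column above comes from the small fully connected network described in the text: two hidden ReLU layers with dropout rates 0.4 and 0.6, followed by a single linear output unit. The following is a minimal forward-pass sketch of that architecture in plain Python, not the authors' implementation. The layer widths (7 inputs, 16 hidden units), the random weights and all function names are hypothetical, and training with the Adam optimizer is omitted.

```python
import random


def relu(values):
    """Rectified Linear Unit applied element-wise."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def dropout(values, rate, rng, training):
    """Inverted dropout: zero each feature with probability `rate` during
    training and rescale the survivors; do nothing at inference time."""
    if not training:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (placeholder initialisation)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, params, rng, training=False):
    """Input -> hidden (ReLU, dropout 0.4) -> hidden (ReLU, dropout 0.6)
    -> one linear output unit giving the predicted activity."""
    (w1, b1), (w2, b2), (w3, b3) = params
    h1 = dropout(relu(dense(x, w1, b1)), 0.4, rng, training)
    h2 = dropout(relu(dense(h1, w2, b2)), 0.6, rng, training)
    return dense(h2, w3, b3)[0]


rng = random.Random(0)
# Hypothetical sizes: 7 input features, two hidden layers of 16 units.
params = [init_layer(7, 16, rng), init_layer(16, 16, rng), init_layer(16, 1, rng)]
prediction = forward([0.1] * 7, params, rng, training=False)
```

With training=False the dropout layers are inactive, so repeated calls on the same input return the same value; with training=True each call masks a different random subset of hidden features.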


Table 2. The intersection TCM candidates of MMP13, CDKN2A, TP53 and BCL2 docking result.
Code | TCM name | PDB name | Dock score | H-bond forming residues | Pi-Pi interaction | H-bond quantity
a | Nazlinin | MMP13 | 135.895 | Asp231, Ala186, Glu223 | His226, Asp231, His232, Pro242 | 4
b | Nazlinin | TP53 | 89.796 | Asp208 | Met160, Ser99, Pro98 | 3
c | Subaphylline | MMP13 | 110.056 | Ala186, His222, Glu223, His226, Asp231, His232, Pro242 | His232 | 9
d | Subaphylline | TP53 | 80.638 | Ser99, Arg158, Asp208 | Arg267 | 5
e | Emetine | MMP13 | 103.495 | Ala186, His222, Glu223, His226, Pro242 | His222 | 7
f | Emetine | CDKN2A | 72.432 | Asp14, Pro41 | Asp14 | 2
g | Hernovine | MMP13 | 99.644 | Ala188, His226, Asp231 | His226, His232 | 3
h | Hernovine | TP53 | 48.776 | Asp208, Thr256 | -- | 2
i | Adrenaline | CDKN2A | 77.896 | Asp14, Pro41, Asn42 | Ala17 | 6
j | Adrenaline | TP53 | 67.328 | Asp208, Glu258 | -- | 3
k | 1,7-Bis(4-hydroxyphenyl)-1,4,6-heptatrien-3-one | CDKN2A | 76.144 | Asp14, Pro41 | Ser43, Tyr44, Ala17 | 2
l | 1,7-Bis(4-hydroxyphenyl)-1,4,6-heptatrien-3-one | TP53 | 45.838 | Arg158, Asp208, Thr256 | Arg158, Met160 | 3
m | Diiodotyrosine | CDKN2A | 73.831 | Asp14, Pro41, Ser43 | Asp14 | 4
n | Diiodotyrosine | TP53 | 41.146 | Asp208 | Pro98 | 1
o | Mescaline | CDKN2A | 70.371 | Asp14, Pro41, Asn42 | Asp14 | 4
p | Mescaline | TP53 | 56.574 | Ser99, Asp208, Thr256 | Met160 | 5

Table 3. Vote score of top ten candidates.
Name | pKi RF | pKi ABR | pKi GBR | pKi DL | Dockscore | Total-score | Multi-target
Nazlinin | 1 | 0 | 0 | 0 | 5 | 6 | 1
Subaphylline | 1 | 0 | 0 | 0 | 5 | 6 | 1
Ochrolifuanine A | 1 | 1 | 0 | 1 | 5 | 8 | 0
N-Methyl tyramine-O-α-L-rhamnopyranoside | 1 | 0 | 0 | 0 | 5 | 6 | 0
Usambarine | 1 | 1 | 1 | 1 | 5 | 9 | 0
Tubulosine | 1 | 1 | 0 | 1 | 0 | 3 | 0
Emetine | 1 | 1 | 1 | 1 | 0 | 4 | 1
Alangimarckine | 1 | 1 | 0 | 1 | 0 | 3 | 0
Vitamin B1 | 1 | 0 | 0 | 0 | 0 | 1 | 0
Hydrachine A | 1 | 1 | 1 | 0 | 0 | 3 | 0
E41* | 0 | 0 | 0 | 0 | 0 | 0 | 0
*E41: control
Vote score: For the activity values predicted by each algorithm, candidates with a value larger than the control received 1 point and the others 0 points. Candidates in the top 50% by Dock score received 5 points (Dock score is critical) and the others 0 points.
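The voting rule in the footnote can be checked with a few lines of code. The sketch below takes the per-algorithm votes from Table 3 and the Dock scores from Table 1, then recomputes the total scores. Two details are assumptions rather than statements from the manuscript: "top 50%" is read as the best floor(n/2) of the 11 entries with the control included, and the function names are hypothetical.

```python
names = [
    "Nazlinin", "Subaphylline", "Ochrolifuanine A",
    "N-Methyl tyramine-O-alpha-L-rhamnopyranoside", "Usambarine",
    "Tubulosine", "Emetine", "Alangimarckine", "Vitamin B1",
    "Hydrachine A", "E41 (control)",
]

# Dock scores from Table 1, in the same order as `names`.
dock_scores = [135.895, 110.056, 108.029, 107.707, 105.303,
               103.955, 103.495, 102.011, 98.292, 89.277, 42.241]

# Per-algorithm votes from Table 3 (1 = predicted activity above the control).
activity_flags = {
    "RF":  [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    "ABR": [0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0],
    "GBR": [0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0],
    "DL":  [0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0],
}


def activity_votes(predicted, control_value):
    """1 point when a predicted activity exceeds the control, else 0."""
    return [1 if value > control_value else 0 for value in predicted]


def dock_votes(scores, points=5):
    """`points` for entries in the top half by Dock score, else 0.
    Assumption: the top half is the best floor(n/2) entries; ties at the
    cutoff would all receive points."""
    n_top = len(scores) // 2
    cutoff = sorted(scores, reverse=True)[n_top - 1]
    return [points if score >= cutoff else 0 for score in scores]


dock_points = dock_votes(dock_scores)
total_scores = [
    sum(flags[i] for flags in activity_flags.values()) + dock_points[i]
    for i in range(len(names))
]
```

Under this reading the recomputed totals match the Total-score column of Table 3, with Usambarine ranked first on votes alone.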


Table 4. Designed peptides sequences. Sequences

Dock

*S1 /KKKK E41(control) /GGGG /RGGS 9GN(control) 26921/FKFGSFIKRMWRSKLAKKLRAK GKELLRDYANRVLSPEEEAAAPAPVPA *S2 (O and T) /THQGQHHCCKHLIKCWKLLRIWGIEL LRDYANRVLSPEEEAAANCDCYK 23678/THQGQHHCCKHLIKCWKLLRIW GIELLRDYANRVLSPEEEAAANCDCYK *S3/ESEFDPQEYYECKRQCMQLETSGQ YRRCHSQCLKRFEEDWPWSKYDCEE P16 (PDB code:1BI7 –B) 23678/ESEFDRQEYEECKRQCMQLETSG QMRRCVSQCDKRFEEDIDWSKYDNQD

207.901 57.629 87.021 107.402 50.305 1205.99

Zdock Zrank

Rdock

39.78

-93.019

14.442

42.75 42.37

-94.232 -73.495

-12.961 -2.6243

39.19

-96.464

-27.668

51.13

-93.09

-19.994

71.09 49.81

-134.42 -115.69

-23.466 -28.180


PDF Total Energy

1600.86

PDF Physical Energy

109.953

DOPE Score

-25221

Targets

Remark

MMP13 MMP13 CDKN2A P53 mut P53 mut Bcl2

Success

Bcl2

Fail Fail Screening Two sites synergies,

P53 mut 3199.54

84.7084

-36991

P53 mut Cdk6 Cdk6

pI =8.05

Screening based on p16


*S4/ESEFDPQEYYECKRQCMQLETSGQ YRRCHSQCLKRFEEDWPWSKYDCEE

51.13

-93.09

-19.994

*S5(16322)/CLAGRLDKQCTCRRSQPSR RSGHEVGRPSPHCGPSRQCGCHMD *S6/CWDHWLRKQHICRMWQYYLRFG HEVGRPSPHCGPSRQCGCHMD

37.6

-64.907

-8.8279

35.64

-80.812

-11.218

*Potential peptides PDF: probability density function; DOPE: Discrete Optimized Protein Energy;


3199.54

84.7084

-36991

Cdk6

MDM2 2008.51

107.514

-15588

MDM2

Cyclin binding site Based on p16 Based on S5


Table 5. Pivotal hydrogen bonds distances data.
Complexes, top to bottom: MMP13-S1, Bcl2-O, Bcl2-T, p53-S3, CDK6-S4, MDM2-S5, MDM2-S6
Hbond | Occupancy | Max (nm) | Min (nm)
lig/H20:O/A186 | 46.03% | 1.857 | 0.154
lig/H20:OE1/E223 | 74.03% | 2.277 | 0.143
lig/H20:OE2/E223 | 78.13% | 2.352 | 0.147
lig/H64:OD1/D179 | 68.80% | 2.325 | 0.143
R109/HE:OD1/D46(O) | 97.70% | 1.654 | 0.151
R109/HE:OD2/D46(O) | 97.90% | 1.576 | 0.154
R109/1HH1:OD1/D46(O) | 95.13% | 1.574 | 0.148
R109/1HH1:OD2/D46(O) | 94.67% | 1.585 | 0.149
D46/HN(O):NE2/Q25 | 51.47% | 1.128 | 0.217
R129/HE:ND1/H7(lig) | 47.20% | 0.978 | 0.166
R129/1HH1:ND1/H7(lig) | 42.07% | 1.134 | 0.181
K10/HZ1(T):OE2/E114 | 53.37% | 0.981 | 0.146
H11/HE2(T):O/A149 | 42.63% | 0.998 | 0.187
F113/HN:O:Q4(lig) | 81.73% | 0.608 | 0.167
S227/HG1:OH/Y48(lig) | 60.80% | 1.430 | 0.166
D228/HN:OE1/Q11(lig) | 73.30% | 0.588 | 0.163
D228/HN:NE2/Q11(lig) | 64.43% | 0.690 | 0.192
Q4/1HE2(lig):O/L111 | 56.93% | 0.836 | 0.155
R144/1HH1:OE1/Q18(lig) | 64.60% | 1.417 | 0.150
I169/HN:OG1/T21(lig) | 42.37% | 2.140 | 0.173
K13/HZ1(lig):O/A23 | 44.73% | 0.836 | 0.154
R14/HE(lig):NH2/R144 | 45.10% | 1.443 | 0.228
Y25/HH(lig):O/G22 | 75.00% | 1.229 | 0.152
K51/HZ1:OE1/E24(lig) | 43.25% | 2.584 | 0.143
K51/HZ1:OE2/E24(lig) | 37.70% | 2.551 | 0.144
R97/1HH1:OE1/E24(lig) | 40.43% | 4.145 | 0.151
Y100/HH:O/P28(lig) | 56.45% | 3.162 | 0.149


Table 6. Molecular Mechanics/Poisson-Boltzmann Surface Area (MM/PBSA) calculation for binding free energy.

| Complex | Binding energy, 0 ns (kcal/mol) | Complex energy, 0 ns (kcal/mol) | Binding energy, 300 ns (kcal/mol) | Complex energy, 300 ns (kcal/mol) | Mutation energy |
|---|---|---|---|---|---|
| MMP13-S1 | -13.8371 | -7213.272 | 12.6730 | -7097.7526 | |
| Bcl2-O | 5.2504 | -8389.766 | -51.8586 | -8464.5357 | -18.18 |
| Bcl2-T | 38.8589 | -8344.393 | -22.0154 | -8398.3407 | |
| P53-S3 | 15.3423 | -10243.48 | -10.1375 | -10098.453 | -13.65 |
| CDK6-S4 | 5.5667 | -13634.24 | -18.4367 | -14119.587 | 0.42 |
| MDM2-S5 | 13.0436 | -5077.350 | -8.9623 | -5143.5054 | |
| MDM2-S6 | 11.3177 | -5159.072 | -47.8072 | -5209.4696 | -2.39 |


Table 27. Top 50 TCM candidates for MMP13 and TP53.

| MMP13: name | Dockscore | TP53: name | Dockscore |
|---|---|---|---|
| Budmunchiamine L4 | 153.969 | Glyasperin B | 111.883 |
| isofebrifugine | 139.379 | L-Valine-L-valine anhydride | 109.544 |
| Nazlinin | 135.895 | Nazlinin | 108.487 |
| Tetrahydrodeoxyoxolucidine A | 134.046 | Assamicadine | 99.263 |
| Nudicaulin | 133.82 | Eupachinilide J | 97.207 |
| Subaphyllin | 133.155 | Saussureamine C | 97.207 |
| Febrifugine | 131.852 | Lindechunine B | 96.513 |
| Anhydrocannabisativine | 130.993 | 11-Hydroxycephalotaxine | 96.387 |
| Carpaine | 125.472 | Gomisin D | 95.16 |
| Carpaine | 125.472 | 1-(1,5-Dimethyl-4-hexenyl)-4-methyl benzene | 94.627 |
| L-Valine-L-valine anhydride | 120.07 | N-Norarmepavine | 94.627 |
| L-Valine-L-valine anhydride | 118.651 | 2,6-Decamethylene pyridine | 94.078 |
| Phytosphingosine | 117.696 | Launobine | 94.078 |
| Trypargine | 115.203 | Subaphylline | 93.977 |
| N-Methyl tyramine-O-α-L-rhamnopyranoside | 114.945 | Arnidiol 3-O-palmitate | 93.541 |
| Tetrahydrodeoxylucidine B | 114.056 | Cularimine | 93.541 |
| Trypargine | 113.485 | Cassyfiline | 92.661 |
| Acanthoine | 113.403 | Hernangerine | 92.541 |
| Acanthoine | 113.403 | S-(2-Carboxyethyl)-L-cysteine | 91.438 |
| Tetrahydrodeoxyoxolucidine B | 112.488 | Funtumine | 91.438 |
| Tetrahydrodeoxylucidine A | 112.486 | (−)-Nordicentrine | 91.328 |
| beta-Dichroine | 111.848 | Norannuradhapurine | 91.185 |
| N-Methyl tyramine-O-alpha-L-rhamnopyranoside | 111.106 | FB1 | 91.103 |
| Subaphylline | 110.056 | (+)-Guaiacin | 91.049 |
| β-Dichroine | 109.045 | Xylopine | 91.049 |
| gamma-Dichroine | 109.044 | 7,4'-Dihydroxyflavan | 91.042 |
| Acanthoidine | 108.365 | β-Acoradiene | 90.765 |
| Acanthoidine | 108.365 | Boldine | 90.681 |
| Ochrolifuanine A | 108.029 | Casearlucin A | 90.279 |
| Norerythrostachaldine | 107.999 | Annulide | 90.19 |
| Febrifugine | 107.589 | Cheliensisamine | 90.19 |
| Ochrolifuanine A | 107.09 | Adrenaline | 89.704 |
| Dihydrooxolucidine A | 105.639 | 5,15-Dimethylmorindol | 89.671 |
| Usambarine | 105.354 | 6-O-Acetylstritosamide | 89.297 |
| Tubulosine | 105.29 | Diiodotyrosine | 89.284 |
| Tubulosine | 104.825 | (+)-Nordicentrine | 89.213 |
| 3_4_5-trimethoxy_benzeneethanamine | 104 | Cinchonicine | 89.092 |
| Acanthoidine | 103.633 | 3,4,5-trimethoxy benzeneethanamine | 89.006 |
| Acanthoidine | 103.633 | Hernovine | 88.769 |
| Emetine | 103.495 | 3α-Acetoxydiversifolol | 88.602 |
| Norerythrostachaldine | 102.5 | Actinodaphnine | 88.602 |
| Cephaeline | 102.392 | Anticancer Flavonoid PMV70P691-95 | 88.534 |
| Alangimarckine | 102.338 | norboldine | 88.302 |
| Cephaeline | 101.603 | 1,7-Bis(4-hydroxyphenyl)-1,4,6-heptatrien-3-one | 88.268 |
| Alangimarckine | 101.055 | Benzoylpaeoniflorin | 88.22 |
| Conessimine | 100.448 | Norcinnamolaurine | 88.127 |
| Broussonetine V | 100.294 | 4-Epi-larreatricin | 88.035 |
| Hernovine | 99.644 | Deformylflustrabromine B | 88.019 |
| (−)-Cassine | 99.596 | Norhyoscyamine | 87.762 |
| Merresectine A | 99.473 | Mescaline | 87.675 |


Table 38. Top 50 TCM candidates for CDKN2A and BCL2.

| CDKN2A: name | Dockscore | BCL2: name | Dockscore |
|---|---|---|---|
| dopamine | 82.325 | trans-Phenylitaconic acid | 102.302 |
| 25-Anhydroalisol A 24-acetate | 81.977 | 5-(Hydroxymethyl)-furan-2-carboxylic acid | 80.881 |
| D-Norpseudoephedrine | 81.977 | Arillanin C | 73.776 |
| Anhydroalkannin | 81.634 | 4(18),13-Clerodadien-3-oxo-15-oic acid methyl ester | 73.418 |
| Chinese bittersweet alkaloid I | 81.561 | trans-2-Hexenoic acid | 73.418 |
| Noradrenaline | 80.595 | Diterpenoid EF-D | 72.909 |
| (S)-cathinone | 80.028 | Oxalic acid | 72.909 |
| Dopamine | 79.664 | 3-Butenoic acid | 72.824 |
| Broceaketolic acid | 79.661 | 5-Carboxy-7-hydroxy-2-methyl-benzopyran-γ-one | 72.356 |
| Ethyl O-β-D-oleandropyranosyl-(1→4)-O-3-O-methyl-6-deoxy-β-D-allopyranoside | 79.24 | 2-Furancarboxylic acid | 72.356 |
| Salicylamine | 79.24 | trans-Phenylitaconic acid | 71.798 |
| Norephedrine | 79.14 | Aristolochic acid II methyl ester | 69.671 |
| Bufotalin | 78.855 | crotonic acid | 69.671 |
| ephedrine hydrochloridum | 78.855 | Crotonic acid | 69.671 |
| octopamine | 78.447 | Crocusatin B | 68.655 |
| Embelin | 78.108 | Imidazolylpropionic acid | 68.655 |
| Phenethylamine | 78.108 | succinic acid methyl ester | 68.576 |
| 3-O-β-D-glucopyranosyl-(1→2)-β-D-quinovopyranosyl quinovic acid | 77.963 | heptenoic acid | 68.415 |
| Tyramine | 77.963 | α-Aminoadipic acid | 67.982 |
| Tyramine | 77.963 | Butanoic acid | 67.982 |
| Adrenaline | 77.896 | Butyric acid | 67.982 |
| 4,4-Dimethyl-1,7-heptanedioic acid | 77.645 | hexanoic acid | 67.813 |
| noradrenaline | 77.645 | Caproic acid | 67.813 |
| Evocarpine | 77.423 | 5β-Androstan-3α,17β-diol | 67.584 |
| Serotonin | 77.423 | Clausenidin | 67.3 |
| Coniferyl diangelate | 76.589 | 2-Heptenic acid | 67.3 |
| 4-Hydroxybenzylamine | 76.589 | 3-methyl-butanoic acid | 66.439 |
| D-Cathinone | 76.466 | Ecliptasaponin A | 65.662 |
| 1,7-Bis(4-hydroxyphenyl)-1,4,6-heptatrien-3-one | 76.144 | Pentanic acid | 65.662 |
| 9-Acetoxyfukinanolide | 75.84 | (2S)-(O-Hydroxyphenyl)lactate | 64.75 |
| 13β,17β-Epoxyalisol A 24-acetate | 75.801 | m-Hydroxyphenylpyruvic acid | 64.328 |
| Propylamine | 75.801 | 3-O-(E)-Coumaroylerythrodiol | 64.327 |
| Synephrine | 75.446 | Ginsenoside Rf | 63.127 |
| (2S,6ζ)-3,7-Dimethyloct-3(10)-ene-1,2,6,7-tetrol 1-O-β-D-glucopyranoside | 75.161 | Tiglic acid | 63.127 |
| (1S,2S)-norpseudoephedrine | 74.665 | cis-coumaric acid | 63.007 |
| Cucurbitoside A | 74.123 | Ascosonchine | 62.667 |
| Isoamylamine | 74.123 | 4'-O-β-D-Glucosyl-9-O-(6''-deoxysaccharosyl)olivil | 62.595 |
| (-)-synephrine | 74.074 | (2R)-Sodium 3-phenyllactate | 62.454 |
| Clerosterol | 74.046 | Melilotic acid | 62.336 |
| Hexyl amine-1 | 74.046 | Dihydrooroxylin A | 61.565 |
| Diiodotyrosine | 73.831 | Methyl glutarate | 61.565 |
| 7,7-Dimethyl-2-methylenebicyclo[3.1.1]heptan-6-ol acetate | 73.718 | 1,4-Dimethyl-cis-cyclohexane | 61.275 |
| (2S)-2-O-β-D-Glucopyranosyl-2-hydroxyphenylacetic acid | 73.551 | 2,4-Nonadienic acid | 61.275 |
| Tryptophan | 73.551 | ent-15α,18-Dihydroxy-16-kaurene | 60.553 |
| Phenylalanine | 72.865 | 2-Minaline | 60.553 |
| Emetine | 72.432 | 6'-O-Acetylloganic acid | 60.289 |
| l-tyrosine | 72.114 | Angelic acid | 60.289 |
| D-Pseudoephedrine | 71.543 | Gallic acid | 59.983 |
| (1R,2S)-norephedrine | 70.671 | Urocanic acid | 59.753 |
| Mescaline | 70.371 | Ferulic acid | 59.079 |


Table 4. Dock score, RF, ABR, GBR and DL predicted activities for the top 10 TCM database candidates towards MMP13.

Predicted activity*: RF*, ABR*, GBR*, DL*

4

4.43091

4.056

4.30404

4.11022

667 4.50683 333

3.83

009 3.87660 68

7 3.20485 85

5.01830 4.73544 556 4.97747 444 4.98202 222 4.9735 778

5.05823 4.125 077 5.214 5.09125 5.25006 25 5.1322 3.72533 5.27966 333 4.866 667

5.08019 4.29452 255 5.25000 461 4.99626 544 5.27886 8 836 5.06908 4.25191 94 5.21594 76 5.10462 021 626

5.56090 4.81367 16 5.26836 64 8.21817 9 9.04722 4 6 6.46766 5.06518 9 3.97840 9 5.13539 5 03

Name

Dock score

–PMF

–PLP1

–PLP2

H-bond quantity

Nazlinin

135.895

63.240

49.490

52.420

Subaphylline Ochrolifuanine A N-Methyl tyramine-O-α-L-rhamnopyranoside Usambarine Tubulosine

110.056 108.029 107.7070 105.303 103.955

50.590

28.800

19.770

9

55.470 57.920 66.820 39.680

74.170 52.870 77.850 67.200

75.910 45.740 77.610 69.120

4 2 2 3

Emetine

103.495

36.960

55.640

60.170

7

Alangimarckine Vitamin B1 Hydrachine A E41*

102.011 98.292 89.277 42.241

36.570 57.010 63.290 42.690

66.140 65.330 50.410 61.880

66.370 52.850 46.090 58.330

4 2 1 2

*E41: control; RF: Random Forest; ABR: AdaBoost Regressor; GBR: Gradient Boosting Regressor; DL: Deep Learning.


4.97905 4.34591 556 4.91030 667 3.95530 556 556


Table 5. Vote score of top ten candidates.

Name: Nazlinin, Subaphylline, Ochrolifuanine A, N-Methyl tyramine-O-α-L-rhamnopyranoside, Usambarine, Tubulosine, Emetine, Alangimarckine, Vitamin B1, Hydrachine A, E41*

Vote score

pKi RF

ABR

GBR

DL

Dockscore

1 1 1

0 0 1

0 0 0

0 0 1

5 5 5

6 6 8

1 1 0

0

5

6

0

1 1 1 1 0 0 0

5 0 0 0 0 0 0

9 3 4 3 1 3 0

0 0 1 0 0 0 0

1 1 1 1 1 1 1 0

0

0

1 1 1 1 0 1 0

1 0 1 0 0 1 0

Totalscore

Multitarget

*E41: control. Vote score: for the activity values predicted by each algorithm, candidates scoring higher than the control received 1 point and all others received 0 points. Candidates in the top 50% by Dock score received 5 points (the Dock score is critical) and all others received 0 points.
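The voting rule in the footnote can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `vote_scores` is hypothetical, the predicted activities below are made-up placeholders rather than values from Table 4, and rounding the "top 50%" down to a whole number of candidates is an assumption. The Dock scores are taken from Table 4.

```python
def vote_scores(predicted, dock, control="E41"):
    """Total vote per candidate.

    predicted: {algorithm: {name: predicted activity}}
    dock:      {name: Dock score}
    """
    names = list(dock)
    # Top 50% by Dock score earn 5 points (assumption: rounded down).
    ranked = sorted(names, key=lambda n: dock[n], reverse=True)
    top_half = set(ranked[: len(ranked) // 2])
    totals = {}
    for name in names:
        # One point per algorithm whose prediction beats the control.
        algo_votes = sum(
            1 for acts in predicted.values() if acts[name] > acts[control]
        )
        totals[name] = algo_votes + (5 if name in top_half else 0)
    return totals


# Dock scores from Table 4; predicted activities are illustrative only.
dock = {"Nazlinin": 135.895, "Emetine": 103.495,
        "Hydrachine A": 89.277, "E41": 42.241}
predicted = {
    "RF": {"Nazlinin": 5.0, "Emetine": 4.8, "Hydrachine A": 4.2, "E41": 4.5},
    "DL": {"Nazlinin": 4.1, "Emetine": 5.3, "Hydrachine A": 4.0, "E41": 4.6},
}
totals = vote_scores(predicted, dock)
```

With these placeholder inputs, a candidate in the upper half of the Dock ranking that also beats the control under every algorithm collects the maximum score, while the control itself always scores zero on the algorithm votes.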


Table 6. Designed peptides sequences.

Sequences
*S1/KKKK
E41 (control)
/GGGG
/RGGS
9GN (control)
26921/FKFGSFIKRMWRSKLAKKLRAKGKELLRDYANRVLSPEEEAAAPAPVPA
*S2 (O and T)/THQGQHHCCKHLIKCWKLLRIWGIELLRDYANRVLSPEEEAAANCDCYK
23678/THQGQHHCCKHLIKCWKLLRIWGIELLRDYANRVLSPEEEAAANCDCYK
*S3/ESEFDPQEYYECKRQCMQLETSGQYRRCHSQCLKRFEEDWPWSKYDCEE
P16 (PDB code: 1BI7-B)
23678/ESEFDRQEYEECKRQCMQLETSG

Dock

207.901 57.629 87.021 107.402 50.305 1205.99

Zdock Zrank

Rdock

39.78

-93.019

14.442

42.75 42.37

-94.232 -73.495

-12.961 -2.6243

39.19

-96.464

-27.668

51.13

-93.09

-19.994

71.09 49.81

-134.42 -115.69

-23.466 -28.180


PDF Total Energy

1600.86

PDF Physical Energy

109.953

DOPE Score

-25221

Targets

Remark

MMP13 MMP13 CDKN2A P53 mut P53 mut Bcl2

Success

Bcl2

Two sites synergies,

Fail Fail Screening

P53 mut 3199.54

84.7084

-36991

P53 mut

pI =8.05

Cdk6 Cdk6

Screening


QMRRCVSQCDKRFEEDIDWSKYDNQD
*S4/ESEFDPQEYYECKRQCMQLETSGQYRRCHSQCLKRFEEDWPWSKYDCEE

51.13

-93.09

-19.994

*S5(16322)/CLAGRLDKQCTCRRSQPSRRSGHEVGRPSPHCGPSRQCGCHMD
*S6/CWDHWLRKQHICRMWQYYLRFGHEVGRPSPHCGPSRQCGCHMD

37.6

-64.907

-8.8279

35.64

-80.812

-11.218

*Potential peptides. PDF: probability density function; DOPE: Discrete Optimized Protein Energy.


3199.54

84.7084

-36991

Cdk6

MDM2 2008.51

107.514

-15588

MDM2

based on p16 Cyclin binding site Based on p16 Based on S5


Table 7. Pivotal hydrogen bonds distances data.

| Hbond | Occupancy | Max (nm) | Min (nm) |
|---|---|---|---|
| lig/H20:O/A186 | 46.03% | 1.857 | 0.154 |
| lig/H20:OE1/E223 | 74.03% | 2.277 | 0.143 |
| lig/H20:OE2/E223 | 78.13% | 2.352 | 0.147 |
| lig/H64:OD1/D179 | 68.80% | 2.325 | 0.143 |
| R109/HE:OD1/D46(O) | 97.70% | 1.654 | 0.151 |
| R109/HE:OD2/D46(O) | 97.90% | 1.576 | 0.154 |
| R109/1HH1:OD1/D46(O) | 95.13% | 1.574 | 0.148 |
| R109/1HH1:OD2/D46(O) | 94.67% | 1.585 | 0.149 |
| D46/HN(O):NE2/Q25 | 51.47% | 1.128 | 0.217 |
| R129/HE:ND1/H7(lig) | 47.20% | 0.978 | 0.166 |
| R129/1HH1:ND1/H7(lig) | 42.07% | 1.134 | 0.181 |
| K10/HZ1(T):OE2/E114 | 53.37% | 0.981 | 0.146 |
| H11/HE2(T):O/A149 | 42.63% | 0.998 | 0.187 |
| F113/HN:O:Q4(lig) | 81.73% | 0.608 | 0.167 |
| S227/HG1:OH/Y48(lig) | 60.80% | 1.430 | 0.166 |
| D228/HN:OE1/Q11(lig) | 73.30% | 0.588 | 0.163 |
| D228/HN:NE2/Q11(lig) | 64.43% | 0.690 | 0.192 |
| Q4/1HE2(lig):O/L111 | 56.93% | 0.836 | 0.155 |
| R144/1HH1:OE1/Q18(lig) | 64.60% | 1.417 | 0.150 |
| I169/HN:OG1/T21(lig) | 42.37% | 2.140 | 0.173 |
| K13/HZ1(lig):O/A23 | 44.73% | 0.836 | 0.154 |
| R14/HE(lig):NH2/R144 | 45.10% | 1.443 | 0.228 |
| Y25/HH(lig):O/G22 | 75.00% | 1.229 | 0.152 |
| K51/HZ1:OE1/E24(lig) | 43.25% | 2.584 | 0.143 |
| K51/HZ1:OE2/E24(lig) | 37.70% | 2.551 | 0.144 |
| R97/1HH1:OE1/E24(lig) | 40.43% | 4.145 | 0.151 |
| Y100/HH:O/P28(lig) | 56.45% | 3.162 | 0.149 |

Complex: MMP13-S1, Bcl2-O, Bcl2-T, p53-S3, CDK6-S4, MDM2-S5, MDM2-S6
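As a rough illustration of how the occupancy, maximum and minimum columns relate to a trajectory, the sketch below summarizes a per-frame hydrogen-acceptor distance series. The function name `hbond_stats` and the 0.35 nm distance cutoff are assumptions (the table does not state the criterion used), and the angle criterion that hydrogen-bond analyses normally include is omitted. The distance values are illustrative only.

```python
def hbond_stats(distances_nm, cutoff_nm=0.35):
    """Summarize one hydrogen bond over a trajectory.

    distances_nm: distance in nm for each frame.
    cutoff_nm:    assumed distance criterion (not given in the source).
    """
    frames = len(distances_nm)
    # Occupancy: percentage of frames in which the bond counts as formed.
    bonded = sum(1 for d in distances_nm if d <= cutoff_nm)
    return {
        "occupancy_percent": 100.0 * bonded / frames,
        "max_nm": max(distances_nm),
        "min_nm": min(distances_nm),
    }


# Illustrative distance series, not data from the paper.
stats = hbond_stats([0.15, 0.30, 0.40, 1.00])
```

A bond can therefore show a large maximum distance and still have a high occupancy, as long as it stays within the cutoff for most frames.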


Table 8. Molecular Mechanics/Poisson-Boltzmann Surface Area (MM/PBSA) calculation for binding free energy.

| Complex | Binding energy, 0 ns (kcal/mol) | Complex energy, 0 ns (kcal/mol) | Binding energy, 300 ns (kcal/mol) | Complex energy, 300 ns (kcal/mol) | Mutation energy |
|---|---|---|---|---|---|
| MMP13-S1 | -13.8371 | -7213.272 | 12.6730 | -7097.7526 | |
| Bcl2-O | 5.2504 | -8389.766 | -51.8586 | -8464.5357 | -18.18 |
| Bcl2-T | 38.8589 | -8344.393 | -22.0154 | -8398.3407 | |
| P53-S3 | 15.3423 | -10243.48 | -10.1375 | -10098.453 | -13.65 |
| CDK6-S4 | 5.5667 | -13634.24 | -18.4367 | -14119.587 | 0.42 |
| MDM2-S5 | 13.0436 | -5077.350 | -8.9623 | -5143.5054 | |
| MDM2-S6 | 11.3177 | -5159.072 | -47.8072 | -5209.4696 | -2.39 |
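The binding energies in this table are easier to compare once the change between 0 ns and 300 ns is computed. The short script below does that arithmetic on the tabulated values; the variable names are illustrative, and treating a negative 300 ns binding energy as "favorable" is a simplifying reading rather than a criterion stated in the table.

```python
# Binding energies (kcal/mol) at 0 ns and 300 ns, copied from the table.
binding = {
    "MMP13-S1": (-13.8371, 12.6730),
    "Bcl2-O": (5.2504, -51.8586),
    "Bcl2-T": (38.8589, -22.0154),
    "P53-S3": (15.3423, -10.1375),
    "CDK6-S4": (5.5667, -18.4367),
    "MDM2-S5": (13.0436, -8.9623),
    "MDM2-S6": (11.3177, -47.8072),
}

# Change in binding energy over the simulation (300 ns minus 0 ns).
change = {name: round(e300 - e0, 4) for name, (e0, e300) in binding.items()}

# Complexes whose binding energy is negative at 300 ns.
favorable = [name for name, (_, e300) in binding.items() if e300 < 0]

# Complex with the most negative binding energy at 300 ns.
strongest = min(binding, key=lambda name: binding[name][1])
```

On these numbers, every peptide complex except MMP13-S1 ends the simulation with a negative binding energy.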


Figure 1. The Pearson correlation coefficient matrix heat map of 204 selected features.
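For readers unfamiliar with the quantity plotted in this heat map, the following minimal sketch computes a Pearson correlation matrix for a handful of feature columns in plain Python. The feature names and values are hypothetical placeholders; the 204 features themselves are not listed here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(features):
    """features: {name: values}. Returns nested dict of pairwise r."""
    return {a: {b: pearson(features[a], features[b]) for b in features}
            for a in features}


# Hypothetical feature columns, for illustration only.
features = {
    "f1": [1.0, 2.0, 3.0, 4.0],
    "f2": [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with f1
    "f3": [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with f1
}
matrix = correlation_matrix(features)
```

Each cell of the heat map corresponds to one entry of such a matrix, with values ranging from -1 to 1 and a diagonal of 1.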


Figure 2. Principal component analysis (PCA) visualization: (a) 2D, (b) 3D.


Figure 3. Fine-tuning structure of the optimizer in the neural network.


Figure 4. Each of the four target proteins was screened by docking against the TCM database. The top 50 TCM candidates for each protein were integrated into a network, with particular attention to the compounds shared between targets.
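The intersection step described in this caption can be sketched as a simple multi-set lookup. The snippet below uses only a small subset of the names from the top-50 tables, so it is illustrative rather than a reproduction of the network; the function name `shared_candidates` is hypothetical, and matching names case-insensitively is an assumption made because the tables mix capitalizations (for example "dopamine" and "Dopamine").

```python
from collections import defaultdict

# A few entries per target, taken from the top-50 tables (not the full lists).
top_hits = {
    "MMP13": ["Nazlinin", "Subaphylline", "Emetine", "Hernovine"],
    "TP53": ["Nazlinin", "Subaphylline", "Adrenaline", "Mescaline",
             "Diiodotyrosine", "Hernovine"],
    "CDKN2A": ["Adrenaline", "Emetine", "Mescaline", "Diiodotyrosine",
               "dopamine"],
    "BCL2": ["Gallic acid", "Ferulic acid"],
}


def shared_candidates(hits_by_target, min_targets=2):
    """Return {compound: sorted targets} for compounds hitting >= min_targets."""
    targets_by_compound = defaultdict(set)
    for target, names in hits_by_target.items():
        for name in names:
            # Normalize case so spelling variants are counted together.
            targets_by_compound[name.strip().lower()].add(target)
    return {name: sorted(targets)
            for name, targets in targets_by_compound.items()
            if len(targets) >= min_targets}


shared = shared_candidates(top_hits)
```

Compounds that appear under only one target, such as the BCL2 acids in this subset, drop out of the result.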


Figure 5. Interaction modes of the candidate complexes in 2D and 3D views. A. Nazlinin-MMP13 (a); B. Subaphylline-TP53 (d); C. Adrenaline-CDKN2A (i); D. E41 (control)-MMP13.


Figure 6. Related proteins. Three proteins (CDKN2A, TP53, BCL2) were selected from the STITCH interaction database. Spheres represent several proteins, and the rounded rectangles show the compounds related to MMP13. The false discovery rate of the pathway in cancer was 1.73e-09. The first and second shells were both set to no more than 20. Interestingly, MMP13 appears to contribute to cancer through other related proteins such as TP53.


Figure 7. Disorder analysis of the four proteins. A disorder value below 0.5 suggests a stable structure. The amino acids around the binding areas are shown in cyan. Protein: A. MMP13, B. CDKN2A, C. TP53, D. BCL2.


Figure 8. Residual plots. Different prediction models predicted compound activities based on known MMP13 inhibitors. (a) AdaBoost Regressor; (b) Gradient Boosting Regressor; (c) Random Forest.
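A residual plot shows, for each compound, the difference between the observed and the predicted activity. The sketch below computes those residuals and a root-mean-square error in plain Python; the activity values are hypothetical placeholders, and the sign convention (observed minus predicted) is an assumption, since the caption does not state it.

```python
import math


def residuals(observed, predicted):
    """Residual per compound, taken here as observed minus predicted."""
    return [o - p for o, p in zip(observed, predicted)]


def rmse(observed, predicted):
    """Root-mean-square error summarizing the spread of the residuals."""
    res = residuals(observed, predicted)
    return math.sqrt(sum(r * r for r in res) / len(res))


# Hypothetical activities, for illustration only.
observed = [5.0, 6.0, 7.0]
predicted = [4.5, 6.5, 7.0]
res = residuals(observed, predicted)
error = rmse(observed, predicted)
```

Residuals scattered evenly around zero indicate a model without systematic bias, which is what such plots are usually inspected for.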


Figure 9. Scatter plots of the results of 350 experiments with the deep learning model.


Figure 10. TCM candidates. Starting from the ligands common to multiple targets, four ligands were selected for MD analysis according to Dock score, H-bond and pi-pi interactions. A. Nazlinin-MMP13 (a); B. Subaphylline-TP53 (d); C. Adrenaline-CDKN2A (i); D. E41 (control)-MMP13.


Figure 11. Hydrophobic effect of the MMP13 binding site with different ligands, shown in 2D and 3D views.


Figure 12. Clustering of the MD results during 290-300 ns. The clustering results at different times and the ratios of the different groups (pie graphs) are provided. Unfortunately, all of the candidates (top1_Nazlinin, top3_Adrenaline, cont_E41) flew away by the end of the simulation.


Figure 13. Complex of Bcl2-S2 and MMP13-KKKK. (a) Double-site binding of the S2 peptide (O and T) in Bcl2; the peptide was designed to target the BH1 domain based on an inhibitor and acquired a high affinity for the BH3 domain. (b) RMSD and MSD during the 300 ns MD simulation. Both complexes bound stably during the MD period. T changed conformation at the very beginning (around 25 ns), which led to the change in the complex RMSD. O altered its conformation at 200 ns to 250 ns. (c) SASA and gyrate analysis. All of the values were relatively stable. The SASA of O changed in step with its RMSD. (d) The change of the N terminal in O, which provided auxiliary binding when the N terminal matched the conformation of Bcl2.
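The RMSD traces in panel (b) measure how far each frame has drifted from a reference structure. A minimal sketch of the calculation follows; it assumes the two coordinate sets are already superimposed (MD analysis tools normally perform a least-squares fit first), and the coordinates are illustrative, not taken from the simulations.

```python
import math


def rmsd(reference, frame):
    """Root-mean-square deviation between two sets of (x, y, z) coordinates.

    Assumes both structures are already aligned and list atoms in the
    same order; no fitting is performed here.
    """
    if len(reference) != len(frame):
        raise ValueError("coordinate sets must have the same number of atoms")
    squared = sum((a - b) ** 2
                  for p, q in zip(reference, frame)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(reference))


# Two atoms displaced by 1 unit along z relative to the reference.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
value = rmsd(reference, frame)
```

A plateau in such a trace is the usual indication that a complex has settled into a stable binding mode, whereas a step change marks a conformational transition like those described for T and O.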


Figure 14. Structural alteration of T. (a) Different types of T. (b) Remarkable amino acids of the “turn” node. In fact, residues 1 to 15 all showed obvious changes; the two most significant residues are displayed. (c) Cavity pathway of Bcl2. (d) RMSF analysis. O and T, which have the same structure, showed nearly opposite fluctuations. O bound to the cyclic “O” type region, and residues 1 to 15 were flexible. T was just the opposite.
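The RMSF in panel (d) quantifies how much each residue fluctuates around its average position. The sketch below shows the calculation for a toy trajectory in plain Python; it assumes the frames are already aligned to a common reference, and the coordinates are illustrative only.

```python
import math


def rmsf(trajectory):
    """Per-atom root-mean-square fluctuation.

    trajectory: list of frames, each a list of (x, y, z) tuples per atom.
    Assumes the frames have already been aligned.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    values = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames
                for k in range(3)]
        # Mean squared deviation from that average position.
        msf = sum(sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
                  for frame in trajectory) / n_frames
        values.append(math.sqrt(msf))
    return values


# Atom 0 oscillates along x; atom 1 does not move.
trajectory = [
    [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)],
    [(2.0, 0.0, 0.0), (5.0, 5.0, 5.0)],
]
fluctuations = rmsf(trajectory)
```

Flexible stretches, such as residues 1 to 15 of O, appear as peaks in the resulting per-residue profile.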


Figure 15. MD analysis of the p53 and CDK6 proteins. (a) Free energy landscape (FEL). The Gibbs free energy was estimated from the distribution of conformations. The structure with low Gibbs free energy was set as a reference. (b) Vital hydrogen bonds in p53-S3 during the MD period. The binding mode changed a lot compared with the original structure. The final binding type was similar to the low Gibbs energy structure. (c) RMSD and MSD of the different complexes. The changes of the protein and the ligand are provided separately. (d) Gyrate and SASA analysis. They were stable during MD.
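A free energy landscape of the kind shown in panel (a) is commonly obtained by Boltzmann inversion of the conformational distribution, G_i = -k_B T ln(P_i / P_max). The sketch below applies that relation to a histogram of bin counts; the choice of this formula, the temperature of 300 K and the bin counts are all assumptions made for illustration, since the caption only states that the energy was estimated from the distribution of conformations.

```python
import math

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol*K)


def free_energy(counts, temperature=300.0):
    """Relative Gibbs free energy (kJ/mol) per histogram bin.

    counts: number of frames falling in each conformational bin.
    The most populated bin is the zero of the energy scale; empty
    bins are assigned infinite free energy.
    """
    p_max = max(counts)
    kt = K_B * temperature
    return [math.inf if c == 0 else -kt * math.log(c / p_max)
            for c in counts]


# Illustrative bin populations, not data from the simulations.
energies = free_energy([10, 100, 0])
```

The most populated bin defines the low-energy basin, which matches the caption's use of the low Gibbs free energy structure as the reference.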


Figure 16. The structure of CDK6 binding with S4. (a) Binding sites of p16 and S4. (b) Significant sites of CDK6. S4 could influence these regions but not Thr177 phosphorylation, which leaves room for an improved scheme. (c) The cavity that the peptide could reach. (d) Residue distance matrix of CDK6 when binding with S4.


Figure 17. MD analysis of MDM2 with ligand. (a) Flexible region of MDM2 and S6. The most variable amino acids are shown in red. Residues 20-28 of S6 could evolve into a potential binding area. (b) RMSD and MSD values. (c) Gyrate and SASA changes. S5 and S6 had similar effects on MDM2. (d) Reconstruction of the flexible area enhanced the binding ability of S5 and S6. The variability of the peptide implies additional binding potential, although such flexible regions are usually regarded as undesirable because they cannot be controlled.
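The gyrate curves in panel (c) report the radius of gyration, a measure of how compact the structure is. A minimal mass-weighted version is sketched below; the coordinates and masses are illustrative placeholders, not values from the MDM2 simulations.

```python
import math


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration of a set of atoms.

    coords: list of (x, y, z) tuples; masses: matching list of masses.
    """
    total = sum(masses)
    # Center of mass.
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total
           for k in range(3)]
    # Mass-weighted mean squared distance from the center of mass.
    msd = sum(m * sum((c[k] - com[k]) ** 2 for k in range(3))
              for m, c in zip(masses, coords)) / total
    return math.sqrt(msd)


# Two equal masses 2 units apart: each sits 1 unit from the center of mass.
rg = radius_of_gyration([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [1.0, 1.0])
```

A roughly constant radius of gyration over the trajectory indicates that the complex neither unfolds nor collapses during the simulation.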


Figure 18. Vital hydrogen bonds of candidate complexes. The donor H and hydrogen acceptors were shared.
