Article Cite This: J. Chem. Inf. Model. 2018, 58, 2414−2419

pubs.acs.org/jcim

Perturbation Theory−Machine Learning Study of Zeolite Materials Desilication

Vincent Blay,*,† Toshiyuki Yokoi,‡ and Humbert González-Díaz*,§,∥

† Fisher College of Business, The Ohio State University, Gerlach Hall, 2108 Neil Avenue, Columbus, Ohio 43210, United States
‡ Institute of Innovative Research, Chemical Resources Laboratory, Tokyo Institute of Technology, 4259 Nagatsuta, Midori-ku, Yokohama 226-8503, Japan
§ Department of Organic Chemistry II, University of Basque Country UPV/EHU, 48940 Leioa, Spain
∥ IKERBASQUE, Basque Foundation for Science, 48011 Bilbao, Spain



Supporting Information

ABSTRACT: Zeolites are important materials for research and industrial applications. Mesopores are often introduced by desilication, but the treatment also affects other properties, which makes its optimization difficult. In this work, we demonstrate that Perturbation Theory and Machine Learning can be combined in a PTML multioutput model describing the effects of desilication. The PTML model achieves a notable accuracy (R² = 0.98) in the external validation and can be useful for the rational design of novel materials.

Special Issue: Materials Informatics
Received: June 15, 2018
Published: August 23, 2018
© 2018 American Chemical Society

1. INTRODUCTION

Zeolites are microporous crystalline silicoaluminates with an enormous economic impact. They are used extensively as adsorbents, catalysts, and cation exchangers, among others.¹,² Over two hundred different zeolite frameworks have been synthesized to date,³ but only a dozen are applied in the industry.⁴ Among these, zeolite ZSM-5 has attracted much interest because of its stable three-dimensional framework and its high selectivity to propylene in several processes.⁵

Zeolites have micropores of subnanometric dimensions, which allows them to act as molecular sieves and selective adsorbents of small molecules, but it also makes the processing of larger molecules, like those present in crude oil, difficult. In response, multiple strategies have been developed to incorporate mesopores (>2 nm) into zeolites while preserving their microporosity, thus generating so-called hierarchical materials.⁶

Desilication is a postsynthesis (top-down) hierarchization method of special relevance (Figure 1a) because of its versatility, scalability, and the quality of the mesopores it generates.⁷,⁸ Desilication is achieved by digestion of the zeolite crystals in alkaline media under controlled conditions (Figure 1b). Part of the Al dissolved from the material is realuminated on the external surface of the zeolite, thus decreasing the Si/Al ratio of the material and often introducing new Lewis acid sites.⁷ The benefits of zeolite catalysts treated by desilication in terms of activity, selectivity, and/or catalytic lifetime have been demonstrated in many reactions, including isomerization, alkylation, acylation, aromatization, catalytic cracking, pyrolysis, methanol-to-hydrocarbons, etc.⁹ However, the behavior of the alkaline-treated zeolites often depends on a balance between the introduced mesopores and the effects on other variables, like the micropore volume, and this balance is highly dependent on the reaction to be catalyzed. Moreover, an ever-expanding range of starting materials, conditions, pore-directing agents (PDAs), and sequences of treatments to tune the properties of the final material makes it increasingly difficult to define the optimal desilication conditions for a given application.⁷ Optimization studies are expensive and often limited by the availability of novel zeolites synthesized in the laboratory and by the growing number of variables or dimensions to consider in the treatment. Still, since desilication dissolves valuable catalytic material (losses of 30 wt % are common⁷), highly optimized treatments are necessary for them to be applicable in the industry.

In this context, data analysis and Machine Learning (ML) techniques can be useful to model and optimize zeolite materials and processes. In fact, ML techniques have been applied to obtain relevant descriptors of zeolite frameworks.¹⁰ Recently, Perturbation Theory (PT) operators and Machine Learning techniques have been combined to create powerful PTML (PT + ML) models, which are being applied to complex problems in medicinal chemistry, nanotechnology, materials science, etc.¹¹⁻²⁰ PTML models predict the properties of a query compound or material (q) starting with the property for a system of reference (r) and adding PT operators to measure the deviations (perturbations) from the reference.²¹,²² They can be used to study large data sets with multiple experimental input conditions.

DOI: 10.1021/acs.jcim.8b00383 J. Chem. Inf. Model. 2018, 58, 2414−2419

Article

Journal of Chemical Information and Modeling

In this work, we report for the first time a PTML model applied to zeolite science. We compiled from the literature a data set with >1000 data points from ZSM-5 materials. The data are very heterogeneous: they cover different ZSM-5 starting materials (including H-ZSM-5, Na-ZSM-5, Na,K-ZSM-5, NH₄-ZSM-5, etc.), multiple experimental conditions, and multiple output properties of interest. Overall, we analyzed >50 000 pairs of new and reference materials by the PTML methodology. A simple yet powerful PTML model could be developed based on multicondition moving-average (MA) operators. The PTML model developed is a multioutput linear equation able to predict up to eight different properties: BET area (m² g⁻¹), mesopore volume (cm³ g⁻¹), micropore volume (cm³ g⁻¹), Si/Al molar ratio, mesopore size (nm), mesopore area (m² g⁻¹), total volume (cm³ g⁻¹), and treatment yield (wt %). The present model demonstrates the usefulness of PTML in catalyst engineering and may become a versatile tool for the rational modification of ZSM-5 materials.

Figure 1. (a) Publications per year indexed in Scopus on zeolite hierarchization and desilication. (b) Illustration of desilication of zeolite ZSM-5. Adapted with permission from ref 8. Copyright 2015 The Royal Society of Chemistry.

2. MATERIALS AND METHODS

Data set. In this work, we studied a data set of 1019 data points collected from the literature and investigated here as a benchmark data set for the first time. These 1019 values come from experiments measuring properties/parameters $\varepsilon_{ij}$ for the alkaline desilication of ZSM-5 materials. These parameters depend on a series of experimental conditions $c_j = (c_1, c_2, ..., c_n)$. The data also present variations in multiple conditions: $c_0$ = property of interest of the material, $c_1$ = starting raw material used, $c_2$ = method used to characterize the micropore specific volume of the starting material, $c_3$ = method used to characterize the mesopore specific surface area of the starting material, $c_4$ = base/PDA/acid combination used, $c_5$ = base/PDA/acid used in step 1, $c_6$ = base/PDA/acid used in step 2, $c_7$ = method used to characterize the micropore specific volume of the treated materials, $c_8$ = method used to characterize the mesopore specific surface area of the treated materials, etc. The eight output properties/parameters $\varepsilon_{ij}$ of the materials studied are BET surface area (m² g⁻¹), mesopore volume (cm³ g⁻¹), micropore volume (cm³ g⁻¹), Si/Al ratio, mesopore size (nm), mesopore area (m² g⁻¹), total volume (cm³ g⁻¹), and treatment yield (wt %). Detailed information about this data set is provided as Supporting Information file ci8b00383_si_001.xlsx.

Monte Carlo Pseudorandom Number Generation of Data Pairs. Using this original (raw data) set, we generated >4000 pairs of new vs reference materials in the following steps. First, we labeled each material in the data set with a number ($n_i$ = 1−1019). Second, we sorted the data using two criteria: (1) sort by property (from A to Z) and (2) sort by the value of the property, from the highest to the lowest. Next, using a pseudorandom number generator, we sampled $n_{new}$ = 5000 values of $n_i$ for the new materials. In particular, we used the version of the Wichmann-Hill algorithm as implemented in Microsoft Excel. This algorithm is useful for pseudorandom number generation in Monte Carlo (MC) simulations.²³ The function used has the form $n_{new} = \mathrm{Random}(n_{p\text{-}start}; n_{p\text{-}last})$, with arguments $n_{p\text{-}start}$ and $n_{p\text{-}last}$ equal to the first and the last label $n_i$ for the values belonging to the same experimental property p. Next, we sampled another $n_{ref}$ = 5000 values out of $n_i$ to label the reference material in each pair. In this case, we used the same MC pseudorandom number generator with a modification: $n_{ref} = n_{new} + \mathrm{Random}(1; 5)$. Consequently, the sampled $n_{ref}$ and $n_{new}$ belong to materials with a similar value of the property p. This procedure aims to sample pairs with low perturbations in the output/input values. Because the initial data set ($n_i$ = 1019) is smaller than the number of samples ($n_{ref} = n_{new}$ = 5000), a given value of $n_i$ was sampled more than once. Finally, we deleted all duplicated pairs (with the same $n_{ref}$ and $n_{new}$) and all pairs combining different properties (if any) to avoid duplicated and/or cross-property pairs. Detailed information about the data set generated, including observed values, input variables, etc., is available as Supporting Information file ci8b00383_si_002.xlsx.

PTML Linear Model. PTML modeling techniques are useful to quantify the effect of perturbations in complex biomolecular systems.²¹,²⁴ The aim of the PTML model proposed here is to predict the value $\varepsilon_k(m_i, c_j)$ of a property $\varepsilon_k$ of type k of a material under experimental conditions $c_j$. The model takes as its first input the value $\varepsilon_k({}'m_i, {}'c_j)_{ref}$ of the same property $\varepsilon_k$ for a material of reference ${}'m_i$ measured under similar experimental conditions ${}'c_j$. Next, the model adds up the values of PT operators to account for the effect of differences between the new material and the material of reference. These PT operators are differences of moving averages²¹,²² with the following form:

$$\Delta\Delta V_k(m_i, c_j) = [(V_k(m_i) - \langle V_k(c_j)\rangle)_{new} - (V_k({}'m_i) - \langle V_k({}'c_j)\rangle)_{ref}]$$

Specifically, the PT operators use the structural variables $V_k(m_i)$ and $V_k({}'m_i)$ to account for differences between the new material $m_i$ and the material of reference ${}'m_i$, respectively. The PT operators use the average values $\langle V_k(c_j)\rangle$ and $\langle V_k({}'c_j)\rangle$ to account for the differences in the experimental conditions ($c_j$ and ${}'c_j$) of the procedures used to modify and characterize the new material and the material of reference, respectively. We used a Multivariate Linear Regression (MLR) algorithm to seek the model.²⁵ Figure 2 shows the general workflow and the main steps taken in this work to develop the PTML model.

Figure 2. General workflow used to develop the PTML model.

The compact and extended forms of the equations for a PTML linear model are as follows:

$$\varepsilon_k(m_i, c_j)_{new} = e_0 + a_0 \cdot \varepsilon_k({}'m_i, {}'c_j)_{ref} + a_k \cdot \sum_{k=1}^{k_{max}} V_k + a_{kj} \cdot \sum_{k=1,\,j=1}^{k_{max},\,j_{max}} \Delta\Delta V_{kj}(c_j)$$

$$\varepsilon_k(m_i, c_j)_{new} = e_0 + a_0 \cdot \varepsilon_k({}'m_i, {}'c_j)_{ref} + a_k \cdot \sum_{k=1}^{k_{max}} V_k(m_i) + a_{kj} \cdot \sum_{k=1,\,j=1}^{k_{max},\,j_{max}} [(V_k(m_i) - \langle V_k(c_j)\rangle)_{new} - (V_k({}'m_i) - \langle V_k({}'c_j)\rangle)_{ref}]$$

3. RESULTS AND DISCUSSION

In the present materials chemistry problem, we analyzed $n_{total}$ = 4975 pairs of new and reference materials. Specifically, 3732 pairs of materials were used to train the model and 1243 pairs were used as the validation series. These cases are labeled with a t = train or a v = validation in the Supporting Information file ci8b00383_si_002.xlsx. The first input variable is the value $\varepsilon_k(m_i, c_j)$ of the property $\varepsilon_k$ of one material of reference measured under the same experimental conditions $c_j = (c_1, c_2, ..., c_8)$ as the new material. In order to seek the PTML model, we collected the values of 15 different input variables, $V_k$. These variables represent properties of the starting materials as well as information about the treatments used in the experimental methods. The variables studied in this work are V01 = Si/Al molar ratio before treatment (b.t.), V02 = crystal size b.t. (μm), V03 = BET surface area b.t. (m² g⁻¹), V04 = total volume b.t. (cm³ g⁻¹), V05 = micropore volume b.t. (cm³ g⁻¹), V06 = mesopore volume b.t. (cm³ g⁻¹), V07 = mesopore area b.t. (m² g⁻¹), V08 = NaOH concentration (M), V09 = TPAOH concentration (M), V10 = MCTAB concentration (M), V11 = temperature of treatment (°C), V12 = time of treatment (min), V13 = no. of steps in the treatment, V14 = solid weight (g), V15 = solution volume (mL). All these variables $V_k$ are input variables measured before or during the treatment and must not be confused with the output variables $\varepsilon_k(m_i, c_j)$ measured after the treatment.

After collecting the values of the original input variables $V_k$, we used them to calculate the values of the PT operators. The most common PT operators used in PTML models are the one-condition moving-average (MA) operators. These MA operators are analogous to the MA used in the Box−Jenkins ARIMA models for time series analysis.²⁶ However, we can also develop PTML models using multicondition PT operators (moving averages).²⁷ In a multicondition PT operator, we use the same moving-average idea: $\Delta V_k(c_j) = V_k - \langle V_k(c_j)\rangle$. In Table 1, we also show the average values for two of these properties, $\langle V_{01}(c_j)\rangle$ and $\langle V_{10}(c_j)\rangle$, which proved relevant to the model.

Table 1. Output Properties Predicted by the PTML Model

k | property εk(mi, cj) (units) | nk | ⟨εk(mi, cj)⟩ | SD | ⟨V01(cj)⟩ | ⟨V10(cj)⟩
1 | BET area (m² g⁻¹) | 139 | 424.26 | 117.17 | 79.170 | 0.005
2 | mesopore volume (cm³ g⁻¹) | 139 | 0.26 | 0.14 | 79.170 | 0.005
3 | micropore volume (cm³ g⁻¹) | 139 | 0.12 | 0.04 | 79.170 | 0.005
4 | Si/Al molar ratio | 138 | 47.71 | 86.96 | 79.439 | 0.005
5 | mesopore size (nm) | 130 | 9.78 | 3.70 | 83.515 | 0.005
6 | mesopore area (m² g⁻¹) | 123 | 181.48 | 121.51 | 87.026 | 0.006
7 | total volume (cm³ g⁻¹) | 123 | 0.53 | 0.15 | 87.026 | 0.006
8 | treatment yield (wt %) | 88 | 51.91 | 13.17 | 47.082 | 0.008

Note that, in our model, the average calculation $\langle V_k(c_j)\rangle$ does not run over one single condition but over multiple conditions. For instance, we can calculate a triple-condition average for the input variable $V_k$ for all the cases with the same set of conditions $c_1$, $c_2$, $c_3$ as $\langle V_k(c_1, c_2, c_3)\rangle$ instead of calculating three separate one-condition averages $\langle V_k(c_1)\rangle$, $\langle V_k(c_2)\rangle$, and $\langle V_k(c_3)\rangle$. In this case, the PT operator $\Delta V_k(c_1, c_2, c_3) = [V_k - \langle V_k(c_1, c_2, c_3)\rangle]$ quantifies at the same time the structure of the system in terms of $V_k$ and three boundary conditions with $\langle V_k(c_1, c_2, c_3)\rangle$. The detailed list of multicondition averages is available as Supporting Information file ci8b00383_si_003.xlsx.

We found a simple yet powerful linear PTML model using only two PT multicondition operators. The PT operators used codify changes in eight different experimental conditions at the same time: BET surface area (m² g⁻¹), mesopore volume (cm³ g⁻¹), micropore volume (cm³ g⁻¹), Si/Al molar ratio, mesopore size (nm), mesopore area (m² g⁻¹), total volume (cm³ g⁻¹), and treatment yield (wt %). The equation of the resulting model is the following:

$$\varepsilon_k(m_i, c_j)_{new} = -0.22881 + 1.02864 \cdot \varepsilon_k({}'m_i, {}'c_j)_{ref} + 0.02792 \cdot V_1 + 67.29053 \cdot V_{10} + 0.01264 \cdot \Delta\Delta V_1(c_1, c_2, c_3, c_5, c_6, c_7, c_8) + 0.05753 \cdot \Delta\Delta V_7(c_1, c_2, c_3, c_5, c_6, c_7, c_8)$$

$$n_{tot} = 4975, \quad R^2_{train} = 0.980, \quad R^2_{val} = 0.985, \quad F(1, 3730) = 228700, \quad p < 0.05$$
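As a rough illustration of how the fitted equation above could be evaluated, the following Python sketch computes multicondition averages, builds the two ΔΔV operators, and plugs them into the reported coefficients. The record layout, the helper names (`multicondition_averages`, `delta`, `predict`), and the toy values are hypothetical and are not taken from the data set. Reading V1, V7, and V10 as values of the new material follows the extended form of the model and is an assumption of this sketch.

```python
from collections import defaultdict
from statistics import mean

# Coefficients copied from the fitted equation above.
COEF = {"e0": -0.22881, "eps_ref": 1.02864, "V1": 0.02792,
        "V10": 67.29053, "ddV1": 0.01264, "ddV7": 0.05753}

# Conditions entering the two PT operators in the fitted equation.
CONDITIONS = ("c1", "c2", "c3", "c5", "c6", "c7", "c8")


def multicondition_averages(rows, var):
    """Mean of `var` over every group of rows sharing the same condition tuple."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in CONDITIONS)].append(row[var])
    return {key: mean(values) for key, values in groups.items()}


def delta(row, var, averages):
    """Delta V_k(c_j) = V_k - <V_k(c_j)> for one material."""
    return row[var] - averages[tuple(row[c] for c in CONDITIONS)]


def predict(new, ref, eps_ref, avg_v1, avg_v7):
    """Evaluate the fitted linear PTML equation for one new/reference pair."""
    ddv1 = delta(new, "V1", avg_v1) - delta(ref, "V1", avg_v1)
    ddv7 = delta(new, "V7", avg_v7) - delta(ref, "V7", avg_v7)
    return (COEF["e0"] + COEF["eps_ref"] * eps_ref
            + COEF["V1"] * new["V1"] + COEF["V10"] * new["V10"]
            + COEF["ddV1"] * ddv1 + COEF["ddV7"] * ddv7)


# Hypothetical records (not from the data set) in two condition groups.
group_a = {c: "A" for c in CONDITIONS}
group_b = {c: "B" for c in CONDITIONS}
rows = [
    {**group_a, "V1": 30.0, "V7": 100.0, "V10": 0.0},
    {**group_a, "V1": 50.0, "V7": 140.0, "V10": 0.0},
    {**group_b, "V1": 25.0, "V7": 90.0, "V10": 0.005},
]
avg_v1 = multicondition_averages(rows, "V1")
avg_v7 = multicondition_averages(rows, "V7")

new, ref = rows[1], rows[0]
prediction = predict(new, ref, eps_ref=400.0, avg_v1=avg_v1, avg_v7=avg_v7)
print(round(prediction, 3))
```

When the new and reference materials share the same condition tuple, as in this toy pair, the group averages cancel and each ΔΔV reduces to the plain difference between the two input values.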

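The pair-generation procedure described under Materials and Methods can be sketched in a similar way. This is a minimal illustration under stated assumptions, not the authors' spreadsheet: Python's `random` module (Mersenne Twister) stands in for the Excel generator, the property blocks are laid out in the order of Table 1 using its n_k counts, and n_new is drawn uniformly over all labels rather than block by block. The names `block_of` and `sample_pairs` are hypothetical.

```python
import random
from itertools import accumulate

# Number of data points per property (n_k in Table 1); laying the blocks out
# in Table 1 order is an assumption made for this sketch.
COUNTS = [139, 139, 139, 138, 130, 123, 123, 88]
ENDS = list(accumulate(COUNTS))                       # last label of each block
BLOCKS = [(end - n + 1, end) for n, end in zip(COUNTS, ENDS)]
N_TOTAL = ENDS[-1]                                    # 1019 labelled materials


def block_of(label):
    """Return the (first, last) labels of the property block holding `label`."""
    for first, last in BLOCKS:
        if first <= label <= last:
            return first, last
    raise ValueError(f"label {label} is outside 1..{N_TOTAL}")


def sample_pairs(n_samples=5000, max_offset=5, seed=0):
    """Draw (n_new, n_ref) pairs with n_ref = n_new + Random(1; max_offset)."""
    rng = random.Random(seed)
    pairs = set()                          # a set discards duplicated pairs
    for _ in range(n_samples):
        n_new = rng.randint(1, N_TOTAL)
        n_ref = n_new + rng.randint(1, max_offset)
        _, last = block_of(n_new)
        if n_ref <= last:                  # skip cross-property pairs
            pairs.add((n_new, n_ref))
    return sorted(pairs)


pairs = sample_pairs()
print(len(pairs), pairs[:3])
```

Because the labels are sorted by property and then by value, a small positive offset keeps each reference close to its new material in property value. The number of pairs that survive filtering depends on the generator and seed, so it will not in general match the count reported in the text.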
The model outputs the values $\varepsilon_k(m_i, c_j)$ of the property $\varepsilon_k$ of the material measured under different experimental conditions $c_j = (c_1, c_2, ..., c_8)$. These outputs refer to the same properties as some of the input variables, but measured after the treatment. In this model, the output property predicted is always of the same type as the

Table 2. Results of the Linear PTML Model

variable | coefficient | std. error | t | p | −95.00%/+95.00%
e0 (intercept) | −0.22881 | 0.46129 | −0.4960 | 0.619916 | −1.13/0.68
εk(′mi, ′cj)ref | 1.02864 | 0.00240 | 429.0550 | 0.000000 | 1.02/1.03
V1 | 0.02792 | 0.00226 | 12.3701 | 0.000000 | 0.02/0.03
V10 | 67.29053 | 16.53283 | 4.0701 | 0.000048 | 34.9/99.7
ΔΔV1 | 0.01264 | 0.00220 | 5.7480 | 0.000000 | 0.01/0.02
ΔΔV7 | 0.05753 | 0.02603 | 2.2099 | 0.027173 | 0.01/0.11

parameter | training | validation
cases n | 3732 | 1243
R² | 0.980 | 0.985
R² adjusted | 0.980 | 0.985
r²m | 0.911 | 0.923

parameter | value
SEE | 22.450
SDEP | 22.444
F(1, 3730) | 228700
p | <0.05
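The t values and confidence limits in Table 2 can be cross-checked from the coefficients and standard errors alone. The short sketch below assumes a normal approximation (z = 1.96) for the 95% limits, which seems reasonable for 3732 training cases; the variable labels follow the order of the terms in the fitted equation. Small differences from the tabulated t values are expected because the standard errors are rounded.

```python
# (coefficient, standard error) pairs as listed in Table 2.
TABLE2 = {
    "e0": (-0.22881, 0.46129),
    "eps_ref": (1.02864, 0.00240),
    "V1": (0.02792, 0.00226),
    "V10": (67.29053, 16.53283),
    "ddV1": (0.01264, 0.00220),
    "ddV7": (0.05753, 0.02603),
}
Z_95 = 1.96  # normal approximation to the two-sided 95% quantile (assumption)


def t_and_interval(coef, se):
    """Return the t statistic and the approximate 95% confidence limits."""
    return coef / se, coef - Z_95 * se, coef + Z_95 * se


for name, (coef, se) in TABLE2.items():
    t, low, high = t_and_interval(coef, se)
    print(f"{name:8s} t = {t:9.3f}   95% CI = [{low:.2f}, {high:.2f}]")
```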

treatment vs micropore and mesopore volumes before treatment for >1000 data points. The graph was fitted by distance-weighted least-squares. The z-axis represents mesopore size (nm) (color scale), the inputs being the micropore and mesopore volumes before the treatment. Color scale ranges from 0 to 2 nm (dark green) to >10 nm (dark red).

This kind of prediction may help to obtain new mesopore-containing materials useful in the industry with minimal loss of microporosity. We are working to expand our PTML approach to a wider range of conditions, materials, and properties of interest to zeolites and materials chemistry.

4. CONCLUSION

Mesopore-containing zeolites may offer better performance than conventional zeolites in multiple applications, including heterogeneous catalysis. Given its flexibility, desilication in alkaline media is often attempted as a method to introduce these mesopores. However, this same flexibility makes it difficult to select the proper treatment conditions, as multiple treatment

ORCID

Vincent Blay: 0000-0001-9602-2375
Toshiyuki Yokoi: 0000-0002-3315-3172
Humbert González-Díaz: 0000-0002-9392-2797

Notes

The authors declare no competing financial interest.

ACKNOWLEDGMENTS

The Ministerio de Economía y Competitividad (FEDER CTQ2016-74881-P and CTQ2013-41229-P) and Basque Government (IT1045-16) are gratefully acknowledged for their financial support. V.B. acknowledges the support from The Ohio State University

REFERENCES

(1) Yilmaz, B.; Trukhan, N.; Müller, U. Industrial Outlook on Zeolites and Metal Organic Frameworks. Chin. J. Catal. 2012, 33, 3−10.
(2) Li, Y.; Li, L.; Yu, J. Applications of Zeolites in Sustainable Chemistry. Chem. 2017, 3, 928−949.
(3) Baerlocher, C.; McCusker, L. B. Database of Zeolite Structures. http://www.iza-structure.org/databases/ (access date: 06/13/2018).
(4) Vermeiren, W.; Gilson, J.-P. Impact of Zeolites on the Petroleum and Petrochemical Industry. Top. Catal. 2009, 52, 1131−1161.
(5) Blay, V.; Miguel, P. J.; Corma, A. Theta-1 Zeolite Catalyst for Increasing the Yield of Propene when Cracking Olefins and its Potential Integration with an Olefin Metathesis Unit. Catal. Sci. Technol. 2017, 7, 5847−5859.
(6) Blay, V.; Louis, B.; Miravalles, R.; Yokoi, T.; Peccatiello, K. A.; Clough, M.; Yilmaz, B. Engineering Zeolites for Catalytic Cracking to Light Olefins. ACS Catal. 2017, 7, 6542−6566.
(7) Verboekend, D.; Perez-Ramirez, J. Design of Hierarchical Zeolite Catalysts by Desilication. Catal. Sci. Technol. 2011, 1, 879−890.
(8) Wang, D.; Zhang, L.; Chen, L.; Wu, H.; Wu, P. Postsynthesis of Mesoporous ZSM-5 Zeolite by Piperidine-assisted Desilication and its Superior Catalytic Properties in Hydrocarbon Cracking. J. Mater. Chem. A 2015, 3, 3511−3521.
(9) Pérez-Ramírez, J.; Christensen, C. H.; Egeblad, K.; Christensen, C. H.; Groen, J. C. Hierarchical Zeolites: Enhanced Utilisation of Microporous Crystals in Catalysis by Advances in Materials Design. Chem. Soc. Rev. 2008, 37, 2530−2542.
(10) Martin, R. L.; Smit, B.; Haranczyk, M. Addressing Challenges of Identifying Geometrically Diverse Sets of Crystalline Porous Materials. J. Chem. Inf. Model. 2012, 52, 308−318.
(11) Blazquez-Barbadillo, C.; Aranzamendi, E.; Coya, E.; Lete, E.; Sotomayor, N.; Gonzalez-Diaz, H. Perturbation Theory Model of Reactivity and Enantioselectivity of Palladium-catalyzed Heck-Heck Cascade Reactions. RSC Adv. 2016, 6, 38602−38610.
(12) Casanola-Martin, G. M.; Le-Thi-Thu, H.; Perez-Gimenez, F.; Marrero-Ponce, Y.; Merino-Sanjuan, M.; Abad, C.; Gonzalez-Diaz, H. Multi-output Model with Box-Jenkins Operators of Quadratic Indices for Prediction of Malaria and Cancer Inhibitors Targeting Ubiquitin-Proteasome Pathway (UPP) Proteins. Curr. Protein Pept. Sci. 2016, 17, 220−227.
(13) Romero-Duran, F. J.; Alonso, N.; Yanez, M.; Caamano, O.; Garcia-Mera, X.; Gonzalez-Diaz, H. Brain-inspired Cheminformatics of Drug-target Brain Interactome, Synthesis, and Assay of TVP1022 Derivatives. Neuropharmacology 2016, 103, 270−278.
(14) Kleandrova, V. V.; Luan, F.; Gonzalez-Diaz, H.; Ruso, J. M.; Speck-Planche, A.; Cordeiro, M. N. D. S. Computational Tool for Risk Assessment of Nanomaterials: Novel QSTR-Perturbation Model for Simultaneous Prediction of Ecotoxicity and Cytotoxicity of Uncoated and Coated Nanoparticles under Multiple Experimental Conditions. Environ. Sci. Technol. 2014, 48, 14686−14694.
(15) Luan, F.; Kleandrova, V. V.; Gonzalez-Diaz, H.; Ruso, J. M.; Melo, A.; Speck-Planche, A.; Cordeiro, M. N. Computer-aided Nanotoxicology: Assessing Cytotoxicity of Nanoparticles under Diverse Experimental Conditions by Using a Novel QSTR-perturbation Approach. Nanoscale 2014, 6, 10623−10630.
(16) Alonso, N.; Caamano, O.; Romero-Duran, F. J.; Luan, F.; Cordeiro, M. N. D. S.; Yanez, M.; Gonzalez-Diaz, H.; Garcia-Mera, X. Model for High-Throughput Screening of Multitarget Drugs in Chemical Neurosciences: Synthesis, Assay, and Theoretic Study of Rasagiline Carbamates. ACS Chem. Neurosci. 2013, 4, 1393−1403.
(17) Speck-Planche, A.; Dias Soeiro Cordeiro, M. N. Speeding up Early Drug Discovery in Antiviral Research: A Fragment-Based in Silico Approach for the Design of Virtual Anti-Hepatitis C Leads. ACS Comb. Sci. 2017, 19, 501−512.
(18) Kleandrova, V. V.; Ruso, J. M.; Speck-Planche, A.; Dias Soeiro Cordeiro, M. N. Enabling the Discovery and Virtual Screening of Potent and Safe Antimicrobial Peptides. Simultaneous Prediction of Antibacterial Activity and Cytotoxicity. ACS Comb. Sci. 2016, 18, 490−498.
(19) Speck-Planche, A.; Cordeiro, M. N. Computer-aided Discovery in Antimicrobial Research: In Silico Model for Virtual Screening of Potent and Safe Anti-pseudomonas Agents. Comb. Chem. High Throughput Screening 2015, 18, 305−314.
(20) Speck-Planche, A.; Cordeiro, M. N. Simultaneous Virtual Prediction of Anti-Escherichia Coli Activities and ADMET Profiles: a Chemoinformatic Complementary Approach for High-throughput Screening. ACS Comb. Sci. 2014, 16, 78−84.
(21) Gonzalez-Diaz, H.; Arrasate, S.; Gomez-SanJuan, A.; Sotomayor, N.; Lete, E.; Besada-Porto, L.; Ruso, J. M. General Theory for Multiple Input-Output Perturbations in Complex Molecular Systems. 1. Linear QSPR Electronegativity Models in Physical, Organic, and Medicinal Chemistry. Curr. Top. Med. Chem. 2013, 13, 1713−1741.
(22) Martinez-Arzate, S. G.; Tenorio-Borroto, E.; Barbabosa Pliego, A.; Diaz-Albiter, H.; Vazquez-Chagoyan, J. C.; Gonzalez-Diaz, H. PTML Model for Proteome Mining of B-cell Epitopes and Theoretic-Experimental Study of Bm86 Protein Sequences from Colima Mexico. J. Proteome Res. 2017, 16, 4093−4103.
(23) McCullough, B. D. Microsoft Excel's 'Not The Wichmann-Hill' random number generators. Comput. Stat. Data Anal. 2008, 52, 4587−4593.
(24) Gonzalez-Diaz, H.; Perez-Montoto, L. G.; Ubeira, F. M. Model for Vaccine Design by Prediction of B-epitopes of IEDB given Perturbations in Peptide Sequence, In Vivo Process, Experimental Techniques, and Source or Host organisms. J. Immunol. Res. 2014, 2014, 768515.
(25) Hill, T.; Lewicki, P. Statistics: Methods and Applications. A Comprehensive Reference for Science, Industry and Data Mining; StatSoft: Tulsa, 2006; Vol. 1, p 813.
(26) Box, G. E. P.; Jenkins, G. M. Time Series Analysis: Forecasting and Control; Holden-Day: San Francisco, CA, 1970; p 575.
(27) Garcia, I.; Fall, Y.; Gomez, G.; Gonzalez-Diaz, H. First Computational Chemistry Multi-target Model for Anti-Alzheimer, Anti-parasitic, Anti-fungi, and Anti-bacterial Activity of GSK-3 Inhibitors In Vitro, In Vivo, and In Different Cellular Lines. Mol. Diversity 2011, 15, 561−567.
(28) Pratim Roy, P.; Paul, S.; Mitra, I.; Roy, K. On Two Novel Parameters for Validation of Predictive QSAR Models. Molecules 2009, 14, 1660−1701.
(29) Golbraikh, A.; Tropsha, A. Beware of q2! J. Mol. Graphics Modell. 2002, 20, 269−76.
(30) Simón-Vidal, L.; García-Calvo, O.; Oteo, U.; Arrasate, S.; Lete, E.; Sotomayor, N.; González-Díaz, H. Perturbation-Theory and Machine Learning (PTML) Model for High-Throughput Screening of Parham Reactions: Experimental and Theoretical Studies. J. Chem. Inf. Model. 2018, 58 (7), 1384−1396.
(31) Golub, G.; Van Loan, C. F. Matrix Computations, Third ed.; The Johns Hopkins University Press: Baltimore, 1996; p 728.