Disentangling Structural Confusion through Machine Learning


Article

Disentangling Structural Confusion through Machine Learning: Structure Prediction and Polymorphism of Equiatomic Ternary Phases ABC Anton O. Oliynyk, Lawrence A. Adutwum, Brent W. Rudyk, Harshil Pisavadia, Sogol Lotfi, Viktor Hlukhyy, James J Harynuk, Arthur Mar, and Jakoah Brgoch J. Am. Chem. Soc., Just Accepted Manuscript • DOI: 10.1021/jacs.7b08460 • Publication Date (Web): 13 Nov 2017 Downloaded from http://pubs.acs.org on November 13, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of the American Chemical Society is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Journal of the American Chemical Society

Disentangling Structural Confusion through Machine Learning: Structure Prediction and Polymorphism of Equiatomic Ternary Phases ABC

Anton O. Oliynyk,†,‡ Lawrence A. Adutwum,‡ Brent W. Rudyk,‡ Harshil Pisavadia,‡ Sogol Lotfi,† Viktor Hlukhyy,§ James J. Harynuk,‡,* Arthur Mar,‡,* Jakoah Brgoch†,*

† Department of Chemistry, University of Houston, Houston, TX 77204 USA
‡ Department of Chemistry, University of Alberta, Edmonton, AB T6G 2G2 Canada
§ Department Chemie, Technische Universität München, Garching, 85747, Germany

Abstract

A method to predict the crystal structure of equiatomic ternary compositions based only on the constituent elements was developed using cluster resolution feature selection (CR-FS) and support vector machine (SVM) classification. The supervised machine-learning model was first trained with 1037 individual compounds that adopt the most populated ternary 1:1:1 structure types (TiNiSi-, ZrNiAl-, PbFCl-, LiGaGe-, YPtAs-, UGeTe-, and LaPtSi-type), and then validated using an additional 519 compounds. The CR-FS algorithm improves class discrimination and indicates that 113 variables including size, electronegativity, number of valence electrons, and position on the periodic table (group number) influence the structure preference. The final model prediction sensitivity, specificity, and accuracy were 97.3%, 93.9%, and 96.9%, respectively, establishing that this method is capable of reliably predicting the crystal structure given only its composition. The power of CR-FS and SVM classification is further demonstrated by segregating the crystal structure of polymorphs, specifically to examine polymorphism in TiNiSi- and ZrNiAl-type structures. Analyzing 19 compositions that are experimentally reported in both structure types, this machine-learning model correctly identifies, with high confidence (>0.7), the low temperature polymorph from its high temperature form. Interestingly, machine learning also reveals that certain compositions cannot be clearly differentiated and lie in a “confused” region (0.3–0.7 confidence), suggesting that both polymorphs may be observed in a single sample under certain experimental conditions. The ensuing synthesis and characterization of TiFeP adopting both TiNiSi- and ZrNiAl-type structures in a single sample, even after long annealing times (3 months), validates the occurrence of the region of structural uncertainty predicted by machine learning.


Introduction

Predicting a compound’s crystal structure a priori has long been a goal of chemists. Historically, structure prediction has relied on simple notions based on electron count (octet rule),1 size factors (radius ratio rules),2 or position on the periodic table (Mendeleev number).3 Of course, the formation of a crystal structure is controlled by a combination of many factors. Structure maps attempting to highlight the two or three most important factors dictating structure formation were developed in the past (e.g., Mooser-Pearson,4,5 Phillips-van Vechten,6 Pettifor,7 Zunger,8 Villars9,10) with some moderate predictive ability for limited classes of compounds. Crystal structure prediction is a multidimensional problem because of the range of elemental and physical properties along with synthesis conditions that dictate which crystal structure a given substance will adopt. As the number and diversity of atoms that comprise the compound increase, and the number of possible crystal structures increases, the dimensionality of the problem increases dramatically. Fortunately, new strategies based on informatics and machine learning provide an attractive statistical approach well-suited to tackle this challenge. Chemometrics, which applies informatics and machine-learning techniques to solve multidimensional chemical problems, has been developed largely in analytical chemistry to analyze chromatograms,11 classify gasoline components,12,13 identify biomarkers,14,15 and optimize synthesis conditions.16 Among other recent examples, machine-learning methods have been applied to screen for new materials,17 calculate formation energies,18 assess stability or metastability of compounds,19,20 and predict properties of inorganic materials.21 These new approaches can be extended to solid-state organic chemistry to predict co-crystallization,22 and to solid-state inorganic chemistry to classify crystal structures.
For example, principal component analysis (PCA) was used to generate structure maps of spinel nitrides (AB2N4).23,24 A random forest algorithm enabled the prediction of full-Heusler structures and discrimination of Heusler versus inverse Heusler structures, given an arbitrary composition AB2C.25 A support vector machine (SVM) model was applied to classify structures of equiatomic compounds AB based on their composition alone, using a subset of variables obtained by the use of a cluster resolution feature selection (CR-FS) algorithm, leading to the identification of the new compound RhCd, even though most binary phases were believed to have been discovered long ago.26 The great advantage of these statistical methods is that crystal structures can be predicted for any given combination of elements within seconds, in contrast to methods based on ab initio quantum


calculations that are time-consuming and computationally demanding.27 Nevertheless, the success of statistical methods hinges on the availability of truly large-scale datasets to develop reliable training models; in this case, having more crystal structure data will generate more reliable structure predictions. Complicating crystal structure prediction further is the phenomenon of polymorphism (the occurrence of different structures with the same composition) or polytypism (a special case of polymorphism involving different stacking sequences of similar layers).28 Polymorphism is particularly important for crystal engineering in pharmaceutical chemistry, where the formation of structural modifications can greatly affect the solubility, stability, dissolution rate, and other properties of a synthesized drug.29,30 The crystallization process in these systems is challenging to understand and control because it often occurs through intermediates, which can be kinetically stabilized, leading to observation of multiple polymorphs. Further, the lifetime of metastable phases remains unclear, and thus they can be difficult to circumvent.31 Polymorphism in solid-state inorganic chemistry is just as complex. For example, SiC has more than forty known polytypes, which are difficult to distinguish experimentally because of their similar X-ray diffraction patterns.32 DFT has been the principal tool for understanding and predicting preferential polymorph formation. Machine learning has more recently been employed to investigate the polymorphism problem, although this work has been largely restricted to organic small molecules and proteins.33 It is not clear, however, if machine learning is capable of discriminating between polymorphs of crystalline inorganic solids.
Among inorganic solids, metal phosphides are especially prone to polymorphism; in fact, they account for more than 15% of all polymorph examples reported in the literature.34 As such, they present an excellent opportunity to investigate whether machine learning can be applied to predict polymorph formation. Polymorphism is particularly prevalent in binary metal-rich (M) phosphides with the general formula M2P, which tend to crystallize in either the Co2Si- or Fe2P-type structure, depending on M.35 As illustrated in Figure 1, both structures can be described as two-dimensional homologues based on the parent AlB2-type structure, with the key structural motif being tricapped trigonal prisms having M atoms at the corners and P atoms at the center. The different three-dimensional connectivity of these prisms leads to two structure types: the orthorhombic Co2Si-type structure contains zigzag sheets of corner-sharing trigonal prisms, whereas the hexagonal Fe2P-type structure contains six-membered rings enclosing an isolated


prism.36 If there are two metal components, yielding ternary phosphides with the general formula MM′P, these ordered derivatives have corresponding structures that are termed the orthorhombic TiNiSi-type37 and the hexagonal ZrNiAl-type, respectively.38 In most ternary MM′P phosphides, the larger metal atoms prefer the corner-shared site of the trigonal prism and the smaller metal atoms prefer the unshared corner site.35 The TiNiSi- and ZrNiAl-type structures are extremely common, not only for phosphides, but also for the overwhelming number of equiatomic ternary phases; remarkably, these two structure types encompass over 69% of all reported equiatomic phases regardless of composition. Adding the next five most common structure types (PbFCl, LiGaGe, YPtAs, UGeTe, LaPtSi) increases the coverage to 91% of all reported equiatomic phases.

Figure 1. Motifs of the binary Co2Si- and Fe2P-type structures and the related ternary TiNiSi- and ZrNiAl-type structures derived from the AlB2-type structure.

Given our earlier success in applying machine learning to predict the crystal structures of comparatively smaller groups of binary compounds, we wish to evaluate whether these methods


can be extended to the larger group of equiatomic ternary phases with any arbitrary composition. These phases were considered because they comprise one of the largest crystallographic datasets available, allowing a statistically robust analysis of the structure preference. We also wish to develop a protocol for predicting the occurrence of polymorphism. This approach uses a CR-FS algorithm to select the best combination of variables in conjunction with SVM, a supervised approach for machine learning, to generate quantitative probabilities that a given equiatomic ternary phase will be predicted to adopt one of the seven most common structure types. This new machine-learning approach is then employed to examine the occurrence of polymorphism in a series of inorganic compounds. Importantly, we test these predictions by performing synthetic experiments on a ternary phosphide, TiFeP, which exhibits an unusual case of coexisting polymorphs, as confirmed by investigations of their crystal structures, physical properties, and chemical bonding.

Experimental

Machine learning structure prediction

To develop the machine-learning model, crystallographic data for ternary equiatomic compounds with the general formula ABC were extracted from Pearson’s Crystal Data (2015 edition),34 subject to the following criteria: (i) the components A, B, C can be any elements, excluding hydrogen, noble gases, and heavy elements with Z > 83 (i.e., radioactive elements and actinides), (ii) the crystal structures do not exhibit site mixing, and (iii) the compounds have been confirmed to exist and their structures were experimentally determined (i.e., not hypothetical ones proposed from calculations). Out of ~2800 entries satisfying these conditions, 2562 (91%) were found to belong to seven different structure types. At least 30 representatives are contained within each of these structure types, ensuring statistical reliability. Having a substantial number of representatives for each structure type is essential to allow a statistically robust prediction. If data are too sparse for a given structure type, it is likely that errors will be introduced into the model and the structure preference will not be reliably determined. Removing multiple reports of the same crystal structures reduced the set to 1556 unique compounds for use in the machine-learning model. Because the compositions of these compounds are diverse, an important step is to standardize the assignment of component elements (A, B, C) and their positional coordinates (x,


y, z) (Table S1 in Supporting Information). In other words, the occupation of sites by the component elements already conveys physical and chemical meaning even without input of specific crystallographic data. The assignment of component elements could be related to the position in the periodic table or to particular electronegativity scales, among various possibilities. We chose to assign these components based on chemical groups in the periodic table, because of their unambiguous definitions, such that A is generally a large electropositive atom (alkali, alkaline-earth, or rare-earth metals; Ti, Zr, Hf), B is typically a d-block element, and C is a p-block metalloid. An assignment based on electronegativities was eschewed because having to select one over several possible scales introduces bias; moreover, elements with quite different chemical properties may have nearly identical electronegativities (e.g., Au, P, S on the Pauling scale).39 Variables describing the elemental properties used for feature selection were compiled from open sources and databases; they had well-defined values or were interpolated on rare occasions such as for the lanthanide series. These variables provide information related to position in the periodic table (e.g., atomic number), electronic structure (e.g., electronegativity, quantum numbers), and physical properties (e.g., heat of atomization), among others. Additional subsets of variables were generated from mathematical formulae involving simple combinations of these elemental properties (e.g., sums, averages, ratios) or their extremes (e.g., largest or smallest values). In total, the complete list comprised 990 individual descriptors for feature selection and machine learning (Table S2 in Supporting Information). The data for the 1556 unique ABC compounds and 990 variables were recorded as entries within a 1556 × 990 matrix.
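As an illustration of how combined descriptors of this kind might be generated from a single elemental property, the following minimal Python sketch builds a few sums, averages, extremes, and a ratio for one ABC composition. The function name `combine_descriptors`, the descriptor names, and the particular set of combinations are assumptions made here for illustration; the actual 990 descriptors are those listed in Table S2, and the original work used Matlab rather than Python.

```python
from statistics import mean


def combine_descriptors(name, values):
    """Combine one elemental property of components (A, B, C) into simple descriptors.

    Hypothetical helper: the combinations below are examples only,
    not the descriptor set used in the paper.
    """
    a, b, c = values
    return {
        f"{name}_sum": a + b + c,
        f"{name}_mean": mean(values),
        f"{name}_max": max(values),                 # largest value among A, B, C
        f"{name}_min": min(values),                 # smallest value among A, B, C
        f"{name}_range": max(values) - min(values),
        f"{name}_ratio_A_C": a / c,                 # one possible ratio
    }


# Atomic numbers for TiFeP with A = Ti (22), B = Fe (26), C = P (15).
row = combine_descriptors("Z", (22, 26, 15))
```

Repeating such a call for every elemental property and concatenating the results would give one row of a compounds × descriptors matrix.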
These data were normalized, mean-centered, and scaled to unit variances along the columns (variables) of the matrix. The set of 1556 compounds was split into two parts to ensure unbiased results: two-thirds (1037 compounds) were used for training to build the model, and one-third (519 compounds) were used for external validation. Within the training data set of 1037 compounds, half were used for feature selection and half for model training (optimization). The entire set of 990 variables was evaluated through the CR-FS algorithm40 in a three-dimensional PCA score space (PC1 vs PC2 vs PC3), with each variable ranked by a Fisher ratio (F-ratio).41 The machine-learning algorithm employed SVM classification with a radial basis function and a venetian blind cross-validation with 10-fold data splits to optimize the model.42 A


binary classification was applied; e.g., Class 1 vs. all other classes, Class 2 vs. all other classes, and so on. The segregation of structure types was quantified by sensitivity (true positive rate), specificity (true negative rate), and accuracy (overall correct rate), as summarized in Table S3 in Supporting Information. Decision barriers for each class were assigned based on Bayesian statistics, and generally represent the lines located between the lowest probability of correctly assigned structure type and the highest probability of incorrectly assigned structure type. Predictions of structure type for unseen samples were made on the remaining third of the compound data set. Finally, predictions were also made on new polymorphic compounds not included in the initial data set. Once the machine-learning model was finalized, all possible combinations of elements satisfying the extraction criteria were determined, yielding 98,769 compounds. The probability for each of these compositions to adopt the seven structure types was then predicted. These results are provided as Supporting Information and can be used to estimate the structure preference for a majority of equiatomic, stoichiometric compounds. Data handling and feature selection were performed with in-house written algorithms in Matlab 2015a (The Mathworks, Natick, MA). SVM models were generated using PLS Toolbox Version 8.0.1 (Eigenvector Research Inc., Wenatchee, WA). All computations were performed on a Windows PC (Intel® Core™ i7-4790 CPU).
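The preprocessing, cross-validation split, and figures of merit described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Matlab/PLS Toolbox code used in the work: the function names are hypothetical, the autoscaling uses the sample standard deviation, and the venetian blind scheme is taken in its simplest form (sample i goes to fold i mod s).

```python
from statistics import mean, stdev


def autoscale(columns):
    """Mean-center each column and scale it to unit sample variance."""
    scaled = []
    for col in columns:
        mu, sd = mean(col), stdev(col)
        # Assumption: constant columns are mapped to zeros instead of dividing by 0.
        scaled.append([(x - mu) / sd if sd > 0 else 0.0 for x in col])
    return scaled


def venetian_blind_folds(n_samples, n_splits=10):
    """Assign sample i to fold i mod n_splits, so each fold takes every n-th sample."""
    folds = [[] for _ in range(n_splits)]
    for i in range(n_samples):
        folds[i % n_splits].append(i)
    return folds


def binary_metrics(y_true, y_pred):
    """Figures of merit for one class-vs-rest problem (True = belongs to the class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return {
        "sensitivity": tp / (tp + fn),      # true positive rate
        "specificity": tn / (tn + fp),      # true negative rate
        "accuracy": (tp + tn) / len(pairs), # overall correct rate
    }
```

For a seven-class problem treated as seven class-vs-rest classifiers, `binary_metrics` would be evaluated once per structure type, with `y_true` marking membership in that type.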

Synthesis of TiFeP

To verify predictions made by machine learning, TiFeP was selected as a target compound to synthesize because of the discrepancies in numerous experimental reports43-47 leading to uncertainty whether it should adopt TiNiSi- or ZrNiAl-type structure. The synthesis proceeded by preparing 1.5 g of binary FeP starting material from a stoichiometric mixture of Fe powder (99.9%, Cerac) and P powder obtained by freshly grinding red phosphorus lumps (99.999%, Sigma-Aldrich) to minimize exposure to air and water. The mixture was pressed into a pellet, placed within an evacuated and sealed fused-silica tube, and sintered at 1070 K for two weeks using heating and cooling rates of 20 K/h. Using binary FeP as a starting material avoids problems with directly arc-melting elemental phosphorus, which could lead to formation of white phosphorus. The product was then ground to fine powder and confirmed to be phase-pure FeP by its powder X-ray diffraction pattern, collected on an Inel diffractometer equipped with a


curved position-sensitive detector (CPS 120) and a Cu Kα1 (λ = 1.540598 Å) radiation source. Next, a stoichiometric mixture of Ti powder (99.98%, Cerac) and FeP powder with a total mass of 0.5 g was pressed into a pellet and arc-melted in a Centorr 5TA tri-arc furnace on a water-cooled copper hearth under an argon atmosphere. The sample was flipped and melted again. The weight loss was […]

[…] (>69%) of all reported ternary ABC phases; probabilities for adopting the remaining five structure types are shown in the Supporting Information (Figures S1–S5). For each structure type, there is a stark segregation between true positives (data points with high probabilities, lying well above the decision barrier) and true negatives (data points with low

Figure 2. Plots of probabilities of forming the TiNiSi- (upper panel) and ZrNiAl-type structures (lower panel), showing unoptimized segregation using the full-feature model. The dashed line is the decision barrier determined by Bayesian statistics. Symbols: TiNiSi (orange circles), ZrNiAl (blue squares), LiGaGe (green diamonds), PbFCl (orange stars), YPtAs (purple down-pointing triangles), UGeTe (green up-pointing triangles), LaPtSi (orange side-pointing triangles).


probabilities, lying well below the decision barrier). The structure predictions are statistically robust, although there is a slightly wider spread of data points within the validation sets. For the TiNiSi- and ZrNiAl-type structures classified in Figure 2, a small fraction (3% of the training set and 10% of the validation set) of the data points are misclassified. For example, the model predicts that MnCoAs and NbRuP adopt the ZrNiAl-type structure, which is in disagreement with experimental reports of TiNiSi-type structure. Similarly, TiCoGe and MoNiP are predicted to adopt the TiNiSi-type structure, in disagreement with experimental reports of ZrNiAl-type structure. The misclassification of some of the data points likely stems from the attribution of equal weights to the full set of 990 variables, which leads to overfitting and is a common problem in SVM algorithms.42 When the data are over-described, the redundancy in variables introduces additional noise, which can overpower important variables. To reduce the number of variables and to identify only the most important ones for classifying crystal structures, CR-FS was applied to perform an unbiased variable selection.40 The F-ratio, or the ratio of between-group and within-group variability, was calculated to evaluate the potential of each variable to discriminate between classes.41 CR-FS relies on a model quality parameter termed cluster resolution. The cluster resolution is the measure of the separation between samples clustered according to their class assignment in principal component scores space. It is estimated from the product of the maximum size of non-colliding confidence ellipses/ellipsoids generated around these sample clusters. A single output parameter on a scale of 0 to 1, with 0 being the worst and 1 being the best model, is obtained and gives information about the overall model quality.
The highest-ranked variables according to the F-ratio scores were then subjected to backward elimination, in which the sensitivity, specificity, and accuracy of the structure prediction are compared before and after a variable is removed from the model. Next, the variables were subjected to forward selection, in which variables that were initially lower-scored are included, and the model quality is again evaluated before and after this inclusion. Through a sequential process of backward elimination and forward selection, variables can be eliminated or added, and their influence on the model determined by comparing the cluster resolution before and after the addition/elimination of that variable. CR-FS can, therefore, determine which combination of variables is essential for robust structure prediction.
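To make the variable-ranking step concrete, the sketch below computes an F-ratio for a single variable as the between-group variance divided by the within-group variance. It assumes the standard one-way ANOVA form of this ratio; the text does not spell out the exact expression used, so the degrees-of-freedom normalization and the function name `fisher_ratio` are assumptions, and the cluster-resolution calculation itself is not reproduced here.

```python
from statistics import mean


def fisher_ratio(values, labels):
    """Between-group over within-group variance for one variable (one-way ANOVA form)."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)

    grand_mean = mean(values)
    n, k = len(values), len(groups)

    # Between-group mean square: spread of the class means around the grand mean.
    between = sum(
        len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values()
    ) / (k - 1)

    # Within-group mean square: spread of samples around their own class mean.
    within = sum(
        (v - mean(g)) ** 2 for g in groups.values() for v in g
    ) / (n - k)

    return between / within


def rank_variables(columns, labels):
    """Return column indices sorted from highest to lowest F-ratio."""
    scores = [fisher_ratio(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
```

A ranking of this kind would only provide the starting order; in the procedure described above, the decision to keep or drop a variable is made by comparing model quality before and after its removal or inclusion.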


Figure 3. Fisher ratio (F-ratio) scores for the full variable set of 990 variables (blue) as well as the 113 variables highlighted by the CR-FS (orange).

CR-FS indicated that only 113 variables, or 11% of the original variable set, were required for accurate classification of structure type (orange lines in Figure 3). Removing the variables that were not selected with CR-FS improves the quality of crystal class segregation, because some of the variables appear to be noise without positive impact on the model. Interestingly, some initially high-ranked variables (according to F-ratio scores) did not survive feature selection, whereas some initially low-ranked variables were found to be helpful in discriminating structure types; their relative potential to classify crystal structures is shown in Figure 3. Examination of the CR-FS results lends insight into the factors that influence the crystal structure selection for ABC phases. In broad terms, out of the types of variables considered, size (35%), electronegativity (10%), position of element in the periodic table (12%), and electron


count (10%) were most important, whereas physical properties (33%) were not. Closer inspection reveals whether a set of variables as a whole or an individual variable within this set plays a key role. Among size variables, those based on metallic and Zunger pseudopotential radii, but not covalent, ionic, or single-bond radii, strongly affected the model. In agreement with the previous structure maps of Villars,9 the sum of Zunger radii, which represents an interesting bridge between first-principles and semiclassical approaches, was identified to be significant. Among electronegativity variables, none of the five different scales (Pauling,59 Martynov-Batsanov,60 Gordy,61 Mulliken,62 Allred-Rochow63) was particularly emphasized. This observation is consistent with the fact that many of these equiatomic ternary phases, which are composed of metals and metalloids, possess intermediate bonding character so that no one scale dominates in segregating their structure types. Among variables based on the position of element in the periodic table, the Mendeleev number stands out, with emphasis placed on separating nonmetals, metalloids, and metals among p-block elements. The most important electron counting relates to the sum of valence electrons. This is a case in which a particular mathematical expression (sum), as opposed to the set as a whole (sum, difference, average, and others), is stressed. The number of valence electrons, which can belong to different shells, such as 3s and 4d in transition metals, matters, especially for the most electron-rich component in the formula ABC, whereas the number of outer-shell electrons coming from the shell with the highest principal quantum number is irrelevant.
Identifying the components with the largest and smallest number of electrons (and the sum and difference of these electron counts) helps to distinguish between compounds containing similar (e.g., all transition metals) and diverse elements (e.g., alkali metal–transition metal–nonmetal). Among variables based on physical properties, only density (which relates to size factors) and electrical conductivity (which distinguishes between metals, metalloids, and nonmetals) are relevant. With this new subset of variables selected by CR-FS, the classification prediction model was reconstructed using SVM. The discrimination of the seven major structure types was dramatically improved, as exemplified by the plots of probability for the TiNiSi- and ZrNiAl-type structures (Figure 4); the probabilities for the other five structure types are also shown in Figures S1–S5 in Supporting Information. Correct predictions were now made for 93% of the compounds examined, or an 8% improvement compared to the earlier procedure without CR-FS. The prediction rate of true negatives, particularly for TiNiSi- and ZrNiAl-type structures,


improved to 92% and 93%, respectively. Only a few misclassifications remain in the training set and they mainly involve unusual combinations of ABC; for example, compounds containing highly electronegative transition metals (Rh, Pd, Ir, Au) acting in an uncharacteristic role as a component that is normally a metalloid or nonmetal. The complete list of misclassified compounds is available in Table S4 in Supporting Information.

Figure 4. Plots of probabilities of forming the TiNiSi- (upper panel) and ZrNiAl-type structures (lower panel), showing improved segregation using the CR-FS model. The dashed line is the decision barrier determined by Bayesian statistics. Symbols have the same meaning as in Figure 2.

Machine learning structure prediction for classification of polymorphs

The SVM algorithm combined with the CR-FS feature set is undoubtedly effective in classifying structure types based solely on composition. However, it is not clear if this methodology can be applied to more complicated structural phenomena, such as the formation of polymorphs. Thus, it would be interesting to evaluate probabilities for compounds that are experimentally known to adopt both TiNiSi- and ZrNiAl-type structures. There are 35 equiatomic ternary compositions reported in both structure types, depending on the temperature or pressure of the synthesis, or if the crystals were grown in the presence of a metal flux (Table S5 in Supporting Information). Among these, 19 involved annealing that can be assumed to

Journal of the American Chemical Society

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 32

approach equilibrium conditions. The probability for each of these compounds to adopt either TiNiSi- or ZrNiAl-type structures was thus calculated using the SVM/CR-FS model (Figure 5). High probabilities (>0.7) of forming one or the other of these structure types are found for 11 of the 19 compounds. For example, the TiNiSi-type structure is correctly predicted for YbMnGe,64 YAlPd,65 TmMnGe,66 LuPbNi,67,68 GdAlPd,65 HfRhGe,69 and TbAlPd,65 whereas a ZrNiAl-type structure is predicted for YbSnPd,70 CeZnPd,71 and TbSnPt.72 These predictions completely agree with the experimental structures adopted at low temperature. The sole exception occurs in GdAlPd, which adopts the TiNiSi-type structure at room temperature but as a ZrNiAl-type structure at a very low temperature of 180 K.

Figure 5. Predicted probability of belonging to a given structure type for 19 experimentally confirmed polymorphs that adopt either the TiNiSi- or ZrNiAl-type structure. The region highlighted in gray indicates the compositions that are experimentally reported in both structure types even at near-equilibrium conditions.

It is worth noting that the limited number of temperature-dependent cases for 1:1:1 polymorphs, and the number of correctly identified low-temperature modifications, do not allow us to draw statistically meaningful conclusions about accuracy. Further, the proposed machine-learning tool cannot substitute for DFT calculations in identifying lower-energy modifications within a certain temperature range. Instead, these results suggest that structure classification using machine learning is well suited as a recommendation tool that flags when adoption of a particular structure type is known for chemically similar systems. It can also help determine the most probable crystal structures for isostructural series, or predict the structures of novel compounds containing specific elements. This tool can also help interpret crystal chemistry. For example, TiNiSi- and ZrNiAl-type structures are typical for compounds containing three transition metals or p-block metalloid systems, whereas PbClF-type structures usually contain two nonmetal elements.

For the remaining 8 of the 19 polymorphs in this set, the probability falls in an ambiguous range (0.3–0.7), as highlighted in the shaded region of Figure 5. Experimental reports reveal that these compounds (ScRhGe,73 MnNiP,74 YSnPt,72 TiPdGe,75 ScRuSi,73,76 ScCuSi,76,77 MnNiAs,74 and TiFeP) are highly unusual in that each is reported in both structure types under identical synthesis conditions (Table S5 in Supporting Information). Typically these compounds were obtained by arc-melting and annealing at high temperatures over relatively long periods (e.g., 1270 K for 2 weeks for ScRhGe73 and ScRuSi,76 1073 K for a month for ScCuSi76), with X-ray diffraction indicating the presence of both structure types regardless of annealing.

The case of TiFeP is especially complicated. Early reports indicated that either sintering (at 1070 K for 2 weeks)43 or arc-melting without subsequent annealing44,45 resulted in the TiNiSi-type polymorph, according to powder X-ray diffraction. Later, an investigation of the Ti–Fe–P phase diagram at 1070 K46 revealed the existence of TiFeP adopting the ZrNiAl-type structure, according to single-crystal X-ray diffraction, and also suggested that the TiNiSi-type structure is only stabilized when there is an Fe deficiency, Ti1+xFe1–xP (x = 0.2).
However, this is contradicted by a recent report of fully stoichiometric TiFeP existing in the TiNiSi-type structure (synthesized by sintering at 1673 K), according to powder X-ray diffraction.47 Remarkably, machine learning captures this confusion, with the model showing a nearly equal preference for both structure types. The interpretation of these intermediate probabilities is that neither the TiNiSi- nor the ZrNiAl-type structure is strongly favored, and thus both phases have a nearly equal chance of existing. The only exception is MnNiAs, which shows a definite machine-learning prediction preference (probability of 0.85) for forming the ZrNiAl-type structure, even though both structure types are experimentally observed.
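The way the probability bands are read in this section can be summarized in a short sketch: a structure type is recommended only when its probability exceeds 0.7, and the composition is otherwise flagged as ambiguous. The function name `triage` and the data layout are assumptions made for illustration. Only the 0.85 value for MnNiAs comes from the text; the complementary 0.15 and the second entry are invented placeholders.

```python
HIGH = 0.7  # threshold quoted in the text for a confident prediction


def triage(probabilities, high=HIGH):
    """Label a composition from per-structure-type probabilities.

    `probabilities` maps a structure-type name to a model probability.
    Returns the structure type whose probability exceeds `high`, or
    "ambiguous" when no structure type does.
    """
    best_type = max(probabilities, key=probabilities.get)
    if probabilities[best_type] > high:
        return best_type
    return "ambiguous"


# Only the 0.85 for MnNiAs is taken from the text; every other number
# below is a made-up placeholder used to exercise the function.
examples = {
    "MnNiAs": {"TiNiSi": 0.15, "ZrNiAl": 0.85},
    "hypothetical-ABC": {"TiNiSi": 0.52, "ZrNiAl": 0.48},
}

for name, probs in examples.items():
    print(name, "->", triage(probs))
```

Compositions labeled ambiguous by such a rule are the ones worth checking experimentally for coexisting polymorphs.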

Synthesis, structures, and bonding in TiFeP polymorphs


To clarify the conflicting descriptions of TiFeP reported in the literature, as well as the uncertainty indicated by machine learning, extensive new synthetic experiments were conducted in which mixtures of FeP and Ti were arc-melted and then annealed at 1070 K for 3 months, an extremely long time that should suffice to approach equilibrium conditions. This synthesis was repeated for 16 individual samples under identical conditions. The products consisted of two-phase mixtures, with majority TiNiSi-type phase in 5 (31%), majority ZrNiAl-type phase in 7 (44%), and nearly equal amounts of both phases in 4 (25%) of the samples. These results are surprising because normally such a long annealing time should lead to formation of the thermodynamically stable phase at a given temperature. Moreover, when these samples were subjected to further annealing at different temperatures (870 and 1270 K for two weeks), their powder X-ray diffraction patterns did not change. The presence of both structure types appears to defy the Gibbs phase rule, which permits only one thermodynamically stable polymorph at a given composition, temperature, and pressure. However, compositional differences throughout the sample or kinetic stability may explain the coexistence of more than one polymorph.
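The phase-rule argument can be made explicit with a short worked equation. It rests on a simplifying assumption introduced here: stoichiometric TiFeP is treated as a single component at fixed pressure.

```latex
\begin{align*}
F &= C - P + 2 && \text{(Gibbs phase rule)} \\
F_{p\,=\,\mathrm{const}} &= C - P + 1 && \text{(pressure fixed)} \\
F_{p\,=\,\mathrm{const}} &= 1 - 2 + 1 = 0 && \text{(one component, two coexisting polymorphs)}
\end{align*}
```

With zero degrees of freedom, two polymorphs of fixed composition can coexist at equilibrium only at a single transition temperature. Their persistence after annealing at 870, 1070, and 1270 K is therefore not expected for a system at equilibrium under this idealization.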

Figure 6. SEM backscattered electron micrographs of (a) TiFeP (TiNiSi-type), (b) TiFeP (ZrNiAl-type), and (c) sample containing both phases.

Given that compositional inhomogeneities have been previously reported in this ternary system, as noted above, these samples were first examined by SEM using backscattered electron imaging. There is no difference in contrast in the images of samples containing nearly single-phase TiNiSi-type or ZrNiAl-type and two-phase mixtures of both structure types, implying uniform distribution of the elements (Figure 6). The compositions were determined by semiquantitative EDX analysis, averaged over 10 individual points across the sample, to be Ti1.09(1)Fe1.00(1)P0.94(1) for the near single-phase TiNiSi-type sample and Ti1.03(1)Fe1.00(1)P1.00(1) for the ZrNiAl-type sample. Both are close to the nominally loaded stoichiometries, supporting the equiatomic composition. One of the two-phase samples was also analyzed through powder X-ray diffraction, which confirms the presence of both phases even after the long annealing step (Figure 7). Rietveld refinement of the X-ray diffraction pattern reveals the phase fractions to be 47(5)% TiFeP (TiNiSi-type) and 53(5)% TiFeP (ZrNiAl-type) (Figure S6 in Supporting Information). Within standard deviations, the sample contains equal amounts of both phases.

A concern that can be raised is that one of the polymorphs could be stabilized by interstitial impurities, such as oxygen atoms, as has been documented for some intermetallic compounds. For example, TbNiAl in the hexagonal ZrNiAl-type structure has been reported to undergo an orthorhombic distortion depending on the presence of hydrogen or deuterium.78 However, a complete reconstructive transformation of the TiFeP polymorphs, depending on the presence of oxygen impurities, is highly improbable because there is simply not enough void space available. The largest voids are ~0.07 Å3 in either structure, at physically unrealistic distances of […]

[…] 2σ(Fo2)                  280              182
No. of variables             19               14
R(F) for Fo2 > 2σ(Fo2) a     0.0434           0.0089
Rw(Fo2) b                    0.1109           0.0192
Goodness of fit              1.168            1.136
(∆ρ)max, (∆ρ)min (e Å–3)     2.934, –1.036    0.347, –0.216

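a $R(F) = \sum \bigl| |F_o| - |F_c| \bigr| \big/ \sum |F_o|$.

b $R_w(F_o^2) = \left[ \sum \left[ w (F_o^2 - F_c^2)^2 \right] \big/ \sum w F_o^4 \right]^{1/2}$; $w^{-1} = \left[ \sigma^2(F_o^2) + (Ap)^2 + Bp \right]$, where $p = \left[ \max(F_o^2, 0) + 2F_c^2 \right] / 3$.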
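The residual definitions in the footnotes above can be written out as a minimal sketch. The function names and the toy reflection lists are assumptions made for illustration. The weighting parameters `a` and `b` correspond to A and B in footnote b, whose refined values are not given in the text.

```python
import math


def r_factor(f_obs, f_calc):
    """R(F) = sum(| |Fo| - |Fc| |) / sum(|Fo|)."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def weighted_r_factor(fo_sq, fc_sq, sigma_fo_sq, a=0.0, b=0.0):
    """Rw(Fo^2) with weights 1/w = sigma^2(Fo^2) + (a*p)^2 + b*p,
    where p = [max(Fo^2, 0) + 2*Fc^2] / 3."""
    numerator = 0.0
    denominator = 0.0
    for fo2, fc2, sigma in zip(fo_sq, fc_sq, sigma_fo_sq):
        p = (max(fo2, 0.0) + 2.0 * fc2) / 3.0
        weight = 1.0 / (sigma ** 2 + (a * p) ** 2 + b * p)
        numerator += weight * (fo2 - fc2) ** 2
        denominator += weight * fo2 ** 2
    return math.sqrt(numerator / denominator)


# Toy values, not measured data.
print(r_factor([10.0, 20.0], [9.0, 22.0]))
print(weighted_r_factor([100.0, 400.0], [95.0, 410.0], [2.0, 4.0], a=0.02, b=0.5))
```

Because Rw is computed on squared structure factors, it is typically larger than R(F) for the same refinement, as in the table above.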

Table 2. Atomic coordinates for (a) TiFeP (TiNiSi-type) and (b) TiFeP (ZrNiAl-type) refined from single-crystal X-ray diffraction

atom  Wyckoff position  x           y    z          Ueq (Å2) a
(a) TiFeP (TiNiSi-type)
Ti    4c                0.0253(2)   1/4  0.6769(2)  0.0070(3)
Fe    4c                0.1473(2)   1/4  0.0609(2)  0.0084(3)
P     4c                0.2703(3)   1/4  0.3760(3)  0.0074(4)
(b) TiFeP (ZrNiAl-type)
Ti    3f                0.41574(8)  0    0          0.00646(1)
Fe    3g                0.75332(5)  0    1/2        0.00820(9)
P1    1a                0           0    0          0.0081(2)
P2    2d                1/3         2/3  1/2        0.0061(1)

a Ueq is defined as one-third of the trace of the orthogonalized Uij tensor.
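Values such as 0.0253(2) in Table 2, or the Rietveld phase fractions 47(5)% and 53(5)%, use the usual crystallographic shorthand in which the digits in parentheses give the standard uncertainty in the last quoted decimal place. The sketch below parses that notation and checks whether a target value lies within one standard uncertainty. The helper names are hypothetical, and the one-standard-uncertainty criterion is an assumption chosen for illustration.

```python
import re

# Matches strings such as "0.0253(2)" or "47(5)".
_ESD_PATTERN = re.compile(r"^(-?\d+)(?:\.(\d+))?\((\d+)\)$")


def parse_esd(text):
    """Split 'value(esd)' notation into (value, standard_uncertainty)."""
    match = _ESD_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not in value(esd) form: {text!r}")
    integer_part, decimals, esd_digits = match.groups()
    decimals = decimals or ""
    value = float(f"{integer_part}.{decimals}" if decimals else integer_part)
    # The uncertainty applies to the last quoted decimal place.
    esd = int(esd_digits) * 10.0 ** (-len(decimals))
    return value, esd


def consistent_with(text, target):
    """True if `target` lies within one standard uncertainty of the value."""
    value, esd = parse_esd(text)
    return abs(value - target) <= esd


print(parse_esd("0.0253(2)"))
print(consistent_with("47(5)", 50.0), consistent_with("53(5)", 50.0))
```

Both refined phase fractions lie within one standard uncertainty of 50%, which is the sense in which the two-phase sample contains equal amounts of the polymorphs.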

Figure 8. Band dispersion (left panels) and density of states curves (right panels) for (a) TiFeP (TiNiSi-type) and (b) TiFeP (ZrNiAl-type).

DFT calculations provide insight into the origin of the colors of the two polymorphs. Consistent with the fact that these are metal-rich phases, the density of states (DOS) curves show no gap at the Fermi level (Figure 8). However, close inspection of the band dispersion diagrams reveals an interesting difference. For the hexagonal ZrNiAl-type polymorph, there is a band gap of about 1 eV along special directions (A–L–H–A) in reciprocal space, which can be related to the ab-plane in real space, that is, parallel to the waists of the P-centered trigonal prisms. The states undergo little dispersion along these directions, leading to sharp spikes in the DOS. Similar to the criteria for the appearance of color in intermetallic compounds,80 these features permit absorption of radiation with an edge in the visible region, resulting in the gold color observed for the ZrNiAl-type polymorph of TiFeP.

The Helmholtz free energy was also calculated using DFT for both polymorphs to estimate the most energetically favorable structure as a function of temperature. The results of these calculations, provided in Figure S9, indicate that the hexagonal modification (ZrNiAl-type) is slightly (0.16 kJ/mol) preferred at 0 K, with a stronger preference of 1.73 kJ/mol at 1500 K. Although the hexagonal structure type is the lowest-energy structure across the entire temperature range examined, the difference in Helmholtz free energies is very small, supporting the observation of both structure types in a single sample even at near-equilibrium conditions.

The electronic structures of the two polymorphs of TiFeP were further examined through XPS, to provide additional evidence for the compositions and to probe valence states. From the survey spectra (not shown), the elemental compositions of single-phase samples were found to be Ti1.05(2)Fe0.98(2)P0.98(2) for the TiNiSi-type polymorph and Ti1.02(2)Fe0.96(2)P0.98(2) for the ZrNiAl-type polymorph, consistent with the results presented above.
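For a sense of scale, the calculated free-energy preference quoted above for 1500 K (1.73 kJ/mol) can be compared with the thermal energy RT at the same temperature. This is only a rough yardstick introduced here, not an analysis from the calculations themselves, and the function name is hypothetical.

```python
R_KJ = 8.314462618e-3  # molar gas constant in kJ mol^-1 K^-1


def ratio_to_thermal_energy(delta_f_kj_per_mol, temperature_k):
    """Return delta_F / (R * T), a dimensionless measure of how large a
    free-energy difference is relative to the thermal energy."""
    return delta_f_kj_per_mol / (R_KJ * temperature_k)


ratio = ratio_to_thermal_energy(1.73, 1500.0)
print(f"RT at 1500 K = {R_KJ * 1500.0:.2f} kJ/mol")
print(f"delta_F / RT = {ratio:.2f}")
```

The free-energy difference amounts to only a small fraction of RT at 1500 K, which fits the description of the two polymorphs as nearly degenerate.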

Figure 9. (a) Ti 2p, (b) Fe 2p, and (c) P 2p XPS spectra for TiFeP samples. The vertical grey dashed lines mark the BEs found for the pure elements.

High-resolution Ti 2p, Fe 2p, and P 2p XPS spectra were collected on single-phase TiNiSi-type, single-phase ZrNiAl-type, and two-phase mixtures of TiFeP (Figure 9). Each of these 2p spectra consists of a set of two peaks corresponding to 2p3/2 and 2p1/2 spin-orbit-coupled final states, fitted in an intensity ratio of 2:1. The spectra are essentially identical for all three samples, which is not surprising considering the similar local coordination environments in the structures of the two polymorphs. The binding energies (BE) of the Ti 2p, Fe 2p, and P 2p peaks remain unchanged, consistent with the presence of P atoms centered within trigonal prismatic environments of metal atoms in both structures. The resemblance extends to related binary phosphides MP49 and M2P,48 which are also built up of P-centered trigonal prisms and have similar P 2p XPS spectra. The metal spectra and binding energy values are similar to those in the previously reported binary phosphides, strongly suggesting that no undesirable oxidation took place in the samples.

The peak widths of the P 2p spectra for the TiFeP samples are narrow, with FWHM of 0.7 eV, similar to those in the binary phosphides MP.49 The absence of peak broadening implies that there is no disorder of the Ti and Fe atoms in any of the TiFeP samples. In both TiFeP polymorphs, the P 2p3/2 BE is 128.8 eV, which is similar to that in other binary phosphides MP49 and less than in elemental phosphorus (129.9 eV), consistent with the presence of anionic phosphorus in the form of P1– species.81 The Ti 2p and Fe 2p spectra of the TiFeP polymorphs show a slight asymmetry of the peaks to higher BE, a feature that is frequently attributed to the occurrence of metal-metal bonding and electronic delocalization, as in the spectra of the related binary phosphides M2P.48 The Ti 2p3/2 and Fe 2p3/2 BEs are similar to those in the elemental metals.
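Two of the numbers used in the XPS analysis follow from short calculations. The 2:1 intensity ratio imposed in the fits is the ratio of the (2j + 1) degeneracies of the two spin-orbit components, and the P 2p3/2 shift relative to elemental phosphorus is the difference of the two binding energies quoted above. The symbol ΔBE is introduced here for illustration.

```latex
\begin{align*}
\frac{I(2p_{3/2})}{I(2p_{1/2})} &= \frac{2\left(\tfrac{3}{2}\right) + 1}{2\left(\tfrac{1}{2}\right) + 1} = \frac{4}{2} = 2 \\
\Delta \mathrm{BE}(\mathrm{P}\ 2p_{3/2}) &= 128.8\ \mathrm{eV} - 129.9\ \mathrm{eV} = -1.1\ \mathrm{eV}
\end{align*}
```

The negative shift of about 1.1 eV is the quantitative basis for describing the phosphorus in TiFeP as anionic.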

Electrical and magnetic properties of TiFeP polymorphs

The electrical resistivities of the two polymorphs of TiFeP (Figure 10) decrease with decreasing temperature, consistent with the metallic behavior predicted from the electronic band structures. The low-temperature absolute resistivities are similar, about 800–900 µΩ⋅cm for both polymorphs at 2 K. The relative resistivity ratios ρ300K/ρ2K are 2.1 for the TiNiSi-type and 1.4 for the ZrNiAl-type polymorphs.

Magnetic properties of the two polymorphs were measured from 2 to 300 K, and reproduced on two separate magnetometers (Figure 11). For the TiNiSi-type polymorph, the temperature dependence of the magnetic susceptibility reveals a transition from paramagnetic to ferromagnetic behavior at 200 K. The linear portion of the inverse magnetic susceptibility (from 200 to 300 K) was fit to the Curie-Weiss law, χ = C / (T – θp), yielding a paramagnetic Weiss temperature θp of 198 K. The isothermal magnetization at 2 K shows a saturation magnetization of 0.25 emu/mol consistent with soft ferromagnetism, with a low coercive field of 2000 Oe.


These results disagree with the Pauli paramagnetic behavior recently reported for the orthorhombic modification of TiFeP from 2 to 300 K.47 The discrepancy could be attributed to compositional differences between the sample reported here and those reported previously. Slight deviations in the Ti/Fe ratio will likely influence the magnetic response in these phases. For the ZrNiAl-type polymorph, ferromagnetic behavior is also indicated from the temperature-dependent magnetic susceptibility, but with an apparently much higher Curie temperature, above 300 K, and a significantly lower saturation magnetization of 1000 Oe, which can be explained by the presence of elemental iron in the measured sample, as indicated by its powder X-ray diffraction pattern and likely introduced when the sample was crushed (see Figure S10).
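The Curie-Weiss analysis described for the TiNiSi-type polymorph amounts to a straight-line fit of 1/χ against T, because χ = C / (T – θp) implies 1/χ = T/C – θp/C. The sketch below performs that fit by ordinary least squares on synthetic data. The susceptibility values are generated from an assumed C and from θp = 198 K purely to exercise the code; they are not the measured data, and the function name is hypothetical.

```python
def fit_curie_weiss(temperatures, susceptibilities):
    """Fit 1/chi = T/C - theta_p/C by least squares.

    Returns (C, theta_p): the slope of 1/chi versus T is 1/C and the
    intercept is -theta_p/C.
    """
    inv_chi = [1.0 / chi for chi in susceptibilities]
    n = len(temperatures)
    mean_t = sum(temperatures) / n
    mean_y = sum(inv_chi) / n
    sxx = sum((t - mean_t) ** 2 for t in temperatures)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(temperatures, inv_chi))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return 1.0 / slope, -intercept / slope


# Synthetic paramagnetic-regime data: C = 0.5 is an arbitrary assumed value.
temps = [210.0 + 10.0 * i for i in range(10)]   # 210 K to 300 K
chi = [0.5 / (t - 198.0) for t in temps]

curie_constant, theta_p = fit_curie_weiss(temps, chi)
print(f"C = {curie_constant:.3f}, theta_p = {theta_p:.1f} K")
```

A positive θp close to the observed ordering temperature, as reported here, is the signature of dominant ferromagnetic interactions.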

Figure 10. Electrical resistivity for TiFeP with ZrNiAl-type (blue squares) and TiNiSi-type structures (orange circles).


Figure 11. Magnetic properties of TiFeP polymorphs: temperature-dependent magnetic susceptibility of TiFeP in (a) TiNiSi-type and (c) ZrNiAl-type; magnetization isotherms of TiFeP in (b) TiNiSi-type and (d) ZrNiAl-type.

Conclusions

A machine-learning approach has been developed to classify and predict the crystal structures of ternary equiatomic compositions ABC based only on the constituent elements. Through an unbiased feature selection algorithm, 113 variables were identified that control structure classification, the most important of these being associated with size, electronegativity, electron count, and position on the periodic table. The final model showed a validated accuracy of 96.9%, sensitivity of 97.3%, and specificity of 93.9%. This model was then used to calculate the probability for 98,769 ternary compounds to adopt the seven most common structure types. These results will allow researchers to identify the most likely crystal structure for nearly any ternary composition. Further, the model classifies and distinguishes compositions that are subject to polymorphism, with the highest-probability (>0.7) crystal structure associated with the low-temperature, near-equilibrium crystal structure. The tendency of some compositions to adopt multiple structure types under the same experimental conditions is also revealed by the SVM method, as indicated by prediction probabilities falling between 0.3 and 0.7. This surprising result is subsequently supported by the synthesis of TiFeP, which falls in this confused region. TiFeP was confirmed by X-ray (single-crystal and powder) diffraction to exist in the TiNiSi- and ZrNiAl-type structures in a single sample even after extremely long (3 months) annealing. Observing the existence of both crystal structures under identical reaction conditions even after such a long annealing step leads to the obvious question of whether these samples are really at thermal equilibrium, or whether the crystal structures are just kinetically trapped. Nevertheless, the machine-learning technique presented here is shown to be a useful tool for separating polymorphs as well, and it indicates that for some compounds it is at least (statistically) possible for these phases to adopt more than one structure type even under identical synthesis conditions.
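The three figures of merit quoted in the Conclusions are standard functions of a binary confusion matrix. The sketch below shows how accuracy, sensitivity, and specificity are obtained from counts of true and false positives and negatives. The counts are invented for illustration and are not the validation results of the model, and the function name is hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, and specificity from
    confusion-matrix counts for a single structure type."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,     # fraction of all predictions that are correct
        "sensitivity": tp / (tp + fn),     # true-positive rate
        "specificity": tn / (tn + fp),     # true-negative rate
    }


# Invented counts, used only to exercise the function.
metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting sensitivity and specificity alongside accuracy matters here because the structure types are unevenly populated, so accuracy alone could hide poor recognition of a rare type.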

Supporting Information

The supporting information contains X-ray crystallographic files in CIF format for TiFeP (TiNiSi-type) and TiFeP (ZrNiAl-type) structures. This material is available free of charge via the Internet at http://pubs.acs.org or may be obtained from Fachinformationszentrum Karlsruhe, Abt. PROKA, 76344 Eggenstein-Leopoldshafen, Germany (No. CSD-433389, CSD-433390).

Data extraction for machine-learning (Table S1)
List of descriptors (Table S2)
Machine-learning model metrics for classification of seven ternary equiatomic structure types (Table S3)
Ambiguous compounds (0.2 probability for competitor structure type) from the machine-learning model that could potentially be polymorphs (Table S4)
Experiment summary for 35 polymorphs compared to machine learning predictions (Table S5)
Selected interatomic distances in TiFeP (TiNiSi-type) and TiFeP (ZrNiAl-type) (Table S6)
Binding energies for eight ternary phosphides from XPS experiment (Table S7)
Model for PbClF-, LiGaGe-, YPtAs-, UTeGe-, LaPtSi-type structures and prediction probabilities (Figures S1–S5)
Powder diffraction Rietveld refinement of the sample containing TiFeP phases in TiNiSi- and ZrNiAl-type structures, present in one sample (Figure S6)
Powder diffraction Rietveld refinement of TiFeP (TiNiSi-type) (Figure S7)
Powder diffraction Rietveld refinement of TiFeP (ZrNiAl-type) (Figure S8)
Helmholtz free energy calculations of TiFeP (TiNiSi-type) and TiFeP (ZrNiAl-type) (Figure S9)
Powder diffraction of the TiFeP (ZrNiAl-type structure) sample powder after magnetic property measurements (Figure S10)


Author Information

Corresponding Authors
*E-mail: [email protected] (J. Brgoch), [email protected] (A. Mar), [email protected] (J. J. Harynuk)

Notes
The authors declare no competing financial interest.

Acknowledgment

J. B. thanks the Department of Chemistry and the Division of Research at the University of Houston for providing generous start-up funds as well as the National Science Foundation through No. NSF-CMMI 15-62142. A.M. thanks the Natural Sciences and Engineering Research Council for funding provided through Discovery Grants and the Collaborative Research and Training Experience Program. J.J.H. and L.A.A. wish to thank the Natural Sciences and Engineering Research Council for funding provided through Discovery, Engage, and I2I grants and Genome Canada / Genome Alberta through their funding to The Metabolomics Innovation Centre, which permitted the development of the CR-FS and machine learning algorithms. A. O. O. gratefully acknowledges the Eby Nell McElrath Postdoctoral Fellowship for financial support as well as The Alberta/Technical University of Munich International Graduate School for Hybrid Functional Materials (ATUMS), an innovative joint training initiative of the University of Alberta and the Technische Universität München, for funding support. We thank Clément Genet, Aurélien Boucheron, and Benoît Ruellan for help with creating the database of ABC compounds during their summer internship at the University of Alberta. The research presented here used the Maxwell/Opuntia Cluster(s) operated by the University of Houston and the Center for Advanced Computing and Data Systems (CACDS).

References

(1) Abegg, R. Z. Anorg. Allg. Chem. 1904, 39, 330−380.
(2) Pauling, L. The Nature of the Chemical Bond and the Structure of Molecules and Crystals; Cornell University Press, 1960.


(3) Villars, P.; Brandenburg, K.; Berndt, M.; LeClair, S.; Jackson, A.; Pao, Y.-H.; Igelnik, B.; Oxley, M.; Bakshi, B.; Chen, P.; Iwata, S. J. Alloys Compd. 2001, 317-318, 26−38.
(4) Mooser, E.; Pearson, W. B. Acta Crystallogr. 1959, 12, 1015−1022.
(5) Phillips, J. C. Helv. Phys. Acta 1985, 58, 209−215.
(6) Phillips, J. C.; van Vechten, J. A. Phys. Rev. Lett. 1969, 22, 705−708.
(7) Pettifor, D. G. Solid State Commun. 1984, 51, 31−34.
(8) Zunger, A. Phys. Rev. B 1980, 22, 5839−5872.
(9) Villars, P. J. Less-Common Met. 1983, 92, 215−238.
(10) Villars, P.; Cenzual, K.; Daams, J.; Chen, Y.; Iwata, S. J. Alloys Compd. 2004, 367, 167−175.
(11) Sandercock, P. M. L.; Du Pasquier, E. Forensic Sci. Int. 2003, 134, 1−10.
(12) Doble, P.; Sandercock, M.; Du Pasquier, E.; Petocz, P.; Roux, C.; Dawson, M. Forensic Sci. Int. 2003, 132, 26−39.
(13) Johnson, K. J.; Synovec, R. E. Chemom. Intell. Lab. Syst. 2002, 60, 225−237.
(14) Li, X.; Xu, Z.; Lu, X.; Yang, X.; Yin, P.; Kong, H.; Yu, Y.; Xu, G. Anal. Chim. Acta 2009, 633, 257−262.
(15) Beckstrom, A. C.; Humston, E. M.; Snyder, L. R.; Synovec, R. E.; Juul, S. E. J. Chromatogr. A 2011, 1218, 1899−1906.
(16) Raccuglia, P.; Elbert, K. C.; Adler, P. D. F.; Falk, C.; Wenny, M. B.; Mollo, A.; Zeller, M.; Friedler, S. A.; Schrier, J.; Norquist, A. J. Nature 2016, 533, 73.
(17) Meredig, B.; Agrawal, A.; Kirklin, S.; Saal, J. E.; Doak, J. W.; Thompson, A.; Zhang, K.; Choudhary, A.; Wolverton, C. Phys. Rev. B 2014, 89, 094104.
(18) Faber, F. A.; Lindmaa, A.; von Lilienfeld, O. A.; Armiento, R. Phys. Rev. Lett. 2016, 117, 135502.
(19) Sun, W.; Dacek, S. T.; Ong, S. P.; Hautier, G.; Jain, A.; Richards, W. D.; Gamst, A. C.; Persson, K. A.; Ceder, G. Science Adv. 2016, 2, e1600225.
(20) Schmidt, J.; Shi, J.; Borlido, P.; Chen, L.; Botti, S.; Marques, M. A. L. Chem. Mater. 2017, 29, 5090−5103.
(21) Ward, L.; Agrawal, A.; Choudhary, A.; Wolverton, C. npj Comput. Mater. 2016, 2, 16028.
(22) Wicker, J. G. P.; Crowley, L. M.; Robshaw, O.; Little, E. J.; Stokes, S. P.; Cooper, R. I.; Lawrence, S. E. CrystEngComm 2017, 19, 5336−5340.
(23) Pettersson, F.; Sun, C.; Saxén, H.; Rajan, K.; Chakraborti, N. Mater. Manuf. Processes 2009, 24, 2−9.
(24) Srinivasan, S.; Rajan, K. Materials 2013, 6, 279−290.
(25) Oliynyk, A. O.; Antono, E.; Sparks, T. D.; Ghadbeigi, L.; Gaultois, M. W.; Meredig, B.; Mar, A. Chem. Mater. 2016, 28, 7324−7331.
(26) Oliynyk, A. O.; Adutwum, L. A.; Harynuk, J. J.; Mar, A. Chem. Mater. 2016, 28, 6672−6681.
(27) Friesner, R. A. Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 6648−6653.
(28) Zvyagin, B. B. Comput. Math. Appl. 1988, 16, 569−591.
(29) Lee, E. H. Asian J. Pharmacol. 2014, 9, 163−175.
(30) Huang, L.-F.; Tong, W.-Q. Adv. Drug Deliv. Rev. 2004, 56, 321−334.
(31) Sun, W.; Ceder, G. CrystEngComm 2017, 19, 4576−4585.
(32) Cheung, R. Silicon Carbide Microelectromechanical Systems for Harsh Environments; Imperial College Press, 2006.


(33) Takaya, D.; Sato, T.; Yuki, H.; Sasaki, S.; Tanaka, A.; Yokoyama, S.; Honma, T. J. Chem. Inf. Model. 2013, 53, 704−716.
(34) Pearson's Crystal Data: Crystal Structure Database for Inorganic Compounds (on DVD), Release 2015/16; ASM International: Materials Park, OH, USA.
(35) Lomnytska, Y. F.; Kuz'ma, Y. B. J. Alloys Compd. 1998, 269, 133−137.
(36) Miller, G. J.; Cheng, J. Inorg. Chem. 1995, 34, 2962−2968.
(37) Landrum, G. A.; Hoffmann, R. Inorg. Chem. 1998, 37, 5754−5763.
(38) Havela, L.; Diviš, M.; Sechovský, V.; Andreev, A. V.; Honda, F.; Oomi, G.; Méresse, Y.; Heathman, S. J. Alloys Compd. 2001, 322, 7−13.
(39) Housecroft, C. E.; Sharpe, A. G. Inorganic Chemistry; Pearson: New York, NY, 2012.
(40) Sinkov, N. A.; Johnston, B. M.; Sandercock, P. M. L.; Harynuk, J. J. Anal. Chim. Acta 2011, 697, 8−15.
(41) Johnson, K. J.; Synovec, R. E. Chemom. Intell. Lab. Syst. 2002, 60, 225−237.
(42) Boser, B. E.; Guyon, I. M.; Vapnik, V. N. Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory 1992, 144−152.
(43) Vogel, R.; Giessen, B. Arch. Eisenhüttenw. 1959, 30, 565.
(44) Rundqvist, S.; Nawapong, P. C. Acta Chem. Scand. 1966, 20, 2250−2254.
(45) Roy-Montreuil, J.; Deyris, B.; Michel, A.; Rouault, A.; l'Heritier, P.; Nylund, A.; Senateur, J. P.; Fruchart, R. Mater. Res. Bull. 1972, 7, 813−826.
(46) Toma, O.; Dzevenko, M. D.; Oliynyk, A. O.; Lomnytska, Y. F. Cent. Eur. J. Chem. 2013, 11, 1518−1526.
(47) Suzuki, N.; Asahi, R.; Kishida, Y.; Masuoka, Y.; Sugiyama, J. Mater. Res. Express 2017, 4, 046505.
(48) Blanchard, P. E. R.; Grosvenor, A. P.; Cavell, R. G.; Mar, A. Chem. Mater. 2008, 20, 7081−7088.
(49) Grosvenor, A. P.; Cavell, R. G.; Mar, A. J. Solid State Chem. 2007, 180, 2702−2712.
(50) SHELXTL; Bruker AXS Inc.: Madison, WI, 2001.
(51) Gelato, L. M.; Parthé, E. J. Appl. Crystallogr. 1987, 20, 139−143.
(52) Kresse, G.; Joubert, D. Phys. Rev. B 1999, 59, 1758−1775.
(53) Blöchl, P. E. Phys. Rev. B 1994, 50, 17953−17979.
(54) Perdew, J. P.; Burke, K.; Ernzerhof, M. Phys. Rev. Lett. 1996, 77, 3865−3868.
(55) Togo, A.; Oba, F.; Tanaka, I. Phys. Rev. B 2008, 78, 134106.
(56) Parlinski, K.; Li, Z. Q.; Kawazoe, Y. Phys. Rev. Lett. 1997, 78, 4063−4066.
(57) CasaXPS; Casa Software, Ltd.: Teignmouth, Devon, U.K., 2003.
(58) Bain, G. A.; Berry, J. F. J. Chem. Educ. 2008, 85, 532−536.
(59) Pauling, L. J. Am. Chem. Soc. 1932, 54, 3570−3582.
(60) Martynov, A. I.; Batsanov, S. S. Zh. Neorg. Khim. 1980, 5, 3171−3175.
(61) Ghosh, D. C.; Chakraborty, T. J. Mol. Struct.: THEOCHEM 2009, 906, 87−93.
(62) Mulliken, R. S. J. Chem. Phys. 1934, 2, 782−784.
(63) Allred, A. L.; Rochow, E. G. J. Inorg. Nucl. Chem. 1958, 5, 264−268.
(64) Fornasini, M. L.; Merlo, F.; Palenzona, A.; Pani, M. J. Alloys Compd. 2002, 335, 120−125.
(65) Hulliger, F. J. Alloys Compd. 1995, 218, 44−46.
(66) Klosek, V.; Vernière, A.; Ouladdiaf, B.; Malaman, B. J. Magn. Magn. Mater. 2003, 256, 69−92.


(67) Gulay, L. D.; Kalychak, Y. M.; Wolcyrz, M.; Lukaszewicz, K. J. Alloys Compd. 2000, 313, 42−46.
(68) Gulay, L. D. J. Alloys Compd. 2008, 459, L23−L25.
(69) Zhong, W. X.; Chevalier, B.; Etourneau, J. R.; Hagenmuller, P. Mater. Res. Bull. 1987, 22, 331−336.
(70) Kussmann, D.; Pöttgen, R.; Künnen, B.; Kotzyba, G.; Müllmann, R.; Mosel, B. D. Z. Kristallogr. 1998, 213, 356−363.
(71) Hermes, W.; Mishra, R.; Müller, H.; Johrendt, D.; Pöttgen, R. Z. Anorg. Allg. Chem. 2009, 635, 660−666.
(72) Riecken, J. F.; Heymann, G.; Huppertz, H.; Pöttgen, R. Z. Anorg. Allg. Chem. 2007, 633, 869−872.
(73) Hovestreydt, E.; Engel, N.; Klepp, K.; Chabot, B.; Parthé, E. J. Less-Common Met. 1982, 85, 247−274.
(74) Nylund, M. A.; Roger, A.; Sénateur, J. P.; Fruchart, R. Monatsh. Chem. 1971, 102, 1631−1642.
(75) Demchyna, R. O.; Prots, Y. M.; Schwarz, U.; Grin, Y. Z. Z. Anorg. Allg. Chem. 2006, 632, 2152.
(76) Harmening, T.; Eckert, H.; Fehse, C. M.; Sebastian, C. P.; Pöttgen, R. J. Solid State Chem. 2011, 184, 3303−3309.
(77) Kotur, B. Y.; Gladyshevskii, E. I.; Sikirica, M. J. Less-Common Met. 1981, 81, 71−78.
(78) Yartys, V. A.; Gingl, F.; Yvon, K.; Akselrud, L. G.; Kolomietz, A. V.; Havela, L.; Vogt, T.; Harris, I. R.; Hauback, B. C. J. Alloys Compd. 1998, 279, L4−L7.
(79) Diamanti, M. V.; Del Curto, B.; Pedeferri, M. Color Res. Appl. 2007, 33, 221−228.
(80) van der Lingen, E. J. S. Afr. Inst. Min. Metall. 2014, 114, 137−144.
(81) NIST Standard Reference Database 20; National Institute of Standards and Technology: Gaithersburg, MD, 2012.
