Article Cite This: Acc. Chem. Res. XXXX, XXX, XXX−XXX

pubs.acs.org/accounts

Discovery of Intermetallic Compounds from Traditional to Machine-Learning Approaches

Published as part of the Accounts of Chemical Research special issue “Advancing Chemistry through Intermetallic Compounds”.

Anton O. Oliynyk and Arthur Mar*

Department of Chemistry, University of Alberta, Edmonton, AB T6G 2G2, Canada

CONSPECTUS: Intermetallic compounds are endowed with diverse compositions, complex structures, and useful properties for many materials applications. How metallic elements react to form these compounds and what structures they adopt remain challenging questions that defy predictability. Traditional approaches offer some rational strategies to prepare specific classes of intermetallics, such as targeting members within a modular homologous series, manipulating building blocks to assemble new structures, and filling interstitial sites to create stuffed variants. Because these strategies rely on precedent, they cannot foresee surprising results, by definition. Exploratory synthesis, whether through systematic phase diagram investigations or serendipity, is still essential for expanding our knowledge base. Eventually, the relationships may become too complex for the pattern recognition skills to be reliably or practically performed by humans. Complementing these traditional approaches, new machine-learning approaches may be a viable alternative for materials discovery, not only among intermetallics but also more generally to other chemical compounds.

In this Account, we survey our own efforts to discover new intermetallic compounds, encompassing gallides, germanides, phosphides, arsenides, and others. We apply various machine-learning methods (such as support vector machine and random forest algorithms) to confront two significant questions in solid state chemistry. First, what crystal structures are adopted by a compound given an arbitrary composition? Initial efforts have focused on binary equiatomic phases AB, ternary equiatomic phases ABC, and full Heusler phases AB2C. Our analysis emphasizes the use of real experimental data and places special value on confirming predictions through experiment. Chemical descriptors are carefully chosen through a rigorous procedure called cluster resolution feature selection. Predictions for crystal structures are quantified by evaluating probabilities. Major results include the discovery of RhCd, the first new binary AB compound to be found in over 15 years, with a CsCl-type structure; the connection between “ambiguous” prediction probabilities and the phenomenon of polymorphism, as illustrated in the case of TiFeP (with TiNiSi- and ZrNiAl-type structures); and the preparation of new predicted Heusler phases MRu2Ga and RuM2Ga (M = first-row transition metal) that are not obvious candidates.

Second, how can the search for materials with desired properties be accelerated? One application of strong current interest is thermoelectric materials, which present a particular challenge because their optimum performance depends on achieving a balance of many interrelated physical properties. Making use of a recommendation engine developed by Citrine Informatics, we have identified new candidates for thermoelectric materials, including previously unknown compounds (e.g., TiRu2Ga with Heusler structure; Mn(Ru0.4Ge0.6) with CsCl-type structure) and previously reported compounds that are counterintuitive candidates (e.g., Gd12Co5Bi). An important lesson in these investigations is that the machine-learning models are only as good as the experimental data used to develop them. Thus, experimental work will continue to be necessary to improve the predictions made by machine learning.

1. INTRODUCTION

Metals constitute the majority of elements, and they combine to form innumerable alloys and intermetallic compounds (or intermetallics). Strictly defined, intermetallics connote compounds having fixed compositions and adopting structures distinct from the component elements. A simple example is Na3Hg2, which crystallizes in a structure different from elemental Na or Hg, and contains a square Hg4 cluster with aromatic bonding character.1 The definition is often relaxed to include combinations with metalloids (e.g., germanides, arsenides, antimonides, tellurides) or metal-rich phases in which the metal composition is high and metal−metal bonding dominates (e.g., Ni2P).2 Despite the long history of intermetallics, dating back to the use of brass coins (Cu−Zn phases) in early civilization and extending to today’s technological applications (e.g., Nd2Fe14B magnets, NbTi superconductors, Bi2Te3 thermoelectrics),3 chemists have hesitated to study them because it is not easy to predict what structure, let alone what composition, will be adopted given an arbitrary combination of metals. For limited classes of intermetallics, such as Zintl phases, the compositions and structures can be rationalized by electron-counting rules,4 but for intermetallics with smaller differences in electronegativity, the rules for bonding remain poorly understood.5 Moreover, the design of new intermetallics is challenging for several reasons. First, the notion of a functional group, so beloved in molecular chemistry, in which a small fragment of the compound can be selectively manipulated, does not apply in structures that lack molecular entities. Second, the selection of synthetic conditions is not easily deduced and must be established empirically; until we know more about mechanisms, the reactions are assumed to proceed by self-assembly of atoms thermodynamically driven by minimization of total energy. Third, and inherent to intermetallics, the problems of site disorder, as well as solid solubility or site deficiency, can complicate matters.

In this Account, we present an overview of how we have discovered new intermetallics. In traditional approaches, we make use of chemical concepts, building on existing knowledge and relying on pattern recognition skills. However, it can be difficult to foresee complex relationships between the component elements and the resulting structures and properties. To overcome these limitations, we take guidance from machine-learning approaches, which are faster and unbiased compared to traditional approaches. Both approaches are valuable only if experiments are performed to validate predictions.

Received: October 1, 2017

DOI: 10.1021/acs.accounts.7b00490 Acc. Chem. Res. XXXX, XXX, XXX−XXX

2. TRADITIONAL APPROACHES

For classes of intermetallics that are well-defined and studied, specific strategies for modification can be proposed based on chemical behavior, coordination preferences, and structural peculiarities. When the electronic band structure does not feature a prominent gap or pseudogap, there is usually some flexibility for simple elemental substitution, subject to constraints in size and electron count, resulting in isostructural compounds. For example, we have prepared many germanides that are straightforward substitutional derivatives with various rare-earth (RE) and transition metals (M). The germanides RE3M2Ge3 (M = Ru, Ir) were targeted as extensions of earlier known series of silicides RE3M2Si3 (M = Fe, Co, Ni, Rh, Pd) and germanides RE3M2Ge3 (M = Fe, Co, Ni) with the Hf3Ni2Si3-type structure.6 The versatile series of silicides and germanides RE4M2XTt4 (M = transition metal; X = p-block metalloid; Tt, tetrel, or group-14 element = Si, Ge) was derived from the Ho4Ni2InGe4-type structure.7−9 Even when there is a preferred electron count, solid solutions can often be formed that stray slightly from the ideal count. For example, although the BaAl4-type structure generally requires a valence electron count (vec) of 14, electron-poor phases SrAuxIn4−x (x = 0.5−1.2) and SrAuxSn4−x (x = 1.3−2.2) can be prepared whose vec is lowered to ∼11.10,11 On the other hand, substitution within Zintl phases requires strict adherence to a charge-balanced formula. For example, starting from the host structure of ternary rare-earth arsenides RE4Zn1.5As5,12 aliovalent substitutions lead to the formation of another series, RE4CuMnAs5.9

Intermetallic structures are typically dense, but in some cases, there may be interstitial sites present that can be filled by atoms of appropriate size and coordination preference. The structure of A2Zn2As3 (A = Sr, Eu) contains A cations between [Zn2As3] layers, with tetrahedral interstitial sites between layers.13 The quaternary arsenide A2Ag2ZnAs3 is a stuffed derivative of A2Zn2As3 in which Ag and Zn atoms become disordered within the layers while simultaneously occupying the interstitial sites.13

Paying attention to features such as motifs or “building blocks” leads to the development of phase homologies to target members of a series of closely related structures. More complicated structures can be envisioned as intergrowths of fragments belonging to simpler structure types. These fragments could be in the form of two-dimensional slabs, such as the alternation of [REAs] and [M2As2] layers of varying thicknesses in the two homologous series of arsenides REM2−xAs2·n(REAs) and RE2−yM4As4·n(REAs) (Figure 1).12,14,15 Metal-rich pnictides often have structures that can be described in terms of increasingly larger triangular assemblies of pnicogen-centered trigonal prisms.16−19 For many rare-earth transition-metal germanides RE−M−Ge, the structures can be related by focusing on how infinite MGe ladders (double zigzag chains built of alternating M and Ge atoms propagating in one direction) are connected by Gen bridges (perpendicular to the ladders) (Figure 2a).20 If n is odd, the MGe ladders are related by reflection and the structure is orthorhombic; if n is even, they are related by 2-fold rotation and the structure is monoclinic. The relationship between composition and structure evolves along two directions within the phase diagram (Figure 2b). Proceeding toward RE-rich compositions (left and down) from CeAl2Ga2-type to YbFeGe-type (equiatomic point) results in separation of MGe sheets into MGe ladders. Proceeding toward M-poor compositions (down) from YbFeGe-type to Sc3NiSi3-type corresponds to longer Gen bridges separating these ladders. By exploiting this relationship, we prepared Dy3RuGe3 (Sc3NiSi3-type) which contains MGe ladders connected by Ge4 bridges.9

Figure 1. Two homologous series of ternary rare-earth arsenides derived by inserting [REAs] slabs (yellow), with formulas of hypothetical members shown in parentheses. Adapted with permission from ref 14. Copyright 2015 Royal Society of Chemistry.


A limitation of these rational strategies is that they rely on precedent; by definition, they cannot anticipate surprising results that nature has in store. There is no substitute for exploratory synthesis, when there is no precedent or when simple extrapolations are not possible. For example, given the existence of RE2CoGe2, an analogous phase RE2MnGe2 was targeted but instead a phase with slightly different composition RE2.1MnGe2.2 was obtained with a new structure type.21

A pragmatic, but perhaps underutilized, approach to discovering new intermetallics is the systematic study of phase diagrams. Although sometimes pooh-poohed as old-fashioned and unsophisticated, these investigations thoroughly establish the existence of all thermodynamically stable compounds under specified conditions; if performed carefully, no compounds will be missed except possibly for metastable ones. For example, through metallographic and X-ray analyses, investigation of the Ce−Mn−Ge system at 1070 K revealed two new phases, Ce3Mn2Ge3 and Ce43Mn18Ge39 with close compositions, and confirmed three previous ones, CeMn2Ge2, Ce2MnGe6, and CeMnGe (Figure 3).22 Extension to the quaternary Ce−Mn−In−Ge system also led to two new quaternary phases, Ce4Mn2InGe4 and Ce2MnInGe2.22

Figure 2. Common structure types for rare-earth transition-metal germanides RE−M−Ge in terms of (a) Gen bridges connecting MGe ladders and (b) location within generalized phase diagram.

Figure 3. Isothermal section of Ce−Mn−Ge phase diagram at 1070 K. Adapted with permission from ref 22. Copyright 2015 Elsevier.

3. MACHINE-LEARNING APPROACHES

Most scientific work builds on existing literature to yield data-driven discoveries; it is hard to find instances where a novel idea is proposed solely from intuition. However, because extracting knowledge from the literature is performed by human beings, this process may be biased, incomplete, and time-consuming. As the body of data grows larger and more unwieldy, perhaps a fresh approach is needed. Machine learning is a promising tool developing at a vibrant pace to tackle challenges in materials discovery. High-throughput methods have been applied to predict the existence of various compounds (e.g., M2AX ceramics, oxides, perovskites, AB2 phases, zeolites, spinels, alloys, half-Heuslers)23−30 and to recommend new materials (e.g., thermoelectrics, photovoltaics, battery materials).31−34 Under the premise that parameters defined by particular combinations of elements are related to the resulting structures and properties, algorithms then unravel the hidden patterns found within a large data set. In some cases, experimental data are abundant, such as crystal structures or certain physical properties, but in other cases, they are sparse and then high-throughput first-principles calculations are invaluable. In our own work, we have favored the analysis of experimental data and the validation of predictions through experiments.

4. CLASSIFYING CRYSTAL STRUCTURES

A quintessential problem is how to predict what compound will form, if any, for an arbitrary combination of elements, and what structure it will adopt. Binary equiatomic compounds AB encompass many bonding types and structures, including molecular species (e.g., CO) and ionic solids (e.g., CsF), but most are intermetallic solids. Early attempts to systematize structures of AB compounds began by invoking few chemical descriptors, exemplified by radius ratio rules to govern the structures of alkali-metal halides.35 By introducing more descriptors (e.g., electronegativity, number of valence electrons, and Mendeleev number, which is an ordering number emphasizing similar chemical behavior), structure maps were generated that can segregate a broad variety of AB compounds.36 For ternary compounds, such as equiatomic […]


atomic properties that have well-defined values, and encompass electronegativity, radii, and properties related to the position in the periodic table, as well as mathematical expressions involving these descriptors (some of which are highlighted in Figure 4b). An important step for successful application of machine-learning methods is feature selection, in which descriptors are evaluated rigorously through a cluster resolution feature selection (CR-FS).39 Each descriptor is examined in terms of its effect on improving how well it separates structure types within clusters in a multidimensional principal component analysis (PCA) score space. The data are split into a training set, to select the best descriptors and to build the machine-learning model, and a validation set, to test the model on unseen data. Among various algorithms available for supervised classification and pattern recognition, we applied partial least-squares discriminant analysis (PLS-DA), a routine chemometric method, and, for the first time to a crystallographic problem, support vector machine (SVM) methods.40 SVM is well-suited for this classification problem of structure types, by constructing hyperplanes from boundary data points (support vectors) to separate classes of data which can be related in complex nonlinear ways. We found that the SVM model, in conjunction with CR-FS, gave excellent performance for classifying the crystal structures of AB compounds, with a sensitivity of 94%, specificity of 93%, and accuracy of 93% for the validation set data. Moreover, we quantified probabilities for a given AB compound to adopt a specified structure, such as CsCl-type (Figure 4c). Among the ∼3000 combinations of elements, 21 have not been reported to form stable AB compounds or have not been investigated through phase diagrams. For example, the unknown compound RhCd is predicted to adopt a CsCl-type structure with high probability (92%) but other structures with low probability (∼0%). We confirmed this prediction by preparing RhCd, the first new binary AB compound to be discovered in over 15 years, through reaction of the elements at 800 °C and characterizing it through X-ray diffraction.38

Figure 4. (a) Structure types adopted by binary AB phases. (b) Fisher ratio scores for variables selected in the CR-FS procedure. (c) Predicted probability for CsCl-type structures using a machine-learning model based on SVM. Adapted with permission from ref 38. Copyright 2016 American Chemical Society.

There is room for improvement in this model; in particular, structures with […] 100 000 possible combinations of elements. However, experiment has lagged far behind; to date, only ∼2800 reports of compounds confirmed to exist at ambient conditions have appeared.37 When redundancies from multiple reports are eliminated, 1556 unique compounds are found that adopt one of seven common structure types, containing at least 30 representatives in each set. It may appear that the smaller number of compounds renders the problem more tractable, but the absence of knowledge about the remaining combinations (do such compounds even exist?) makes it unclear if machine-learning methods will work. Extracting data from the literature, including ternary phase diagrams, is more difficult and time-consuming than for binary phases. As before, descriptors were chosen based on atomic properties and evaluated through CR-FS. Besides improving model performance, the CR-FS procedure yields insight into important factors influencing crystal structures. For example, it confirms that electronegativity and group number are relevant in distinguishing between PbFCl-type (rich in nonmetal components) and TiNiSi-type compounds (rich in metal components).41 Using an SVM model and descriptors optimized through CR-FS, we obtained excellent discrimination of these structure types, with a sensitivity of 97%, specificity of 94%, and accuracy of 97% for the validation set data.41 For example, the probabilities for forming TiNiSi- and ZrNiAl-type structures show stark delineations (Figure 5a).
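The performance figures quoted for the SVM models (sensitivity, specificity, accuracy) follow the usual one-vs-rest definitions for a single structure type. The minimal Python sketch below shows how they are computed from validation-set counts; the function name and the counts are hypothetical and chosen only for illustration, and they are not taken from the work described here.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """One-vs-rest metrics for a single structure type.

    tp: compounds of this type predicted as this type
    fn: compounds of this type predicted as another type
    tn: compounds of other types predicted as another type
    fp: compounds of other types predicted as this type
    """
    sensitivity = tp / (tp + fn)            # fraction of true members recovered
    specificity = tn / (tn + fp)            # fraction of non-members rejected
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
    }


# Hypothetical validation-set counts (assumed values, for illustration only).
metrics = classification_metrics(tp=45, fp=10, tn=140, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because accuracy pools both classes, in a two-class setting it always lies between the sensitivity and the specificity, weighted by how many members and non-members the validation set contains.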

Polymorphism, in which more than one structure forms under different conditions, is a common phenomenon for many solids, including intermetallics. For example, among equiatomic phases ABC, polymorphism between the orthorhombic TiNiSi-type or the hexagonal ZrNiAl-type structures is prevalent and especially common for metal-rich phosphides MM′P.37 For either humans or machines, it is not obvious how to predict polymorphism. If the methods used above are extended to this problem, what will be the probabilities for the structure of a polymorphic compound and what do such probabilities mean? As a first step, 19 reports of ABC phases were identified that adopt TiNiSi- and ZrNiAl-type structures under conditions involving long annealing times. The probabilities for adopting these structures were evaluated using the SVM model above (Figure 5b). For 11 compounds, there is a high probability (>0.7) for forming either TiNiSi- or ZrNiAl-type structures. These predictions agree well (accuracy of 90%) with the polymorph observed experimentally to form at lower temperatures. This result is interesting because it suggests that machine-learning tools complement predictions based on DFT total energies, which can be difficult to calculate for structures that are large, have low symmetry, or contain highly correlated electrons. For the remaining 8 compounds, the probabilities are intermediate (0.3−0.7) (as highlighted in the shaded region of Figure 5b). At first glance, such ambiguous probabilities appear meaningless and perhaps suggest a deficiency in the model. However, these compounds correspond to reports in which similar experimental conditions led to formation of the two polymorphs, of which one is presumably the thermodynamically stable phase and the other is a metastable phase. This is a stunning interpretation: The model is able to capture nuances associated with polymorph formation that would otherwise be evaded even by first-principles calculations. The power of this model is further illustrated by the case of TiFeP, which has the most ambiguous probability (60% for adopting TiNiSi- and 40% for adopting ZrNiAl-type structure) and for which conflicting reports have appeared. Our reinvestigation indicates that two-phase mixtures of TiFeP are difficult to avoid, even after long annealing times, probably because of suppressed transformation.

Figure 5. (a) Predicted probability for TiNiSi-type (upper panel) and ZrNiAl-type (lower panel) structures based on an SVM model, with dashed line marking decision barrier. (b) Predicted probability for 19 compounds adopting both TiNiSi- and ZrNiAl-type structures. Adapted with permission from ref 41. Copyright 2017 American Chemical Society.

4.3. Full-Heusler Phases AB2C

Another large class of intermetallics has the formula AB2C, known as full-Heusler compounds, in which typical components are a large electropositive metal for A, a transition metal for B, and a p-block metalloid for C.42 Their crystal structures are simple, but there are three commonly observed variations that are difficult to distinguish experimentally by X-ray diffraction: (i) disordered CsCl-type structures AB2C in which A and C are randomly arranged over a cubic sublattice, (ii) normal Heusler structures AB2C in which A and C are ordered over a cubic sublattice, and (iii) inverse Heusler structures A2BC in which two sets of ordered cubic sublattices are found. Out of 1948 compounds with the formula AB2C reported to be stable at ambient conditions, the most common are Heusler (341) followed by NaFeO2-type compounds (255). Similar descriptors as before were used based on atomic properties.43 Instead of SVM, here we applied a random forest algorithm, a method of classification based on many decision trees, each of which is trained to learn complicated patterns that reflect the diverse compositions of these compounds.44 A decision tree is first built on a training set of variables to describe each compound and then applied to make predictions about a new compound. The method has many advantages. It is fast, works well with large databases, and can handle thousands of variables used to describe noisy data. It requires no separate feature selection step and no separate cross-validation test, because the algorithm is already self-validated. It can deal with missing data; for example, if the electrical conductivity for oxygen is missing, there is no need to exclude this property for other substances, or to remove all compounds containing oxygen. The model developed from this algorithm is fast (over 400,000 candidates evaluated in 45 min, or […] 50%), were prepared and confirmed to be Heusler compounds, whereas the series LaM2Ga, which have low probabilities, could not be prepared. The model also helps to flag potentially inaccurate entries in databases. For example, LiAg2Al was reported as an inverse Heusler compound45 but has an 85% probability of being a Heusler compound, according to the machine-learning model. In fact, for many Li-containing compounds, the powder XRD patterns for Heusler vs inverse Heusler structures are nearly indistinguishable. Thus, machine learning also plays an important function in data sanitizing.
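The data-sanitizing idea above can be sketched in a few lines of Python: compare the structure type recorded in a database with the class probabilities returned by a model, and flag entries where the model strongly favors a different type. Everything in this sketch is illustrative. The `Entry` class, the `flag_suspect` function, the 0.8 threshold, and the second database entry are assumptions; only the 85% Heusler probability for LiAg2Al comes from the text, and the complementary 15% is assumed for a two-class picture.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    formula: str
    reported: str                    # structure type recorded in the database
    probabilities: dict[str, float]  # model probability per structure type


def flag_suspect(entries: list[Entry], threshold: float = 0.8) -> list[tuple]:
    """Return entries whose most probable predicted structure type differs
    from the reported one, with predicted probability >= threshold."""
    flagged = []
    for entry in entries:
        best = max(entry.probabilities, key=entry.probabilities.get)
        p_best = entry.probabilities[best]
        if best != entry.reported and p_best >= threshold:
            flagged.append((entry.formula, entry.reported, best, p_best))
    return flagged


entries = [
    # 0.85 is quoted in the text; 0.15 is an assumed remainder.
    Entry("LiAg2Al", "inverse Heusler", {"Heusler": 0.85, "inverse Heusler": 0.15}),
    # Hypothetical placeholder entry that agrees with its prediction.
    Entry("XY2Z", "Heusler", {"Heusler": 0.95, "inverse Heusler": 0.05}),
]

for formula, reported, predicted, p in flag_suspect(entries):
    print(f"{formula}: reported {reported}, model favors {predicted} (p = {p:.2f})")
```

A flag of this kind is a prompt for re-examination rather than a correction, which matters for cases where the powder XRD patterns of the competing structures are nearly indistinguishable.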


[…] predictions on new candidates through experiments.54,55 We have made use of a thermoelectrics recommendation engine developed by Citrine Informatics based on experimental data to identify unconventional thermoelectrics, including counterintuitive candidates.49 Data for four properties (Seebeck coefficient, thermal conductivity, electrical conductivity, and band gap) were manually compiled for bulk compounds over various temperature ranges, descriptors were constructed based on atomic properties (such as radii, electronegativity, and number of electrons), and a random forest algorithm was used to build the model. The greatest advantage of this model is that it makes predictions solely from chemical composition (and not crystal structure, the very thing that is unknown for compounds yet to be discovered!) to compute properties. Other advantages include lower computational costs, much faster predictions, and consideration of structures that may be prohibitively difficult for first-principles calculations (e.g., very large unit cells, low symmetry, highly correlated electron systems). In this way, we hope to break out of our intellectual rut of investigating known compounds and only making incremental improvements.

5.1. Heusler Compounds

Although metallic behavior is commonly avoided in thermoelectric materials, some special classes of intermetallics may possess band gaps, such as Zintl phases and Heusler compounds. Reassuringly, the thermoelectrics recommendation engine listed Heusler (and half-Heusler) compounds as among the best candidates based on transport properties. In addition to known materials (e.g., Ti−Ni−Sn phases),56 these candidates included previously reported compounds for which no property measurements had ever been made and heretofore unknown compounds. For example, several gallides and indides were suggested to exhibit low thermal conductivity, including three (TiRu2Ga, TiRu2In, MnRu2In) that were unreported but feasible substitutional candidates given that closely related series are known.20 Subsequently, one of these, TiRu2Ga, was successfully prepared. Note that TiRu2Ga belongs to the MRu2Ga series anticipated to be Heusler compounds, as discussed earlier, illustrating the power of combining machine-learning predictions of both structure and properties. Preliminary measurements indicate that TiRu2Ga exhibits low thermal conductivity, comparable to the lowest values observed in other full Heusler compounds.
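Models that predict from chemical composition alone need descriptors that can be computed from a formula such as TiRu2Ga without any structural input. The Python sketch below illustrates one simple way to do this, using a composition-weighted mean and a range of Pauling electronegativities. It is not the descriptor set of the recommendation engine: the function names and the choice of these two descriptors are assumptions, and the small lookup table covers only a few elements mentioned in this section.

```python
import re

# Pauling electronegativities for a few elements discussed in the text.
PAULING_EN = {"Ti": 1.54, "Mn": 1.55, "Ru": 2.20, "Ga": 1.81, "In": 1.78}


def parse_formula(formula: str) -> dict[str, float]:
    """Split a simple formula (no parentheses) into element counts."""
    counts: dict[str, float] = {}
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    return counts


def composition_descriptors(formula: str) -> dict[str, float]:
    """Composition-weighted mean and range of electronegativity."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    values = [PAULING_EN[symbol] for symbol in counts]
    mean_en = sum(PAULING_EN[s] * n for s, n in counts.items()) / total
    return {"mean_en": mean_en, "en_range": max(values) - min(values)}


desc = composition_descriptors("TiRu2Ga")
print(desc)
```

In practice many such descriptors (radii, electron counts, and combinations of them) would be stacked into one feature vector per composition before training a model.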

Figure 8. Probabilities for low thermal conductivity (κ < 10 W m−1 K−1) mapped onto phase diagrams of (a) Mn−Ru−Ge and (b) Dy− Ru−Ge systems. Adapted with permission from ref 20. Copyright 2016 Elsevier.
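Mapping a predicted probability onto a ternary phase diagram, as in Figure 8, amounts to scoring a grid of compositions and keeping those above a chosen cutoff. The Python sketch below shows that bookkeeping only. The scoring function `toy_probability` is a placeholder with no physical meaning, standing in for a trained model; the function names, grid spacing, and cutoff are likewise assumptions.

```python
from typing import Callable, Iterator

Composition = tuple[float, float, float]


def ternary_grid(steps: int) -> Iterator[Composition]:
    """Yield atomic fractions (x, y, z) with x + y + z = 1 on a 1/steps grid."""
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            k = steps - i - j
            yield (i / steps, j / steps, k / steps)


def rank_compositions(
    score: Callable[[Composition], float], steps: int = 10, threshold: float = 0.9
) -> list[tuple[float, Composition]]:
    """Return (score, composition) pairs at or above threshold, best first."""
    scored = [(score(c), c) for c in ternary_grid(steps)]
    return sorted((s for s in scored if s[0] >= threshold), reverse=True)


def toy_probability(comp: Composition) -> float:
    """Placeholder score (assumption): NOT a model of thermal conductivity."""
    _, _, x_third = comp
    return 1.0 - 0.8 * abs(x_third - 0.3)


grid = list(ternary_grid(10))
hits = rank_compositions(toy_probability, steps=10, threshold=0.9)
print(len(grid), "grid points;", len(hits), "above the cutoff")
```

With a real model in place of the placeholder, the retained compositions mark the regions of the phase diagram where synthesis attempts are most likely to be worthwhile.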

5.2. Germanides

Without any prior structural information, the model can evaluate probabilities of hypothetical compounds of any composition to exhibit a desired thermoelectric property. Among the suggestions, transition-metal germanides are predicted to possess low thermal conductivities, even though they are typically metallic and have not normally been considered as suitable candidates. For example, the Mn−Ru−Ge system is expected to show high probabilities for low thermal conductivities over wide regions of the phase diagram (Figure 8a).20 Although there is no guarantee that any compounds exist, the point is that the experimentalist receives guidance. We prepared Mn(Ru0.4Ge0.6), adopting a CsCl-type structure, within a region showing high probability (>90%) of low thermal conductivity. Preliminary measurements suggest that this compound may have a thermal conductivity as low as 2 W m−1 K−1.

The apparent success of these models does not mean that human intervention is no longer required. Far from it. By combining the pattern recognition skills perfected by experimentalists, as expressed in the traditional approaches involving chemical concepts, with the guidance provided by machine-learning models, the discovery process can be accelerated further. For example, the structural trends in ternary rare-earth germanides RE−M−Ge help identify promising compositions with likely structures, while the machine-learning model suggests where to look and what combination of components (RE and M) are worthwhile exploring. The Dy−Ru−Ge system appears promising in terms of overlap of likely phases to be found and high probabilities for low thermal conductivities (Figure 8b). Experimental investigations of this system are in progress.

5.3. Counterintuitive Candidates

Although thermoelectric materials encompass diverse compounds, they usually contain heavy p-block elements (such as Sn, Sb, Te). Intermetallics containing large proportions of d- and f-block metals are believed to be unlikely candidates because they are expected to exhibit strongly metallic behavior. It was thus surprising that the recommendation engine predicts that the rare-earth intermetallics RE12Co5Bi may be viable candidates even though they appear, at first glance, to be unrelated to existing thermoelectrics.49,57 In retrospect, RE12Co5Bi can be regarded as nearly antitypic to the filled skutterudites AM4X12 (e.g., LaFe4Sb12), which are well-known thermoelectrics. The feature of Bi atoms enclosed within icosahedral cages in RE12Co5Bi is reminiscent of pnicogen atoms within similar cages in filled skutterudites. Even though the model has no knowledge of structures, it is interesting that it arrives at a suggestion that is not obvious but might have eventually come to the attention of an experimentalist. Measurements confirm predictions that RE12Co5Bi should exhibit low thermal conductivities, high electrical conductivities, and modest Seebeck coefficients, and these results were obtained with no attempts at optimization (Figure 9).

Figure 9. (a) Electrical resistivity, (b) Seebeck coefficient, (c) thermal conductivity, and (d) thermoelectric figure of merit for RE12Co5Bi (RE = Gd, Er). The bars at the bottom indicate the confidence levels for the first three properties for the thermoelectrics recommendation engine.

6. CONCLUSIONS

Machine learning has the exciting potential to accelerate the discovery of new materials, including intermetallics. We described here efforts to search for intermetallics, such as germanides and gallides, and to predict crystal structures and physical properties. However, these machine-learning models are only as good as the data used to construct them. It is important to continue experimental investigations. Because the rules that govern compositions and structures of intermetallics are complex, it is still not possible to easily predict them. Phase diagram investigations and exploratory synthesis will remain viable activities for a long time to come, and nature still provides surprising results. To be sure, the literature is biased toward successful syntheses, well-defined structures, and good properties for specific applications such as thermoelectrics. Just as failure breeds character, the machine-learning model must be taught about unsuccessful syntheses, structural complications, and poor properties (or failures in materials optimization) to build up a robust database, to enhance statistical reliability, and to make better predictions. So-called “dark reactions,” or failed experiments that never make it to the literature, have been recognized to be essential for machine-learning methods.58

Besides accelerated discovery, machine-learning methods provide other subtle benefits. Analyzing the descriptors that influence structures or physical properties helps draw insight. Predictions that disagree with experiment may not necessarily suggest that the machine-learning model is poor, but rather that the experimental data must be re-evaluated. Finally, experimentalists can be guided to think “outside the box” in efforts to test unlikely candidates.

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

ORCID

Arthur Mar: 0000-0003-0474-5918

Notes

The authors declare no competing financial interest.

Biographies

Anton O. Oliynyk (born in 1989 in Priozersk, Kazakhstan) received his B.Sc. and M.Sc. degrees at Ivan Franko National University of Lviv and his Ph.D. degree at the University of Alberta. He was a McElrath Postdoctoral Research Fellow at the University of Houston. He is currently a Research Associate at the University of Alberta, working on machine-learning methods for materials discovery.

Arthur Mar (born in 1966 in San Juan, Philippines) received his B.Sc. degree at the University of British Columbia and his M.S. and Ph.D. degrees at Northwestern University. He was an NSERC Postdoctoral Fellow at the Institut des Matériaux de Nantes. He is currently a Professor of Chemistry at the University of Alberta. His research is focused on inorganic solids, including intermetallics, pnictides, and chalcogenides.

ACKNOWLEDGMENTS

We thank the many students and collaborators who have contributed to this work, with special mention to Lawrence A. Adutwum, James J. Harynuk, Michael W. Gaultois, Taylor D. Sparks, Bryce Meredig, and Jakoah Brgoch. Financial support was provided by the Natural Sciences and Engineering Research Council of Canada through Discovery Grants and the Collaborative Research and Training Experience Program.



REFERENCES

(1) Tkachuk, A. V.; Mar, A. Redetermination of Na3Hg2. Acta Crystallogr., Sect. E: Struct. Rep. Online 2006, 62, i129−i130.
(2) Blanchard, P. E. R.; Grosvenor, A. P.; Cavell, R. G.; Mar, A. X-ray Photoelectron and Absorption Spectroscopy of Metal-Rich Phosphides M2P and M3P (M = Cr−Ni). Chem. Mater. 2008, 20, 7081−7088.
(3) Sauthoff, G. Intermetallics; VCH: Weinheim, 1995.
(4) Nesper, R. The Zintl-Klemm Concept − A Historical Survey. Z. Anorg. Allg. Chem. 2014, 640, 2639−2648.
(5) Ferro, R.; Saccone, A. Intermetallic Chemistry; Elsevier: Amsterdam, 2008.
(6) Oliynyk, A. O.; Stoyko, S. S.; Mar, A. Ternary rare-earth ruthenium and iridium germanides RE3M2Ge3 (RE = Y, Gd−Tm, Lu; M = Ru, Ir). J. Solid State Chem. 2013, 202, 241−249.
(7) Oliynyk, A. O.; Stoyko, S. S.; Mar, A. Quaternary Germanides RE4Mn2InGe4 (RE = La−Nd, Sm, Gd−Tm, Lu). Inorg. Chem. 2013, 52, 8264−8271.
(8) Oliynyk, A. O.; Stoyko, S. S.; Mar, A. Many Metals Make the Cut: Quaternary Rare-Earth Germanides RE4M2InGe4 (M = Fe, Co, Ni, Ru, Rh, Ir) and RE4RhInGe4 Derived from Excision of Slabs in RE2InGe2. Inorg. Chem. 2015, 54, 2780−2792.
(9) Mar, A. and co-workers. Unpublished work.
(10) Tkachuk, A. V.; Mar, A. Electron-poor SrAuxIn4−x (0.5 ≤ x ≤ 1.2) and SrAuxSn4−x (1.3 ≤ x ≤ 2.2) phases with the BaAl4-type structure. J. Solid State Chem. 2007, 180, 2298−2304.
(11) Lin, Q.; Miller, G. J.; Corbett, J. D. Ordered BaAl4-Type Variants in the BaAuxSn4−x System. Inorg. Chem. 2014, 53, 5875−5877.
(12) Lin, X.; Mar, A. Homologous Series of Rare-Earth Zinc Arsenides REZn2−xAs2·n(REAs) (RE = La−Nd, Sm; n = 3, 4, 5, 6). Inorg. Chem. 2013, 52, 7261−7270.
(13) Stoyko, S. S.; Khatun, M.; Mar, A. Ternary Arsenides A2Zn2As3 (A = Sr, Eu) and Their Stuffed Derivatives A2Ag2ZnAs3. Inorg. Chem. 2012, 51, 2621−2628.
(14) Lin, X.; Tabassum, D.; Mar, A. Narrowing the gap: from semiconductor to semimetal in the homologous series of rare-earth zinc arsenides RE2−yZn4As4·n(REAs) and Mn-substituted derivatives RE2−yMnxZn4−xAs4·n(REAs) (RE = La−Nd, Sm, Gd). Dalton Trans. 2015, 44, 20254−20264.
(15) Tabassum, D.; Lin, X.; Mar, A. Rare-earth manganese arsenides RE4Mn2As5 (RE = La−Pr). J. Alloys Compd. 2015, 636, 187−190.
(16) Wang, M.; Mar, A. Nb9PdAs7: A Unique Arrangement in the Mn2+3n+2Xn2+nY Family of Hexagonal Structures. Inorg. Chem. 2001, 40, 5365−5370.
(17) Bie, H.; Mar, A. Ge Pairs and Sb Ribbons in Rare-Earth Germanium Antimonides RE12Ge7−xSb21 (RE = La−Pr). Chem. - Asian J. 2009, 4, 1465−1473.
(18) Stoyko, S. S.; Ramachandran, K. K.; Mullen, C. S.; Mar, A. Rare-Earth Manganese Copper Pnictides RE2Mn3Cu9Pn7 (Pn = P, As). Inorg. Chem. 2013, 52, 1040−1046.
(19) Ramachandran, K. K.; Stoyko, S. S.; Mullen, C. S.; Mar, A. Rare-Earth Manganese Copper Phosphides REMnCu4P3 (RE = Gd−Ho). Inorg. Chem. 2015, 54, 860−866.
(20) Sparks, T. D.; Gaultois, M. W.; Oliynyk, A. O.; Brgoch, J.; Meredig, B. Data mining our way to the next generation of thermoelectrics. Scr. Mater. 2016, 111, 10−15.

DOI: 10.1021/acs.accounts.7b00490 Acc. Chem. Res. XXXX, XXX, XXX−XXX
(21) Oliynyk, A. O.; Mar, A. Rare-earth manganese germanides RE2+xMnGe2+y (RE = La, Ce) built from four-membered rings and stellae quadrangulae of Mn-centred tetrahedra. J. Solid State Chem. 2013, 206, 60−65.
(22) Oliynyk, A. O.; Djama-Kayad, K.; Mar, A. Investigation of phase equilibria in the quaternary Ce−Mn−In−Ge system and isothermal sections of the boundary ternary systems at 800 °C. J. Alloys Compd. 2015, 622, 837−841.
(23) Ashton, M.; Hennig, R. G.; Broderick, S. R.; Rajan, K.; Sinnott, S. B. Computational discovery of stable M2AX phases. Phys. Rev. B: Condens. Matter Mater. Phys. 2016, 94, 054116.
(24) Hautier, G.; Fischer, C. C.; Jain, A.; Mueller, T.; Ceder, G. Finding Nature’s Missing Ternary Oxide Compounds Using Machine Learning and Density Functional Theory. Chem. Mater. 2010, 22, 3762−3767.
(25) Schmidt, J.; Shi, J.; Borlido, P.; Chen, L.; Botti, S.; Marques, M. A. L. Predicting the Thermodynamic Stability of Solids Combining Density Functional Theory and Machine Learning. Chem. Mater. 2017, 29, 5090−5103.
(26) Kong, C. S.; Luo, W.; Arapan, S.; Villars, P.; Iwata, S.; Ahuja, R.; Rajan, K. Information-Theoretic Approach for the Discovery of Design Rules for Crystal Chemistry. J. Chem. Inf. Model. 2012, 52, 1812−1820.
(27) Lach-hab, M.; Yang, S.; Vaisman, I. I.; Blaisten-Barojas, E. Novel Approach for Clustering Zeolite Crystal Structures. Mol. Inf. 2010, 29, 297−301.
(28) Pettersson, F.; Suh, C.; Saxén, H.; Rajan, K.; Chakraborti, N. Analyzing Sparse Data for Nitride Spinels Using Data Mining, Neural Networks, and Multiobjective Genetic Algorithms. Mater. Manuf. Processes 2008, 24, 2−9.
(29) Nyshadham, C.; Oses, C.; Hansen, J. E.; Takeuchi, I.; Curtarolo, S.; Hart, G. L. W. A computational high-throughput search for new ternary superalloys. Acta Mater. 2017, 122, 438−447.

(30) Legrain, F.; Carrete, J.; van Roekeghem, A.; Madsen, G. K. H.; Mingo, N. Materials Screening for the Discovery of New Half-Heuslers: Machine Learning versus ab Initio Methods. J. Phys. Chem. B 2017, DOI: 10.1021/acs.jpcb.7b05296.
(31) Gorai, P.; Gao, D.; Ortiz, B.; Miller, S.; Barnett, S. A.; Mason, T.; Lv, Q.; Stevanović, V.; Toberer, E. S. TE Design Lab: A virtual laboratory for thermoelectric material design. Comput. Mater. Sci. 2016, 112, 368−376.
(32) Dey, P.; Bible, J.; Datta, S.; Broderick, S.; Jasinski, J.; Sunkara, M.; Menon, M.; Rajan, K. Informatics-aided bandgap engineering for solar materials. Comput. Mater. Sci. 2014, 83, 185−195.
(33) Hautier, G.; Jain, A.; Ong, S. P. From the computer to the laboratory: materials discovery and design using first-principles calculations. J. Mater. Sci. 2012, 47, 7317−7340.
(34) Jain, A.; Hautier, G.; Ong, S. P.; Persson, K. New opportunities for materials informatics: Resources and data mining techniques for uncovering hidden relationships. J. Mater. Res. 2016, 31, 977−994.
(35) Pauling, L. The Nature of the Chemical Bond, 3rd ed.; Cornell University Press: Ithaca, NY, 1960.
(36) Villars, P. A three-dimensional structural stability diagram for 998 binary AB intermetallic compounds. J. Less-Common Met. 1983, 92, 215−238.
(37) Villars, P.; Cenzual, K. Pearson’s Crystal Data − Crystal Structure Database for Inorganic Compounds (on DVD), release 2015/16; ASM International: Materials Park, OH, 2016.
(38) Oliynyk, A. O.; Adutwum, L. A.; Harynuk, J. J.; Mar, A. Classifying Crystal Structures of Binary Compounds AB through Cluster Resolution Feature Selection and Support Vector Machine Analysis. Chem. Mater. 2016, 28, 6672−6681.
(39) Sinkov, N. A.; Johnston, B. M.; Sandercock, P. M. L.; Harynuk, J. J. Automated optimization and construction of chemometric models based on highly variable raw chromatographic data. Anal. Chim. Acta 2011, 697, 8−15.
(40) Cortes, C.; Vapnik, V. Support-Vector Networks. Machine Learning 1995, 20, 273−297.
(41) Oliynyk, A. O.; Adutwum, L. A.; Rudyk, B. W.; Pisavadia, H.; Lotfi, S.; Hlukhyy, V.; Harynuk, J. J.; Mar, A.; Brgoch, J. Disentangling Structural Confusion through Machine Learning: Structure Prediction and Polymorphism of Equiatomic Ternary Phases ABC. J. Am. Chem. Soc. 2017, 139, 17870−17881.
(42) Graf, T.; Felser, C.; Parkin, S. S. P. Simple rules for the understanding of Heusler compounds. Prog. Solid State Chem. 2011, 39, 1−50.
(43) Oliynyk, A. O.; Antono, E.; Sparks, T. D.; Ghadbeigi, L.; Gaultois, M. W.; Meredig, B.; Mar, A. High-Throughput Machine-Learning-Driven Synthesis of Full-Heusler Compounds. Chem. Mater. 2016, 28, 7324−7331.
(44) Ho, T. K. The Random Subspace Method for Constructing Decision Forests. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 832−844.
(45) Lacroix-Orio, L.; Tillard, M.; Belin, C. Exploration of the lithium-aluminum-silver system. Solid State Sci. 2004, 6, 1429−1437.
(46) Kiselyova, N. N.; Dudarev, V. A.; Korzhuev, M. A. Database on forbidden zones of solids, 2006−2017 (http://bg.imet-db.ru).
(47) Jain, A.; Ong, S. P.; Hautier, G.; Chen, W.; Richards, W. D.; Dacek, S.; Cholia, S.; Gunter, D.; Skinner, D.; Ceder, G.; Persson, K. A. The Materials Project: A materials genome approach to accelerating materials innovation. APL Mater. 2013, 1, 011002.
(48) Citrine Informatics, http://www.citrination.com.
(49) Gaultois, M. W.; Oliynyk, A. O.; Mar, A.; Sparks, T. D.; Mulholland, G. J.; Meredig, B. Perspective: Web-based machine learning models for real-time screening of thermoelectric materials properties. APL Mater. 2016, 4, 053213.
(50) Carrete, J.; Mingo, N.; Wang, S.; Curtarolo, S. Nanograined Half-Heusler Semiconductors as Advanced Thermoelectrics: An Ab Initio High-Throughput Statistical Study. Adv. Funct. Mater. 2014, 24, 7427−7432.
(51) Chen, W.; Pöhls, J.-H.; Hautier, G.; Broberg, D.; Bajaj, S.; Aydemir, U.; Gibbs, Z. M.; Zhu, H.; Asta, M.; Snyder, G. J.; Meredig, B.; White, M. A.; Persson, K.; Jain, A. Understanding thermoelectric properties from high-throughput calculations: trends, insights, and comparisons with experiment. J. Mater. Chem. C 2016, 4, 4414−4426.
(52) Wang, S.; Wang, Z.; Setyawan, W.; Mingo, N.; Curtarolo, S. Assessing the Thermoelectric Properties of Sintered Compounds via High-Throughput Ab-Initio Calculations. Phys. Rev. X 2011, 1, 021012.
(53) Gorai, P.; Stevanović, V.; Toberer, E. S. Computationally guided discovery of thermoelectric materials. Nat. Rev. Mater. 2017, 2, 17053.
(54) Ortiz, B. R.; Gorai, P.; Stevanović, V.; Toberer, E. S. Thermoelectric Performance and Defect Chemistry in n-Type Zintl KGaSb4. Chem. Mater. 2017, 29, 4523−4534.
(55) Zhu, H.; Hautier, G.; Aydemir, U.; Gibbs, Z. M.; Li, G.; Bajaj, S.; Pöhls, J.-H.; Broberg, D.; Chen, W.; Jain, A.; White, M. A.; Asta, M.; Snyder, G. J.; Persson, K.; Ceder, G. Computational and experimental investigation of TmAgTe2 and XYZ2 compounds, a new group of thermoelectric materials identified by first-principles high-throughput screening. J. Mater. Chem. C 2015, 3, 10554−10565.
(56) Kim, S.-W.; Kimura, Y.; Mishima, Y. High temperature thermoelectric properties of TiNiSn-based half-Heusler compounds. Intermetallics 2007, 15, 349−356.
(57) Oliynyk, A. O.; Sparks, T. D.; Gaultois, M. W.; Ghadbeigi, L.; Mar, A. Gd12Co5.3Bi and Gd12Co5Bi, Crystalline Doppelgänger with Low Thermal Conductivities. Inorg. Chem. 2016, 55, 6625−6633.
(58) Raccuglia, P.; Elbert, K. C.; Adler, P. D. F.; Falk, C.; Wenny, M. B.; Mollo, A.; Zeller, M.; Friedler, S. A.; Schrier, J.; Norquist, A. J. Machine-learning-assisted materials discovery using failed experiments. Nature 2016, 533, 73−76.
