This is an open access article published under an ACS AuthorChoice License, which permits copying and redistribution of the article or any adaptations for non-commercial purposes.

Technology Note pubs.acs.org/acscombsci

Expediting Combinatorial Data Set Analysis by Combining Human and Algorithmic Analysis

Helge Sören Stein,*,† Sally Jiao,‡,† and Alfred Ludwig*,†,§

† Institute for Materials and § Materials Research Department & ZGH, Ruhr-Universität Bochum, Universitätsstraße 150, 44780 Bochum, Germany
‡ Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, United States

Supporting Information

ABSTRACT: A challenge in combinatorial materials science remains the efficient analysis of X-ray diffraction (XRD) data and its correlation to functional properties. Rapid identification of phase regions and proper assignment of the corresponding crystal structures are necessary to keep pace with improved methods for synthesizing and characterizing materials libraries. Therefore, a new modular software package called htAx (high-throughput analysis of X-ray and functional properties data) is presented. It couples human intelligence tasks used for “ground-truth” phase-region identification with subsequent unbiased verification by an algorithm to efficiently determine which phases are present in a materials library. Identified phases and phase regions may then be correlated to functional properties in an expedited manner. To demonstrate the functionality of htAx, two previously published XRD benchmark data sets of the materials systems Al-Cr-Fe-O and Ni-Ti-Cu are analyzed. The analysis of ∼1000 XRD patterns takes less than 1 day with htAx. The proposed method reliably identifies phase-region boundaries and robustly identifies multiphase structures. It also addresses the problem of identifying regions with previously unpublished crystal structures by means of a special daisy ternary plot.

KEYWORDS: combinatorial materials science, X-ray diffraction, phase-region identification, crystal structures, clustering

Received: September 30, 2016 Revised: November 24, 2016 Published: December 2, 2016
DOI: 10.1021/acscombsci.6b00151 ACS Comb. Sci. XXXX, XXX, XXX−XXX
© XXXX American Chemical Society

Combinatorial materials science endeavors to discover new materials and improve existing ones through the combinatorial synthesis of materials libraries and their efficient high-throughput characterization. The vast amount of data collected is used to correlate functional properties to crystal structures and compositions, and expediting this process requires semiautomated analysis tools. Functional phase diagrams are ternary or quasi-ternary diagrams that convey compositional, structural, and functional properties beyond simple linear correlations. To generate them, information about the location of phase boundaries, the corresponding crystal structures, and the functional properties of a combinatorial materials library needs to be efficiently gathered, analyzed, and visualized. However, identifying phases in combinatorial materials libraries is often time-consuming and labor-intensive. Several methods have been proposed that attempt this task using unsupervised machine learning, such as k-means or k-medoids clustering, non-negative matrix factorization (NNMF), or complex algorithms like “GRENDEL” and “SS-AutoPhase”.1−7 However, most clustering algorithms are error-prone when dealing with effects that occur frequently in thin-film materials libraries. These include peak shifts, strong texture and texture changes, and experimental issues such as a low signal-to-noise ratio. Even with state-of-the-art algorithms, phase-region identification requires a laborious postprocessing verification process that involves comparing individual XRD patterns one by one. Another method for phase-region identification, agnostic to the structures that actually occur, is a visual analysis of “cuts” through a ternary (or higher) composition space, such as the quasi-binary cuts introduced by Ermon et al.2 and Le Bras et al.7 There is, however, a need for tools that can identify phases by matching experimental patterns to patterns from a database.

In the present work, a software package and the underlying algorithms are presented that allow an expert to first visually assess the existence of phase regions by solving small human intelligence tasks.2 To aid the analysis, the obtained results are interactively visualized in a ternary diagram. The second step analyzes all experimental XRD patterns by first identifying the peaks in each pattern. The identified peaks are then automatically matched to known peaks from a crystal structure database. The user can therefore rapidly identify phase regions, determine the corresponding crystal structures, and interactively receive feedback for further correlation of (un)known crystal structures to functional properties. Finally, a ternary daisy plot is used to visualize matched crystal structures, multiphase regions, and unmatched regions. Each point on the ternary plot corresponds to a measurement area on the materials library and shows a glyph in the shape of a “daisy” with several “sectors”. These sectors correspond to the matched crystal structures. Their colors indicate how good the match is, as quantified by a figure of merit (FOM). Thus, regions that fail to match any crystal structure will not be colored,

whereas matched regions will be color-coded according to the value of the FOM, so that the user immediately receives feedback on the quality of the crystal structure identification.

This work demonstrates the utility of the proposed analysis scheme, as implemented in htAx. Published results are compared with results obtained by htAx for data sets from Al-Cr-Fe-O8 and Ni-Ti-Cu9 thin-film materials libraries.

Figure 1 shows the workflow of the proposed analysis scheme. The general approach to phase-region identification and matching to known crystal structures, for the purpose of correlation to functional properties, begins with importing and sorting data from high-throughput experiments. Data import is often trivial, but it may be a challenge if a materials library has different coordinate systems or missing measurement areas. This is often the case when analyses are performed in different groups using partially incompatible coordinate systems. In the present approach, this problem is overcome by a semiautomatic “tree-search” that identifies the common subset of measurement areas analyzed by all methods. After import, the XRD data is first analyzed by creating quasi-binary cuts through the ternary composition space (the software can be modified to work with an arbitrarily high dimension). These cuts are analyzed as described below. After this “ground-truth” preanalysis, a peak-matching algorithm attempts to identify matches between the experimental patterns and those of known crystal structures. The previously identified phase regions are thus matched to known crystal structures. Regions that belong to the same crystal structure but change in orientation might have been incorrectly separated in the first stage. One benefit of the second stage is that such regions are now correctly grouped together without loss of information. This advantage is crucial, as some functional properties are known to be anisotropic. Some regions on the materials library may be XRD amorphous or may not exhibit peaks at positions that match those of reported crystal structures. In these cases, a special ternary daisy plot helps the user to visually assess the quality of the crystal structure matching. For the correlation to functional properties, such as the Tafel slope of oxygen-evolution-reaction catalysts, htAx plugins exist that can analyze functional properties semiautomatically.

Figure 1. Workflow of the proposed analysis scheme. First, data is read in and sorted. Second, the analysis of quasi-binary cuts helps identify phase-region boundaries for a “ground-truth” analysis. The identified phase regions are then matched to published crystal structures, visualized, and correlated to functional properties. The htAx package unifies all analysis steps in a single framework, aiding the researcher in an expedited discovery of new materials and structure−property relationships. All insets except the leftmost appear as figures elsewhere in the paper and serve only illustrative purposes here.

Several methods exist that expedite the identification of phase regions in combinatorial materials libraries. One such method is unsupervised machine learning using dissimilarity measures.3,10 These dissimilarity measures assume that points within a yet-to-be-identified phase region will produce similar XRD patterns, because similar crystal structures have similar peak positions. These clustering algorithms depend mostly on the selection of the correct dissimilarity/similarity measure. Consequently, Takeuchi et al.3,10 and Kusne et al.15 have spent considerable effort identifying suitable dissimilarity measures for k-means clustering. However, these algorithms usually fail to correctly describe phase mixtures and peak shifts in solid solutions. To assess the results of the htAx analysis, results from k-means clustering are shown for both case-study materials systems in Figures S1 and S2. A single- or multiphase region contains similar crystal structures and therefore similar XRD patterns (i.e., peaks at similar positions). This fact may, however, be exploited in a more robust manner by analyzing quasi-binary cuts.1,2,7 Quasi-binary cuts through an arbitrary, high-dimensional composition space will exhibit

visually similar patterns over single- or multiphase regions (as shown in Figure 2a and b), because peaks appear at similar positions with similar intensities. They will also exhibit abrupt changes when crossing a phase-region boundary, as a change in the crystal structure or orientation will abruptly alter peak locations. In the “UDiscoverIT” framework,2 this detection is gamified in an interactive Web site where users can solve small “human intelligence tasks” (HITs). In the present work, the user solves these HITs as well, but receives immediate visual feedback on the result in the composition space. Furthermore, every time a user identifies a compositional range corresponding to a possible phase region, htAx saves the diffraction angle of the peak(s) on which the analysis is based. This information provides additional evidence for the phase-region determination. Example heatmaps of quasi-binary cuts are shown for the Ni-Ti-Cu system in Figure 2a and b and for the Al-Cr-Fe-O system in Figure 8. Along each quasi-binary cut, one element is kept constant within a certain tolerance window, e.g., (20 ± 2 at. %) Cu in Figure 2a. In these plots, certain peaks start to appear at certain compositions along the cut. The htAx software allows the user to select the compositions at which peaks start to appear or vanish. The corresponding end points are then interactively shown in the ternary diagram, as shown in Figures 2 and 3. A trained person spends ∼30 s on each binary cut. From experience with the presented materials systems, approximately 20 to 40 of these quasi-binary analyses suffice to generate a reasonable outline of regions exhibiting the same phase(s).

Figure 2. Identification of phase boundaries by analyzing quasi-binary cuts in the Ni-Ti-Cu system at (a) 20 ± 2.5 and (b) 10 ± 2.5 at. % Cu. This example uses the diffraction peak at an angle of around 42.45°, which is present in the B19, B19′, and B2 phases (ICSD reference peak at 42.6°). The right side of (a) displays the binary cut, with diffraction intensity color-coded as a virtual z-axis. Red triangles mark the user-selected angle and the compositions at which the peak appears and vanishes, denoted by the numbers 1 and 2. The software then draws a corresponding line cutting through the phase region in the color-coded ternary diagram on the left. In that diagram, measurement areas are colored according to the peak intensity at the user-selected angle. The analysis at both Cu concentrations already indicates that the phase region becomes narrower with less Cu. The vertical box highlights the peak used for identification; in the actual user interface, a line is drawn over the peak.

Figure 3 shows an example of such an analysis: a color-coded ternary diagram with lines connecting the end points of the user-identified phase-region boundaries from quasi-binary cuts. The colors of these lines correspond to the diffraction angle of the peak that helped to identify the region of similar XRD patterns (red, 42.6°; purple, 40.76°). At a Ti concentration of ∼50 at. %, there is a narrow region where both red and purple lines are present. This indicates the existence of a two-phase region between two single-phase regions, in agreement with the Gibbs phase rule. This case cannot be properly described by k-means clustering, in which each measurement

area may only be part of one cluster or phase region, as shown in Figures S1 and S2. In reality, however, multiphase regions do exist, as correctly described by the analysis of XRD data using NNMF. In NNMF analysis, however, each occurring phase is always present everywhere, so that there may be no truly single-phase regions, and again, peak shifts are not properly described. In htAx, the task of phase-region identification is semiautomated and split into small “human intelligence tasks”. These result in a so-called “ground-truth”1,7 analysis of the system that highlights possible phase boundaries. This analytical approach produces an accurate description of the phase constitution of the materials system under investigation. htAx is believed to be more robust than k-means clustering, as it takes peak shifts into account in the quasi-binary cut analysis. Also, htAx results do not depend on the choice of a distance measure. Such a choice can yield incorrect results, as shown in Figures S1 and S2, where some distance measures produce a random clustering solution. The combination of human intelligence with a subsequent algorithmic analysis (discussed below) yields reliable results in the identification of phase regions and crystal structures in combinatorial materials science.

Figure 3. Ternary diagram of a thin-film Ni-Ti-Cu materials library showing the result of the quasi-binary cut (Cu = const at. % ± tolerance) analysis. The end points of the color-coded lines crossing the phase regions represent phase-region boundaries. The region crossed by red lines covers measurement areas showing a peak at 42.6°, as determined by quasi-binary cuts. The region crossed by purple lines covers measurement areas showing a peak at 40.76°, as determined by quasi-binary cuts. The narrow region in which red and purple lines overlap indicates a two-phase region.

The second main aspect of the proposed analysis scheme is the identification of phases based on a comparison of experimental XRD patterns with reference patterns. This comparison requires the ability to identify peaks in experimental patterns. There are several ways to identify peaks,11 and the success of these methods depends on the proper identification of as many real peaks as possible. It is important to distinguish between real peaks and peaks resulting from artifacts due to, e.g., noise. If artifact-derived peaks coincidentally match peaks from a database powder pattern, the result is a false identification of a crystal structure.
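A minimal Python sketch illustrates the requirement described above: keep real peaks and reject noise-derived ones. It is not the htAx code, and it rests on several assumptions:

- A simple moving-average smoother stands in for the MATLAB wden wavelet denoising that htAx uses.
- The noise level is the standard deviation of the residual between the raw and the smoothed pattern.
- A peak is kept only if its prominence exceeds k times that noise level. k = 3 is an arbitrary illustrative value; htAx describes its threshold as a minimum peak height.
- Widths are measured at half prominence.
- All function names, and the demo pattern, are hypothetical.

```python
import math
import random
import statistics


def moving_average(y, half_window=3):
    """Boxcar smoother; a stand-in (assumption) for htAx's wavelet denoising."""
    n = len(y)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out


def find_peaks_above_noise(x, y, k=3.0, half_window=3):
    """Return (peaks, noise); each peak is a dict with center, prominence, width.

    Hypothetical helper: noise = population std of (raw - smoothed), and only
    peaks whose prominence exceeds k * noise are kept.
    """
    smooth = moving_average(y, half_window)
    noise = statistics.pstdev([a - b for a, b in zip(y, smooth)])
    n = len(smooth)
    peaks = []
    for i in range(1, n - 1):
        # Local maximum of the smoothed pattern (first point of a flat top wins).
        if not (smooth[i] > smooth[i - 1] and smooth[i] >= smooth[i + 1]):
            continue
        # Prominence: walk outward on each side until the signal rises above
        # the peak (or the pattern ends), remembering the lowest value seen.
        left_min = right_min = smooth[i]
        j = i - 1
        while j >= 0 and smooth[j] <= smooth[i]:
            left_min = min(left_min, smooth[j])
            j -= 1
        j = i + 1
        while j < n and smooth[j] <= smooth[i]:
            right_min = min(right_min, smooth[j])
            j += 1
        prominence = smooth[i] - max(left_min, right_min)
        if prominence <= k * noise:
            continue  # indistinguishable from noise: treated as an artifact
        # Width at half prominence, measured on the sampling grid.
        half = smooth[i] - prominence / 2
        lo = i
        while lo > 0 and smooth[lo] > half:
            lo -= 1
        hi = i
        while hi < n - 1 and smooth[hi] > half:
            hi += 1
        peaks.append({"center": x[i], "prominence": prominence,
                      "width": x[hi] - x[lo]})
    return peaks, noise


def synthetic_pattern(seed=0):
    """Made-up demo data: one Gaussian peak at 42.5° on a flat background plus noise."""
    rng = random.Random(seed)
    x = [40.0 + 0.02 * i for i in range(251)]
    y = [10.0 + 100.0 * math.exp(-((xi - 42.5) ** 2) / (2 * 0.1 ** 2))
         + rng.gauss(0.0, 1.0) for xi in x]
    return x, y


if __name__ == "__main__":
    found, noise_level = find_peaks_above_noise(*synthetic_pattern())
    print(f"noise = {noise_level:.2f}", found)
```

On the synthetic pattern, the strongest returned peak lies near 42.5°, with a width of roughly 0.3°. Noise maxima are expected to fall below the threshold. A wavelet-based denoiser could replace `moving_average` without changing the rest of the logic.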
Such false positive hits are, however, random and can be sorted out by visual inspection of a color-coded ternary diagram. The block diagram in Figure 4 shows the overall procedure for noise reduction, thresholding, peak matching, and the assignment of an FOM. To robustly determine the peaks in an experimental pattern, the data first needs to be denoised. htAx denoises a pattern using MATLAB’s wden function with the automatic maximal overlap discrete wavelet transform and universal thresholding ($\sqrt{2\ln(\mathrm{length}(x))}$, “sqtwolog”). For most diffraction data, the first four levels are enough to describe the signal and to filter the noise. Analyzing the difference between the signal and its denoised representation allows the construction of a model describing the noise. This noise measure is used as a threshold for a minimum peak height in the peak detection algorithm. As shown in the block diagram in Figure 4, the peak detection algorithm can be any function returning the peak location, width, and height. The information obtained from peak detection is subsequently used for peak matching. The locations, heights (prominences), and widths of the identified peaks allow the pattern to be reconstructed by a series of Gaussian peaks, as shown in Figure 5c and d, which could be used for further processing. A peak’s prominence measures its intensity relative to the neighboring signal, and its width is measured at half the peak’s prominence.

Figure 4. Block diagram outlining the procedure for peak identification, thresholding, and matching using an FOM; see text for a detailed explanation. This analysis is performed for all measured diffraction patterns in a materials library. For each structure, only the FOM of the best matching peak in a given pattern is saved.

For a given XRD pattern in the experimental data, the peak matching algorithm takes in (1) all of the peaks identified by the peak identification algorithm and (2) XRD powder patterns from a crystal structure database. The algorithm seeks to match each peak in the experimental pattern with peaks from the database patterns. A “match” is considered present when the distance between the peak from the experimental pattern and the peak from the database pattern is less than one-fourth of the width of the experimental peak. The algorithm assigns each peak match a figure of merit (FOM) equal to

$$\mathrm{FOM} = \log_{10}\left(\frac{p}{w^{2}\,\lvert x_e - x_r \rvert}\right)$$

where $p$ is the peak prominence in the experimental pattern, $w$ the peak width in the experimental pattern, $x_e$ the peak location in the experimental pattern, and $x_r$ the peak location in the database or reference powder pattern. As illustrated in Figure 5, the value of the proposed FOM increases with increasing peak intensity, decreasing peak width, and proximity to the peaks from the reference pattern. An experimental pattern and a database pattern may have multiple peak matches. In that case, the FOM assigned to the experimental−database pattern pair is the maximum of all FOMi of the peak matches. Assigning max(FOMi) to a pattern yields better results than sum(FOMi). This was judged by visual comparison with quasi-binary cut analysis, k-means clustering, and NNMF, all serving as internal standards, as shown in Figures S1 and S2. The better agreement with the other analysis techniques is assumed to arise because max(FOMi) tends to yield more false negatives, whereas sum(FOMi) tends to yield more false positives. This observation is likely associated with the following effect: when a crystal structure matches poorly for a couple of identified peaks, the sum can be as high as a single very good match for one peak, especially given the logarithmic growth of the FOM.

Figure 5. Demonstration of the proposed FOM to perform peak identification and matching. (a) Color-coded ternary plot of the Ni-Ti-Cu system. Yellow dots represent high FOM values and dark blue dots represent values close to zero for the phase (CuNi)Ti (ICSD phase #628583). (b) Plots of the FOM (top) and its components (middle, bottom) for a quasi-binary cut along 20 ± 2 at. % Cu highlight the sharp contrast identifying (CuNi)Ti. (c) Measured XRD pattern (red) and reconstructed XRD pattern (blue) from a measurement area in the yellow region in (a). The peak prominence p, center c, and width w are shown in the figure. The reference peak position is 40.8713°, resulting in an FOM of 5.8. (d) Measured XRD pattern (red) and reconstructed XRD pattern (blue) from a representative measurement area in the dark blue region in (a). The closest peak is more than one-fourth of the peak width away from the closest reference peak position (40.8°), so that the peak is considered not to match, as outlined in the text. The black bars in (c) and (d) represent the reference peak positions from ICSD file #628583.

The use of the FOM is illustrated with an arbitrarily chosen phase, (CuNi)Ti, in the Ni-Ti-Cu system. Figure 5b displays the FOM along a quasi-binary cut at (20 ± 2 at. %) Cu. The FOM values shown in Figure 5b are the logarithm of the product of two measures. The [prominence/width] measure reflects the reliability of the experimental peak. The [1/(width×distance)] measure correlates with the likelihood that the matched peak is correct. The diffraction peak at 40.87° (Figure 5c) matches the phase (CuNi)Ti and yields a high FOM, shown as the yellow area in Figure 5a. The reconstructed peaks in Figure 5d are too far from the peaks in the published (CuNi)Ti pattern to be considered a match. These plots show that the [1/(width×distance)] term is highest at the center of the phase region and decreases away from it. Changes in peak shape could be the result of several factors. Changes in crystallinity would alter peak prominence and peak width, and residual stress would alter peak width. The [prominence/width] term is motivated by the assumption that the peaks of a given phase become smaller and broader as the composition approaches a phase boundary. As shown in Figure 5b, the FOM with the distance cutoff produces a sharp edge, with a difference in the FOM of ∼6 (on a log10 scale). Compare this with dissimilarity measures such as the cosine metric used in the clustering approaches of Kusne et al.15 The presented FOM for crystal structure mapping shows a larger contrast between what is and is not in the identified phase region. Below, case studies on an intermetallic and a transition metal oxide system are presented to emphasize the utility of the method presented herein.
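The matching rule and the FOM defined above can be written down directly. The following Python sketch rests on these assumptions:

- Peaks are given as dictionaries with center, prominence, and width, as a peak finder would return them.
- Reference patterns are lists of 2θ positions.
- A small distance floor guards against an exact coincidence of experimental and reference positions, a case the text does not address.
- These data layouts and the function names are hypothetical.

The `aggregate` argument lets max(FOMi), which htAx uses, be compared with sum(FOMi).

```python
import math

# Hypothetical floor (degrees 2θ) that avoids division by zero when an
# experimental peak coincides exactly with a reference position.
MIN_DISTANCE = 1e-6


def peak_fom(p, w, x_e, x_r):
    """FOM of one peak match: log10(p / (w**2 * |x_e - x_r|))."""
    distance = max(abs(x_e - x_r), MIN_DISTANCE)
    return math.log10(p / (w ** 2 * distance))


def structure_fom(exp_peaks, ref_positions, aggregate=max):
    """Aggregate FOM of one experimental pattern against one reference pattern.

    exp_peaks: dicts with 'center', 'prominence', and 'width' (2θ units).
    ref_positions: reference peak positions (2θ) from a database powder pattern.
    A pair counts as a match only if |x_e - x_r| < w / 4.
    Returns None when nothing matches (the measurement area stays uncolored).
    """
    foms = [
        peak_fom(pk["prominence"], pk["width"], pk["center"], x_r)
        for pk in exp_peaks
        for x_r in ref_positions
        if abs(pk["center"] - x_r) < pk["width"] / 4
    ]
    return aggregate(foms) if foms else None


if __name__ == "__main__":
    peaks = [{"center": 40.87, "prominence": 100.0, "width": 0.2}]
    print(structure_fom(peaks, [40.8713]))  # a match within w / 4
    print(structure_fom(peaks, [40.8]))     # no match -> None
```

Take an experimental peak at 40.87° with prominence 100 and width 0.2°, matched against a reference peak at 40.8713°. The sketch gives log10(100 / (0.04 × 0.0013)) ≈ 6.28. A reference peak at 40.8° lies 0.07° away, more than w/4 = 0.05°, so no match is reported. These numbers are illustrative and are not the values behind Figure 5.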

Figure 6. Ternary diagrams showing the crystal structure matching color-coded FOM from htAx analysis of a Ni-Ti-Cu thin film materials library for different ICSD phases: (a) #189950 Ti(Ni0.5Cu0.5), (b) #103052 Ni2(CuTi), (c) #628583 Ti(CuNi), (d) #628578 Ti(Ni2.91Cu0.09), (e) #164151 NiTi, and (f) #602112 Ti(Cu0.053Ni0.947)3.

The Ni-Ti-Cu system, which exhibits phases with shape memory properties, is revisited using the htAx approach. The data was taken from a paper on the martensitic transformation in the complete Ni-Ti-Cu system.9 Quasi-binary cuts are analyzed for the occurrence of phase boundaries. Figure 2a and b shows an example: two cuts through the Ni-Ti-Cu system at constant Cu concentrations of (10 ± 5 at. %) and (20 ± 5 at. %). They determine phase boundaries by analyzing the peak region around 42.45°, which corresponds to the B19, B19′, and B2 phases (literature peak in ICSD #189950 at 42.6°). Analysis of the B19 phase at the two Cu concentrations indicates that the phase region becomes narrower with lower Cu concentrations, as shown in Figure 3. After the binary cuts have been analyzed, the peak identification and matching algorithm described earlier performs automatic crystal structure matching. Figure 6 displays the six most prominent phases according to the FOM. These are the same phases identified in the analysis by Zarnetta et al.9 The combined analysis using quasi-binary cuts and FOM-based crystal structure matching yields a physically more correct result than the previously reported k-means clustering. k-means clustering can assign each measurement area to only one phase region, which is physically incorrect. In Figure 6, the visualization of the automatic crystal structure matching is overlaid with circular red outlines illustrating the phase-boundary analysis from the quasi-binary cuts. These phase boundaries make clear that some of the identified boundaries in fact correspond to the right-hand boundary of Ni2(CuTi) (Figure 6b) and Ti(Ni2.91Cu0.09) (Figure 6d). A daisy ternary plot can visualize up to six matched crystal structures, which helps to show multiphase regions and identify amorphous or unmatched regions. An example is shown in Figure 7.
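For every measurement area, the daisy plot needs one FOM value, or no match, per displayed structure. A minimal Python sketch of that bookkeeping follows. It rests on these assumptions:

- FOMs are stored per structure as lists aligned with the measurement areas, with None for no match.
- Structures are ranked by their best FOM anywhere on the library. This ranking is an assumption standing in for however htAx chooses the six most prominent phases.
- Drawing the glyphs is left out.
- All names are hypothetical.

```python
def best_fom(values):
    """Highest FOM of one structure over all measurement areas (None = never matched)."""
    matched = [v for v in values if v is not None]
    return max(matched) if matched else None


def top_structures(fom_table, k=6):
    """Pick up to k structures, ranked by their best FOM (assumed ranking rule)."""
    candidates = [s for s in fom_table if best_fom(fom_table[s]) is not None]
    candidates.sort(key=lambda s: best_fom(fom_table[s]), reverse=True)
    return candidates[:k]


def daisy_glyphs(fom_table, structures):
    """One glyph per measurement area, holding one sector value per structure.

    A sector value of None means 'no match' and would be drawn uncolored;
    an area where every sector is None is flagged as unmatched
    (e.g. XRD amorphous or an unknown structure).
    """
    n_areas = len(next(iter(fom_table.values())))
    glyphs = []
    for i in range(n_areas):
        sectors = [fom_table[s][i] for s in structures]
        glyphs.append({"sectors": sectors,
                       "unmatched": all(v is None for v in sectors)})
    return glyphs


if __name__ == "__main__":
    table = {"A": [5.8, None, 2.0], "B": [None, None, 6.1], "C": [None, None, None]}
    chosen = top_structures(table)
    for glyph in daisy_glyphs(table, chosen):
        print(glyph)
```

An area whose sectors are all None corresponds to the uncolored case discussed for XRD-amorphous or unmatched regions.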
Figure 7. Visualization of matched crystal structures in a ternary diagram. At each composition, the stylized daisy glyph shown at the top of the figure contains sectors corresponding to the matched crystal structures. The colors of the sectors, labeled I to VI, correspond to the FOM for the matched crystal structures from Figure 6a−f, as indicated by the color bar at the top left. The stylized daisy at the top is a magnification of the daisy in the red circle. Most of the daisies indicate at least one match, so the system is assumed to be well described by the six chosen crystal structures.

The Al-Cr-Fe-O materials system is revisited8,12,13 following the workflow of the proposed analysis scheme (Figure 1). A combinatorial outreach program13 found that this materials system exhibits interesting functional properties for solar water splitting. In a previous publication,8 two crystal structure regions were identified by k-means clustering in a manner similar to the approaches presented by Long et al.3,10 and Kusne et al.6 The Cr-rich region of the materials library exhibited a corundum-type structure with p-type semiconducting properties. The Fe-rich and Al-poor regions indicated a spinel-type structure with n-type properties. The compositions between the two regions were assumed to be X-ray amorphous. Different metrics for k-means clustering suggested different phase regions. These clusterings did distinguish between Al-rich and Al-poor regions, as shown in the color-coded ternary diagrams in Figures S1 and S2, but the identified phase regions varied significantly between metrics. By comparing this Al-rich right-hand side to the Al-poor spinel-type region, it was concluded that these regions have the same crystal structure with different degrees of crystallinity. It should be noted that the analysis scheme used in

the original publication6 for Al-Cr-Fe-O did not use noise reduction or automatic peak detection. This materials library also poses a “worst-case” scenario. The high Fe content in the film produces a high background signal due to X-ray fluorescence by the 20 kV Cu Kα1 line. Following the htAx analysis scheme, quasi-binary cuts are first performed to identify phase-region boundaries. As an example, Figure 8a shows the intensity of the peak at 36.12° in a ternary diagram, and Figure 8b shows the corresponding quasi-binary plot at 8 at. % Al. This quasi-binary cut shows that the peak at 36.12°, indicated by the red triangle, vanishes for compositions >43 at. % Cr. Therefore, the user marks a phase-region boundary in this quasi-binary plot. htAx then interactively presents the blue line in the ternary diagram in Figure 8a, which indicates where in the ternary composition space a possible phase region exists. The end points of the blue line lie on the phase-region boundary. The line itself crosses the phase region, and its color corresponds to the vanishing peak’s diffraction angle.

Figure 8. (a) Color-coded ternary plot of the Al-Cr-Fe-O materials library showing the peak intensity distribution for the XRD peak at 36.12°. This peak is assumed to correspond to a CrFeO3 corundum- or spinel-type structure. (b) Corresponding quasi-binary plot showing the appearance (1) and vanishing (2) of the peak above 43 at. % Cr. The end points (1) and (2) are interactively drawn into the color-coded ternary diagram in (a). The red horizontal lines serve as a guide for the reader, identifying the compositions at which the peak appears and vanishes. The vertical box highlights the peak used for identification; in the actual user interface, a line is drawn over the peak.

The second analysis stage performs automatic crystal structure matching. It uses all crystal structures in the ICSD that could plausibly appear in the Al-Cr-Fe-O system, a total of 12 structures. Similar to the analysis in Sliozberg et al.,8 two distinct phase regions appeared, as shown in Figure 9. Of the 12 plausible ICSD structures for the Al-Cr-Fe-O system, only the four with the highest matching FOMs are shown. These correspond to spinel-type (Figure 9a, b, and d) and corundum-type (Figure 9c) structures. Some of them are truly quaternary spinel-type crystal structures that previously could not be definitively matched: (Al1.4Cr0.6Fe1)O4 (ICSD #166537), (Al0.487Cr1.483Fe1.031)O4 (ICSD #187923), and (Al1.8Cr0.2Fe1)O4 (ICSD #166538).
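In data terms, the quasi-binary plot in Figure 8b is a subset of measurement areas with one element held near a constant value. The areas are ordered along a second element, and the intensity is read at the selected angle. The Python sketch below assembles such a cut under these assumptions:

- Compositions are given in at. %, keyed by element symbol.
- The usage uses a ±2 at. % tolerance and a ±0.1° angular window, both chosen only for illustration. The text gives 8 at. % Al but no tolerance for this cut.
- All names are hypothetical.

In htAx, the user still decides by eye where the peak appears and vanishes. The sketch only prepares the data such a plot displays.

```python
from dataclasses import dataclass


@dataclass
class MeasurementArea:
    """One measurement area: composition in at. % and its XRD pattern."""
    composition: dict   # e.g. {"Al": 8.0, "Cr": 30.0, "Fe": 62.0}
    two_theta: list     # diffraction angles (degrees)
    intensity: list     # measured intensities


def quasi_binary_cut(areas, fixed, value, tol, order_by):
    """Areas whose `fixed` element lies within value ± tol, sorted along `order_by`."""
    cut = [a for a in areas if abs(a.composition[fixed] - value) <= tol]
    return sorted(cut, key=lambda a: a.composition[order_by])


def intensity_at(area, angle, half_window=0.1):
    """Maximum intensity within angle ± half_window (0.0 if no data there)."""
    window = [y for x, y in zip(area.two_theta, area.intensity)
              if abs(x - angle) <= half_window]
    return max(window) if window else 0.0


def cut_profile(cut, angle, order_by):
    """(composition, intensity) pairs: one row of a quasi-binary heatmap at `angle`."""
    return [(a.composition[order_by], intensity_at(a, angle)) for a in cut]
```

For example, `cut_profile(quasi_binary_cut(areas, "Al", 8.0, 2.0, "Cr"), 36.12, "Cr")` returns one (Cr content, intensity) pair per area in the cut, ordered by Cr content.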
The problem, however, is that certain peaks overlap for some of the spinel- and corundum-type structures. This new set of crystal structure matches for Al-Cr-Fe-O is an important finding. It was previously suggested that the Al-poor n-type oxide likely had a corundum structure, whereas there is now additional evidence that it is a corundum and spinel mixture. The algorithm had not been supplied with the composition of the corresponding measurement area. Even so, crystal structures with nominally higher Cr content yielded better matches in Cr-rich regions, and vice versa. This result shows the suitability of the proposed FOM, as peak shifts due to alloying are correctly accounted for in the analysis. Most of the phase-region boundaries identified by analyzing the quasi-binary cuts, illustrated by dotted black lines, agree well with boundaries identified by visually inspecting the color changes of the FOM in the ternary diagrams in Figure 9. Most of the identified phase-region boundaries describe the structure shown in Figure 9a well. The XRD peaks in this region are relatively high in intensity and are thus well described by the quasi-binary cuts and the FOM, as shown in Figure 8.

Figure 9. Color-coded ternary plots showing the FOM for matched crystal structures (a) #166537 (spinel: Al1.4Cr0.6Fe1O4), (b) #187923 (spinel: Al0.487Cr1.483Fe1.031O4), (c) #163943 (corundum: Cr0.75Fe1.25O3), and (d) #166538 (spinel: Al1.8Cr0.2Fe1O4). The dotted black lines illustrate the identified phase regions based on the results from quasi-binary cuts and FOM crystal structure matching.

In the previous analysis,8 the phase regions identified by k-means clustering correlated with the type of semiconductor. The semiconductor type was identified by measuring the change of the open circuit potential under illumination (ΔOCP). The k-means clustering results using different distance measures for the Al-Cr-Fe-O system are shown in Figure S1. The new analysis using htAx improves on this prior finding by additionally correlating well with photocurrent densities. This improved

Technology Note

ACS Combinatorial Science

Keller Center at Princeton University and Ruhr-Universität Bochum International Office through the REACH Program.

correlation is achieved due to the incorporation of crystallinity into the FOM (high crystallinity leads to higher XRD peaks and thus higher FOM values). In summary, a new analysis scheme for the analysis of XRD data from combinatorial thin-film materials libraries was demonstrated. Phase boundaries are identified by creating quasi-binary cuts through the data and solving small human intelligence tasks. The same XRD data set is then presented to an algorithm that identifies peaks and matches them to peaks in patterns of known crystal structures supplied by a database. For all crystal structures that produced matches, the user is presented with a color-coded ternary diagram displaying the matches’ FOM overlaid with his own interactively generated phase-region boundary identification. The two information sets enable a plausible identification of phase regions. The proposed scheme was successfully demonstrated in the analysis of the intermetallic system Ni-Ti-Cu and oxide system Al-Cr-Fe-O and helped to answer an outstanding question of which structures are present in the p-type region. Furthermore, a quantitative and unbiased method as addressed by Bunn et al.14 was implemented. The algorithm also correctly incorporated peak shifts due to alloying and matched those crystal structures with a nominally higher Cr content where the EDX data measured a higher Cr concentration. The entire XRD analysis using htAx was performed in less time than the analysis using kmeans clustering, as htAx does not require postprocessing steps and includes crystal structure identification as part of the analysis scheme. The demonstrated analysis scheme may also be applied to the analysis of functional properties such as photoelectrochemistry, where usually a laborious manual analysis is also performed to extract information beyond simple photocurrent densities at fixed potentials. 
htAx is published under a permissive open-source license (see Supporting Information for code and www.github.com/WDM-RUB/ htAx), and the scientific community is invited to use, adapt, and share it.





ASSOCIATED CONTENT

S Supporting Information *

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acscombsci.6b00151. Comparison of the htAx-based analysis to k-means clustering and pattern decomposition based on nonnegative matrix factorization (PDF)



REFERENCES

(1) Ermon, S.; Le Bras, R.; Suram, S. K.; Gregoire, J. M.; Gomes, C. P.; Selman, B.; van Dover, R. B. Pattern Decomposition with Complex Combinatorial Constraints: Application to Materials Discovery; AAAI Press, 2015. (2) Ermon, S.; Le Bras, R.; Gomes, C. P.; Selman, B.; van Dover, R. B. SMT-Aided Combinatorial Materials Discovery. In Theory and Applications of Satisfiability Testing−SAT 2012; Lecture Notes in Computer Science; Springer Berlin Heidelberg: Berlin, Heidelberg, 2012; Vol. 7317, pp 172−185. (3) Long, C. J.; Bunker, D.; Li, X.; Karen, V. L.; Takeuchi, I. Rapid Identification of Structural Phases in Combinatorial Thin-Film Libraries Using X-Ray Diffraction and Non-Negative Matrix Factorization. Rev. Sci. Instrum. 2009, 80 (10), 103902−103906. (4) Takeuchi, I.; Long, C. J.; Famodu, O. O. Data Management and Visualization of X-Ray Diffraction Spectra From Thin Film Ternary Composition Spreads. Rev. Sci. Instrum. 2005, 76, 062223. (5) Bunn, J. K.; Hu, J.; Hattrick-Simpers, J. R. Semi-Supervised Approach to Phase Identification From Combinatorial Sample Diffraction Patterns. JOM 2016, 68 (8), 2116−2125. (6) Kusne, A. G.; Keller, D.; Anderson, A.; Zaban, A.; Takeuchi, I. High-Throughput Determination of Structural Phase Diagram and Constituent Phases Using GRENDEL. Nanotechnology 2015, 26 (44), 1−9. (7) Le Bras, R.; Xue, Y.; Bernstein, R.; Gomes, C. P.; Selman, B. A Human Computation Framework for Boosting Combinatorial Solvers. Second AAAI Conference on Human Computation and Crowdsourcing, 2014. (8) Sliozberg, K.; Stein, H. S.; Khare, C.; Parkinson, B. A.; Ludwig, A.; Schuhmann, W. Fe−Cr−Al Containing Oxide Semiconductors as Potential Solar Water-Splitting Materials. ACS Appl. Mater. Interfaces 2015, 7 (8), 4883−4889. (9) Zarnetta, R.; Buenconsejo, P. J. S.; Savan, A.; Thienhaus, S.; Ludwig, A. High-Throughput Study of Martensitic Transformations in the Complete Ti-Ni-Cu System. Intermetallics 2012, 26 (C), 98−109. (10) Long, C. 
J.; Hattrick-Simpers, J.; Murakami, M.; Srivastava, R. C.; Takeuchi, I.; Karen, V. L.; Li, X. Rapid Structural Mapping of Ternary Metallic Alloy Systems Using the Combinatorial Approach and Cluster Analysis. Rev. Sci. Instrum. 2007, 78 (7), 072217. (11) Gregoire, J. M.; Dale, D.; van Dover, R. B. A Wavelet Transform Algorithm for Peak Detection and Application to Powder X-Ray Diffraction Data. Rev. Sci. Instrum. 2011, 82 (1), 015105−015109. (12) Kondofersky, I.; Müller, A.; Dunn, H. K.; Ivanova, A.; Stefanic, G.; Ehrensperger, M.; Scheu, C.; Parkinson, B. A.; FattakhovaRohlfing, D.; Bein, T. Nanostructured Ternary FeCrAl Oxide Photocathodes for Water Photoelectrolysis. J. Am. Chem. Soc. 2016, 138 (6), 1860−1867. (13) Woodhouse, M.; Parkinson, B. A. Combinatorial Approaches for the Identification and Optimization of Oxide Semiconductors for Efficient Solar Photoelectrolysis. Chem. Soc. Rev. 2009, 38 (1), 197− 210. (14) Bunn, J. K.; Fang, R. L.; Albing, M. R.; Mehta, A.; Kramer, M. J.; Besser, M. F.; Hattrick-Simpers, J. R. A High-Throughput Investigation of Fe−Cr−Al as a Novel High-Temperature Coating for Nuclear Cladding Materials. Nanotechnology 2015, 26 (27), 1−9. (15) Private communication with Aaron Gilad Kusne.

AUTHOR INFORMATION

Corresponding Authors

*E-mail: [email protected]. *E-mail: [email protected]. ORCID

Alfred Ludwig: 0000-0003-2802-6774 Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS The authors are grateful to the DFG for financial support within the SPP1613 (LU1175/10-1, LU1175/10-2). H.S.S. acknowledges a Ph.D. fellowship from the International Max Planck Research School for Surface and Interface Engineering (IMPRS-SurMat). S.J. and H.S. acknowledge support from the H