Neural-Network-Biased Genetic Algorithms for Materials Design

Publication Date (Web): December 20, 2016.
Research Article pubs.acs.org/acscombsci

Neural-Network-Biased Genetic Algorithms for Materials Design: Evolutionary Algorithms That Learn

Tarak K. Patra, Venkatesh Meenakshisundaram, Jui-Hsiang Hung, and David S. Simmons*

Department of Polymer Engineering, The University of Akron, 250 South Forge Street, Akron, Ohio 44325, United States

Supporting Information

ABSTRACT: Machine learning has the potential to dramatically accelerate high-throughput approaches to materials design, as demonstrated by successes in biomolecular design and hard materials design. However, in the search for new soft materials exhibiting properties and performance beyond those previously achieved, machine learning approaches are frequently limited by two shortcomings. First, because they are intrinsically interpolative, they are better suited to the optimization of properties within the known range of accessible behavior than to the discovery of new materials with extremal behavior. Second, they require large pre-existing data sets, which are frequently unavailable and prohibitively expensive to produce. Here we describe a new strategy, the neural-network-biased genetic algorithm (NBGA), for combining genetic algorithms, machine learning, and high-throughput computation or experiment to discover materials with extremal properties in the absence of pre-existing data. Within this strategy, predictions from a progressively constructed artificial neural network are employed to bias the evolution of a genetic algorithm, with fitness evaluations performed via direct simulation or experiment. In effect, this strategy gives the evolutionary algorithm the ability to “learn” and draw inferences from its experience to accelerate the evolutionary process. We test this algorithm against several standard optimization problems and polymer design problems and demonstrate that it matches and typically exceeds the efficiency and reproducibility of standard approaches including a direct-evaluation genetic algorithm and a neural-network-evaluated genetic algorithm. The success of this algorithm in a range of test problems indicates that the NBGA provides a robust strategy for employing informatics-accelerated high-throughput methods to accelerate materials design in the absence of pre-existing data.
KEYWORDS: materials design, machine learning, optimization, neural network, genetic algorithm, Ising model, compatibilizer, polymers, soft matter, molecular dynamics simulation



Received: September 7, 2016. Revised: November 23, 2016. Published: December 20, 2016.
DOI: 10.1021/acscombsci.6b00136. ACS Comb. Sci. 2017, 19, 96−107. © 2016 American Chemical Society.

INTRODUCTION

The task of designing a material with a targeted set of properties is a classical inverse problem, in most cases requiring iterative solution of the forward problem of material property characterization or prediction.1−4 Efficient materials design therefore requires two elements: a rapid method for solving the forward problem of material property characterization or prediction and a design strategy minimizing the number of candidate materials assessed en route to the ultimate target material. In the context of biomolecules and some hard materials, progress has been made in addressing these needs via the combination of machine learning tools such as artificial neural networks and formal optimization algorithms such as genetic algorithms.5−8 However, the extension of these approaches to the design of soft materials has been limited by several factors. First, whereas informatics-based design of biomolecules has been accelerated by the availability of large, open, and homogeneous structure−property relation databases such as the Protein Data Bank, such data in soft materials tend to be sparse, heterogeneous, and often outright unavailable.9,10 Second, because of a general lack of rapid, parallelizable solutions to the forward problem in these materials, building such databases de novo for a particular design problem is often cost-prohibitive. Third, many critical design problems in soft materials, such as the search for highly flexible barrier materials,11 target materials properties well outside the range of those previously achieved; since machine learning methods are intrinsically interpolative, they tend to become less reliable when searching for these types of extremal materials.

To address these challenges, here we describe a new strategy for combining machine learning and genetic algorithms with high-throughput methods such as combinatorial experiments12−15 or molecular simulation to enable the efficient design of soft materials in the absence of pre-existing databases and without an explicit reliance on interpolative strategies that perform poorly in the discovery of extremal materials. This new algorithm is related to a family of active learning strategies wherein the predictions of a machine learning tool are experimentally or computationally tested, with new results employed to improve the machine learning model.16,17 Unlike prior approaches in this area, the present strategy combines this concept with a genetic algorithm, leveraging the combined strengths of evolutionary design, machine learning, and active learning. Before describing this strategy and its application to a number of materials-related design test problems, we first review central elements of hybrid machine-learning- and optimization-algorithm-based approaches to materials design.

Genetic Algorithms for Materials Design. We begin by reviewing the overall structure of a molecular or materials design cycle within the sort of formal design process discussed above. The central component of this approach is the use of an optimization algorithm for selection of candidate materials en route to target properties. The optimization algorithm may range from a simple steepest-descent method to a metaheuristic optimization algorithm such as simulated annealing, a genetic algorithm, or a particle swarm algorithm. In any of these cases, the general approach is to begin with one or more initial candidates, to compute an objective function quantifying their properties relative to targeted values, and then to iteratively select new candidates chosen in an effort to converge to a desired value of the objective function. The process continues until some criterion for convergence is satisfied or it is terminated because of a maximal time constraint. The various optimization algorithms differ primarily in the manner by which new candidates are iteratively selected.
For example, steepest-descent methods compute new candidates via a gradient approach, genetic algorithms via evolutionary operators, and particle swarm methods via an effective candidate “velocity” in parameter space.18 Here we focus on the genetic algorithm (GA) as a widely used and versatile strategy for material property optimization and design.19 Genetic algorithms have seen increasing use in the field of polymers20−23 and other materials design problems.24−29 Genetic algorithms aim to mimic the process of natural selection to optimize a system’s properties. The schematic structure of a standard genetic algorithm as typically employed in materials design is shown in Figure 1. A random population of candidate materials is initially selected; each material’s chemical structure is mapped to a genetic representation; the fitness (the term for the value of the objective function in the context of evolutionary algorithms) of each candidate is determined; and genetic operators, including selection, crossover, and mutation, are then employed to obtain the next generation of candidates. This process is iterated until targeted properties are achieved within some acceptance criteria or a time constraint is reached. Typically, a GA includes tens to several hundred candidates per generation and iterates over tens to thousands of generations, such that it may ultimately assay anywhere from hundreds to hundreds of thousands of candidates. Many of the details of this process, such as the population size, the rate of mutations, and the details of selection and crossover, are subject to problem-specific optimization in an effort to minimize the number of candidates tested en route to convergence.

Figure 1. Schematic representation of a genetic algorithm.

The key bottleneck determining both the accuracy and the overall time scale of the GA is the fitness determination step. For example, if one were to spend 1 day experimentally synthesizing and characterizing each candidate material (a reasonable or even unrealistically short time in many soft materials applications), even a modest genetic algorithm would be impractical, requiring years to decades to perform. For some problems, high-throughput or combinatorial experimental methods are available in order to allow experiment-based fitness assessment, but for many soft matter problems there is no established high-throughput route for material synthesis and characterization. GAs have therefore seen greater use in problems for which efficient computational prediction of material properties is possible.23,30 While this works very well in some problems, in others it faces a trade-off: the loss of accuracy that accompanies faster property prediction can lead the GA to converge incorrectly to undesirable candidates. For this reason, a common strategy bases fitness assessment on heuristic models built against pre-existing data sets. This strategy leverages pre-existing data to provide a rapid, data-based solution to the forward problem of materials property prediction (fitness determination).31−34 In modern approaches, these heuristic models are generally built via nonlinear machine learning methods rather than through simple regression.

Artificial Neural Networks for Materials Machine Learning. As introduced above, a variety of machine learning methods have been used to predict material properties in diverse applications.35−38 Among a range of machine learning tools applicable to materials design, here we focus on the artificial neural network (ANN) as one of the most widely employed machine learning strategies. An ANN attempts to reproduce the brain’s logical operation using a collection of neuron-like nodes to perform processing. As with other machine learning methods, an ANN aims to build a nonlinear heuristic model for the relationship between the input variables (e.g., molecular structure descriptors) and output variables (e.g., material performance properties). A schematic of a typical back-propagation ANN is shown in Figure 2; it consists of nodes organized into an input layer, one or more hidden layers, and one output layer. The input layer consists of nodes that represent all of the required descriptors of a structure. Similarly, the output layer contains nodes that represent predicted properties of interest. Each node in the hidden layers consists of a compute function that transforms the sum of the weighted input values (consisting of all output values from the previous layer) into an output value that is passed to the next layer.39 The compute function, which is also known as the activation function, is in general a sigmoidal function.40 The layers are arranged in the form of a directed acyclic graph, with each node of each layer receiving input from all nodes of the prior layer but not from nodes in the same or later layers. Every layer has a bias node that does not receive any input from its previous layer, and it provides a constant bias to all of the nodes in its succeeding layer. The optimal numbers of hidden layers and nodes in a hidden layer have no universal values but are selected in an effort to maximize the efficiency and accuracy of the ANN. The network is “trained” prior to use by optimizing all of the weights in the network (one of which is defined between every pair of nodes in adjacent layers) to achieve a best fit to pre-existing data. This is done via a back-propagation algorithm.41 The back-propagation algorithm calculates the gradient of an error function, which is the square of the difference between the target and actual outputs, with respect to all of the weights in the network. It then uses an optimization method such as gradient descent to update the weights in order to minimize the error function.

Figure 2. Schematic representation of an artificial neural network with a single output node. The connection between any two nodes has a unique weight that is initialized randomly and is optimized during training. The compute nodes are denoted by f. The bias nodes, which are denoted by b, provide a constant bias to all of the nodes in the next layer. The arrows pointing to the input layer nodes represent the input information to the network. Similarly, the arrow originating from the output layer node represents the output of the network. All information transfer during prediction is to the right.

Artificial neural networks have been employed to establish materials quantitative structure−property relations (MQSPRs) in a variety of applications. For example, machine learning models trained on quantum-mechanical data have been used to screen dielectric properties of polymers42,43 and predict atomization energies of molecules,44 and molecular dynamics simulations of molten and crystal silicon have been conducted with on-the-fly machine learning of quantum-mechanical forces.45 In addition, ANNs have been used to successfully predict the lower critical solution temperature of polymers,46 the glass transition temperature,47−49 and various other polymer properties.50−52

Neural-Network-Evaluated Genetic Algorithms. Because of their versatility, ANNs have seen wide use in the prediction of material properties for fitness determination within materials-related genetic algorithms. This strategy, wherein an ANN is directly employed for fitness assessment within a genetic algorithm, is frequently called a neural-network-assisted genetic algorithm.33 To avoid confusion with the alternate strategy described later in this paper, here we refer to this type of approach as a neural-network-evaluated genetic algorithm (NEGA), reflecting the fact that fitness evaluation directly employs neural network predictions. A schematic representation of this algorithm is shown in Figure 3.

Figure 3. Schematic representation of an artificial-neural-network-evaluated genetic algorithm.
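The network structure and back-propagation training described above can be illustrated with a minimal sketch. It assumes a single hidden layer with sigmoidal compute nodes, a linear output node, a mean-squared-error function, and plain gradient descent; the class name `TinyANN`, the layer sizes, the learning rate, and the toy training data are all hypothetical choices for illustration and are not taken from the paper or its Supporting Information.

```python
import numpy as np

def sigmoid(z):
    """Sigmoidal activation (compute function) used by the hidden nodes."""
    return 1.0 / (1.0 + np.exp(-z))

class TinyANN:
    """One hidden layer, one output node; the last weight row of each layer is the bias node."""

    def __init__(self, n_inputs, n_hidden, rng):
        # Weights are initialized randomly and optimized during training.
        self.w1 = rng.normal(0.0, 0.5, size=(n_inputs + 1, n_hidden))
        self.w2 = rng.normal(0.0, 0.5, size=(n_hidden + 1, 1))

    @staticmethod
    def _with_bias(a):
        # Append a constant 1 to every sample to represent the bias node.
        return np.hstack([a, np.ones((a.shape[0], 1))])

    def predict(self, x):
        h = sigmoid(self._with_bias(x) @ self.w1)
        return (self._with_bias(h) @ self.w2).ravel()   # linear output (assumption)

    def train_step(self, x, y, lr=0.1):
        """One back-propagation step; returns the mean squared error before the update."""
        xb = self._with_bias(x)
        h = sigmoid(xb @ self.w1)
        hb = self._with_bias(h)
        err = (hb @ self.w2).ravel() - y                 # actual minus target output
        n = len(y)
        grad_w2 = 2.0 / n * hb.T @ err[:, None]
        # Propagate the error back through the hidden layer (bias row excluded).
        delta_h = (err[:, None] @ self.w2[:-1].T) * h * (1.0 - h)
        grad_w1 = 2.0 / n * xb.T @ delta_h
        self.w2 -= lr * grad_w2                          # gradient-descent update
        self.w1 -= lr * grad_w1
        return float(np.mean(err ** 2))

# Toy usage: learn the fraction of ones in random 8-gene binary genomes.
rng = np.random.default_rng(0)
genomes = rng.integers(0, 2, size=(200, 8)).astype(float)
fitness = genomes.mean(axis=1)
net = TinyANN(n_inputs=8, n_hidden=6, rng=rng)
losses = [net.train_step(genomes, fitness) for _ in range(500)]
print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

In a NEGA of the kind described here, a trained network's `predict` method would play the role of the fitness function inside the genetic algorithm; the sketch stops at training because the network architecture actually used is given only in the Supporting Information.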
In problems such as protein design, for which large pre-existing databases are available on which to train the machine learning tool, this is generally a preferred method, as it leverages the large sunk cost of database generation and becomes comparatively inexpensive. However, there exists a large class of materials design problems, many of them in the area of polymers and other soft matter, to which this type of approach is poorly suited for two reasons. First, machine learning tools typically require training against large sets of pre-existing data. Whereas large sets of data are available for some biomolecular problems such as protein structure, they are absent or sparsely populated in many soft materials problems. In many cases, these databases would be prohibitively expensive to produce, again because of the typical lack of an efficient solution to the forward problem of materials synthesis and characterization. These systems would therefore benefit from modified machine learning strategies eliminating the need for large pre-existing databases. Second, because machine learning methods are intrinsically interpolation algorithms, they are primarily suited to optimization within the range of properties spanned by available data and can perform poorly when applied to the design of extremal materials. However, many of the current design challenges in soft materials, such as nanostructured polymers for gas storage, polyelectrolytes for energy storage devices,53−55 and sequence-specific copolymers for directing the assembly of nanoparticles,56 specifically aim to push these materials’ properties beyond the currently accessible range of behavior. More broadly, the direct use of an ANN for fitness assessment in the genetic algorithm means that the success of the design cycle is determined by the accuracy of the ANN predictions because the fitness of each candidate is not verified explicitly via experiment or direct simulation during the evolution of the genetic algorithm. If an optimal material lies in an area of parameter space that is poorly modeled by the ANN (perhaps because it was poorly sampled in the data set), there is therefore a high likelihood that it will be missed.



NEURAL-NETWORK-BIASED GENETIC ALGORITHM

These challenges demand a new strategy for the combined use of machine learning and design tools to solve materials design problems for which few prior data exist and targeted material properties lie outside those previously achieved. Here we describe a new strategy, which we denote the artificial-neural-network-biased genetic algorithm (NBGA), for coupling a genetic algorithm and machine learning to solve this type of design problem by guiding the online generation of data via a high-throughput method. This strategy, illustrated schematically in Figure 4, has three components:

(1) Employ a direct-evaluated genetic algorithm (i.e., a genetic algorithm that performs fitness evaluations directly via simulation or experiment) as the main optimization tool in order to ensure that the domain of applicability of the optimization scheme is not limited by the range of pre-existing data.
(2) Employ the data progressively generated by the direct-evaluated genetic algorithm to continuously train an artificial neural network, obviating the need for pre-existing data.
(3) Accelerate the GA using the progressively trained ANN by introducing into each generation of the GA the best projected candidate identified by a separate ANN-evaluated genetic algorithm that uses the ANN as trained on all of the data accumulated up to that point.

In effect, this strategy gives the genetic algorithm the ability to “learn” from accumulated data, in contrast to standard genetic algorithms, which have no memory of prior attempts beyond their accumulated impact on the instantaneous genetic distribution. In principle, this strategy combines the best features of both genetic algorithms and neural networks. The neural network accelerates the progress of the direct-evaluation genetic algorithm toward a system optimum by regularly introducing predicted top-performing candidates into the genetic algorithm based on the ANN-evaluated genetic algorithm’s progressively improving projection of likely top candidates. The fitness of each candidate in the genetic algorithm, however, including those introduced by the ANN, is ultimately assessed by direct simulation (or in principle by high-throughput experiment), such that the applicability of the algorithm is not limited to the range of previously available data. The genetic algorithm’s selection pressure in turn biases data collection toward candidates closer to the optimum, improving the ANN’s likely prediction quality in this vicinity. Moreover, any pre-existing data, if available, can be incorporated into the ANN prior to the outset of the genetic algorithm.

Figure 4. Schematic representation of the artificial-neural-network-biased genetic algorithm. It is driven by a core simulation-evaluated (direct-evaluated) genetic algorithm, each generation of which incorporates a suggestion from an artificial-neural-network-evaluated genetic algorithm using an artificial neural network that is progressively trained from accumulated data of the core genetic algorithm.

In order to test the performance of this new algorithm, we compare the efficiency of the ANN-biased genetic algorithm (NBGA) with those of an ANN-evaluated genetic algorithm (NEGA) and a purely direct-evaluation genetic algorithm with no ANN acceleration (GA). We hold most details of the genetic algorithm, other than the manner of fitness assessment and coupling to an ANN, constant for the three strategies as follows. For the purposes of testing, we focus on binary genetic algorithms in which each gene in the genome has two possible alleles. The algorithm begins with a randomly generated initial population as shown in Figure 4. In order to produce the next generation after fitness assessment, two parents are randomly selected from among the candidates of a particular generation using roulette wheel selection with self-crossover prohibited.57,58 Two-point crossover is used to combine the selected parents, and point mutations are applied to new candidates at a per-gene rate of 0.01. Linear scaling of fitness values is employed to maintain consistent selection pressure. An elitism scheme, in which the single best candidate in each generation is directly transferred to the next generation, is employed to accelerate convergence.59

In addition to the variable strategies for incorporating an ANN, one additional modification is made to a standard genetic algorithm in order to optimize the performance for soft matter problems in which simulation-based evaluation is relatively slow. In these types of problems, the time for the genetic algorithm to complete is dominated by the simulation time rather than by the algorithmic cost of the genetic algorithm itself. It is therefore desirable to avoid any repeat fitness evaluations and to keep the total number of new evaluations (and thus the computational resource needs) fixed from generation to generation. For this reason, the genetic algorithm stores a complete database of all prior fitness evaluations as it proceeds. Any time a previously evaluated candidate is encountered in the genetic algorithm, its fitness is drawn from the database rather than freshly determined. The size of the population is then also temporarily increased by one in order to maintain a constant number of new fitness evaluations. The result of this approach is to progressively increase the effective population size as more data are accumulated. When fitness evaluation is computationally expensive, this enhancement has a negligible computational cost compared with the per-generation computational time.

Similarly, a uniform ANN design is used in all cases for comparability. The number of neurons in the input layer is the same as the number of genes in the associated genetic algorithm genome, and one neuron is employed in the output layer, with this output consisting of the candidate fitness. Additional details of the ANN architecture as implemented for each test problem are provided in the Supporting Information. The ANN is trained against all available data using the back-propagation algorithm, at which point it is used to predict the fitness of unknown candidates. In both the standard ANN-evaluated genetic algorithm and the ANN-biased genetic algorithm, the top candidate predicted by the ANN is then identified by employing the ANN to predict fitness within a genetic algorithm run for 100 generations with 100 candidates per generation. The standalone NEGA and the NEGA incorporated into the ANN-biased genetic algorithm therefore do not differ in structure but purely in the nature of the data used for training their ANNs: in the case of the standalone NEGA, the training data are taken from a pregenerated data set consisting of randomly chosen candidates; in the ANN incorporated into the NBGA, the training data consist of a continually updated data set generated via the genetic algorithm itself.

Additional details of the implementation of the ANN-biased genetic algorithm are as follows. After each generation of the genetic algorithm, an updated database including the fitness of all candidates assessed up to that time is used to train an ANN via the back-propagation algorithm, as described above. An ANN-evaluated genetic algorithm is then run using 100 candidates per generation and 100 generations. The single best candidate identified by the NEGA is then sent to the ANN-biased genetic algorithm and incorporated into the next generation’s population along with candidates introduced via standard genetic operations and elitism. This candidate is then subject to the same direct fitness evaluation as all of the other candidates, and this process is iterated every generation over the course of the ANN-biased genetic algorithm. Within this process, if the suggestion is among the top candidates, it is likely to accelerate the convergence of the ANN-biased genetic algorithm. On the other hand, if the suggested candidate has a low fitness compared with the population, then the basic principles of fitness-based selection indicate that its genes are unlikely to be retained and thus unlikely to bias the GA in the wrong direction. Because only a single candidate out of 32 per generation in our ANN-biased GA is obtained from the ANN, there is little cost in terms of the number of genetically selected candidates. Therefore, for problems in which direct fitness evaluation is much slower than ANN training and NEGA operation (a common situation in soft matter), the total time and computational resources required remain very similar to those for a direct-evaluation GA while offering the potential for considerable acceleration.

Below we describe the use of the NBGA to determine the optimal properties of several materials-related systems. We first test the scheme against standard well-known “test” optimization models, including maximizing the presence of a gene in a genome and finding the ground states of 1D and 2D Ising models. We then test the scheme’s performance in several model polymeric material design problems, including maximizing a polymer’s density and optimizing the sequence of a copolymer that minimizes the interfacial energy between two immiscible polymer domains. In test applications of this algorithm to materials problems in the absence of pre-existing data, it typically outperforms both an unaugmented genetic algorithm and a neural-network-evaluated genetic algorithm. Unlike a NEGA, even when the neural network predictions are poor, the performance of the combined algorithm is comparable to that of the neat direct-evaluated genetic algorithm (GA) because the NBGA does not directly rely upon the neural network for property prediction.
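The bookkeeping of a single NBGA generation, as described in this section, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the min-max form of `linear_scale`, the count of 32 new evaluations per generation, and all function names are hypothetical, and the ANN training plus NEGA search is abstracted behind a caller-supplied `suggest` callable. The `placeholder_suggest` function used in the demonstration is only a stand-in so that the sketch runs; it does not use a neural network.

```python
import random

MUTATION_RATE = 0.01   # per-gene point-mutation rate quoted in the text
NEW_EVALS = 32         # new direct evaluations per generation (assumed from the population of 32)

def linear_scale(raw):
    """Min-max scaling to [0.1, 1.0]; an assumed stand-in for the linear fitness scaling."""
    lo, hi = min(raw), max(raw)
    if hi == lo:
        return [1.0] * len(raw)
    return [0.1 + 0.9 * (f - lo) / (hi - lo) for f in raw]

def two_point_crossover(a, b, rng):
    """Copy the segment between two random cut points from parent b into parent a."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:]

def mutate(genome, rng):
    """Flip each binary gene independently with probability MUTATION_RATE."""
    return tuple(1 - g if rng.random() < MUTATION_RATE else g for g in genome)

def nbga_generation(population, database, evaluate, suggest, rng):
    """Build the next generation; `database` maps genome -> directly evaluated fitness."""
    weights = linear_scale([database[g] for g in population])
    next_pop = [max(population, key=database.get)]        # elitism: keep the single best
    pending = [tuple(suggest(database))]                   # one surrogate-suggested candidate
    new_evals, attempts = 0, 0
    while new_evals < NEW_EVALS and attempts < 100 * NEW_EVALS:
        attempts += 1                                      # guard against a converged population
        if pending:
            child = pending.pop()
        else:
            p1, p2 = rng.choices(population, weights=weights, k=2)  # roulette wheel selection
            if p1 == p2:                                   # self-crossover prohibited
                continue
            child = mutate(two_point_crossover(p1, p2, rng), rng)
        if child not in database:                          # only unseen genomes cost an evaluation
            database[child] = evaluate(child)
            new_evals += 1
        next_pop.append(child)                             # seen genomes enlarge the population
    return next_pop

# Demonstration on a toy "count the ones" fitness with 20 binary genes.
rng = random.Random(0)

def count_ones(genome):
    return sum(genome)

def placeholder_suggest(database):
    """Stand-in for 'train ANN on database, run NEGA, return its best candidate'."""
    best = max(database, key=database.get)
    k = rng.randrange(len(best))
    return best[:k] + (1 - best[k],) + best[k + 1:]

population = [tuple(rng.randint(0, 1) for _ in range(20)) for _ in range(32)]
database = {g: count_ones(g) for g in population}
best_start = max(database.values())
for _ in range(10):
    population = nbga_generation(population, database, count_ones, placeholder_suggest, rng)
best_end = max(database.values())
print(best_start, best_end, len(database))
```

Because every candidate, including the suggested one, passes through `evaluate` before entering the database, a poor suggestion costs one evaluation but cannot corrupt the stored fitness values, which mirrors the robustness argument made above.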



TEST APPLICATIONS In order to assess the efficiency and capabilities of this new design strategy, we compare the performance of the ANNbiased GA (NBGA) to those of a standard GA and a NEGA for several optimization and material design problems. We begin with three simple test cases in which the optimal solution is known and fitness determination is rapid. Then the NBGA is used for two additional polymer optimization problems where the solution is unknown and the search space is expected to be complex. We compare the NEGA, NBGA, and GA by plotting the value of the fitness function (or related target property) versus the total number of direct fitness evaluations employed in the optimization. For materials design problems in which simulation- or experiment-based fitness assessment is slow, this count of the number of times one must directly solve the forward problem of materials characterization is the most important measure of the resources necessary to complete design process. For the NBGA and direct-evaluated GA, this is simply the number of new evaluations per generation times the generation index. For the NEGA, this is the size of the data set employed for training its ANN. Data for the GA and NBGA therefore represent the average progress of a single GA with an increasing number of generations, whereas data for the NEGA represent the improvement in the end output of the NEGA as the training set size is increased. Ground State of the Ising Model. The Ising model, which originated as a model for understanding magnetism, is relevant to a range of materials, optimization, and graph theory problems.60,61 Because of this generality and because its ground state is analytically known, this model has long served as a 100

DOI: 10.1021/acscombsci.6b00136 ACS Comb. Sci. 2017, 19, 96−107

Research Article

ACS Combinatorial Science

Figure 5. Performances of the direct-evaluation GA, NEGA, and NBGA, with the mean fitness of the candidate suggested by the ANN within the NBGA at each generation also shown, for the (a) 1D and (b) 2D Ising models described in the text. The solid line in each panel represents the ground state of the system. For the NEGA and the suggested candidates within the NBGA, the y axis corresponds to the directly calculated energy of the best identified candidate rather than the energy predicted by the associated ANN. The data are averaged over 10 different trials, each with a different initial configuration.

Figure 6. Run-to-run standard deviation of the optimal energy as a function of the number of direct fitness evaluations for the GA, NEGA, and NBGA for (a) 1D and (b) 2D Ising model problems.

GA nor the NEGA could reliably identify the Ising ground state even after direct calculation of the fitness of 5000 candidates in the case of the 1D Ising model and 2500 candidates in the case of the 2D Ising model. In contrast, the NBGA converges to the ground state in both cases. At a more detailed level, the selection pressure within the GA and NBGA leads to reliable improvement in fitness even at low generations, where the NEGA sometimes underperforms because of a lack of sufficient training data. At the same time, the NBGA outperforms both the NEGA and direct-evaluation GA in the limit of large numbers of fitness evaluations, where the inherent scatter in the NN prediction limits the ability of the NEGA to reach perfect convergence but the combined selection pressure and data usage of the NBGA facilitates rapid convergence. In both the 1D and 2D cases, introduction of ANN biasing leads to an improvement over the direct-evaluation GA, especially at long times when a substantial data set has been accumulated. In order to understand the biasing of the genetic algorithm of the NBGA by its ANN, we also show in Figure 5 the mean directly evaluated fitness of the candidates suggested by the NEGA within the NBGA. Notably, the predictions of the

benchmark for testing new optimization and machine learning algorithms.62−65 The GA has been used to study many Isinglike systems.66−69 Here we employ the NBGA along with NEGA and GA to determine the ground states of 40-spin 1D and 36-spin 2D (square lattice) Ising models. In its most general form, the Ising model consists of a grid or lattice arrangement of elements, which are called spin variables. A spin has two states and interacts with neighboring spins only. The total energy of the Ising model is then N

E=

∑ Jij SiSj i,j

(1)

where Si = −1 or +1 and Jij is the interaction strength between the ith and jth spins. Here we employ Jij = 1.0 for neighboring spins and 0 otherwise. The systems are treated as periodic. In the ground state of this model, two neighboring spins have opposite sign. As shown in Figure 5, the NBGA outperforms both the NEGA and direct-evaluation GA in terms of the number of fitness assessments needed to converge to the analytically known ground state. Neither the direct-evaluation 101

DOI: 10.1021/acscombsci.6b00136 ACS Comb. Sci. 2017, 19, 96−107


fully converge to the analytically known optimum even with very large training sets. Again, this is a consequence of the inherent noisiness in the predictions of regressive methods, as can be seen from the run-to-run standard deviation shown in Figure 8. Even when the coefficient of determination of the

NEGA that is trained on GA-selected candidates tend to exhibit lower energy (equivalent to higher fitness) than those of the NEGA trained on randomly selected candidates for large numbers of fitness evaluations. This confirms the proposition that by preferentially selecting candidates that are closer to the global optimum than randomly distributed candidates, the NBGA leads to an ANN that makes better predictions in the vicinity of the optimum. In essence, in addition to biasing the GA toward candidates suggested by the ANN to be optimal, this algorithm biases the training of the ANN toward more optimal regions of parameter space. This leads to a favorable feedback loop between the central NBGA and the biasing neural network, accelerating convergence toward the global optimum. Further, the run-to-run variation of the NBGA is considerably reduced relative to the other algorithms, as shown in Figure 6. The variability tends to drop with increasing number of generations, tending toward zero for long runs. In other words, in addition to improving upon the efficiency of the genetic algorithm or ANN alone, the NBGA yields much more reproducible results. In contrast, the results of evolutionary or machine learning methods alone tend to be stochastic and to include a degree of irreproducibility.

Gene Maximization. In order to test a case in which the ANN is expected to be extremely accurate, here we test the NBGA on a simple problem in which we seek to maximize the content of one gene in the genome. In effect, here we are solving the simple problem of optimizing for “all ones”. Therefore, the fitness function can be written as

$$E = \sum_{i} g_i$$

Figure 8. Run-to-run standard deviation of the optimal energy as a function of the number of direct fitness evaluations for the gene maximization problem.
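The reproducibility measure plotted here can be computed along the following lines. This is a minimal sketch that assumes the population form of the standard deviation (the text does not say whether the population or sample form was used); the function name and the traces are hypothetical placeholders, not data from this work.

```python
from statistics import mean, pstdev

def run_to_run_stats(best_by_run):
    """best_by_run[k][t] is the best fitness found by run k at checkpoint t.

    Returns one (mean, standard deviation) pair per checkpoint.
    """
    return [(mean(vals), pstdev(vals)) for vals in zip(*best_by_run)]

# Hypothetical placeholder traces for three independent runs.
traces = [
    [60, 80, 100],
    [55, 90, 100],
    [65, 85, 100],
]
stats = run_to_run_stats(traces)
```

A standard deviation of zero at the final checkpoint corresponds to what the text calls a fully reproducible optimum: every run has arrived at the same best fitness.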

machine learning model predictions is high, stochasticity confounds reliable prediction of the single best candidate. The NBGA, on the other hand, leverages the rapid early convergence of its advisory NEGA to likewise exhibit rapid early improvement in fitness, but its selection pressure enables it to converge reliably to the optimum within approximately 50 generations (1600 candidates). This behavior is particularly striking in contrast to the direct-evaluation genetic algorithm. Because this GA is a stochastic method that does not draw inferences from accumulated data, it cannot exploit the simple correlation between genotype and fitness as effectively as the NEGA or NBGA can. In summary, as with identification of the ground state of the Ising model, the NBGA yields both faster convergence and higher reproducibility relative to the other algorithms tested here. Indeed, by 2000 direct fitness evaluations, the optimum identified by the NBGA is found to be fully reproducible.

Polymer Density Optimization. We now move to a pair of test problems that are more representative of design problems in polymers and other soft materials: the design of copolymers, that is, polymers consisting of two distinct types of repeat unit. We first focus on the problem of designing a binary copolymer with maximal density. Developing materials with targeted density is a fundamental material design problem.70−73 More generally, control of polymer packing is central to the rational design of their structure, rheology, porosity, and many other properties.74−76 Here we focus on a simple model system in which, out of two possible monomers, one has a higher cohesive energy and thus favors a higher density. As in the case above, the optimal solution here will therefore be a genome with all beads of a single type. 
However, despite sharing the same optimum as the prior test problem, this problem is likely to have a more complicated fitness landscape due to potentially nontrivial interactions between adjacent repeat units. We base this optimization on molecular dynamics simulations employing a standard coarse-grained model of the

(2)

where gi = 0 or 1, representing the binary value of a gene in the genome. Because of the extreme simplicity of the resulting fitness landscape, this is expected to be a case in which an ANN should perform very well. As shown in Figure 7, this is indeed the case: the NEGA exhibits a very rapid improvement in fitness even for small data sets. However, after a rapid improvement in predictions with small data sets, it does not

Figure 7. Evolution of the GA, NEGA, and NBGA for gene optimization. The average fitness of candidates suggested to the NBGA by its associated neural network is also shown. A binary chromosome of 100 genes is considered. A gene is represented by 0 or 1 in the chromosome. The chromosome is optimized for a gene whose value is 1. The maximum fitness is 100, where all of the genes have the value 1. The data are averaged over 10 different trials, each with a different initial configuration.
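For orientation, a direct-evaluation GA on this gene maximization problem might be sketched as follows. The chromosome length of 100 comes from the caption above and the population of 32 is inferred from "50 generations (1600 candidates)"; the tournament selection, single-point crossover, mutation rate, and elitism are assumptions chosen for illustration and are not confirmed as the operators used in this work. All function names are hypothetical.

```python
import random

N_GENES = 100   # chromosome length (Figure 7 caption)
POP_SIZE = 32   # inferred from "50 generations (1600 candidates)"

def fitness(genome):
    """Eq 2: the number of genes equal to 1."""
    return sum(genome)

def tournament(pop, rng, k=2):
    """Fittest of k randomly drawn candidates (assumed selection operator)."""
    return max(rng.sample(pop, k), key=fitness)

def crossover(a, b, rng):
    """Single-point crossover (assumed operator)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(genome, rng, rate=0.01):
    """Flip each gene independently with probability `rate` (assumed value)."""
    return [1 - g if rng.random() < rate else g for g in genome]

def run_ga(generations=50, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(N_GENES)] for _ in range(POP_SIZE)]
    history = []
    for _ in range(generations):
        elite = max(pop, key=fitness)
        history.append(fitness(elite))
        children = []
        for _ in range(POP_SIZE - 1):
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            children.append(mutate(child, rng))
        pop = [elite] + children  # elitism: the best candidate is carried over
    return history

history = run_ga()  # best fitness per generation
```

Because the elite candidate is always retained, the recorded best fitness cannot decrease from one generation to the next; how quickly it approaches 100 depends on the assumed operators and the random seed.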


polymer.77 The details of the model and the molecular dynamics simulations are described in the Supporting Information. Because these simulation-based fitness evaluations require far more time than those in the prior tests, here it is not feasible to generate the large number of independent data sets needed to train independent ANNs against a wide range of statistically independent training sets of different sizes, as was done in the prior sections. Here we therefore focus on the performance of the NBGA relative to a direct-evaluation genetic algorithm. As shown by Figure 9, incorporation of ANN biasing into the genetic algorithm reduces the number of generations or the

ANN can accelerate the genetic-algorithm-based design of sequence-specific copolymer compatibilizers, the design and physics of which we consider at greater length in a separate publication.84 Figure 10 shows a snapshot of a molecular

Figure 10. MD snapshot of two immiscible polymer matrices with compatibilizer polymers at the interface. Gray and orange colors represent two different chemical moieties.

dynamics simulation of a typical system rendered in VMD,85 where gray and orange regions are two immiscible polymers. Each polymer domain consists of linear 20-bead homopolymer chains, one comprising type-0 and the other type-1 beads. The genetic algorithm seeks to optimize the sequence of a linear 20-bead copolymer chain species comprising a combination of type-0 and type-1 beads. These copolymer chains tend to localize to the interface and thereby reduce its energy (interfacial tension). The model and methodological details are described in the Supporting Information. The nonbiased GA and the NBGA are used separately to determine the chain sequence minimizing the interfacial energy, with both again employing 32 new candidate fitness evaluations per generation. Figure 11 shows the interfacial energies

Figure 9. Number density of the best polymer melt candidate plotted versus the number of direct fitness evaluations by the algorithms. The reported densities for each algorithm are averaged over five different runs of that algorithm, and error bars are the standard deviation of the results from these five runs.

total number of fitness determinations required for convergence by more than a factor of 2. Consistent with the basic physics of the problem, the converged sequence is a chain of all type-0 monomers. Beyond this general enhancement in convergence rate, it is notable that the best candidates identified by the NBGA-integrated ANN are generally of higher fitness than those identified by the standalone NEGA. This again indicates that training the ANN on data selected by the GA to approach the optimum leads to better ANN predictions near the optimum than does training the ANN on randomly chosen candidates. This leads to a favorable feedback loop accelerating convergence. Moreover, as shown by the error bars in this figure, the NBGA involves much less variability in the convergence rate than does the direct-evaluation GA. Again, this is consistent with the finding in prior sections of improved reproducibility of the NBGA-identified optimum relative to a GA or ANN alone. In essence, because the NBGA evolutionary process is externally biased toward a projected global optimum, it tends to converge both more rapidly and less stochastically toward this optimum.

Designing High-Performance Copolymer Compatibilizers. Finally, we consider a polymer design problem in which the optimal outcome is not known and the fitness landscape is likely to be far more complex: the design of a sequence-specific copolymer compatibilizer. Copolymer compatibilizers are employed to improve the thermodynamic stability of polymer interfaces, and they therefore have wide applicability in emulsions and composite materials.78−80 Block copolymers and random copolymers have long been used for this purpose.81−83 Here we consider whether incorporation of an

Figure 11. Interfacial energy between the two immiscible homopolymer blocks in the presence of compatibilizer polymers plotted versus the number of direct fitness evaluations. The data are averaged over five different runs of each optimization method.

achieved at different generations of the two algorithms. As can be seen in this figure, in this case the two methods are largely equivalent within the run-to-run variation. As can be seen from the interfacial energies of the suggestions provided to the NBGA by its NEGA, this is a consequence of very poor suggestions by the NEGA. In other words, the ANN as implemented here is simply unable to identify top candidates in this case, and its suggested candidates therefore do not


weaknesses of data-based approaches. At the same time, by biasing this algorithm via insertions of top candidates predicted on the basis of a progressively constructed neural network, the strategy leverages accumulated data to draw heuristic inferences and therefore speed convergence to the global optimum. Moreover, the neural networks trained via this strategy tend to be better at prediction near the global optimum than a comparable neural network trained on randomly distributed data because the biased genetic algorithm tends to produce data weighted toward the vicinity of the optimum. In practice, this artificial-neural-network-biased genetic algorithm (NBGA) is shown to outperform both a nonbiased genetic algorithm and a neural-network-evaluated genetic algorithm based on a pre-existing data set in a number of test applications. This accelerated convergence comes at little additional computational cost relative to a standard genetic algorithm. Although the ANN model in this study uses raw genes as its input, it can be further improved by developing better descriptors. Similarly, the topology and training period of the ANN can be optimized on the fly, every time a new training data set is available, to enhance the performance of the algorithms. Finally, in problems where pre-existing data are available, it is possible to “pretrain” the artificial neural network employed by the ANN-biased genetic algorithm to leverage these pre-existing data and further speed convergence. In effect, this strategy enables the genetic algorithm to “learn” from its history and thereby accelerate the design process beyond the rate of an evolutionary process alone. In addition to this acceleration, the combination of selection pressure with machine-learning prediction in the NBGA yields much more reproducible results than either an artificial neural network or genetic algorithm alone. 
The artificial-neural-network-biased genetic algorithm appears to provide a promising substitute for the sole use of either artificial neural networks or genetic algorithms for materials design. Beyond the test problems considered here, this NBGA is completely generic and in principle can be used for any optimization problem from information science to material design or engineering design. More broadly, the general strategy of employing a machine learning tool to bias a genetic algorithm can in principle be extended beyond artificial neural networks to the use of other techniques such as Bayesian learning and deep learning in problems for which those machine learning strategies are more effective. Although we have employed computer simulations for fitness evaluation in the test cases, this method can also be extended to high-throughput experimental design of materials. We expect this general GA-biasing strategy to be of particular value in the design of polymers and other soft materials, where large data sets are commonly absent and where societal challenges ranging from better separation membranes to more stable and energy-rich batteries demand polymeric material properties well beyond those previously achieved.
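The overall strategy can be sketched in a few dozen lines. The example below is only a schematic under loudly stated assumptions: a single linear neuron trained by stochastic gradient descent stands in for the ANN, the "expensive" fitness is the gene count of eq 2, the surrogate's suggestion is found by screening random variants of current parents rather than by a full neural-network-evaluated GA, and all names, operators, and parameter values are hypothetical. It is not the implementation used in this work.

```python
import random

def true_fitness(genome):
    """Stand-in for a costly simulation or experiment (here: eq 2)."""
    return sum(genome)

class LinearSurrogate:
    """One linear neuron; a deliberately simple stand-in for the ANN."""
    def __init__(self, n):
        self.w, self.b = [0.0] * n, 0.0

    def predict(self, g):
        return self.b + sum(w * x for w, x in zip(self.w, g))

    def fit(self, data, epochs=5, lr=0.005):
        for _ in range(epochs):
            for g, y in data:
                err = self.predict(g) - y
                self.b -= lr * err
                self.w = [w - lr * err * x for w, x in zip(self.w, g)]

def suggest(model, pool, rng, tries=200):
    """Best-predicted candidate among random 3-flip variants of the pool."""
    variants = []
    for _ in range(tries):
        g = list(rng.choice(pool))
        for i in rng.sample(range(len(g)), 3):
            g[i] = 1 - g[i]
        variants.append(g)
    return max(variants, key=model.predict)

def nbga(n=30, pop_size=16, generations=25, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    data, model, best = [], LinearSurrogate(n), 0
    for _ in range(generations):
        scored = [(g, true_fitness(g)) for g in pop]  # direct evaluation
        data.extend(scored)                           # training set grows each generation
        best = max(best, max(f for _, f in scored))
        model.fit(data)                               # retrain on all accumulated data
        ranked = sorted(scored, key=lambda t: -t[1])
        parents = [g for g, _ in ranked[: pop_size // 2]]
        children = []
        while len(children) < pop_size - 1:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:                    # occasional point mutation
                i = rng.randrange(n)
                child[i] = 1 - child[i]
            children.append(child)
        # Bias step: one surrogate-suggested candidate joins the next generation.
        pop = children + [suggest(model, parents, rng)]
    return best, len(data)

best, n_evals = nbga()
```

The design point the sketch tries to capture is that the surrogate never replaces direct evaluation: its suggestion is scored like any other candidate in the following generation, so a poor suggestion is simply outcompeted, while a good one enters the parent pool and also enriches the training data near the optimum.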

appreciably accelerate the GA. On the other hand, consistent with the discussion in the description of this method, the incorporation of a single “bad” suggestion each generation into the GA evidently does not reduce its efficiency appreciably; instead, any bad suggestion is simply lost from the gene pool and does not bias the algorithm in an unfavorable direction. We note that typical values of the coefficient of determination (R2) for the agreement between the training data and their ANN predictions remain quite high, in the vicinity of R2 = 0.9, for these systems. The predictions of the ANN are therefore generally fairly good, but it performs poorly in predicting the structure of “extremal” individuals based on a broader data set. It is possible that this problem could be overcome by further optimization of the machine learning tool for this particular problem; strategies such as adaptive selection of neural network structure,16 use of input descriptors to the ANN beyond the genes themselves,35 or more sophisticated ANN-training algorithms could conceivably yield better predictions. However, the present ANN serves as a good model for the performance of the NBGA in the case in which the machine learning tool has difficulty extrapolating from available data to targeted candidates with extremal properties. This is precisely the type of problem that one can encounter in soft materials when attempting to design a material with properties well outside the range of those previously shown. This case study demonstrates that the NBGA is capable of efficient materials design in these cases even when its artificial neural network is unable to make good predictions. In other words, the NBGA makes use of accumulated data when it is advantageous to do so but it is not led astray by poor heuristic models when they prove to be unsuitable for the problem at hand.
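The distinction drawn here between a high coefficient of determination and a correct identification of the extremal candidate can be made concrete with a small numerical sketch. The values below are hypothetical and chosen only for illustration; they are not data from this work, and the function name is likewise an assumption.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination, R2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical fitness values and surrogate predictions for ten candidates.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y_pred = [1.2, 1.8, 3.1, 4.3, 4.8, 6.2, 6.9, 8.4, 9.6, 9.3]

r2 = r_squared(y_true, y_pred)
best_true = max(range(len(y_true)), key=y_true.__getitem__)
best_pred = max(range(len(y_pred)), key=y_pred.__getitem__)
```

In this toy case R2 is about 0.98, yet the candidate ranked best by the predictions (index 8) is not the true best (index 9): a modest prediction error at the extreme of the data is enough to misidentify the top individual even when the overall fit is good.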



CONCLUSIONS

One of the longstanding challenges in soft materials design is an insufficiency of well-curated, publicly available data broadly relating structure to properties in a wide range of polymers and other soft materials. In general, polymers lack both the diverse databases that have accelerated data-based biomolecular materials design (e.g., the Protein Data Bank) and the rapid and well-validated theoretically based property predictions that have accelerated design in some hard materials problems. Leveraging data-based and optimization-based methods for materials design in these systems therefore requires a new strategy for their use in problems lacking pre-existing data and involving slow property determination. Compounding this problem, data-based methods can break down when one seeks to design a material well outside the range of those previously discovered. This type of challenge arises frequently in soft materials, such as in the design of materials that are stiff yet permeable, flexible yet impermeable, or soft yet highly tough or that aim to incorporate other combinations of materials that are commonly “antithetical”.86−88 This combination of challenges demands a new materials design framework that is robust against breakdown of data-based machine learning methods and can function in the absence of pre-existing data yet leverages machine-learning methods to accelerate high-throughput design. In this work, we have established a design strategy enabling the use of informatics without pre-existing data. By employing direct, simulation-based fitness assessment in its core genetic algorithm, this approach enables the design of materials well outside the range of pre-existing data, allaying one of the central



ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acscombsci.6b00136. Additional methodological details and additional data on neural network performance (PDF)




to Melting Temperatures of Single- and Binary-Component Solids. Phys. Rev. B: Condens. Matter Mater. Phys. 2014, 89 (5), 054303.
(18) Hassan, R.; Cohanim, B.; de Weck, O.; Venter, G. A Comparison of Particle Swarm Optimization and the Genetic Algorithm. In 46th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics and Materials Conference; American Institute of Aeronautics and Astronautics: Reston, VA, 2004.
(19) Chakraborti, N. Genetic Algorithms in Materials Design and Processing. Int. Mater. Rev. 2004, 49 (3−4), 246−260.
(20) Arora, V.; Bakhshi, A. K. Molecular Designing of Novel Ternary Copolymers of Donor-Acceptor Polymers Using Genetic Algorithm. Chem. Phys. 2010, 373 (3), 307−312.
(21) Mitra, K. Genetic Algorithms in Polymeric Material Production, Design, Processing and Other Applications: A Review. Int. Mater. Rev. 2008, 53 (5), 275−297.
(22) Kasat, R. B.; Ray, A. K.; Gupta, S. K. Applications of Genetic Algorithm in Polymer Science and Engineering. Mater. Manuf. Processes 2003, 18 (3), 523−532.
(23) Jaeger, H. M.; de Pablo, J. J. Perspective: Evolutionary Design of Granular Media and Block Copolymer Patterns. APL Mater. 2016, 4 (5), 053209.
(24) Srinivasan, B.; Vo, T.; Zhang, Y.; Gang, O.; Kumar, S.; Venkatasubramanian, V. Designing DNA-Grafted Particles That Self-Assemble into Desired Crystalline Structures Using the Genetic Algorithm. Proc. Natl. Acad. Sci. U. S. A. 2013, 110 (46), 18431−18435.
(25) Chua, A. L.-S.; Benedek, N. A.; Chen, L.; Finnis, M. W.; Sutton, A. P. A Genetic Algorithm for Predicting the Structures of Interfaces in Multicomponent Systems. Nat. Mater. 2010, 9 (5), 418−422.
(26) Fornleitner, J.; Lo Verso, F.; Kahl, G.; Likos, C. N. Genetic Algorithms Predict Formation of Exotic Ordered Configurations for Two-Component Dipolar Monolayers. Soft Matter 2008, 4 (3), 480−484.
(27) Deaven, D. M.; Ho, K. M. Molecular Geometry Optimization with a Genetic Algorithm. Phys. Rev. Lett. 1995, 75 (2), 288−291.
(28) Kanters, R. P. F.; Donald, K. J. Cluster: Searching for Unique Low Energy Minima of Structures Using a Novel Implementation of a Genetic Algorithm. J. Chem. Theory Comput. 2014, 10 (12), 5729−5737.
(29) Khaira, G. S.; Qin, J.; Garner, G. P.; Xiong, S.; Wan, L.; Ruiz, R.; Jaeger, H. M.; Nealey, P. F.; de Pablo, J. J. Evolutionary Optimization of Directed Self-Assembly of Triblock Copolymers on Chemically Patterned Substrates. ACS Macro Lett. 2014, 3 (8), 747−752.
(30) Le, T. C.; Winkler, D. A. Discovery and Optimization of Materials Using Evolutionary Approaches. Chem. Rev. 2016, 116 (10), 6107−6132.
(31) Yasin, Y.; Ahmad, F. B. H.; Ghaffari-Moghaddam, M.; Khajeh, M. Application of a Hybrid Artificial Neural Network−genetic Algorithm Approach to Optimize the Lead Ions Removal from Aqueous Solutions Using Intercalated Tartrate-Mg−Al Layered Double Hydroxides. Environ. Nanotechnol. Monit. Manag. 2014, 1−2, 2−7.
(32) Balasubramanian, M.; Paglicawan, M. A.; Zhang, Z.-X.; Lee, S. H.; Xin, Z.-X.; Kim, J. K. Prediction and Optimization of Mechanical Properties of Polypropylene/Waste Tire Powder Blends Using a Hybrid Artificial Neural Network-Genetic Algorithm (GA-ANN). J. Thermoplast. Compos. Mater. 2008, 21 (1), 51−69.
(33) Marim, L. R.; Lemes, M. R.; Dal Pino, A. Neural-Network-Assisted Genetic Algorithm Applied to Silicon Clusters. Phys. Rev. A: At., Mol., Opt. Phys. 2003, 67 (3), 033203.
(34) Umegaki, T.; Watanabe, Y.; Nukui, N.; Omata, K.; Yamada, M. Optimization of Catalyst for Methanol Synthesis by a Combinatorial Approach Using a Parallel Activity Test and Genetic Algorithm Assisted by a Neural Network. Energy Fuels 2003, 17 (4), 850−856.
(35) Pilania, G.; Wang, C.; Jiang, X.; Rajasekaran, S.; Ramprasad, R. Accelerating Materials Property Predictions Using Machine Learning. Sci. Rep. 2013, 3, 2810.

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

ORCID

David S. Simmons: 0000-0002-1436-9269

Author Contributions

The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.

Notes

The authors declare no competing financial interest.

ACKNOWLEDGMENTS

This work was made possible by generous financial support from the W. M. Keck Foundation.

REFERENCES

(1) Jain, A.; Bollinger, J. A.; Truskett, T. M. Inverse Methods for Material Design. AIChE J. 2014, 60 (8), 2732−2740.
(2) Hannon, A. F.; Gotrik, K. W.; Ross, C. A.; Alexander-Katz, A. Inverse Design of Topographical Templates for Directed Self-Assembly of Block Copolymers. ACS Macro Lett. 2013, 2 (3), 251−255.
(3) Paradiso, S. P.; Delaney, K. T.; Fredrickson, G. H. Swarm Intelligence Platform for Multiblock Polymer Inverse Formulation Design. ACS Macro Lett. 2016, 5 (8), 972−976.
(4) Qin, J.; Khaira, G. S.; Su, Y.; Garner, G. P.; Miskin, M.; Jaeger, H. M.; de Pablo, J. J. Evolutionary Pattern Design for Copolymer Directed Self-Assembly. Soft Matter 2013, 9 (48), 11467−11472.
(5) Curtarolo, S.; Hart, G. L. W.; Nardelli, M. B.; Mingo, N.; Sanvito, S.; Levy, O. The High-Throughput Highway to Computational Materials Design. Nat. Mater. 2013, 12 (3), 191−201.
(6) Schneider, G.; Fechner, U. Computer-Based de Novo Design of Drug-like Molecules. Nat. Rev. Drug Discovery 2005, 4 (8), 649−663.
(7) Maier, W. F.; Stöwe, K.; Sieg, S. Combinatorial and High-Throughput Materials Science. Angew. Chem., Int. Ed. 2007, 46 (32), 6016−6067.
(8) Wang, X. Z.; Perston, B.; Yang, Y.; Lin, T.; Darr, J. A. Robust QSAR Model Development in High-Throughput Catalyst Discovery Based on Genetic Parameter Optimisation. Chem. Eng. Res. Des. 2009, 87 (10), 1420−1429.
(9) Le, T.; Epa, V. C.; Burden, F. R.; Winkler, D. A. Quantitative Structure−Property Relationship Modeling of Diverse Materials Properties. Chem. Rev. 2012, 112 (5), 2889−2919.
(10) Bereau, T.; Andrienko, D.; Kremer, K. Research Update: Computational Materials Discovery in Soft Matter. APL Mater. 2016, 4 (5), 053101.
(11) Lagaron, J. M.; Catalá, R.; Gavara, R. Structural Characteristics Defining High Barrier Properties in Polymeric Materials. Mater. Sci. Technol. 2004, 20 (1), 1−7.
(12) Amis, E. J. Combinatorial Materials Science: Reaching beyond Discovery. Nat. Mater. 2004, 3 (2), 83−85.
(13) Meredith, C.; Smith, A. P.; Crosby, A. J.; Amis, E. J.; Karim, A. Combinatorial Methods for Polymer Science. In Encyclopedia of Polymer Science and Technology; John Wiley & Sons: Hoboken, NJ, 2002; DOI: 10.1002/0471440264.pst069.
(14) Genzer, J. Surface-Bound Gradients for Studies of Soft Materials Behavior. Annu. Rev. Mater. Res. 2012, 42 (1), 435−468.
(15) Cabral, J. T.; Karim, A. Discrete Combinatorial Investigation of Polymer Mixture Phase Boundaries. Meas. Sci. Technol. 2005, 16 (1), 191−198.
(16) Balachandran, P. V.; Xue, D.; Theiler, J.; Hogden, J.; Lookman, T. Adaptive Strategies for Materials Design Using Uncertainties. Sci. Rep. 2016, 6, 19660.
(17) Seko, A.; Maekawa, T.; Tsuda, K.; Tanaka, I. Machine Learning with Systematic Density-Functional Theory Calculations: Application


(36) Phillips, C. L.; Voth, G. A. Discovering Crystals Using Shape Matching and Machine Learning. Soft Matter 2013, 9 (35), 8552−8568.
(37) Long, A. W.; Ferguson, A. L. Nonlinear Machine Learning of Patchy Colloid Self-Assembly Pathways and Mechanisms. J. Phys. Chem. B 2014, 118 (15), 4228−4244.
(38) Long, A. W.; Zhang, J.; Granick, S.; Ferguson, A. L. Machine Learning Assembly Landscapes from Particle Tracking Data. Soft Matter 2015, 11 (41), 8141−8153.
(39) Agatonovic-Kustrin, S.; Beresford, R. Basic Concepts of Artificial Neural Network (ANN) Modeling and Its Application in Pharmaceutical Research. J. Pharm. Biomed. Anal. 2000, 22 (5), 717−727.
(40) Rojas, R. Neural Networks; Springer: Berlin, 1996.
(41) LeCun, Y.; Bottou, L.; Orr, G. B.; Müller, K.-R. Efficient BackProp. In Neural Networks: Tricks of the Trade; Orr, G. B., Müller, K.-R., Eds.; Lecture Notes in Computer Science; Springer: Berlin, 1998; pp 9−50.
(42) Wang, C. C.; Pilania, G.; Boggs, S. A.; Kumar, S.; Breneman, C.; Ramprasad, R. Computational Strategies for Polymer Dielectrics Design. Polymer 2014, 55 (4), 979−988.
(43) Xu, J.; Wang, L.; Liang, G.; Wang, L.; Shen, X. A General Quantitative Structure−property Relationship Treatment for Dielectric Constants of Polymers. Polym. Eng. Sci. 2011, 51 (12), 2408−2416.
(44) Hansen, K.; Montavon, G.; Biegler, F.; Fazli, S.; Rupp, M.; Scheffler, M.; von Lilienfeld, O. A.; Tkatchenko, A.; Müller, K.-R. Assessment and Validation of Machine Learning Methods for Predicting Molecular Atomization Energies. J. Chem. Theory Comput. 2013, 9 (8), 3404−3419.
(45) Li, Z.; Kermode, J. R.; De Vita, A. Molecular Dynamics with On-the-Fly Machine Learning of Quantum-Mechanical Forces. Phys. Rev. Lett. 2015, 114 (9), 096405.
(46) Xu, J.; Chen, B.; Liang, H. Accurate Prediction of θ (Lower Critical Solution Temperature) in Polymer Solutions Based on 3D Descriptors and Artificial Neural Networks. Macromol. Theory Simul. 2008, 17 (2−3), 109−120.
(47) Afantitis, A.; Melagraki, G.; Makridima, K.; Alexandridis, A.; Sarimveis, H.; Iglessi-Markopoulou, O. Prediction of High Weight Polymers Glass Transition Temperature Using RBF Neural Networks. J. Mol. Struct.: THEOCHEM 2005, 716 (1−3), 193−198.
(48) Joyce, S. J.; Osguthorpe, D. J.; Padgett, J. A.; Price, G. J. Neural Network Prediction of Glass-Transition Temperatures from Monomer Structure. J. Chem. Soc., Faraday Trans. 1995, 91 (16), 2491−2496.
(49) Chen, X.; Sztandera, L.; Cartwright, H. M. A Neural Network Approach to Prediction of Glass Transition Temperature of Polymers. Int. J. Intell. Syst. 2008, 23 (1), 22−32.
(50) Ulmer, C. W., II; Smith, D. A.; Sumpter, B. G.; Noid, D. I. Computational Neural Networks and the Rational Design of Polymeric Materials: The next Generation Polycarbonates. Comput. Theor. Polym. Sci. 1998, 8 (3−4), 311−321.
(51) Sumpter, B. G.; Noid, D. W. Neural Networks and Graph Theory as Computational Tools for Predicting Polymer Properties. Macromol. Theory Simul. 1994, 3 (2), 363−378.
(52) Roy, N. K.; Potter, W. D.; Landau, D. P. Designing Polymer Blends Using Neural Networks, Genetic Algorithms, and Markov Chains. Appl. Intell. 2004, 20 (3), 215−229.
(53) Aricò, A. S.; Bruce, P.; Scrosati, B.; Tarascon, J.-M.; van Schalkwijk, W. Nanostructured Materials for Advanced Energy Conversion and Storage Devices. Nat. Mater. 2005, 4 (5), 366−377.
(54) Breneman, C. M.; Brinson, L. C.; Schadler, L. S.; Natarajan, B.; Krein, M.; Wu, K.; Morkowchuk, L.; Li, Y.; Deng, H.; Xu, H. Stalking the Materials Genome: A Data-Driven Approach to the Virtual Design of Nanostructured Polymers. Adv. Funct. Mater. 2013, 23 (46), 5746−5752.
(55) Shi, Y.; Yu, G. Designing Hierarchically Nanostructured Conductive Polymer Gels for Electrochemical Energy Storage and Conversion. Chem. Mater. 2016, 28 (8), 2466−2477.

(56) Liu, L.; Sun, C.; Li, Z.; Chen, Y.; Qian, X.; Wen, S.; Zhang, L. In-Chain Functionalized Polymer Induced Assembly of Nanoparticles: Toward Materials with Tailored Properties. Soft Matter 2016, 12 (7), 1964−1968.
(57) Goldberg, D. E.; Deb, K. Foundations of Genetic Algorithms (FOGA 1); Morgan Kaufmann: San Mateo, CA, 1991.
(58) Goldberg, D. E. Genetic Algorithms in Search, Optimization, and Machine Learning; Addison-Wesley: Reading, MA, 1989.
(59) Baluja, S.; Caruana, R. Machine Learning: Proceedings of the Twelfth International Conference; Morgan Kaufmann: San Mateo, CA, 1995.
(60) Lucas, A. Ising Formulations of Many NP Problems. Front. Phys. 2014, 2, 5.
(61) Briest, P.; Brockhoff, D.; Degener, B.; Englert, M.; Gunia, C.; Heering, O.; Jansen, T.; Leifhelm, M.; Plociennik, K.; Röglin, H.; Schweer, A.; Sudholt, D.; Tannenbaum, S.; Wegener, I. The Ising Model: Simple Evolutionary Algorithms as Adaptation Schemes. In Parallel Problem Solving from Nature - PPSN VIII; Yao, X., Burke, E. K., Lozano, J. A., Smith, J., Merelo-Guervós, J. J., Bullinaria, J. A., Rowe, J. E., Tiňo, P., Kabán, A., Schwefel, H.-P., Eds.; Lecture Notes in Computer Science; Springer: Berlin, 2004; pp 31−40.
(62) Thomas, C. K.; Katzgraber, H. G. Optimizing Glassy P-Spin Models. Phys. Rev. E 2011, 83 (4), 046709.
(63) Thomas, C. K.; Katzgraber, H. G. Sampling the Ground-State Magnetization of D-Dimensional P-Body Ising Models. Phys. Rev. B: Condens. Matter Mater. Phys. 2011, 84 (17), 174404.
(64) Wang, L. Discovering Phase Transitions with Unsupervised Learning. Phys. Rev. B: Condens. Matter Mater. Phys. 2016, 94 (19), 195105.
(65) Carrasquilla, J.; Melko, R. G. Machine Learning Phases of Matter. 2016, arXiv:1605.01735. arXiv.org e-Print archive. https://arxiv.org/abs/1605.01735 (accessed Sept 9, 2016).
(66) Prügel-Bennett, A.; Shapiro, J. L. The Dynamics of a Genetic Algorithm for Simple Random Ising Systems. Phys. D 1997, 104 (1), 75−114.
(67) Anderson, C. A.; Jones, K. F.; Ryan, J. A Two Dimensional Genetic Algorithm for the Ising Problem. Complex Syst. 1991, 5 (3), 327−333.
(68) Fischer, S.; Wegener, I. The One-Dimensional Ising Model: Mutation versus Recombination. Theor. Comput. Sci. 2005, 344 (2−3), 208−225.
(69) Maksymowicz, A. Z.; Galletly, J. E.; Magdón, M. S.; Maksymowicz, I. L. Genetic Algorithm Approach for Ising Model. J. Magn. Magn. Mater. 1994, 133 (1), 40−41.
(70) Baranau, V.; Tallarek, U. Random-Close Packing Limits for Monodisperse and Polydisperse Hard Spheres. Soft Matter 2014, 10 (21), 3826−3841.
(71) Zou, R.-P.; Feng, C.-L.; Yu, A.-B. Packing Density of Binary Mixtures of Wet Spheres. J. Am. Ceram. Soc. 2001, 84 (3), 504−508.
(72) Rassouly, S. M. K. The Packing Density of ‘perfect’ Binary Mixtures. Powder Technol. 1999, 103 (2), 145−150.
(73) Zou, R. P.; Gan, M. L.; Yu, A. B. Prediction of the Porosity of Multi-Component Mixtures of Cohesive and Non-Cohesive Particles. Chem. Eng. Sci. 2011, 66 (20), 4711−4721.
(74) Gu, W.; Chern, R. T.; Chen, R. Y. S. The Gas Permeability and Packing Density of Copolymers of Methacryloxypropyltris (Trimethylsiloxy) Silane and Methylmethacrylate. J. Polym. Sci., Part B: Polym. Phys. 1991, 29 (8), 1001−1007.
(75) Rocha Pinto, R.; Santos, D.; Mattedi, S.; Aznar, M. Density, Refractive Index, Apparent Volumes and Excess Molar Volumes of Four Protic Ionic Liquids + Water at T = 298.15 and 323.15 K. Braz. J. Chem. Eng. 2015, 32 (3), 671−682.
(76) Xu, J. Q.; Zou, R. P.; Yu, A. B. Packing Structure of Cohesive Spheres. Phys. Rev. E 2004, 69 (3), 032301.
(77) Kremer, K.; Grest, G. S. Molecular Dynamics (MD) Simulations for Polymers. J. Phys.: Condens. Matter 1990, 2 (S), SA295−SA298.
(78) Cigana, P.; Favis, B. D.; Jerome, R. Diblock Copolymers as Emulsifying Agents in Polymer Blends: Influence of Molecular Weight, Architecture, and Chemical Composition. J. Polym. Sci., Part B: Polym. Phys. 1996, 34 (9), 1691−1700.
(79) Sundararaj, U.; Macosko, C. W. Drop Breakup and Coalescence in Polymer Blends: The Effects of Concentration and Compatibilization. Macromolecules 1995, 28 (8), 2647−2657.
(80) Macosko, C. W.; Guégan, P.; Khandpur, A. K.; Nakayama, A.; Marechal, P.; Inoue, T. Compatibilizers for Melt Blending: Premade Block Copolymers. Macromolecules 1996, 29 (17), 5590−5598.
(81) Eastwood, E. A.; Dadmun, M. D. Multiblock Copolymers in the Compatibilization of Polystyrene and Poly(methyl Methacrylate) Blends: Role of Polymer Architecture. Macromolecules 2002, 35 (13), 5069−5077.
(82) Lyatskaya, Y.; Gersappe, D.; Gross, N. A.; Balazs, A. C. Designing Compatibilizers To Reduce Interfacial Tension in Polymer Blends. J. Phys. Chem. 1996, 100 (5), 1449−1458.
(83) Dong, W.; Wang, H.; He, M.; Ren, F.; Wu, T.; Zheng, Q.; Li, Y. Synthesis of Reactive Comb Polymers and Their Applications as a Highly Efficient Compatibilizer in Immiscible Polymer Blends. Ind. Eng. Chem. Res. 2015, 54 (7), 2081−2089.
(84) Meenakshisundaram, V.; Hung, J.-H.; Patra, T. K.; Simmons, D. S. Designing Sequence-Specific Copolymer Compatibilizers Using a Molecular-Dynamics-Simulation-Based Genetic Algorithm, Macromolecules [Online early access]. DOI: 10.1021/acs.macromol.6b01747.
(85) Humphrey, W.; Dalke, A.; Schulten, K. VMD − Visual Molecular Dynamics. J. Mol. Graphics 1996, 14, 33−38.
(86) Ducrot, E.; Chen, Y.; Bulters, M.; Sijbesma, R. P.; Creton, C. Toughening Elastomers with Sacrificial Bonds and Watching Them Break. Science 2014, 344 (6180), 186−189.
(87) Madkour, T. M.; Mohamed, S. K.; Barakat, A. M. Interplay of the Polymer Stiffness and the Permeability Behavior of Silane and Siloxane Polymers. Polymer 2002, 43 (2), 533−539.
(88) Yang, J.; Min, M.; Yoon, Y.; Kim, W. J.; Kim, S.; Lee, H. Impermeable Flexible Liquid Barrier Film for Encapsulation of DSSC Metal Electrodes. Sci. 
Rep. 2016, 6, 27422.
