Article pubs.acs.org/Macromolecules

Designing Sequence-Specific Copolymer Compatibilizers Using a Molecular-Dynamics-Simulation-Based Genetic Algorithm

Venkatesh Meenakshisundaram, Jui-Hsiang Hung, Tarak K. Patra, and David S. Simmons*
Department of Polymer Engineering, The University of Akron, 250 South Forge Street, Akron, Ohio 44325-0301, United States

ABSTRACT: Compatibilizers, surfactant molecules designed to improve the stability of an interface, are employed to enhance material properties in settings ranging from emulsions to polymer blends. A major compatibilization strategy employs block or random copolymers composed of distinct repeat units with preferential affinity for each of the two phases forming the interface. Here we pose the question of whether improved compatibilization could be achieved by employing new synthetic strategies to realize copolymer compatibilizers with specific monomeric sequence. We employ a novel molecular-dynamics-simulation-based genetic algorithm to design model sequence-specific copolymers that minimize the energy of a polymer/polymer interface. Results indicate that sequence-specific copolymers offer the potential to yield larger reductions in interfacial energy than either block or random copolymers, with the preferred sequence being compatibilizer-concentration dependent. By employing a simple thermodynamic scaling model for copolymer compatibilization, we pinpoint the origins of this sequence specificity and concentration dependence in the "loop entropy" of compatibilizer segments connecting interfacial bridge points. In addition to pointing toward a new strategy for improved interfacial compatibilization, this approach provides a conceptual basis for the computational design of a new generation of sequence-specific polymers leveraging recent and ongoing synthetic advances in this area.



INTRODUCTION

The dominant paradigm over the history of polymer science has been the proposition that the properties of synthetic polymers can be largely understood without reference to details of chain sequence. This paradigm is reflected, for example, in the broad use of mean-field theories to describe polymer behavior.1−4 Unlike proteins and other biological macromolecules, the activity and properties of which are determined by the exact sequence of amino acids or other constitutive moieties in the chain backbone, synthetic copolymers are generally characterized at the level of coarse sequence statistics such as mean mass fraction and mean block length. In contrast, biological macromolecules commonly incorporate complex, specific sequences to yield performance and functionality not seen in synthetic macromolecules. Examples include spider silk,5,6 antigen−antibody interactions,7,8 DNA-binding proteins,9,10 and lipid bilayers.11

One of the areas in which the use of modest sequence control is understood to play an important role is the use of copolymers as interfacial compatibilizers. Applications ranging from emulsification12 to barrier materials13 to the coalescence and stabilization of polymer blends14,15 rely upon the use of compatibilizers, additives that reduce interfacial energy, to control morphology as well as provide thermal and mechanical stability to interfaces of polymer blends.14,16 Copolymers containing repeat units with differential preference for the two phases are an important strategy to this end. Traditionally, this strategy has employed a coarse level of sequence control via the use of block copolymers, most commonly diblocks.17−19

© XXXX American Chemical Society

Work by the Balazs group and others has also indicated that random copolymers can be used as effective compatibilizers.1,20,21 Recent work suggests that architectural control, in the form of comb/brush compatibilizers,1,3,22−24 and grafting control, in the form of copolymer-grafted nanoparticles,25−28 have the potential to yield more efficient copolymer compatibilizers. Here, we address the question of how one might design copolymer compatibilizers for polymer−polymer interfaces with a higher level of sequence specificity in order to improve compatibilizer efficiency. The focus on a mean-field understanding of polymer composition and the absence of synthetic macromolecules employing high-precision sequence control is to a large extent a result of a practical inability to synthetically control polymer sequence. Typical synthetic copolymers are polydisperse in molecular weight and involve no chain sequence specificity. However, recent developments in synthetic chemistry are beginning to overcome this challenge, through strategies such as the synthesis of precise copolymers29−33 (chains exhibiting robust periodic intrachain order), use of sequence-specific macromonomers to generate specific repeating sequences,34−36 the simulation and synthesis of "protein-like" AB copolymers that exhibit memorized preferential folded states,37−40 and synthesis of fully sequence-specific polymers from noncanonical amino acids.41−45 Early evidence suggests that these materials

Received: August 10, 2016
Revised: November 29, 2016


DOI: 10.1021/acs.macromol.6b01747 Macromolecules XXXX, XXX, XXX−XXX


can exhibit performance dramatically exceeding that of statistical copolymers.29−31,42 Evidence for an important role of sequence in interfacial activity in particular is provided by a body of literature focused on understanding the thermodynamics governing pattern recognition between random copolymers and heterogeneous substrates.46−55 These studies focus on statistical heteropolymers containing a disperse mix of sequences and demonstrate an important role for pattern matching between individual chains and substrate patterns. These findings hint at the possibility that selection of more specific sequences could further improve interfacial activity; indeed, one of these studies suggests that statistical pattern matching may have provided a precursor for the emergence of specific sequence matching in biological organisms.51

Even as the prospect of realizing synthetic sequence control approaching that of nature appears to be on the horizon, there exists no systematic approach for matching nature's capacity to design these sequences, or even to predict a priori whether they will be of appreciable value for a given application. This is a nontrivial problem: a molecule as simple as a 20-mer binary copolymer has over 10^6 possible sequences, with this number becoming astronomically large for longer sequences and increasing numbers of available constituents. A generalizable solution to this problem would have the potential to transform the emerging synthetic advances in polymer sequence control into a new generation of sequence-specific polymeric materials with properties and performance unmatched by present-day synthetic materials.

In contrast to the situation for synthetic macromolecules, considerable effort has been dedicated to advancing rational sequence design in proteins and other biological macromolecules. Design of these molecules most commonly employs machine learning algorithms such as neural networks,56−58 the use of which is made possible by large existing databases such as the Protein Data Bank.59−61 These databases provide a foundation for training predictive machine learning algorithms in the context of proteins; sequence design can then be facilitated by the use of search algorithms or optimization algorithms such as genetic algorithms.62−64 However, this strategy is not currently extensible to the problem of sequence design in synthetic macromolecules because comparable databases do not exist and would be prohibitively expensive to create given current synthetic capabilities. This situation suggests that design of soft materials with extremal properties will require design based upon genuinely predictive methods such as experiment or molecular dynamics simulation.

To overcome these challenges and enable the design of a new generation of sequence-specific synthetic polymers, here we describe the use of a molecular-simulation-based genetic algorithm to design sequence-specific polymers. One of the key challenges in using molecular dynamics simulations within genetic algorithms for soft materials design is the typically long simulation time needed to predict material properties. This is the primary reason why molecular and materials design studies commonly employ regression methods such as neural networks for fitness evaluation in genetic-algorithm-based optimization.65,66 However, to identify extremal candidates, it is important to evaluate material properties directly, because neural networks are intrinsically interpolative and may not be able to identify these extremal candidates if they lie well outside the properties of the training data set. To enable efficient use of a genetic algorithm based on direct molecular dynamics simulation of candidate molecules, here we employ a modified genetic algorithm that prevents resimulation of candidates that have been previously simulated. Instead, the genetic algorithm progressively builds a database of previously simulated candidates and draws upon this database to prevent duplicative simulations and thereby maximize the number of new candidates probed in the design process. We apply this algorithm to design simulated sequence-specific copolymer compatibilizers. Results show that sequence-specific compatibilizers yield more efficient reduction of interfacial energies than regular block copolymer compatibilizers or random copolymers. Moreover, we "reverse engineer" the optimal molecules designed by this algorithm to yield new fundamental understanding of the physics controlling interfacial compatibilization, in terms of the energy and entropy of compatibilizer "docking" at the interface. Beyond its immediate applications to the design of the many polymer interfacial systems listed above, this approach is extensible to enable rational design of chemically realistic sequence-specific polymers across a variety of applications. As in this application, the realization and study of "extremal" sequence-specific polymers in other settings is likely to yield new physical understanding deriving from study of the limits of these materials' behavior.



METHODOLOGY

Molecular Dynamics Simulations. Our genetic algorithm is based on molecular dynamics (MD) simulations of a polymer−polymer interface, modeled at a bead−spring level in a manner following prior simulation work in this area.28,67 As shown in Figure 1 (rendered in VMD68), the system is

Figure 1. Image (rendered in VMD) of typical simulation geometry. Light gray and blue regions are A and B homopolymer domains, respectively. Darker gray and blue beads are A and B monomers, respectively, within copolymer compatibilizer chains.

composed of two types of incompatible model homopolymers, denoted A and B, in a lamellar geometry. Periodic boundary conditions are imposed in all directions. Copolymer chains containing A and B repeat units are introduced in equal numbers to each of the two interfaces, with an areal concentration of compatibilizer chains given by the number of chains per interfacial area. With the exception of one control simulation of random copolymers, all copolymer compatibilizer chains in a given simulation have the same sequence, with this sequence either corresponding to a block copolymer or specified by a genetic algorithm as described in the following section. For simulations of 50/50 random copolymers, each chain has a different sequence generated by a "coin flip" for each monomer.


All polymer chains are 20 beads long and are modeled after the work by Kremer and Grest,69 variants of which have been widely employed to study polymer interfaces.28,67,70−76 The pair interaction between nonbonded beads is described via a 12−6 Lennard-Jones (LJ) potential:

$$E_{\mathrm{LJ}}^{ij} = 4\varepsilon_{ij}\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right] \quad (1)$$

Table 1. Number of Compatibilizer, Homopolymer A, and Homopolymer B Chains in Standard-Size Simulations at Each Concentration Considered in This Study


where εij sets the interaction energy between two beads, σ sets the range of the interaction, and r is the distance between two beads in dimensionless LJ units. We model the high-χ limit, similar to prior work on polymer−polymer interfaces,28,67 by truncating the A−B interaction $E_{\mathrm{LJ}}^{AB}$ at a distance of $2^{1/6}\sigma$ such that it is purely repulsive. Bonded monomers interact via the finitely extensible nonlinear elastic (FENE)77 potential

$$E_{\mathrm{bond}} = -\frac{1}{2} K R_0^{2} \ln\left[1 - \left(\frac{r}{R_0}\right)^{2}\right] + 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right] + \varepsilon \quad (2)$$
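The two interaction terms above are straightforward to evaluate directly. The following minimal Python sketch (our illustration, not the authors' simulation code) evaluates eqs 1 and 2 in reduced LJ units, including the purely repulsive truncation at $2^{1/6}\sigma$ used for the A−B pairs:

```python
import math

def lj_energy(r, eps=1.0, sigma=1.0, cutoff=2.5):
    """12-6 Lennard-Jones pair energy (eq 1), truncated at `cutoff`.
    For the purely repulsive A-B interaction, use cutoff = 2**(1/6)*sigma."""
    if r >= cutoff:
        return 0.0
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 ** 2 - sr6)

def fene_energy(r, K=30.0, R0=1.5, eps=1.0, sigma=1.0):
    """FENE bond energy (eq 2): finitely extensible spring plus a
    repulsive LJ core truncated at 2**(1/6)*sigma and shifted by +eps."""
    if r >= R0:
        raise ValueError("bond stretched beyond maximum extensibility R0")
    spring = -0.5 * K * R0 ** 2 * math.log(1.0 - (r / R0) ** 2)
    wca_cut = 2 ** (1.0 / 6.0) * sigma
    if r < wca_cut:
        core = lj_energy(r, eps, sigma, cutoff=wca_cut) + eps
    else:
        core = 0.0
    return spring + core
```

The `+eps` shift in the bond's repulsive core makes that term vanish smoothly at its cutoff, the standard Kremer−Grest construction; the truncated A−B pair term in `lj_energy` is left unshifted, as eq 1 is written.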

where K = 30ε/σ2 is the spring constant, R0 = 1.5σ is the maximum extensibility of the bond, and the second term is truncated at $2^{1/6}\sigma$.

Interfacial energies are extracted from in-equilibrium simulations, as follows. For each compatibilizer sequence, an initial lamellar configuration with random chain packing in each layer is generated using the PACKMOL software package.78 In systems with compatibilizers, an equal number of compatibilizer chains are evenly distributed at each interface, with each chain initially oriented normal to the interface and centered in the interface. The system is equilibrated for 2 × 10^6 τ (where τ is the LJ unit of time) at a reduced Lennard-Jones temperature of T = 1.0 and under constant zero pressure in the direction normal to the interface. In test simulations of two comparable miscible polymer layers initiated in a lamellar geometry, this is sufficient time for the two layers to fully interdiffuse, indicating that this is sufficient time for the interface between two immiscible polymers to reach its equilibrium structure. Data are then collected under the same thermodynamic conditions for another 2 × 10^6 τ for determination of interfacial energy. Simulations are performed in the LAMMPS software package79 using a Nosé−Hoover thermostat and a Nosé−Hoover barostat coupled only to the direction normal to the interface so that the interfacial area remains constant (36σ by 36σ in standard simulations, where σ is the LJ unit of distance). Simulations employ Verlet time integration80 with a time step size of 0.005. In each simulation, the number of homopolymer chains is adjusted according to the compatibilizer concentration such that the total number of beads remains constant, as shown in Table 1. This maintains an approximately equal lamellar thickness for all simulations. Interfacial energy is calculated in a standard manner:81

$$\gamma_{12} = \frac{L_z}{2}\left\langle P_{zz} - \frac{1}{2}\left(P_{xx} + P_{yy}\right)\right\rangle \quad (3)$$
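Equation 3 amounts to time-averaging the anisotropy of the pressure tensor and scaling by the box length along the interface normal. A minimal sketch of that bookkeeping (our illustration, with hypothetical pressure samples; the factor of 2 reflects the two interfaces in the periodic lamellar geometry):

```python
def interfacial_energy(samples, Lz, n_interfaces=2):
    """Interfacial energy per interface from pressure-tensor time samples,
    following eq 3: gamma_12 = (Lz / n_interfaces) * <Pzz - (Pxx + Pyy)/2>.
    `samples` is a sequence of (Pxx, Pyy, Pzz) tuples; z is the interface
    normal direction."""
    excess = [pzz - 0.5 * (pxx + pyy) for pxx, pyy, pzz in samples]
    return (Lz / n_interfaces) * sum(excess) / len(excess)

# Hypothetical samples: interfacial tension makes Pxx, Pyy < Pzz on average.
samples = [(-0.11, -0.09, 0.0), (-0.10, -0.10, 0.0)]
gamma = interfacial_energy(samples, Lz=40.0)  # -> 2.0 in LJ units
```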

concn    compatibilizer chains per interface    homopolymer A chains    homopolymer B chains
0.0417    54     846    846
0.0486    63     837    837
0.0556    72     828    828
0.0625    81     819    819
0.0694    90     810    810
0.0764    99     801    801
0.0833    108    792    792
0.0903    117    783    783
0.0972    126    774    774
0.1042    135    765    765
0.1111    144    756    756
0.1181    153    747    747
0.1250    162    738    738
0.1319    171    729    729
0.1389    180    720    720
0.1458    189    711    711
0.1528    198    702    702
0.1597    207    693    693
0.1667    216    684    684
0.1736    225    675    675
0.1806    234    666    666
0.1875    243    657    657
0.2083    270    630    630
0.2431    315    585    585
0.2778    360    540    540
0.3472    450    450    450
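The bookkeeping behind Table 1 follows from two constraints evident in the table: the interfacial area is fixed at 36σ × 36σ, and the total bead count is held constant by removing one homopolymer chain of each type for every compatibilizer chain added per interface (two added in total). A short sketch reconstructing a table row (our reconstruction; the constant 900 is inferred from the table, not stated by the authors):

```python
AREA = 36.0 * 36.0        # interfacial area in units of sigma^2
N_BEADS_PER_CHAIN = 20

def table1_row(n_compat_per_interface):
    """Reconstruct a Table 1 row: each compatibilizer chain added per
    interface displaces one A and one B homopolymer chain, keeping the
    total bead count fixed."""
    n_homo = 900 - n_compat_per_interface   # chains per homopolymer species
    return {
        "concn": round(n_compat_per_interface / AREA, 4),
        "compatibilizer chains per interface": n_compat_per_interface,
        "homopolymer A chains": n_homo,
        "homopolymer B chains": n_homo,
    }

row = table1_row(54)   # first row: concn 0.0417, 846 chains of each homopolymer
```

With this rule the total bead count is 20 × (2n + 2(900 − n)) = 36 000 for every row, which is why the lamellar thickness stays approximately constant across concentrations.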

where z is the direction normal to the interface, Lz is the length of the system in the z direction, Pxx, Pyy, and Pzz are the diagonal components of the pressure tensor as output from LAMMPS based on the virial, and the brackets denote a time average.

Genetic Algorithm. Copolymer sequences are designed via a genetic algorithm (GA) employing the molecular dynamics simulations described above. We begin by summarizing the general attributes of genetic algorithms before highlighting several modifications made to improve the algorithm's suitability to the problem of molecular-dynamics-simulation-based polymer sequence design.

A genetic algorithm is a type of evolutionary algorithm that seeks to mimic the process of "survival of the fittest" to design or optimize a system to satisfy chosen target properties.82,83 In general, this involves establishing a reversible mapping of every possible candidate (in this case, each chain sequence) to a genomic representation, selecting a random initial population of candidates, and determining the fitness, a measure of the degree to which a candidate's properties match the target properties, of each. Genetic operators including selection, crossover, and mutation are then used to create a new population with a distribution of genetic material that is biased toward the higher-fitness members of the prior population, typically as follows.82,84 First, a pair of "parent" candidates is selected via a method of fitness-proportionate selection, such as roulette wheel selection, such that higher-fitness candidates have a higher probability of selection. Second, a combining rule is employed to create a "child" candidate consisting of a mixture of the two parents' genes. Third, genes within the child are potentially subject to random mutations, typically at a very low rate. These three steps are repeated until a new population of the desired size (typically held constant over the course of the GA) is generated. The fitness of this new generation is then assessed, and the entire process is iterated over many generations until the desired properties are achieved or some time or resource constraint is reached. A typical genetic algorithm therefore might assay tens to thousands of


generationin this case, 32. However, here we employ a novel modification of the genetic algorithm to improve its suitability for use with molecular dynamics simulations. Specifically, because the rate-limiting step of a GA is fitness evaluation, most GA’s employ fitness evaluation methods, such as machine-learning-based quantitative structure−property relationships, 87,88 that require seconds to minutes at most.89−91 In contrast, the molecular dynamics simulations employed here to determine candidate fitness require ca. 9 h, using 2 CPU’s and 1/4 of a shared Tesla K20 GPU each, to determine γ12 within reasonable uncertainty bounds. In order to make this GA tractable, it thus is essential to minimize the number of MD simulations required. Whereas GAs typically do not make use of data accumulated over the run, we therefore store the results of all MD simulations and reuse these predictions any time a particular candidate is found to repeat from a prior generation (for example, as a result of elitism or recursive searching near the global optimum). To leverage this data, the fitness of any repeat candidate is recalled from the database of prior simulations rather than computed via a fresh simulation. In order to maintain a fixed usage of computational resources and prevent repeat candidates from dominating the population, we then increase the population size by the number of repeat candidates in that generation. The GA thus employs a dynamic population size in which 32 new candidates with freshly performed MD simulations are tested each generation, and any repeat candidate is accommodated with an increased population size. This effectively increases the GA population size, which is a key determinant of convergence rate, over time without any corresponding increase in computational expense. A typical result in terms of total population size as a function of generation is shown in Figure 3; as can be seen here, it

generations, with each generation containing a population of tens of candidates, and each candidate specified by a set of genes. Broadly speaking, within this study we employ the molecular dynamics simulations described above to perform the fitness assessment step of the genetic algorithm. A schematic of the GA implemented for this study is shown in Figure 2. We

Figure 2. Schematic of the genetic algorithm employed in this work.

employ a mapping of polymer sequences to binary genomes in which a 0 indicates a type A monomer and a 1 indicates a type B monomer. Since, as described above, this study focuses on 20-repeat-unit chains, each possible candidate can be mapped to a 20-gene binary genome. The GA is initiated with a population of 32 randomly generated chain sequences. The GA then determines the fitness of each of these candidates by spawning and analyzing a LAMMPS molecular dynamics simulation of a polymer−polymer interface containing a fixed areal concentration of that candidate, with the simulation details described in the prior section. These simulations are performed in parallel via a queuing system such that all 32 simulations can be completed simultaneously. This process leads to a simulated interfacial energy γ12 for each candidate; a raw fitness for each compatibilizer sequence is then defined as f = 100 − γ12. This raw fitness is then subject to a linear scaling82 algorithm to obtain a scaled fitness that spans the interval from 0 to 1 within each generation, in order to maintain consistent selection pressure. The single highest fitness (lowest interfacial energy) candidate is then passed directly on to the next generation without modification (singlecandidate elitism85) in order to speed convergence. Other members of each consecutive generation are generated as follows. After each generation, parents of children in the next generation are selected via roulette wheel selection.82,86 Twopoint crossover82 is used to combine the selected parents, after which point mutations82 (bit flips) at a rate of 0.01 per gene are applied to obtain each new candidate in the next generation. In a standard genetic algorithm, the population size in the next generation would be the same as in each prior

Figure 3. Total population size (blue) and number of new simulationbased fitness evaluations (red) as a function of generation in the genetic algorithm employed in this study.

increases by a factor of 2 or more over the course of the run, significantly accelerating convergence. On the basis of initial test runs, we ran each genetic algorithm for 50 generations, with each generation involving 32 new simulations, such that each genetic algorithm assays a total of 1600 candidate sequences. Optimization was performed for four different compatibilizer concentrations. Once the optimization was complete, the best compatibilizer for each concentration was subject to multiple longer simulations to improve statistics for the optimal case.
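The GA workflow described above maps onto a compact reference implementation. The sketch below is ours, not the authors' code: a toy fitness function stands in for the ~9 h MD simulation, and the dynamic population-size bookkeeping is omitted for brevity. It shows the binary genome mapping, linear fitness scaling onto [0, 1], roulette-wheel selection, two-point crossover, point mutation at 0.01 per gene, single-candidate elitism, and the memoization cache that avoids re-evaluating repeat candidates:

```python
import random

GENOME_LEN = 20      # 20-mer chains: gene 0 = A monomer, 1 = B monomer
POP_SIZE = 32
MUTATION_RATE = 0.01

def toy_fitness(seq):
    """Stand-in for the MD evaluation (which returns f = 100 - gamma_12);
    here, a toy objective counting A-B alternations along the chain."""
    return sum(a != b for a, b in zip(seq, seq[1:]))

cache = {}  # memoization: never re-"simulate" a previously seen candidate

def fitness(seq):
    key = tuple(seq)
    if key not in cache:
        cache[key] = toy_fitness(seq)   # one evaluation per unique sequence
    return cache[key]

def scaled(fits):
    """Linear scaling of raw fitness onto [0, 1] within a generation."""
    lo, hi = min(fits), max(fits)
    if hi == lo:
        return [1.0] * len(fits)
    return [(f - lo) / (hi - lo) for f in fits]

def roulette(pop, weights):
    # small offset so the worst candidate retains nonzero selection odds
    return random.choices(pop, weights=[w + 1e-9 for w in weights], k=1)[0]

def two_point_crossover(p1, p2):
    i, j = sorted(random.sample(range(GENOME_LEN), 2))
    return p1[:i] + p2[i:j] + p1[j:]

def mutate(seq):
    return [g ^ 1 if random.random() < MUTATION_RATE else g for g in seq]

def evolve(generations=50, seed=0):
    random.seed(seed)
    pop = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
           for _ in range(POP_SIZE)]
    for _ in range(generations):
        fits = [fitness(s) for s in pop]
        w = scaled(fits)
        children = [pop[fits.index(max(fits))]]     # single-candidate elitism
        while len(children) < POP_SIZE:
            child = two_point_crossover(roulette(pop, w), roulette(pop, w))
            children.append(mutate(child))
        pop = children
    return max(pop, key=fitness)

best = evolve()
```

In the paper's version, each uncached fitness call would dispatch a LAMMPS simulation through a queuing system, and the population would grow by the number of cache hits per generation rather than staying fixed at 32.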



RESULTS

Interfacial Compatibilization with Regular Block Copolymers. Before considering the behavior of sequence-specific polymers designed by a genetic algorithm, we describe the baseline behavior of the polymer interface simulated in this study, in the neat state and compatibilized by diblock, tetrablock, pentablock, decablock, and alternating 20-mer copolymers. As shown by Figure 4a, the neat polymer interface is extremely sharp, with the composition dropping from 80% A monomer to 20% A monomer over approximately 1/2 of a monomer diameter. This is comparable to a polymer−polymer interface with thickness less than 1 nm, indicating that this is a high-χ, strongly segregated system. As shown by this figure, segregation at this interface is sufficiently strong that the system exhibits a density minimum at the interface, consistent with prior studies of interfaces between repulsive Kremer−Grest polymers.67 This neat polymer interface has an interfacial energy of 1.84 in LJ units. Interfacial energies in compatibilized systems throughout the remainder of this study are reported relative to this pure-interface value.

Figure 4. (a) Density (diamonds) and mole fraction (circles) of monomer A as a function of position through a simulated pure polymer interface. (b) Mole fraction A as a function of position for systems containing the block copolymer compatibilizers and areal concentrations specified in the legend. (c) Density as a function of position for the same systems as in (b).

As shown by Figure 5, introduction of block copolymer compatibilizers leads to a reduction in interfacial energy with increasing copolymer areal density, comparable to earlier results for compatibilization of this type of model polymer interface.28 Each compatibilized system exhibits a saturation concentration beyond which the addition of further compatibilizer chains does not yield further reductions in interfacial energy. The onset of this saturation plateau corresponds to the formation of a second layer of compatibilizer at the interface, visible in Figure 6.

Figure 5. Interfacial energies as a function of concentration for systems containing block compatibilizers with sequences indicated. Diamonds indicate interfacial energies determined from simulations using the standard box size employed in the genetic algorithm. Squares report interfacial energies from simulations with a 9× smaller interfacial area, for which a finite size effect emerges at the onset of the saturation plateau. Lines are guides to the eye.

Figure 6. Image of simulated interface with tetrablock copolymer at concentrations of 0.174 (left) and 0.188 (right), immediately before and after the onset of the saturation plateau seen in Figure 5 (images rendered in VMD).

A detailed consideration of the system's physics and behavior beyond the onset of this plateau is beyond the scope of this study, because the optimal compatibilizer sequence at any concentration always lies prior to the onset of this plateau (see sections below). This is a natural result of the fact that systems within the saturation plateau involve, by definition, an inefficient use of compatibilizers. In some cases the saturation plateau is not reached prior to attaining a negative interfacial energy. In experimental practice, this would generally not occur; the system would revert to a zero interfacial energy by increasing its interfacial area, via strategies ranging from weak interfacial curvature up to formation of a nanoemulsion. Because the boundary conditions of the present simulations are designed to allow for study of interfaces with positive energy, they are poorly suited to studying this "beyond zero" interfacial energy limit. However, this "nanoemulsion" limit is not the focus of the present study, which considers the question of how to most efficiently reduce the interfacial energy of a macroscopic domain to zero, rather than the question of how to control morphology beyond this threshold.

Finally, we consider the question of whether the simulation size employed within our genetic algorithm is sufficient to yield results representative of an interface between macroscopic domains. As shown by Figure 4b, these compatibilizers generally have a weak impact on the breadth of the


compositional interface, even when they reduce the interfacial energy to near zero, as shown for the tetrablock copolymer at an areal density of 0.188. In addition to a slight broadening of the interface, introduction of compatibilizers tends to suppress the density depletion at the interface (Figure 4), an expected consequence of introducing bridging covalent bonds. Notably, in all cases both the composition and density reach a plateau corresponding to their bulk values in the center of the layers, indicating that these simulated lamellae are sufficiently thick that the two interfaces do not interact. As shown by Figure 7, the concentration of compatibilizer chain monomers generally decays to the simulation limit of detection by the film midpoint; again, the exception is the tetrablock compatibilizer at a concentration for which the interfacial energy is nearly zero.

Figure 7. Mole fraction of compatibilizer monomers of type A as a function of position, with the pure A homopolymer layer approximately centered in the figure.

These results are consistent with a scenario within which the equilibrium concentration profiles of the compatibilizers reflect strong localization to the interface, a natural consequence of their polymeric nature (which leads to a low entropic driving force for solvation) and the high χ of the two polymers (which leads to a large energetic driving force for interfacial localization). In order to confirm that any residual concentration of compatibilizer at the film center does not lead to an appreciable deviation of the interfacial energy from its bulk-layer value, we perform an additional set of validating simulations in which we double the layer thickness (by approximately doubling the number of homopolymer chains while holding the interfacial areas constant). As shown by Figure 8, this doubling of the layer thickness does not appreciably change the interfacial energy in any of these test systems, providing direct evidence that the simulated interfacial energies reflect the macroscopic limit of lamellar thickness.

Figure 8. Interfacial energy of representative systems with standard layer thickness (diamonds) and double the standard layer thickness (squares), illustrating an absence of finite size effects in the layer thickness in this range of system sizes. Image next to each pair of points illustrates the corresponding compatibilizer sequence.

Similarly, we consider the effect of varying the interfacial area to determine whether the simulated interfacial area is sufficiently large to report an interfacial energy consistent with a macroscopic interface. As shown by Figure 5, reducing the simulated area by a factor of 9 does not alter interfacial energy in the initial power-law regime of interfacial energy. However, it does impact the onset of the saturation plateau. Essentially, when the system size is too small, the nucleation of a second layer, shown in Figure 6, is suppressed, leading to the emergence of a local minimum in interfacial energy around the large-size limit of the saturation concentration. However, as seen in Figure 5, this finite size effect is largely eliminated at the scale of our standard simulation size. Taken together, these results indicate that these simulated systems are sufficiently large that their results can be viewed as modeling the compatibilization of domains with macroscale thickness and area.

Sequence-Specific Copolymer Compatibilizers from the MD Genetic Algorithm. We now turn to copolymer sequences designed via the molecular-dynamics-based genetic algorithm described above. As shown in Figure 9, the best compatibilizer (i.e., the one yielding the lowest interfacial energy) considerably improves over the course of a typical GA run.

Figure 9. Interfacial energy γ12 of interface compatibilized by the best candidate in each generation as a function of generation in the genetic algorithm at an areal compatibilizer concentration of 0.159.

For example, in the case of an areal concentration of 0.159 compatibilizers/σ2 (Figure 9), the best performer improved from γ12 = 0.31 to γ12 = −0.06 within 50 generations. We terminate the GA at 50 generations because test runs indicated that additional generations do not generally yield further significant improvements in interfacial energy. This relatively rapid convergence is likely a consequence of the dynamic population size employed in our GA approach. Each of the four GAs reported in this study required a total of approximately 20 days on 64 CPUs and 8 NVidia Tesla K20 GPUs (approximately 30 000 GPU-accelerated CPU-hours), a feasible amount of resources for a materials design problem.

As shown in Figure 10a and b, the best sequence identified by this GA substantially outperforms all possible periodic 20-mer compatibilizers, including those with block sizes of 1, 2, 4, 5, and 10 (a diblock), as well as a 50/50 random copolymer. Notably, the optimal sequence is highly irregular, consisting of a combination of singlets, triplets, quadruplets, quintuplets, and sextuplets. Why should this very nonintuitive, superficially random, sequence outperform any regular, periodic sequence? Could it simply be a matter of realizing an optimal mean block

DOI: 10.1021/acs.macromol.6b01747 Macromolecules XXXX, XXX, XXX−XXX

Article

Macromolecules

Figure 11. Interfacial energy versus mean block length for systems containing periodic compatibilizers (open diamonds), a 50/50 random compatibilizer (open circle), or optimized compatibilizers (filled symbols) at concentrations of 0.069 (a), 0.083 (b), 0.111 (c), and 0.159 (d). Symbols correspond to sequences shown in Figure 10a.

Figure 10. (a) Bead representation of periodic compatibilizers (open diamonds) and the optimized compatibilizers (filled symbols), with each optimized at the concentration indicated in parts d and e. (b) Interfacial energy with the optimized compatibilizer compared to all periodic compatibilizers and a 50/50 random copolymer (open circle) at an areal concentration of 0.159. (c) Interfacial energy with each compatibilizer tested by the GA at a concentration of 0.159 (black points) compared to the optimized compatibilizer (red diamond). (d) Interfacial energy of systems sharing the same mean block length as the optimized compatibilizer at each concentration studied (black points). (e) Interfacial energy of systems sharing the same block length distribution as the optimized compatibilizer at each concentration studied (black points).
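The mean block length and block-size distribution compared in Figure 10c-e are simple run-length statistics of the bead sequence. A minimal helper (our illustration), using the 20-bead sequence quoted later in the text as an example:

```python
from collections import Counter

def block_sizes(seq):
    """Run lengths of consecutive identical beads in an AB sequence."""
    sizes, run = [], 1
    for prev, cur in zip(seq, seq[1:]):
        if cur == prev:
            run += 1
        else:
            sizes.append(run)
            run = 1
    sizes.append(run)
    return sizes

def mean_block_length(seq):
    sizes = block_sizes(seq)
    return sum(sizes) / len(sizes)

seq = "BAABBBABBAABBABAABAA"   # sequence quoted later in the text
print(block_sizes(seq))        # [1, 2, 3, 1, 2, 2, 2, 1, 1, 2, 1, 2]
print(Counter(block_sizes(seq)))  # block-size distribution
print(mean_block_length(seq))  # 20/12, about 1.667
```

Two sequences can share both statistics yet perform very differently, which is exactly the point of panels d and e: neither number uniquely determines the sequence.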

size that cannot be achieved via a periodic pattern in a 20-bead chain? To test this question of whether simple mean block size is sufficient to predict performance, in Figure 10c we plot the interfacial energy of every system simulated in the course of this design cycle versus the mean block length of the compatibilizer sequence employed. As shown by this figure, mean block length is a poor predictor of performance. For example, sequences with the same mean block length as the optimal sequence shown in Figure 10c range in interfacial energy from approximately 0 to 1 (in reduced LJ units), compared to a γ12 of 1.84 for the pure polymer−polymer interface. A statistical analysis of these data suggests that mean block size accounts for at most 25% of the variation in interfacial energy between systems. This result is not limited to a particular surface concentration of surfactant. In Figure 10d, we show the results of performing this design process at three additional compatibilizer concentrations. As seen in this figure, none of the resulting optimal sequences are periodic. Furthermore, we have confirmed at each of these four concentrations that the optimal sequence identified equals or outperforms any periodic sequence (see Figure 11). Again, other sequences sharing the mean block size of the optimal sequence span a wide range of interfacial energies at each concentration, indicating a poor correlation between mean block size and performance. As shown in Figure 10e, even sequences with the same block size distribution (meaning the same count of singlets, doublets, triplets, etc.) as the optimal chain exhibit a wide range of interfacial energies. Evidently, even this relatively specific measure of chain sequence is insufficient to predict interfacial energy. These results suggest that rather than being a second-order consideration, sequence is the central determinant of the performance of copolymer compatibilizers at high-χ interfaces. Why should this be?

Physical Origins of Sequence Specificity. We begin by considering the question of why aperiodic compatibilizers should typically outperform the best periodic compatibilizers, even at the same or nearly the same mean block length. To do so, we consider a simple one-dimensional random walk model of the conformation of a compatibilizer molecule about the interface. We assume that the random walk must return to the interface at each A−B junction in the chain backbone and then seek to determine what type of distribution of interjunction distances (block lengths) minimizes the system free energy while holding the total number of junctions (i.e., the total number of blocks) constant. While this simple model neglects fluctuations of A−B junctions away from the interface, it is sufficient to address this aspect of the problem. The free energy contribution ΔF of this molecule to the system can be divided into energetic and entropic contributions: ΔF = ΔE − TΔS. As a leading-order approximation, the energy contribution is simply proportional to the number of times the chain crosses the interface: each interfacial crossing replaces an O(1) energetically unfavorable A−B contact with a favorable A−B covalent bond. Since we wish to optimize the distribution of sequence lengths while holding the number of junctions constant, this term drops out of the energy minimization problem. On the other hand, the entropy of this chain can be estimated from a 1-D random walk model of a single "loop" of polymer between two interfacial junctions.

The number of ways Q of arranging a 1-D random walk of n steps while requiring that it return to the interface is simply given by the combination formula for the selection of n/2 steps to be in the positive direction and n/2 steps to be in the negative direction. The configurational entropy of the loop is then given by Boltzmann's constant kB times the natural log of Q:

ΔS_loop ≅ kB ln[ n! / ((n/2)!)² ]   (4)

where n is the number of repeat units in the loop (equivalent to the size of the block in number of repeat units). The total configurational entropy of the chain is then given by the sum of this equation over all loops in a given chain. The question of whether periodic or aperiodic sequences are preferred can now be answered by determining whether the overall free energy tends to be minimized by incorporating sequences of all-equal length or sequences of differing lengths. This is mathematically equivalent to the problem of liquid−liquid phase separation, which asks whether the free energy is minimized by holding the entire system at a single concentration or by allowing it to separate into domains of two distinct concentrations. The analogous question here is whether the entire sequence should employ a single block size or “separate” into a distribution of distinct block sizes. The condition beyond which sequences of differing length (the equivalent of a phase-separated system in the analogy to phase equilibria) are unconditionally favored over uniform-length sequences is given by the spinodal equation:

∂²F/∂n² < 0

areal density 0.083. The second, with sequence BAABBBABBAABBABAABAA, has the same distribution of block sizes but considerably underperforms the optimized compatibilizer: the optimized sequence reduces γ12 by 1.21, while the statistically identical sequence reduces it by only 0.88. To understand this difference in performance, we consider the effective positions of each compatibilizer's segments relative to the interface. As shown in Figure 13, despite possessing the

Figure 13. Average position of each bead in the compatibilizer with reference to the interface (dashed line). (a) Compatibilizer optimized at a concentration of 0.083. (b) A compatibilizer statistically identical to the one optimized at a concentration of 0.083. The circled region highlights the effect of interblock interactions.
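This spinodal argument can be checked numerically under the eq 4 model (the check below is our addition, not part of the paper). Evaluating eq 4 exactly with log-factorials shows that its large-n behavior is n ln 2 − (1/2) ln(πn/2); the linear n ln 2 part cancels when the total walk length is fixed, and the remaining term makes the summed entropy of two unequal loops exceed that of two equal ones, so F = −TΔS is concave in n and the spinodal condition is satisfied for every block size.

```python
import math

def loop_entropy(n, kB=1.0):
    """Configurational entropy of an n-step 1-D loop per eq (4):
    S = kB * ln( n! / ((n/2)!)^2 ), evaluated for even n via lgamma."""
    assert n % 2 == 0, "the loop must return to the interface, so n is even"
    return kB * (math.lgamma(n + 1) - 2.0 * math.lgamma(n / 2 + 1))

# Exact value versus the large-n Stirling form n*ln2 - 0.5*ln(pi*n/2):
for n in (2, 10, 100):
    stirling = n * math.log(2.0) - 0.5 * math.log(math.pi * n / 2.0)
    print(n, round(loop_entropy(n), 4), round(stirling, 4))

def d2F(n, kBT=1.0):
    # Discrete second derivative (step 2) of F = -kBT * S with respect
    # to loop length n; negative values mean the spinodal condition holds.
    S = loop_entropy
    return -kBT * (S(n + 2) - 2.0 * S(n) + S(n - 2))

# Splitting a fixed total length 2n into unequal loops (n-2, n+2) always
# yields more entropy than two equal loops (n, n):
for n in (4, 10, 50):
    assert loop_entropy(n - 2) + loop_entropy(n + 2) > 2 * loop_entropy(n)
    assert d2F(n) < 0.0
```

Because d2F is negative for all n in this model, a distribution of distinct block sizes is unconditionally favored over a single uniform block size, consistent with the aperiodic optimal sequences found by the GA.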