A Self-Consistent Sonification Method to Translate Amino Acid

Jun 26, 2019 - We report a self-consistent method to translate amino acid sequences into audible sound, use the representation in the musical space to...
www.acsnano.org

A Self-Consistent Sonification Method to Translate Amino Acid Sequences into Musical Compositions and Application in Protein Design Using Artificial Intelligence

Chi-Hua Yu, Zhao Qin, Francisco J. Martin-Martinez, and Markus J. Buehler*

Laboratory for Atomistic and Molecular Mechanics (LAMM), Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue 1-290, Cambridge, Massachusetts 02139, United States

ABSTRACT: We report a self-consistent method to translate amino acid sequences into audible sound, use the representation in the musical space to train a neural network, and then apply it to generate protein designs using artificial intelligence (AI). The sonification method proposed here uses the normal mode vibrations of the amino acid building blocks of proteins to compute an audible representation of each of the 20 natural amino acids, which is fully defined by the overlay of its respective natural vibrations. The vibrational frequencies are transposed to the audible spectrum following the musical concept of transpositional equivalence, playing or writing music in a way that makes it sound higher or lower in pitch while retaining the relationships between tones or chords played. This transposition method ensures that the relative values of the vibrational frequencies within each amino acid and among different amino acids are retained. The characteristic frequency spectrum and sound associated with each of the amino acids represents a type of musical scale that consists of 20 tones, the “amino acid scale”. To create a playable instrument, each tone associated with the amino acids is assigned to a specific key on a piano roll, which allows us to map the sequence of amino acids in proteins into a musical score. To reflect higher-order structural details of proteins, the volume and duration of the notes associated with each amino acid are defined by the secondary structure of proteins, computed using DSSP and thereby introducing musical rhythm. We then train a recurrent neural network based on a large set of musical scores generated by this sonification method and use AI to generate musical compositions, capturing the innate relationships between amino acid sequence and protein structure. We then translate the de novo musical data generated by AI into protein sequences, thereby obtaining de novo protein designs that feature specific design characteristics. 
We illustrate the approach in several examples that reflect the sonification of protein sequences, including multihour audible representations of natural proteins and protein-based musical compositions solely generated by AI. The approach proposed here may provide an avenue for understanding sequence patterns, variations, and mutations and offers an outreach mechanism to explain the significance of protein sequences. The method may also offer insight into protein folding and understanding the context of the amino acid sequence in defining the secondary and higher-order folded structure of proteins and could hence be used to detect the effects of mutations through sound.

KEYWORDS: protein, structural analysis, sonification, artificial intelligence, recurrent neural networks, molecular mechanics

Received: March 20, 2019 Accepted: June 5, 2019

DOI: 10.1021/acsnano.9b02180 ACS Nano XXXX, XXX, XXX−XXX

© XXXX American Chemical Society

Materials and music have been intimately connected throughout centuries of human evolution and civilization.1−4 Indeed, materials such as wood, animal skin, or metals are the basis for most musical instruments used throughout history.5,6 Today, we are able to use advanced computing algorithms to blur the boundary between material and sound and use hierarchical representations of materials in distinct spaces such as sound or language to advance design objectives.2,7−9 The approach proposed here is that the translation of protein material representations into music not only allows us to create musical instruments but also enables us to exploit deep neural network models to represent and manipulate protein designs in the audio space. Thereby we take advantage of longer-range structure that is important in music and which is equivalently important in protein design (in connecting amino acid sequence to secondary structure and folding).10−15 This paradigm goes beyond proteins and enables us to connect nanostructures and music in a reversible way, providing an approach to design nanomaterials, DNA, proteins, or other molecular architectures from the nanoscale upward.

Electronic means of generating sound have been an active field in music theory,16 as evidenced in various computer-based electronic synthesizers. These methods typically aim to create a spectrum of overlapping waves either to mimic the sounds of natural instruments (such as a piano, guitar, or classical string instruments) or to generate sounds that do not naturally exist (as done in early synthesizers such as the Moog and Roland synthesizers [SH-1000, SH-101, and so on] and more recently in methods such as granular synthesis and wavetable synthesis).

In our lab’s earlier work we considered sonification of spider webs17,18 and whole protein structures,19 and we also presented mathematical modeling approaches using category theoretic representations to describe hierarchical systems and their translations between different manifestations (e.g., between materials, music, and social networks).2,20−22 Other scientific inquiry in this general area proposed sonification methods of protein sequences by mapping them onto Western classical musical scales23−25 or more broadly representing various scientific data as sound.26−28 In this study we propose a formulation of sonification and generate a method by which the amino acid sequence of proteins, the most abundant molecular building blocks of virtually all living matter, is used to generate audible sound through consideration of the elementary chemical and physical properties of amino acids and apply it to generate designer materials. We explore a distinct avenue based on the vibrational normal modes of amino acids, reflecting a broad range of diverse protein materials in nature.29,30

The proposed sound-based generative algorithm is based on the natural vibrational frequencies of amino acids. Generally, the vibrational spectra of molecules can be computed by computational chemistry methods such as density functional theory (DFT)31−33 or molecular dynamics (MD).34−36 We use a computer algorithm to convert these inaudible vibrations into a space that the human ear can detect. By making these natural vibrations of the proteins audible, they can then be used to creatively express sound and generate music that is based on the complex vibrational spectrum offered by these protein structures. This offers an avenue to sonify the characteristic overlays of natural frequencies and to use them as a playable musical instrument.

The significance of considering vibrations as a means to translate between material and sound has broader-ranging implications. For instance, it was suggested that protein vibrations play a role in information processing in the brain,37 protein expression,38 or the growth of plants.39,40 The use of AI in understanding and classifying proteins and predicting de novo amino acid sequences has been explored in recent literature and presents an opportunity for further research investigations.41−44 Other work has applied AI to design composites, which can offer an efficient means to materials by design and manufacturing.45,46 Here we apply AI to learn hierarchical structures of protein sequences in musical space and use them to generate protein designs through this translational approach: material to music, solving a design problem in musical space, and translation back to material.

The plan of the paper is as follows. We first present an analysis of the translation of the vibrational spectra of each of the 20 amino acids into audio signals, using the concept of transpositional equivalence. We then report various sonifications of known protein structures into musical scores, extracted from the Protein Data Bank (PDB). Based on a large number of sonified protein structures we train a recurrent neural network and generate musical expression using AI. We then map the musical scores generated by AI into amino acid sequences and analyze the resulting protein structures. The overall approach reported in this paper is presented in Figure 1, showing how the mapping and reverse mapping closes the loop between material manifestation and musical space and back.

Figure 1. Overall flowchart of the work reported here, closing the loop between different manifestations of hierarchical systems in material and sound and the reversible translation in between the two representations. Future work could generate musical expressions by human compositions and thereby lead to de novo amino acid sequence designs and de novo proteins. In this paper, we generate musical compositions using AI, offering a design method for proteins. A key insight from this overarching approach is that we can use the neural network to generate music that is innately encoded with patterns reflecting the design principles of a certain group of protein structures. This encoded information in the audio can then be turned into protein sequences that are not included in the training set but that resemble the set of desired features. This means that the neural network has learned the design principles by which certain structural features are generated from the sequence of amino acids, closing the loop between material → sound → material.

RESULTS AND DISCUSSION

A detailed description of methods used in this work is included in the Methods section. Since it is the basis for sound generation, we first review the frequencies generated by each amino acid, as depicted in Figure 2. Figure 2(a) shows the frequencies of the vibrational modes, from lowest to highest. The data show that each amino acid is associated with a particular frequency spectrum. The heaviest amino acid, TRP


Figure 2. Analysis of vibrational spectra of amino acids based on DFT data. (a) Depiction of the frequencies associated with each of the 20 amino acids computed based on DFT, with original data taken from ref 31, where the x-axis reflects the number of the mode whose frequency is plotted. In the sonification approach, the sound of each amino acid is generated by overlaying harmonic waves at the said frequencies and playing them together, creating a complex sonic spectrum associated with each of them. (b) Depiction of the lowest frequency of each amino acid, sorted from smallest to largest. The range of notes of the base frequencies, in terms of conventional musical scales, is approximately from B0 to F4 (spanning around 3 octaves). However, the total sonic character of each amino acid is more complex, as it is created through the overlay of all frequencies shown in panel (a).
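The note names quoted for the base frequencies can be checked by converting a frequency to the nearest equal-tempered piano key. The following is a minimal sketch, not the tool used in this work: the function name `nearest_key`, the A4 = 440 Hz reference, and the scientific pitch notation (MIDI note 69 = A4) are all assumptions. Under this convention 61.74 Hz lands on B1; the B0 label used in this paper would be consistent with an octave numbering shifted by one (middle C written as C3), which is likewise an assumption.

```python
import math

# Pitch-class names for one octave (sharps only); naming choice is an assumption.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def nearest_key(freq_hz, a4_hz=440.0):
    """Return (note name, deviation in cents) of the closest 12-tone key.

    Assumes equal temperament and scientific pitch notation (A4 = MIDI 69).
    """
    midi_float = 69 + 12 * math.log2(freq_hz / a4_hz)
    midi = round(midi_float)
    cents = 100 * (midi_float - midi)  # > 0 means sharper than the key
    name = f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"
    return name, cents


# The lowest frequency used in the sonification (TYR, 61.74 Hz).
name, cents = nearest_key(61.74)
print(name, round(cents, 2))
```

A large absolute deviation in cents indicates a frequency that falls between two keys, which is the situation described for most of the amino acid frequencies.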

(tryptophan), shows the slowest increase of frequencies (and also the most modes, since it has the most degrees of freedom). GLY (glycine), the lightest amino acid, shows the fastest increase of frequencies with modes (and also the fewest modes, since it has the fewest degrees of freedom). Figure 2(b) depicts the lowest frequency of each amino acid, sorted from smallest to largest. We find that the range of notes of the base frequencies, in terms of conventional musical scales, is approximately from F2 to C#5. However, the sonic character of each amino acid is much more complex and does not follow conventional tunings, as it is created through the overlay of all naturally occurring frequencies.

The frequencies of each amino acid are included as Supporting Information, with the results for both DFT B3LYP31 and MD CHARMM based computations of the eigenfrequencies (supplementary files: ALL AAs - CHARMM - frequency data.txt and ALL AAs DFT - frequency data.txt). Note that although we performed the analysis with both DFT B3LYP and MD CHARMM data, only the DFT-based data are used from here on, as they are considered a more accurate representation of the vibrational spectra of amino acids. The MD CHARMM data, however, can be useful for consistency, if additional sonifications of larger molecules or entire proteins are considered. Using the MD CHARMM data in such cases would allow for a consistent prediction of tonal character across further hierarchical scales.

We find that the lowest frequency generated across all 20 amino acids stems from the TYR (tyrosine) residue, and in our algorithm, it is represented by a value of 61.74 Hz. The highest produced frequency is around 20 000 Hz. The audible frequency spectrum of humans is within the range of 20 to 20 000 Hz, and hence most generated protein vibrations fall within that range.

Figure 3 shows an analysis of the frequency spectrum of the 20 amino acids, mapped onto a piano keyboard that features the 12 tones per octave used in Western classical music (detected notes are displayed in the form of “blobs” in the analysis). The analysis shows that the character of each amino acid sound is composed of multiple-frequency clusters, representing a concept similar to a musical chord. Further, the data show that while some frequencies fall on piano keys, most are in between keys, representing a complex collection of frequencies. Attempting to fit a natural musical scale to the data, the algorithm predicts the best fit overall to a C minor scale. Table 1 shows the results of an analysis where the best fit to a musical scale for each of the 20 amino acids is presented. The analysis suggests that the soundings of the amino acids are reflected through a set of major and minor scales, with varying degrees of fit to those scales. A sweep through the sounds associated with each of the 20 amino acids (ALA, ARG, ASN, ASP, CYS, GLU, GLN, GLY, HIS, ILE, LEU, LYS, MET, PHE, PRO, SER, THR, TRP, TYR, VAL) is represented in 20 AA sweep − DFT.mp3 (all audio files referenced in this paper are attached as Supporting Information).

The CHARMM-based sonifications show a similar tonal characteristic, although there are differences in the frequency spectra (generally, the frequencies predicted by CHARMM are higher, which we attribute to the assumption of the point charges in CHARMM that result in fewer degrees of freedom, more rigidity, and thus higher frequency). We note that due to computational limitations, DFT is not a feasible approach to simulate the vibrations of very large molecules or complexes of large molecules. CHARMM, on the other hand, offers a computationally more efficient way to compute molecular vibrations, which can be scaled to millions of atoms and beyond.

It is noted that TYR has the lowest base frequency, as confirmed in Figure 2. In the melodic analysis shown in Figure 3 (top) where the frequency spectrum is mapped onto a piano


Table 1. Analysis of Musical Scale Associated with Each of the 20 Amino Acids, Determined Based on a Best Fit Analysis of the Sound Spectrum (a)

amino acid identifier    musical scale fitted to its frequency spectrum    match to scale score
ALA                      Bb major                                          70%
ARG                      F major                                           57%
ASN                      A major                                           75%
ASP                      C major                                           63%
CYS                      D major                                           78%
GLU                      Eb major                                          60%
GLN                      E major                                           67%
GLY                      B minor                                           43%
HIS                      F major                                           50%
ILE                      Eb major                                          63%
LEU                      Eb minor                                          25%
LYS                      Eb minor                                          50%
MET                      B major                                           50%
PHE                      E minor                                           33%
PRO                      Eb minor                                          71%
SER                      Eb minor                                          57%
THR                      G# minor                                          43%
TRP                      C# minor                                          71%
TYR                      E minor                                           56%
VAL                      G minor                                           57%

(a) The match to scale score is calculated based on the ratio of how many notes of all notes included in each amino acid sound fall onto the scale that was determined as best fit. The data show that the fit to scale ranges from 25% (for LEU) to 78% (CYS).
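The match to scale score described in the footnote can be illustrated with a short sketch. This is not the analysis tool used in this work, whose algorithm is not specified here; it assumes that the detected notes are already given as pitch classes (0 = C, ..., 11 = B) and that only the 12 major and 12 natural minor scales are candidates. The name `best_fit_scale` is hypothetical.

```python
# Pitch-class names; the flat/sharp spelling is an arbitrary choice.
NOTE_NAMES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "G#", "A", "Bb", "B"]
MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)   # semitone offsets of a major scale
MINOR_STEPS = (0, 2, 3, 5, 7, 8, 10)   # natural minor (an assumption)


def best_fit_scale(pitch_classes):
    """Return (score, root, mode) for the scale containing the most notes.

    score = (number of notes that fall on the scale) / (number of all notes).
    Ties are resolved in favor of the first candidate tried.
    """
    best = None
    for root in range(12):
        for mode, steps in (("major", MAJOR_STEPS), ("minor", MINOR_STEPS)):
            scale = {(root + s) % 12 for s in steps}
            hits = sum((pc % 12) in scale for pc in pitch_classes)
            score = hits / len(pitch_classes)
            if best is None or score > best[0]:
                best = (score, NOTE_NAMES[root], mode)
    return best


# Hypothetical detected notes: four chromatic neighbors C, C#, D, Eb.
print(best_fit_scale([0, 1, 2, 3]))
```

A cluster of closely spaced notes, as in the example, can never fit a seven-note scale perfectly, which gives some intuition for the low scores of amino acids with dense frequency spectra.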

Figure 3. Top: Analysis of the frequency spectrum of the 20 amino acids (horizontal axis, labels at bottom) mapped onto a piano roll (left vertical axis) that includes the 12 semitones assigned in Western classical music. The analysis shows that the character of each amino acid sound is composed of multiple frequency clusters, representing a concept similar to a musical chord. Further, the data show that while some frequencies fall on piano keys, many are in between keys, representing a complex collection of frequencies. The left side of the graph depicts a piano roll (with its characteristic combination of white and black keys, 12 of them per octave and repeating), with the labels indicating the note associated with each key. The bottom graph shows a spectral analysis of the audio produced for all 20 amino acids, for a frequency range from 50 to 20 000 Hz.

keyboard, however, the TYR residue does not feature the lowest frequency. A melodic range spectrogram analysis of the original audio data depicted in Figure 3 (bottom) confirms that, indeed, TYR features the lowest base frequency in the produced audio. We attribute this to the algorithm used to map a complex sound onto a piano roll, reflecting the individual components of the sound and lacking some detail of the lower frequencies for this sound. More analysis could be done to understand better why the algorithm does not reflect this detail.

To make the sounds of amino acids accessible to and playable by a broad audience, we developed a phone app that allows users to play the various soundings in an interactive manner, record and edit played sequences, and share the created music for further processing. A series of screenshots of the app is shown in Figure 4; the app is available for download in the Google Play store as “Amino Acid Synth” (Google Play is a trademark of Google Inc.). The app features 20 keys representing the notes in the “amino acid scale” and allows users to interactively create melodies that represent amino acid sequences. Functional details of the app are explained in the caption of Figure 4.

We now apply the protein sonification method described in the Methods section to translate protein sequences into musical expressions and process those further by focusing on what musical features resemble certain protein features. Table 2 summarizes a set of protein structures and associated audio files, all created based on existing protein structures. The name of each file corresponds to the PDB ID as listed in https://www.rcsb.org. The proteins translated into musical expressions include 194l (lysozyme), 107m (myoglobin), 6cgz (β-barrel), a silk protein, amyloid protein, and others. Figure 5 shows an example of the musical score for 194l (lysozyme), featuring a musical piece with a 21 bar length. The musical score illustrates the interplay of pitch (reflecting different amino acids) and rhythm (reflecting different secondary structures), altogether reflecting the protein fold in musical space.

A general weakness of sonification approaches alone is that they are not necessarily enough, on their own, to understand protein structure upon listening. At a minimum, it is strongly dependent on a person’s experience, training, and musical skill. To overcome this limitation, we propose using an AI approach to capture the expressions of the hierarchical structures of proteins in musical space through a neural network. Once trained against a data set, the neural network is capable of predicting musical expressions that resemble proteins that were not part of the training set. This overarching framework,


Figure 4. Screenshots of the phone app, which allows users to play the sounds associated with each amino acid interactively, explore the sonic landscape, and record played sequences and share them (for further processing, e.g., to synthesize or computationally fold sequences created through playing the instrument). (a) Primary screen. (b) Amino acid keyboard, where each key on the phone app is assigned to the sound of one amino acid type, which plays upon touch. (c) Built-in sequence editor to change sequences played interactively with the keyboard. Space can be added to distinguish multiple protein chains. (d) Information panel of the app, giving scientific background and a reference to this paper. This app is published for free public download (https://play.google.com/store/apps/details?id=com.synth.aminoacidplayer; source code of the app is attached in the SI).

summarized in Figure 1, allows one to utilize sonification as a design method through the use of AI.

We train three neural networks reflecting the music generated by distinct protein classes. Details of the methods used are included in the Methods section. Training set #1 includes a set of β sheet (BS) rich proteins, training set #2 is α helix (AH) rich proteins, and training set #3 is a combination of the former two. We then use these trained neural networks to generate musical scores and then translate the musical scores back into amino acid sequences to obtain a set of de novo proteins, whose folded structure we analyze. Table 3 summarizes the results of these AI-generated musical compositions as well as images of the de novo proteins designed by AI.

We note that while the musical representations that are used to train the neural networks include both sequence and secondary structure information, when we translate the musical scores back into amino acid sequences, we solely capture the sequence of amino acids. This serves as a way to test the predictive capabilities of the neural networks as to whether or not they are capable of predicting proteins with the desired secondary structures and higher-order folding patterns. Indeed, the data in Table 3 show that the neural networks are capable of achieving this feat, as they are capable of designing proteins with the desired features: AH-rich proteins, BS-rich proteins, or a mix of the two.

Further addressing this issue, we analyze the predicted musical patterns to better understand how secondary structures of proteins are reflected in musical space. Figure 6 shows an analysis of musical patterns, here visualized on a piano roll (y-axis) over time (x-axis) (top) and a representation of notes as bars (bottom) to show rhythmic detail. The data are shown for three proteins in the training set (α helix rich 107m, β sheet rich 6cgz, and a mix of various secondary structures in 194l). Figure 7 shows similar data for one of the de novo predicted proteins, revealing how secondary structure information is also encoded in the predicted proteins, in agreement with the ORION and MODELER predictions. Figure 8 shows a melodic spectrum for the sonified representation of lysozyme with PDB ID 194l. The figure depicts also a comparison with the secondary structure, revealing how certain acoustic patterns are associated with certain secondary structures of the protein.

An interesting insight from this work is the interplay of universality and diversity. The elementary building blocks of proteins, e.g., amino acids and secondary structure types, are limited. However, the structures that are built from these, using hierarchical principles of organization, are complex and responsible for proteins being capable of acting in many functional roles (e.g., enzyme, structural material, molecular switch). The expression of sequence in a musical space offers a means to understand how different length scales determine function.

It can be seen that the AI-generated musical compositions and amino acid sequences show similar repetitive characteristics as seen in the protein training set. The analysis in Table 3 shows that the structure of the resulting proteins reflects those characteristic features learned in the training set, confirming that the model is able to capture key structure−function relationships between amino acid sequence and various levels of protein organization. As a specific example, it is possible to design proteins with certain structural features


Table 2. List of Audio Files Created Based on Existing Protein Structures (Most of Them Experimentally Determined and Deposited in the Protein Data Bank (PDB)), Sonified Using the Approach Described in the Paper (a)

Figure 5. Musical score generated for the protein 194l (lysozyme), 21 bars long. Note that the notes indicated do not reflect a conventional musical scale, but that each note in the space of 20 admissible tones in the native amino acid scale is assigned to one of the 20 amino acids. The score is shown here only for visualization of the concept and to illustrate the timing, rhythm, and progression of notes as learned from the amino acid sequence (in the score C2 = ALA, D2 = ARG, and so on; the 20 amino acids are assigned, one each, to 20 notes in the C major scale on a piano roll [the white keys]). The score illustrates the progression from α helix rich secondary structures to segments of β sheet folds, to α helix structures, to random coils toward the end.
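The key assignment used for the score can be sketched as a reversible lookup table. The caption only states C2 = ALA, D2 = ARG, "and so on"; the sketch below assumes that the remaining amino acids follow in the order in which they are listed in the text and that consecutive white keys are used. The names `AA_TO_KEY`, `sequence_to_keys`, and `keys_to_sequence` are hypothetical.

```python
# Order of the 20 amino acids as listed in the text (assumed key order).
AMINO_ACIDS = ["ALA", "ARG", "ASN", "ASP", "CYS", "GLU", "GLN", "GLY", "HIS",
               "ILE", "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP",
               "TYR", "VAL"]
WHITE_KEYS = "CDEFGAB"  # the seven white keys of one octave


def white_key_names(start_octave=2, count=20):
    """Names of `count` consecutive white keys starting at C<start_octave>."""
    return [f"{WHITE_KEYS[i % 7]}{start_octave + i // 7}" for i in range(count)]


AA_TO_KEY = dict(zip(AMINO_ACIDS, white_key_names()))
KEY_TO_AA = {key: aa for aa, key in AA_TO_KEY.items()}


def sequence_to_keys(residues):
    """Map a list of three-letter residue codes to piano-roll key names."""
    return [AA_TO_KEY[r] for r in residues]


def keys_to_sequence(keys):
    """Reverse mapping: piano-roll key names back to residue codes."""
    return [KEY_TO_AA[k] for k in keys]


print(sequence_to_keys(["ALA", "ARG", "GLY", "VAL"]))
```

Because the table is one-to-one, any melody restricted to these 20 keys can be translated back into an amino acid sequence, which is the property the reverse mapping in this paper relies on.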

(a) The name of each file corresponds to the PDB ID as listed in https://www.rcsb.org.

(e.g., α helices, β sheets) with this process. A key insight from this is that we can use the neural network to generate music that is innately encoded with patterns reflecting the design principles of a certain group of protein structures. This encoded information can then be turned back into protein sequences that are not included in the training set, but that resemble a set of desired features. This means that the neural network has learned the design principles by which certain structural features of proteins are generated from the sequence of amino acids.

Other future applications could be, for instance, to build a database of various enzymes and then use the design approach shown here to develop a set of enzymes that can be used as the basis for functional optimization and exploration of a very broad design space. The expression of certain features can be achieved either by selecting a protein as seed for further generation or by developing a training set or global conditioning to reflect certain features. Similar concepts as proposed here for protein design may be applied to other nanomaterial design problems and interactions between proteins and nanoparticles.47


Table 3. Summary of AI Designed de Novo Proteins Using the Three Neural Network Models Developed and Description of Corresponding Audio Files on Which the Protein Structure Is Based

CONCLUSION

In this paper we reported an approach to sonify protein sequences and understand protein compositions in a different, musical space. This translation may offer cognitive avenues to understand protein function and how it changes under variations of sequence, secondary structure, and other parameters. As demonstrated in the paper, the representation of a protein in musical space, a sort of language, also allows us to use neural network methods to train, classify, and generate de novo protein sequences. Our method here can provide a useful tool that allows anyone to easily translate between protein and music, make rigorous analogy with the training set, and satisfy the given design requests (sequence seed, secondary structure, etc.). We think this approach may be generalized to express the structure of other nanostructures in a different domain (here, sound) that provides a better interface with human cognition than plain data and may inspire more creativity. For example, it will allow humans to tune music either intuitively or according to music theories or tools to modify a protein structure. The method offers an avenue for musical compositions to be translated into protein sequences and to understand patterns in various forms of hierarchical systems and how they can be designed.

Proteins are the most abundant building blocks of all living things, and their motion, structure, and failure in the context of both normal physiological function and disease is a foundational question that transcends academic disciplines. In this paper we focused on developing a model for the vibrational spectrum of the amino acid building blocks of proteins, an elementary structure from which materials in living systems are built. This concept could be broadly important. For instance, at the nanolevel of observation, all structures continuously move. This reflects the fact that they are tiny objects excited by thermal energy and set in motion to undergo large deformations. This concept of omnipresent vibrations at the nanoscale is exploited here to extract audio as one way to represent nature’s concept of hierarchy as a paradigm to create complex, diverse function from simple, universal building blocks.

More broadly, the translation from various hierarchical systems into one another poses a paradigm to understand the emergence of properties in materials, sound, and related systems and offers design methods for such systems where large-scale and small-scale relationships interplay. Additional analyses could be performed, for instance by investigating disease-related mutations and other aspects, offering potential avenues for future work. The method reported here can find useful applications in STEM outreach and general outreach to explain the concept of protein folding, design, and disease etiology (through making protein misfolding or mutations audible) to broad audiences. It also offers insights into the couplings between sound and matter, a topic of broad interest in philosophy and art. Finally, the AI-based approach to design de novo proteins provides a generative method that can complement conventional protein sequence design methods.

METHODS

Vibrational Spectrum. To generate audible sound, we use the vibrational spectrum of amino acids, defined by the set of eigenfrequencies, as a basis. We consider two data sets for sound generation, one that bases the vibrational spectra on B3LYP DFT as published in ref 31 and the other one where we use CHARMM MD to compute the same data. In the latter case we use a custom Bash script that allows integrating multiple open source software with the CHARMM c37b1 program to automatically analyze each of the amino acids and then compute their normal modes.

Translating a Vibrational Spectrum into Audible Sound. We use an interactive tool that allows us to generate sounds based on the list of eigenfrequencies provided, implemented in Max 8.03,48,49 and accessed through a Digital Audio Workstation (DAW) Ableton Live


Figure 6. Analysis of musical patterns, here visualized on a piano roll (y-axis) over time (x-axis) (top) and a representation of notes as bars (bottom) to show rhythmic detail. The data are shown for three proteins in the training set (α helix rich 107m, β-sheet rich 6cgz, and a mix of various secondary structures in 194l). The images show how the secondary structures are reflected in musical patterns (examples of specific areas highlighted in bottom row).

Figure 7. Analysis of musical patterns generated by AI, here visualized on a piano roll (y-axis) over time (x-axis) (top) and a representation of notes as bars (bottom) to show rhythmic detail, for the longer protein sequence predicted from the AH-BS training set. As shown in the analysis, the model predicts protein designs with both α helix (toward the beginning of the sequence, protein predicted shown on top left) as well as β sheet rich proteins (toward the end of the sequence, protein predicted shown on bottom right).

10.1b15 (Ableton is a trademark of Ableton AG).50 Max is a visual programming language for music and used here to implement a method to realize the sound of all amino acids analyzed using our method. We use a sound generation engine developed earlier19 and adapt it here for the synthesis of amino acid soundings, considering the first 64 vibrational modes of each amino acid (higher-order modes beyond the audible spectrum are not considered). To translate the vibrational frequencies into audible sound, we transpose the frequencies of molecular vibrations into the audible range by multiplying the frequencies, normalized by the lowest frequency that is found in any of the 20 amino acids (it is seen in the first mode of TYR), by 61.74 Hz (corresponding to the B0 tone). This translation process is based on the music theoretical concept of transpositional equivalence, a feature of musical set theory.51 The base frequency of 61.74 Hz is chosen based on the audible frequency range, so that the resulting frequency spectra of all 20 amino acids are transposed to the audible range. We build the spectrum of higher-order frequencies on top of the lowest eigenfrequency, each represented by harmonic sine waves that are added to form the audio signal associated with each amino acid.
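The transposition step described above amounts to multiplying every frequency by one common factor. A minimal sketch follows, assuming the spectra are available as lists of eigenfrequencies per amino acid and that "the first 64 modes" means the 64 lowest ones; the function name `transpose_to_audible` and the toy numbers are hypothetical and are not the DFT data of ref 31.

```python
def transpose_to_audible(spectra, base_hz=61.74, max_modes=64):
    """Transpose all spectra so that the globally lowest mode equals base_hz.

    Every frequency is normalized by the lowest frequency found in any
    amino acid and multiplied by base_hz, so all frequency ratios within
    and between amino acids are preserved.
    """
    # Keep the lowest `max_modes` modes of each amino acid (an assumption).
    trimmed = {aa: sorted(freqs)[:max_modes] for aa, freqs in spectra.items()}
    f_min = min(freqs[0] for freqs in trimmed.values())
    factor = base_hz / f_min
    return {aa: [f * factor for f in freqs] for aa, freqs in trimmed.items()}


# Hypothetical toy spectra in arbitrary units, for illustration only.
toy = {"TYR": [40.0, 80.0, 130.0], "GLY": [90.0, 200.0]}
audible = transpose_to_audible(toy)
print(audible["TYR"][0], audible["GLY"][0] / audible["TYR"][0])
```

Using a single factor for all 20 amino acids, rather than one factor per amino acid, is what keeps the relative pitch between different amino acids intact.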
This method of overlaying higher frequencies based on the particular spectrum of each amino acid allows us to translate the frequencies into audible space and to maintain the characteristic sound spectrum associated with each amino acid without altering it by confining it to conventional musical scales. An advantage of using the chemistry-based approach to define the soundings of each amino acid is that the characteristic sound of each, defined by the set of harmonic waves superposed to create the audio, is self-consistent across all amino acids and naturally captures the differences between distinct amino acid vibrational spectra. This leads to a specific tonal characteristic, or timbre, of each of the amino acids. Moreover, since the base frequency and all higher-order contributions of each amino acid residue are different, as shown in Figure 2(b), the sound associated with each amino acid is distinct and has a reversible association with a musical note. These notes do not reflect the classical Western musical scales,51,52 but define their own natural scale innate in the vibrations of the amino acids. Alternative approaches that have been defined as a means to map amino acid sequences to sound assign a certain classical note or chord to each amino acid residue.23 This earlier method maps the protein sequences into the framework of Western classical musical scales. However, it does not capture the foundational vibrational characteristic of each protein, as it was predetermined to be expressed in classical Western scales.23 Our analysis suggests that there exists an “amino acid scale” that is composed of 20 sounds.
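The transposition and additive synthesis described above can be illustrated with a minimal sketch. It is not the Max-based engine used in this work: the function names, the equal amplitude given to every sine wave, and the 20 000 Hz cutoff for "audible" are assumptions made for illustration, and the example frequencies are arbitrary numbers rather than computed vibrational modes. Only the base frequency of 61.74 Hz and the normalization by the globally lowest frequency are taken from the text.

```python
import math

BASE_HZ = 61.74  # base frequency stated in the text


def transpose_to_audible(mode_freqs, global_min, base_hz=BASE_HZ):
    """Rescale frequencies so that global_min lands on base_hz.

    Every frequency is multiplied by the same factor, so all frequency
    ratios within and between amino acids are preserved.
    """
    return [f / global_min * base_hz for f in mode_freqs]


def synthesize(audible_freqs, duration_s=1.0, sample_rate=44100, max_hz=20000.0):
    """Add one sine wave per mode and normalize the peak to 1.

    Assumptions: equal amplitude for all modes, and modes above max_hz
    are dropped as inaudible.
    """
    kept = [f for f in audible_freqs if f <= max_hz]
    n_samples = int(duration_s * sample_rate)
    samples = [
        sum(math.sin(2.0 * math.pi * f * i / sample_rate) for f in kept)
        for i in range(n_samples)
    ]
    peak = max((abs(s) for s in samples), default=0.0)
    return [s / peak for s in samples] if peak > 0 else samples


# Arbitrary illustrative values, not real vibrational frequencies.
example_modes = [50.0, 100.0, 175.0]
audible = transpose_to_audible(example_modes, global_min=25.0)
signal = synthesize(audible, duration_s=0.5)
```

Because the scaling is a single multiplicative factor, an amino acid whose lowest mode lies above the global minimum starts above 61.74 Hz, which is how the relative pitch between amino acids is retained.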


Figure 8. Melodic range spectrogram over time for the sonified representation of lysozyme with PDB ID 194l (total duration of the music analyzed is around 48 s). The figure also depicts a comparison with the secondary structure, revealing how certain acoustic patterns are associated with certain secondary structures of the protein. (a) Frequency spectrum and (b) secondary structure over the sequence of the protein (note that the time axis in panel (a) and the sequence axis in panel (b) are not identical, since different secondary structures are associated with different rhythms, leading to variations in time passed per amino acid).

We use Sonic Visualiser (version 3.2.1)54 to analyze time histories of frequency patterns of the produced sounds, to study the initial music and the features representing certain secondary structures, and to compare predicted musical features with the folded proteins. We apply the melodic spectrum analysis tool to represent the frequency spectrum (y-axis) over time (x-axis).

Translating Protein Sequence into Musical Scores. Building a Musical Instrument. After generating the sound of each of the 20 amino acids, we assign each of the amino acids to one key on a piano, using Ableton Live Sampler (a sampling instrument that allows one to play back audio recordings). We use an input device such as a MIDI keyboard, Ableton Push, ROLI BLOCK, or similar devices that allow convenient access to playing the instrument (the advantage of using devices like Push or BLOCK is that they do not follow the traditional 12-tone piano roll setup with white and black keys but can instead be programmed to represent the 20-tone “amino acid scale”). One can visualize the resulting musical instrument as a piano with 20 keys. This setup allows one to play the musical instrument and use the amino acid soundings as a generative way to create musical complexity over time.
For instance, the C3 key is mapped to ALA, the D3 key to ARG, and so on (for all 20 amino acids, on 20 distinct

The resulting audio recordings are analyzed using Sennheiser HD 800 S high-resolution reference headphones (Sennheiser electronic GmbH & Co. KG, frequency response: 6−48 000 Hz (−10 dB)), as well as spectrum analyzers implemented in Ableton Live and Max/MSP, to empirically confirm the predicted frequency ranges. The use of reference headphones is important for the detailed analysis of the sounds produced. The wide and flat frequency spectrum of the reference headphones used allows us to hear precisely the minute changes of the tones generated in the broad frequency spectrum originating from the amino acid vibrations.

Spectral Analysis of Amino Acid and Protein Soundings. We use Melodyne Studio (version 4.2.1)53 to analyze the note spectrum associated with each of the 20 amino acids, mapped onto a piano roll. Using the polyphonic detection mechanism in Melodyne (DNA Direct Note Access technology), we analyze the sound of all 20 amino acids (Melodyne and DNA Direct Note Access are registered trademarks of Celemony Software GmbH). The method allows us to detect the notes that make up the sound of each amino acid, as well as underlying musical parameters such as relative volume. We use the scale detective function in Melodyne to find the best fit of any musical scale to the amino acid soundings.


keys). It is important to note that the keys do not produce the sound commonly associated with C3, D3, etc. Instead, they directly reproduce the sound defined by the innate vibrational spectrum of the assigned amino acid, as described above, without any change in the frequency spectrum or tonal characteristic. The mapping onto piano keys is done solely for convenience in the use of existing interactive electronic music devices, the Digital Audio Workstation (DAW), and MIDI.

Mapping Amino Acid Sequences into Musical Scores. Another way by which we exploit the sonification approach is to map amino acid sequences into musical scores that reflect music composed in the “amino acid scale”. Using the bioinformatics libraries Biopython and Biskit, we developed a Python script that translates any sequence into a musical score. Sequences using the one-letter amino acid code can be entered either manually or based on lists of one or more protein PDB identifiers. We also implemented a function by which proteins can be searched and grouped, using PyPDB. This allows one to build complex musical scores (e.g., to generate music or for use as training sets for the neural networks). Musical scores are stored as MIDI files that can be accessed by DAWs and sonified using the Ableton Sampler tool described above. To reflect higher-order chemical structure in the musical space, we incorporate information about the secondary structure associated with each amino acid in the translation step by adjusting the duration and volume of the notes played. We use DSSP to compute the secondary structure from the protein geometry file and sequence.55,56 Table 4

The sequence data are used for further analysis to examine similarities with known proteins and to build 3D models using protein folding methods. To better understand the similarities of amino acid sequences, we use tools such as BLAST.57 To build 3D models of proteins, homology methods58 or other protein folding approaches are used. In the analysis reported here we use ORION58 to predict an estimated structure of the designed protein sequences, and a 3D structure is obtained using MODELLER,59 reflecting the images shown in Table 3 (right column).

Design Approaches. The translation to music and from music to protein sequence enables a seamless mapping between different manifestations of matter. It enables a design approach of sequences in either molecular space or musical space, or combinations thereof. For instance, musical compositions generated by humans or AIs can be analyzed in the protein space. Proteins can thereby be a source of musical compositions and generate innovative concepts for artistic expressions.

Deep Learning to Generate Musical Scores with a Recurrent Neural Network Model. We use the musical scores generated from amino acid sequences to train a deep neural network, using the Magenta framework developed by Google Brain, which is implemented in TensorFlow60 (https://magenta.tensorflow.org/) (Google Brain and TensorFlow are trademarks of Google Inc.). The recurrent neural network (RNN) we use for melody generation is adapted from language modeling and was implemented in the Melody RNN model61 using TensorFlow.60 This RNN uses long short-term memory (LSTM) cells to capture time-sequence features, alongside an attention model.10 The attention model allows us to access past information in the musical score and hence learn longer term dependencies in musical note progression. To illustrate the approach, we develop two RNN models using several training sets derived from collections of musical scores translated by the approach described above.
We train the model using a batch size of 128, two layers of RNN with 128 units each, and an attention length of 40 steps (2.5 bars). Training is done until convergence is achieved, typically after around 40 000 steps or fewer. The training and generations are done on a Dell Precision Tower 7810 workstation (Xeon CPU E5-2660 v4 2.0 GHz, 32 GB memory with a GeForce RTX 2080 Ti GPU).

Training Set #1: β Sheet Rich Proteins. We use a training set consisting of β-barrel protein structures and similar β sheet rich proteins (PDB IDs 6CZG, 2YNK, 6CZJ, 6CZH, 6CZI, 2JMM, 6CZI, 2JMM, 6D0T, 3P1L, 2QOM, 1G7N, 4K3B, 5EE2, 5G38, 5G39, 5NJO, 6FSU, 1DC9, 2F1V, 2F1T, 5LDT, 2MXU, 2NNT, 3OW9, 2LNQ, 5KK3, 2MUS, 2M5M, 2M5K, 2LBU, 3ZPK, 6EKA, 5O65, 2E8D, 2LMP, 2LMO, 4RIL, 2LMN, 2KJ3, 2RNM, 2LMQ, 5OQV, 2M5N, 2KIB, 2BEG, 2M5N, 2KIB, 5O67, 2N0A, 6CU8, 6CU7, 6CU8, 6FLT, 3LOZ, 4OLR; around 20 000 amino acid residues).

Training Set #2: α Helix Rich Proteins. We use a training set consisting of α helix rich proteins (PDB IDs 6A9P, 6F62, 6F63, 6F64, 6GAJ, 6GAK, 5VR2, 5TO5, 5TO7, 5XDJ, 5LBJ, 2NDK, 5WST, 5IIV, 5D3A, 5HHE, 2MG1, 2LBG, 2L5R, 3V4Q, 2D3E, 2HN8, 2FXO, 3TNU, 4YV3, 1GK6, 3SSU, 3SWK, 2XV5, 3UF1, 3PDY, 1X8Y, 3TNU, 4ZRY, 6E9R, 6E9T, 6E9X, 2MG1; around 20 000 amino acid residues).

Training Set #3: α Helix and β Sheet Rich Proteins. This training set includes all protein sequences from training sets #1 and #2 combined.

Generation of Music. We use the trained neural network model to generate various musical scores. As seeds for musical score generation we use either a set of notes (we seed it with two notes reflecting the ALA and VAL notes) or an existing protein structure taken from the PDB (the seeds used for each of the cases are described in Table 3). We sonify these generated musical scores using the synthesis method described above, just as we sonified the musical scores of naturally occurring proteins.
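As a small bookkeeping sketch for the training settings and training-set lists above, the following collects the stated hyperparameters, checks the relation between attention steps and bars, and removes repeated PDB IDs from a list. The class and function names are hypothetical and do not belong to Magenta or to the scripts used in this work. The value of 16 steps per bar is an assumption (four steps per quarter note in 4/4 time) under which 40 steps equal the stated 2.5 bars.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    """Hyperparameters as stated in the text (container name is hypothetical)."""
    batch_size: int = 128
    rnn_layer_sizes: tuple = (128, 128)   # two RNN layers with 128 units each
    attention_length_steps: int = 40
    max_training_steps: int = 40_000      # "around 40 000 steps or fewer"


STEPS_PER_BAR = 16  # assumption: 4 steps per quarter note, 4 quarter notes per bar


def steps_to_bars(steps, steps_per_bar=STEPS_PER_BAR):
    """Convert a number of quantized steps into bars."""
    return steps / steps_per_bar


def unique_ids(pdb_ids):
    """Drop repeated PDB IDs while keeping the first-seen order."""
    return list(dict.fromkeys(pdb_id.upper() for pdb_id in pdb_ids))


setup = TrainingSetup()
attention_bars = steps_to_bars(setup.attention_length_steps)

# A short excerpt of training set #1, which lists some IDs more than once.
excerpt = ["6CZI", "2JMM", "6CZI", "2JMM", "6D0T"]
cleaned = unique_ids(excerpt)
```

Whether repeated entries in the published lists were intended to weight those proteins more heavily is not stated, so deduplication is shown only as an option.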
The seed used acts as the basis for further note generation and thereby acts as a template for the following sequences that are variations and evolutions of the notes represented in the seeds. This allows one to use variations in the seed to control the type of musical patterns produced and, by extension, the type of protein designed. While we

Table 4. Incorporation of Secondary Protein Structure in the Translation into a Musical Score, Affecting Note Timing and Note Volume(a)

secondary structure | note timing | note volume
β sheet (all types) | 1.0 | 1
helices (α helix and others) | 0.5 | 0.5
random coil and unstructured | 2.0 | 0.25

(a) Different proteins are separated by a longer break. By classifying three major secondary structure classes we can capture their representation in musical space and also translate the feature into the AI. Figure 6 shows how these rules are reflected in the corresponding musical score.

lists the parameters determined by this approach. We propose using longer note durations for disordered secondary structures, very short note durations for helices, and short notes for β-sheets. We also modulate the volume by rendering β-sheets the loudest, and others more softly. For instance, ALA residues in a β-sheet (BS) will be played louder and slower than ALA residues in an α-helix (AH), which will be played in a fast and repetitive manner. Similarly, ALA residues in random coils or unstructured regions would be played slowly and softly. These modulations of the tone by volume and timing lead to a certain rhythmic character that overall reflects the 3D folded geometry of the protein. It is noted that, distinct from the way we obtained the frequency spectra of amino acids, the effect of secondary structure on the musical score is not directly based on physical principles and involves choices. However, for the training of the neural networks, capturing these features is essential, as it reflects the hierarchical nature of the protein fold from primary, to secondary, to tertiary and higher-order structures.

Mapping Musical Scores into Protein Sequences and Protein Structure Analysis. To map musical scores back into amino acid sequences, we developed a script that reads a MIDI file and maps the notes associated with the 20 amino acids back onto amino acids, generating sequence outputs in the one-letter codes. In the translation of the musical scores back into amino acid sequences we solely capture the sequence of amino acids. This serves as a means to test the predictive power of the neural networks as to whether or not they are capable of predicting proteins with the desired secondary structures. In principle, secondary structure information could be extracted from the musical scores as well.
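The forward and reverse mappings described above can be sketched as follows. This is an illustration under stated assumptions, not the script used in this work: only the assignments ALA to C3 and ARG to D3 and the Table 4 values come from the text. The placement of the remaining 18 amino acids on consecutive white keys in alphabetical order of their three-letter codes, the use of MIDI note 60 for C3, the grouping of DSSP states into three classes (E and B as β sheet; H, G, and I as helices; everything else as coil), and all names are assumptions. Reading and writing actual MIDI files is omitted.

```python
# White-key offsets within one octave (C D E F G A B).
WHITE_KEY_OFFSETS = [0, 2, 4, 5, 7, 9, 11]
FIRST_KEY = 60  # assumption: MIDI note 60 is the key labeled C3

# One-letter codes ordered by three-letter code (ALA, ARG, ASN, ...).
# Only ALA -> C3 and ARG -> D3 are stated in the text; the rest is hypothetical.
RESIDUES = "ARNDCQEGHILKMFPSTWYV"
PITCH_OF = {
    aa: FIRST_KEY + 12 * (k // 7) + WHITE_KEY_OFFSETS[k % 7]
    for k, aa in enumerate(RESIDUES)
}
RESIDUE_OF = {pitch: aa for aa, pitch in PITCH_OF.items()}

# Table 4: (relative note timing, relative note volume) per structure class.
TABLE_4 = {"sheet": (1.0, 1.0), "helix": (0.5, 0.5), "coil": (2.0, 0.25)}


def structure_class(dssp_state):
    """Reduce a DSSP one-letter state to the three classes of Table 4 (assumed grouping)."""
    if dssp_state in {"E", "B"}:
        return "sheet"
    if dssp_state in {"H", "G", "I"}:
        return "helix"
    return "coil"


def encode(sequence, dssp_states):
    """Turn a sequence and its per-residue DSSP states into (pitch, timing, volume) events."""
    events = []
    for aa, state in zip(sequence, dssp_states):
        timing, volume = TABLE_4[structure_class(state)]
        events.append((PITCH_OF[aa], timing, volume))
    return events


def decode(events):
    """Recover the one-letter sequence; timing and volume are ignored."""
    return "".join(RESIDUE_OF[pitch] for pitch, _, _ in events)


# ALA in a helix followed by ARG in a β strand.
score = encode("AR", "HE")
recovered = decode(score)
```

Because each amino acid occupies its own key, the reverse mapping is a plain dictionary lookup, which mirrors the statement that only the sequence is recovered from the score.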


(8) Buehler, M. J. Tu(r)ning Weakness to Strength. Nano Today 2010, 5, 379.
(9) Giesa, T.; Spivak, D. I.; Buehler, M. J. Reoccurring Patterns in Hierarchical Protein Materials and Music: The Power of Analogies. Bionanoscience 2011, 1, 153−161.
(10) Bahdanau, D.; Cho, K.; Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate, arXiv:1409.0473, 2014.
(11) Huang, C.-Z. A.; Vaswani, A.; Uszkoreit, J.; Shazeer, N.; Simon, I.; Hawthorne, C.; Dai, A. M.; Hoffman, M. D.; Dinculescu, M.; Eck, D. Music Transformer, arXiv:1809.04281v3, 2018.
(12) Roberts, A.; Engel, J.; Raffel, C.; Hawthorne, C.; Eck, D. A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music, arXiv:1803.05428v4, 2018.
(13) Tamerler, C.; Sarikaya, M. Genetically Designed Peptide-Based Molecular Materials. ACS Nano 2009, 3, 1606−1615.
(14) Peralta, M. D. R.; Karsai, A.; Ngo, A.; Sierra, C.; Fong, K. T.; Hayre, N. R.; Mirzaee, N.; Ravikumar, K. M.; Kluber, A. J.; Chen, X.; Liu, G.; Toney, M. D.; Singh, R. R.; Cox, D. L. Engineering Amyloid Fibrils from β-Solenoid Proteins for Biomaterials Applications. ACS Nano 2015, 9, 449−463.
(15) Mathieu, F.; Liao, S.; Kopatsch, J.; Wang, T.; Mao, C.; Seeman, N. C. Six-Helix Bundles Designed from DNA. Nano Lett. 2005, 5, 661−665.
(16) Russ, M. Sound Synthesis and Sampling; Focal: Burlington, 2009.
(17) Su, I.; Qin, Z.; Saraceno, T.; Krell, A.; Mühlethaler, R.; Bisshop, A.; Buehler, M. J. Imaging and Analysis of a Three-Dimensional Spider Web Architecture. J. R. Soc., Interface 2018, 15, 20180193.
(18) Su, I.; Qin, Z.; Bisshop, A.; Muehlethaler, R.; Ziporyn, E.; Buehler, M. J. Sonification of a 3D Spider Web and Reconstitution into Musical Composition using Granular Synthesis. Submitted.
(19) Qin, Z.; Buehler, M. J. Analysis of Molecular Vibrations of over 100,000 Protein Structures, Sonification, and Application as a New Musical Instrument. Extrem. Mech. Lett. 2019, 29, 100460.
(20) Yeo, J.; Jung, G.; Tarakanova, A.; Martín-Martínez, F. J.; Qin, Z.; Cheng, Y.; Zhang, Y.-W.; Buehler, M. J. Multiscale Modeling of Keratin, Collagen, Elastin and Related Human Diseases: Perspectives from Atomistic to Coarse-Grained Molecular Dynamics Simulations. Extrem. Mech. Lett. 2018, 20, 112−124.
(21) Spivak, D. I.; Giesa, T.; Wood, E.; Buehler, M. J. Category Theoretic Analysis of Hierarchical Protein Materials and Social Networks. PLoS One 2011, 6, 0023911.
(22) Brommer, D. B.; Giesa, T.; Spivak, D. I.; Buehler, M. J. Categorical Prototyping: Incorporating Molecular Mechanisms into 3D Printing. Nanotechnology 2016, 27, 024002.
(23) Takahashi, R.; Miller, J. H. Conversion of Amino-Acid Sequence in Proteins to Classical Music: Search for Auditory Patterns. Genome Biol. 2007, 8, 405.
(24) Duncan, A. Combinatorial Music Theory. J. Audio Eng. Soc. 1991, 39, 427−448.
(25) All The Scales (https://allthescales.org/, May 14, 2019).
(26) Supper, A. Sublime Frequencies: The Construction of Sublime Listening Experiences in the Sonification of Scientific Data. Soc. Stud. Sci. 2014, 44, 34−58.
(27) Dubus, G.; Bresin, R. A Systematic Review of Mapping Strategies for the Sonification of Physical Quantities. PLoS One 2013, 8, No. e82491.
(28) Delatour, T. Molecular Music: The Acoustic Conversion of Molecular Vibrational Spectra. Comput. Music J. 2000, 24, 48−68.
(29) Beese, A. M.; Sarkar, S.; Nair, A.; Naraghi, M.; An, Z.; Moravsky, A.; Loutfy, R. O.; Buehler, M. J.; Nguyen, S. T.; Espinosa, H. D. Bio-Inspired Carbon Nanotube-Polymer Composite Yarns with Hydrogen Bond-Mediated Lateral Interactions. ACS Nano 2013, 7, 3434−3446.
(30) Giesa, T.; Schuetz, R.; Fratzl, P.; Buehler, M. J.; Masic, A. Unraveling the Molecular Requirements for Macroscopic Silk Supercontraction. ACS Nano 2017, 11, 9750−9758.
(31) Moon, J. H.; Oh, J. Y.; Kim, M. S. A Systematic and Efficient Method to Estimate the Vibrational Frequencies of Linear Peptide and Protein Ions with any Amino Acid Sequence for the Calculation

have explored a variety of seeds in this paper (as shown in Table 3), future studies could explore these relationships in greater detail. We translate the AI-generated musical scores back to amino acid sequences for further analysis, as described above.

Interactive Phone App Development. We use Android Studio (https://developer.android.com/) to create an app for Android phones (Android is a trademark of Google LLC). To play the sounds, we program a Java class and use the MediaPlayer library in Android Studio. We create a MediaPlayer object and use its associated attributes to play and stop the audio. We use XML to invoke the Java methods and to format the colors, text, and design. Audio files in the WAV format of all 20 amino acid sounds are used for the app. The app is published for public download on the Google Play store (https://play.google.com/store/apps/details?id=com.synth.aminoacidplayer). The Java source code is attached in the SI.

ASSOCIATED CONTENT
Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acsnano.9b02180.
Overview of all files included in SI.zip (PDF)
Data files, MP3 audio files, and Java code (ZIP)

AUTHOR INFORMATION
Corresponding Author

*E-mail: [email protected]. Tel: +1.617.452.2750.

ORCID

Markus J. Buehler: 0000-0002-4173-9659

Author Contributions

M.J.B. designed this research, in collaboration with Z.Q., C.H.Y., and F.M.M. Z.Q. conducted the MD CHARMM analysis of vibrational frequencies. M.J.B. and C.H.Y. conducted the AI training and AI music generation. F.M.M. contributed the analysis of the DFT data. The paper was written by M.J.B. with input from all coauthors.

Notes

The authors declare no competing financial interest.

ACKNOWLEDGMENTS
This research was supported by ONR (grant # N00014-16-12333) and NIH U01 EB014976. We acknowledge E. L. Buehler for help with Python analysis scripts and designing the interactive phone app. We acknowledge the MIT Center for Art, Science & Technology (CAST) program for fruitful discussions. The RNN models, training sets used for the generations, and associated PDB and MIDI files are available from the authors upon request.

REFERENCES
(1) Bucur, V. Handbook of Materials for String Musical Instruments; Springer International Publishing: New York, 2016.
(2) Buehler, M. J. Materials by Design - A Perspective from Atoms to Structures. MRS Bull. 2013, 38, 169−176.
(3) Hansen, U. J. Materials in Musical Instruments. J. Acoust. Soc. Am. 2011, 129, 2517−2518.
(4) Hofstadter, D. R. Gödel, Escher, Bach: An Eternal Golden Braid; Basic Books: New York, 1979.
(5) Osaki, S. Spider Silk Violin Strings with a Unique Packing Structure Generate a Soft and Profound Timbre. Phys. Rev. Lett. 2012, 108, 154301.
(6) Wegst, U. G. K. Bamboo and Wood in Musical Instruments. Annu. Rev. Mater. Res. 2008, 38, 323−349.
(7) Xenakis, I. Formalized Music: Thought and Mathematics in Composition; Indiana University Press: Bloomington, 1971.


of Rice-Ramsperger-Kassel-Marcus Rate Constant. J. Am. Soc. Mass Spectrom. 2006, 17, 1749−1757.
(32) Wong, M. W. Vibrational Frequency Prediction using Density Functional Theory. Chem. Phys. Lett. 1996, 256, 391−399.
(33) Watson, T. M.; Hirst, J. D. Density Functional Theory Vibrational Frequencies of Amides and Amide Dimers. J. Phys. Chem. A 2002, 106, 7858−7867.
(34) Barth, A.; Zscherp, C. What Vibrations Tell us About Proteins. Q. Rev. Biophys. 2002, 35, 369−430.
(35) Rischel, C.; Spiedel, D.; Ridge, J. P.; Jones, M. R.; Breton, J.; Lambry, J.-C.; Martin, J.-L.; Vos, M. H. Low Frequency Vibrational Modes in Proteins: Changes Induced by Point-mutations in the Protein-Cofactor Matrix of Bacterial Reaction Centers. Proc. Natl. Acad. Sci. U. S. A. 1998, 95, 12306−12311.
(36) Patodia, S.; Bagaria, A.; Chopra, D. Molecular Dynamics Simulation of Proteins: A Brief Overview. J. Phys. Chem. Biophys. 2014, 4, DOI: 10.4172/2161-0398.1000166.
(37) Smythies, J. On the Possible Role of Protein Vibrations in Information Processing in the Brain: Three Russian Dolls. Front. Mol. Neurosci. 2015, 8, DOI: 10.3389/fnmol.2015.00038.
(38) Ghosh, R.; Mishra, R. C.; Choi, B.; Kwon, Y. S.; Bae, D. W.; Park, S.-C.; Jeong, M.-J.; Bae, H. Exposure to Sound Vibrations Lead to Transcriptomic, Proteomic and Hormonal Changes in Arabidopsis. Sci. Rep. 2016, 6, 33370.
(39) Hassanien, R. H. E.; Tian-Zhen, H.; Li, Y.-F.; Li, B.-M. Advances in Effects of Sound Waves on Plants. J. Integr. Agric. 2014, 13, 335−348.
(40) Fernandez-Jaramillo, A. A.; Duarte-Galvan, C.; Garcia-Mier, L.; Jimenez-Garcia, S. N.; Contreras-Medina, L. M. Effects of Acoustic Waves on Plants: An Agricultural, Ecological, Molecular and Biochemical Perspective. Sci. Hortic. 2018, 235, 340−348.
(41) Al-Shahib, A.; Breitling, R.; Gilbert, D. R. Predicting Protein Function by Machine Learning on Amino Acid Sequences − a critical evaluation. BMC Genomics 2007, 8, 12051.
(42) Mirabello, C.; Wallner, B. rawMSA: Proper Deep Learning Makes Protein Sequence Profiles and Feature Extraction Obsolete, bioRxiv 394437, 2018.
(43) Hou, J.; Adhikari, B.; Cheng, J. DeepSF: Deep Convolutional Neural Network for Mapping Protein Sequences to Folds. Bioinformatics 2018, 34, 1295−1303.
(44) Wang, J.; Cao, H.; Zhang, J. Z. H.; Qi, Y. Computational Protein Design with Deep Learning Neural Networks. Sci. Rep. 2018, 8, 6349.
(45) Gu, G. X.; Chen, C.-T.; Buehler, M. J. De Novo Composite Design Based on Machine Learning Algorithm. Extrem. Mech. Lett. 2018, 18, 19−28.
(46) Gu, G. X.; Chen, C.-T.; Richmond, D. J.; Buehler, M. J. Bioinspired Hierarchical Composite Design Using Machine Learning: Simulation, Additive Manufacturing, and Experiment. Mater. Horiz. 2018, 5, 939−945.
(47) Calvaresi, M.; Zerbetto, F. Baiting Proteins with C60. ACS Nano 2010, 4, 2283−2299.
(48) Elsea, P. The Art and Technique of Electroacoustic Music; A-R Editions: Middleton, 2013.
(49) Cycling ’74 Max 8, https://cycling74.com/ (May 14, 2019).
(50) Ableton Live Digital Audio Workstation, https://www.ableton.com/en/live/ (May 14, 2019).
(51) Schuijer, M. Analyzing Atonal Music: Pitch-Class Set Theory and its Contexts; University of Rochester Press: Rochester, 2008.
(52) Forte, A. The Structure of Atonal Music; Yale University Press: New Haven, 1973.
(53) Melodyne Studio, https://www.celemony.com/en/melodyne (May 14, 2019).
(54) Cannam, C.; Landone, C.; Sandler, M. Sonic Visualiser. In Proceedings of the International Conference on Multimedia - MM ’10; ACM Press: New York, 2010; pp 1467−1468.
(55) Kabsch, W.; Sander, C. Dictionary of Protein Secondary Structure: Pattern Recognition of Hydrogen-Bonded and Geometrical Features. Biopolymers 1983, 22, 2577−2637.
(56) Joosten, R. P.; te Beek, T. A. H.; Krieger, E.; Hekkelman, M. L.; Hooft, R. W. W.; Schneider, R.; Sander, C.; Vriend, G. A Series of PDB Related Databases for Everyday Needs. Nucleic Acids Res. 2011, 39, 411−419.
(57) Altschul, S. F.; Gish, W.; Miller, W.; Myers, E. W.; Lipman, D. J. Basic Local Alignment Search Tool. J. Mol. Biol. 1990, 215, 403−410.
(58) Ghouzam, Y.; Postic, G.; Guerin, P.-E.; de Brevern, A. G.; Gelly, J.-C. ORION: A Web Server for Protein Fold Recognition and Structure Prediction using Evolutionary Hybrid Profiles. Sci. Rep. 2016, 6, 28268.
(59) Eswar, N.; Webb, B.; Marti-Renom, M. A.; Madhusudhan, M. S.; Eramian, D.; Shen, M.; Pieper, U.; Sali, A. Comparative Protein Structure Modeling Using Modeller. Curr. Protoc. Bioinform. 2006, 15, 5.6.1−5.6.30.
(60) Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; Kudlur, M.; Levenberg, J.; Monga, R.; Moore, S.; Murray, D. G.; Steiner, B.; Tucker, P.; Vasudevan, V.; Warden, P.; Wicke, M.; Yu, Y.; Zheng, X.; Brain, G. TensorFlow: A System for Large-Scale Machine Learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’16); USENIX Association: Berkeley, 2016; pp 265−283.
(61) Waite, E.; Eck, D.; Roberts, A.; Abolafia, D. Project Magenta: Generating Long-Term Structure in Songs and Stories, https://magenta.tensorflow.org/2016/07/15/lookback-rnn-attention-rnn, 2016 (May 14, 2019).
