
Oct 4, 2017
Note pubs.acs.org/Macromolecules

MS-DECODER: Milliseconds Sequencing of Coded Polymers

Alexandre Burel,† Christine Carapito,*,† Jean-François Lutz,*,‡ and Laurence Charles*,§

† Laboratoire de Spectrométrie de Masse BioOrganique (LSMBO), IPHC, CNRS UMR7178, Université de Strasbourg, 25 Rue Becquerel, 67087 Strasbourg, France
‡ Institut Charles Sadron UPR22, CNRS, Université de Strasbourg, 23 rue du Loess, 67034 Cedex 2 Strasbourg, France
§ Aix Marseille Université, CNRS, UMR 7273, Institute of Radical Chemistry, 13397 Marseille Cedex 20, France

*S Supporting Information

As recently discussed in this journal,1 synthetic abiotic macromolecules can be used to store information at the molecular level and therefore open up new areas of application for man-made polymers, including data storage,2 long-term storage,3 and anticounterfeiting technologies.4 In such information-containing polymers,5 monomer units are used as a molecular alphabet, which can be binary, ternary, quaternary, decimal, or even more complex.1 Such polymers are usually synthesized via an iterative chemical process that allows preparation of uniform sequence-defined macromolecules.6 For instance, we have reported over the past three years the synthesis of a variety of digitally encoded macromolecules.7−11 These digital polymers allow information storage at room temperature and can be decoded using a sequencing technique,12 that is, an analytical method that characterizes monomer sequences. Yet, although a wide variety of sequencing methods exist for biopolymer analysis,13,14 only a few of them have been validated so far for the characterization of non-natural polymers.12 Currently, tandem mass spectrometry (MS/MS) is the leading method for deciphering the sequences of synthetic polymers,15 as shown by us9−11,16−18 as well as others.19−21 In particular, we have emphasized that the sequenceability of digital polymers can be significantly improved through polymer design. Indeed, the molecular structure of sequence-coded macromolecules can be optimized to minimize the number of dissociation routes while avoiding secondary fragmentations in MS/MS measurements,9,10,22 thus greatly facilitating spectra interpretation. However, for non-natural polymers, analysis of MS/MS spectra is usually performed manually, leading to average decoding times of about 1−10 min, depending on sequence length and spectrum complexity.
Although analysis times in the minute range are compatible with fundamental studies, they become limiting for advanced technological applications such as data storage or anticounterfeiting tags. In this context, automated approaches that reduce decoding time could be very beneficial for the emerging field of digital polymers. During the past decades, the demanding fields of genomics and proteomics have been drastically simplified by the development of bioinformatics software that simplifies protein and DNA sequencing. For example, the open-source UniNovo or the commercial PEAKS algorithms are widespread tools for MS/MS peptide sequencing.23,24 However, these software tools are specifically conceived for peptides/proteins and cannot be easily applied to synthetic polymers that obey different fragmentation rules. In this technical note, we introduce a new open-source tool called MS-DECODER, which was specifically conceived for the automated sequencing of synthetic sequence-coded macromolecules. The MS-DECODER algorithm is described, and its versatility is demonstrated by its successful application to sequence three types of digital polymers (Figure 1) that exhibit different fragmentation patterns in collision-induced dissociation (CID). It enables accurate decoding for all tested candidates in the millisecond range on a basic laptop computer.

Figure 1. Molecular structures of the different sequence-coded polymers that were studied in the present work: (a) digital polyurethanes (PU), (b) digital poly(alkoxyamine phosphodiester)s (PAP), and (c) digital poly(alkoxyamine amide)s (PAA).

Received: August 10, 2017
Revised: September 22, 2017
© XXXX American Chemical Society
DOI: 10.1021/acs.macromol.7b01737 Macromolecules XXXX, XXX, XXX−XXX

MS/MS Sequencing Rules of Coded Polymers. The simplest sequencing rules were defined for poly(urethane)s (PUs) when analyzed in the negative ion mode. As recently described in detail,25 dissociation of deprotonated PUs proceeds via competitive cleavages of the O−(CO) bond in each carbamate moiety (Figure 1a), hence producing a single series of fragments that are spaced by the mass of one or the other comonomer. As exemplified in Figure 2a, the 0011 message encoded in a type II PU can readily be reconstructed, from the left- to the right-hand side, by successive additions of 115.1 Da (for 0) or 129.1 Da (for 1) when starting from the first member of the series, a1−, always detected at m/z 131.1. Using a center-of-mass collision energy in the 0.9−1.4 eV range allows all ai− fragments to be formed (and hence the whole sequence to be deciphered) while preventing secondary dissociations that would increase MS/MS data complexity. Sequencing of type I PUs makes use of the same rules (Figure S1, Supporting Information), also starting from a1− at m/z 131.1 (since both PU types have the same carboxypentyl α end-group) but using mass increments of 101.1 Da (for 0) or 129.1 Da (for 1).

Figure 2. ESI-MS/MS and associated dissociation schemes of (a) PU (type II) and (b) PAP (type II), both coding for 0011 and analyzed in the negative ion mode, and (c) PAA coding for 110110 and produced as a doubly protonated species in the positive ion mode.

The structure of poly(alkoxyamine phosphodiester)s (PAPs, Figure 1b) was specifically designed to achieve the simplest dissociation pattern upon collisional activation. First, their repeating units contain an alkoxyamine linkage which cleaves at a much lower dissociation energy than any other chemical bond of the polymeric skeleton. Mild activation (0.6−0.9 eV, center-of-mass frame) of deprotonated PAPs hence results in homolysis of each C−ON bond, yielding a set of two product ions named c when containing the α end-group or y when still holding the ω termination, as derived from the nomenclature established for synthetic polymer fragments.26 Second, the sizes of both the alkyl coding moiety and the nitroxide spacer were tailored to allow sufficient distance between phosphate groups for all of them to be deprotonated simultaneously. Accordingly, the selected precursor ion exhibits a charge state that matches its number of repeating units, while the charge state of fragments increases with their polymerization degree. Because the low resolving power of the quadrupole mass analyzer used for precursor ion selection does not allow sampling of a single isotopic form for species with z > 2, all product ions exhibit a partial isotopic pattern which is usefully employed to characterize their charge state. Third, due to the relative masses of the structural segments at each side of the alkoxyamine bond,22 fragment series are detected in distinct m/z ranges, with ci•i− at the left-hand side of the precursor ion and yi•(i−1)− on the other side.
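The PU reading rule given above (start at a1− and repeatedly add the 0 or 1 increment) can be illustrated with a minimal Python sketch. It is not the MS-DECODER source code: the function names, the 0.1 Da tolerance, and the peak lists are assumptions made for illustration, and the peak lists are synthetic values obtained by adding the quoted increments to m/z 131.1, not measured data.

```python
def find_peak(peaks, target, tol):
    """Return the m/z value closest to target within +/- tol, or None."""
    close = [mz for mz in peaks if abs(mz - target) <= tol]
    return min(close, key=lambda mz: abs(mz - target)) if close else None


def decode_pu(peaks, inc0, inc1, start=131.1, tol=0.1):
    """Read a binary PU message by walking the a-type fragment series.

    peaks: list of m/z values (hypothetical input format).
    inc0, inc1: mass increments coding for 0 and 1 (Da).
    start: m/z of the first series member (131.1 in the text).
    tol: matching window in Da (assumed value).
    """
    bits, current = [], start
    while True:
        p0 = find_peak(peaks, current + inc0, tol)
        p1 = find_peak(peaks, current + inc1, tol)
        if (p0 is None) == (p1 is None):
            break  # nothing found (end of series) or both found (ambiguous)
        bits.append("0" if p0 is not None else "1")
        current = p0 if p0 is not None else p1  # matched ion is the new start
    return "".join(bits)


# Synthetic series built from the increments quoted in the text.
type_ii_peaks = [131.1, 246.2, 361.3, 490.4, 619.5]   # 115.1 / 129.1 Da steps
type_i_peaks = [131.1, 232.2, 361.3]                  # 101.1 / 129.1 Da steps

print(decode_pu(type_ii_peaks, inc0=115.1, inc1=129.1))
print(decode_pu(type_i_peaks, inc0=101.1, inc1=129.1))
```

Because the two increments of a given PU type differ by at least 14 Da, a tolerance window well below that difference is enough to keep the two candidates apart in this simplified setting.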
Such a peculiar MS/MS pattern is illustrated in Figure 2b with the type II PAP oligomer coding for 0011. Reconstruction of the sequence can be readily achieved using the ci•i− fragment series only. Owing to its structure, the first member of this series (c1•1−) is expected at m/z 295.1 when the first unit contains a butyl coding segment (defining a 0 unit) or at m/z 309.1 when the first unit contains a 2-methylbutyl coding segment (defining a 1 unit). Because the next c2•2− fragment contains an additional repeating unit but also an additional charge compared to c1•1−, one has to refer to the following mathematical relationship that relates m/z values of two successive members in this series:

$$ m/z\left(c_{i+1}^{\bullet (i+1)-}\right) = \frac{m/z\left(c_{i}^{\bullet i-}\right) \times i + m_{\mathrm{monomer}}}{i+1} \qquad (1) $$

where m_monomer is the mass of the deprotonated monomer (504.2 Da for unit 0, 518.2 Da for unit 1) at the (i + 1)th position. Therefore, the Δm/z difference between these two fragments is

$$ \Delta m/z = m/z\left(c_{i+1}^{\bullet (i+1)-}\right) - m/z\left(c_{i}^{\bullet i-}\right) = \frac{m_{\mathrm{monomer}}}{i+1} - \frac{m/z\left(c_{i}^{\bullet i-}\right)}{i+1} \qquad (2) $$

allowing the monomer at the (i + 1)th position to be identified from the values measured for Δm/z in MS/MS, after rearrangement of eq 2 according to

$$ m_{\mathrm{monomer}} = \Delta m/z \times (i+1) + m/z\left(c_{i}^{\bullet i-}\right) \qquad (3) $$

For example, the Δm/z = 104.6 difference measured between c2•2− and c1•1− in Figure 2b has to be multiplied by 2 (that is, the rank of the investigated repeating unit), and the so-obtained value is added to 295.1, the m/z value of c1•1− (which, by itself, indicates that the first monomer is 0): this equals 504.2, the mass of the repeating unit coding for 0. By use of the same approach, the third unit was found to be 1 since, when considering the next peak assigned to c3•3− and measured at m/z 439.2, calculation of (439.2 − 399.7) × 3 + 399.7 yields 518.2. The −4 charge state of the dissociating precursor at m/z 530.3 implies that a last unit is to be found, and (459.0 − 439.2) × 4 + 439.2 = 518.2 reveals it is a 1 repeating unit. As demonstrated with this example, values obtained according to eq 3 can only be 504.2 (for 0) or 518.2 (for 1) in the case of type II PAP. This means that applying eq 3 to any other peak erroneously considered as a ci•i− fragment (such as the secondary fragments annotated in gray in Figure 2b) would not lead to either of these two predetermined values. The same sequencing methodology also applies for type I PAP (Figure S2), starting from c1•1− at either m/z 281.1 or m/z 309.1 depending on the first unit being 0 or 1, respectively, and using increments of 490.2 (for 0) or 518.2 (for 1).

Unlike the two previous types of digitally encoded polymers, poly(alkoxyamine amide)s (PAAs, Figure 1c) were best sequenced in the positive ion mode, using protons as the charging species.16 Similarly to PAPs, their mild activation (0.5−1.1 eV, center-of-mass frame) induces homolytic cleavage of the alkoxyamine linkage connected to the coding segment in each repeating unit, leading to complementary c/y fragments. However, unlike PAPs where negative charges were fixed on oxygen atoms within the polymer structure, PAAs exist as different protomers varying in terms of location of proton(s) along the polymer backbone.

This feature has multiple implications that are illustrated with the case of the 110110 oligomer in Figure 2c. First, since adducted protons are preferentially located on nitroxide nitrogen atoms, the first member of each ion series is never detected. On the one hand, the structural segment at the left-hand side of the first C−ON bond is produced as a radical and not as c1•+. On the other hand, protonation of the last nitroxide nitrogen was shown to enhance the C−ON bond dissociation energy27 and hence prevents its cleavage (so y1•+ cannot form). As a result, the two ion series have to be used to reach complete sequence coverage. Second, due to the quite small size of their repeating units (Figure 1c), the number of charges adopted by PAAs is much lower than their polymerization degree. Consequently, when generated from multiply charged precursors, the charge state of fragments does not vary as a linear function of their polymerization degree. However, for oligomers investigated here with a charge state up to 2, it was observed that fragments with fewer than four repeating units were not able to accommodate two protons. Nevertheless, in order to unambiguously characterize the charge state of each fragment based on its partial isotopic pattern, the quadrupole resolving power was slightly lowered so that it can sample both the monoisotope and its 13C counterpart of doubly charged precursors. Despite these few constraints, binary messages encoded in PAAs can be retrieved from their MS/MS data according to the following reading rules. Before starting the actual sequencing, the number of NH−(CO)−C(CH3)R coding units has first to be determined from the mass of the dissociating precursor, M(PAA). Owing to their structure (Figure 1c), the number of coding units in PAAs is actually


repeating units. Accordingly, y2•+ was found at m/z 546.3 (= 305.1 + m1‑T + 1), y3•+ at m/z 786.5 (= y2•+ + m1‑T), y4•+ at m/z 1012.6 (= y3•+ + m0‑T), and y5•2+ at m/z 626.9 (= (y4•+ + m1‑T + 1)/2). The so-obtained T-1-T-0-T-1-T-1-T-0-ω partial sequence lacks the first unit connected to α, but this unit can be identified as 1 based on the mass of the radical released from the precursor ion to generate y5•2+, which is expected to be either 130.1 Da (α-0•) or 144.1 Da (α-1•). In the example of Figure 2c, the complete yi•z+ series was mainly used to validate the 0/1 sequence determined with ci•z+ fragments, but any case where sequencing is stopped due to the lack of one series member would require the use of the second series to achieve full sequence coverage.

MS-DECODER Algorithm Description. These fragmentation rules were translated as follows in MS-DECODER. The main concept of the software consists in iteratively searching for two potential ions corresponding to the 0 or the 1 unit increments from a given starting mass and determining which one constitutes the best match in the case of ambiguities. The algorithm thus needs three input values for each polymer type: (i) the mass increment corresponding to the unit “0”, (ii) the mass increment corresponding to the unit “1”, and (iii) the charge state of the considered starting mass. In addition, prior to running the algorithm, a series of parameters needs to be tuned by the user. These parameters include (i) a mass tolerance (in Da) that defines the m/z tolerance window for matching ions, (ii) an absolute intensity threshold A that defines the lowest measured absolute intensity value for a peak to be considered as a real nonbackground peak, and (iii) a second absolute intensity threshold B on the matching peak above which a peak corresponding to the first isotope is searched (Figure S3).
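To make the iterative search concrete, the following Python sketch walks one fragment series using the three inputs (two increments and the starting charge state), a mass tolerance, and the threshold A. It is only an illustration under stated assumptions, not the MS-DECODER implementation: the function names, the `charge_step` argument, the (m/z, intensity) peak format, the tolerance and threshold values, and all intensities are hypothetical, and ambiguities are resolved here by keeping the more intense candidate. The isotope check tied to threshold B is left out. The m/z values are those quoted for the type II PAP 0011 example, and the expected m/z of the next fragment follows eq 1.

```python
def find_peak(peaks, target_mz, tol, thr_a):
    """Most intense (m/z, intensity) peak within +/- tol of target_mz
    whose intensity is at least thr_a, or None if there is none."""
    hits = [p for p in peaks if abs(p[0] - target_mz) <= tol and p[1] >= thr_a]
    return max(hits, key=lambda p: p[1]) if hits else None


def read_series(peaks, start_mz, start_z, inc0, inc1, tol, thr_a, charge_step):
    """Iteratively read 0/1 units from a starting ion.

    charge_step = 1 mimics PAP c fragments (one more charge per unit);
    charge_step = 0 keeps the charge constant (e.g., singly charged PU).
    """
    bits, mz, z = [], start_mz, start_z
    while True:
        z_next = z + charge_step
        candidates = {}
        for bit, inc in (("0", inc0), ("1", inc1)):
            # eq 1: m/z of the next member; equals mz + inc when z = z_next = 1
            hit = find_peak(peaks, (mz * z + inc) / z_next, tol, thr_a)
            if hit is not None:
                candidates[bit] = hit
        if not candidates:
            return "".join(bits)  # no match: sequencing stops
        # keep the more intense candidate if both increments match
        bit, (mz, _) = max(candidates.items(), key=lambda kv: kv[1][1])
        bits.append(bit)
        z = z_next


# m/z values from the type II PAP example; intensities are made up.
pap_peaks = [(295.1, 900.0), (399.7, 800.0), (439.2, 700.0),
             (459.0, 600.0), (530.3, 5000.0)]
first_unit = "0"  # c1 observed at m/z 295.1 rather than 309.1
rest = read_series(pap_peaks, start_mz=295.1, start_z=1,
                   inc0=504.2, inc1=518.2, tol=0.1, thr_a=100.0, charge_step=1)
print(first_unit + rest)
```

Using the matched peak as the next starting point, rather than the theoretical value, keeps small rounding errors from accumulating along the series.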
If the intensity of the matching peak is above B, the first isotope is searched for at +1 m/z for a singly charged ion (+0.5 m/z for a doubly charged ion, and so on). If both the matching peak and first isotope are above A, the matching peak is validated. For matching peaks below B, the peak is validated without searching for the first isotope. These parameters need to be adjusted by the user according to the intrinsic specifications of the instrument used to acquire MS/MS spectra (mass resolution, absolute signal, and signal-to-noise reference values). Once set, the algorithm can be run and screens the peaklist (i.e., the list of all peaks measured in the MS/MS spectrum) to find potential matching ions corresponding to the two mass increments. If only one match is found, the 0 or 1 signal is easily assigned, and the matched ion thus becomes the new starting mass to be considered. In case of ambiguities when two matches are detected, the most intense peak (absolute intensity value) is selected to define the 0 or 1 unit and constitutes the starting mass for the next iteration. For PAPs, the charge state of the fragment increases at each iteration, and the mass increments are recalculated using eq 2 at each iteration. In case no match is found, the sequencing is stopped for PUs and PAPs. For PAAs, in the case of a multiply charged precursor, increasing charge state fragments can occur for fragments including more than four repeating units. In this case, when no match is found with the singly charged mass increments, doubly charged fragments are searched in a second iteration. The initial starting mass of an ion series is obvious for PUs and PAPs as described in their fragmentation rules. For PAAs, an initial calculation of the precursor polymerization degree is performed according to eq 5. This number of units is useful for concatenating both ion series at the end of the process. For this type of polymer, the starting mass is inferred

equal to the number of repeating units (composed of a coding unit and a TEMPO nitroxide) plus one. The mass of PAA can then be expressed as

$$ M(\mathrm{PAA}) = m_{\alpha} + m_{\omega} + a\,m_{0} + b\,m_{1} + (a + b - 1)\,m_{\mathrm{T}} \qquad (4) $$

where mα + mω = 138 Da is the sum of the end-group masses, a the number of 0 coding units of mass m0 = 71 Da, b the number of 1 coding units of mass m1 = 85 Da, and mT = 155 Da the mass of TEMPO. The total number of coding units is n = a + b, but since both a and b are unknown, the PAA precursor is assumed to be composed of 0 units only in order to calculate DPmax (= n − 1), the maximal number of repeating units, as the integer part of

$$ DP_{\max} = \frac{M(\mathrm{PAA}) - (m_{\alpha} + m_{\omega}) - m_{0}}{m_{0} + m_{\mathrm{T}}} \qquad (5) $$
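As a worked check of eqs 4 and 5, the short Python sketch below evaluates both expressions with the nominal masses given above. The function and constant names are illustrative assumptions; the 1395.8 Da value is the mass of the 110110 PAA precursor of Figure 2c.

```python
import math

M_END_GROUPS = 138.0  # m_alpha + m_omega (Da)
M0 = 71.0             # coding unit 0 (Da)
M1 = 85.0             # coding unit 1 (Da)
M_T = 155.0           # TEMPO (Da)


def paa_mass(bits):
    """Nominal PAA mass from eq 4 for a binary string such as '110110'."""
    a, b = bits.count("0"), bits.count("1")
    return M_END_GROUPS + a * M0 + b * M1 + (a + b - 1) * M_T


def dp_max(m_paa):
    """Integer part of eq 5: maximal number of repeating units."""
    return math.floor((m_paa - M_END_GROUPS - M0) / (M0 + M_T))


print(paa_mass("110110"))   # nominal mass from eq 4
print(dp_max(1395.8))       # DPmax for the measured precursor mass
print(dp_max(1395.8) + 1)   # number of coding units n = DPmax + 1
```

Since each 1 unit is only 14 Da heavier than a 0 unit, replacing zeros by ones shifts the quotient in eq 5 by a small fraction of a repeating unit, which is why taking the integer part still returns the right count for oligomers of this size.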

Applying eq 5 to the 1395.8 Da PAA detected at m/z 698.9 as a doubly charged precursor (Figure 2c) leads to DPmax = 5, indicating n = 6. Then, the ci•z+ series is first considered in order to reconstruct the 0/1 sequence from α to ω. Although not part of an ionic species, the mass of the first coding segment can be deduced from the mass of the radical released upon formation of the largest yi•z+: this mass is either 130.1 or 144.1 Da in the case of α-0 or α-1, respectively. In Figure 2c, subtracting half of these values from the m/z 698.9 of the doubly charged precursor respectively yields m/z 633.9 and m/z 626.9: while no signal is observed at the former value, a peak is detected at the latter one. This peak can hence be assigned to y5•2+, and the first coding unit linked to α is identified as 1. From here, the sequence can now be reconstructed from the left- to the right-hand side using ci•z+ fragments by performing successive mass increments of 226.2 Da for 0-T or 240.2 Da for 1-T repeating units. Taking into account the mass of the proton, c2•+ is hence to be found either at m/z 371.3 (= 144.1 + 226.2 + 1) or m/z 385.3 (= 144.1 + 240.2 + 1): data from Figure 2c are clearly consistent with a sequence starting as α-1-T-1. Using the same approach, c3•+ was found at m/z 611.4 (= c2•+ + mT‑0), c4•+ at m/z 851.6 (= c3•+ + mT‑1), and c5•2+ at m/z 546.4 (= (c4•+ + mT‑1 + 1)/2). Of note, the example of Figure 2c was chosen to illustrate the need for high-resolution MS/MS data to ensure sequencing accuracy: here, c5•2+ at m/z 546.4 (in green) and y2•+ at m/z 546.3 (in blue) could indeed be distinguished and safely assigned based on observed isotope patterns validating their respective charge state. The α-1-T-1-T-0-T-1-T-1 partial sequence obtained with ci•z+ has to be completed with a sixth coding unit.
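The arithmetic used above for the ci•z+ series can be reproduced with a few lines of Python. This is a sketch under simplifying assumptions rather than the published code: the names are hypothetical, the proton mass is taken as 1 as in the text's own arithmetic, and the computed values may differ from the measured ones by about 0.1 because the quoted increments are rounded.

```python
FIRST = {"0": 130.1, "1": 144.1}  # alpha end group + first coding unit (Da)
STEP = {"0": 226.2, "1": 240.2}   # one further coding unit with its TEMPO (Da)
PROTON = 1.0                      # rounded proton mass, as used in the text


def c_fragment_mz(bits, k, z=1):
    """Expected m/z of the c fragment holding the first k coding units
    of the binary string bits and carrying z protons."""
    mass = FIRST[bits[0]] + sum(STEP[b] for b in bits[1:k])
    return (mass + z * PROTON) / z


message = "110110"
for k, z in ((2, 1), (3, 1), (4, 1), (5, 2)):
    print(f"c{k} (z={z}): {c_fragment_mz(message, k, z):.1f}")

# The two candidates for c2 when the first unit is 1:
print(f"{c_fragment_mz('10', 2):.1f} or {c_fragment_mz('11', 2):.1f}")
```

Comparing the two candidate values for each next fragment with the peak list, and switching to z = 2 when neither singly charged candidate is found, mirrors the reading procedure described for the 110110 oligomer.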
This last unit was found to be 0, as deduced from the mass of the radical released from the precursor ion to generate c5•2+, which is expected to be either 305.1 Da (mT + m0 + mω) or 319.1 Da (mT + m1 + mω). Using the same methodology, this sequence can also be reconstructed from ω to α using yi•z+. As previously mentioned, y1•+ is not formed, so the mass of the first coding segment has to be deduced from the mass of the radical (•T-0-ω: 305.1 Da, or •T-1-ω: 319.1 Da) released upon formation of the largest ci•z+. Subtracting half of these values from the doubly charged precursor m/z 698.9 in Figure 2c respectively yields m/z 546.4 and m/z 539.4: because no signal was detected at m/z 539.4, the peak at m/z 546.4 can unambiguously be assigned to c5•2+ and the last coding unit linked to ω identified as 0. From here, the sequence can now be reconstructed from the right- to the left-hand side using yi•z+ fragments by performing successive mass increments of 226.2 Da for T-0 or 240.2 Da for T-1


Table 1. Computing Times for Accurate Sequence Decoding of MS/MS Data from [PU−H]−, [PAP−zH]z− (z = DP), and Singly or Doubly* Protonated PAA (a)

no.   code                time (ms)

PU (type I)
1     00                  2.07
2     000                 1.51
3     0000                7.16
4     0001                0.93
5     0010                3.68
6     00000               2.21
7     00010110            5.44
8     11100000            9.53

PU (type II)
1     00                  0.82
2     01                  1.88
3     000                 1.85
4     001                 1.63
5     100                 0.92
6     0000                1.05
7     0001                3.01
8     0010                5.35
9     0100                1.72
10    1000                3.14
11    0011                9.64
12    1001                4.43
13    00000               2.75
14    00001               1.70
15    00011               4.71
16    10001               5.87
17    01101               3.19
18    01110               9.65
19    000001              4.09
20    000101              3.34
21    000111              10.98
22    101010              7.97
23    110111              7.49
24    1000010             17.10
25    0011110             2.85
26    00110000            6.29
27    01000010            6.44
28    00111100            3.28
29    00010101            4.30
30    00111110            11.35
31    1101001011011000    2.78

PAP (type I)
1     00                  10.98
2     000                 20.55
3     0000                3.74
4     1001                2.50
5     11000101            25.76

PAP (type II)
1     00                  5.22
2     11                  7.34
3     0000                42.06
4     0011                22.83
5     1100                5.16
6     01001110            13.68

PAA
1     000                 4.35
2     001                 11.20
3     010                 8.02
4     100                 14.63
5     011                 14.38
6     101                 7.93
7     110                 12.53
8     111                 4.08
9     0000                7.93
10    1110                5.14
11    1111                3.22
12    10101               5.15
14    11010               7.27
15    11101*              3.09
16    11111               7.09
17    11111*              8.00
18    011100*             11.74
19    100100              2.51
20    100100*             8.85
21    110101*             5.12
22    110110*             6.00
23    111110              2.47
24    111110*             4.73
25    111111              2.46
26    111111*             8.58
27    0001010*            11.73
28    1010010*            13.03
29    1111111             7.38
30    1111111*            22.73
31    01010011*           11.00
32    11111111*           9.16
33    1000001010*         78.53
34    1111111111*         4.37

(a) Times provided are average times obtained from 10 000 runs of the complete data set (namely, the 84 MS/MS spectra) performed in randomized order of individual files.

from the presence of the radical released from the precursor m/z upon formation of the largest yi•z+. It should also be noted that in the case of PAA, accuracy of decoding results highly depends on experimental quality in terms of signal-to-noise ratio (S/N). Indeed, unlike PU and PAP, PAA can generate fragments over a wide intensity range depending on the coded sequence of the dissociating precursor. This is because the coded segment is directly connected to the C−ON linkage and hence highly influences the homolysis rate of this bond. Indeed, cleavages occurring at a 0 unit are slow reactions because they generate secondary carbon-centered radicals that are less stable than the tertiary radicals obtained when breaking a C−ON bond linked to a 1 unit.16 As a result, fragments formed upon cleavages at 0 units are of low abundance (as illustrated with c3•+ and y3•+ in Figure 2c), and this effect is further amplified in the case of long PAAs containing a small number of 0 units. To prevent decoding failure, MS/MS data must exhibit signal above the A intensity threshold for all useful fragments. The number of MS/MS spectra to be accumulated to fulfill this requirement can however be easily determined from the abundance of protonated PAAs in the MS mode. With the instrument used in this study, it was found that accumulation of 10 MS/MS spectra (that is, a 10 s experiment) was sufficient for ions measured with an abundance higher than about 10³ arbitrary units (au) in MS. Below this intensity threshold, the number of MS/MS spectra to be summed to ensure decoding accuracy highly depends on PAA sequence, but typical accumulation of 100 spectra (that is, a 1 min 40 s experiment) was found to be sufficient in all cases of low abundance precursors studied here, including the 0-rich oligomer 1000001010 (PAA #33, Table 1). However, when C−ON homolysis at a 0 unit is so slow that expected fragments are not formed within the time scale of the MS/MS experiment, sequence coverage may be partial. This can be the case for 1-rich PAA oligomers, as illustrated with the example of 11110111 (Figure S4). In this oligomer, fragments expected to be generated upon cleavage at the sole 0 unit, namely c5•+ and y3•+, were not detected in the MS/MS spectrum. As a result, reconstruction of the sequence using ci•+ ions yielded α-1111 while the piece of sequence that could be recovered when using yi•+ ions was 11-ω. Because DP = 8 was known from eq 5 for this fragmenting PAA, the output of MS-DECODER indicated the location of the two undetermined units as 1111??11.

In contrast to PAA, the rate of fragmentation reactions in PU or PAP is not sequence-dependent. As a result, accurate decoding of both PU and PAP was always achieved from MS/MS data generated from 10 summed spectra, even when a peak intensity as low as 50 au per scan (that is, near threshold) was measured for precursors in MS (Figure S2).

Computational Performances and Improvement Perspectives. The MS-DECODER algorithm run on a basic laptop computer (Windows 10, Intel Core i3, 4 GB RAM, SATA hard drive) allowed instantaneous sequence decoding of all spectra included in the present study. Calculation times (average times of 10 000 runs of the complete data set in randomized order) reported in Table 1 were measured using the time method embedded in Java (System.currentTimeMillis) and are in the millisecond range. With the file sizes (average 100 kB) interpreted in this study, the calculation times are already compatible with an instantaneous decoding. The sequencing algorithm has been written to be computationally efficient, requiring a single read of the file.
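The averaging protocol described above (repeated runs over the complete data set, with the order of individual files randomized) can be sketched as follows. This is an illustrative Python analogue, not the authors' Java benchmark: `average_times`, the `decode` callable, and the toy data are hypothetical, and Python's `time.perf_counter` stands in for the Java call named in the text.

```python
import random
import time


def average_times(files, decode, runs=100, seed=0):
    """Mean wall-clock time per file, in milliseconds.

    files: dict mapping a file name to its already-loaded peak list.
    decode: callable applied to one peak list (placeholder decoder).
    runs: number of passes over the complete data set.
    """
    rng = random.Random(seed)
    totals = {name: 0.0 for name in files}
    order = list(files)
    for _ in range(runs):
        rng.shuffle(order)  # randomized order of individual files
        for name in order:
            t0 = time.perf_counter()
            decode(files[name])
            totals[name] += (time.perf_counter() - t0) * 1000.0
    return {name: total / runs for name, total in totals.items()}


# Toy data set and a stand-in decoder, for demonstration only.
toy_files = {"pu_0011": [131.1, 246.2, 361.3, 490.4, 619.5],
             "pu_01": [131.1, 232.2, 361.3]}
for name, ms in average_times(toy_files, decode=sorted, runs=50).items():
    print(f"{name}: {ms:.4f} ms")
```

Shuffling the file order on each pass is one way to avoid a systematic bias from caching or warm-up effects on whichever file happens to be processed first.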
It can therefore be anticipated that computing times will increase linearly with the file size but will hardly become limiting with regard to the MS/MS spectrum acquisition time. In fact, the main issues that could be anticipated when sequencing digital polymers of increasing length are related to fragmentation efficiency in MS/MS. On the one hand, the whole ion current will be “diluted” over an increasing number of fragments, hence requiring long acquisition times to accumulate a larger number of spectra. On the other hand, there might be an increasing number of cases where only partial sequence coverage can be achieved for PAAs due to their sequence-dependent fragmentation, as previously discussed. However, improvements to further reduce computational times could be implemented in future versions of MS-DECODER, especially when running the graphical user interface (Figure S3). For example, an “on-the-fly” mode could be added in order to store the peaklists in memory at selection rather than loading them in memory at run start. Indeed, in the current version, this step is the most time-consuming, as it requires hard drive access and memory allocation. Thousands of peaklist files could be loaded in memory on a basic 4 GB RAM laptop without saturation. The current version of MS-DECODER is single-threaded, and the alternative implementation of a multithreading mode could be advantageous. Finally, optimizations of the source code could be considered to reduce the size of required memory, including (i) checking numeric values: all floating point values are currently stored as Java Double variables (each value takes 8 bytes of memory), and it is likely that some of these values do not need such precision and could be turned into Java Float (4 bytes); (ii) preferring native types: whenever possible, it could be beneficial to replace class types by primitive types. The class types wrap the primitive types and offer more possibilities (such as allowing null values or automatic comparisons), but these possibilities are not necessarily useful for all MS-DECODER data; and (iii) rewriting the code in a lower level programming language such as C or C++ that does not depend on a virtual machine like Java and in which memory is directly managed in the source code. Altogether, such optimizations should be considered if computing times ultimately need to be further reduced.

In summary, we introduced in this technical note a new open-source tool, MS-DECODER, that permits fast and unequivocal decryption of sequence-coded polymers. This software was tested on three different families of digital polymers, two of them including two subfamilies. Overall, 84 sequence-coded polymers were tested in the present work, and in all cases, MS-DECODER allowed rapid and error-free decoding. In most cases, the sequences could be deciphered within milliseconds. This new tool for automated MS/MS sequencing is therefore relevant for the broad polymer community and, in particular, very useful for the analysis of non-natural information-containing polymers. It should be clearly pointed out that MS-DECODER is not restricted to the analysis of binary sequences, which have been studied in this note. The software could be extended to decode other families of sequence-defined polymers and other monomer-based alphabets.1



ASSOCIATED CONTENT

*S Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.macromol.7b01737. MS-Decoder source code is freely available at https://github.com/LSMBO/MS-Decoder. Experimental details; Figures S1−S4 (PDF)



AUTHOR INFORMATION

Corresponding Authors

*E-mail [email protected] (C.C.).
*E-mail jfl[email protected] (J.-F.L.).
*E-mail [email protected] (L.C.).

ORCID

Christine Carapito: 0000-0002-0079-319X
Jean-François Lutz: 0000-0002-3893-2458

Funding

J.-F.L. and L.C. thank the French National Research Agency (ANR project 00111001, grants ANR-16-CE29-0004-01 and ANR-16-CE29-0004-02) for financial support.

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

The authors thank Abdelaziz Al Ouahabi, Gianni Cavallo, Ufuk Gunay, Denise Karamessini, Chloé Laure, Benoît Petit, and Raj Kumar Roy for the synthesis of the sequence-coded macromolecules as well as Jean-Arthur Amalian and Salomé Poyer for MS/MS measurements.



REFERENCES

(1) Lutz, J.-F. Coding Macromolecules: Inputting Information in Polymers Using Monomer-Based Alphabets. Macromolecules 2015, 48, 4759−4767.
(2) Zhirnov, V.; Zadegan, R. M.; Sandhu, G. S.; Church, G. M.; Hughes, W. L. Nucleic acid memory. Nat. Mater. 2016, 15, 366−370.
(3) Grass, R. N.; Heckel, R.; Puddu, M.; Paunescu, D.; Stark, W. J. Robust Chemical Preservation of Digital Information on DNA in Silica with Error-Correcting Codes. Angew. Chem., Int. Ed. 2015, 54, 2552−2555.
(4) Karamessini, D.; Petit, B. E.; Bouquey, M.; Charles, L.; Lutz, J.-F. Identification-Tagging of Methacrylate-Based Intraocular Implants Using Sequence Defined Polyurethane Barcodes. Adv. Funct. Mater. 2017, 27, 1604595.
(5) Colquhoun, H. M.; Lutz, J.-F. Information-containing macromolecules. Nat. Chem. 2014, 6, 455−456.
(6) Lutz, J.-F.; Ouchi, M.; Liu, D. R.; Sawamoto, M. Sequence-Controlled Polymers. Science 2013, 341, 1238149.
(7) Trinh, T. T.; Oswald, L.; Chan-Seng, D.; Lutz, J.-F. Synthesis of Molecularly Encoded Oligomers Using a Chemoselective “AB + CD” Iterative Approach. Macromol. Rapid Commun. 2014, 35, 141−145.
(8) Al Ouahabi, A.; Charles, L.; Lutz, J.-F. Synthesis of Non-Natural Sequence-Encoded Polymers Using Phosphoramidite Chemistry. J. Am. Chem. Soc. 2015, 137, 5629−5635.
(9) Roy, R. K.; Meszynska, A.; Laure, C.; Charles, L.; Verchin, C.; Lutz, J.-F. Design and synthesis of digitally encoded polymers that can be decoded and erased. Nat. Commun. 2015, 6, 7237.
(10) Cavallo, G.; Al Ouahabi, A.; Oswald, L.; Charles, L.; Lutz, J.-F. Orthogonal Synthesis of “Easy-to-Read” Information-Containing Polymers Using Phosphoramidite and Radical Coupling Steps. J. Am. Chem. Soc. 2016, 138, 9417−9420.
(11) Gunay, U. S.; Petit, B. E.; Karamessini, D.; Al Ouahabi, A.; Amalian, J.-A.; Chendo, C.; Bouquey, M.; Gigmes, D.; Charles, L.; Lutz, J.-F. Chemoselective Synthesis of Uniform Sequence-Coded Polyurethanes and Their Use as Molecular Tags. Chem 2016, 1, 114−126.
(12) Mutlu, H.; Lutz, J.-F. Reading Polymers: Sequencing of Natural and Synthetic Macromolecules. Angew. Chem., Int. Ed. 2014, 53, 13010−13019.
(13) Steen, H.; Mann, M. The abc’s (and xyz’s) of peptide sequencing. Nat. Rev. Mol. Cell Biol. 2004, 5, 699−711.
(14) Shendure, J.; Ji, H. Next-generation DNA sequencing. Nat. Biotechnol. 2008, 26, 1135−1145.
(15) Wesdemiotis, C. Multidimensional Mass Spectrometry of Synthetic Polymers and Advanced Materials. Angew. Chem., Int. Ed. 2017, 56, 1452−1464.
(16) Charles, L.; Laure, C.; Lutz, J.-F.; Roy, R. K. MS/MS Sequencing of Digitally Encoded Poly(alkoxyamine amide)s. Macromolecules 2015, 48, 4319−4328.
(17) Amalian, J.-A.; Trinh, T. T.; Lutz, J.-F.; Charles, L. MS/MS Digital Readout: Analysis of Binary Information Encoded in the Monomer Sequences of Poly(triazole amide)s. Anal. Chem. 2016, 88, 3715−3722.
(18) Charles, L.; Laure, C.; Lutz, J.-F.; Roy, R. K. Tandem mass spectrometry sequencing in the negative ion mode to read binary information encoded in sequence-defined poly(alkoxyamine amide)s. Rapid Commun. Mass Spectrom. 2016, 30, 22−28.
(19) Thakkar, A.; Cohen, A. S.; Connolly, M. D.; Zuckermann, R. N.; Pei, D. High-Throughput Sequencing of Peptoids and Peptide−Peptoid Hybrids by Partial Edman Degradation and Mass Spectrometry. J. Comb. Chem. 2009, 11, 294−302.
(20) Porel, M.; Alabi, C. A. Sequence-Defined Polymers via Orthogonal Allyl Acrylamide Building Blocks. J. Am. Chem. Soc. 2014, 136, 13162−13165.
(21) Zydziak, N.; Konrad, W.; Feist, F.; Afonin, S.; Weidner, S.; Barner-Kowollik, C. Coding and decoding libraries of sequence-defined functional copolymers synthesized via photoligation. Nat. Commun. 2016, 7, 13672.
(22) Charles, L.; Cavallo, G.; Monnier, V.; Oswald, L.; Szweda, R.; Lutz, J.-F. MS/MS-Assisted Design of Sequence-Controlled Synthetic Polymers for Improved Reading of Encoded Information. J. Am. Soc. Mass Spectrom. 2017, 28, 1149−1159.
(23) Jeong, K.; Kim, S.; Pevzner, P. A. UniNovo: a universal tool for de novo peptide sequencing. Bioinformatics 2013, 29, 1953−1962.
(24) Ma, B.; Zhang, K.; Hendrie, C.; Liang, C.; Li, M.; Doherty-Kirby, A.; Lajoie, G. PEAKS: powerful software for peptide de novo sequencing by tandem mass spectrometry. Rapid Commun. Mass Spectrom. 2003, 17, 2337−2342.
(25) Amalian, J.-A.; Poyer, S.; Petit, B. E.; Telitel, S.; Monnier, V.; Karamessini, D.; Gigmes, D.; Lutz, J.-F.; Charles, L. Negative mode MS/MS to read digital information encoded in sequence-defined oligo(urethane)s: A mechanistic study. Int. J. Mass Spectrom. 2017, DOI: 10.1016/j.ijms.2017.07.006.
(26) Wesdemiotis, C.; Solak, N.; Polce, M. J.; Dabney, D. E.; Chaicharoen, K.; Katzenmeyer, B. C. Fragmentation pathways of polymer ions. Mass Spectrom. Rev. 2011, 30, 523−559.
(27) Mazarin, M.; Girod, M.; Viel, S.; Phan, T. N. T.; Marque, S. R. A.; Humbel, S.; Charles, L. Role of the Adducted Cation in the Release of Nitroxide End Group of Controlled Polymer in Mass Spectrometry. Macromolecules 2009, 42, 1849−1859.
