Charting an Unexplored Streptococcal Biosynthetic Landscape


Article

Charting an Unexplored Streptococcal Biosynthetic Landscape Reveals a Unique Peptide Cyclization Motif Leah B. Bushin, Kenzie A. Clark, Istvan Pelczer, and Mohammad R. Seyedsayamdost J. Am. Chem. Soc., Just Accepted Manuscript • DOI: 10.1021/jacs.8b10266 • Publication Date (Web): 06 Nov 2018 Downloaded from http://pubs.acs.org on November 6, 2018


Journal of the American Chemical Society

Charting an Unexplored Streptococcal Biosynthetic Landscape Reveals a Unique Peptide Cyclization Motif
Leah B. Bushin,† Kenzie A. Clark,† István Pelczer,† and Mohammad R. Seyedsayamdost†,‡,*
†Department of Chemistry, Princeton University, Princeton, NJ 08544, United States
‡Department of Molecular Biology, Princeton University, Princeton, NJ 08544, United States
KEYWORDS: Radical SAM Enzymes, Natural Products, Streptococci, RiPPs

ABSTRACT: Peptide natural products are often used as signals or antibiotics and contain unusual structural modifications, thus providing opportunities for expanding our understanding of Nature's therapeutic and biosynthetic repertoires. Herein, we have investigated the under-explored biosynthetic potential of Streptococci, prevalent bacteria in mammalian microbiomes that include mutualistic, commensal, and pathogenic members. Using a new bioinformatic search strategy, in which we linked the versatile radical S-adenosylmethionine (RaS) enzyme superfamily to an emerging class of natural products in the context of quorum sensing control, we identified numerous, uncharted biosynthetic loci. Focusing on one such locus, we identified an unprecedented post-translational modification, consisting of a tetrahydro[5,6]benzindole cyclization motif in which four unactivated positions are linked by two C-C bonds in a regio- and stereo-specific manner by a single RaS enzyme. Our results expand the scope of reactions that microbes have at their disposal in concocting complex ribosomal peptides.

INTRODUCTION

Microbial natural products are often endowed with intricate structures and potent biological activities. Aside from contributing an important source of therapeutic leads, the new structural features or chemotypes observed in natural products provide a context in which to examine Nature's biosynthetic capacities.1,2 Conventional search strategies for new chemotypes have typically focused on "gifted" genera, that is, bacteria that are known to be prolific producers of natural products.3 Because the associated biosynthetic gene clusters (BGCs) are usually large, gifted producers, such as the actinomycetes or myxobacteria, ordinarily harbor large genomes.3-5 By pre-selecting these organisms, traditional discovery programs have neglected bacteria with small chromosomes. After a century of mining talented bacteria for novel chemotypes, frequent rediscovery of known scaffolds has become a major bottleneck.6-8 Combined with the explosion of microbial genome sequences, this constraint has, in turn, prompted a switch from a molecule-first to a genome-first approach, in which bacterial genomes are interrogated bioinformatically to identify those with the highest likelihood of delivering new metabolites.9-13 The bioinformatics-guided selection of BGCs thus serves as the major criterion for finding new secondary metabolites and their associated chemotypes.

A common practice in the genome-first approach is to target known biosynthetic enzymes, which often results in the isolation of molecules with familiar structural features. While useful, these studies do not expand the scope of chemical modifications, but rather provide further examples of known transformations. A second common shortcoming is that routine bioinformatic searches do not take genomic context into account, such as regulation or function of the resulting natural product. Thus, the functions of the metabolites discovered and the conditions under which they may be produced, if at all, must be deduced from further experiments.

One group of natural products that has seen effective applications of bioinformatics-guided search strategies is the ribosomally synthesized and post-translationally modified peptides (RiPPs), an emerging family of secondary metabolites with diverse structures and specific bioactivities.14-20 RiPP biogenesis begins with the expression of a genetically encoded precursor peptide, which is then modified by a series of tailoring enzymes, culminating in proteolytic trimming and secretion of the mature product. We recently reported one such example by elucidating the structure and biosynthesis of streptide, a quorum sensing-regulated RiPP natural product synthesized by Streptococcus thermophilus (Fig. 1).21 Streptide carries a unique cyclization motif in the form of an intramolecular Lys-Trp crosslink, which is installed by a new subfamily of radical S-adenosylmethionine (RaS) enzymes. United by the formation of a 5′-deoxyadenosyl radical (5′-dA•) via homolytic cleavage of the cofactor S-adenosylmethionine (SAM), members of this superfamily catalyze some of the most diverse and unusual transformations in Nature,22-24 thus making them ideal targets for bioinformatics-guided inquiries.

Herein, we devise and apply a genetic context-dependent bioinformatic search in Streptococcus spp., an abundant bacterial genus in human and mammalian microbiomes with characteristically small chromosomes. Streptococci make up a large class of commensal and opportunistic pathogens,25 but despite their importance in human health, their secondary metabolomes have been vastly understudied. We searched the genomes of all Streptococci for the presence of RaS enzyme-modified RiPP gene clusters that are under the control of a well-characterized quorum sensing (QS) system, reasoning that these RiPPs harbor three key attributes: (i) they likely contain novel chemical modifications owing to the versatile chemistry of RaS enzymes; (ii) they are important physiologically as they are produced at high cell densities, a condition often associated with virulence; and (iii) they may play roles in host-bacterial or bacterial-bacterial interactions in mammalian microbiomes. We find that such genetic loci dominate streptococcal genomes and describe ~600 unique QS-regulated, RaS-RiPP gene clusters. Focusing on one subclass, we report an unprecedented post-translational modification (PTM), in which four unactivated positions in the side-chains of Trp and Lys are linked by two C-C bonds to form a substituted tetrahydro[5,6]benzindole moiety, a reaction that is carried out by a single, new RaS enzyme. Our studies open the door for exploring the vast network of RaS-RiPPs in streptococcal genomes with possibly novel PTMs and yet-to-be-explored biological functions.

[Figure 1 image not reproduced. Labels: shp, rgg, strA, strB, strC; 9mer; StrA sequence MSKELEKVLESSSMAKGDGWKVMAKGDGWE; StrB; StrC; Streptide.]
Figure 1. Streptide and the str BGC in S. thermophilus. The str cluster encodes genes for a precursor peptide (strA), a RaS enzyme (strB), and a peptidase/transporter (strC). StrA is a 30mer precursor peptide; the sequence of the final product within StrA is shown in bold and the residues to be crosslinked by StrB are shown in red.

RESULTS AND DISCUSSION

Bioinformatic search for streptococcal RaS-RiPPs. The organization of the streptide biosynthetic locus is shown (str, Fig. 1). The str genes are preceded by a Streptococcus-specific shp/rgg QS operon, in which SHP provides the short hydrophobic peptide signal and Rgg the cognate transcriptional regulator.26-28 SHP is synthesized and released into the environment, where it accumulates as a function of cell density. At a threshold concentration, it is re-imported into the cell, binds to the Rgg response regulator, and the SHP-Rgg complex then modulates gene expression at select operons. The str cluster is regulated in this fashion, and the occurrence of multiple str-homologous loci preceded by an shp/rgg system suggested to us that QS-regulated and RaS-modified RiPPs, which we refer to as RaS-RiPPs, may be common in Streptococci.21,29 To examine the full scope of such instances, we conducted a bioinformatic inquiry in which we searched for loci akin to those of str in all streptococcal genomes. We started by selecting the 2,875 available streptococcal genomes and searching for all RaS enzyme-encoding genes, a total of 14,453, as well as all Rgg-encoding genes, a total of 8,478. We then filtered for those instances in which the RaS enzyme and Rgg co-occur within a 1–3 gene distance, as is the case for the well-studied str-like RaS-RiPP gene clusters. This dataset (667 total) was then examined for overlaps (i.e., gene clusters that carry two RaS enzymes associated with one Rgg), the presence of an Rgg-binding sequence upstream of the RiPP gene cluster, and a ribosomal binding site, delivering 592 manually verified RaS-RiPP gene clusters controlled by an shp/rgg QS locus (Table S1). These were then grouped by the precursor peptide sequence using the EFI-EST web tool developed by Gerlt and colleagues30,31 and subsequently visualized in Cytoscape.

The results of the QS-regulated streptococcal RaS-RiPP network are shown (Fig. 2a). We detected 16 distinct types of occurrences, ranging from subclasses with a single representative to those that are widespread among different streptococcal species. The color-coding in Fig. 2a shows that some subfamilies are unique to a single species. As natural products are synthesized to facilitate a microbe's interaction with its environment,32-35 these subfamilies may represent a 'dialect' that is limited to one species and therefore enables intraspecies communication. On the other hand, most subfamilies of RaS-RiPPs are produced by multiple Streptococci, suggesting they enable an interspecies dialogue. The network also shows that some Streptococci, such as S. thermophilus, harbor as many as five QS-controlled RaS-RiPPs. Thus, even one streptococcal species provides fertile ground for discovering novel RiPPs as well as the biology and chemistry associated with each. These findings demonstrate a previously unrecognized and vast biosynthetic landscape in Streptococci, which are not known as talented secondary metabolite producers.

Diverse RaS-RiPP gene clusters in Streptococci. A representative gene cluster is shown for each of the subfamilies detected (Fig. 2b). Gratifyingly, our search captured all three str-like BGCs (str, aga, and sui), which we have previously examined.29,36 These all grouped together in one of the larger subfamilies. It is the only one with a –KGDGW– motif in the precursor peptide, suggesting that all other subfamilies express RiPPs in which the modification differs from that found in streptide and is possibly new. A logo plot comparing the precursor peptides in all members within each subfamily highlights conserved sequences, which may constitute the sites of post-translational modifications (Fig. 2c). We have used the conserved precursor peptide motifs to preliminarily annotate each subfamily. The largest is the TQQ cluster, exclusively present in Streptococcus suis, a swine pathogen endemic in parts of the globe and capable of transferring to humans.37 This abundance is a result of the hundreds of S. suis genome sequences that have recently been deposited.38 A second cluster unique to S. suis is KIS, with 8 representatives. Aside from the TQQ, KIS, and streptide BGCs, the WGK, GGG, CGx, RRR, QMP, NxxC, NEF, and CGG clusters represent other instances where only one RaS enzyme is observed. These single-RaS enzyme-encoding clusters vary in synteny and in the presence of a discrete RiPP Recognition Element (RRE), a domain demonstrated by the Mitchell group to facilitate binding of precursor peptides to diverse RiPP biosynthetic enzymes,35 which is found in the WGK and KIS loci.
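The motif-based subfamily names can be checked against the two precursor sequences printed in this article (StrA in Fig. 1 and WgkA in Fig. 3a). The following is a minimal sketch of such a scan, offered as an illustration only and not as the annotation workflow used in the study; the helper name `find_motif` is hypothetical.

```python
# Precursor peptide sequences as printed in Fig. 1 (StrA) and Fig. 3a (WgkA).
PRECURSORS = {
    "StrA": "MSKELEKVLESSSMAKGDGWKVMAKGDGWE",
    "WgkA": "MSPKKEFNAPKTTKVNSWGKH",
}


def find_motif(sequence, motif):
    """Return the 1-based start position of every occurrence of `motif`."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to residue number
        start = sequence.find(motif, start + 1)
    return positions


if __name__ == "__main__":
    for name, seq in PRECURSORS.items():
        for motif in ("KGDGW", "WGK"):
            print(name, motif, find_motif(seq, motif))
```

In this sketch the –KGDGW– motif is found only in StrA (twice, as expected for a precursor whose core repeats the leader motif), while –WGK– is found only in WgkA, starting at Trp18.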


[Figure 2 image not reproduced. Panels: a, b, c. Subfamily labels: TQQ, WGK, str, GGG, KGR, HGH, CGx, SSH, KIS, RRR, GRC, QMP, NxxC, NEF, VSA, CGG. Gene legend: shp/rgg, Precursor, RaS Enzyme, RRE, Transporter, Hypothetical, FeS protein, Agmatinase, ThiF-like.]
Figure 2. RaS-RiPP Network in Streptococci. (a) A sequence similarity network of RaS-RiPP gene clusters based on the sequence of the precursor peptide. Color-coding indicates the distribution of each subfamily among streptococcal strains. The subfamilies are named based on conserved motifs within each precursor peptide. (b) Representative biosynthetic gene cluster for each of the subfamilies in panel a. The genes are color-coded as indicated. The agmatinase and ThiF-like genes encode a putative di-Mn metalloenzyme and a ThiF-like adenylyltransferase, respectively. (c) Precursor peptide logo plots for RaS-RiPP subfamilies. The conserved motifs in the possible core C-terminal regions are underlined.


Multiple tailoring enzymes appear in the KGR, HGH, SSH, GRC, and VSA clusters. The HGH, SSH, and GRC clusters each encode two different RaS enzymes. The HGH cluster also encodes a domain of unknown function as well as two protease/transporters. The GRC gene cluster is unique to the pathogen Streptococcus pneumoniae and encodes two RaS enzymes along with a ThiF-like protein. Lastly, the KGR cluster encodes a RaS enzyme, an Fe-S-carrying metalloenzyme, as well as an agmatinase, a putative di-Mn metalloenzyme. Thus, Nature has extended the single-RaS enzyme chassis found in str to incorporate multiple metalloenzymes, suggesting that the resulting RiPPs may contain structurally complex PTMs.

All subfamilies are controlled by an shp/rgg QS locus, with CGx, CGG, and VSA forming the only exceptions: in these cases, the rgg is not accompanied by a divergently transcribed shp gene. Instead, in CGx and CGG, the precursor peptide is divergently transcribed from the rgg in a synteny that implicates the product of the RiPP cluster as an SHP-like autoinducer, a prediction that will require experimental assessment. We also organized the network by primary structure similarity in the RaS enzyme, rather than the precursor peptide (Fig. S1). This gave a largely similar network, except that the small subfamilies of QMP, CGx, NxxC, NEF, and CGG clustered together. We propose that RaS enzymes from these subfamilies may install similar modifications. The conservation of one or more Cys residues in all the associated precursor peptides may suggest an important role for Cys in the ensuing modification.

Characterization of the WGK gene cluster. We next set out to test the hypothesis that the enzymes in the various subfamilies catalyze new modifications, focusing on the WGK subclass. This BGC is found in a number of commensal and pathogenic Streptococci, notably S. equi and S. mutans, the causative agents of respiratory infections and dental caries, respectively.
Here we focused on a strain that is more easily obtained, S. ferus, an oral commensal found in mammals, including rats and pigs. The WGK cluster encodes a precursor peptide (WgkA), a RaS enzyme (WgkB), a discrete RRE domain (WgkC), a transporter (WgkD), and a helicase-like protein (WgkE) (Fig. 3a). We prepared WgkA by solid-phase peptide synthesis (SPPS) and WgkC as a translational NusA-His6 construct by recombinant expression and purification from E. coli in good yields (Tables S2-S4). Expression of His6-tagged or untagged WgkC resulted in prohibitively low purification yields. WgkB was expressed and purified anaerobically and subsequently reconstituted chemically. The ultraviolet-visible absorption spectrum of reconstituted WgkB revealed a transition at 320 nm and a broad feature at 410 nm, which are consistent with the presence of one or more [4Fe-4S]2+ clusters (Fig. 3b). Iron and sulfide quantification from triplicate independent measurements gave 12.5 ± 0.4 Fe and 11.6 ± 0.2 labile S2- per protomer, which, along with analysis of the SPASM domain of WgkB (a C-terminal extension that allows binding of additional Fe-S clusters40-43), suggests that the enzyme binds three [4Fe-4S] clusters (Fig. S2). An EPR spectrum of reduced, reconstituted WgkB exhibited an axial signal with g∥ and g⊥ of 2.06 and 1.93, respectively, characteristic of the active site Fe-S cluster of RaS enzymes (Fig. 3c).22-24 In the absence of substrate, we could clearly detect time-dependent formation of 5′-deoxyadenosine (5′-dA), reflecting a futile cleavage of SAM that is commonly observed in members of this enzyme family (Fig. S3). We conclude that WgkB binds one active site [4Fe-4S]+ cluster that reductively cleaves SAM and two [4Fe-4S]2+ clusters that are bound by the SPASM domain, a proposition that may be tested by further spectroscopic methods.

[Figure 3 image not reproduced. Panels: a, b, c, d. Panel a gene labels: shp, rgg, wgkA, B, C, D, E; WgkA sequence MSPKKEFNAPKTTKVNSWGKH.]
Figure 3. Characterization of WgkB from S. ferus. (a) The WGK biosynthetic gene cluster from S. ferus. wgkA, B, C, D, and E encode a precursor peptide, a RaS enzyme, an RRE, a transporter, and a protein that shows weak homology to a helicase, respectively. The sequence of WgkA is shown, with the conserved –WGK– motif rendered in bold. (b) UV-vis spectrum of reconstituted WgkB. The transitions at 325 and 410 nm are indicative of [4Fe-4S]2+ clusters. (c) X-band EPR spectrum of reconstituted, reduced WgkB. An axial signal with g∥ and g⊥ of 2.06 and 1.93, characteristic of a reduced active site [4Fe-4S]+ cluster that activates SAM, can be observed. (d) Analysis of the enzymatic activity of WgkB using HPLC-Qtof-MS. Reaction of WgkB with WgkA, WgkC, SAM, and reductant (red trace) results in a peak identified as the product (star). This peak is not observed when WgkB (blue trace), WgkC (green trace), or SAM (black trace) is omitted from the reaction. Peaks corresponding to SAM and WgkA are marked.

A new post-translational modification. With the RaS enzyme characterized, we assessed the reaction carried out by WgkB by incubating it with WgkA, WgkC, SAM, and reductant, and by analyzing the products via HPLC-Qtof-MS. When all components were present, we observed time-dependent formation of a new peak that we assign as the product of the reaction (Fig. 3d, Fig. S4). It was not formed when WgkA, WgkB, WgkC, or SAM was omitted. The putative product peak was 4 Da lighter than the substrate, as determined by high-resolution (HR)-MS (Table S4). Moreover, its UV-vis absorption spectrum exhibited a ~10 nm red-shift, possibly implicating the chromophore, the indole side-chain of a Trp residue, as the site of the PTM (Fig. S5).

To begin to elucidate the structure of the product, tandem HR-MS was carried out. In the substrate, we could observe all possible b and y ions, whereas in the product, collision-induced dissociation was not detected within the –WGK– motif at the C-terminus, again pointing to the Trp as the possible site of modification (Fig. 4a, Table S5). The structures of the residues within this motif were next tested by site-specifically incorporating isotopically labeled variants into WgkA using SPPS and assessing the fate of the isotopes via HPLC-Qtof-MS and NMR. Incorporation of 2H2-Gly within the –WGK– motif revealed formation of a product with both deuterons intact, suggesting that the Gly α-1Hs are not removed in the modification reaction. Incorporation of indole-2H5-Trp into the –WGK– sequence and reaction with WgkBC yielded a product that was now 6 Da lighter than the substrate, consistent with the loss of two deuterons from the Trp side chain (Table S4). We also analyzed reactions with uniformly 13C/15N-labeled Trp within the –WGK– sequence. NMR spectra of the substrate showed typical HSQC correlations for all five indole-1Hs. In the product, however, HSQC signals for the C5 and C6 positions were not observed, suggesting these had been modified (Figs. S6-7 & Table S6). Lastly, reaction of WgkBC with WgkA containing uniformly 13C/15N-labeled Lys within the –WGK– motif started to provide a potential structure of the product. Using 1D 13C-NMR analysis, we observed a sizable downfield shift of the Lys α-carbon (from 54 ppm to 67 ppm), which now was a quaternary center lacking an α-1H. The chemical shifts of the Lys β-, γ-, and ε-carbons did not change significantly relative to the values in the substrate, whereas that of the δ-carbon did, again a significant downfield shift from 26 ppm to 36 ppm, accompanied by a change from a methylene to a methine moiety (Figs. S8-9 & Table S7). Together, these data suggested to us that the Lys side-chain, at its α- and δ-positions, was modified along with the C5 and C6 carbons of the Trp residue.

Figure 4. Structural elucidation of the product of the WgkABC reaction. (a) HR-MS/MS analysis of the product of the reaction reveals fragmentation at all peptide bonds in the C-terminal core region, except within the –WGK– motif, suggesting a modification in this sequence. (b–e) 2D NMR spectra of the purified, trypsin-trimmed 7mer peptide product (H2N–VNS-WGK-H–CO2H) in which the -WGK- motif is modified. Note that the residue numbering for the full-length 21mer peptide is used. (b) TOCSY NMR spectrum of the purified product, focusing on the spin system corresponding to the K20 side-chain within the –WGK– sequence. (c) TOCSY NMR spectrum of the purified product highlighting correlations from W18-H7 within the –WGK– motif. Cross-peaks to several K20 side-chain-1Hs (labeled) are observed. 1Hs at the β-, γ-, δ-, and ε-positions are labeled. (d) ROESY NMR spectrum of the purified product highlighting correlations from W18-H7 within the –WGK– motif. The relevant correlations are labeled. (e) HMBC spectrum of the purified product focusing on correlations from W18-H7 within the –WGK– motif. Cross-peaks to both indole-13Cs and a K20 sidechain-13C can be observed (labeled). (f) Relevant NMR correlations (from panels b–e and Fig. S12) used to solve the structure of the product of the WgkABC reaction. The numbering scheme for the tetrahydro[5,6]benzindole modification is shown. Absolute configurations of the newly generated chiral centers remain to be determined.

Complete structural elucidation was subsequently carried out by extensive analysis of 1D and 2D NMR spectra of the isolated product. To simplify spectral analysis, we carried out the reaction on a large scale, isolated the product, and treated it with trypsin, which yielded a 7mer C-terminal peptide (H2N–VNS-WGK-H–CO2H) that was subsequently purified. This peptide contained the -4 Da modification and exhibited the same UV-vis spectrum as the uncleaved 21mer product. The corresponding, unmodified, linear 7mer peptide was prepared by SPPS to aid in comparative spectral analysis of the product 7mer (Fig. S10, Table S8). The relevant correlations used to solve the structure of the modified product are shown (Fig. 4b-f, Figs. S10-12, Table S9). Most notably, TOCSY correlations verified the presence of the CβH2-CγH2-CδH-CεH2-NH2 spin system in the modified Lys residue (Fig. 4b). At the mixing time used in the TOCSY experiment (80 ms), we observed a connection of this modified Lys spin system to the C7-1H of the adjacent Trp residue (Fig. 4f). TOCSY, ROESY, and HMBC correlations from the indole-H7 to various positions in the Lys side-chain were consistent with crosslinks from the C5 and C6 of Trp18 to the α- and δ-positions of Lys20, respectively (Fig. 4b-f, Figs. S10-11, Table S9). This proposed regiochemistry is consistent with previous work on synthetic 4,5-, 6,7-, and 5,6-tetrahydrobenzindoles,44 with the latter providing the best match to our NMR data.
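The residue numbering (W18, K20) and the 7mer described above can be reproduced with a simplified in silico trypsin digest of the WgkA sequence printed in Fig. 3a. This is a sketch under stated assumptions: it uses the textbook rule (cleavage C-terminal to Lys or Arg, but not before Pro), and it assumes that the crosslinked Lys20 is not cleaved, which is inferred here from the reported 7mer rather than taken from the article. The names `tryptic_fragments` and `blocked` are hypothetical.

```python
WGKA = "MSPKKEFNAPKTTKVNSWGKH"  # WgkA 21mer as printed in Fig. 3a


def tryptic_fragments(sequence, blocked=()):
    """Split after K/R (not before P), skipping 1-based positions in `blocked`.

    `blocked` models residues assumed to resist cleavage, e.g. a modified Lys.
    """
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        position = i + 1  # 1-based residue number
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        cleavable = residue in "KR" and next_residue not in ("", "P")
        if cleavable and position not in blocked:
            fragments.append(sequence[start:position])
            start = position
    fragments.append(sequence[start:])
    return fragments


if __name__ == "__main__":
    # Assumption: the crosslinked Lys20 is not recognized by trypsin.
    print(tryptic_fragments(WGKA, blocked={20}))
```

With Lys20 blocked, the C-terminal fragment is VNSWGKH, matching the 7mer reported in the text; without that assumption the same rule would split it after Lys20.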
Together, our data indicated that WgkB catalyzes an unprecedented double-crosslinking reaction resulting in a tricyclic tetrahydro[5,6]benzindole modification (Fig. 5). This transformation provides a never-before-seen cyclization motif in RiPP natural products and confirms that novel reactions remain to be found in the network of RaS-RiPPs in Fig. 2a. Structural elucidation of the product of the WgkBC reaction sets the stage for future studies addressing the mechanism of this unusual PTM.

Prevalence of the WGK BGC in Firmicutes. Our bioinformatic results show that the WGK cluster is widespread in Streptococci, especially in the opportunistic pathogens S. mutans and S. equi. To address its occurrence beyond Streptococci, we searched for WGK clusters in all available bacterial genomes and found highly homologous clusters (>50% sequence identity in the RaS enzyme) in several Firmicutes, including Enterococcus caccae, Butyrivibrio sp., Carnobacterium maltaromaticum, and Bacillus cereus (Fig. S13, Table S10). The first two are prominent, non-pathogenic gut microbiome bacteria.45,46 C. maltaromaticum is a lactic acid bacterium; it occurs in the wild, can be found in meat and dairy products, and has been proposed as a probiotic in dairy fermentation due to its ability to synthesize bacteriocins.47 Likewise, B. cereus, a genetically polymorphic species, can be found in the wild or in food.48 Some strains can cause food-borne illnesses whereas others are animal probiotics. A comparison of the precursor peptides revealed that the –WGK– sequence is strictly conserved, suggesting that the benzindole modification could be installed in these cases as well (Fig. S13). The biological role of the small molecule product will be of future interest, given the distribution of the WGK cluster in numerous bacteria associated with animal and human microbiomes.





Figure 5. Reaction catalyzed by WgkBC. WgkBC catalyzes the first double‐crosslinking reaction in a RaS‐RiPP, adjoining four unactivated positions via two carbon‐carbon bonds.
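The mass shifts reported in the labeling experiments follow from this stoichiometry: forming two C-C bonds between four C-H positions removes four hydrogen atoms. The short calculation below is a worked illustration, not part of the original analysis; it uses standard monoisotopic masses for 1H and 2H, and the function name `mass_loss` is hypothetical.

```python
H_MASS = 1.007825  # monoisotopic mass of 1H in Da
D_MASS = 2.014102  # monoisotopic mass of 2H (deuterium) in Da


def mass_loss(n_protium, n_deuterium=0):
    """Mass (Da) removed when the given numbers of 1H and 2H atoms are lost."""
    return n_protium * H_MASS + n_deuterium * D_MASS


# Unlabeled WgkA: Trp C5-H, Trp C6-H, Lys alpha-H, and Lys delta-H are lost.
unlabeled = mass_loss(4)

# Indole-2H5-Trp WgkA: the two Trp positions carry 2H, the two Lys positions 1H.
d5_trp = mass_loss(2, 2)

# 2H2-Gly WgkA: both Gly deuterons are retained, so only four 1H are lost.
d2_gly = mass_loss(4)

if __name__ == "__main__":
    print(f"unlabeled: -{unlabeled:.2f} Da")   # about 4.03 Da
    print(f"2H5-Trp:   -{d5_trp:.2f} Da")      # about 6.04 Da
    print(f"2H2-Gly:   -{d2_gly:.2f} Da")      # about 4.03 Da
```

These values agree with the reported shifts of 4 Da for the unlabeled substrate and 6 Da for the indole-2H5-Trp substrate.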

CONCLUSIONS

Of the bacterial genera mined for secondary metabolites, Streptococci remain vastly underexamined. They typically harbor genomes smaller than ~2 Mbp and their capacity for secondary metabolite production cannot rival that of talented genera. But Streptococci have been shown to produce and communicate with RiPP natural products,27,28 molecules with small genomic footprints and complex structures. We have herein unveiled an expansive network of streptococcal RiPP secondary metabolites using a search strategy in which one of the most versatile enzyme superfamilies is linked to an emerging family of natural products in the context of QS control. These metabolites likely contain new PTMs and play important physiological roles. To test this hypothesis, we have characterized in detail one subfamily within the network and demonstrated an unprecedented modification in which a tetrahydro[5,6]benzindole, a mash-up of adjacent Trp and Lys residues, provides the cyclization pattern. Previously, two different C-C-bond-forming macrocyclization reactions have been identified in RaS-RiPPs: the Lys-Trp modification installed by StrB/AgaB/SuiB and the Glu-Tyr crosslink catalyzed by PqqDE.21,29,49,50 The reaction of WgkBC is the first double-crosslinking modification in which four unactivated positions are connected by two C-C bonds in a regio- and stereo-specific manner. Mechanistic studies are required to elucidate how a single-electron oxidant (5′-dA•) can effect a four-electron oxidation reaction. The roles of the auxiliary clusters will be critical in this regard.51,52 Also unknown is the function of the small molecule product of the WGK cluster. Indeed, the RaS-RiPP network that we have outlined will provide fertile ground for the discovery of new RiPPs as well as the underlying biology, chemistry, and enzymology for years to come.

MATERIALS AND METHODS

Materials.
All materials were purchased from Sigma-Aldrich or Fisher Scientific unless otherwise specified. Streptococcus ferus DSM 20646 was obtained from the DSMZ (German Collection of Microorganisms and Cell Cultures). Restriction enzymes, T4 DNA Ligase, Q5 High-Fidelity DNA Polymerase, Shrimp Alkaline Phosphatase, and Trypsin-Ultra (Mass Spectrometry Grade) were purchased from New England BioLabs (NEB). DNA oligos were purchased from Integrated DNA Technologies (IDT). Ring-2H5-L-Trp and uniformly 13C/15N-labeled L-Trp were purchased from Cambridge Isotope Labs. Uniformly 13C/15N-labeled L-Lys was obtained from Sigma-Aldrich. 2,2-2H2-Gly was purchased from CDN Isotopes.

General Procedures. UV-vis spectra were acquired on a Cary 60 UV-visible spectrophotometer (Agilent). Low-resolution high-performance liquid chromatography-mass spectrometry (HPLC-MS) analysis was performed on an Agilent 1260 Infinity Series HPLC system equipped with an automated liquid sampler, a diode array detector, and a 6120 Series ESI mass spectrometer using a reversed phase Phenomenex Luna C18 column (3 μm, 100 x 4.6 mm). The mobile phases consisted of water and MeCN (both containing 0.1% formic acid). Upon injection, elution was carried out isocratically with 8% MeCN in water for 5 min followed by gradients of 8–60% MeCN in water over 15 min, and then 60–100% MeCN over 4 min, at a flow rate of 0.5 mL/min. High-resolution (HR) HPLC-MS and HR-tandem HPLC-MS were carried out on an Agilent 6540 Accurate-Mass quadrupole time-of-flight (Qtof)-HPLC-MS, consisting of a 1260 Infinity Series HPLC system, an automated liquid sampler, a diode array detector, a JetStream ESI source, and the 6540 Series Qtof. Samples were resolved on a reversed phase Phenomenex Luna C18 column (3 μm, 100 x 4.6 mm) or an Agilent C18 Eclipse I (5 μm, 150 x 4.6 mm). The mobile phase consisted of water and MeCN (+0.1% formic acid). Elution was carried out isocratically with 8% MeCN in water (8 min) followed by a gradient of 8–60% MeCN over 12 min, and then 60–95% MeCN over 4 min, at a flow rate of 0.6 mL/min.
HPLC purifications were carried out on an Agilent 1260 Infinity Series analytical or preparative HPLC system equipped with a temperature‐controlled column compartment, a diode array detector, and an automated fraction collector. The analytical system was also equipped with an automated liquid sampler. Nuclear magnetic resonance (NMR) and electron paramagnetic resonance (EPR) spectra were acquired at the Princeton University Department of Chemistry Facilities. NMR spectra were collected in D2O in the triple resonance cryoprobe of a Bruker A8 Avance III HD 800 MHz NMR spectrometer. 1D/2D NMR data – specifically 1H, 13C, correlation spectroscopy (COSY), total correlation spectroscopy (TOCSY), rotating‐frame Overhauser spectroscopy (ROESY), heteronuclear single quantum coherence (HSQC), distortionless enhancement by polarization transfer (DEPT)‐edited HSQC, and heteronuclear multiple bond correlation (HMBC) data – were analyzed with MestReNova software. CW X‐band EPR spectra were recorded at 10 K on a Bruker EMXplus EPR spectrometer equipped with an Oxford liquid helium cryostat.

Bioinformatic Search for Streptococcal RaS‐RiPPs. Analysis of streptococcal genomes was carried out using the Integrated Microbial Genomes and Microbiomes (IMG/M) System.53 To start, functions pfam04055 (Radical_SAM ‐ Radical SAM superfamily) and TIGR01716 (transcriptional activator, Rgg/GadR/MutR family, C terminal domain) were added to the function cart. The Compare Genomes – Function Profile tool was applied to search all finished, permanent draft, and draft streptococcal genomes for the selected functions. The search generated a function profile listing each genome and its numbers of pfam04055 and TIGR01716 genes as well as a detailed list of the genes identified. Because the application allows only 400 genomes to be selected at once, the search

was performed in several rounds. All data were exported into Microsoft Excel for further analysis. At the time of our most recent search (March 2018), 2,875 streptococcal genomes were available on IMG/M. The search identified 8,478 Rgg‐type genes and 14,453 radical SAM genes. The profile of genes was arranged according to gene ID number, which correlates with the position in the genome. In this way, pairs of rgg and radical SAM genes present in the same local genomic neighborhood could be readily identified by differences in gene ID values. Radical SAM genes that did not co‐occur with an rgg‐type gene within a 3‐gene distance and, conversely, rgg genes that did not co‐occur with a radical SAM gene within a 3‐gene distance were eliminated. The data set was filtered further by examining upstream and downstream regions of the radical SAM gene for open reading frames encoding short precursor peptides. The 5'‐region upstream of candidate precursor peptides was examined manually for ribosomal binding sites as well as potential Rgg binding sites. Loci containing transposons were eliminated, as we reasoned the native organism would not be capable of producing the mature product. The final list contained 592 loci.

A Sequence Similarity Network (SSN) of precursor peptides was generated using the EFI‐Enzyme Similarity Tool.31 The initial calculation was performed with an E‐value of 2 and a fraction value of 1, and the SSN was generated with a minimum alignment length of 0, a maximum alignment length of 50,000, and an alignment cutoff score of 1. An SSN of radical SAM enzymes was also generated. The initial calculation was performed with an E‐value of 5 and a fraction value of 1, and the SSN was generated with a minimum alignment length of 0, a maximum alignment length of 50,000, and an alignment cutoff score of 100. Radical SAM enzyme genes that were incomplete, shortened, or cut off were not included.
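The gene‐neighborhood filter described earlier in this section, which pairs rgg and radical SAM genes by the difference in their gene ID values, can be sketched as follows. The analysis itself was carried out in Microsoft Excel; this sketch is only an illustration of the same logic. It assumes that consecutive gene IDs within a genome correspond to adjacent genes and interprets "within a 3‐gene distance" as an absolute ID difference of at most 3. The function name, the input layout, and the gene IDs in the example are hypothetical.

```python
from collections import defaultdict

MAX_GENE_DISTANCE = 3  # neighborhood size stated in the text


def cooccurring_pairs(hits, max_distance=MAX_GENE_DISTANCE):
    """Return (genome, rgg_id, rsam_id) for rgg/radical SAM genes that are close together.

    hits: iterable of (genome, gene_id, function), where function is "rgg" or "rSAM".
    Assumption: within one genome, the gene ID difference equals the gene distance.
    """
    by_genome = defaultdict(lambda: {"rgg": [], "rSAM": []})
    for genome, gene_id, function in hits:
        by_genome[genome][function].append(gene_id)

    pairs = []
    for genome, genes in by_genome.items():
        for rsam in sorted(genes["rSAM"]):
            for rgg in sorted(genes["rgg"]):
                if 0 < abs(rsam - rgg) <= max_distance:
                    pairs.append((genome, rgg, rsam))
    return pairs


if __name__ == "__main__":
    # Hypothetical gene IDs, for illustration only.
    example_hits = [
        ("genome_A", 100, "rgg"),
        ("genome_A", 102, "rSAM"),  # 2 genes from the rgg gene: retained
        ("genome_A", 110, "rSAM"),  # no rgg gene nearby: eliminated
        ("genome_B", 50, "rgg"),
        ("genome_B", 60, "rSAM"),   # 10 genes apart: both eliminated
    ]
    print(cooccurring_pairs(example_hits))
```

Genes that appear in no returned pair correspond to the entries eliminated at this step; the subsequent manual inspection for precursor peptides, ribosomal binding sites, and transposons is not modeled here.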
Networks were visualized in Cytoscape.54 A list of the 592 loci is provided in Table S1. Loci are referenced by the IMG Gene IDs of their radical SAM enzymes and organized according to the subfamily with which they grouped in the SSN of precursor peptides. A sequence logo was created for each subfamily using WebLogo.55,56 Subfamilies were named according to a conserved sequence motif in the precursor peptide.

Cloning of WgkB and WgkC. Genomic DNA from S. ferus DSM 20646 was isolated using the Wizard Genomic DNA Purification Kit (Promega). The wgkB gene (IMG Gene ID: 2515261319) was PCR‐amplified from genomic DNA using Q5 DNA Polymerase (NEB) in FailSafe 2X PreMix D Buffer (Epicentre) and primers wgkB_DSM20646_F (NdeI) and wgkB_DSM20646_R (SacI) (Table S3). The PCR product was purified using the Qiagen PCR Purification Kit and subsequently digested with the restriction enzymes NdeI and SacI. Vector pET‐28b(+) (Novagen) was digested with the same restriction enzymes and subsequently treated with Shrimp Alkaline Phosphatase (NEB). Following gel extraction using the Qiagen Gel Extraction Kit, insert and vector were ligated using T4 DNA Ligase (NEB). The ligation reaction was transformed into E. coli DH5α cells by heat shock. Single colonies were screened for insertion of the fragment. Sequence‐verified pET‐28b(+)_wgkB, encoding an N‐terminally His‐tagged WgkB, and pDB1282 were co‐transformed by heat shock into the expression host E. coli BL21(DE3).51 The wgkC gene (Gene ID: 2515261320) was PCR‐amplified from S. ferus DSM 20646 genomic DNA using Q5 DNA Polymerase (NEB) in FailSafe Buffer D (Epicentre) and primers wgkC_DSM20646_F (SacI) and wgkC_DSM20646_R (XhoI) (Table S3). The PCR product was purified using the Qiagen PCR Purification Kit and then digested with the restriction


enzymes SacI and XhoI. Vector pSJ7, containing a 5'‐NusA‐hexa‐His tag, was digested with the same restriction enzymes and subsequently treated with Shrimp Alkaline Phosphatase (NEB). Following gel extraction using the Qiagen Gel Extraction Kit, insert and vector were ligated using T4 DNA Ligase (NEB). The ligation reaction was transformed into E. coli DH5α cells by heat shock. Single colonies were screened for insertion of the fragment. Sequence‐verified pSJ7_wgkC was transformed into the expression host E. coli BL21(DE3) by heat shock.

Expression, Purification, and Reconstitution of WgkB. Expression, purification, and reconstitution of WgkB were carried out using previously published procedures without modifications.57 Purified WgkB was quantified using the method reported by Barr et al.49 WgkB was anaerobically reconstituted with 10‐fold excess Fe(II) and 10‐fold excess Na2S, as per reported procedures.57

Expression and Purification of NusA‐His6‐WgkC. A 14 mL sterile culture tube containing 5 mL LB supplemented with Amp (100 μg/mL) was inoculated with a single colony of E. coli BL21(DE3) cells carrying pSJ7_wgkC. The 5 mL culture was grown overnight at 37°C/200 rpm and used to inoculate 150 mL LB‐Amp in a 250 mL Erlenmeyer flask. This intermediate culture was grown overnight at 37°C/200 rpm and used to inoculate 4 x 2.8 L Fernbach flasks, each containing 1.25 L LB‐Amp, at 1% dilution. This culture was grown at 37°C/180 rpm to OD600nm ≈ 0.6, cooled in an ice bath, and supplemented with a final concentration of 0.1 mM IPTG to induce expression. Cultures were grown at 18°C/180 rpm for 18 h. Cells were then harvested by centrifugation (15,000 x g, 30 min, 4°C) and frozen at ‐80°C. A typical yield was ~4 g cells per L culture. Purification was carried out at 4°C in a cold room. Wet cell paste (~30 g) was resuspended in lysis buffer (5 mL/g cell paste), which consisted of 50 mM HEPES, 300 mM KCl, 25 mM imidazole, 15% glycerol, pH 7.5 (+10 mM βME).
Lysozyme and PMSF were added to final concentrations of 1 mg/mL of lysis buffer and 0.5 mM, respectively, and the suspension was stirred for an additional 30 minutes at room temperature. The cells were then incubated on ice and subsequently sonicated for a total of 4 minutes in 15 s on/15 s off cycles at a 30% power rating. The cells were allowed to rest for 4 minutes, and the cycle was repeated. The crude cell lysate was transferred to centrifuge tubes and cell debris was pelleted by centrifugation (35,000 x g, 65 minutes, 4°C). The supernatant was transferred to a beaker on ice. DNA was precipitated by dropwise addition of 0.2 volumes of a 6% (w/v) streptomycin sulfate solution in lysis buffer to the crude extract over 15 minutes. The solution was stirred for an additional 15 minutes, and precipitated DNA was removed by centrifugation (35,000 x g, 4°C, 40 minutes). The supernatant was then loaded onto a nickel affinity resin (10 mL), which had been equilibrated with 10 column volumes (CV) of lysis buffer. The column was washed with an additional 5 CV of lysis buffer followed by 5 CV of wash buffer consisting of 50 mM HEPES, 300 mM KCl, 25 mM imidazole, 15% glycerol, pH 7.5 (+10 mM βME). NusA‐WgkC was eluted with 1.5–2 CV of elution buffer consisting of 50 mM HEPES, 300 mM KCl, 300 mM imidazole, 15% glycerol, pH 7.5 (+10 mM βME). The purified protein was subsequently desalted on a Sephadex G‐25 column (~50 mL, d = 1.25 cm, l = 25 cm), which had been equilibrated in G‐25 buffer (100 mM HEPES, 300 mM KCl, 10% glycerol, pH 7.5, +5 mM DTT). Fractions containing NusA‐WgkC, as judged by the Bradford assay, were pooled, aliquoted, and stored at ‐80°C. A yield of 3.2 mg of pure NusA‐WgkC per g of cell paste was typical. NusA‐WgkC


was used directly in activity assays without cleavage of the purification tag, as initial tests indicated that the full‐length and cleaved constructs behaved similarly in enzymatic activity assays with WgkB. The concentration of NusA‐WgkC was determined spectrophotometrically using a calculated extinction coefficient of 45,840 M⁻¹ cm⁻¹.

Synthesis of WgkA Peptide Substrates and the Linear WgkA15‐21 Standard. WgkA substrates and the linear WgkA 7mer were prepared by Fmoc‐based solid phase peptide synthesis (SPPS) on a preloaded H‐His(Trt)‐HmPB‐ChemMatrix® resin using a Liberty Blue automated peptide synthesizer equipped with a Discover microwave module (CEM). The deprotection solution consisted of 10% piperazine (w/v) in a 10:90 solution of EtOH:NMP (N‐methyl‐2‐pyrrolidone) supplemented with 0.1 M HOBt (1‐hydroxybenzotriazole). The activator solution consisted of 0.5 M DIC (N,N'‐diisopropylcarbodiimide) in DMF, and the activator base solution consisted of 1.0 M Oxyma with 0.1 M DIPEA in DMF. A typical coupling cycle used 5 equiv. of amino acid and 5 equiv. of coupling reagent. Residues 18–21 were double‐coupled. The synthesis was typically performed on a 100 μmol scale. For synthesis of peptides containing isotopically labeled residues, the scale was reduced to 50 μmol. Upon completion of the synthesis, the resin was removed from the reaction vessel and transferred to an Econo‐Pac column (BioRad). The resin was washed several times with DMF, followed by DCM, and dried thoroughly under vacuum. The peptide was cleaved from the resin by incubation with freshly prepared cleavage cocktail (5 mL per 100 mg resin) consisting of 92.5% TFA, 2.5% H2O, 2.5% TIS (triisopropylsilane), and 2.5% DODT (2,2'‐(ethylenedioxy)diethanethiol). The reaction was stirred for 1–3 h at room temperature. The mixture was drained from the reaction tube and the resin was rinsed several times with TFA. The filtrate and rinses were combined and subsequently concentrated by evaporation of TFA under a stream of N2.
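For convenience, the reagent quantities implied by the synthesis and cleavage conditions above can be computed as follows. This is a minimal sketch that assumes the cleavage cocktail percentages are volume fractions (the text does not state v/v explicitly); the function names are illustrative.

```python
# Cleavage cocktail composition from the text.
# Assumption: the percentages are volume fractions (v/v).
COCKTAIL_FRACTIONS = {"TFA": 0.925, "H2O": 0.025, "TIS": 0.025, "DODT": 0.025}
COCKTAIL_ML_PER_100_MG_RESIN = 5.0


def cleavage_volumes(resin_mg):
    """Return the volume (mL) of each cocktail component for a given resin mass."""
    total_ml = resin_mg * COCKTAIL_ML_PER_100_MG_RESIN / 100.0
    return {name: total_ml * fraction for name, fraction in COCKTAIL_FRACTIONS.items()}


def coupling_umol(scale_umol, equiv=5.0):
    """Return the amount (umol) of amino acid or coupling reagent per coupling cycle."""
    return scale_umol * equiv


if __name__ == "__main__":
    print(cleavage_volumes(200.0))   # component volumes for 200 mg of resin
    print(coupling_umol(100.0))      # standard 100 umol scale
    print(coupling_umol(50.0))       # reduced scale for isotopically labeled peptides
```

On the standard 100 μmol scale, each coupling cycle thus consumes 500 μmol of amino acid; double‐coupled residues consume twice that amount.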
The peptide was then precipitated by addition of 10 volumes of ice‐cold diethyl ether and isolated by centrifugation (4000 x g, 10 min, 4°C). The ether was poured off and the peptide was dried overnight in a fume hood. In the case of (ring‐2H5)‐W18‐WgkA, cleavage was performed in deuterated TFA to avoid exchange during the cleavage reaction.

Purification of WgkA Peptide Substrates. Crude peptide was dissolved in H2O (+0.1% formic acid) and purified by preparative HPLC. Samples were manually injected onto a Phenomenex Luna C18 column (5 μm, 250 x 21.20 mm), which had been equilibrated with 5% MeCN (in water +0.1% formic acid). A gradient of 5–38% MeCN over 16 minutes at a flow rate of 12 mL/min was used to elute the peptide. Fractions containing the peptide were pooled, dried down to several mL in vacuo, and lyophilized. The C‐terminal 7mer peptide of WgkA (H2N‐VNSWGKH‐CO2H) was purified in a similar manner.

Synthesis of Fmoc‐L‐Lys(Boc)‐OH‐(13C6,15N2). This amino acid was synthesized using previously reported procedures without modification.36 The synthesis was conducted on a 100 mg scale.

Synthesis of Fmoc‐L‐Trp(ring‐2H5)‐OH. This amino acid was synthesized using established procedures.58 Briefly, 155 mg of ring‐2H5‐L‐Trp was dissolved in a mixture of 2.15 mL 1 M NaHCO3 and 2.15 mL THF, supplemented with 297 mg of Fmoc‐OSu (Fmoc‐succinimide), and stirred for 2 h at room temperature. THF was removed in vacuo to give a white aqueous suspension that was diluted by the addition of 2.15 mL 1 M NaHCO3. Unreacted Fmoc‐OSu was extracted with diethyl ether (2 x 10 mL). The pH of the aqueous layer was adjusted to