
Article

Comparative Proteomics of Enterotoxigenic Escherichia coli Reveals Differences in Surface Protein Production and Similarities in Metabolism

Veronika Kucharova Pettersen, Hans Steinsland, and Harald G. Wiker

J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.7b00593 • Publication Date (Web): 29 Nov 2017

Downloaded from http://pubs.acs.org on November 29, 2017


Journal of Proteome Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Veronika Kuchařová Pettersen1,#, Hans Steinsland2,3, and Harald G. Wiker1,*

1 The Gade Research Group for Infection and Immunity, Department of Clinical Science, University of Bergen, 5021 Bergen, Norway
2 Centre for International Health, Department of Global Public Health and Primary Care, University of Bergen, 5021 Bergen, Norway
3 Department of Biomedicine, University of Bergen, 5021 Bergen, Norway

Keywords: Enterotoxigenic Escherichia coli, Label-Free Quantification, Proteogenomics, Bacterial Metabolism, Plasmid, Virulence


ABSTRACT

Enterotoxigenic Escherichia coli (ETEC) infections are an important cause of diarrhea among young children living in low- and middle-income countries and among travellers visiting these countries. The development of effective vaccines is complicated by the substantial genomic diversity that exists among ETEC isolates. To investigate how ETEC genomic variation is reflected at the expressed proteome level, we applied label-free quantitative proteomics to 7 human ETEC strains representing 5 epidemiologically important lineages. We further determined the proteome profile of the non-pathogenic E. coli B strain BL21(DE3) to discriminate features specific to ETEC. The analysis yielded a dataset of 2,893 proteins, of which 1,729 were present in all strains. Each ETEC strain produced on average 27 plasmid- or chromosomally-encoded proteins with known or putative connections to virulence, and a number of strain-specific proteins associated with the biosynthesis of surface antigens. Statistical comparison of protein levels between the ETEC strains and BL21(DE3) revealed several proteins with considerably increased levels only in BL21(DE3), including enzymes of arginine biosynthesis and of the metabolism of melibiose, galactitol, and gluconate. ETEC strains displayed consistently increased levels of proteins that function in iron acquisition, maltose metabolism, and acid resistance. The latter results suggest that specific metabolic functions might be shared among ETEC isolates.


INTRODUCTION

Infection with enterotoxigenic Escherichia coli (ETEC) is one of the most prevalent causes of diarrhea by E. coli pathotypes (1-3). Diarrheal diseases are a serious health problem in low- and middle-income countries, where they remain a leading cause of preventable deaths among children under five years old (4, 5). ETEC transmission occurs via the fecal-oral route, and two distinguishing features appear to be important for human ETEC pathogenesis. One is the tendency to produce ETEC-specific protein surface structures termed colonization factors (CFs), which promote adherence to the host intestinal mucosa. The other is the elaboration of heat-labile (LT) and/or heat-stable (ST) enterotoxins, which disrupt signalling pathways that manage water and electrolyte homeostasis in the epithelial cells (6). A large genetic diversity of these mainly plasmid-encoded virulence factors exists within the ETEC population; 28 LT and 6 ST gene variants have been found, and more than 25 different types of CFs, each with their own gene variants, have been described so far (7-9). The chromosomal background is also diverse; ETEC isolates are found across E. coli phylogroups A, B1, D and E, and express a wide range of lipopolysaccharide and flagellin antigens (O and H, respectively) (10-12). This diversity complicates efforts to develop vaccines against ETEC infection and diarrhea (13).

A recent whole-genome comparison of 362 ETEC strains isolated from areas where ETEC infections are endemic further highlighted the heterogeneity of ETEC. The analysis identified 21 globally distributed and stable ETEC lineages that displayed consistent long-term association with specific O-antigens and plasmid-encoded virulence factors (14). The results provided evidence for a long-term coupling of the bacterial chromosome and plasmid-encoded virulence factors, which had also been suggested previously by lower-resolution genotyping analyses (15, 16).
The importance of chromosomal background was moreover stressed by genome-scale
metabolic reconstructions, which showed that pathogenic and commensal E. coli strains can be categorized according to their metabolic capabilities (17). In addition, a theoretical modelling study predicted that a bacterial host is likely to control plasmid gene expression via a chromosomally based regulator (18). Indeed, in vitro studies demonstrated that cAMP receptor protein (CRP) and the histone-like nucleoid structuring (H-NS) factor regulate expression of the plasmid-encoded ETEC enterotoxins (6, 19), and that exposure to bile and alkaline pH modulates the expression of ST and LT, respectively (9, 20). However, ETEC strains can vary in their response to external stimuli such as bile, with toxin production depending on the specific gene variant (9). Similarly, a study found that different ETEC strains may have different transcriptional responses to chemical signals, further cautioning against generalizing findings from a single ETEC isolate to the entire ETEC pathovar (21).

In this report we describe and compare the protein profiles of 7 human ETEC strains representing 5 clonal lineages epidemiologically relevant for childhood ETEC diarrhea (15), and of the non-pathogenic E. coli B strain BL21(DE3). The BL21(DE3) genome is similar in size and organization to the genome of another commensal E. coli strain, K-12 MG1655, with which it shares >99% sequence identity over approximately 92% of the genome (22). The use of label-free quantitative (LFQ) proteomics based on liquid chromatography-tandem mass spectrometry allowed us to explore in detail two features of ETEC pathogenicity at the expressed proteome level: the genetic diversity of ETEC and potential metabolic traits shared by the ETEC strains. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD005259.

MATERIALS AND METHODS

Bacterial strains

Details of the E. coli strains used in this study are given in Table 1. ETEC strains TW10598, TW10722, TW10828, TW11681, and TW14425 were isolated from young children with diarrhea in Guinea-Bissau in 1997 (23). ETEC reference strain H10407 (24) was kindly provided by Ian R. Henderson, University of Birmingham, UK. ETEC strain H608 was isolated at Haukeland University Hospital in 2010 from an adult Norwegian who developed diarrhea after travelling to Madagascar. Non-pathogenic E. coli B strain BL21(DE3)pLysS (22) (hereafter called BL21(DE3)) was obtained from Stratagene, CA, USA. E. coli plasmid reference strain 39R861, harbouring 154, 66, 38 and 7 kb plasmids, was used for the plasmid characterization analyses. All analysed strains originate from E. coli phylogroup A, except TW10722 and TW10828, which belong to phylogroup B1 (Table 1, Figure S-1).

Genome assembly and annotation

The initial sequencing and assembly of the TW10598, TW10722, TW10828, TW11681 and TW14425 genomes have already been described (11). Sequence gap-closing was performed as previously described for TW10598, including checking the assembly against optical maps of each strain’s chromosome (25). Here, we also re-sequenced the genome of each strain on an Illumina HiSeq 2000 machine, using Illumina HiSeq v4 reagents on 300 bp libraries and aligning the resulting 125 bp paired-end reads against the completed assemblies in order to identify and correct any remaining mis-assemblies. Not all plasmid sequences could be completely assembled because of long stretches of repeat elements among the plasmid sequences. To annotate the sequences, we used CG-Pipeline to create the initial structural annotation (26), followed by adjustments of translation start positions using in-house developed software that compared the annotation with those from E. coli reference genomes, and by visual inspection and manual adjustment in Artemis (27). To derive each ETEC strain’s functional annotation (the predicted proteome), we performed a protein BLAST search of all identified coding DNA sequences (CDS) against all bacterial proteins described in UniProtKB (accessed June 9, 2016) and identified any top match that had ≥70% sequence overlap and ≥90% amino acid identity with the queried sequence. The functional annotation of the UniRef90 representative for this top match was then set to represent the given CDS. We consider two proteins that have the same UniRef90 representative to represent the same protein. The CDS for H10407 were downloaded from GenBank (GI: 309700213, accessed February 27, 2015), and the functional annotation was repeated as for the other ETEC strains.

Plasmid characterization

For characterization of low-molecular weight plasmids, plasmid DNA was purified with the QIAGEN Midi Kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. Total plasmid DNA (200 ng) was digested in 20 μL of S1 nuclease buffer containing 5 U/μL of S1 nuclease (Thermo Fisher Scientific, Waltham, MA, USA) for 70 min at 37°C. Following digestion, the plasmid DNA was separated on a 0.5% agarose gel by electrophoresis in TAE buffer. The gel was stained with GelRed™ Nucleic Acid Gel Stain solution (Biotium, Hayward, CA, USA) according to the manufacturer's instructions.

Characterization of high-molecular weight plasmids was based on a previously described S1 pulsed-field gel electrophoresis (S1-PFGE) method (28). Preparation of agarose plugs, lysis of cells embedded in the plugs and subsequent washes were performed according to the standardized PFGE procedure developed by the Centers for Disease Control and Prevention, USA, for E. coli (O157:H7 and non-O157), Salmonella, Shigella sonnei and S. flexneri (protocol version March 2013). Each plug was digested with S1 nuclease (Thermo Fisher Scientific) at 0.2 U/μL for 2 hours at 37°C in S1 nuclease buffer. The PFGE was performed using a CHEF-DR® III system (Bio-Rad Laboratories, Hercules, CA, USA) with parameters designed to detect plasmids of sizes from around 30 to 350 kb. Specifically, the initial switch time was set to 1 sec, the final switch time to 25 sec, the runtime was 24 or 27 hours, the angle 120°, the gradient 6 V/cm, the temperature 14°C, and the ramping factor was linear. The gel was stained with GelRed™ Nucleic Acid Gel Stain solution (Biotium) for 1 hour, and destained in water overnight. Sizes of the plasmids were determined by comparison with the ProMega-Marker® Lambda Ladder (Promega, Madison, WI, USA).

Genotypic characterization

The phylogroup of ETEC isolate H608 was determined by using the multiplex PCR assay described by Clermont et al. (29). The multilocus sequence type (MLST) of the strain was
determined by Sanger sequencing of the 7 housekeeping genes specified by the EcMLST scheme (30). Sequence types and serotypes of the genome-sequenced strains were determined by the MLST typing tool and SeroTypeFinder-1.1 of the Center for Genomic Epidemiology server (31, 32). For the detection of ST and LT genes in H608, we used an ETEC toxin multiplex PCR as previously described (33).

Bacterial cell culture and lysate preparation

For the mass spectrometry analysis, each strain was streaked onto sheep blood agar plates (Haukeland University Hospital, Bergen, Norway) in 3 or 4 parallel biological replicates and cultured at 37°C for ~16 h. Several colonies from each plate were resuspended in 1 mL of Lysogeny Broth, and 200 μL of this suspension was pipetted onto a new blood agar plate and spread evenly to make a lawn. After incubation at 37°C for ~16 h the cells were harvested, washed and lysed as previously described (25).

Filter-aided protein digestion and LC-MS/MS

The whole cell lysates were processed as described earlier (25), according to the multiple enzymes for sample digestion – filter-aided sample preparation protocol (34). In short, the cell lysates were treated with LysC and trypsin in a two-step digestion reaction, and the resulting peptide mixtures were desalted and lyophilised. Prior to the liquid chromatography–tandem mass spectrometry (LC-MS/MS) analysis, the peptides were resuspended in 0.1% formic acid (FA) and 2% acetonitrile (ACN). The LC-MS/MS analysis was carried out at the Proteomics Unit at the University of Bergen (PROBE) on an Ultimate 3000 RSLC system (Thermo Scientific, Waltham, MA) connected to a linear quadrupole ion trap-Orbitrap (LTQ-Orbitrap) mass spectrometer (Thermo Scientific) equipped with a nanoelectrospray ion source. Briefly, ~1 µg protein was loaded onto a
preconcentration column (Acclaim PepMap 100, 2 cm × 75 µm i.d. nanoViper column, packed with 3 µm C18 beads) at a flow rate of 5 µL/min for 5 min using an isocratic flow of 0.1% FA (vol/vol) with 2% ACN (vol/vol). Peptides were separated during a biphasic ACN gradient from two nanoflow UPLC pumps (flow rate of 270 nL/min) on the analytical column (Acclaim PepMap 100, 50 cm × 75 µm i.d. nanoViper column, packed with 3 µm C18 beads). Solvents A and B were 0.1% FA (vol/vol) with 2% ACN or 90% ACN (vol/vol), respectively. Separated peptides were sprayed directly into the MS instrument during a 195 min LC run with the following gradient composition: 0-5 min 5% B, 5-5.5 min 8% B, 5.5-140 min 8–35% B, 140-155 min 35–90% B. Elution of very hydrophobic peptides and conditioning of the column were performed by isocratic elution with 90% B (155-170 min) and 5% B (175-195 min), respectively. Desolvation and charge production were accomplished by a Nanospray Flex ion source. The mass spectrometer was operated in the data-dependent-acquisition mode to automatically switch between Orbitrap-MS and LTQ-MS/MS acquisition. Survey full-scan MS spectra (from m/z 300 to 2000) were acquired in the Orbitrap with a resolution of R = 240,000 at m/z 400 (after accumulation to a target of 1,000,000 charges in the LTQ). The method used allowed sequential isolation of the most intense ions (up to 10, depending on signal intensity) for fragmentation on the linear ion trap using collisionally-induced dissociation at a target value of 10,000 charges. Target ions already selected for MS/MS were dynamically excluded for 18 s. General mass spectrometry conditions were as follows: electrospray voltage, 1.8 kV; no sheath and auxiliary gas flow. The ion selection threshold was 1000 counts for MS/MS, and an activation Q-value of 0.25 and an activation time of 10 ms were also applied for MS/MS.

MS/MS data analysis

The MS/MS raw data files were processed in MaxQuant (version 1.5.3.30) (35). The Andromeda search engine integrated in the MaxQuant framework performed the spectra search against the following databases: 1) In an initial proteogenomic analysis aimed at improving the ETEC genome annotations, the MS/MS data files of each sequenced ETEC strain were searched separately against the corresponding genome 6-frame translation and a reviewed E. coli protein database downloaded from the Swiss-Prot section of UniProtKB (July 6, 2015; 28,937 entries). 2) For comparative proteomic analysis, MS/MS data of all 8 strains were searched together against the predicted proteomes of the six genome-sequenced ETEC strains (Table S-1) and the predicted proteome of E. coli strain K12 (downloaded from UniProtKB Swiss-Prot, May 5, 2016; 4,314 entries). In all MaxQuant runs, enzyme specificity was defined in group-specific parameters as either trypsin or LysC, allowing N-terminal cleavage to proline, and two missed cleavages were allowed. The spectra of the LysC and tryptic fractions originating from the same replicate were combined in MaxQuant. Standard settings were used for MaxQuant searches, except that lysine acetylation and glutamate/glutamine conversion to pyro-glutamate were set as variable modifications in addition to N-terminal acetylation and methionine oxidation. Carbamidomethylation of cysteines was set as a fixed modification. The initial allowed mass deviation of the precursor ion was as high as 20 ppm, and the allowed tolerance value for the fragment mass was set to a maximum of 0.50 Da. The “match between runs” option was enabled to match identifications across samples. The maximum false discovery rates (FDR) at peptide and protein levels were kept at 1%. Normalized spectral protein intensities (LFQ intensities), proportional to the quantity of a given protein in a sample, were derived by the MaxLFQ algorithm (36). The normalization ensures that one can compare LFQ scores between different
analysed samples. Filtering of false positive identifications (i.e. proteins identified by MS/MS but not encoded in a genome) was performed by mapping MS/MS-detected peptides onto the individual strains' genomes, as described in the next section.

Proteogenomics and other bioinformatics analysis

For each sequenced ETEC strain, we loaded a FASTA file containing either the chromosomal or the plasmid sequence and its corresponding general feature format (GFF) file containing the functional annotations, as well as an Excel file containing the MS/MS-derived peptides, into the proteogenomic program VESPA, version 1.1.1 (37). From the peptides that mapped onto the 6-frame translated genomic sequence but not to any annotated coding sequence (referred to as orphan peptides), we included those that mapped to a reading frame with a suitable upstream E. coli translation start codon, and only if at least one more orphan peptide mapped to the same reading frame (25). MaxQuant output data were analysed with the Perseus module (38). Proteins identified in at least 2 of the biological replicates were considered valid. For strain H608, in addition, only proteins with the identification type “MS/MS” were considered valid. In the quantitative analysis we included only proteins described by unique peptides and with LFQ intensity values in a minimum of two replicates. For 98 proteins detected by a single peptide, and thus representing less confident identifications, detailed information about the MS/MS spectrum is provided in Table S-2. To identify and test for differences in protein abundance levels, we used analysis of variance (ANOVA) on log2-transformed LFQ values, with the FDR set to 1% and controlled by a permutation-based procedure with 250 randomizations, similar to the procedure applied to FDR calculation for differential expression analysis (39). Principal component analysis (PCA) was performed on logarithmized LFQ values (FDR = 1%), and details of the PCA implementation were
previously described (40). We considered a >2 log2 difference between LFQ intensity means to represent a considerable difference in protein levels. Protein functional analysis was performed using the DAVID (41) and STRING (42) tools, together with the EcoCyc (43) and UniProtKB (44) databases. Protein-protein BLAST (NCBI) and Clustal Omega (EMBL-EBI) were used for sequence alignments of identified protein isoforms. Cellular localization prediction was made by PSORTb version 3.0.2 (45). MS/MS data were deposited in the proteomics identifications database PRIDE (46) under accession number PXD005259.

RESULTS

ETEC genomes annotation improvements by mass spectrometry

We recently described the expressed proteome of the ETEC strain TW10598 (25), and in the present study we characterized six additional ETEC strains by using the same LC-MS/MS methodology (Table 1). As in the study of TW10598 (25), we first improved the annotation of coding sequence regions (genome structural annotation) by searching the acquired MS/MS spectra of five genome-sequenced ETEC strains individually against the respective genome 6-frame translation (Table S-1). The proteogenomic mapping revealed minor errors in all annotations, and an overview of 37 proteins whose coding sequences were either not represented or mis-annotated is given in Table S-3. From the corrected structural annotations we derived the strains' predicted proteomes, which served as protein databases in the subsequent comparative proteomic analysis.

Expressed ETEC proteome

Approximately 2.5 million MS/MS spectra were acquired from BL21(DE3) and all seven ETEC strains, including those already generated for TW10598, and these were further queried against the predicted proteomes of the six ETEC strains that had been sequenced (Table S-1) and the
Swiss-Prot-reviewed E. coli K12 proteome. The combined search matched the spectra to 37,058 distinct peptide sequences (Table S-4), which could be assigned to 2,893 different proteins (Table S-5). On average each strain yielded 2,238 proteins (Table 2), and 1,729 (60%) of the detected proteins were represented in all strains. Generally, there was a strong correlation between biological replicates for each strain, with a weighted mean Pearson correlation coefficient of 0.93 (range: 0.89-0.99) (Table S-6A). Comparison of the detected proteins' cellular localization (Table S-7) and predicted function (Table S-8) showed very similar distributions among the individual strains. Almost 90% of the identified proteins (2,584) could be described by LFQ scores, which indicate relative protein amounts in the analysed samples (Table S-9). The LFQ intensities covered a 5-log10 dynamic range, and their correlations between replicates, expressed as Pearson correlation coefficients, varied between 0.92 and 0.99 (Table S-6B). The distribution of protein LFQ intensities was similar for all 8 strains (Figure S-2A); nevertheless, the score plot from the principal component analysis showed clear differences in protein levels between the ETEC strains and BL21(DE3), as well as between individual ETEC strains (Figure S-2B). Figure S-2C, which shows a PCA plot with clustering of biological replicates, documents that quantification was consistent between the samples. Even the two pairs of ETEC strains that we consider clonally related (sequence type (ST) 171: TW10598 and H10407; ST 88: TW14425 and H608) did not seem to have closely matching protein abundance levels.

ETEC genetic diversity displayed at the expressed proteome level

About 11% of all detected proteins (319) were identified in only one of the E. coli strains (Table S-10). TW10722 had the highest number (74) of strain-specific proteins, while the unsequenced clinical isolate H608 had the lowest (9).
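The shared and strain-specific counts reported above amount to a presence/absence tally of proteins across strains. The sketch below illustrates that bookkeeping on invented data. The function name `classify_proteins`, the strain labels, and the protein identifiers are hypothetical and are not taken from the study, whose analysis was performed in MaxQuant and Perseus.

```python
from collections import Counter


def classify_proteins(detected):
    """Split proteins into shared-by-all and strain-specific sets.

    `detected` maps a strain name to the set of protein IDs that were
    considered valid identifications in that strain (hypothetical input).
    """
    n_strains = len(detected)
    # Count the number of strains in which each protein was detected.
    counts = Counter(p for proteins in detected.values() for p in proteins)
    # Proteins detected in every strain.
    shared = {p for p, n in counts.items() if n == n_strains}
    # Proteins detected in exactly one strain, grouped by that strain.
    specific = {
        strain: {p for p in proteins if counts[p] == 1}
        for strain, proteins in detected.items()
    }
    return shared, specific


# Toy input for illustration only; these are not data from the study.
toy = {
    "strainA": {"P1", "P2", "P3"},
    "strainB": {"P1", "P2", "P4"},
    "strainC": {"P1", "P5"},
}
shared, specific = classify_proteins(toy)
```

In this toy input, P2 is found in two of the three strains, so it is counted as neither shared by all strains nor strain-specific.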
For 27 of the 319 strain-specific proteins, we identified
their isoforms (i.e. proteins performing essentially the same biological function) in other strains (Figure 1). When looking at the whole dataset, we identified 118 proteins that represented 52 different isoforms (Table S-11). Twenty-nine of the isoforms, represented by 58 different proteins, shared one or more MS/MS-derived peptides because of sequence similarity, while the remaining 23 isoforms (60 proteins) were identified by peptides unique to the proteins (Figure 2A). Several of the strain-specific proteins were encoded by neighboring genes on the chromosome or plasmids (see Locus Tag in Table S-10), suggesting that the genes belong to functionally related gene clusters. One of these clusters, which was present in all of the sequenced strains, spanned around 45 kb of chromosomal DNA and encoded proteins functional in the biosynthesis of the surface antigens M and O. These chromosomal regions gave rise to 7, 6, 7 and 2 of the strain-specific proteins for strains TW10598, TW10722, TW10828 and TW11681, respectively, and to 5 isoforms represented by 11 proteins (ManB1, 2, ManC1, 2, 3, RmlA1, 2, WzzB1, 2 and Ugd1, 2) (Figure S-3). Some of the isoforms displayed differential expression, most notably the ManB enzyme (phosphomannomutase), whose quantitative levels differed between the two protein variants as well as for the ManB2 variant produced by different strains (Figure 2A). A dominant antigen of the adaptive immune response, flagellar antigen H encoded by the fliC gene (47), was detected in four different isoforms in TW10722, TW10828, H608, TW11681 and H10407 (Figure S-3). More proteins encoded by the fli operon appeared to be produced in H10407 than in the other ETEC strains: among the 19 proteins encoded by the fli operon, 12 could be detected in H10407, compared to between 2 and 7 for the other strains.
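Differences in protein level such as those noted for the ManB variants are judged in this study on log2-transformed LFQ intensities, with a difference of more than 2 log2 units between intensity means treated as considerable. A minimal sketch of that comparison follows, using invented intensities. The function names and the example values are hypothetical, and the sketch does not reproduce the ANOVA or the permutation-based FDR step described in the Methods.

```python
import math
from statistics import mean


def log2_mean_difference(lfq_a, lfq_b):
    """Mean of log2(LFQ) in group a minus mean of log2(LFQ) in group b."""
    log_a = [math.log2(x) for x in lfq_a]
    log_b = [math.log2(x) for x in lfq_b]
    return mean(log_a) - mean(log_b)


def is_considerable(lfq_a, lfq_b, threshold=2.0):
    """True if the absolute log2 difference exceeds the threshold.

    The default of 2 log2 units follows the cut-off stated in the text.
    """
    return abs(log2_mean_difference(lfq_a, lfq_b)) > threshold


# Invented replicate intensities for two protein variants (not study data).
variant_1 = [2.0**26, 2.0**27, 2.0**26]
variant_2 = [2.0**23, 2.0**23, 2.0**24]

difference = log2_mean_difference(variant_1, variant_2)
flagged = is_considerable(variant_1, variant_2)
```

A difference of 2 log2 units corresponds to a fourfold change in LFQ intensity, so the cut-off flags only proteins whose mean levels differ by more than fourfold.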
Three proteins (KpsDEF) exclusively identified in the TW10598 strain were involved in the biosynthesis of the polysialic acid capsule K15, which is an important virulence determinant (48). Comparative
analysis of the ETEC strains' genomes showed that the respective kps genes were present only in the TW10598 genome and located within a 20 kb chromosomal region. Mapping of MS/MS-derived peptides revealed four additional expressed genes, which encoded uncharacterized proteins (accessible under the following gene names at UniProtKB: TW10598_3126, TW10598_3128, TW10598_3129, and TW10598_3130) that had sequence homology to capsular polysaccharide biosynthesis proteins from other Gram-negative species (Figure S-4).

Proteomic characterization of ETEC plasmids and virulence factors

An important characteristic of ETEC is the presence of a large number of plasmid-encoded virulence factors (11, 24). A previous genomic comparison of five of the ETEC strains identified plasmid-associated sequences (11), but the sequences of the largest plasmids could not be completely assembled because of long stretches of repeated sequences. To determine the number and sizes of each strain’s plasmids, we performed both conventional (Figure S-5) and pulsed-field gel electrophoresis (Figure S-6). Each ETEC strain had between 2 and 4 plasmids (Table 1), with estimated plasmid sizes ranging from ~1.1 kb to more than 270 kb (Table S-12). The combined sizes of each strain’s plasmids corresponded to the combined length of the assembled plasmid contigs, except for TW10828, for which between 60 and 70 kb of plasmid DNA was missing on the gel pictures. This difference suggested that TW10828 had lost one of its plasmids during preparations for this study. We identified a 60 kb region in the assembled plasmid sequences for TW10828 to which no peptides could be mapped, and we assume this represents the missing plasmid sequence. The region encoded a plasmid replication initiation protein RepA, the plasmid conjugation proteins TraABC, two uncharacterized proteins, and 13 proteins associated with the production of type IV pili (PilJKLMNOPQRSTUV).

Of the 2,873 proteins that we detected across all 7 ETEC strains, 197 originated from plasmids and 39 of these had no sequence homology to proteins with described function. The annotated functions of a majority of the remaining 158 proteins could be divided into three groups: 1) plasmid partitioning, conjugation, replication, and maintenance, 2) regulation and production of mobile genetic elements, and 3) virulence and antibiotic resistance. The proteomic data yielded 77 known or putative virulence factors (Table S-13), 40 of which were plasmid-encoded, and a majority of the virulence factors were specific to one or two of the ETEC strains (32 and 24 proteins, respectively). Of the colonization factors that the sequenced ETEC strains were known to encode (Table 1), we identified the proteins associated with production of the Colonization Factor Antigen I (CFA/I; CfaABCE) in TW11681 and H10407, Coli Surface antigen 2 (CS2; CotABCD) and CS3 in TW10598, CS5 (CsfA) and CS6 (CssAB) in TW10722, and CS21 (LngABCDEFGHIJ) in TW10598 and TW11681. No known colonization factor was identified for strain H608, and all detected colonization factors, except CS2, were plasmid-encoded (Figure 2B). The A- and B-subunits of the heat-labile enterotoxin (LT; EltAB) were detected in TW10598, TW10828, and H10407; however, relative amounts of the individual LT subunits varied among the strains and chain B (EltB) appeared to be below the MS/MS detection limit in TW10828. Low levels of the H-antigen flagella tip-adhesin EtpA were detected in TW10598 and H10407, while all ETEC strains except TW10722 produced the EtpA-associated proteins EtpB and EtpC from the two-partner secretion locus (EtpBAC) (49). Another conserved ETEC virulence factor, an immunogenic secreted serine protease EatA (50), was detected in 5 of the strains (Figure 2B). 
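The strain distribution of the colonization factors and LT named above can be tabulated directly. A minimal sketch, assuming only the detections listed in this paragraph (the full set of 77 factors is in Table S-13); the variable and function names are illustrative.

```python
from collections import Counter

# Detections of colonization factors and LT as stated in the text above.
detections = {
    "CFA/I": {"TW11681", "H10407"},
    "CS2": {"TW10598"},
    "CS3": {"TW10598"},
    "CS5": {"TW10722"},
    "CS6": {"TW10722"},
    "CS21": {"TW10598", "TW11681"},
    "LT": {"TW10598", "TW10828", "H10407"},
}


def strains_per_factor(detections):
    """Histogram: number of strains -> number of factors found in that many strains."""
    return Counter(len(strains) for strains in detections.values())


def factors_per_strain(detections):
    """Invert the mapping: strain -> sorted list of factors detected in it."""
    by_strain = {}
    for factor, strains in detections.items():
        for strain in strains:
            by_strain.setdefault(strain, []).append(factor)
    return {strain: sorted(factors) for strain, factors in by_strain.items()}


histogram = strains_per_factor(detections)
by_strain = factors_per_strain(detections)
print(dict(histogram))
print(by_strain["TW10598"])
```

Strains without any listed factor, such as H608, simply do not appear as keys in the inverted mapping.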
TW11681 and H10407 produced the putative virulence factor CexE (51), whose in vivo expression has been described to be dependent on the virulence regulator CfaD (52). Two

isoforms of the CexE protein were exclusively detected in TW10598 and TW10722 (Table S-13). These strains do not harbour the cfaD gene, and the identification of CexE isoforms might therefore suggest alternative regulatory mechanisms for these proteins. The GTPase virulence factor LeoA (53) and the pore-forming toxin hemolysin E (HlyE) (54) were only found in H10407. Proteins associated with the type IV pilus were produced by strains TW11681 (PilLNQS) and H608 (PilLNOQSV) (Figure 2B). Although the pilus genes were also present in the TW10828 genome, the corresponding proteins could not be detected in TW10828, probably because of the plasmid loss described above. Other virulence factors were chromosomally encoded and included the previously mentioned H and K surface antigens (Fli and Kps proteins), an outer membrane protein YghG, which is essential for the assembly of the type II secretion system that facilitates secretion of LT (55), and YghJ, a broadly conserved E. coli metalloprotease that aids ETEC intestinal colonization by degrading the major protective mucins in the small intestine, and which is also associated with effective delivery of LT to the cell surface (56). In addition to the virulence factors associated with ETEC pathogenesis, we detected the production of 3 plasmid-encoded antibiotic resistance determinants: the tetracycline repressor protein (TW10598, TW10722, TW11681 and TW14425) (Figure 2A), aminoglycoside phosphotransferase StrA (TW11681 and TW14425), and streptomycin 3''-adenyltransferase AadA (TW10722). TW10722 and H608 also produced the chromosomally encoded streptogramin A acetyltransferase VatD.

Identification of ETEC-specific metabolic features

Results from genome-scale metabolic modelling have suggested that intestinal pathogenic E. coli have phenotypes that are distinguishable from those of commensal and extraintestinal E. coli (17). In order to identify traits that could be specific for the ETEC strains under in vitro

conditions, we statistically compared protein LFQ intensities between the strains, including the non-pathogenic E. coli BL21(DE3). ANOVA of the LFQ intensities for 1,431 proteins, which were quantified in all strains, indicated significant differences for 134 proteins, and 48 of these had differential levels between the ETEC strains and BL21(DE3). Table S-14 shows details of the ANOVA with distinction of individual biological replicates. Looking at the predicted function of the 48 proteins, together with functionally related proteins that did not yield LFQ scores for all strains, several were associated with amino acid metabolism. Levels of 8 enzymes participating in the biosynthesis of arginine from L-glutamate (ArgABCDIGH) were higher in BL21(DE3) than in the ETEC strains (Figure 3). There were also substantial differences in amounts of proteins involved in L-glutamate transport and the closely related acid resistance systems. BL21(DE3) displayed increased levels of the glutamate:sodium symporter GltS, and at the same time lower levels of the glutamate/aspartate ABC transporter (GltJKL), the glutamate decarboxylase and glutamic acid:4-aminobutyrate antiporter (GadABC) (Figure 3). BL21(DE3) is missing the gene for the RcsB transcriptional activator, which positively controls transcription of the glutamate-dependent acid resistance gad genes (57), and this could explain the lower Gad protein levels in BL21(DE3). Although TW10598 displayed slightly lower levels of the GadABC proteins than the other ETEC strains, it had the highest levels of the general stress sigma factor RpoS, which governs yet another but less effective E. coli acid resistance system (58). Among the proteins involved in the utilization of various substrates, BL21(DE3) showed substantially lower levels for 7 of the 11 detected Mal proteins (Figure 3), which are responsible for uptake and metabolism of the glucose polymers maltose and maltodextrin. 
On the other hand, proteins responsible for transport and metabolism of gluconate (GntKTU), melibiose

(MelAB) and galactitol (GatABCDYZ) were detected at higher levels in BL21(DE3) than in ETEC. The gat operon was present in all the sequenced ETEC strains, except for the closely related TW10598 and H10407 strains, where it seemed to have been replaced by genes for arabinitol and ribitol utilization. Compared to the other ETEC strains, BL21(DE3) and TW10598 displayed reduced levels of the Eut proteins, which are involved in the utilization of ethanolamine (Figure 3). Ethanolamine is an amino alcohol that enterohemorrhagic E. coli (EHEC) O157:H7 may use as a signalling molecule to identify a gastrointestinal environment and to trigger the production of virulence factors (59). BL21(DE3) had higher levels of the transcriptional factor BasR as well as some of the proteins controlled by BasR (60). The most relevant of these were from the arn operon (ArnABCDT), some of which could not be quantified in the ETEC strains (Figure 3). In agreement with the non-pathogenic characteristics of BL21(DE3), the strain had lower levels of several broadly conserved virulence factors (Figure 3). The most notable of these were the Tam membrane proteins (TamAB), implicated in virulence of several bacterial species (61), and the Ivy and MliC lysozyme inhibitors, which contribute to inhibiting host lysozyme activity (62). The expression of the ivy and mliC genes is under transcriptional control of the Rcs phosphorelay system described above, and the lack of the rcsB gene could play a role in the lower levels of these inhibitors in BL21(DE3). Of other proteins linked to E. coli virulence, all the ETEC strains produced more of the periplasmic chaperone DsbA, which is known to facilitate disulfide bond formation in LT and ST (63, 64). Iron availability is among the external signals that modulate expression of ETEC virulence genes (65). 
We found that the proteins involved in the biosynthesis of enterobactin (Ent), which strongly binds iron, and the corresponding CirA receptor, which binds the iron-enterobactin complex, were at lower levels in BL21(DE3) than in most of the ETEC

strains (Figure 3). An exception was TW11681, which also displayed low levels of some of the Ent proteins. In addition, four out of six detected proteins of the Fe-S cluster scaffold complex (SufABCDES), which is usually functional during adverse stress conditions such as iron starvation and oxidative stresses (66), had the lowest levels in BL21(DE3).
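The per-protein comparison described in this section rests on a one-way ANOVA of LFQ intensities across strains. As a minimal sketch of the underlying test statistic (not the software or settings used in this study), the F statistic for a single protein can be computed as follows. The log2 LFQ values in the example are hypothetical, and the conversion of F to a p-value and any multiple-testing correction are omitted.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Compute the one-way ANOVA F statistic.

    `groups` is a list of lists; each inner list holds the replicate
    measurements (e.g. log2 LFQ intensities) for one strain.
    """
    k = len(groups)
    values = [v for g in groups for v in g]
    n_total = len(values)
    grand_mean = mean(values)
    group_means = [mean(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the replicates around their own group mean.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    if ms_within == 0:
        # No replicate variation at all; the ratio is undefined, so
        # infinity is returned here as a convention of this sketch.
        return float("inf")
    return ms_between / ms_within


# Hypothetical log2 LFQ intensities for one protein in three strains,
# three biological replicates each.
example = [[24.0, 25.0, 26.0], [25.0, 26.0, 27.0], [28.0, 29.0, 30.0]]
f_value = one_way_anova_f(example)
print(round(f_value, 3))
```

In practice this calculation would be repeated for each of the proteins quantified in all strains, and the resulting p-values would need to be adjusted for the number of tests performed.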

DISCUSSION

ETEC represents a diverse group of pathogens within E. coli, and investigating several genetically distinct isolates is essential for learning more about the ETEC pathotype characteristics (14). In this study we employed high resolution LC-MS/MS to explore how genomic variation of 7 human ETEC strains from epidemiologically important lineages (Table 1) is reflected at the expressed proteome level. For each analysed ETEC strain, we detected on average 47% and 21% of the predicted chromosome- and plasmid-encoded proteome, respectively (Table 2). It is possible that the lower coverage of plasmid proteomes is a result of lower expression levels of plasmid-encoded genes in general, and/or a result of a lack of relevant stimuli in the in vitro culturing conditions. The nutrient-rich blood agar we have used might distantly approximate the host-secreted mucus in the gut (67), but it is unlikely to accurately reflect the conditions encountered when enteropathogens colonize the human small intestine. Further studies are needed to investigate the potential effects of biologically relevant stimuli such as varying pH or different chemicals (9, 20), and how they can affect the production of plasmid-encoded proteins.

The proteogenomic analysis improved genome annotations of the 5 ETEC strains in several ways. First, the protein dataset presents a proof of expression for over 170 genes that were annotated as encoding uncharacterized proteins (Table S-5). Second, by comparing actual peptide sequences with the predicted coding sequences we were able to identify and correct basecall and assembly errors in the underlying genome sequences. Finally, the comparison helped us to identify wrongly predicted translation start positions of nine genes as well as to identify 27 genes that were missing from the annotations (Table S-3).
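The start-site corrections mentioned above depend on finding peptides that map upstream of an annotated translation start. A minimal sketch of such a check is given below, assuming the upstream in-frame region has already been translated and prepended to the annotated protein. The sequences, the function name, and the simple first-match string search are illustrative assumptions rather than the pipeline used in this study.

```python
def peptides_upstream_of_start(extended_protein, annotated_start, peptides):
    """Return (peptide, position) pairs for peptides that begin upstream
    of the annotated start.

    extended_protein: translation of the upstream in-frame region
        followed by the annotated protein.
    annotated_start: 0-based index in `extended_protein` of the first
        residue of the annotated protein.
    peptides: iterable of MS/MS-identified peptide sequences.

    Only the first occurrence of each peptide is considered.
    """
    hits = []
    for peptide in peptides:
        position = extended_protein.find(peptide)
        if position != -1 and position < annotated_start:
            hits.append((peptide, position))
    return hits


# Hypothetical example: 7 upstream residues followed by the annotated protein.
upstream = "MKTAIAR"
annotated = "MSLLEK"
extended = upstream + annotated
observed = ["TAIAR", "MSLLEK", "QQQQ"]

evidence = peptides_upstream_of_start(extended, len(upstream), observed)
print(evidence)
```

A peptide reported by this check indicates that translation starts earlier than annotated; choosing the corrected start codon would still require inspection of the nucleotide sequence.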

Although much information about these strains can be obtained through comparative genomics analyses, quantitative proteomics analysis may provide additional valuable information. We found, for example, that even ETEC strains that are phylogenetically closely related (Table 1 and Figure S-1), often have considerable differences in protein levels when grown under the same conditions. This suggests that simple comparative genomic analyses alone may not be sufficient to accurately predict strain phenotypes. We also detected 66 outer membrane and 26 extracellular proteins produced by these strains, which may be suitable targets for the efforts to develop broadly protective vaccines against these pathogens. Finally, although not a focus of the present study, quantitative proteomics can also facilitate identification of coding sequences whose expression is likely to be co-regulated under specific conditions. Identifying other proteins that are co-produced with known virulence factors may, for example, give vaccine developers additional relevant vaccine target options. For the unsequenced clinical isolate H608, we detected approximately the same number of proteins as we did for each of the 6 ETEC strains that had been sequenced (Table 2). The low number of strain-specific H608 proteins (Table S-10) indicated that we have probably missed a few proteins because the MS/MS spectral search was performed without the H608 predicted proteome, since H608 has not been sequenced. Still, corresponding numbers of the identified proteins for all ETEC strains support the use of a specific protein database derived from a set of phylogenetically related, genome-sequenced strains for the MS/MS data interpretation of an unsequenced isolate. 
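The database strategy discussed above, in which spectra from an unsequenced isolate are searched against the proteomes of related sequenced strains, requires a merged, non-redundant sequence collection. One way to build such a collection is sketched below. The strain labels, protein identifiers, and sequences are hypothetical, and the databases actually used are described in Table S-1.

```python
def merge_proteomes(proteomes):
    """Merge per-strain proteomes into a non-redundant collection.

    proteomes: dict mapping strain name -> dict of protein id -> sequence.
    Returns a dict mapping each unique sequence to the list of
    (strain, protein id) pairs in which it occurs.
    """
    merged = {}
    for strain, proteins in proteomes.items():
        for protein_id, sequence in proteins.items():
            merged.setdefault(sequence, []).append((strain, protein_id))
    return merged


def to_fasta(merged):
    """Write the merged collection as FASTA text, one entry per unique
    sequence, with all source identifiers joined in the header."""
    lines = []
    for sequence, sources in merged.items():
        header = "|".join(f"{strain}:{pid}" for strain, pid in sources)
        lines.append(f">{header}")
        lines.append(sequence)
    return "\n".join(lines)


# Hypothetical proteomes of two related, sequenced strains.
proteomes = {
    "strain_A": {"A_0001": "MKTAYIAK", "A_0002": "MSLLEKR"},
    "strain_B": {"B_0001": "MKTAYIAK", "B_0002": "MNNQWPR"},
}
merged = merge_proteomes(proteomes)
fasta = to_fasta(merged)
print(fasta)
```

Collapsing identical sequences keeps the search space small, while the joined headers preserve which strains contributed each entry.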
We identified a wide range of proteins and protein isoforms that are responsible for producing the serologically important lipopolysaccharide (O) and flagellar (H) surface antigens, which was in agreement with the variety of the ETEC strains’ predicted serotypes (Table 1). The differences

in relative levels of some of these proteins and their isoforms (Figure 2A) indicate that the ETEC strains can differentially control expression of the corresponding genes, and thus add further levels of diversity to their antigenic profile. The ETEC strains also differed in the number of detected Fli proteins, which produce the bacterial flagellum that helps to propel the bacteria through the intestinal mucosal barrier and contributes to adhesion to the intestinal cell wall (68) (Figure S-3). However, these differences may be an effect of the growth conditions, and not necessarily an indication of difference in abilities to colonize the intestine. For the TW10598 strain we detected proteins involved in the biosynthesis of capsular polysaccharides (Figure S-4), which is an interesting feature that is not typical for human ETEC and which calls for further investigation. The production of the K antigens is common in E. coli associated with upper urinary tract infections, where the capsular polysaccharide protects these uropathogenic E. coli against the host’s cell-mediated immune response (69).

One of the main characteristics of ETEC is the presence of the seemingly unstable virulence plasmids. The investigated ETEC strains had on average 3 plasmids, and most of the strains harboured both large (> 50 kb) and small (< 10 kb) plasmids (Table S-12). Sudden loss of plasmid virulence genes is a known challenge when working with ETEC (70, 71), and is usually only reported when the strain stops producing the ETEC toxins or colonization factors. The apparent loss of a 60 kb plasmid from strain TW10828 would probably have gone undetected without the integration of data obtained from genomics, proteomics and experimental plasmid characterization, especially since it does not seem to involve typical ETEC virulence factors. 
A cluster of 13 pil genes, which produce a type IV pilus, was part of the missing plasmid sequences in TW10828, and expression of some of these genes was detected for TW11681 and H608. Type IV pilus genes are known to be present on plasmids of other E. coli pathotypes (e.g. shiga-toxigenic E. coli (72), enteroaggregative E. coli (73) and avian pathogenic E. coli (74)), which suggests that they may also be involved in ETEC virulence.

The relatively high number and heterogeneous nature of detected virulence factors (Table S-13) illustrate that ETEC do not rely on a single virulence factor, but rather on a repertoire of proteins that assist with various steps during the infection process, ranging from cell adhesion to toxin production and delivery. Although over half of the detected virulence factors were encoded on plasmids (Figure 1), genes for several known and putative virulence factors were also encoded on the chromosomes, further supporting the notion that ETEC pathogenicity depends on the interplay between chromosomally- and plasmid-encoded proteins.

By comparing relative protein levels between the ETEC strains and the non-pathogenic E. coli B strain BL21(DE3) we identified several potential ETEC-specific traits. BL21(DE3) is a laboratory-adapted strain that might not exactly reflect wild-type commensal human E. coli; however, it still possesses the majority of their characteristics. For example, all ETEC strains displayed consistently higher levels of the acid stress chaperone HdeB and the Gad proteins, which have a crucial role in glutamate-dependent acid resistance, an important virulence property of enteric bacteria (58). On the other hand, the LFQ analysis indicated that arginine biosynthesis is upregulated in BL21(DE3) under the blood agar condition. Biosynthesis of this amino acid is important under oxidative stress conditions, since arginine is a precursor for the synthesis of polyamines, including putrescine and spermidine, which reduce oxidative damage to proteins and DNA (75). E. coli grown on solid media, rather than in liquid culture, tend to produce relatively high levels of proteins that protect against oxidative and osmotic stress (76), which was also observed in the present study. 
Relative levels of some of these proteins differed

considerably between the strains (Table S-14); however, there was not a clear difference between BL21(DE3) and the ETEC strains. The comparison of protein levels also pointed out differences in several metabolic processes between BL21(DE3) and the ETEC strains. The ETEC strains showed upregulation of the maltose and maltodextrin metabolism, while BL21(DE3) had higher levels of proteins involved in uptake and metabolism of melibiose, galactitol and gluconate. Gluconate is metabolized primarily via the Entner-Doudoroff (ED) pathway that contributes to the pathogenicity of other enteric pathogens (77, 78). Our results indicate that the ED pathway might be a less important energy-forming pathway for ETEC during in vitro aerobic growth. Levels of proteins encoded by the arn operon, which has been suggested to be under the control of the BasSR two-component system (60), were considerably reduced in the ETEC strains, possibly due to low levels of the BasR regulator. The Arn proteins specifically modify the lipid A moieties of the lipopolysaccharides, resulting in increased resistance to cationic antimicrobial peptides such as polymyxin (79). These modifications are closely related to bacterial virulence and it was therefore surprising to find the Arn proteins upregulated in the non-pathogenic E. coli strain. In a previous study, the uropathogenic E. coli strain 536 also showed increased production of these proteins when grown on blood agar, compared to cultivation on lactose agar or in blood culture (76). In addition, an earlier study on E. coli K12 has suggested that the BasSR regulatory system can be up-regulated in the presence of antimicrobial peptides isolated from bovine neutrophils (80), and a tenable explanation for the increased levels of Arn and BasR proteins in BL21(DE3) could therefore be the presence of similar antimicrobial peptides in the sheep blood agar. 
Why this would have a larger effect on the BasR production in BL21(DE3) than in the ETEC strains needs further investigation.

Altogether, the comparison of relative protein levels between ETEC and BL21(DE3) indicated that the ETEC strains share certain metabolic functions that can be favourable, for example, for successful gut colonization. In addition, the ETEC strains produced considerably higher levels of proteins that can generally enhance bacterial pathogenicity. These characteristics appeared to be shared by most of the ETEC strains, although we also observed differences between individual ETEC strains in the amounts of specific metabolic proteins, further highlighting the heterogeneity of these enteric pathogens. We believe that the results and the proteomic data generated in this study, which are available in the proteomics identifications database PRIDE (46) under accession number PXD005259, will be of use not only for further functional studies of ETEC virulence factors, but generally for the continued work to improve the understanding of how these enteric pathogens function.

ASSOCIATED CONTENT

Supporting Information

The following files are available free of charge at the ACS website http://pubs.acs.org:

File “Supporting Information 1.pdf” contains Figure S-1 (Phylogenetic tree of strains used in the study and other reference E. coli and Shigella strains), Figure S-2 (Quantitative description of the detected proteomes), Figure S-3 (Mapping of MS/MS-detected peptides to chromosomal DNA sequences of six ETEC strains that contain genes involved in lipo- and exopolysaccharides biosynthesis), Figure S-4 (Detection of proteins encoded by the fli operon), Figure S-5 (Mapping of MS/MS-detected peptides to chromosomal DNA sequence of the TW10598 strain that contains genes involved in polysialic acid biosynthesis and transport), Figure S-6 (Characterization of low-molecular weight ETEC plasmids), Figure S-7 (Characterization of high-molecular weight ETEC plasmids), Table S-1 (ETEC-specific protein databases used in MS/MS data searches), Table S-7 (Distribution of detected proteins according to the cellular localization), and Table S-12 (Determination of the number and sizes of ETEC plasmids).

File “Supporting Information 2.xls” contains Table S-3 (Proteogenomic analysis of 5 ETEC strains), Table S-4 (Peptide sequences detected at 1% FDR), Table S-5 (Proteins detected at 1% FDR), Table S-6 (Pearson coefficients describing correlation of proteins identifications and LFQ intensities between replicates), Table S-8 (Functional annotation of proteins in Table S-5), Table S-9 (LFQ intensities of proteins in Table S-5), Table S-10 (Detected strain-exclusive proteins), Table S-11 (Identified alternative protein isoforms), Table S-13 (Detection of known or putative ETEC virulence determinants) and Table S-14 (Proteins with significantly differential abundance between 7 ETEC strains and BL21 as determined by ANOVA).

File “Supporting Information 3.xls” contains Table S-2 (Peptide evidence and spectra of single-peptide-based protein identifications).

AUTHOR INFORMATION

*Corresponding author: e-mail: [email protected], phone: +47-55974650

#Present Address: Department of Pediatrics, University Hospital of North Norway, Tromsø, Norway; Department of Clinical Medicine, UiT The Arctic University of Norway, Tromsø, Norway

Author Contributions

VKP and HGW designed the study, VKP acquired data, VKP and HS performed analysis and interpretation of data, VKP drafted the manuscript and all authors critically reviewed the text.

Funding Sources

This work was funded by Norwegian Research Council, project grant 204743, awarded to HGW. The assembly and annotation of ETEC strains TW10598, TW10722, TW10828, TW11681 and TW14425 was supported in part by funding from the European Commission Seventh Framework Programme (Grant Agreement No 261472), and from the Global Health and Vaccination (GLOBVAC) Research Program under Research Council of Norway contracts 185872/S50 and 234364/H10. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The authors have declared that no competing interests exist.

ACKNOWLEDGMENT

We thank Ian R. Henderson, University of Birmingham, for kindly providing ETEC strain H10407. We are also grateful to Marit Gjerde Tellevik (Department of Clinical Science, University of Bergen) for providing E. coli 39R861 (NCTC 50192). We thank Sonja Ljosveit (Department of Clinical Science, University of Bergen) for excellent technical assistance. We further acknowledge the Proteomic Unit at the University of Bergen, and particularly Olav Mjaavatten, for the service and support for the LC-MS/MS experiments. We thank David W. Lacher at the Division of Molecular Biology, U.S. FDA, for generating the optical maps, and David A. Rasko at the Institute of Genome Sciences, University of Maryland School of Medicine, for the initial paired-end sequencing reads. The repeat sequencing service was provided by the Norwegian Sequencing Centre (www.sequencing.uio.no), a national technology platform hosted by the University of Oslo and supported by the “Functional Genomics” and “Infrastructure” programs of the Research Council of Norway and the Southeastern Regional Health Authorities.

ABBREVIATIONS

ANOVA - Analysis of variance
ACN - Acetonitrile
FA - Formic acid
LC-MS/MS - Liquid chromatography-tandem mass spectrometry

REFERENCES

1. Estrada-Garcia, T.; Lopez-Saucedo, C.; Thompson-Bonilla, R.; Abonce, M.; Lopez-Hernandez, D.; Santos, J. I.; Rosado, J. L.; DuPont, H. L.; Long, K. Z., Association of diarrheagenic Escherichia coli pathotypes with infection and diarrhea among Mexican children and association of atypical enteropathogenic E. coli with acute diarrhea. J. Clin. Microbiol. 2009, 47, (1), 93-98.
2. Dutta, S.; Guin, S.; Ghosh, S.; Pazhani, G. P.; Rajendran, K.; Bhattacharya, M. K.; Takeda, Y.; Nair, G. B.; Ramamurthy, T., Trends in the prevalence of diarrheagenic Escherichia coli among hospitalized diarrheal patients in Kolkata, India. PLoS One 2013, 8, (2), e56068.
3. Gonzales, L.; Joffre, E.; Rivera, R.; Sjöling, Å.; Svennerholm, A.-M.; Iñiguez, V., Prevalence, seasonality and severity of disease caused by pathogenic Escherichia coli in children with diarrhoea in Bolivia. J. Med. Microbiol. 2013, 62, (11), 1697-1706.
4. Liu, L.; Oza, S.; Hogan, D.; Perin, J.; Rudan, I.; Lawn, J. E.; Cousens, S.; Mathers, C.; Black, R. E., Global, regional, and national causes of child mortality in 2000–13, with projections to inform post-2015 priorities: an updated systematic analysis. Lancet 2015, 385, (9966), 430-440.
5. Qadri, F.; Svennerholm, A.-M.; Faruque, A. S. G.; Sack, R. B., Enterotoxigenic Escherichia coli in developing countries: epidemiology, microbiology, clinical features, treatment, and prevention. Clin. Microbiol. Rev. 2005, 18, (3), 465-483.
6. Haycocks, J. R. J.; Sharma, P.; Stringer, A. M.; Wade, J. T.; Grainger, D. C., The molecular basis for control of ETEC enterotoxin expression in response to environment and host. PLoS Pathog. 2015, 11, (1), e1004605.
7. Joffré, E.; von Mentzer, A.; Abd El Ghany, M.; Oezguen, N.; Savidge, T.; Dougan, G.; Svennerholm, A.-M.; Sjöling, Å., Allele variants of enterotoxigenic Escherichia coli heat-labile toxin are globally transmitted and associated with colonization factors. J. Bacteriol. 2015, 197, (2), 392-403.
8. Lasaro, M. A.; Rodrigues, J. F.; Mathias-Santos, C.; Guth, B. E. C.; Balan, A.; Sbrogio-Almeida, M. E.; Ferreira, L. C. S., Genetic diversity of heat-labile toxin expressed by enterotoxigenic Escherichia coli strains isolated from humans. J. Bacteriol. 2008, 190, (7), 2400-2410.
9. Joffré, E.; von Mentzer, A.; Svennerholm, A.-M.; Sjöling, Å., Identification of new heat-stable (STa) enterotoxin allele variants produced by human enterotoxigenic Escherichia coli (ETEC). Int. J. Med. Microbiol. 2016, 306, (7), 586-594.
10. Turner, S. M.; Chaudhuri, R. R.; Jiang, Z.-D.; DuPont, H.; Gyles, C.; Penn, C. W.; Pallen, M. J.; Henderson, I. R., Phylogenetic comparisons reveal multiple acquisitions of the toxin genes by enterotoxigenic Escherichia coli strains of different evolutionary lineages. J. Clin. Microbiol. 2006, 44, (12), 4528-4536.
11. Sahl, J. W.; Steinsland, H.; Redman, J. C.; Angiuoli, S. V.; Nataro, J. P.; Sommerfelt, H.; Rasko, D. A., A comparative genomic analysis of diverse clonal types of enterotoxigenic Escherichia coli reveals pathovar-specific conservation. Infect. Immun. 2011, 79, (2), 950-960.
12. Wolf, M. K., Occurrence, distribution, and associations of O and H serogroups, colonization factor antigens, and toxins of enterotoxigenic Escherichia coli. Clin. Microbiol. Rev. 1997, 10, (4), 569-584.
13. Svennerholm, A.-M.; Lundgren, A., Recent progress toward an enterotoxigenic Escherichia coli vaccine. Expert Rev. Vaccines 2012, 11, (4), 495-507.

14. von Mentzer, A.; Connor, T. R.; Wieler, L. H.; Semmler, T.; Iguchi, A.; Thomson, N. R.; Rasko, D. A.; Joffre, E.; Corander, J.; Pickard, D.; Wiklund, G.; Svennerholm, A.-M.; Sjoling, A.; Dougan, G., Identification of enterotoxigenic Escherichia coli (ETEC) clades with long-term global distribution. Nat. Genet. 2014, 46, (12), 1321-1326.
15. Steinsland, H.; Lacher, D. W.; Sommerfelt, H.; Whittam, T. S., Ancestral lineages of human enterotoxigenic Escherichia coli. J. Clin. Microbiol. 2010, 48, (8), 2916-2924.
16. Nada, R. A.; Shaheen, H. I.; Khalil, S. B.; Mansour, A.; El-Sayed, N.; Touni, I.; Weiner, M.; Armstrong, A. W.; Klena, J. D., Discovery and phylogenetic analysis of novel members of class b enterotoxigenic Escherichia coli adhesive fimbriae. J. Clin. Microbiol. 2011, 49, (4), 1403-1410.
17. Monk, J. M.; Charusanti, P.; Aziz, R. K.; Lerman, J. A.; Premyodhin, N.; Orth, J. D.; Feist, A. M.; Palsson, B. Ø., Genome-scale metabolic reconstructions of multiple Escherichia coli strains highlight strain-specific adaptations to nutritional environments. Proc. Natl. Acad. Sci. U. S. A. 2013, 110, (50), 20338-20343.
18. Mc Ginty, S. E.; Rankin, D. J., The evolution of conflict resolution between plasmids and their bacterial hosts. Evolution 2012, 66, (5), 1662-70.
19. Bodero, M. D.; Munson, G. P., Cyclic AMP receptor protein-dependent repression of heat-labile enterotoxin. Infect. Immun. 2009, 77, (2), 791-798.
20. Gonzales, L.; Ali, Z. B.; Nygren, E.; Wang, Z.; Karlsson, S.; Zhu, B.; Quiding-Järbrink, M.; Sjöling, Å., Alkaline pH Is a signal for optimal production and secretion of the heat labile toxin, LT in enterotoxigenic Escherichia coli (ETEC). PLoS One 2013, 8, (9), e74069.
21. Sahl, J. W.; Rasko, D. A., Analysis of global transcriptional profiles of enterotoxigenic Escherichia coli isolate E24377A. Infect. Immun. 2012, 80, (3), 1232-1242.
22. Jeong, H.; Barbe, V.; Lee, C. H.; Vallenet, D.; Yu, D. S.; Choi, S.-H.; Couloux, A.; Lee, S.-W.; Yoon, S. H.; Cattolico, L.; Hur, C.-G.; Park, H.-S.; Ségurens, B.; Kim, S. C.; Oh, T. K.; Lenski, R. E.; Studier, F. W.; Daegelen, P.; Kim, J. F., Genome Sequences of Escherichia coli B strains REL606 and BL21(DE3). J. Mol. Biol. 2009, 394, (4), 644-652.
23. Steinsland, H.; Valentiner-Branth, P.; Gjessing, H. K.; Aaby, P.; Mølbak, K.; Sommerfelt, H., Protection from natural infections with enterotoxigenic Escherichia coli: longitudinal study. Lancet 2003, 362, (9380), 286-291.
24. Crossman, L. C.; Chaudhuri, R. R.; Beatson, S. A.; Wells, T. J.; Desvaux, M.; Cunningham, A. F.; Petty, N. K.; Mahon, V.; Brinkley, C.; Hobman, J. L.; Savarino, S. J.; Turner, S. M.; Pallen, M. J.; Penn, C. W.; Parkhill, J.; Turner, A. K.; Johnson, T. J.; Thomson, N. R.; Smith, S. G. J.; Henderson, I. R., A commensal gone bad: Complete genome sequence of the prototypical enterotoxigenic Escherichia coli strain H10407. J. Bacteriol. 2010, 192, (21), 5822-5831.
25. Pettersen, V. K.; Steinsland, H.; Wiker, H. G., Improving genome annotation of enterotoxigenic Escherichia coli TW10598 by a label-free quantitative MS/MS approach. Proteomics 2015, 15, (22), 3826-34.
26. Kislyuk, A. O.; Katz, L. S.; Agrawal, S.; Hagen, M. S.; Conley, A. B.; Jayaraman, P.; Nelakuditi, V.; Humphrey, J. C.; Sammons, S. A.; Govil, D.; Mair, R. D.; Tatti, K. M.; Tondella, M. L.; Harcourt, B. H.; Mayer, L. W.; Jordan, I. K., A computational genomics pipeline for prokaryotic sequencing projects. Bioinformatics 2010, 26, (15), 1819-1826.
27. Carver, T.; Berriman, M.; Tivey, A.; Patel, C.; Böhme, U.; Barrell, B. G.; Parkhill, J.; Rajandream, M.-A., Artemis and ACT: viewing, annotating and comparing sequences stored in a relational database. Bioinformatics 2008, 24, (23), 2672-2676.

31 Environment ACS Paragon Plus

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 32 of 43

28. Barton, B. M.; Harding, G. P.; Zuccarelli, A. J., A general method for detecting and sizing large plasmids. Anal. Biochem. 1995, 226, (2), 235-240. 29. Clermont, O.; Christenson, J. K.; Denamur, E.; Gordon, D. M., The Clermont Escherichia coli phylo-typing method revisited: improvement of specificity and detection of new phylo-groups. Environ. Microbiol. Rep. 2013, 5, (1), 58-65. 30. Reid, S. D.; Herbelin, C. J.; Bumbaugh, A. C.; Selander, R. K.; Whittam, T. S., Parallel evolution of virulence in pathogenic Escherichia coli. Nature 2000, 406, (6791), 64-67. 31. Larsen, M. V.; Cosentino, S.; Rasmussen, S.; Friis, C.; Hasman, H.; Marvig, R. L.; Jelsbak, L.; Sicheritz-Pontén, T.; Ussery, D. W.; Aarestrup, F. M.; Lund, O., Multilocus sequence typing of total-genome-sequenced bacteria. J. Clin. Microbiol. 2012, 50, (4), 13551361. 32. Joensen, K. G.; Tetzschner, A. M. M.; Iguchi, A.; Aarestrup, F. M.; Scheutz, F., Rapid and easy in silico serotyping of Escherichia coli isolates by use of whole-genome sequencing data. J. Clin. Microbiol. 2015, 53, (8), 2410-2426. 33. Rodas, C.; Iniguez, V.; Qadri, F.; Wiklund, G.; Svennerholm, A.-M.; Sjöling, Å., Development of multiplex PCR assays for detection of enterotoxigenic Escherichia coli colonization factors and toxins. J. Clin. Microbiol. 2009, 47, (4), 1218-1220. 34. Wiśniewski, J. R.; Mann, M., Consecutive proteolytic digestion in an enzyme reactor increases depth of proteomic and phosphoproteomic analysis. Anal. Chem. 2012, 84, (6), 26312637. 35. Cox, J.; Mann, M., MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 2008, 26, (12), 1367-72. 36. Cox, J.; Hein, M. Y.; Luber, C. A.; Paron, I.; Nagaraj, N.; Mann, M., Accurate proteomewide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Mol. Cell. Proteomics 2014, 13, (9), 2513-2526. 37. Peterson, E. S.; McCue, L. 
A.; Schrimpe-Rutledge, A. C.; Jensen, J. L.; Walker, H.; Kobold, M. A.; Webb, S. R.; Payne, S. H.; Ansong, C.; Adkins, J. N.; Cannon, W. R.; WebbRobertson, B.-J. M., VESPA: software to facilitate genomic annotation of prokaryotic organisms through integration of proteomic and transcriptomic data. BMC Genomics 2012, 13, (1), 1-13. 38. Tyanova, S.; Temu, T.; Sinitcyn, P.; Carlson, A.; Hein, M. Y.; Geiger, T.; Mann, M.; Cox, J., The Perseus computational platform for comprehensive analysis of (prote)omics data. Nat. Methods 2016, 13, (9), 731-740. 39. Tusher, V. G.; Tibshirani, R.; Chu, G., Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl. Acad. Sci. U. S. A. 2001, 98, (9), 5116-5121. 40. Geiger, T.; Velic, A.; Macek, B.; Lundberg, E.; Kampf, C.; Nagaraj, N.; Uhlen, M.; Cox, J.; Mann, M., Initial quantitative proteomic map of 28 mouse tissues using the SILAC mouse. Mol. Cell. Proteomics 2013, 12, (6), 1709-1722. 41. Huang, D. W.; Sherman, B. T.; Lempicki, R. A., Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protocols 2008, 4, (1), 44-57. 42. Szklarczyk, D.; Franceschini, A.; Wyder, S.; Forslund, K.; Heller, D.; Huerta-Cepas, J.; Simonovic, M.; Roth, A.; Santos, A.; Tsafou, K. P.; Kuhn, M.; Bork, P.; Jensen, L. J.; von Mering, C., STRING v10: protein–protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 2015, 43, (Database issue), D447-D452. 43. Keseler, I. M.; Mackie, A.; Peralta-Gil, M.; Santos-Zavaleta, A.; Gama-Castro, S.; Bonavides-Martínez, C.; Fulcher, C.; Huerta, A. M.; Kothari, A.; Krummenacker, M.;

32 Environment ACS Paragon Plus

Page 33 of 43

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

Latendresse, M.; Muñiz-Rascado, L.; Ong, Q.; Paley, S.; Schröder, I.; Shearer, A. G.; Subhraveti, P.; Travers, M.; Weerasinghe, D.; Weiss, V.; Collado-Vides, J.; Gunsalus, R. P.; Paulsen, I.; Karp, P. D., EcoCyc: fusing model organism databases with systems biology. Nucleic Acids Res. 2013, 41, (Database issue), D605-D612. 44. Magrane, M.; Consortium, U., UniProt Knowledgebase: a hub of integrated protein data. Database 2011, 2011, bar009. 45. Yu, N. Y.; Wagner, J. R.; Laird, M. R.; Melli, G.; Rey, S.; Lo, R.; Dao, P.; Sahinalp, S. C.; Ester, M.; Foster, L. J.; Brinkman, F. S. L., PSORTb 3.0: improved protein subcellular localization prediction with refined localization subcategories and predictive capabilities for all prokaryotes. Bioinformatics 2010, 26, (13), 1608-1615. 46. Vizcaíno, J. A.; Côté, R. G.; Csordas, A.; Dianes, J. A.; Fabregat, A.; Foster, J. M.; Griss, J.; Alpi, E.; Birim, M.; Contell, J.; O’Kelly, G.; Schoenegger, A.; Ovelleiro, D.; Pérez-Riverol, Y.; Reisinger, F.; Ríos, D.; Wang, R.; Hermjakob, H., The Proteomics Identifications (PRIDE) database and associated tools: status in 2013. Nucleic Acids Res. 2013, 41, (D1), D1063-D1069. 47. Smith, K. D.; Andersen-Nissen, E.; Hayashi, F.; Strobe, K.; Bergman, M. A.; Barrett, S. L. R.; Cookson, B. T.; Aderem, A., Toll-like receptor 5 recognizes a conserved site on flagellin required for protofilament formation and bacterial motility. Nat. Immunol. 2003, 4, (12), 12471253. 48. Andreishcheva, E. N.; Vann, W. F., Gene products required for de novo synthesis of polysialic acid in Escherichia coli K1. J. Bacteriol. 2006, 188, (5), 1786-1797. 49. Fleckenstein, J. M.; Roy, K.; Fischer, J. F.; Burkitt, M., Identification of a two-partner secretion locus of enterotoxigenic Escherichia coli. Infect. Immun. 2006, 74, (4), 2245-2258. 50. Kumar, P.; Luo, Q.; Vickers, T. J.; Sheikh, A.; Lewis, W. G.; Fleckenstein, J. 
M., EatA, an immunogenic protective antigen of enterotoxigenic Escherichia coli, degrades intestinal mucin. Infect. Immun. 2014, 82, (2), 500-508. 51. Roy, K.; Hamilton, D. J.; Munson, G. P.; Fleckenstein, J. M., Outer membrane vesicles induce immune responses to virulence proteins and protect against colonization by enterotoxigenic Escherichia coli. Clin. Vaccine Immunol. 2011, 18, (11), 1803-1808. 52. Pilonieta, M. C.; Bodero, M. D.; Munson, G. P., CfaD-dependent expression of a novel extracytoplasmic protein from enterotoxigenic Escherichia coli. J. Bacteriol. 2007, 189, (14), 5060-5067. 53. Michie, K. A.; Boysen, A.; Low, H. H.; Møller-Jensen, J.; Löwe, J., LeoA, B and C from enterotoxigenic Escherichia coli (ETEC) ae bacterial dynamins. PLoS One 2014, 9, (9), e107211. 54. Ludwig, A.; von Rhein, C.; Bauer, S.; Hüttinger, C.; Goebel, W., Molecular analysis of cytolysin A (ClyA) in pathogenic Escherichia coli strains. J. Bacteriol. 2004, 186, (16), 53115320. 55. Strozen, T. G.; Li, G.; Howard, S. P., YghG (GspS(β)) is a novel pilot protein required for localization of the GspS(β) type II secretion system secretin of enterotoxigenic Escherichia coli. Infect. Immun. 2012, 80, (8), 2608-2622. 56. Luo, Q.; Kumar, P.; Vickers, T. J.; Sheikh, A.; Lewis, W. G.; Rasko, D. A.; Sistrunk, J.; Fleckenstein, J. M., Enterotoxigenic Escherichia coli secretes a highly conserved mucindegrading metalloprotease to effectively engage intestinal epithelial cells. Infect. Immun. 2014, 82, (2), 509-521.

33 Environment ACS Paragon Plus

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 34 of 43

57. Castanié-Cornet, M.-P.; Treffandier, H.; Francez-Charlot, A.; Gutierrez, C.; Cam, K., The glutamate-dependent acid resistance system in Escherichia coli: essential and dual role of the His–Asp phosphorelay RcsCDB/AF. Microbiology 2007, 153, (1), 238-246. 58. Foster, J. W., Escherichia coli acid resistance: tales of an amateur acidophile. Nat. Rev. Microbiol. 2004, 2, (11), 898-907. 59. Luzader, D. H.; Clark, D. E.; Gonyar, L. A.; Kendall, M. M., EutR Is a direct regulator of genes that contribute to metabolism and virulence in enterohemorrhagic Escherichia coli O157:H7. J. Bacteriol. 2013, 195, (21), 4947-4953. 60. Ogasawara, H.; Shinohara, S.; Yamamoto, K.; Ishihama, A., Novel regulation targets of the metal-response BasS–BasR two-component system of Escherichia coli. Microbiology 2012, 158, (6), 1482-1492. 61. Selkrig, J.; Belousoff, M. J.; Headey, S. J.; Heinz, E.; Shiota, T.; Shen, H.-H.; Beckham, S. A.; Bamert, R. S.; Phan, M.-D.; Schembri, M. A.; Wilce, M. C. J.; Scanlon, M. J.; Strugnell, R. A.; Lithgow, T., Conserved features in TamA enable interaction with TamB to drive the activity of the translocation and assembly module. Sci. Rep. 2015, 5, 12905. 62. Deckers, D.; Masschalck, B.; Aertsen, A.; Callewaert, L.; Van Tiggelen, C. G. M.; Atanassova, M.; Michiels, C. W., Periplasmic lysozyme inhibitor contributes to lysozyme resistance in Escherichia coli. Cell. Mol. Life Sci. 2004, 61, (10), 1229-1237. 63. Okamoto, K.; Yamanaka, H.; Takeji, M.; Fujii, Y., Region of heat-stable enterotoxin II of Escherichia coli Involved in translocation across the outer membrane. Microbiol. Immunol. 2001, 45, (5), 349-355. 64. Wülfing, C.; Rappuoli, R., Efficient production of heat-labile enterotoxin mutant proteins by overexpression of dsbA in a degP-deficient Escherichia coli strain. Arch. Microbiol. 1997, 167, (5), 280-283. 65. 
Haines, S.; Arnaud-Barbe, N.; Poncet, D.; Reverchon, S.; Wawrzyniak, J.; Nasser, W.; Renauld-Mongénie, G., IscR regulates synthesis of colonization factor antigen I fimbriae in response to iron starvation in enterotoxigenic Escherichia coli. J. Bacteriol. 2015, 197, (18), 2896-2907. 66. Loiseau, L.; Ollagnier-de-Choudens, S.; Nachin, L.; Fontecave, M.; Barras, F., Biogenesis of Fe-S cluster by the bacterial Suf system: SufS and SufE form a new type of cysteine desulfurase. J. Biol. Chem. 2003, 278, (40), 38352-38359. 67. Atuma, C.; Strugala, V.; Allen, A.; Holm, L., The adherent gastrointestinal mucus gel layer: thickness and physical state in vivo. Am. J. Physiol. Gastrointest. Liver Physiol. 2001, 280, (5), G922-G929. 68. Roy, K.; Hilliard, G. M.; Hamilton, D. J.; Luo, J.; Ostmann, M. M.; Fleckenstein, J. M., Enterotoxigenic Escherichia coli EtpA mediates adhesion between flagella and host cells. Nature 2009, 457, (7229), 594-598. 69. Cress, B. F.; Englaender, J. A.; He, W.; Kasper, D.; Linhardt, R. J.; Koffas, M. A. G., Masquerading microbial pathogens: capsular polysaccharides mimic host-tissue molecules. FEMS Microbiol. Rev. 2014, 38, (4), 660-697. 70. Echeverria, P.; Seriwatana, J.; Taylor, D. N.; Changchawalit, S.; Smyth, C. J.; Twohig, J.; Rowe, B., Plasmids coding for colonization factor antigens I and II, heat-labile enterotoxin, and heat-stable enterotoxin A2 in Escherichia coli. Infect. Immun. 1986, 51, (2), 626-630. 71. Tobias, J.; Von Mentzer, A.; Loayza Frykberg, P.; Aslett, M.; Page, A. J.; Sjöling, Å.; Svennerholm, A.-M., Stability of the encoding plasmids and surface expression of CS6 differs in

34 Environment ACS Paragon Plus

Page 35 of 43

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

enterotoxigenic Escherichia coli (ETEC) encoding different heat-stable (ST) enterotoxins (STh and STp). PLoS One 2016, 11, (4), e0152899. 72. Srimanote, P.; Paton, A. W.; Paton, J. C., Characterization of a novel type IV pilus locus encoded on the large plasmid of locus of enterocyte effacement-negative shiga-toxigenic Escherichia coli strains that are virulent for humans. Infect. Immun. 2002, 70, (6), 3094-3100. 73. Dudley, E. G.; Abe, C.; Ghigo, J.-M.; Latour-Lambert, P.; Hormazabal, J. C.; Nataro, J. P., An IncI1 Plasmid contributes to the adherence of the atypical enteroaggregative Escherichia coli strain C1096 to cultured cells and abiotic surfaces. Infect. Immun. 2006, 74, (4), 2102-2114. 74. Gophna, U.; Parket, A.; Hacker, J.; Ron, E. Z., A novel ColV plasmid encoding type IV pili. Microbiology 2003, 149, (1), 177-184. 75. Khan, A. U.; Di Mascio, P.; Medeiros, M. H.; Wilson, T., Spermine and spermidine protection of plasmid DNA against single-strand breaks induced by singlet oxygen. Proc. Natl. Acad. Sci. U. S. A. 1992, 89, (23), 11428-11430. 76. Pettersen, V. K.; Mosevoll, K. A.; Lindemann, P. C.; Wiker, H. G., Coordination of metabolism and virulence factors expression of extraintestinal pathogenic Escherichia coli purified from blood cultures of patients with sepsis. Mol. Cell. Proteomics 2016, 15, (9), 28902907. 77. Patra, T.; Koley, H.; Ramamurthy, T.; Ghose, A. C.; Nandy, R. K., The Entner-Doudoroff Pathway Is Obligatory for Gluconate Utilization and Contributes to the Pathogenicity of Vibrio cholerae. J. Bacteriol. 2012, 194, (13), 3377-3385. 78. Pradel, E.; LemaÓtre, N.; Merchez, M.; Ricard, I.; Reboul, A.; Dewitte, A.; Sebbane, F., New Insights into how Yersinia pestis adapts to its mammalian host during bubonic plague. PLoS Pathog. 2014, 10, (3), e1004029. 79. Breazeale, S. D.; Ribeiro, A. A.; Raetz, C. R. 
H., Oxidative decarboxylation of UDPglucuronic acid in extracts of polymyxin-resistant Escherichia coli: Origin of lipid A species modified with 4-amino-4-deoxy-L-arabinose. J. Biol. Chem. 2002, 277, (4), 2886-2896. 80. Tomasinsig, L.; Scocchi, M.; Mettulio, R.; Zanetti, M., Genome-wide transcriptional profiling of the Escherichia coli response to a proline-rich antimicrobial peptide. Antimicrob. Agents Chemother. 2004, 48, (9), 3260-3267.

35 Environment ACS Paragon Plus

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 36 of 43

List of figure legends

Figure 1. Distribution of detected ETEC proteins within specific subsets. The Venn diagram shows how many proteins were shared between all plasmid-encoded proteins, virulence determinants, strain-specific proteins and detected protein variants (isoforms).

Figure 2. Relative amounts of selected chromosome- and plasmid-encoded proteins. The median of log2 LFQ intensities across a minimum of two replicates is shown, and grey fields indicate that the protein was either not detected or not quantified. (a) A selection of 8 isoforms that were represented by 19 proteins and detected by unique peptides in one or several of 7 ETEC strains and/or in E. coli strain BL21(DE3). All of these isoforms showed a difference of >2 log2 LFQ intensity between the protein variants. The proteins include: tetracycline repressor (TetR), plasmid stable inheritance proteins A and B (StbA and StbB), phosphomannomutase (ManB), 3-oxoacyl-ACP reductase (FabG), 3-oxoacyl-ACP synthase, acyl carrier protein (AcpP) and high-affinity choline transporter (BetT). (b) A selection of 34 proteins associated with ETEC virulence and described by LFQ intensities. The detected proteins needed to produce different ETEC colonization factors (CFs) were: colonization factor antigen I (CFA/I; CfaABCE), coli surface antigen 2 (CS2; CotAC), CS3 (CstA, CS3-1,2), CS6 (CssAB) and CS21 (LngA-I). Other detected virulence factors were the heat-labile toxin (LT; EltAB), the immunogenic serine protease EatA, EtpBC proteins associated with the flagella tip adhesin EtpA, the GTPase LeoA, TibA invasin and the type IV pilus (PilLNOQSV). Phylogenetic relationships of the strains (MLST and phylogroup) are described in Table 1 and illustrated in Figure S-1.

Figure 3. Relative amounts of selected differentially produced proteins in 7 ETEC strains and BL21(DE3). The median of log2 LFQ intensities across a minimum of two replicates is shown, and grey fields indicate that the protein was not detected or could not be quantified. BL21(DE3) showed substantially higher levels of proteins participating in arginine biosynthesis and in melibiose, galactitol and gluconate metabolism than the ETEC strains, while the ETEC strains had higher amounts of proteins involved in maltose and maltodextrin metabolism, iron acquisition and utilization, virulence and acid resistance. Proteins whose levels differed significantly from the respective log2 LFQ intensity mean (>2 log2) are marked by stars (see Table S-14 for the associated ANOVA q-values).
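The summary statistics named in the legends of Figures 2 and 3 can be illustrated with a minimal Python sketch. It assumes raw LFQ intensities are available per replicate, with missing values given as None. The function names, the example intensities and the strain labels "A" to "D" are hypothetical and are not taken from the study. The sketch covers only the median over at least two replicates and the >2 log2 distance from the mean; it does not reproduce the ANOVA behind the q-values in Table S-14.

```python
import math
from statistics import median


def median_log2_lfq(replicate_intensities, min_replicates=2):
    """Median of log2 LFQ intensities over the quantified replicates.

    Returns None (a "grey field") when fewer than `min_replicates`
    replicates carry a positive intensity.
    """
    logs = [math.log2(v) for v in replicate_intensities
            if v is not None and v > 0]
    if len(logs) < min_replicates:
        return None
    return median(logs)


def differs_from_mean(log2_by_strain, threshold=2.0):
    """Flag strains whose log2 value lies more than `threshold` log2 units
    from the mean over all strains in which the protein was quantified."""
    present = {s: v for s, v in log2_by_strain.items() if v is not None}
    if not present:
        return {}
    mean = sum(present.values()) / len(present)
    return {s: abs(v - mean) > threshold for s, v in present.items()}


# Hypothetical intensities, for illustration only.
quantified = median_log2_lfq([1024, 4096, None])   # two valid replicates
grey_field = median_log2_lfq([1024, None, None])   # only one valid replicate
flags = differs_from_mean({"A": 20.0, "B": 20.0, "C": 20.0, "D": 28.0})
```

Returning None for an under-sampled protein keeps the "grey field" case distinct from a genuinely low intensity, which matters when the values are later averaged across strains.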


Table 1. E. coli strains used in the study

| Strain Name | Genome (Mb) | Plasmid(s) | MLST | Serotype | Heat Stable (ST) or Heat Labile (LT) Toxin | Colonization Factors | Phylogroup | GenBank BioProject |
|---|---|---|---|---|---|---|---|---|
| TW10598 | 5.24 | 4 | 171 | O6:H16 | ST+ LT+ | CS2, CS3, CS21 | A | PRJNA59743 |
| TW10722 | 5.70 | 2 | 706 | O115:H5 | ST+ LT- | CS5, CS6 | B1 | PRJNA59745 |
| TW10828 | 5.28 | 3 | 127 | O114:H49 | ST- LT+ | CS7 | B1 | PRJNA59747 |
| TW11681 | 5.30 | 3 | 713 | O19:H45 | ST+ LT- | CFA/I, CS21 | A | PRJNA59749 |
| TW14425 | 5.21 | 3 | 88 | O78:H9 | ST+ LT- | CS14 | A | PRJNA59751 |
| H608c | ND | 2 | 88 | ND | ST+ LT- | ND | A | - |
| H10407 | 5.53 | 4 | 171 | O78:H1 | ST+ LT+ | CFA/I | A | PRJEA42749 |
| BL21(DE3)pLysS | 4.57 | 1 | 93 | O7:H- | - | - | A | PRJNA30681 |

Abbreviations: MLST, multilocus sequence typing; ND, not determined.


Table 2. Summary of ETEC proteomic analysis

| Strain | Number of CDS (chr/pl), Predicted | Number of CDS (chr/pl), Mapped by MS/MS | Detected proteome, % (chr/pl) | Proteins described by LFQ intensity (chr/pl) |
|---|---|---|---|---|
| TW10598 | 4,609/308 | 2,231/71 | 48.4/23.1 | 1,895/59 |
| TW10722 | 4,958/426 | 2,211/73 | 44.6/17.1 | 1,858/60 |
| TW10828 | 4,634/236 | 2,170/43 | 46.8/18.2 | 1,793/34 |
| TW11681 | 4,660/392 | 2,129/86 | 45.7/21.9 | 1,756/65 |
| TW14425 | 4,749/228 | 2,183/55 | 46.0/24.1 | 1,814/35 |
| H608 | ND | 2,099/52 (a) | ND | 1,885/39 |
| H10407 | 4,756/210 | 2,292/50 | 48.2/23.8 | 2,009/39 |
| BL21(DE3)pLysS | 4,208/2 | 2,163/0 | 51/0.0 | 1,844/0 |

Abbreviations: CDS, coding DNA sequence; chr/pl, chromosome/plasmids; LFQ, label-free quantification; ND, not determined.

(a) The number of H608 plasmid proteins was derived from matches to plasmid proteins identified in the other ETEC strains.
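The "detected proteome" percentages in Table 2 appear to be the number of CDS mapped by MS/MS divided by the number of predicted CDS, taken separately for chromosome and plasmids. The short Python sketch below checks that reading against three rows of the table. The helper names are hypothetical, and the formula is an assumption inferred from the tabulated values rather than a definition stated in the text.

```python
def parse_pair(cell):
    """Split a 'chr/pl' cell such as '4,609/308' into two integers."""
    chromosome, plasmid = cell.split("/")
    return int(chromosome.replace(",", "")), int(plasmid.replace(",", ""))


def detected_percent(predicted_cell, mapped_cell):
    """Percentage of predicted CDS mapped by MS/MS, as (chr, pl),
    rounded to one decimal place."""
    pred_chr, pred_pl = parse_pair(predicted_cell)
    map_chr, map_pl = parse_pair(mapped_cell)
    return (round(100.0 * map_chr / pred_chr, 1),
            round(100.0 * map_pl / pred_pl, 1))


# Predicted and mapped CDS counts copied from Table 2.
rows = {
    "TW10598": ("4,609/308", "2,231/71"),
    "TW10722": ("4,958/426", "2,211/73"),
    "H10407": ("4,756/210", "2,292/50"),
}
percentages = {strain: detected_percent(*cells) for strain, cells in rows.items()}
```

For these three strains the recomputed values agree with the percentages printed in the table, which supports the assumed formula.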


Table of Contents graphic (TOC) - for TOC only


[Figure 1 graphic: four-set Venn diagram. Set labels: Plasmid origin (197), Virulence determinants (77), Isoforms (118), Strain-exclusive proteins (319).]

[Figure 2 graphic: heat maps, panels (a) and (b). Row groups: Chromosomes, Plasmids. Color scale: log2 LFQ intensity, from Low Levels to High Levels.]

[Figure 3 graphic: heat map. Panel titles: Melibiose, Galactitol, Gluconate Metabolism; Maltose/Maltodextrin Metabolism; GLU Transport; Acid Resistance; ARG Biosynthesis; Iron Acquisition and Utilization; Virulence; Lipid A Modification; Ethanolamine Metabolism. Columns: TW10598, TW10722, TW10828, TW11681, TW14425, H608, H10407, BL21(DE3). Color scale: log2 LFQ intensity, from Low Levels to High Levels.]