Transcriptomic and Neuropeptidomic Analysis of the Stick Insect, Carausius morosus Sander Liessem, Lapo Ragionieri, Susanne Neupert, Ansgar Buschges, and Reinhard Predel J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.8b00155 • Publication Date (Web): 27 Apr 2018 Downloaded from http://pubs.acs.org on April 27, 2018


Journal of Proteome Research

Transcriptomic and Neuropeptidomic Analysis of the Stick Insect, Carausius morosus

Sander Liessem*,a, Lapo Ragionieri a, Susanne Neupert a, Ansgar Büschges a, Reinhard Predel*,a

a Department for Biology, Institute for Zoology, University of Cologne, Zülpicher Straße 47b, D-50674 Cologne, Germany

* address for correspondence: Reinhard Predel, Institute for Zoology, University of Cologne, Zülpicher Straße 47b, D-50674 Cologne, Germany, [email protected], Phone: +49-221-4705817

* address for correspondence: Sander Liessem, Institute for Zoology, University of Cologne, Zülpicher Straße 47b, D-50674 Cologne, Germany, [email protected], Phone: +49-221-4703132

ACS Paragon Plus Environment


ABSTRACT

One of the most thoroughly studied insect species with respect to locomotion behaviour is the stick insect Carausius morosus. Although detailed information exists on premotor networks controlling walking, surprisingly little is known about neuropeptides, which are certainly involved in motor activity generation and modulation. So far, only a few neuropeptides have been identified from C. morosus or related stick insects. We performed a transcriptome analysis of the central nervous system to assemble and identify 65 neuropeptide and protein hormone precursors of C. morosus, including five novel putative neuropeptide precursors without clear homology to known neuropeptide precursors of other insects (Carausius neuropeptide-like precursor 1, HanSolin, PK-like1, PK-like2, RFLamide). Using Q Exactive Orbitrap and MALDI-TOF mass spectrometry, 277 peptides including 153 likely bioactive mature neuropeptides were confirmed. Peptidomics yielded a complete coverage for many of the neuropeptide propeptides and confirmed a surprisingly high number of heterozygous sequences. A few neuropeptide precursors commonly occurring in insects, including those of insect kinins and sulfakinins, could not be found in the transcriptome data, nor did peptidomics support their presence. The results of our study represent one of the most comprehensive peptidomic analyses on insects and provide the necessary input for subsequent experiments revealing neuropeptide function in greater detail.

KEYWORDS: Carausius morosus, stick insects, neuropeptide precursors, neurohormones, peptidomics, mass spectrometry, transcriptomics, nervous system


1. INTRODUCTION

In the stick insect Carausius morosus, the generation and neural basis of walking have been extensively studied, and this insect has become a model organism for studying the control of locomotion1, 2. It is now well established that rhythmic motor activity, such as that underlying locomotion, is generated by local central pattern generating networks (CPGs) under the influence of local sensory feedback, which is capable of adapting those outputs to the behavioral requisites3-5. Furthermore, it has been shown that these CPGs are segmentally arranged in the three thoracic ganglia and, hence, that each limb is innervated by its own CPG3. In addition, the morphology and electrophysiological properties of neurons comprising the microcircuitry that drives locomotion have been described in great detail6-10.

It is generally accepted that neuromodulation plays a pivotal role in the reconfiguration of neuronal networks, thereby extending their working range. In insects, for example, the biogenic amine octopamine is known to have general arousal effects and is associated with the initiation and maintenance of locomotor behaviors11, 12. Furthermore, substances like the muscarinic agonist pilocarpine have been shown to evoke long-lasting alternating activity in antagonistic motoneuron pools supplying the individual leg joints3, 13.

However, very little is known about insect neuropeptides, which are involved in the generation and modulation of motor activity. This lack of information is particularly astonishing when one considers that neuropeptides represent the most diverse class of intercellular messenger molecules used for neurochemical communication of the nervous system. As such, neuropeptides play key roles in a wide range of physiological processes such as reproduction, regulation of energy balance, circadian rhythm and motility14, 15. Only little is reported about insect neuropeptides associated with locomotion16-18 which is in contrast to, for example, the peptidergic regulation of gut motility via the stomatogastric nervous system of crustaceans19-22. Examples are the adipokinetic hormones (AKHs), which, in addition to their metabolic functions, act directly on the central nervous system (CNS) through the release of octopamine from the thoracic dorsal unpaired median neurons (DUM)23. The latter in turn then stimulates locomotor activity.

In order to investigate how network motor patterns are activated, inhibited, or otherwise modified via neuropeptides in C. morosus, it is first necessary to characterize the peptidome of this species, upon which functional studies are based. So far, only 15 neuropeptides have been described from C. morosus24-28. Therefore, the purpose of this study was to identify precursor sequences and mature neuropeptides from the nervous system and neurohemal release sites of C. morosus, using a combination of next generation sequencing (NGS, transcriptome) and different mass spectrometry (MS) approaches. Our transcriptomic analysis identified the precursors of 53 neuropeptide and protein hormone genes but failed to detect precursors for sulfakinins and kinins, which are otherwise widely distributed in insects. Differential transcription is suggested for seven of the genes, which increased the number of identified neuropeptide and protein hormone precursors of C. morosus to 60. Subsequent peptidomics focused on 42 neuropeptide and neuropeptide-like precursors, and neuropeptides from 38 of these precursors were confirmed. No fewer than six of these precursor sequences were found with allelic differences, which is surprising since the cultures of C. morosus are presumed to represent a clonal line. In addition to the identification of neuropeptides with clear homologies to those of other insects, we used information from de novo peptide sequencing to obtain five precursor sequences which contain 16 novel neuropeptide candidates. Altogether, these attempts yielded a list of 153 mature and likely also bioactive neuropeptides and many additional peptides, the latter being extended or truncated versions of neuropeptides or further precursor peptides (PP). The results of our study represent one of the most comprehensive peptidomic analyses of an insect and provide the necessary information for subsequent experiments revealing the effects of neuropeptides on regulation of locomotor activity.

2. EXPERIMENTAL SECTION

2.1 Animals

If not otherwise specified, adult females of C. morosus were used from a colony maintained at the University of Cologne. Animals were held in cages at a constant temperature of 28°C under a 12:12 hour dark/light cycle. They were fed ad libitum with dewberry leaves and had free access to water.

2.2 Tissue preparation

Prior to dissection of the nervous system, animals were cooled down in a fridge (4°C) for 10 min and then fixed with Protemp™ II (composite for temporary crowns and bridges, 3M, MN, U.S.A.) on a dissection plate. Throughout dissection, the preparation was covered with chilled artificial physiological saline (approximately 6°C; composition according to Weidler and Diecke29; pH 7.2). Adjacent tissues such as fat body, trachea and the pigmented layer of the optic lobes were removed.

2.3 Transcriptome Sequencing

Prior to transcriptome sequencing, the CNS of one adult female was dissected as described above and transferred into a 1.5 ml reaction tube filled with RNAlater (Thermo Fisher Scientific, Waltham, MA, USA) to minimize RNA degradation. Subsequent RNA isolation, cDNA preparation and Illumina Next Generation Sequencing were carried out by BGI (Beijing Genomics Institute, New Territories, Hong Kong, China) as described in a previous study30. Each library was sequenced as 100 bp paired-end reads, which were stored as FASTQ files. The resulting raw reads were filtered by removing adapter sequences, contamination and low-quality reads (clean reads 49784152/clean bases 4978415200; Phred quality score Q20: 99.11/97.92%) and submitted to NCBI (Sequence Read Archives (SRA): SRR4423265 (SUB2017206); BioProject: PRJNA348314, pending for release).
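The reported read statistics can be checked for internal consistency: the number of clean bases should equal the number of clean reads multiplied by the read length. The sketch below assumes that every clean read retains the full 100 bp; the function name is illustrative and not part of the sequencing pipeline.

```python
def total_bases(n_reads: int, read_length_bp: int) -> int:
    """Total number of sequenced bases, assuming all reads share one length."""
    return n_reads * read_length_bp


clean_reads = 49_784_152  # clean reads reported in the text
read_length_bp = 100      # 100 bp reads, as stated in the text

print(total_bases(clean_reads, read_length_bp))  # 4978415200
```

The product matches the reported 4978415200 clean bases.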

2.4 De novo Assembly of Nucleotide Sequences

De novo assembly of RNA-Seq data was conducted using Trinity (v2.2.0)31 and Bridger (v2014-12-01)32. The Trinity transcriptome assembly of the CNS has been submitted to NCBI (Transcriptome Shotgun Assembly (TSA) database: GFAX00000000, SUB2014136, pending for release). Additional assemblies were made from publicly available SRA data of C. morosus33 (BioProject: PRJNA314295, BioSample: SAMN04531660), which were assembled using Trinity (adult middle midgut: SRR3211828-SRR3211831, appendices of the midgut: SRR3211825-SRR3211827, adult Malpighian tubules: SRR3211821-SRR321182123). These assemblies were used to search for precursor sequences that were missing or incomplete in the transcriptome from the CNS.

2.5 Compiling of Precursor Sequences

The tBLASTn algorithm from the BLAST+ suite command-line tool (v2.4.0)34 was mainly used to conduct database searches for C. morosus neuropeptide and protein precursor sequences. As reference queries, sequences of known insect neuropeptide precursors were used. Aligned and assigned nucleotide sequences were translated into protein sequences using the ExPASy translate tool (http://web.expasy.org/translate/; Swiss Institute of Bioinformatics, Switzerland)35. The signal peptide (SP) for each putative peptide precursor sequence was predicted using SignalP 4.1 (http://www.cbs.dtu.dk/services/SignalP/, Technical University of Denmark, Denmark)36. If no SP could be predicted or no stop codon was present, precursors were considered as incomplete. In these cases, we attempted to complete the precursors by either using the additional Trinity assemblies, the Bridger assembly from our transcriptomic data or by using the BLAST tool on the RAW transcriptome data. In the latter case, the resulting transcripts were recursively used as an input query until they were considered complete (no further increase in length of the transcript, SP could be predicted at the N-terminus and/or in-frame stop codon was detected).
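The recursive completion step can be illustrated with a small sketch. It is not the procedure used in the study: BLAST searches against the raw reads are replaced here by exact suffix/prefix overlap matching, and the function names, the `min_overlap` parameter and the toy sequence are all hypothetical. The loop stops under the first criterion named above, that is, when the transcript no longer increases in length.

```python
def extend_once(seed: str, reads: list[str], min_overlap: int) -> str:
    """Return the longest single-read extension of `seed`, or `seed` itself."""
    best = seed
    for read in reads:
        max_k = min(len(seed), len(read) - 1)  # the read must add >= 1 base
        # Right extension: a suffix of the seed equals a prefix of the read.
        for k in range(max_k, min_overlap - 1, -1):
            if seed.endswith(read[:k]):
                candidate = seed + read[k:]
                if len(candidate) > len(best):
                    best = candidate
                break
        # Left extension: a prefix of the seed equals a suffix of the read.
        for k in range(max_k, min_overlap - 1, -1):
            if seed.startswith(read[-k:]):
                candidate = read[:-k] + seed
                if len(candidate) > len(best):
                    best = candidate
                break
    return best


def extend_until_stable(seed: str, reads: list[str],
                        min_overlap: int = 8, max_rounds: int = 1000) -> str:
    """Re-use each extended transcript as the next query until it stops growing.

    `min_overlap` must be at least 1.
    """
    current = seed
    for _ in range(max_rounds):
        extended = extend_once(current, reads, min_overlap)
        if len(extended) == len(current):  # no further increase in length
            break
        current = extended
    return current


# Toy data (made-up 40-nt sequence, not from C. morosus).
toy = "ATGGCGTACCTGAAGTTCGATCCGTAGGCTAACTTGACCA"
toy_reads = [toy[0:20], toy[10:30], toy[20:40]]
recovered = extend_until_stable(toy[12:28], toy_reads)
print(recovered == toy)  # True
```

In practice, the additional completeness criteria (a predictable SP at the N-terminus and an in-frame stop codon) would be checked on the translated transcript after each round.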

2.6 MALDI-TOF MS (direct tissue profiling)

For neuropeptidomic analyses by means of direct tissue profiling, different parts of the CNS and stomatogastric nervous system as well as pieces of neurohemal organs were dissected separately as described for cockroach samples37 and directly transferred into a drop of distilled water on the sample plate. Immediately after the transfer, the water was removed. Dried tissue samples were covered with 10 mg/ml 2,5-dihydroxybenzoic acid (DHB, Sigma-Aldrich, Steinheim, Germany; dissolved in 79% H2O, 20% acetonitrile (ACN), 1% formic acid (FA); v/v/v) or 10 mg/ml α-cyano-4-hydroxycinnamic acid (CHCA, Sigma-Aldrich; dissolved in 60% ACN, 39.5% H2O, 0.5% FA; v/v/v) as matrix solution. Depending on the size of the tissue, 0.2 µl – 0.4 µl CHCA or DHB were applied to the sample spots. Only after DHB application, spots were blow-dried. Mass fingerprint spectra (MS1) were acquired in positive ion mode with an ultrafleXtreme TOF/TOF mass spectrometer (Bruker Daltonik, Bremen, Germany). All MS acquisitions were made under manual control in reflector positive mode in a detection range of m/z 600–10,000. Instrument settings were optimized for mass ranges of m/z 600–4,000 and 3,000–10,000, respectively. Bruker peptide and protein standard kits were used for calibration (mass accuracy of 1.4 ppm). DHB crystals of different size resulted in lower resolution in direct tissue profiling experiments. The data obtained in these experiments were processed with the flexAnalysis 3.4 software package (Bruker Daltonik). MS2 experiments were either performed with LIFT™ technology without CID or using an ABI 4800 proteomics analyzer (Applied Biosystems, Framingham, USA; Data Explorer v4.1) in gas-off mode. MS2 fragment spectra were acquired manually. Peptide identities were verified by comparison of theoretical (http://prospector.ucsf.edu) and experimentally obtained fragments.
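The comparison of theoretical and experimental fragments relies on simple mass arithmetic, which the following sketch illustrates. It is not the calculation performed by the web tool cited above; it only uses standard monoisotopic residue masses to compute the singly charged precursor ion ([M + H]+) and the b- and y-type fragment series of an unmodified or C-terminally amidated peptide. The function names and the example peptide string are arbitrary and not taken from the study.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565            # H2O
PROTON = 1.007276            # H+
AMIDATION_SHIFT = -0.984016  # C-terminal -OH replaced by -NH2


def mh_plus(peptide: str, amidated: bool = False) -> float:
    """Monoisotopic m/z of the singly protonated peptide, [M + H]+."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON
    return mass + (AMIDATION_SHIFT if amidated else 0.0)


def b_ions(peptide: str) -> list[float]:
    """Singly charged b-ions b1 .. b(n-1), read from the N-terminus."""
    total, ions = PROTON, []
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        ions.append(total)
    return ions


def y_ions(peptide: str, amidated: bool = False) -> list[float]:
    """Singly charged y-ions y1 .. y(n-1), read from the C-terminus."""
    total = WATER + PROTON + (AMIDATION_SHIFT if amidated else 0.0)
    ions = []
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        ions.append(total)
    return ions


example = "PEPTIDE"  # arbitrary test string
print(round(mh_plus(example), 4))
print([round(m, 4) for m in b_ions(example)])
print([round(m, 4) for m in y_ions(example)])
```

A useful sanity check is that each complementary pair satisfies b(i) + y(n−i) = [M + H]+ + 1.007276, which holds for both the free-acid and the amidated form.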

2.7 Single cell MS

Retrograde backfilling and dissection of cells were performed as described previously38. Briefly, cell bodies in thoracic ganglia were backfilled via the transverse nerves (TN) or lateral nerves with 5% dextran-tetramethylrhodamine (MW 3000, anionic, lysine-fixable, Molecular Probes, Eugene, USA) for 2 days at 4°C. Labelled cells were dissected under a stereo fluorescence microscope (SteREO Lumar.V12, Carl Zeiss AG, Goettingen, Germany) and transferred with a glass capillary (Hilgenberg GmbH, Malsfeld, Germany) onto a sample plate for MALDI-TOF MS analysis. Subsequently, residual saline was removed, and samples were covered with ~100 nl DHB and blow-dried. All single cell MS spectra were acquired in a detection range of m/z 600–4,000.

2.8 Quadrupole Orbitrap MS with nanoflow HPLC

Dissected tissue samples were collected in 0.5 ml safe-lock tubes (Eppendorf, Hamburg, Germany), containing 20 µl peptide extraction buffer (50% MeOH in 1% trifluoroacetic acid (TFA) or 1% FA). Samples were sonicated for 5 min in an ultrasonic bath, centrifuged for 2 min at 4 °C (15,000 rpm), again treated with multiple, contiguous bursts using an ultrasonic homogenizer (BANDELIN SONOPULS HD 200, BANDELIN electronic GmbH & Co. KG, Berlin, Germany), replenished with 20 µl water and finally centrifuged at 15,000 rpm and 4°C for 15 min. The supernatants were transferred into safe-lock tubes (0.5 ml, Eppendorf) each and evaporation of the organic solvent was accelerated using a low speed centrifuge under low pressure vacuum (Hetovac VR-1, Heto Lab Equipment) until the volume of the supernatants was reduced to about 15 µl. Samples were desalted using self-packed Stage Tip C18 (IVA Analysentechnik e. K., Meerbusch, Germany) spin columns39.


Peptides were then separated on an EASY nanoLC1000 UPLC system (Thermo Fisher Scientific) using in-house packed 50 cm RPC18 columns (fused silica tube with ID 50 µm ± 3 µm, OD 150 µm ± 6 µm; Reprosil 1.9 µm, pore diameter 60 Å; Dr. Maisch GmbH, Ammerbuch-Entringen, Germany) and a binary buffer system (A: 0.1% FA; B: 80% ACN, 0.1% FA). Running conditions were as follows: linear gradient from 2 to 62% B in 110 min, 62 to 75% B in 30 min, and final washing from 75 to 95% B in 6 min (45°C, flow rate 250 nL/min)40. Finally, the column was re-equilibrated for 4 min at 5% B.

The HPLC was coupled to a Q-Exactive Plus (Thermo Fisher Scientific) mass spectrometer. MS data were acquired in a top 10 data-dependent method dynamically choosing the most abundant peptide ions from the respective survey scans in a mass range of m/z 300−3000 for HCD fragmentation. Full MS1 acquisitions ran with 70,000 resolution, automatic gain control target (AGC target) at 3e6 and maximum injection time at 80 ms. HCD spectra were measured with a resolution of 35,000, AGC target at 1e6, maximum injection time at 120 ms, 28 eV normalized collision energy, and dynamic exclusion set at 25 s. The instrument was run with peptide recognition mode (two to eight charges); singly charged and unassigned precursor ions were excluded. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository41 with the dataset identifier PXD009536.

Orbitrap RAW data were analyzed by means of MaxQuant (v1.5.4.1, MPI, Martinsried, Germany)42 and PEAKS (PEAKS Studio 8.0, BSI, ON, Canada)43. For analyses using MaxQuant, only predicted peptides with a minimum length of five amino acid residues, a maximum molecular weight of 10 kDa and a P-score > 60 were taken into account. The maximum number of posttranslational modifications (PTM; Tyr-sulfation, C-terminal amidation, methylation [E, K, R], disulphide bridge, oxidation [M, W], N‐pyroglutamyl formation [Q, E], phosphorylation [S, T], acetylation [K, N-terminus], mannosylation [W], sulfation) per peptide was set to five; digestion mode was set to none. For analyses with PEAKS, the same set of PTMs was used and peptides were searched against an internal database comprising the C. morosus neuropeptide precursor sequences with a parent mass error tolerance of 10 ppm and a fragment mass error tolerance of 0.05 Da. Enzyme mode was set to none. The false discovery rate (FDR) was determined by the decoy database search implemented in PEAKS 8.0 and set below 1%. To provide the accurate monoisotopic mass of a peptide, Q Exactive Orbitrap RAW data were corrected prior to the analysis (precursor mass correction only). Fragment spectra with a peptide score (-10 lgP) equivalent to a P-value of about 1% were manually reviewed.
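Two of the search thresholds above reduce to short formulas, sketched here for orientation. The helper names are hypothetical and are not part of PEAKS or MaxQuant. A −10 lgP score is the negative decadic logarithm of the P-value multiplied by ten, so a P-value of 1% corresponds to a score of 20; a 10 ppm parent mass tolerance corresponds to ±0.01 at m/z 1000.

```python
import math


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Relative mass deviation in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz: float, theoretical_mz: float,
                     tol_ppm: float = 10.0) -> bool:
    """True if the precursor mass error lies within +/- tol_ppm."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def p_to_score(p_value: float) -> float:
    """Convert a P-value into a -10 lgP score."""
    return -10.0 * math.log10(p_value)


def score_to_p(score: float) -> float:
    """Convert a -10 lgP score back into a P-value."""
    return 10.0 ** (-score / 10.0)


print(round(p_to_score(0.01), 6))              # 20.0
print(round(ppm_error(1000.010, 1000.000), 3)) # 10.0
print(within_tolerance(1000.005, 1000.000))    # True
```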

3. RESULTS AND DISCUSSION

3.1 Transcriptomics

Clean reads from transcriptome sequencing of CNS tissue were first assembled into contigs using Trinity. Subsequent BLAST searches in this assembly, using precursor sequences of non-pterygote hexapods44 and more closely related termites45, mainly containing neuropeptides, and different pterygote insects (mainly precursors containing the protein hormones and neuropeptide-like sequences), yielded a set of 60 C. morosus precursor sequences (Table 1) that likely originate from 53 different genes. Among the few precursors which resulted from alternative splicing of a specific gene, four precursors represented largely different transcripts with neuropeptides that did not show sequence congruities (OK-A/B, orcokinin A/B; calcitonin A/B). Alternative splicing of ok and calcitonin genes seems to be typical in other insects as well45, 46. Other transcripts belonged to short and long forms of precursors that otherwise contained similar or even identical neuropeptides (MIP, myoinhibitory peptide; NPF-1, neuropeptide F-1; NVP, NVP-containing PP). The obtained precursor sequences include the complete core set of neuropeptides which is known from other insects44, 47. Nine precursor sequences turned out to be incomplete (see Table 1). Three of these sequences could be completed (calcitonin B) or at least complemented (ACP, adipokinetic hormone/corazonin-related peptide; OK-B) by preparing and using Trinity assemblies of published SRAs from C. morosus. The latter assemblies also revealed the precursor sequence of ecdysis-triggering hormone (ETH, see Table 1) which was, as expected, not present in the transcriptome from the CNS. Mature products from 38 of these neuropeptide and neuropeptide-like sequences were detected (see Supporting Information S1) and are discussed below in more detail. The remaining 18 precursors, which contain protein hormones (Supporting Information S2), were also found based on similarities with known precursor sequences of other insects. Potential allelic differences in the precursor sequences of several neuropeptide precursors were observed in both the Trinity and Bridger assemblies.
These potential heterozygous states were confined to precursors (AstA, allatostatin A; CAPA; FMRF, FMRFamides; PK, pyrokinin; RYa, RYamide; TKRP, tachykinin-related peptides) containing multiple copies (paracopies) of neuropeptides. For most of these sequences, peptidomics supported the existence of the predicted alleles and ruled out sequencing or assembly errors (see below). The large number of allelic differences in C. morosus was quite unexpected since all cultures of C. morosus are presumed to represent a single clonal line48. Whereas heterozygous states of a single neuropeptide gene can usually be distinguished from duplicated genes by analysing several individuals per population49, 50, this clarification is not possible for parthenogenetic insects such as C. morosus.

Four insect neuropeptide precursors (EFLa, EFLamide; inotocin; kinin; sulfakinin), which are described from other insects, could not be detected in any transcriptome assembly of C. morosus, and searches with sequences of the corresponding mature peptides from other insects also did not reveal any sequence hits in the transcript raw data. A gene coding for multiple EFLa was first described from the spider mite Tetranychus51 and is widely distributed in non-pterygote hexapods44, which represent the sister group of winged insects. Recently, a gene coding for a single EFLa was also described from the bed bug Cimex lectularius40. However, mature EFLa have not been biochemically characterized in these arthropods so far, and therefore it remains unclear whether the gene, if present, is expressed in the CNS. Inotocin52 is more widely distributed in insects but is absent in several lineages53. Locusts and termites, which are closely related to stick insects, possess inotocin genes58. These insects also have kinin and sulfakinin genes, and the respective bioactive peptides have been identified in the CNS of locusts54, 55. In related cockroaches, sulfakinins are even integral components of the neuroendocrine system and enriched in the retrocerebral complex (RCC)56. The presumed role of these peptides in feeding and digestion in polyneopteran insects cannot apply to C. morosus if the peptides are not present at all. Comparative studies will show whether the potential loss of these peptidergic systems is typical of Phasmatodea in general.

3.2 Peptidomics

3.2.1 Database construction

For the interpretation of MALDI-TOF mass spectra and Q Exactive Orbitrap data analyzed by MaxQuant, we compiled a list of putative mature neuropeptides based on predicted cleavage sites in the precursor sequences shown in Supporting Information S1. For the analysis of Q Exactive data by means of PEAKS, we directly used the above-mentioned precursor sequences. To consider potential neuropeptides with sequences that were identified by MS2 experiments but were not found in the assembled precursors, we developed an algorithm that used a reverse approach to identify the respective precursors (see Fig. 1, implemented in MATLAB, R2016a, Natick, MA, USA). For this, each scaffold from the transcriptome data was converted into the six open reading frames (ORFs), which were subsequently filtered for putative precursor sequences. Each of the six sequences was examined for shorter sequences starting with a methionine and ending with a stop codon or the end of the respective ORF. Sequences shorter than 50 amino acids and identical sequences were removed. The resulting list of about 200,000 potential precursor sequences was used as a second database in PEAKS. In this way, all precursors identified via conventional BLAST were included in the database, with the exception of natalisin, tryptopyrokinin (tPK), ACP and OK-B, whose precursor sequences did not start with a methionine; putative novel precursors were included as well. The latter can later be identified based on their expressed peptide products (see below).

Figure 1. Overview of the combined peptidomic/transcriptomic approach used to search for potentially novel neuropeptide precursors. First, nucleotide sequences of the transcriptome assembly were converted into all six open reading frames (ORF), which were then surveyed for shorter sequences (starting with a methionine and ending on a stop codon or the end of the sequence). Sequences shorter than 50 amino acids and identical sequences were removed (crossed-out arrow). The resulting list of putative precursor sequences (FASTA-file) was used as a database in our peptidomic analysis using the PEAKS software. In the final step, novel precursors were identified by reference to their expressed peptide products observed in the Q Exactive Orbitrap data.
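The database construction steps described in this section can be sketched in a few lines. The original implementation was written in MATLAB and is not reproduced here; the Python version below is an illustrative reimplementation under stated assumptions: the standard genetic code is used, only the first methionine of each stop-delimited stretch is taken as a start, and all function names and the `min_len` parameter name are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(nt: str) -> str:
    """Translate a nucleotide string codon by codon ('*' = stop, 'X' = unknown)."""
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X")
                   for i in range(0, len(nt) - 2, 3))


def six_frames(nt: str) -> list[str]:
    """Translations of the three forward and three reverse-complement frames."""
    nt = nt.upper()
    reverse_complement = nt.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:])
            for strand in (nt, reverse_complement)
            for offset in range(3)]


def candidate_precursors(contigs: list[str], min_len: int = 50) -> list[str]:
    """Collect unique Met-initiated sequences of at least `min_len` residues.

    Each candidate runs from the first methionine of a stop-delimited
    stretch to the next stop codon or to the end of the frame.
    """
    seen, candidates = set(), []
    for contig in contigs:
        for frame in six_frames(contig):
            for stretch in frame.split("*"):
                start = stretch.find("M")
                if start == -1:
                    continue
                candidate = stretch[start:]
                if len(candidate) >= min_len and candidate not in seen:
                    seen.add(candidate)
                    candidates.append(candidate)
    return candidates


# Toy contig (made up): Met followed by 60 Ala codons and a stop codon.
toy_contig = "ATG" + "GCT" * 60 + "TAA"
print(candidate_precursors([toy_contig, toy_contig]))  # one 61-residue entry
```

The resulting list would then be written to a FASTA file and supplied to the search software as a second database. Requiring a start methionine also explains why precursors whose assembled sequences lack one are absent from such a database.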


3.2.2 Q Exactive Orbitrap MS

To get an overview of processed neuropeptides of C. morosus, tissue extracts of brain, gnathal ganglion (GNG), ventral nerve cord, frontal ganglia (FG), and corpora cardiaca (CC) as part of the RCC were each analyzed by means of Q Exactive Orbitrap MS. These experiments confirmed the presence of mature products from the majority of neuropeptide genes (Supporting Information S3). For most of the precursors containing a single neuropeptide (AKH; ACP; AstCC, allatostatin CC; AstCCC, allatostatin CCC; AT, allatotropin; CT-DH, calcitonin-like diuretic hormone; CRF-DH, corticotropin releasing factor-like diuretic hormone; CCHa-1, CCHamide-1; CCHa-2, CCHamide-2; CNM, CNMamide; corazonin; IMF, IMFamide; myosuppressin; natalisin; NPF-1; NPF-2; PDF, pigment dispersing factor; sNPF, short neuropeptide-F; SIFa, SIFamide) and other precursors containing multiple neuropeptides (CAPA, MIP, OK-A, PK, RYa), all predicted and likely also bioactive neuropeptides could be identified. Only individual sequences were missing from precursors containing multiple copy peptides (AstA; FMRF; TKRP; tPK). The outcome of these Q Exactive experiments showed an exceptionally complete coverage, and the confirmation of predicted neuropeptides was accompanied by the detection of a large number of PPs (Supporting Information S1, S3). Using the precursors with similarity to known precursors from other insects as a database, 121 mature neuropeptides from 36 neuropeptide precursors and 38 additional PPs were confirmed by the Q Exactive Orbitrap data. These numbers do not include truncated and/or extended forms, which are also listed in Supporting Information S3. Novel peptides identified in Q Exactive experiments by using the alternative database are discussed in the paragraph on novel insect neuropeptides (see below). The neuropeptides crustacean cardioactive peptide (CCAP) and Elevenin were not confirmed by Q Exactive Orbitrap MS, but identification of adjoining PPs indicated at least the processing of the respective precursors. We did not detect any peptides from the precursors of calcitonin A/B, ETH, NPF-1b, OK-B, proctolin, and trissin. Moreover, screening of the MS2 dataset with sequences of insect sulfakinins and kinins, whose precursors were not detected in our transcriptome data, did not yield positive hits.

3.2.3 Direct tissue and cell profiling by MALDI-TOF MS

Direct tissue profiling of neuropeptides by MALDI-TOF MS is particularly suitable for a fast assessment of abundant neuropeptides in a specific tissue and thus provides first hints of a functional relevance of these peptides in that tissue. This statement is not inconsistent with sequence-dependent ionization properties of neuropeptides, but profound knowledge about these ionization properties generally supports the interpretation of MALDI-TOF mass spectra37. Information obtained from these spectra includes distinct differences in the relative abundance of neuropeptides in different tissues and the ratio of predicted neuropeptides and their extended/truncated forms to determine cleavage efficacy and, hence, the functionally more significant sequences (mature neuropeptides). In our study, we mainly analyzed brain tissues, FG as part of the stomatogastric nervous system, and hormone release sites of the CNS (Fig. 2).

Figure 2: Simplified overview of the nervous system of Carausius morosus, showing tissues and cells that were analyzed in this study. A, abdominal ganglion; AL, antennal lobe; aPSO, abdominal perisympathetic organ; CA, corpora allata; CC, corpora cardiaca; FG, frontal ganglion; GNG, gnathal ganglion; PDM, posterior dorso-median cell; PLC, postero-lateral cells; RCC, retrocerebral complex; tPSO, thoracic perisympathetic organ; TN, transverse nerve.

Figure 3: MALDI-TOF MS spectra (direct tissue profiling) obtained from a preparation of antennal lobe tissue (AL, 100 µm). All marked ion signals represent singly charged peptides ([M + H]+) from multiple precursors. (A) m/z 850-1300. (B) m/z 1300-3050 (*: m/z 1331.8 sNPF; **: m/z 1633.9 NPLP-8). Superscript letters are for allelic differences. AstA, allatostatin A; MIP, myoinhibitory peptide; TKRP, tachykinin-related peptides; NVP, NVP-containing PP; ACP, adipokinetic hormone/corazonin-related peptide; AT, allatotropin; NPLP, neuropeptide-like precursor 1; OK-A, orcokinin A; RYa, RYamide.

As expected, a large number of neuropeptide ion signals could be detected in mass spectra from preparations of brain tissue. This is illustrated here with a mass spectrum from antennal lobe (AL) tissue (Fig. 3), which indicated the presence of neuropeptides from at least 11 genes (astA; at; mip; myosuppressin; nplp1, neuropeptide-like precursor 1; nvp; ok-A; rya; snpf; tkrp; acp). ALs represent the first synaptic relay of olfactory information57, and the presence of specific neuropeptides in that area might suggest a role in fine tuning of information transfer to higher brain centers58. The large number of neuropeptide ion signals in CNS preparations generally suppresses detection of less abundant peptides. Therefore, we also used randomly collected single neurosecretory cells of retrogradely labelled thoracic ganglia to search for neuropeptides which were absent or barely detectable in mass spectra from tissue samples of the CNS. This strategy enabled the detection of several neuropeptides which were not identified by Q Exactive Orbitrap MS, e.g. elevenin (Fig. 4) and CCAP. In addition, we confirmed mature ETH peptides, which were not found in the nervous system but in preparations of tracheal trunks in the abdomen of nymphal stages of C. morosus (not shown).
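The assignment of ion signals in such mass fingerprints can be sketched as a simple tolerance match. The following is a minimal illustration and not the procedure used in this study: the two reference values are the m/z values given for sNPF and NPLP-8 in the caption of Fig. 3, whereas the peak list, the function name `match_peaks`, and the 0.3 Da tolerance are assumptions chosen for illustration only.

```python
# Reference [M+H]+ values taken from the Fig. 3 caption.
REFERENCE_MZ = {"sNPF": 1331.8, "NPLP-8": 1633.9}


def match_peaks(peaks, reference=REFERENCE_MZ, tol_da=0.3):
    """Assign each observed peak to the closest reference ion within tol_da.

    tol_da is an assumed tolerance, not a value from the study.
    """
    matches = []
    for mz in peaks:
        # Closest reference ion by absolute mass difference.
        name, ref = min(reference.items(), key=lambda kv: abs(kv[1] - mz))
        if abs(ref - mz) <= tol_da:
            matches.append((mz, name))
    return matches


# Hypothetical peak list (illustrative values, not measured data).
observed = [1331.9, 1500.2, 1633.8]
assigned = match_peaks(observed)
```

A match of this kind only suggests an identity; in the study, identities were confirmed by MS2 experiments.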

Figure 4: MALDI-TOF MS spectrum obtained from a randomly selected single neuron from the posterior dorsomedian region (PDM) of the metathoracic ganglion, showing products of the OK-A, myosuppressin and elevenin precursors (*: m/z 1611.8, OK-A-PP-2(1-13)). PP, precursor peptide; OK-A, orcokinin A; scale bar, 50 µm; n = 3.

The major neurohemal release sites in insects are the CC which store neurohormones produced in neurosecretory cells of the brain and GNG, and the segmentally-arranged perisympathetic organs (PSOs) which store hormones produced in cells of ganglia from the ventral nerve cord. Mass spectra from CC are discussed in the paragraph on novel insect neuropeptides (see below). With regard to the peptidome, PSOs can be divided into two types, thoracic and abdominal PSOs, that store completely different neuropeptides each37. Insect PSOs were first described from C. morosus59 but although the neurohormonal function of these tissues was postulated immediately, the corresponding hormones have never been identified in this species.

Figure 5: MALDI-TOF MS spectra (direct tissue profiling) obtained from a preparation of (A) a single abdominal PSO and (B) the adjacent transverse nerves (TN) projecting into the periphery (*PVK-1b; **ext. PVK-1b). Superscript letters are for allelic differences. Alleles of PVK-1 were first identified by means of MS2 analyses and subsequently found in transcriptome RAW data as well. PVK, periviscerokinin; tPK, tryptopyrokinin; MIP, myoinhibitory peptide; AT, allatotropin. PSO, perisympathetic organ.

As already known from other insects60, abdominal PSOs of C. morosus accumulate the products of the capa gene (Fig. 5). Abdominal PSO tissue is always crossed by axons transporting other neuropeptides via the TN into the periphery. Whereas mass spectra from tissue samples at the junction of median nerve and TN (the 'real' PSO) showed abundant signals of CAPA peptides (Fig. 5A), less abundant signals of peptides from other precursors (tryptoPK, MIP, AT) became more prominent in preparations of distal TNs (Fig. 5B). Remarkably, all mass spectra from abdominal PSO preparations contained a CAPA-PVK (HVSGLIPFPRVa) which was not present in the determined CAPA precursor. We therefore initially assumed the presence of an alternative transcript, as already reported for another polyneopteran insect, the American cockroach61. However, BLAST searches in transcriptome RAW data revealed a longer sequence that was identical with the sequence in the already assembled CAPA precursor except for a single amino acid substitution. This finding indicates an allelic difference (see Supporting Information S1) rather than different transcripts.
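The distinction drawn above rests on comparing two otherwise identical sequences position by position. A minimal sketch of such a comparison is given below; the observed sequence is the CAPA-PVK reported in the text (without its C-terminal amide), whereas the second sequence and the function name `substitutions` are hypothetical and serve only to illustrate the comparison.

```python
def substitutions(seq_a, seq_b):
    """List (position, residue_a, residue_b) for every mismatch.

    Positions are 1-based. Both sequences must have the same length,
    i.e. this sketch handles substitutions only, not insertions/deletions.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


observed = "HVSGLIPFPRV"              # CAPA-PVK from the text, amide omitted
hypothetical_variant = "HVSGLIAFPRV"  # invented variant, for illustration only
diffs = substitutions(observed, hypothetical_variant)
```

A single mismatch within an otherwise identical, longer read is the pattern that is more readily explained by an allelic difference than by an alternative transcript.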

Figure 6: MALDI-TOF mass spectrum obtained from a single postero-lateral neuron (PLC) of the metathoracic ganglion, which was visualized via backfilling of a transverse nerve (TN). The spectrum exclusively contains ion signals of extended FMRFamides (*: m/z 896.6 FMRF-8 (Q); **: m/z 1129.6 FMRF-PP-3(1-11)). Superscript letters are for allelic differences. PP, precursor peptide. Most abundant FMRF precursor products are depicted in blue; scale bar, 50 µm; n = 16.

Mass spectra from tissue samples of thoracic PSOs of C. morosus contained almost exclusively ion signals of extended FMRFamides (not shown). This finding is in agreement with the condition in other insects, where extended FMRFamides were always the most prominent or even the only neuropeptides in thoracic PSOs62. Since we wanted to analyse in detail the relative abundances of paracopies from the extended FMRFamide precursor, we backfilled, via the TN, postero-lateral cells (PLC) in thoracic ganglia which provide the thoracic PSOs with neurosecretions. The resulting single cell mass spectra (Fig. 6) provided comprehensive insights into the processing of a neuropeptide precursor with multiple paracopies, and this information is explained here in more detail. First, single cell mass spectra in the given mass range exclusively showed ion signals of products of the extended FMRFamide precursor and verified the processing of the two predicted alleles (see Supporting Information S1). Secondly, all predicted products of this precursor were observed, with the exception of PP 5, whose ion mass was beyond the analysed mass range (see Supporting Information S3). The relative abundance of the different extended FMRFamides differed considerably, but the pattern remained near-constant in mass spectra from several cells (FMRF-3 > FMRF-7 > FMRF-2 > FMRF-4 > FMRF-5 > FMRF-6 > FMRF-1 > FMRF-8 [pQ]; see Fig. 6, depicted in blue). As shown in Fig. 6, the signals of several peptides with lower ion intensity are split among forms with different N-terminal PTMs (pQ/Q in FMRF-8), among products of partially used internal cleavage sites (e.g. in FMRF-1), or among incompletely processed forms (e.g. FMRF-5+6). An unusual cleavage site was observed for FMRF-7. Whereas a potential internal monobasic Arg cleavage site (IFG-RGK) was not used at all, this peptide was alternatively cleaved with considerable efficacy between GKL-TDN. The length of the signal peptide was confirmed indirectly by the identification of the PP adjoining the signal peptide. However, we also found a less prominent ion signal indicating alternative cleavage of the signal peptide, which resulted in a PP that is three amino acids longer (Fig. 6).

Figure 7: MALDI-TOF MS spectra (direct tissue profiling) obtained from a preparation of a frontal ganglion (FG). All marked ion signals represent singly charged peptide products ([M + H]+) from multiple precursors. Major ion signals could be confirmed by MS2 experiments. Furthermore, four mature products of the novel CNP1 precursor were confirmed in the FG by MS2 experiments. (A) m/z 750-1350 (*: m/z 1099.5 AstA-8; **: m/z 1133.5 NVP-4(1-11); ***: m/z 1273.6 RYa-2(11-20); ****: m/z 1289.6 NVP-4(1-12)), (B) m/z 1350-1700 (*: m/z 1358.7 NPLP-10; **: m/z 1357.7 MIPa-11), (C) m/z 1700-4000 (*: m/z 1887.0 NVP-8(1-18) (pQ); **: m/z 1924.9; ***: m/z 2171.9). Superscript letters are for allelic differences. AstA, allatostatin A; AstCCC, allatostatin CCC; MIP, myoinhibitory peptide; TKRP, tachykinin-related peptides; NVP, NVP-containing PP; AT, allatotropin; NPLP, neuropeptide-like precursor 1; OK-A, orcokinin A; PP, precursor peptide; RYa, RYamide; sNPF, short neuropeptide F; CNP1, Carausius neuropeptide-like precursor 1.

Different lengths of signal peptides were also suggested, by mass spectrometry, for several other precursors; e. g. CAPA, NPF-2, OK-A, and SIFa. For OK-A, we found evidence that the predicted signal peptide differed completely from the signal peptide used (5 amino acids shorter).

Mass spectra from FG preparations revealed a large number of neuropeptides which potentially regulate foregut activity or modulate feeding behaviour. In contrast to preparations from single cells, neurohemal organs, and AL tissue, which generally yielded highly reproducible mass spectra, the relative abundance of neuropeptides in mass spectra from FG was rather inconsistent. Among the most abundant neuropeptides in MALDI-TOF mass spectra from FG preparations were AstA, AT, CT-DH, MIP, NPLP1, NVP, and particularly sNPF (Fig. 7). Other peptides enriched in the FG were products of NPLP1, OK-A, RYa, and CNM precursors, whereas TKRPs, CCAP and AstCCC were generally present but showed lower signal intensity (Fig. 7). Notably, proctolin and myosuppressin, which are the most effective myostimulatory and myoinhibitory neuropeptides, respectively, in foregut assays in related insects such as cockroaches37,63, were not detected in mass spectra of the FG.

3.2.4 Novel neuropeptides

Figure 8. List of novel neuropeptide precursors from C. morosus. The detection of these precursors is based on the identification of the processed peptide sequences first. Underlined sequences were confirmed by MS2. Blue, signal peptide; yellow, putative bioactive neuropeptide; green, predicted C-terminal glycine amidation site; red, confirmed cleavage sites.

Figure 9: MALDI-TOF MS spectrum (direct tissue profiling) obtained from a preparation of a corpus cardiacum (CC). Ion signals represent singly charged peptides ([M + H]+) from multiple precursors, with the exception of AKH and corazonin, which are represented mostly by their [M + Na]+ and [M + K]+ adduct ions. Ion signals of the predicted pyrokinin-like peptides (PKL) of the novel PKL1 and PKL2 precursors as well as HanSolin are among the prominent signals in this mass range, and their identity was confirmed by MS2 experiments. m/z 1140-1450 (*: m/z 1358.86 NPLP-10). AKH, adipokinetic hormone; MIP, myoinhibitory peptide; PK, pyrokinin; sNPF, short neuropeptide F.

In addition to the identification of neuropeptides from the insect neuropeptide precursors listed in Supporting Information S1, we implemented an alternative approach to identify potentially unknown neuropeptide precursors from C. morosus. For this, transcriptome data were first filtered for putative precursor sequences as described above (Fig. 1), and the resulting database was used with PEAKS to re-analyse the Q Exactive Orbitrap data. Indeed, based on their expressed peptide products, this search recovered about 80% of the neuropeptide and neuropeptide-like precursors that had been identified by homology to other insects, and it also revealed several novel candidates with features typical of neuropeptide precursors (Fig. 8). Mature peptides of three of these novel precursors could also be identified by MALDI-TOF MS in preparations of the CC (Fig. 9; Table 2), which suggests neurohormonal functions. The first of these precursors, HanSolin (Fig. 10; named as a tribute to the late adventurer), contains at its C-terminus a single amidated neuropeptide (GALRPLGQPLRWa) and did not show significant similarity to known insect neuropeptides. The other two precursors contain a single PK each and, at first sight, resemble a recently described PK-like precursor in Locusta migratoria64. Therefore, the respective C. morosus precursors and their neuropeptides are named pyrokinin-like (PKL). A single peptide previously described as PKL from L. migratoria65 is in fact a PVK from the capa gene64. The precursor sequences show little similarity except for the PKLs themselves, which argues against a common origin, although the abundance of PKLs in the CC of both species suggests just the opposite. Future analyses of other polyneopteran taxa will show whether the PKLs of L. migratoria and C. morosus have a common origin. Mass spectra from preparations of the CC (Fig. 9) also verified the accumulation of other peptide hormones, including AKH, MIPs, corazonin, multiple PKs from the genuine PK precursor, NPLP1, sNPF, and myosuppressin.
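How a C-terminal amidation and the different adduct ions shift the expected ion mass can be illustrated with a short calculation. The sketch below is not part of the original analysis, which relied on PEAKS and MALDI-TOF software; it uses standard monoisotopic residue masses, and the function names `neutral_mass` and `singly_charged_ions` are assumptions introduced here. The HanSolin sequence is taken from the text.

```python
# Standard monoisotopic residue masses (Da) of the 20 proteinogenic amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
AMIDATION = -0.984016      # C-terminal -OH replaced by -NH2
PROTON = 1.007276
SODIUM_ION = 22.989221     # Na atom minus one electron
POTASSIUM_ION = 38.963158  # K atom minus one electron


def neutral_mass(sequence, amidated=False):
    """Monoisotopic mass of the uncharged peptide."""
    mass = sum(RESIDUE[aa] for aa in sequence) + WATER
    return mass + AMIDATION if amidated else mass


def singly_charged_ions(sequence, amidated=False):
    """m/z of the [M+H]+, [M+Na]+ and [M+K]+ ions (charge 1)."""
    m = neutral_mass(sequence, amidated)
    return {
        "[M+H]+": m + PROTON,
        "[M+Na]+": m + SODIUM_ION,
        "[M+K]+": m + POTASSIUM_ION,
    }


# HanSolin (GALRPLGQPLRWa); the trailing "a" of the text is passed as amidated=True.
ions = singly_charged_ions("GALRPLGQPLRW", amidated=True)
```

The mass difference of about 0.98 Da between the amidated peptide and its free-acid form corresponds to the amidation shift reported in the ion table of Fig. 10.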

Figure 10: Q Exactive Orbitrap MS data from an extract of corpora cardiaca (CC); confirming the sequence of HanSolin. Data were analyzed with PEAKS 8 software. (A) MS2 spectrum, (B) Aligned spectrum view, (C) Ion table, and (D) Error map. Confirmed ion signals of the y- (red) and b-fragment series (blue) are labelled, respectively. The (–.98) indicates a C-terminal amidation.

Another precursor identified in this way yields a single potential neuropeptide, PASAIFTNIRFLamide, which is located at the C-terminus of the precursor and is flanked by dibasic proteolytic cleavage sites (Fig. 8). In contrast to the many insect neuropeptides terminating with RFamide (sulfakinins, myosuppressins, sNPF, NPF-1, NPF-2, extended FMRFamides), this peptide possesses an amidated Leu as C-terminal amino acid, following the common RF motif. Therefore, it is named RFLamide (RFLa). Both Q Exactive Orbitrap MS and MALDI-TOF mass fingerprint spectra identified RFLa in CNS and CC tissue preparations. In MALDI-TOF mass spectra, RFLa was indicated by ions with very low signal intensity. Finally, several peptides of a multiple copy precursor, named Carausius Neuropeptide-like Precursor 1 (CNP1), were confirmed in samples from the nervous system, particularly the FG (see Fig. 7). The precursor (Fig. 8) contains a number of completely different peptide sequences, including several amidation sites, which are separated by mono- or dibasic cleavage sites. Although such a diverse mixture of peptides is also found in Drosophila neuropeptide-like precursors, CNP1 of C. morosus is not sequence-related to any of these Drosophila precursors.
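The way a peptide such as RFLa is predicted from a precursor (cleavage at dibasic sites, followed by conversion of a C-terminal Gly into an amide) can be sketched as follows. This is a simplified illustration under stated assumptions, not the prediction workflow of the study: only dibasic sites are considered, the toy precursor is hypothetical apart from the embedded RFLa sequence, and the function name `cleave_dibasic` is introduced here.

```python
import re

# Two consecutive basic residues (KR, RR, RK, KK) are treated as a cleavage site.
DIBASIC = re.compile(r"[KR]{2}")


def cleave_dibasic(precursor):
    """Split a propeptide at dibasic sites and mark Gly-derived amidation.

    A C-terminal Gly is replaced by a lower-case 'a' (amide), following the
    notation of the text. Monobasic sites and signal peptides are ignored.
    """
    products = []
    for fragment in DIBASIC.split(precursor):
        if not fragment:
            continue  # adjacent cleavage sites leave empty fragments
        if len(fragment) > 1 and fragment.endswith("G"):
            fragment = fragment[:-1] + "a"
        products.append(fragment)
    return products


# Hypothetical toy propeptide: only PASAIFTNIRFL(G) is taken from the text,
# the flanking residues are invented for illustration.
toy_precursor = "DSEAKRPASAIFTNIRFLGKRSS"
products = cleave_dibasic(toy_precursor)
```

The single Arg within the RFLa sequence is not treated as a cleavage site in this sketch; the text likewise describes a potential monobasic site in FMRF-7 that was not used, so monobasic cleavage cannot be predicted from sequence alone.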

4. CONCLUSIONS

Over the past few years, the combination of genomics and/or transcriptomics with MS-based neuropeptidomics has become a very successful approach to provide researchers with the comprehensive data necessary for designing meaningful physiological experiments47,66-72. The quality of both genome/transcriptome data and MS data has increased considerably, and it is now possible to survey the neuropeptidome of a given species in a single coordinated effort, also in insects. Our analysis of neuropeptides of C. morosus, which benefitted from a transcriptome directly prepared from the CNS, surpassed in a single step the neurochemical information currently available from other polyneopteran model organisms such as locusts and cockroaches. Altogether, the survey resulted in an exceptionally comprehensive coverage of processed neuropeptides. The better the coverage of predicted peptides, the greater the significance of information about missing peptides. It can be assumed that missing neuropeptides of C. morosus were either outside the mass range analysed here (e.g. protein hormones), were not cleaved as expected, or that the corresponding gene expression did not yield a sufficient quantity of peptides in the nervous system for analysis. Some of the missing peptides occurred as N-terminally extended forms, thereby likely retaining functional properties. The complete absence of other peptides in mass spectra, such as calcitonin A, B, OK-B, and IMFa (precursors were identified), is a clear indication that these peptides are not abundant in the nervous system. The situation is different for sulfakinins and kinins, which are neuropeptides typical of the nervous system in related insects. Since these peptides and the corresponding precursors were not detected in our analyses, they are likely not present at all. This can be functionally relevant and a taxon-specific feature. The comprehensive coverage of the neuropeptides in our peptidomic analyses also enabled the identification of putative neuropeptides which were not present in the initially curated precursor sequences. Novel and putative neuropeptide precursors were identified by combining transcriptomic data and Q Exactive Orbitrap MS. Whereas the functional significance of PKs from the novel PKL precursors is certainly beyond debate, functions of the other novel C. morosus peptides have still to be investigated, preferably paralleled by the identification of specific receptors.

In summary, our study revealed 65 neuropeptide, neuropeptide-like, and protein hormone precursors. The focus of the peptidomic analyses was on the 42 neuropeptide and neuropeptide-like precursors, including different transcripts and their heterozygous sequences. A single peptide (Cam-AST B5, AWQDLQAGWa)28 out of 15 neuropeptides previously published from C. morosus was not detected by peptidomics and could not be assigned to any of the precursors either. Altogether, we compiled a list of 153 putative mature neuropeptides and an additional 124 peptides, mainly representing precursor sequences but also truncated or extended forms of the neuropeptides. With that, sufficient data exist to go one step further and to analyse the role of neuropeptides in the motor control networks and the leg muscle control system of C. morosus in much more detail.

ACKNOWLEDGMENTS

Funding was provided by the Graduate School for Biological Sciences (GSfBS) Cologne; Deutsche Forschungsgemeinschaft (PR766/11-1; DFG-RTG 1960: Neural Circuit Analysis of the Cellular and Subcellular Level). We thank Corinna Klein, Astrid Wilbrand-Hennes, Ursula Cullman and Christian Frese from CECAD Proteomics Facility (Cologne, Germany) for supporting Orbitrap analysis and Till Bockemühl (University of Cologne, Cologne, Germany) for valuable support with MATLAB.

SUPPORTING INFORMATION:

Supporting Information S1: List of sequences of neuropeptide and neuropeptide-like precursors from Carausius.
Supporting Information S2: List of putative precursor sequences containing protein hormone sequences.
Supporting Information S3: Summary of neuropeptides/neuropeptide-like peptides and additional neuropeptide precursor products from Carausius morosus which were identified by MS analyses.

REFERENCES

1. Bidaye, S. S.; Bockemuhl, T.; Buschges, A., Six-legged walking in insects: how CPGs, peripheral feedback, and descending signals generate coordinated and adaptive motor rhythms. J. Neurophysiol. 2018, 119, (2), 459-475.
2. Ayali, A.; Borgmann, A.; Buschges, A.; Couzin-Fuchs, E.; Daun-Gruhn, S.; Holmes, P., The comparative investigation of the stick insect and cockroach models in the study of insect locomotion. Curr. Opin. Insect Sci. 2015, 12, 1-10.
3. Büschges, A.; Schmitz, J.; Bässler, U., Rhythmic patterns in the thoracic nerve cord of the stick insect induced by pilocarpine. J. Exp. Biol. 1995, 198, (Pt 2), 435-56.
4. Akay, T.; Bassler, U.; Gerharz, P.; Buschges, A., The role of sensory signals from the insect coxa-trochanteral joint in controlling motor activity of the femur-tibia joint. J. Neurophysiol. 2001, 85, (2), 594-604.
5. Büschges, A.; Gruhn, M., Mechanosensory Feedback in Walking: From Joint Control to Locomotor Patterns. In Advances in Insect Physiology, Casas, J.; Simpson, S. J., Eds. Academic Press: 2007; Vol. 34, pp 193-230.
6. Büschges, A.; Kittmann, R.; Schmitz, J., Identified nonspiking interneurons in leg reflexes and during walking in the stick insect. J. Comp. Physiol. A 1994, 174, (6), 685-700.
7. Graham, D.; Epstein, S., Behaviour and Motor Output for an Insect Walking on a Slippery Surface: II. Backward Walking. J. Exp. Biol. 1985, 118, (1), 287.

8. Goldammer, J.; Buschges, A.; Schmidt, J., Motoneurons, DUM cells, and sensory neurons in an insect thoracic ganglion: a tracing study in the stick insect Carausius morosus. J. Comp. Neurol. 2012, 520, (2), 230-57.
9. Kittmann, R.; Dean, J.; Schmitz, J., An Atlas of the Thoracic Ganglia in the Stick Insect, Carausius morosus. Philos. Trans. R. Soc. Lond. B Biol. Sci. 1991, 331, (1260), 101.
10. Büschges, A., Nonspiking pathways in a joint-control loop of the stick insect Carausius morosus. J. Exp. Biol. 1990, 151, 133-160.
11. Büschges, A.; Kittmann, R.; Ramirez, J.-M., Octopamine effects mimick state-dependent changes in a proprioceptive feedback system. J. Neurobiol. 1993, 24, (5), 598-610.
12. Orchard, I.; Ramirez, J. M.; Lange, A. B., A Multifunctional Role for Octopamine in Locust Flight. Annu. Rev. Entomol. 1993, 38, 227-249.
13. Buschges, A., Role of local nonspiking interneurons in the generation of rhythmic motor activity in the stick insect. J. Neurobiol. 1995, 27, (4), 488-512.
14. Grimmelikhuijzen, C. J.; Hauser, F., Mini-review: the evolution of neuropeptide signaling. Regul. Pept. 2012, 177 Suppl, S6-9.
15. Nassel, D. R.; Winther, A. M., Drosophila neuropeptides in regulation of physiology and behavior. Prog. Neurobiol. 2010, 92, (1), 42-104.
16. Goldsworthy, G. J., The Endocrine Control of Flight Metabolism in Locusts. In Advances in Insect Physiology, Berridge, M. J.; Treherne, J. E.; Wigglesworth, V. B., Eds. Academic Press: 1983; Vol. 17, pp 149-204.
17. Kodrík, D.; Socha, R.; Šimek, P.; Zemek, R.; Goldsworthy, G. J., A new member of the AKH/RPCH family that stimulates locomotory activity in the firebug, Pyrrhocoris apterus (Heteroptera). Insect Biochem. Mol. Biol. 2000, 30, (6), 489-498.
18. Socha, R.; Kodrík, D.; Zemek, R., Adipokinetic Hormone Stimulates Insect Locomotor Activity. Science of nature 1999, 86, (2), 85-86.
19. Hooper, S. L.; Marder, E., Modulation of the lobster pyloric rhythm by the peptide proctolin. J. Neurosci. 1987, 7, (7), 2097-112.
20. Thirumalai, V.; Marder, E., Colocalized Neuropeptides Activate a Central Pattern Generator by Acting on Different Circuit Targets. J. Neurosci. 2002, 22, (5), 1874.
21. Blitz, D. M.; Christie, A. E.; Marder, E.; Nusbaum, M. P., Distribution and effects of tachykinin-like peptides in the stomatogastric nervous system of the crab, Cancer borealis. J. Comp. Neurol. 1995, 354, (2), 282-294.
22. Wood, D. E.; Stein, W.; Nusbaum, M. P., Projection Neurons with Shared Cotransmitters Elicit Different Motor Patterns from the Same Neural Circuit. J. Neurosci. 2000, 20, (23), 8943.
23. Wicher, D.; Agricola, H. J.; Sohler, S.; Gundel, M.; Heinemann, S. H.; Wollweber, L.; Stengl, M.; Derst, C., Differential receptor activation by cockroach adipokinetic hormones produces differential effects on ion currents, neuronal activity, and locomotion. J. Neurophysiol. 2006, 95, (4), 2314-25.
24. Gade, G.; Rinehart, K. L., Jr., Primary structure of the hypertrehalosaemic factor II from the corpus cardiacum of the Indian stick insect, Carausius morosus, determined by fast atom bombardment mass spectrometry. Biol. Chem. Hoppe Seyler 1987, 368, (1), 67-75.
25. Gade, G.; Kellner, R.; Rinehart, K. L.; Proefke, M. L., A tryptophan-substituted member of the AKH/RPCH family isolated from a stick insect corpus cardiacum. Biochem. Biophys. Res. Commun. 1992, 189, (3), 1303-9.
26. Miksys, S.; Lange, A. B.; Orchard, I.; Wong, V., Localization and neurohemal release of FMRFamide-related peptides in the stick insect Carausius morosus. Peptides 1997, 18, (1), 27-40.
27. Predel, R.; Kellner, R.; Gade, G., Myotropic neuropeptides from the retrocerebral complex of the stick insect, Carausius morosus (Phasmatodea: Lonchodidae). Eur. J. Entomol. 1999, 96, (3), 275-278.
28. Lorenz, M. W.; Kellner, R.; Hoffmann, K. H.; Gade, G., Identification of multiple peptides homologous to cockroach and cricket allatostatins in the stick insect Carausius morosus. Insect Biochem. Mol. Biol. 2000, 30, (8-9), 711-8.
29. Weidler, D. J.; Diecke, F. P. J., The role of cations in conduction in the central nervous system of the herbivorous insect Carausius morosus. J. Comp. Physiol. 1969, 64, (4), 372-399.
30. Ragionieri, L.; Özbagci, B.; Neupert, S.; Salts, Y.; Davidovitch, M.; Altstein, M.; Predel, R., Identification of mature peptides from pban and capa genes of the moths Heliothis peltigera and Spodoptera littoralis. Peptides 2017, 94, 1-9.
31. Grabherr, M. G.; Haas, B. J.; Yassour, M.; Levin, J. Z.; Thompson, D. A.; Amit, I.; Adiconis, X.; Fan, L.; Raychowdhury, R.; Zeng, Q. D.; Chen, Z. H.; Mauceli, E.; Hacohen, N.; Gnirke, A.; Rhind, N.; di Palma, F.; Birren, B. W.; Nusbaum, C.; Lindblad-Toh, K.; Friedman, N.; Regev, A., Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 2011, 29, (7), 644-U130.
32. Chang, Z.; Li, G.; Liu, J.; Zhang, Y.; Ashby, C.; Liu, D.; Cramer, C. L.; Huang, X., Bridger: a new framework for de novo transcriptome assembly using RNA-seq data. Genome Biol. 2015, 16, (1), 30.
33. Shelomi, M., De novo transcriptome analysis of the excretory tubules of Carausius morosus (Phasmatodea) and possible functions of the midgut 'appendices'. PLoS One 2017, 12, (4), e0174984.
34. Camacho, C.; Coulouris, G.; Avagyan, V.; Ma, N.; Papadopoulos, J.; Bealer, K.; Madden, T. L., BLAST+: architecture and applications. BMC Bioinformatics 2009, 10, 421.
35. Gasteiger, E.; Gattiker, A.; Hoogland, C.; Ivanyi, I.; Appel, R. D.; Bairoch, A., ExPASy: The proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res. 2003, 31, (13), 3784-8.
36. Petersen, T. N.; Brunak, S.; von Heijne, G.; Nielsen, H., SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat. Methods 2011, 8, (10), 785-6.

37. Predel, R., Peptidergic neurohemal system of an insect: mass spectrometric morphology. J. Comp. Neurol. 2001, 436, (3), 363-75.
38. Neupert, S.; Predel, R., Peptidomic Analysis of Single Identified Neurons. In Peptidomics: Methods and Protocols, Soloviev, M., Ed. Humana Press: Totowa, NJ, 2010; pp 137-144.
39. Rappsilber, J.; Mann, M.; Ishihama, Y., Protocol for micro-purification, enrichment, pre-fractionation and storage of peptides for proteomics using StageTips. Nat. Protoc. 2007, 2, (8), 1896-906.
40. Predel, R.; Neupert, S.; Derst, C.; Reinhardt, K.; Wegener, C., Neuropeptidomics of the Bed Bug Cimex lectularius. J. Proteome Res. 2018, 17, (1), 440-454.
41. Vizcaino, J. A.; Csordas, A.; del-Toro, N.; Dianes, J. A.; Griss, J.; Lavidas, I.; Mayer, G.; Perez-Riverol, Y.; Reisinger, F.; Ternent, T.; Xu, Q. W.; Wang, R.; Hermjakob, H., 2016 update of the PRIDE database and its related tools. Nucleic Acids Res. 2016, 44, (D1), D447-56.
42. Cox, J.; Mann, M., MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 2008, 26, (12), 1367-72.
43. Zhang, J.; Xin, L.; Shan, B.; Chen, W.; Xie, M.; Yuen, D.; Zhang, W.; Zhang, Z.; Lajoie, G. A.; Ma, B., PEAKS DB: de novo sequencing assisted database search for sensitive and accurate peptide identification. Mol. Cell Proteomics 2012, 11, (4), M111 010587.
44. Derst, C.; Dircksen, H.; Meusemann, K.; Zhou, X.; Liu, S.; Predel, R., Evolution of neuropeptides in nonpterygote hexapods. BMC Evol. Biol. 2016, 16, (1), 51.
45. Veenstra, J. A., The contribution of the genomes of a termite and a locust to our understanding of insect neuropeptides and neurohormones. Front. Physiol. 2014, 5, 454.
46. Sterkel, M.; Oliveira, P. L.; Urlaub, H.; Hernandez-Martinez, S.; Rivera-Pomar, R.; Ons, S., OKB, a novel family of brain-gut neuropeptides from insects. Insect Biochem. Mol. Biol. 2012, 42, (7), 466-73.
47. Hauser, F.; Neupert, S.; Williamson, M.; Predel, R.; Tanaka, Y.; Grimmelikhuijzen, C. J., Genomics and peptidomics of neuropeptides and protein hormones present in the parasitic wasp Nasonia vitripennis. J. Proteome Res. 2010, 9, (10), 5296-310.
48. Mazzini, M.; Scali, V., Stick insects: phylogeny and reproduction: proceedings of the 1st International Symposium on Stick Insects, Siena, Italy, September 30th-October 2nd, 1986. University of Siena: 1987; p 202-210.
49. Predel, R.; Neupert, S.; Huetteroth, W.; Kahnt, J.; Waidelich, D.; Roth, S., Peptidomics-based phylogeny and biogeography of Mantophasmatodea (Hexapoda). Syst. Biol. 2012, 61, (4), 609-29.
50. Sturm, S.; Predel, R., Mass spectrometric identification, sequence evolution, and intraspecific variability of dimeric peptides encoded by cockroach akh genes. Anal. Bioanal. Chem. 2015, 407, (6), 1685-93.
51. Veenstra, J. A.; Rombauts, S.; Grbic, M., In silico cloning of genes encoding neuropeptides, neurohormones and their putative G-protein coupled receptors in a spider mite. Insect Biochem. Mol. Biol. 2012, 42, (4), 277-95.
52. Stafflinger, E.; Hansen, K. K.; Hauser, F.; Schneider, M.; Cazzamali, G.; Williamson, M.; Grimmelikhuijzen, C. J., Cloning and identification of an oxytocin/vasopressin-like receptor and its ligand from insects. Proc. Natl. Acad. Sci. U S A 2008, 105, (9), 3262-7.
53. Liutkeviciute, Z.; Koehbach, J.; Eder, T.; Gil-Mansilla, E.; Gruber, C. W., Global map of oxytocin/vasopressin-like neuropeptide signalling in insects. Sci. Rep. 2016, 6, 39177.
54. Schoofs, L.; Holman, G. M.; Proost, P.; Van Damme, J.; Hayes, T. K.; De Loof, A., Locustakinin, a novel myotropic peptide from Locusta migratoria, isolation, primary structure and synthesis. Regul. Pept. 1992, 37, (1), 49-57.
55. Clynen, E.; Husson, S. J.; Schoofs, L., Identification of new members of the (short) neuropeptide F family in locusts and Caenorhabditis elegans. Ann. N Y Acad. Sci. 2009, 1163, 60-74.
56. Predel, R.; Brandt, W.; Kellner, R.; Rapus, J.; Nachman, R. J.; Gade, G., Post-translational modifications of the insect sulfakinins: sulfation, pyroglutamate-formation and O-methylation of glutamic acid. Eur. J. Biochem. 1999, 263, (2), 552-60.
57. Shipley, M. T.; Puche, A. C., Olfactory Glomeruli: Structure and Circuitry A2 - Squire, Larry R. In Encyclopedia of Neuroscience, Academic Press: Oxford, 2009; pp 119-127.
58. Gadenne, C.; Barrozo, R. B.; Anton, S., Plasticity in Insect Olfaction: To Smell or Not to Smell? Annu. Rev. Entomol. 2016, 61, 317-33.
59. Raabe, M., Etude des phenomenes de neurosecretion au niveau de la chaine nerveuse ventrale des Phasmides. Bull. Soc. Zool. 1965, 90, 631-654.
60. Predel, R.; Wegener, C., Biology of the CAPA peptides in insects. Cell. Mol. Life Sci. 2006, 63, (21), 2477-90.
61. Neupert, S.; Derst, C.; Sturm, S.; Predel, R., Identification of two capa cDNA transcripts and detailed peptidomic characterization of their peptide products in Periplaneta americana. EuPA Open Proteom. 2014, 3, 195-205.
62. Wegener, C.; Reinl, T.; Jansch, L.; Predel, R., Direct mass spectrometric peptide profiling and fragmentation of larval peptide hormone release sites in Drosophila melanogaster reveals tagma-specific peptide expression and differential processing. J. Neurochem. 2006, 96, (5), 1362-74.
63. Konopińska, D.; Rosiński, G., Proctolin, an insect neuropeptide. J. Pept. Sci. 1999, 5, (12), 533-546.
64. Redeker, J.; Bläser, M.; Neupert, S.; Predel, R., Identification and distribution of products from novel tryptopyrokinin genes in the locust, Locusta migratoria. Biochem. Biophys. Res. Commun. 2017, 486, (1), 70-75.
65. Clynen, E.; Baggerman, G.; Huybrechts, J.; Vanden Bosch, L.; De Loof, A.; Schoofs, L., Peptidomics of the locust corpora allata: identification of novel pyrokinins (-FXPRLamides). Peptides 2003, 24, (10), 1493-500.
66. Diesner, M.; Predel, R.; Neupert, S., Neuropeptide Mapping of Dimmed Cells of Adult Drosophila Brain. J. Am. Soc. Mass Spectrom. 2018, 1–13.

21 ACS Paragon Plus Environment

Journal of Proteome Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

67. Hu, C. K.; Southey, B. R.; Romanova, E. V.; Maruska, K. P.; Sweedler, J. V.; Fernald, R. D., Identification of prohormones and pituitary neuropeptides in the African cichlid, Astatotilapia burtoni. BMC Genomics 2016, 17, (1), 660. 68. Dircksen, H.; Neupert, S.; Predel, R.; Verleyen, P.; Huybrechts, J.; Strauss, J.; Hauser, F.; Stafflinger, E.; Schneider, M.; Pauwels, K.; Schoofs, L.; Grimmelikhuijzen, C. J., Genomics, transcriptomics, and peptidomics of Daphnia pulex neuropeptides and protein hormones. J. Proteome Res. 2011, 10, (10), 4478-504. 69. Li, B.; Predel, R.; Neupert, S.; Hauser, F.; Tanaka, Y.; Cazzamali, G.; Williamson, M.; Arakane, Y.; Verleyen, P.; Schoofs, L.; Schachtner, J.; Grimmelikhuijzen, C. J.; Park, Y., Genomics, transcriptomics, and peptidomics of neuropeptides and protein hormones in the red flour beetle Tribolium castaneum. Genome Res. 2008, 18, (1), 113-22. 70. Hummon, A. B.; Richmond, T. A.; Verleyen, P.; Baggerman, G.; Huybrechts, J.; Ewing, M. A.; Vierstraete, E.; Rodriguez-Zas, S. L.; Schoofs, L.; Robinson, G. E.; Sweedler, J. V., From the genome to the proteome: uncovering peptides in the Apis brain. Science 2006, 314, (5799), 647-9. 71. Murphy, D.; Konopacka, A.; Hindmarch, C.; Paton, J. F.; Sweedler, J. V.; Gillette, M. U.; Ueta, Y.; Grinevich, V.; Lozic, M.; Japundzic-Zigon, N., The hypothalamic-neurohypophyseal system: from genome to physiology. J. Neuroendocrinol. 2012, 24, (4), 539-53. 72. Wardman, J. H.; Berezniuk, I.; Di, S.; Tasker, J. G.; Fricker, L. D., ProSAAS-derived peptides are colocalized with neuropeptide Y and function as neuropeptides in the regulation of food intake. PLoS One 2011, 6, (12), e28152.

LIST OF FIGURES:

Figure 1: Overview of the combined peptidomic/transcriptomic approach used to search for potentially novel neuropeptide precursors.

Figure 2: Schematic drawing of the nervous system of Carausius morosus.

Figure 3: MALDI-TOF MS spectra (direct tissue profiling) obtained from a preparation of an antennal lobe.

Figure 4: MALDI-TOF MS spectrum obtained from a randomly selected single neuron from the posterior dorsomedian region of the metathoracic ganglion.

Figure 5: MALDI-TOF MS spectra (direct tissue profiling) obtained from a preparation of (A) a single abdominal PSO and (B) the adjacent transverse nerves (TN) projecting into the periphery.

Figure 6: MALDI-TOF mass spectrum obtained from a single postero-lateral neuron (PLC).

Figure 7: MALDI-TOF MS spectra (direct tissue profiling) obtained from a preparation of a frontal ganglion (FG).

Figure 8: List of novel neuropeptide precursors from C. morosus.

Figure 9: MALDI-TOF MS spectrum (direct tissue profiling) obtained from a preparation of a corpus cardiacum (CC).

Figure 10: Q Exactive Orbitrap MS data from an extract of corpora cardiaca (CC).

LIST OF TABLES:

Table 1: Precursors for neuropeptides, neuropeptide-like sequences and protein hormones identified in the transcriptome of the stick insect C. morosus.

Table 2: Summary of putative neuropeptides/neuropeptide-like peptides and additional precursor products (PP) from novel precursors of C. morosus.

Table 1. Precursors for neuropeptides, neuropeptide-like sequences and protein hormones identified in the transcriptome of the stick insect C. morosus. Different transcripts are marked with subscript characters (e.g. MIPa, MIPb). Sequences are listed in Supporting Information S1 and S2, which also include partial sequences, marked (+). Sequences with allelic differences and sequences completed from RAW data and publicly available databases (ref 33) are indicated by ‘*’. TSA, transcriptome shotgun assembly; SRA, sequence read archive; p, pending release; t, unpublished Trinity assembly; b, unpublished Bridger assembly.

Neuropeptide precursor

Accession

Transcript

complete

TSA: SUB2014136

p

TSA: SRR3211828

t

-

(+)

Allatostatin A (allele 1)

SRA: SUB2017206 b

-

+

Allatostatin A (allele 2)

p

Adipokinetic hormone

DN45543_c0_g1_i2,

+

DN45543_c0_g1_i1 Adipokinetic hormone/corazonin-related peptide*

TSA: SUB2014136

DN44920_c0_g1_i1,

+

DN52183_c0_g1_i1 Allatostatin CC Allatostatin CCC Allatotropin Calcitonin A Calcitonin B* Calcitonin-like diuretic hormone CAPA (allele 1) CAPA (allele 2)*

TSA: SUB2014136

p

DN48048_c1_g1_i4

+

TSA: SUB2014136

p

DN43401_c0_g1_i2

+

TSA: SUB2014136

p

DN51283_c0_g1_i3

+

TSA: SUB2014136

p

DN47529_c0_g1_i2

+

TSA: SRR3211828

t

TSA: SUB2014136

p

DN46707_c0_g1_i2

+

TSA: SUB2014136

p

DN45647_c0_g1_i2

+

SRA SUB2017206

p

@K00211:180:H7T73BB

+

-

+

XX:4:2227:26748:15451 1:N:0 CCHamide1

TSA: SUB2014136 p

DN46454_c0_g1_i1

CCHamide2

TSA: SUB2014136

p

DN42932_c0_g1_i1

+

TSA: SUB2014136

p

DN55890_c0_g4_i2

+

TSA: SUB2014136

p

DN33564_c0_g1_i1

+

TSA: SUB2014136

p

DN48797_c0_g1_i1

+

TSA: SUB2014136 p

DN53762_c0_g1_i3

+

TSA: SUB2014136

p

DN46047_c0_g1_i1

+

TSA: SUB2014136

p

DN49426_c0_g1_i1

+

SRA SUB2017206

p

@K00211:180:H7T73BB

+

CNMamide Corazonin Corticotropin-releasing factor-like diuretic hormone Crustacean cardioactive peptide Elevenin Extended FMRFamides (allele 1) Extended FMRFamides (allele 2)*

XX:4:1209:24434:19953 1:N:0 IMFamide Myoinhibitory peptidea Myoinhibitory peptideb

TSA: SUB2014136

p

DN45048_c0_g1_i1

+

TSA: SUB2014136

p

DN54048_c0_g1_i6

+

TSA: SUB2014136

p

DN54048_c0_g1_i5

+


TSA: SRR3211830

t

TSA: SUB2014136

p

DN49827_c0_g1_i2

+

TSA: SUB2014136

p

DN80050_c0_g1_i1

(+)

TSA: SUB2014136

p

DN48937_c0_g1_i2

+

TSA: SUB2014136

p

DN48937_c0_g1_i1

+

TSA: SUB2014136

p

DN49613_c0_g3_i1

+

TSA: SUB2014136

p

DN54808_c2_g1_i1

+

TSA: SRR3211828

t

TSA: SUB2014136

p

DN40622_c0_g1_i1

+

TSA: SUB2014136

p

DN56579_c2_g2_i1

+

Pyrokinin (allele 1)

TSA: SUB2014136

p

DN37947_c0_g1_i1

(+)

Pyrokinin (allele 2)*

SRA SUB2017206 p

@K00211:180:H7T73BB

(+)

Myoinhibitory peptidec* Myosuppressin Natalisin Neuropeptide F1a Neuropeptide F1b Neuropeptide F2 Orcokinina Orcokininb* Pigment dispersing factor Proctolin

-

-

(+)

(+)

XX:4:1221:20506:29255 1:N:0 RYamide (allele 1) RYamide (allele 2)*

TSA: SUB2014136

p

DN47580_c0_g1_i1

+

SRA SUB2017206

p

@K00211:180:H7T73BB

+

XX:4:2216:24008:33387 1:N:0 TSA: SUB2014136

p

DN44624_c0_g1_i1

+

TSA: SUB2014136

p

DN18133_c0_g1_i1

+

Tachykinin-related peptide (allele 1)

TSA: SUB2014136

p

DN52157_c0_g1_i1

+

Tachykinin-related peptide (allele 2)*

TSA: SUB2014136 &

Short Neuropeptide F SIFamide

p

SRA SUB2017206

p

DN52157_c0_g1_i2 &

(+)

@K00211:180:H7T73BB XX:4:1216:28787:21658 1:N:0

Trissin Tryptopyrokinin

TSA: SUB2014136

p

SRA: SUB2017206

b

DN12347_c0_g1_i1 -

+ (+)

Neuropeptide-like precursor Agatoxin-like Neuropeptide-like precursor 1 NVP-containinga NVP-containingb

TSA: SUB2014136 p

DN39928_c0_g1_i1

+

TSA: SUB2014136

p

DN57492_c0_g1_i9

+

TSA: SUB2014136

p

DN53725_c1_g1_i1

+

SRA: SUB2017206

b

-

+

Protein hormones precursor Bursicon alpha 1

TSA: SUB2014136 p

Bursicon alpha 2

b

SRA: SUB2017206


DN55493_c1_g3_i4 -

+ +


TSA: SUB2014136

p

DN46435_c0_g1_i1

+

TSA: SUB2014136

p

DN65927_c0_g1_i1

(+)

TSA: SRR3211831

t

TSA: SUB2014136

p

DN52423_c0_g2_i1

+

TSA: SUB2014136

p

DN43708_c0_g1_i1

+

TSA: SUB2014136

p

DN47145_c0_g1_i1

+

TSA: SUB2014136

p

DN56801_c0_g1_i2

+

TSA: SUB2014136

p

DN43203_c0_g1_i3

+

TSA: SUB2014136

p

DN41365_c0_g2_i1

+

TSA: SUB2014136

p

DN45509_c0_g1_i1

+

Insulin-like peptide 4

TSA: SUB2014136

p

DN43203_c0_g1_i2

+

Insulin-like peptide 5

TSA: SUB2014136 p

DN41365_c0_g1_i1

+

TSA: SUB2014136

p

DN40690_c0_g1_i1

+

SRA: SUB2017206

b

TSA: SUB2014136

p

DN48667_c0_g1_i2

+

TSA: SUB2014136

p

DN40670_c0_g1_i2

+

TSA: SUB2014136

p

DN40670_c0_g1_i1

+

Bursicon beta Eclosion hormone Ecdysis-triggering hormone* Glycoprotein hormone alpha Glycoprotein hormone beta IDL-containing ITG-like Insulin-like peptide 1 Insulin-like peptide 2 Insulin-like peptide 3

Insulin-like peptide 6/insulin-like growth

-

+

factor Insulin-like peptide 7/relaxin* Ion transport-like peptide Neuroparsin (allele 1) Neuroparsin (allele 2)


-

+
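The transcript identifiers in Table 1 (for example DN54048_c0_g1_i6 and DN54048_c0_g1_i5) follow the Trinity contig naming scheme, in which the trailing `_i<number>` denotes an isoform of the gene-level cluster `DN<cluster>_c<component>_g<gene>`. As a minimal sketch of how such identifiers could be grouped by gene to spot alternative transcripts, the following Python snippet may help. The function names are hypothetical and the snippet is not part of the authors' workflow; it merely assumes that the identifiers keep this pattern, with or without the `TRINITY_` prefix.

```python
import re
from collections import defaultdict

# Trinity-style identifier: DN<cluster>_c<component>_g<gene>_i<isoform>,
# optionally preceded by "TRINITY_" (the prefix is dropped in Table 1).
TRINITY_ID = re.compile(r"(?:TRINITY_)?DN(\d+)_c(\d+)_g(\d+)_i(\d+)$")


def parse_trinity_id(identifier):
    """Split an identifier into its gene-level name and isoform number."""
    cleaned = identifier.strip().rstrip(",")  # table cells may end with a comma
    match = TRINITY_ID.match(cleaned)
    if match is None:
        raise ValueError(f"not a Trinity-style identifier: {identifier!r}")
    cluster, component, gene, isoform = (int(x) for x in match.groups())
    return {"gene": f"DN{cluster}_c{component}_g{gene}", "isoform": isoform}


def group_by_gene(identifiers):
    """Map each gene-level name to the sorted list of its isoform numbers."""
    groups = defaultdict(list)
    for identifier in identifiers:
        parsed = parse_trinity_id(identifier)
        groups[parsed["gene"]].append(parsed["isoform"])
    return {gene: sorted(isoforms) for gene, isoforms in groups.items()}


if __name__ == "__main__":
    ids = ["DN54048_c0_g1_i6", "DN54048_c0_g1_i5", "DN45048_c0_g1_i1"]
    for gene, isoforms in group_by_gene(ids).items():
        print(gene, isoforms)
```

Identifiers that share a gene-level name but differ in isoform number are candidates for the alternative transcripts that the table marks with subscript characters.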


Table 2. Summary of putative neuropeptides/neuropeptide-like peptides and additional precursor products (PP) from novel precursors of C. morosus that were identified by MS analyses (direct tissue profiling with MALDI-TOF MS including MS2; extract analysis with Q Exactive Orbitrap MS).

| Designation | Sequence | m/z [M + H]+ | MALDI-TOF MS1 | MALDI-TOF MS2 | Q Exactive Orbitrap |
| --- | --- | --- | --- | --- | --- |
| HanSolin | | | | | |
| HanSolin | GALRPLGQPLRWa | 1362.81 | + | + | + |
| HanSolin-PP-1 | RPSYQGVLEPVKDEVWLTRVPELLQNGWIDVYTSDKYTDTDMPDPS-OH | 5352.61 | + | - | - |
| HanSolin-PP-2 | ALSMLSRWRPFSGFMSRYMQPRAPPANVLPDNSDLVSAET-OH | 4495.21 | + | - | - |
| Carausius neuropeptide-like precursor 1 (CNP1) | | | | | |
| CNP1-2 | GFHESVFDRLGDYLPYWQRa | 2384.16 | + | - | + |
| CNP1-3 | DPLGINSRGFHDDVFNQDFGSFHPV-OH | 2817.30 | + | - | + |
| CNP1-5 | GPGLTHD-OH | 696.33 | - | - | + |
| CNP1-6 | SVMAGSRAQAVDEDRKFEKDVLTKTFLDDEKISAD-OH | 3914.94 | - | - | + |
| CNP1-7 | RPEMTGSGFHGDTFTRGFGDFWPMKKS-OH | 3076.42 | + | - | + |
| CNP1-7(1-24) | RPEMTGSGFHGDTFTRGFGDFWPM-OH | 2733.20 | + | - | + |
| CNP1-9 | RMGMGPSGFHGDTFTSGFGDFTTM-OH | 2541.06 | + | - | + |
| CNP1-10 | SLGDPPELSVPRa | 1265.69 | + | - | + |
| CNP1-11 | GGLQSDFTSED-OH | 1155.48 | - | - | + |
| CNP1-12 | RPDTGSNGFHGDTFTGGFGDFWTM-OH | 2607.10 | + | - | + |
| CNP2-13 | SERNGEENPFD-OH | 1293.53 | - | - | + |
| Pyrokinin-like 1 (PKL1) | | | | | |
| PKL1 [pQ] | pQSGPGFFRPRPa | 1227.64 | + | + | + |
| PKL1 (Q) | QSGPGFFRPRPa | 1244.67 | + | - | + |
| PKL1-PP-1 | pQGTEDGKKEPSGVWLLQPMRVDTSSGLSKDS-OH | 3315.62 | + | - | + |
| PKL1-PP-2 | FVPQEGYVY-OH | 1101.53 | + | - | + |
| Pyrokinin-like 2 (PKL2) | | | | | |
| PKL2 | DESGVGFFRPRLa | 1378.72 | + | + | + |
| PKL2-PP-2 | REEDVEEAPSPA-OH | 1328.60 | + | - | + |
| RFLamide (RFLa) | | | | | |
| RFLa | PASAIFTNIRFLa | 1348.77 | + | - | + |
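The m/z values in Table 2 appear consistent with monoisotopic [M + H]+ ions calculated from the listed sequences. As an illustration only, the sketch below computes such values in Python. It assumes the usual reading of the table notation (a trailing "a" for a C-terminal amide, "-OH" for a free C-terminus, a leading "pQ" for N-terminal pyroglutamate) and standard monoisotopic residue masses; the function names are hypothetical and this is not the software used in the study.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565      # H2O added for the free N- and C-termini
PROTON = 1.007276      # charge carrier for [M + H]+
AMIDATION = -0.984016  # C-terminal -OH replaced by -NH2
PYROGLU = -17.026549   # N-terminal Gln cyclised to pyroglutamate (loss of NH3)


def mz_mh(sequence, amidated=False, pyroglu=False):
    """Monoisotopic m/z of the singly protonated peptide."""
    mass = sum(MONO[residue] for residue in sequence) + WATER
    if amidated:
        mass += AMIDATION
    if pyroglu:
        if not sequence.startswith("Q"):
            raise ValueError("pyroglutamate formation assumed for N-terminal Q only")
        mass += PYROGLU
    return mass + PROTON


def mz_from_notation(entry):
    """Interpret a table entry such as 'pQSGPGFFRPRPa' or 'FVPQEGYVY-OH'."""
    pyroglu = entry.startswith("pQ")
    if pyroglu:
        entry = entry[1:]          # drop the 'p', keep the Q residue
    amidated = entry.endswith("a")
    if amidated:
        entry = entry[:-1]
    elif entry.endswith("-OH"):
        entry = entry[:-3]
    return mz_mh(entry, amidated=amidated, pyroglu=pyroglu)


if __name__ == "__main__":
    for entry in ("GALRPLGQPLRWa", "pQSGPGFFRPRPa", "REEDVEEAPSPA-OH"):
        print(entry, round(mz_from_notation(entry), 2))
```

Comparing the computed values with the tabulated ones is a quick check that a designation, its sequence and its modification state belong together.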


Figure 1. Overview of the combined peptidomic/transcriptomic approach used to search for potentially novel neuropeptide precursors. First, the nucleotide sequences of the transcriptome assembly were translated in all six reading frames; the translations were then surveyed for shorter sequences, i.e. open reading frames (ORFs) starting with a methionine and ending at a stop codon or at the end of the sequence. Sequences shorter than 50 amino acids and identical sequences were removed (crossed-out arrow). The resulting list of putative precursor sequences (FASTA file) was used as a database in our peptidomic analysis with the PEAKS software. In the final step, novel precursors were identified by reference to their expressed peptide products observed in the Q Exactive Orbitrap data.
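The database-building steps described for Figure 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' actual pipeline: it uses the standard genetic code, takes the first methionine of each stop-delimited stretch as the ORF start, and introduces hypothetical names (`candidate_precursors`, `to_fasta`). Only the 50-residue cutoff and the removal of identical sequences are taken from the caption.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def candidate_precursors(contigs, min_len=50):
    """Six-frame ORF survey: Met to stop codon (or sequence end), deduplicated."""
    seen = set()
    records = []
    for name, nucleotides in contigs.items():
        forward = nucleotides.upper()
        strands = (("+", forward), ("-", reverse_complement(forward)))
        for strand, seq in strands:
            for frame in range(3):
                protein = translate(seq[frame:])
                # '*' marks stop codons; the last piece runs to the sequence end.
                for segment in protein.split("*"):
                    start = segment.find("M")
                    if start == -1:
                        continue
                    orf = segment[start:]
                    if len(orf) < min_len or orf in seen:
                        continue
                    seen.add(orf)
                    header = f"{name}|{strand}{frame + 1}|orf{len(records) + 1}"
                    records.append((header, orf))
    return records


def to_fasta(records):
    return "".join(f">{header}\n{sequence}\n" for header, sequence in records)


if __name__ == "__main__":
    toy = {"contig1": "ATG" + "GCT" * 60 + "TAA"}  # toy contig, not real data
    print(to_fasta(candidate_precursors(toy)))
```

The resulting FASTA text would then serve as the search database for the peptidomic analysis; how the database is loaded into the search software is outside the scope of this sketch.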


Figure 2: Simplified overview of the nervous system of Carausius morosus, showing tissues and cells that were analyzed in this study. A, abdominal ganglion; AL, antennal lobe; aPSO, abdominal perisympathetic organ; CA, corpora allata; CC, corpora cardiaca; FG, frontal ganglion; GNG, gnathal ganglion; PDM, posterior dorso-median cell; PLC, postero-lateral cells; RCC, retrocerebral complex; tPSO, thoracic perisympathetic organ; TN, transverse nerve.


Figure 3: MALDI-TOF MS spectra (direct tissue profiling) obtained from a preparation of antennal lobe tissue (AL, 100 µm). All marked ion signals represent singly charged peptides ([M + H]+) from multiple precursors. (A) m/z 850-1300. (B) m/z 1300-3050 (*: m/z 1331.8 sNPF; **: m/z 1633.9 NPLP-8). Superscript letters are for allelic differences. AstA, allatostatin A; MIP, myoinhibitory peptide; TKRP, tachykinin-related peptides; NVP, NVP-containing PP; ACP, adipokinetic hormone/corazonin-related peptide; AT, allatotropin; NPLP, neuropeptide-like precursor 1; OK-A, orcokinin A; RYa, RYamide.


Figure 4: MALDI-TOF MS spectrum obtained from a randomly selected single neuron from the posterior dorsomedian region (PDM) of the metathoracic ganglion, showing products of the OK-A, myosuppressin and elevenin precursors (*: m/z 1611.8, OK-A-PP-2(1-13)). PP, precursor peptide; OK-A, orcokinin A; scale bar, 50 µm; n = 3.


Figure 5: MALDI-TOF MS spectra (direct tissue profiling) obtained from a preparation of (A) a single abdominal PSO and (B) the adjacent transverse nerves (TN) projecting into the periphery (*: PVK-1b; **: ext. PVK-1b). Superscript letters are for allelic differences. Alleles of PVK-1 were first identified by means of MS2 analyses and subsequently also found in the transcriptome RAW data. PVK, periviscerokinin; tPK, tryptopyrokinin; MIP, myoinhibitory peptide; AT, allatotropin; PSO, perisympathetic organ.


Figure 6: MALDI-TOF mass spectrum obtained from a single postero-lateral neuron (PLC) of the metathoracic ganglion, which was visualized via backfilling of a transverse nerve (TN). The spectrum exclusively contains ion signals of extended FMRFamides (*: m/z 896.6 FMRF-8 (Q); **: m/z 1129.6 FMRF-PP-3(1-11)). Superscript letters are for allelic differences. PP, precursor peptide. The most abundant FMRF precursor products are depicted in blue; scale bar, 50 µm; n = 16.


Figure 7: MALDI-TOF MS spectra (direct tissue profiling) obtained from a preparation of a frontal ganglion (FG). All marked ion signals represent singly charged peptide products ([M + H]+) from multiple precursors. Major ion signals could be confirmed by MS2 experiments. Furthermore, four mature products of the novel CNP1 precursor were confirmed in the FG by MS2 experiments. (A) m/z 750-1350 (*: m/z 1099.5 AstA-8; **: m/z 1133.5 NVP-4(1-11); ***: m/z 1273.6 RYa-2(11-20); ****: m/z 1289.6 NVP-4(1-12)). (B) m/z 1350-1700 (*: m/z 1358.7 NPLP-10; **: m/z 1357.7 MIPa-11). (C) m/z 1700-4000 (*: m/z 1887.0 NVP-8(1-18) (pQ); **: m/z 1924.9; ***: m/z 2171.9). Superscript letters are for allelic differences. AstA, allatostatin A; AstCCC, allatostatin CCC; MIP, myoinhibitory peptide; TKRP, tachykinin-related peptides; NVP, NVP-containing PP; AT, allatotropin; NPLP, neuropeptide-like precursor 1; OK-A, orcokinin A; PP, precursor peptide; RYa, RYamide; sNPF, short neuropeptide F; CNP1, Carausius neuropeptide-like precursor 1.


Figure 8. List of novel neuropeptide precursors from C. morosus. These precursors were detected by first identifying their processed peptide sequences. Underlined sequences were confirmed by MS2. Blue, signal peptide; yellow, putative bioactive neuropeptide; green, predicted C-terminal glycine amidation site; red, confirmed cleavage sites.


Figure 9: MALDI-TOF MS spectrum (direct tissue profiling) obtained from a preparation of a corpus cardiacum (CC). Ion signals represent singly charged peptides ([M + H]+) from multiple precursors, with the exception of AKH and corazonin, which are represented mostly by their [M + Na]+ and [M + K]+ adduct ions. Ion signals of the predicted pyrokinin-like peptides (PKL) of the novel PKL1 and PKL2 precursors, as well as HanSolin, are among the prominent peptides in this mass range, and their identity was confirmed by MS2 experiments. m/z 1140-1450 (*: m/z 1358.86 NPLP-10). AKH, adipokinetic hormone; MIP, myoinhibitory peptide; PK, pyrokinin; sNPF, short neuropeptide F.


Figure 10: Q Exactive Orbitrap MS data from an extract of corpora cardiaca (CC), confirming the sequence of HanSolin. Data were analyzed with PEAKS 8 software. (A) MS2 spectrum, (B) aligned spectrum view, (C) ion table, and (D) error map. Confirmed ion signals of the y-fragment series (red) and the b-fragment series (blue) are labelled. The annotation (–.98) indicates a C-terminal amidation.
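To make the b- and y-fragment series and the amidation shift of about -0.98 Da concrete, the following Python sketch lists theoretical singly charged b and y ions for the HanSolin sequence given in Table 2 (GALRPLGQPLRW with a C-terminal amide). It is an illustrative calculation with standard monoisotopic masses and hypothetical function names, not a reproduction of the PEAKS output.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565
PROTON = 1.007276
AMIDATION = -0.984016  # the shift annotated as (-.98) for a C-terminal amide


def fragment_ions(sequence, amidated=False):
    """Return (b_ions, y_ions) as lists of singly charged monoisotopic m/z.

    b_ions[i] is b(i+1), counted from the N-terminus;
    y_ions[i] is y(i+1), counted from the C-terminus.
    """
    masses = [MONO[residue] for residue in sequence]
    c_terminus = WATER + (AMIDATION if amidated else 0.0)

    b_ions, running = [], 0.0
    for mass in masses[:-1]:            # b1 .. b(n-1)
        running += mass
        b_ions.append(running + PROTON)

    y_ions, running = [], c_terminus
    for mass in reversed(masses[1:]):   # y1 .. y(n-1)
        running += mass
        y_ions.append(running + PROTON)
    return b_ions, y_ions


if __name__ == "__main__":
    b_ions, y_ions = fragment_ions("GALRPLGQPLRW", amidated=True)
    for i, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
        print(f"b{i:<2} {b:10.4f}    y{i:<2} {y:10.4f}")
```

Because the amide sits at the C-terminus, only the y ions carry the -0.98 Da shift relative to the free-acid form, while the b ions are unchanged.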
