Genomics, Transcriptomics, and Peptidomics of Daphnia pulex

ARTICLE pubs.acs.org/jpr

Genomics, Transcriptomics, and Peptidomics of Daphnia pulex Neuropeptides and Protein Hormones

Heinrich Dircksen,*,† Susanne Neupert,‡ Reinhard Predel,‡ Peter Verleyen,§ Jurgen Huybrechts,§ Johannes Strauss,† Frank Hauser,|| Elisabeth Stafflinger,|| Martina Schneider,|| Kevin Pauwels,§ Liliane Schoofs,§ and Cornelis J. P. Grimmelikhuijzen||

† Department of Zoology, Stockholm University, Sweden
‡ Institute of Zoology, University of Cologne, Cologne, Germany
§ Department of Biology, K.U. Leuven, Belgium
|| Center for Functional and Comparative Insect Genomics, Department of Biology, University of Copenhagen, Denmark



Supporting Information

ABSTRACT: We report 43 novel genes in the water flea Daphnia pulex encoding 73 predicted neuropeptide and protein hormones, partly confirmed by RT-PCR. MALDI-TOF mass spectrometry identified 40 neuropeptides by mass matches and 30 neuropeptides by fragmentation sequencing. Single genes encode adipokinetic hormone, allatostatin-A, allatostatin-B, allatotropin, Ala7-CCAP, CCHamide, Arg7-corazonin, DENamides, CRF-like (DH52) and calcitonin-like (DH31) diuretic hormones, two ecdysis-triggering hormones, two FIRFamides, one insulin, two alternative splice forms of ion transport peptide (ITP), myosuppressin, neuroparsin, two neuropeptide-F splice forms, three periviscerokinins (but no pyrokinins), pigment dispersing hormone, proctolin, Met4-proctolin, short neuropeptide-F, three RYamides, SIFamide, two sulfakinins, and three tachykinins. There are two genes for a preprohormone containing orcomyotropin-like peptides and orcokinins, two genes for N-terminally elongated ITPs, two genes (clustered) for eclosion hormones, two genes (clustered) for bursicons alpha and beta, two genes (clustered) for glycoproteins GPA2 and GPB5, three genes for different allatostatins-C (two of them clustered), and three genes for IGF-related peptides. Detailed comparisons of genes or their products with those from insects and decapod crustaceans revealed that the D. pulex peptides are often more closely related to their insect than to their decapod crustacean homologues, confirming that branchiopods, to which Daphnia belongs, are the ancestor group of insects.

KEYWORDS: Daphnia pulex, crustacean, neuropeptide, protein hormone, preprohormone, genomics, transcriptomics, bioinformatics, MALDI-TOF mass spectrometry, RT-PCR

Received: March 29, 2011
Published: August 10, 2011
© 2011 American Chemical Society
dx.doi.org/10.1021/pr200284e | J. Proteome Res. 2011, 10, 4478–4504

■ INTRODUCTION

Neuropeptides and peptide hormones are the most diverse group of signaling molecules that evolved to regulate most physiological processes of animals. In arthropods, especially in insects and crustaceans, peptide hormones were initially detected by endocrinological ablation and reconstitution experiments investigating the endocrine control of a wealth of vital functions, before the same or similar peptides were also found to act as neuromodulators in the central nervous system. In decapod crustaceans, the first neuropeptides with neurohormonal functions were discovered as chromatophorotropic factors eliciting color change in shrimp in vivo bioassays, or by eyestalk ablation experiments as “diabetogenic” factors and factors responsible for growth inhibition. At least four different types of such hormonal peptides had been identified by the 1990s. They originate from the so-called X-organ or from other eyestalk ganglia and are released via the neurohemal sinus gland into the hemolymph. The first crustacean peptide to be fully sequenced was the octapeptide red pigment concentrating hormone (RPCH),1 later found to occur in all decapods, before its functional antagonist, the pigment dispersing hormone (PDH), was identified. However, the major products of the X-organ sinus gland (XOSG) neurosecretory system are, apart from RPCH, large peptides of the crustacean hyperglycemic hormone (CHH), molt-inhibiting hormone (MIH), and gonad/vitellogenesis inhibiting hormone (GIH/VIH) neuropeptide families.2 Members of the CHH peptide family exert at least eight well-established stimulatory and inhibitory actions related to growth and metabolism, whereas the MIHs and GIH/VIHs appear to be restricted to the inhibition of ecdysteroidogenesis in the Y-organ and of gonad development, respectively.3 In decapods, neurons of the eyestalk medulla are known to produce PDHs, some of which are released via the sinus gland.4 Other peptides were discovered in the pericardial organs, which are large neurohemal organs consisting of nerve endings next to the heart of decapod crustaceans. These peptides affect heart rate and volume output when tested in semi-isolated preparations and were identified as proctolin,5 crustacean cardioactive peptide (CCAP),6 and molluscan FMRFamide-related peptides. Several of these peptides were later also found to be myoactive on certain visceral (e.g., hindgut) and skeletal muscles.7 Other myoactive peptides such as the orcokinins and orcomyotropin8,9 were discovered in crayfish hindgut bioassays. Additional peptides were detected in comparative studies using separation techniques such as high performance liquid chromatography (HPLC) combined with enzyme immunoassays, for example, the large peptide family of crustacean allatostatins.10–12 With the enormous technical improvements and increased sensitivity of mass spectrometry (MS) techniques, including fragmentation capabilities (MS/MS), another peak of peptide discoveries recently occurred for several decapod species, mainly crabs, crayfish, and lobsters, so that more than 200 different crustacean peptides are now known.13

Among the first peptide gene structures available were those encoding classical CHHs and MIHs. As a complication, however, decapod CHHs and even MIHs are not derived from single genes only but from distinct clusters of 10 or more genes in tandem arrangement on the same chromosome, as has been worked out in great detail for some shrimp species.14 Genome sequencing, expressed sequence tag (EST) library collections, and the availability of more sophisticated bioinformatics techniques for gene discovery and annotation have made it possible to read precursor and peptide structures from the (re)constructed genes. The completion of the Daphnia pulex genome sequence therefore prompted us, as members of the Daphnia Genome Consortium (DGC) annotation team,15 to use the available genomic information not only to further annotate several peptide genes but also to confirm peptide hormone precursors and to identify their derived neuropeptide(s) in the central nervous system of this species.
Thus, this report describes for the first time the global analysis of neuropeptide genes and their products in a crustacean. We have in all cases newly annotated or thoroughly revised the computer-generated peptide-gene structures, partly confirmed their cDNAs by RT-PCR and confirmed more than half of the predicted neuropeptides by mass spectrometry. Our analysis revealed that Daphnia neuropeptides were often more similar to insect than to decapod crustacean neuropeptides, supporting the view that branchiopods, for example, tadpole shrimps and water fleas, to which Daphnia pulex belongs, are the ancestor group of insects.16

■ MATERIALS AND METHODS

Adult parthenogenetic females of the water flea Daphnia pulex (Leydig 1860) (strain Livpu01, Dept. Biology, Leuven University, Leuven, Belgium; clone originally hatched from resting eggs from a natural population at Brown Moss, North Shropshire, U.K.) were used for mass spectrometry and de novo sequencing experiments. Animals were fed ad libitum (food concentration: 150 000 cells per mL, corresponding to 1.1 mg cells per L) with the green alga Scenedesmus obliquus (Kützing 1833:609) and reared under laboratory conditions (20 ± 2 °C; 14 h light/10 h dark photoperiod) in dechlorinated tap water (30 animals in 3 L jars). Animals were raised until they reached the second or third adult instar before peptides were extracted.

Sample Preparation and Mass Spectrometry

A). Extraction. Batches of up to 50 specimens of D. pulex were dissected in a modified Ringer solution17 (pH 7.5) of the following composition: NaCl (12.0 g/L), KCl (0.4 g/L), CaCl2·2H2O (2.25 g/L), MgCl2·6H2O (0.5 g/L), maleic acid (0.58 g/L), Tris-base (0.61 g/L). Brain-optic ganglia complexes were collected on ice in Eppendorf caps containing 50 μL of 2 M aqueous acetic acid solution. The samples were sonicated (on ice) three times for 1 min. After spinning down debris, the extracts were concentrated and desalted using ZipTipC18 (Millipore, 15 μm). The ZipTipC18 was pre-equilibrated for sample binding using 0.1% aqueous TFA containing 50% CH3CN and subsequently washed with 0.1% aqueous TFA. The samples were loaded onto the ZipTipC18 and washed with 0.1% aqueous TFA. Then, samples were eluted in 1 μL acetonitrile/water/formic acid (70:29.9:0.1; v/v/v) directly onto the MALDI target and mixed with 0.5 μL saturated alpha-cyano-4-hydroxycinnamic acid in ethanol/CH3CN/TFA (50:49.9:0.1; v/v/v), air-dried, and analyzed. MALDI-TOF-MS was performed on a Reflex IV instrument (Bruker Daltonik, Bremen, Germany) equipped with an N2 laser (337 nm) and a pulsed ion extraction accessory. The instrument was operated in the positive ion, reflectron mode and calibrated using a standard peptide mixture containing angiotensin II (1045.54 Da), angiotensin I (1295.68 Da), substance P (1346.73 Da), bombesin (1618.82 Da), ACTH clip 1–17 (2092.08 Da), and ACTH clip 18–39 (2464.19 Da) (Bruker Daltonik). The instrument settings were as follows: ion source 1, 25.00 kV; ion source 2, 20.35 kV; pulsed ion extraction, 200 ns; lens voltage, 12.30 kV; and reflector voltage, 28.70 kV. Matrix deflection was set at 500 Da, and the laser energy was adjusted to just above the desorption/ionization threshold. Spectra were recorded in the reflectron mode within a mass range from m/z 500 to m/z 3000 and are the results of ca. 100 shots.

B). Direct Tissue Profiling. Single specimens of D. pulex were pinned with microneedles and submerged in an insect saline (pH 7.25) of the following composition: NaCl (7.50 g/L), KCl (0.20 g/L), CaCl2 (0.20 g/L), and NaHCO3 (0.10 g/L). The body cavities were opened with fine forceps and ultrafine scissors, and the brains were dissected and subsequently cut into smaller pieces, which were transferred with a glass capillary into a drop of distilled water on the sample plate for MALDI-TOF mass spectrometry. The water was removed using the same glass capillary. The tissues were air-dried and covered with approximately 50 nL (depending on the sample size) of matrix solution (saturated alpha-cyano-4-hydroxycinnamic acid dissolved in methanol/water 1:1) over a period of about 5 s using a Nanoliter injector (World Precision Instruments, Berlin, Germany). Each preparation was air-dried again and covered with distilled water for a few seconds, which was finally removed with cellulose paper to reduce the salt content. MALDI-TOF analyses were performed on an ABI 4800 Proteomics Analyzer (Applied Biosystems, Framingham, MA). To determine the parent masses, the instrument was operated in reflectron mode. Tandem MS experiments were performed in gas-off mode. The number of laser shots used to obtain a spectrum varied from 800 to 4000, depending on signal quality, and a mass range from m/z 600 to m/z 5000 was chosen. The fragmentation patterns were analyzed with the Data Explorer software 4.3 package to confirm the sequence of the peptide.

Bioinformatics

BLAST searches were usually performed with the tblastn or blastp algorithms at the NCBI, Joint Genome Institute (JGI), or www.wfleabase.org bioinformatics Web sites. Alignments were performed with the CLUSTALW algorithm embedded in the BioEdit program (version 7.0.5.3) or, for small sequences, fine-tuned by hand or in the Genedoc program (version 2.7.0). The statistics output of Genedoc served to quantify peptide similarities, giving either the percent amino acid identity (aai) or the number of residues whose juxtaposition yields a greater-than-zero score in the scoring table (scr), i.e., a measure in percent of similar or conservative aa residues and/or aa substitutions. All shadings in the following figures represent identities (black) or highly similar aas (gray). For nucleotide sequence analyses of mRNAs and genes, Vector NTI v.9.0.0 (Informax/Invitrogen) was applied. More detailed analyses and novel annotations of genes and their exon-intron boundaries were usually performed with the aid of the Fgenesh-M gene-finding tools (Softberry Inc., Mount Kisco, NY; http://linux1.softberry.com/berry.phtml?topic=fgenesh-m&group=programs&subgroup=gfind), applying templates from insects (e.g., Tribolium castaneum), and/or GeneSplicer (http://www.cbcb.umd.edu/software/GeneSplicer/gene_spl.shtml). Signal peptides were determined with the aid of the SignalP 3.0 server18 (http://www.cbs.dtu.dk/services/SignalP/) and its evaluating algorithms based on neural networks (NN) and hidden Markov models (HMM). The Sulfinator predictor (http://expasy.org/tools/sulfinator/) served for the analysis of post-translationally sulfated tyrosines in predicted peptides. Putative cleavage sites in preprohormones were analyzed according to the rules of Veenstra, 2000,19 or using the web-based NeuroPred program (http://neuroproteomics.scs.illinois.edu/cgi-bin/neuropred.py).20

Table 1. Alphabetical List of Mature Peptides and Protein Hormones of Daphnia pulex

peptide name | peptide sequence(s) | mass [M + H]+ | mass match | MS/MS | annotated/PCR-confirmed
Adipokinetic hormone | pQVNFSTSWa | 950.44 | + | − | +/nd
Allatostatin A1 | SFGGNPTGDPNLNIYSFGLa | 1968.94 | − | − | +/+
Allatostatin A2 | TSRSYSINPYSFGLa | 1590.79 | + | + | +/+
Allatostatin A3 | GGNAKSYPQQIPYSFGLa | 1825.92 | − | − | +/+
Allatostatin A4 | NPTKYNFGLa | 1052.55 | + | + | +/+
Allatostatin A5 | PDRFGFGLa | 907.48 | + | + | +/+
Allatostatin A6 | LPVYNFGLa | 921.52 | + | +a | +/+
Allatostatin B1 | NNWNRMQGMWa | 1335.58 | + | +a | +/nd
Allatostatin B2 | AWSDLSQQGWa | 1176.54 | − | − | +/nd
Allatostatin B3 | SWTQLHGVWa | 1112.56 | + | +a | +/nd
Allatostatin B4 | RWDQLHGAWa | 1167.58 | + | + | +/nd
Allatostatin B5 | SGWNKMQGVWa | 1191.57 | + | + | +/nd
Allatostatin B6 | GWNQLQGVWa | 1086.54 | − | − | +/nd
Allatostatin B7 | NWNNLRGAWa | 1129.56 | + | + | +/nd
Allatostatin B8 | ESGWNNLKGLWa | 1302.65 | − | − | +/nd
Allatostatin C1 | SYWKQCAFNAVSCFa | 1650.72 | − | − | +/nd
Allatostatin C2 | GQSSQRVFWRCYFNAVSCF | 2283.02 | − | − | +/nd
Allatostatin C3 | SKQLRYHHCYFNPISCF | 2140.98 | − | − | +/nd
Allatotropin | GFKTVGLATARGFa | 1323.75 | + | + | +/+
Bursicon-alpha | [Dappu1:306405] | nd | − | − | +/nd
Bursicon-beta | [Dappu1:221518] | nd | − | − | +/nd
CCHamide | NCNKYGNACFGAHa | 1395.56 | − | − | +/nd
Corazonin | pQTFQYSRGWTNa | 1369.63 | + | + | +/nd
Crustacean cardioactive peptide | PFCNAFAGCa | 926.36 | − | − | +/nd
DENamide 1 | KCHFDENa | 891.38 | − | − | +/nd
DENamide 2 | FMGLSHFDENa | 1195.52 | − | − | +/nd
DENamide 3 | FKGLSHFDENa | 1192.57 | − | − | +/nd
DENamide 4 | FKGLSHFGENa | 1134.57 | − | − | +/nd
Diuretic hormone-31 | GVDFGLGRGYSGSQAAKHLMGLAAANYAIGPa | 3048.55 | + | + | +/+
Diuretic hormone-52 | [Dappu1:304695] | nd | − | − | +/nd
Ecdysis-triggering hormone 1 | DPSPEPFNPNYNRFRQKIPRIa | 2584.35 | − | − | +/nd
Ecdysis-triggering hormone 2 | GEGIIAEYMNSESFPHEGSLSNFFLKASKAVPRLa | 3724.88 | − | − | +/nd
Eclosion hormone | [Dappu1:240158] | nd | − | − | +/nd
Eclosion hormone long | [Dappu1:442999] | nd | − | − | +/nd
FIRFamide 1 | SALNKNFIRFa | 1208.69 | + | + | +/nd
FIRFamide 2 | DEERSFHPARPSRSLRSNFIRFa | 2703.39 | + | − | +/nd
FIRFamide 2(14–22) | SLRSNFIRFa | 1138.65 | + | − | +/nd
Glycoprotein A2 | [Dappu1:212447] | nd | − | − | +/nd
Glycoprotein B5 | [Dappu1:319430] | nd | − | − | +/nd
Inotocin | CFITNCPPGa | 948.40 | − | − | +/+
Insulin-related peptide-1 | [Dappu1:226060] | nd | − | − | +/nd
Insulin-related peptide-2 B-chain | NARYCGSYLADALRMACS | 1964.88 | + | − | +/+
Insulin-related peptide-2 A-chain | GVHDECCVKGCTFKELTSYCTRPN | 2689.19 | − | − | +/+
Insulin-related peptide-3 | [Dappu1:302531] | nd | − | − | +/nd
Insulin-related peptide-4 | [Dappu1:316719] | nd | − | − | +/nd
Ion transport peptides (ITP):
ITP (short splice form of ITP) | [Dappu1:322492] | nd | − | − | +/+
ITPL (long splice form of ITP) | [Dappu1:442877] | nd | − | − | +/+
ITPLN (long, N-term. elongated) | [hxAUG26res61g65t1][Dappu1:57088] | nd | − | − | +/nd
ITPN (short, N-term. elongated) | [Dappu1:307029] | nd | − | − | +/nd
Myosuppressin [pQ] | pQDVDHVFLRFa | 1257.64 | + | + | +/+
Myosuppressin [Q] | QDVDHVFLRFa | 1274.66 | + | + | +/+
Neuroparsin | [Dappu1:63582] | nd | − | − | +/+
Neuropeptide F (splice form 1) | DGGDVMSGGEGGEMTAMADAIKYLQGLDKVYGQAARPRFa | 4060.93 | + | − | +/+
Neuropeptide F (splice form 2) | [Dappu1:224782] | nd | − | − | +/+
Orcokinin 1 | NLDEIDRSNFGTFA | 1598.75 | + | − | +/nd
Orcokinin 2 | NLDEIDRSDFGRFV | 1682.81 | + | − | +/nd
Orcokinin 3 | NLDEIDRSDFSRFV | 1712.82 | − | − | +/nd
Orcomyotropin-like peptide 1 | LDSLTGLGFGSQ | 1194.60 | − | − | +/nd
Orcomyotropin-like peptide 2 | GLDSLSGASFGIE | 1252.61 | − | − | +/nd
Orcomyotropin-like peptide 3 | FDSLTGLGINSQ | 1251.62 | − | − | +/nd
Pigment dispersing hormone | NSELINSLLGLPRFMKVVa | 2029.16 | + | + | +/+
Periviscerokinin 1 | APPQILKSQSLIPFPRVa | 1890.12 | + | + | +/nd
Periviscerokinin 2 | HLIPFPRVa | 977.60 | + | + | +/nd
Periviscerokinin 3 [pQ] | pQNLIPFPRVa | 1065.62 | + | + | +/nd
Periviscerokinin 3 [Q] | QNLIPFPRVa | 1082.65 | + | − | +/nd
Proctolin | RYLPT | 649.37 | + | + | +/nd
Met4-Proctolin | RYLMT | 683.36 | + | +a | +/nd
RYamide 1 | pQTFFTNGRYa | 1115.53 | + | + | +/nd
RYamide 2 | SEVRSRVASRSADERFFGGPRFa | 2512.29 | − | − | +/nd
RYamide 3 | SGNGGIVLGNSELDARNPERFFIGSRYa | 2924.48 | + | − | +/nd
RYamide 3(17–27) | NPERFFIGSRYa | 1384.71 | + | +a | +/nd
short neuropeptide F | SDRSPSLRLRFa | 1332.75 | + | + | +/nd
SIFamide | TRKLPFNGSIFa | 1278.73 | + | − | +/nd
Sulfakinin 1 [pQ] | pQPDDYGHMRYa | 1263.52 | + | + | +/+
Sulfakinin 1 [Q] | QPDDYGHMRYa | 1280.55 | + | + | +/+
Sulfakinin 2 | DFDDYGHMRFa | 1301.54 | + | + | +/+
Tachykinin-related peptide 1 | TPNSRAFLGMRa | 1248.66 | + | +a | +/+
Tachykinin-related peptide 2 | KMHGEKFLGMRa | 1332.70 | + | +a | +/+
Tachykinin-related peptide 3 | APSSNSFMGMRa | 1183.54 | + | +a | +/+

a Contaminated with fragments of other neuropeptides; -a, C-terminal -amide; nd, not determined; for longer peptide sequences the protein identifier (see Suppl. Table S1, Supporting Information) is provided in brackets.

Reverse Transcription-PCR and Product Sequencing

Figure 1. MALDI-TOF mass spectra from direct profiling of a D. pulex brain. (A) Altogether, 37 peptides and 4 precursor-related peptides (PRPs) originating from 17 different neuropeptide families were assigned; inset (arrow; stippled rectangle) shows enlarged FIRFamide 2 peak. (B–D) Details from the boxed ranges in the upper complete mass spectrum. ASTA, allatostatin A-type; ASTB, allatostatin B-type; AT, allatotropin; CPON, carboxyterminal peptide of NPY; DH31, diuretic hormone 31; IRP-2, insulin-related peptide 2; MS, myosuppressin; NPF, neuropeptide F; RYa, RYamide; SIFa, SIFamide; SK, sulfakinin; sNPF, short neuropeptide F; OK, orcokinin; TKRP, tachykinin-related peptide.

Batches of 50 entire D. pulex individuals were snap-frozen, pulverized with mortar and pestle under liquid nitrogen, and processed for total RNA extraction by the Trizol method according to the manufacturer’s instructions (Invitrogen). To generate cDNA for 5′- and 3′-rapid amplification of cDNA ends (RACE), mRNAs were isolated using the GenElute mRNA miniprep kit (Sigma-Aldrich). cDNA was then prepared from ca. 1–10 μg of isolated mRNA using the SMARTer PCR cDNA Synthesis kit (Clontech-Takara, BD Biosciences). In addition, total RNA preparations served for regular reverse transcription-polymerase chain reactions (RT-PCRs). The RTs were carried out using either Superscript II reverse transcriptase (Invitrogen) and an oligo-dT adapter (5′-GCTGTCAACGATACGCTACGTAACGGCATGACAGTGT18V-3′), or Moloney murine leukemia virus (MMLV) reverse transcriptase (Promega) and an oligo-dT anchor primer (5′-GACCACGCGTATCGATGTCGACT16V-3′), in both cases for 50 min at 42 °C, followed by inactivation for 15 min at 72 °C and cooling on ice. For RT-PCRs, thermostable DNA polymerase (GoTaq green kit, Promega) or the Advantage2 PCR kit (Clontech-Takara) was used, in both cases following the manufacturers’ instructions. Gene-specific primers were designed from genome sequences and synthesized commercially (MWG, Munich, Germany; Suppl. Table S2, Supporting Information). For PCRs, touch-down conditions were used as follows: 95 °C for 1 min; 4 cycles of 95 °C for 20 s and 72 °C for 1 min; 5 cycles of 95 °C for 30 s and 70 °C for 1 min; and 36 cycles of 95 °C for 1 min, annealing at 58–60 °C (depending on the respective primers) for 1 min, and elongation at 68 °C for 1 min, and finally for 3 min. For PCRs other than RACE-PCRs, the touch-down conditions were modified: 94 °C for 2.5 min; 5 cycles of 94 °C for 0.5 min and 72 °C for 1.5 min; 5 cycles of 94 °C for 0.5 min and 70 °C for 1.5 min; and 26 cycles of 94 °C for 1 min, annealing at 56–65 °C (depending on the primers) for 0.5–1 min, and elongation at 72 °C for 0.5–2 min, and finally for 6–9 min depending on product length. PCR products obtained with gene-specific primers are listed in Supplementary Table S2 (Supporting Information). In some cases, they were purified from excised agarose gel pieces (QIAEX II gel purification kit, Qiagen) and directly sequenced commercially (LGC Genomics, Berlin, Germany). For sequence and bioinformatics analyses, Vector NTI v.9.0.0 (Informax/Invitrogen) or BioEdit v.7.0.5.3 was applied.
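As a rough illustration, the first touch-down program described above can be written down as data and summed. This is only a sketch, not the authors' software: the class and function names are hypothetical, the annealing temperature is a free parameter within the stated 58–60 °C range, the final 3 min step is assumed to be held at the 68 °C elongation temperature (the text does not state this explicitly), and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float   # block temperature in degrees Celsius
    seconds: int    # hold time at that temperature


@dataclass(frozen=True)
class Stage:
    cycles: int     # how often the steps of this stage are repeated
    steps: tuple    # tuple of Step objects


def touchdown_program(anneal_c: float = 58.0) -> list:
    """First touch-down program from the Methods (hypothetical encoding)."""
    if not 58.0 <= anneal_c <= 60.0:
        raise ValueError("annealing temperature outside the stated 58-60 degC range")
    return [
        Stage(1, (Step(95, 60),)),                      # initial denaturation
        Stage(4, (Step(95, 20), Step(72, 60))),         # touch-down stage 1
        Stage(5, (Step(95, 30), Step(70, 60))),         # touch-down stage 2
        Stage(36, (Step(95, 60), Step(anneal_c, 60), Step(68, 60))),
        Stage(1, (Step(68, 180),)),                     # final step; 68 degC is an assumption
    ]


def amplification_cycles(program: list) -> int:
    """Count cycles of stages with more than one step (i.e., not plain holds)."""
    return sum(stage.cycles for stage in program if len(stage.steps) > 1)


def hold_seconds(program: list) -> int:
    """Total programmed hold time, ignoring temperature ramps."""
    return sum(stage.cycles * sum(step.seconds for step in stage.steps)
               for stage in program)


if __name__ == "__main__":
    prog = touchdown_program(anneal_c=58.0)
    print(amplification_cycles(prog), "cycles;", hold_seconds(prog), "s hold time")
```

Under these assumptions the program comprises 45 amplification cycles and 7490 s (about 125 min) of programmed hold time.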

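The [M + H]+ values listed in Table 1 can be reproduced from the peptide sequences with standard monoisotopic residue masses. The following is a minimal sketch, not the authors' procedure: the function name is hypothetical, the notation follows Table 1 (a leading pQ for pyroglutamate, a trailing lowercase a for a C-terminal amide), and disulfide bridges or other modifications are deliberately not handled.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565      # H2O added for the free N- and C-terminus
PROTON = 1.007276      # charge carrier for the singly protonated ion
AMMONIA = 17.026549    # lost when N-terminal Gln cyclizes to pyroglutamate
AMIDATION = -0.984016  # C-terminal -OH replaced by -NH2


def mh_plus(peptide: str) -> float:
    """Monoisotopic [M + H]+ of a linear peptide written in Table 1 notation."""
    pyro = peptide.startswith("pQ")
    if pyro:
        peptide = peptide[1:]          # drop the 'p', keep the Gln residue
    amidated = peptide.endswith("a")
    if amidated:
        peptide = peptide[:-1]         # drop the amide marker
    mass = sum(RESIDUE[aa] for aa in peptide) + WATER + PROTON
    if pyro:
        mass -= AMMONIA
    if amidated:
        mass += AMIDATION
    return mass


if __name__ == "__main__":
    listed = {
        "pQVNFSTSWa": 950.44,    # adipokinetic hormone
        "PDRFGFGLa": 907.48,     # allatostatin A5
        "RYLPT": 649.37,         # proctolin
        "QDVDHVFLRFa": 1274.66,  # myosuppressin [Q]
        "pQDVDHVFLRFa": 1257.64, # myosuppressin [pQ]
    }
    for seq, table_value in listed.items():
        print(f"{seq:14s} calculated {mh_plus(seq):9.2f}  Table 1 {table_value:9.2f}")
```

For these five linear peptides the calculated values agree with the Table 1 entries to within 0.01 Da; peptides with cystine bridges would need an additional correction that this sketch omits.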
■ RESULTS

An overview of our results is given in Table 1. We have identified and annotated 43 D. pulex genes that code for 73 predicted neuropeptides from 45 precursors (Table 1; Suppl. Table S1, Supporting Information). The final annotations were all checked by gene prediction programs, and many neuropeptide gene structures were supported by expressed sequence tags (ESTs) and/or confirmed by RT-PCR and RACE methods (Suppl. Table S2, Supporting Information). MALDI-TOF mass spectrometry applied to water flea brain tissue established mass matches for 40 neuropeptides and seven precursor-related peptides (PRPs), of which 30 neuropeptides and four PRPs were sequenced by MS/MS (Figure 1, Suppl. Tables S1 and S3, Supporting Information). These peptides and protein hormones are dealt with alphabetically in the following.

Figure 2. Six novel Dappu allatostatin A-type peptides. (A) None of the DappuASTA peptides containing a typical Y/FXFGLamide motif is identical to any known decapod crustacean or insect ASTA. Asterisks indicate peptides identified by mass spectrometry. (B) MALDI-TOF/TOF fragmentation spectrum (direct profiling) of DappuASTA5 at m/z 907.48 from a brain preparation with identified b- and y-fragment series (P, immonium ion of proline).

Adipokinetic Hormone (AKH)/RPCH-Related Peptide

In decapod crustaceans, RPCH is highly conserved,21 but structures of crustacean RPCH genes were hitherto known only for the crab Callinectes sapidus.22 Decapod RPCH is closely related to a large number of insect AKHs.23 The D. pulex genome contains a single gene similar in structure to the decapod RPCH gene of C. sapidus22 and to insect AKH genes (Suppl. Figure S1A, Supporting Information). The derived precursor is similar in structure to decapod RPCH precursors and some insect AKH precursors.23 It consists of a signal peptide (SP) of 21 aas, an AKH/RPCH-like octapeptide with a C-terminal glycine as putative amide donor adjacent to a typical dibasic KR processing site, and a precursor-related peptide (PRP). We confirmed the proposed octapeptide as pQVNFSTSWamide by mass spectrometry (Table 1). As this peptide is much more similar to AKHs from insects than to decapod RPCH, it is appropriate to call it DappuAKH (Suppl. Figure S1B, Supporting Information, and Table 1). Compared with decapod RPCH, DappuAKH has three aa exchanges in positions 2, 6, and 7, but shares up to two more identities with some AKHs from hemimetabolous insects such as GrybiAKH and AnaimAKH (Val2/Thr6 or Val2/Ser7, respectively). Other closely related insect AKHs with at most two exchanges (e.g., bug CorpuAKH, locust SchniAKHII, and beetle TricaAKH1, among further AKH isoforms with intermediate characters) share the Thr6 or the Ser7 but are, with a Leu2, Pro6, or Gly7, similar to RPCH (Suppl. Figure S1B, Supporting Information). By testing various related insect AKHs, Marco et al. found that the Leu2 to Val2 exchange alone already leads to a substantial loss of RPCH-like activity, which is virtually lost when, in addition, exchanges such as Gly7 or Pro7 to Ser7 or Thr7, respectively, occur, as is the case in DappuAKH.24

Figure 3. Identified Dappu allatostatin B peptides compared with those from other arthropods. (A) Seven of the eight DappuASTB peptides show the typical W(X6)W-amide structure (*identified by direct profiling fragmentation analysis). (B) MALDI-TOF/TOF fragmentation spectrum of DappuASTB7 with labeled y- and b-fragments and immonium ions. (C) Alignments of DappuASTBs (underlined) with the most closely related crustacean (dark blue) and insect (green) ASTBs. Only DappuASTB5 shows higher aa identities with lobster ASTB than with cockroach (Periplaneta americana PeramASTB152) and fly ASTBs, but not with crab Cancer borealis CanboASTB1. DappuASTB6, -7, and -8 are more closely related to insect ASTBs, DappuASTB8 being particularly close to DromeASTB4. (Crustacean ASTBs from: lobster Homarus americanus ASTB, crab Cancer borealis CanboASTB1 (identical to Carcinus maenas ASTB13); insect ASTBs from: Drosophila melanogaster DromeASTB4,153 cricket Gryllus bimaculatus GrybiASTB6 Q5QRY7).

Figure 4. MALDI-TOF/TOF fragmentation spectrum (direct profiling) of Dappu allatostatin B3 (DappuASTB3, black letters on white background) at m/z 1112.6 and DappuRYamide 1 (RYa1; white letters on black background) at m/z 1115.5 from a brain preparation. Prominent y- and b-fragments from both ion signals are labeled. The fragments were analyzed manually to confirm the predicted amino acid sequences of DappuASTB3 and DappuRYamide 1, which could otherwise not be separated in the sample because of their similar masses.

Allatostatin-A (ASTA) Type Peptides

In crustaceans, up to 40 different allatostatin-A (ASTA) type peptides were identified from crab, crayfish, and prawn CNSs,10–12 some of which even arose from post-translational processing and modification.12 Even more ASTA peptides were found in several crustaceans with the aid of transcriptome mining and mass spectrometric analyses.13 Whereas in insects a maximum of 14 different ASTA peptides can be derived from a single mRNA precursor,25 the three known crustacean ASTA precursors yield up to 29 different peptides, as revealed in crayfish, shrimp, and a copepod.26–28 The 3-exon Dappu astA-gene is the first unraveled crustacean gene for ASTA-type peptides and is fully supported by several ESTs and our RT-PCR results (Suppl. Table S2, Supporting Information). It encodes single copies of six different DappuASTA peptides with a typical C-terminal F/YxFGLamide motif, and three precursor-related peptides (PRPs), separated by R or KR cleavage sites (Suppl. Table S1, Supporting Information). We have identified four of the six DappuASTA peptides (Figures 1A,B,D and 2A) by fragment analysis (Table 1, Figure 2B). The ASTA peptide diversity in D. pulex is thus much lower than the multitude of ASTA peptides on all known decapod and many insect ASTA precursors.29 Interestingly, however, all six DappuASTA peptides differ from those of decapods and insects in one or more aas among the first 3–19 aas before the C-terminal pentapeptide motif.

Allatostatin-B (ASTB) Type Peptides

Allatostatin-B (ASTB) peptides are well-conserved nonapeptides sharing the characteristic sequence motif W(X)6Wamide and some weak similarity to vertebrate galanin. These peptides exhibit myoinhibitory, juvenile hormone, and ecdysteroid biosynthesis-inhibiting effects in insects.30 VPNDWAHFRGSWamide was the first neuromodulatory ASTB-type peptide identified in the crab C. borealis followed by a number of similar decapod crustacean peptides found by mass spectrometry.13
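The W(X)6Wamide motif lends itself to a mechanical check. The sketch below is illustrative only: it applies regular expressions to the eight DappuASTB sequences listed in Table 1, the function name is hypothetical, and the trailing lowercase a is read as the C-terminal amide, as defined in the Table 1 footnote.

```python
import re

# DappuASTB sequences as listed in Table 1 (trailing 'a' = C-terminal amide).
DAPPU_ASTB = {
    "DappuASTB1": "NNWNRMQGMWa",
    "DappuASTB2": "AWSDLSQQGWa",
    "DappuASTB3": "SWTQLHGVWa",
    "DappuASTB4": "RWDQLHGAWa",
    "DappuASTB5": "SGWNKMQGVWa",
    "DappuASTB6": "GWNQLQGVWa",
    "DappuASTB7": "NWNNLRGAWa",
    "DappuASTB8": "ESGWNNLKGLWa",
}


def astb_motif(peptide: str) -> str:
    """Classify the C-terminal motif of an amidated peptide."""
    if not peptide.endswith("a"):
        return "not amidated"
    core = peptide[:-1]                    # residues without the amide marker
    if re.search(r"W.{6}W$", core):        # two Trp separated by six residues
        return "W(X)6Wamide"
    if re.search(r"W.{7}W$", core):        # two Trp separated by seven residues
        return "W(X)7Wamide"
    return "other"


if __name__ == "__main__":
    for name, seq in DAPPU_ASTB.items():
        print(f"{name}: {seq:13s} {astb_motif(seq)}")
```

With the Table 1 sequences, seven peptides are classified as W(X)6Wamide and only DappuASTB2 as W(X)7Wamide.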

The 5-exon Dappu astB-gene is much more complex than the corresponding 2-exon genes in D. melanogaster (CG6456) and T. castaneum (GLEAN_07661). It gives rise to a single mRNA. Beginning with a signal peptide (SP) of 20 aas, the open reading frame (ORF) spans exons 1 to 5 and encodes seven different typical ASTB-type peptides with a W(X)6Wamide motif, a single peptide (DappuASTB2) with an unusual W(X)7Wamide motif (Figures 1B,C and 3A), and three PRPs. We identified five W(X)6Wamide-type peptides by fragmentation sequencing (Table 1, Figures 3B and 4). To our knowledge, none of the DappuASTB peptides is identical to any known crustacean or insect ASTB-type peptide. However, DappuASTB6–8 show very close similarities to insect ASTB isoforms, sharing not only the typical W(X)6Wamide motif but up to six identical and up to seven closely related aas. Only DappuASTB5 shows higher aa identities with lobster HomamASTB than with insect ASTB, but only three identical residues can be found when comparing the DappuASTB sequences with that of the longest C. borealis CanboASTB31 as detailed in Figure 3C.

Allatostatin-C (ASTC) Peptides

Allatostatin-C (ASTC) peptides were first discovered in Manduca sexta. They share the common sequence motif (X)5C(X)6CF, which can be C-terminally amidated or not. They are derived as a single copy from precursors giving rise to mature peptides that usually carry, as variants of this motif, either a nonamidated (X)6CYFNPISCF, often N-terminally blocked by pyro-Glu, or a SxWKQCAFNAVSCFamide.32 Recently, it was established that pairs of paralogous astC- and closely related so-called astCC-genes form head-to-tail clusters on the chromosomes of several insects from very different systematic groups.33 The derived AST-C/AST-CC type peptides normally belong to either of the mentioned subtypes or to further N-terminally extended forms. In decapod crustaceans, the tetradecapeptide SYWKQCAFNAVSCFamide and the pentadecapeptide CanboASTC1 pQIRYHQCYFNPISCF were recently found to be well conserved in a large variety of species34–36 and were also reported as EST predictions from D. pulex.37 SYWKQCAFNAVSCFamide has both neuromodulatory (in the crab stomatogastric system) and cardioactive properties.34
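How strictly the C-terminal C(X)6CF core is conserved can be illustrated by a simple right-aligned comparison. The sketch below is an assumption-laden illustration rather than the alignment method used in the paper (which relied on CLUSTALW and Genedoc): it takes the three DappuASTC sequences from Table 1, strips the amide marker, aligns the peptides at their C-termini without gaps, and reports positions that are identical in all of them. The function names and the window width of 12 residues are hypothetical choices.

```python
# DappuASTC sequences as listed in Table 1 (trailing 'a' = C-terminal amide).
DAPPU_ASTC = {
    "DappuASTC1": "SYWKQCAFNAVSCFa",
    "DappuASTC2": "GQSSQRVFWRCYFNAVSCF",
    "DappuASTC3": "SKQLRYHHCYFNPISCF",
}


def strip_amide(peptide: str) -> str:
    """Remove the lowercase amide marker, if present."""
    return peptide[:-1] if peptide.endswith("a") else peptide


def c_terminal_consensus(peptides, width: int = 12) -> str:
    """Identity consensus over the last `width` residues (gap-free, right-aligned).

    Positions identical in all peptides keep their residue letter;
    all other positions are shown as '-'.
    """
    cores = [strip_amide(p) for p in peptides]
    if any(len(core) < width for core in cores):
        raise ValueError("a peptide is shorter than the comparison window")
    tails = [core[-width:] for core in cores]
    return "".join(col[0] if len(set(col)) == 1 else "-" for col in zip(*tails))


if __name__ == "__main__":
    print(c_terminal_consensus(DAPPU_ASTC.values()))
```

Run on the three Table 1 sequences, this prints `---C-FN--SCF`: both cysteines, separated by six residues, and the FN and SCF positions are invariant, whereas the N-terminal parts differ.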

dx.doi.org/10.1021/pr200284e |J. Proteome Res. 2011, 10, 4478–4504

Journal of Proteome Research

ARTICLE

Figure 5. Four paired clusters of D. pulex neuropeptide and protein hormone genes without intervening genes in tandem arrangements and their derived mRNAs. (A) Dappu astC2/astC1-gene cluster showing tandem head-to-tail arrangement of two similar genes with the positions of the respective DappuASTC1/C2 (A) codons (note that the intergenic space has been truncated for clarity). (B) Dappu bursicon B and A (bursB/A; BB, BA)-genes in a head-to-head arrangement only 370bp apart. (C) Tail-to-tail orientated tandem of long and short Dappu eclosion hormone (ehl/eh)-genes. (D) Two glycoprotein hormone-encoding Dappu gpa2- and Dappu gpb5-genes in tail-to-tail arrangement. Note the only 6 bp-long stretch between both genes. Light-gray shading, S or SP signal peptide; medium dark-gray shading, coding regions; dark-gray shading, identified peptide/protein hormone coding regions; very light-gray shading, 50 and 30 untranslated regions; e, exon; i, intron (superscript numbers 02 indicate intron phase); bp, base pairs; asterisk indicates stop codon; >,< plus, minus strand reading direction.

Figure 6. DappuASTC1–3 peptides and alignment with insect (green) and decapod (dark blue) crustacean ASTCC/C peptides. (A) Alignment of the sequences of the novel DappuASTC2 and the DappuASTC1 peptides (as arranged in tandem in the gene cluster, see Figure 5A) shows strict conservation of the cystine-bridges and the Cys-flanking aas with those in ASTCs/ASTCC tandems from insects. Note higher numbers of aa-identities in DappuASTC2 and ASTCC-like peptides compared with those in the ASTC1/ASTC-peptides. (B) DappuASTC1–3 peptides aligned to each other show all hallmark aas with lowest consensus aas among each other (bold-faced) in between and next to the cystine-bridges but distinct differences at the N- and C-termini. (C) DappuASTC3 is more similar to the decapod crustacean CanboASTC135 and locust ASTCC than to the DappuASTC1 and DappuASTC2 and has the largest consensus. (Insect ASTC/CC sequences from Veenstra, 200933).
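The comparison summarized in Figure 6B can be approximated with a few lines of code. This is only a sketch under stated assumptions: it uses the three DappuASTC sequences reported in this section with terminal modifications stripped, aligns them at the C-terminus without gaps, and marks strict identities only. The helper name `c_terminal_consensus` is hypothetical and this is not the alignment procedure used for the figure.

```python
def c_terminal_consensus(peptides, width):
    """Strict column-wise consensus over the last `width` residues.

    Peptides are right-aligned (no gaps); a column yields its residue
    only if all peptides agree, otherwise '-'.
    """
    tails = [p[-width:] for p in peptides]
    return "".join(
        column[0] if len(set(column)) == 1 else "-"
        for column in zip(*tails)
    )


# Sequences as given in the text (amide / free-acid ends omitted).
dappu_astc = {
    "DappuASTC1": "SYWKQCAFNAVSCF",
    "DappuASTC2": "GQSSQRVFWRCYFNAVSCF",
    "DappuASTC3": "SKQLRYHHCYFNPISCF",
}

consensus = c_terminal_consensus(list(dappu_astc.values()), 12)
print(consensus)  # ---C-FN--SCF
```

The strict consensus agrees with the pattern x--C-FN-xSCF quoted in the text wherever that pattern names a residue. The two positions written as x there are not identical across the three peptides (W/F/Y and V/V/I), so they appear as '-' in this identity-only sketch.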

Similar gene, precursor and peptide structures were found for three astC1–3-genes in the D. pulex genome. Our current gene models clearly comprise a gene cluster of the 3-exon Dappu astC2-gene followed downstream by the revised 4-exon Dappu astC1-gene spanning about 5.2 kb (5227bp) on scaffold 9 (Figure 5A). This gene cluster is an intermediate between the known Drosophila and Tribolium AST-CC/C gene clusters. The peptides derived from this gene cluster are either highly similar or identical to several known ASTC peptides from insects33 and crustaceans (Figures 5A and 6A). All DappuASTC1–3 peptides have a similar core structure with a consensus x--C-FN-xSCF (Figure 6B). DappuASTC1 (SYWKQCAFNAVSCFamide) is identical to the peptide widely conserved among decapod crustacean and some insect groups.34 DappuASTC2 (GQSSQRVFWRCYFNAVSCF) closely resembles several of the insect AST-CC-like peptides, but such peptides are not known from other crustaceans. The third Dappu astC3-gene has been high-score annotated as a 3-exon gene on a different scaffold 26 (Suppl. Table S1, Supporting Information). Its intron-2 splits the coding region of the peptide DappuASTC3 with the predicted sequence SKQLRYHHCYFNPISCF-OH, which is much more similar to crab CanboASTC1 than to DappuASTC1 and DappuASTC2 (Figure 6C). We have so far only been able to identify the fragment ASTC3-PRP11142 by mass spectrometry (Suppl. Table S3, Supporting Information), which, however, confirms the expression of the ASTC3-precursor in the D. pulex brain.

Allatotropin

Allatotropins were originally discovered in a moth as stimulators of juvenile hormone production in corpora allata but are now considered multifunctional insect peptides with myotropic, cardio-excitatory and circadian clock-modulating activities.38 In the CNS of different moth species, complex alternative splicing processes can lead to up to three different mRNAs derived from a single gene, which tissue-specifically encode up to three different peptide splice forms.39,40 In D. pulex, one single 2-exon Dappu allatotropin (at)-gene yields a novel AT-like peptide DappuAT, the amidated tridecapeptide GFKTVGLATARGFamide, flanked by dibasic cleavage sites (RR, KR) and two PRPs that are different in length (Suppl. Table S1, Supporting Information, and Figure 7A). We have confirmed the exon/intron structure of the precursor by RT-PCR (Suppl. Table S2, Supporting Information) but we did not find any products of alternative splicing. We sequenced the peptide by fragmentation mass analysis (Table 1; Figures 1C and 7B). DappuAT shows only minor sequence differences in the middle part of the peptides to an AT-like L. migratoria abdominal ganglia myotropin-1 LocmiAGMT141 and moth M. sexta AT, which are larger in the cases of chelicerate (tick Ixodes scapularis) or mosquito AT (Figure 7C). This is the first AT identified in crustaceans, and it came as a surprise to us that decapod crustaceans do not seem to have an AT-like peptide, since we did not find any even by sophisticated BLAST searches.

Figure 7. Novel Dappu allatotropin (AT) gene, the derived DappuAT and comparison with insect and chelicerate allatotropins. (A) The 2-exon gene (upper panel) encodes DappuAT (A) flanked by two precursor-related peptides (PRP), the second being split by the intron (shadings as in Figure 5). (B) Fragmentation mass spectrum (direct profiling) of DappuAT at m/z 1323.7 with labeled y- and b-fragments and immonium ions. (C) DappuAT shows high similarities to allatotropin-like peptides from hemimetabolous (L. migratoria accessory gland myotropin LocmiAGMT140) and holometabolous insects (green; M. sexta ManseAT, A. gambiae AnogaAT154), but also to two allatotropin-isoforms from the chelicerate Ixodes scapularis (EEC06620; orange).

Bursicon

The protein hormone bursicon has been known for more than 80 years as the cuticle tanning and sclerotization factor of insects. It occurs in insects as a heterodimer or homodimer of bursicon-A and/or bursicon-B subunits. Both subunits are large N-glycosylated proteins with typically eleven cysteines, ten of which are involved in the formation of cystine-knots.42 Bursicons occur ubiquitously in all hitherto analyzed insects, similarly in some decapod crustaceans and Daphnia arenata, and perhaps even in some chordates, but are absent from nematodes.43–45 The D. pulex genome contains separate bursA- and bursB-genes, one for each bursicon subunit, that lie only 370bp apart to form a head-to-head inverted tandem (Figure 5B). The Dappu bursA-gene shows the best fit with currently available ESTs especially with regard to the SP and the mature peptide ORF (Suppl. Figure S2A, Supporting Information). Although several insects do also have 3-exon-genes, this cluster formation of the Dappu bursA and bursB subunit genes appears unique. According to Robertson et al. 2007,46 the water flea bursA-gene may have lost the first phase-1 intron (now a phase-0 intron) that the authors considered characteristic for all insects. However, the Dappu bursB-gene has retained such a phase-1 intron (Figure 5B). While both bursicon subunit mRNAs of the two species D. pulex and D. arenata44,45 are almost identical (one silent base pair exchange in the bursA-ORFs, three more outside), their precursor-derived protein hormones are completely identical. Alignments of DappuBursA and DappuBursB with decapod crustacean and insect sequences show a high degree of similarities especially with regard to the cystine-knot part (Suppl. Figure S2A,B, Supporting Information). Most likely a gene duplication event gave rise to this gene cluster.

CCHamide

Originally described as a parturition hormone with the sequence GCLSYGHSCWGAHamide in the viviparous Tsetse fly Glossina palpalis,47 a related peptide, GCAMFGHSCYGAHamide, was later discovered in the genome of the silk moth B. mori and named CCHamide because of its cyclic nature brought about by a cystine-bridge and the C-terminal His-amide.48 We have recently found that all insects with a sequenced genome have two distinct CCHamides, CCHa1 and CCHa2, derived from two different genes, and that each activates its own specific receptor.49 In the D. pulex genome we have only found one CCHamide gene and not two as in insects. This gene gives rise to a 106aas long precursor from which DappuCCHamide (Table 1) is liberated directly after the SP via a signal peptidase and from a C-terminal dibasic proprotein convertase cleavage site preceded by a Gly that allows for an amidation (Suppl. Table S1, Supporting Information). This precursor structure is a hallmark of CCHa1-type peptides. The CCHa2-type peptides all derive from precursors with dibasic cleavage sites at both ends of the peptides.49 When aligned with some selected insect CCHamide1 and -2 subtypes, DappuCCHamide is intermediate between the two forms with slightly higher similarities to CCHa1-type peptides (Suppl. Figure S3A,B, Supporting Information).

Corazonin

The N- and C-terminally blocked undecapeptide corazonin, pQTFQYSRGWTNamide (Arg7-CRZ), is known from many insect species.50 In locusts, however, a His7-CRZ isoform controls behavioral and morphological transitions during swarm formation and migration.51–53 Arg7-CRZ, by contrast, is more ubiquitous, with pleiotropic functions related to the initiation of ecdysis behavior54 and stress control55,56 in insects, but CRZ-genes are lacking in some insects such as beetles and aphids.57,58 Arg7-CRZ is so far the only isoform detected in decapod crustaceans,59,60 and single ESTs are known for Arg7-CRZ-precursors of shrimp L. vannamei60 and water flea Daphnia carinata.61 The Dappu crz-gene is a 3-exon gene which encodes a 133aas long precursor containing the peptide DappuArg7-CRZ directly following the SP (Suppl. Table S1, Supporting Information). We fully identified this peptide as N-terminally blocked by a pyro-Glu using fragmentation sequencing (Table 1, Figure 1B). Apart from DappuCRZ, however, the insect, decapod and DappuCRZ precursors have only little in common except for the presence of cysteine residues in CRZ-precursor associated peptides (Suppl. Table S1, Supporting Information). In fact, these peptides are much less conserved than AKH-precursor related peptides (APRPs), which may indicate a more rapid evolution of the crz-genes.

Crustacean Cardioactive Peptide

In all hitherto investigated arthropod groups, crustacean cardioactive peptide (CCAP) is an extremely conserved cyclic nonapeptide PFCNAFTGCamide with an intramolecular disulfide bridge.6 CCAP is a well-known key player in ecdysis regulation in insects and crustaceans.62–64 As unraveled here for the first time in crustaceans, there is a single 3-exon Dappu ccap-gene that encodes the nonapeptide DappuCCAP typically flanked by a dibasic and a tribasic processing site. This novel isoform of CCAP, PFCNAFAGCamide, with Thr7 replaced by Ala7 (Table 1; Suppl. Table S1, Supporting Information), has a primary structure that is unusual among crustaceans and insects, since in all hitherto investigated arthropod groups, no CCAP isoform other than the original one has ever been found. Considerably more divergent isoforms are only known from some worms and mollusks.6,65 The DappuCCAP-precursor structure, however, resembles those of decapods and insects.66 Furthermore, the overall Dappu ccap-gene structure itself is remarkably similar to ccap-gene structures of M. sexta,67 D. melanogaster (CG4910) and T. castaneum (LOC664330), except that the Manse ccap-gene has 5′ upstream a fourth additional exon followed by an intron interrupting the SP.

DENamides

The novel Dappu dena-gene gives rise to a precursor that liberates at least nine closely related previously unknown peptides which we named DENamides: five copies of the decapeptide FMGLSHFDENamide, two copies of Lys2-DENamide, FKGLSHFGENamide, and another related peptide KCHFDENamide and the C-terminal FMGLSHFHTKTWT-OH (Suppl. Table S1, Supporting Information). The gene was found by BLAST searches using invertebrate FMRFamide-related peptide precursors (e.g., from Mytilus edulis CAA10949) with which it mainly shares similarly staggered KRFM sequences. Otherwise, DENamides have no resemblance with any other known arthropod peptide.
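The generic processing logic invoked throughout this section (cleavage at dibasic sites, and a C-terminal Gly read as an amidation signal) can be sketched as follows. This is an illustration under explicit assumptions: the string `toy_region` is a hypothetical construct assembled from the peptide sequences named above and is not the Dappu dena precursor from Suppl. Table S1; the function name `process_region` is likewise hypothetical, and real prediction rules are more elaborate than this two-step rule.

```python
import re


def process_region(region: str) -> list[str]:
    """Apply a simplified prohormone processing rule.

    1. Cleave at dibasic KR / RR sites (the basic residues are removed).
    2. A fragment ending in Gly is reported as C-terminally amidated,
       all other fragments as free acids.
    """
    products = []
    for fragment in re.split(r"KR|RR", region):
        if not fragment:
            continue  # skip empty pieces produced by leading or adjacent sites
        if fragment.endswith("G"):
            products.append(fragment[:-1] + "-amide")
        else:
            products.append(fragment + "-OH")
    return products


# Hypothetical toy region (assumption, for illustration only).
toy_region = "KRFMGLSHFDENGKRFKGLSHFGENGKRFMGLSHFHTKTWT"

products = process_region(toy_region)
for peptide in products:
    print(peptide)
```

In this toy case the rule returns two amidated DENamide-type peptides and the nonamidated C-terminal peptide, mirroring the kinds of products listed in the text.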


Diuretic Hormones DH31 and DH52

Apart from CAPA/PVK-related peptides (see also below) and insect kinins (not found in the D. pulex genome), two other families of C-terminally amidated peptides are known as diuretic hormones (DHs), namely calcitonin-like DHs (CLDHs) with 30–37aas and corticotropin-releasing factor (CRF)-like DHs with 40–47aas in a large variety of insects.68 In crustaceans, so far only one 31aas-long member of CLDHs has been identified for the lobster Homarus americanus.69 CLDHs have a well-conserved structure of 31aas, therefore often being called diuretic hormones 31 (DH31). Furthermore, we provide here the first evidence for the existence of a CRF-like DH in crustaceans. The Dappu dh31-gene is a 3-exon gene in which all exons contribute to the DappuDH31-precursor. These structures clearly resemble those of the Drome dh31-gene (CG13094), which, however, contains two more exons 5′ upstream separated by a large intron (10330bp) not found in D. pulex. We have confirmed the exon/intron-structure of the ORF by RT-PCR (Suppl. Table S2, Supporting Information) and the sequences of DappuDH31 together with the PRP-1 SLLRNSDYEHQ by fragment analysis (Table 1, Suppl. Table S3, Supporting Information, Figure 1A,C). The sequences of DH31s from isopod and decapod crustaceans and insects are extremely conserved as detailed in Suppl. Figure S4A (Supporting Information). The 4-exon Dappu dh52-gene, found on a different scaffold, gives rise to a 191aas long precursor with a 25aas long SP that liberates the novel 52aas long amidated DappuDH52 and three smaller PRPs (Suppl. Table S1, Supporting Information). DappuDH52 is the first crustacean CRF-like DH and shows closest similarities in length and overall structure to beetle TricaDH47, but is also quite similar to DH46s from hemimetabolous and to DH44s from holometabolous insects (Suppl. Figure S4B, Supporting Information).

Ecdysis-Triggering Hormones

A single 4-exon Dappu eth-gene in the D. pulex genome encodes a precursor with two ecdysis-triggering hormones (ETHs) in tandem, DappuETH1 and DappuETH2, and two long PRPs (Suppl. Table S1, Supporting Information). The structures of gene, precursor and peptides closely resemble insect ETH genes and gene products. Alignment with insect ETHs shows that both ETH-peptides bear the typical ETH signatures, that is, PRXamide (X = I, L, V, M) C-termini.48,70 DappuETH2 is, however, much longer than insect ETHs (Suppl. Figure S5, Supporting Information). To date there are no decapod ETH-peptides known.

Eclosion Hormones

The D. pulex genome contains two genes, the Dappu eh- and Dappu ehl-genes, each encoding a single eclosion hormone (EH)-like peptide. Both genes occur as a tail-to-tail inverted tandem cluster right next to each other on the same scaffold. With an unusual 113aas, DappuEHL is about twice as long as DappuEH (56aas, Figure 5C, Suppl. Table S1, Supporting Information). By use of RT-PCR we confirmed the predicted structure of the DappuEHL precursor especially with regard to the exon/intron positions (Suppl. Table S2, Supporting Information). DappuEH strikingly resembles insect EH peptides. It shares up to 53% sequence identity with several insect EHs (e.g., with beetle TricaEH; 71% sequence similarity), whereas the EHs from the decapod crab C. sapidus and from the chelicerate I. scapularis exhibit less similarity, as detailed in Figure 8. The DappuEHL structurally resembles DappuEH and other arthropod EHs with regard to signature parts such as the positions of the first five cysteines and sequence stretches in between. However, DappuEHL has only 29aas identical to DappuEH and bears a glycine-rich 16aas longer stretch between the last two cysteines and a 40aas longer C-terminus than DappuEH (Figure 8). The overall similarities of DappuEHL with DappuEH, other insect EH, and a single decapod EH are low but slightly higher only with a novel EHL from the branchiopod Triops cancriformis, with which DappuEHL also shares a similarly elongated C-terminus (Figure 8). This Dappu ehl-/Dappu eh-gene cluster may have resulted from a gene duplication.

Figure 8. Dappu eclosion hormones (DappuEH and EHL, underlined) aligned with other arthropod EHs. Except for the long EHL isoforms of the branchiopod crustaceans D. pulex and horseshoe shrimp Triops cancriformis (FM869498; dark blue), all EHs show close structural similarities extending not only to the positions of cysteines (shaded red) but also to a similar helix-motif (h) character as worked out recently for ManseEH.155 DappuEH shows, however, the lowest similarity to the EH from the crab Callinectes sapidus (CalsaEH CV224237; dark blue). The overall aa-similarities of DappuEHL with DappuEH, with insect EHs (below, green), and the decapod EH (above) are low, slightly higher only with long TriopsEHL. DappuEHL has a signature similar to that of DappuEH but has an additional 16aa long stretch between the last two of the six cysteines and a much longer C-terminus; the latter feature it shares with the Triops EHL. (Insect EHs from: beetle Tribolium castaneum XP_969164, pea aphid Acyrthosiphon pisum XM_001943582, mosquitoes Aedes aegypti CH477404.1, Anopheles gambiae XM_001230804, fruit fly D. melanogaster CAA51051, moths Bombyx mori NM_001043842, Manduca sexta M27808.1, Asian corn borer Ostrinia furnacalis ABG66962.1; chelicerate EH from: Ixodes scapularis EH XM_002399230, orange).

Extended FMRFamides

The Dappu firfamide-gene is the first extended FMRFamide gene unraveled in crustaceans, encoding two DappuFIRFamides and six PRPs (Suppl. Table S1, Supporting Information). We identified the mature FIRFamides, including an N-terminally truncated variant, and two PRPs by mass spectrometry (Table 1, Suppl. Table S3, Supporting Information, Figures 1A,C and 9A). The peptides are flanked by mono- or dibasic Arg-cleavage sites (R; RR) and an exceptional monobasic Lys-cleavage site occurring between the fully sequenced amidated 27-mer PRP-3 and DappuFIRFa-2 (Table 1, Suppl. Tables S1, S3, Supporting Information). FIRFamide-encoding genes are relatively rare in arthropods and known only from a couple of insects such as the beetle T. castaneum (TricaFaRP, GLEAN_07769) and the cockroach Periplaneta americana71 but more common in nematodes such as C. elegans (three flp-genes with 12 FIRFamides72). Of note for comparison, in insects, the PeramFaRP-precursor encodes 10 different FIRFamides but no FLRFamide,71 whereas the TricaFaRP precursor encodes a mixture of FIRFamides and FLRFamides. Other FIRFamides in B. mori are the prothoracicostatins (BommoFMRFamides BRFa-1 to -4) that regulate ecdysteroidogenesis in silkworm prothoracic glands.73

Figure 9. MALDI-TOF/TOF fragmentation spectra of FMRFamide related peptides of D. pulex. (A) DappuFIRFamide 1 at m/z 1208.68, (B) Dappu myosuppressin (MS) at m/z 1257.64, and (C) Dappu sulfakinin 2 (SK2) at m/z 1301.54 from direct profiling of brain preparations. Prominent y- and b-fragments and immonium ions are labeled. The fragments were manually analyzed and confirmed the predicted amino acid sequences.

Glycoprotein Hormone GPA2/GPB5

Further members of the cystine knot-containing glycoprotein hormone family are essential for gonadal and thyroid functions in all vertebrates, and may have many roles in development. The glycoprotein hormone alpha- (GPA-) subunit forms heterodimers with different beta- (GPB-) subunits in mammalian gonadotropins (LH and FSH) to activate respective receptors. Two novel subunits recently discovered in humans form a heterodimer called thyrostimulin, consisting of alpha- (GPA2) and beta-subunits (GPB5), which activates the thyroid-stimulating hormone (TSH)-receptor.74 Similar molecules have been discovered in lower chordates and invertebrates including insects and nematodes.48,75

For crustaceans, we provide here the first evidence for corresponding genes and separate derived protein hormone subunits in the D. pulex genome. The Dappu gpa2-gene and the Dappu gpb5-gene are both 3-exon genes clustered tail-to-tail right next to each other (only 6bp apart) on the same scaffold (Suppl. Table S1, Supporting Information; Figure 5D). They give rise to large precursor-derived protein hormones of 118aas (DappuGPA2) and 133aas (DappuGPB5) that have 10 cysteines involved in the typical cystine-knot formation (for details see: Figure 10A,B). Precursor comparisons show that DappuGPA2 and DappuGPB5 are closely related to well-conserved glycoprotein heterodimers in insects, and to some extent to mouse and human glycoprotein hormones. Interestingly, highest similarities were found for the body louse Pediculus humanus, dipteran and beetle glycoproteins (Figure 10A,B). BLAST searches did not find any identified or predictable GPA2/GPB5 in decapod crustaceans.

Inotocin

Oxytocin (OT)- and vasopressin (VP)-like peptides are phylogenetically old and highly conserved multifunctional nonapeptides widespread among proto- and deuterostomian lineages.76,77 In insects a VP-like peptide was originally isolated from L. migratoria78 and twenty years later VP-like preprohormone and receptor genes were identified in the beetle T. castaneum.77,79 It is interesting, however, that genes for insect oxytocin/vasopressin-like peptide (inotocin) and its receptor appear to have been lost in a number of holometabolous insects.77,79 Bioinformatics analyses of the D. pulex genome showed that crustaceans also contain an inotocin-like peptide, CFITNCPPGamide, with the typical cystine-bridge but with the unprecedented occurrence of a proline in position 8.77 Here, we provide a revised version of the Dappu-inotocin (it) gene as confirmed by RT-PCR (Suppl. Table S2, Supporting Information). Its gene structure is typical for all hitherto known OT/VP-like peptides encoding genes76 including those recently discovered in holometabolous insects57,77,80 (Figure 11A). The gene gives rise to a precursor similar to those from other OT/VP-like peptides: the SP is directly followed by the encoded DappuIT, and a C-terminal neurophysin domain. The latter contains 14 cysteine residues aligning well with neurophysins from other OT/VP-like precursors, likely giving rise to seven cystine-bridges (Figure 11B).

Insulin/Insulin-like Growth Factor (IGF)-Related Peptides

Invertebrate insulin-related peptides (IRPs) are derived from multigene families that are expressed in the brain and peripheral tissues. These peptides can serve as hormones, growth factors or neuromodulators.81 Insulin-like peptides in arthropods were first discovered in B. mori as bombyxin,82 which turned out to be just one out of many similar IRPs encoded by 38 bombyxin genes in B. mori; this extreme is not found in other insects, which have only one IRP-gene as in locust species or up to 7–8 IRP-genes as in dipterans.81 In crustaceans, insulin-related peptides are so far known only as androgenic gland hormones (AGHs) responsible for differentiation and determination of sex. The first AGHs were identified in woodlice as complex N-glycosylated, clearly insulin-related androgenic gland peptides (IAGs) that elicit male phenotypes83,84 and later also in decapod crustaceans.85,86 Hallmarks of these IRPs include the typical sequence of B-, C-, A-chain peptides with distinct cleavage sites (KR, RxxR) and putative N-linked glycosylation motifs (NxS/T). In all IRPs except for IGFs, the C-peptide is cleaved from the precursor after the canonical cystine-bridges have formed. Here, we describe four different single genes encoding one typical IRP and three insulin-like growth factor/relaxin-related peptides in the D. pulex genome. Originally, DappuIRPs1–4 were numbered in the order of their first annotation by us. The 5-exon Dappu irp2-gene gives rise to the DappuIRP2-precursor that resembles most closely the insect insulin precursors, especially those of locust IRP and some bombyxin precursors (Figure 12A,B). This gene structure appears unique since insulin genes are normally 3-exon genes; only IGF-genes have 4–5 exons.87,88 The DappuIRP2-ORF spans four exons and the coding regions of the B-chain and the C-chain are interrupted by introns 3 and 4, respectively, as confirmed by RT-PCR (Suppl. Tables S1, S2, Supporting Information).
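The putative N-linked glycosylation motif (NxS/T) named among the IRP hallmarks is easy to scan for. The sketch below is illustrative only: `toy_sequence` is a hypothetical string, not a DappuIRP precursor, and the exclusion of Pro at the middle position follows the common definition of the sequon rather than anything stated in this article. The function name `find_sequons` is an assumption.

```python
import re

# N-x-S/T sequon; the lookahead allows overlapping sites to be reported.
# Excluding Pro at position x is the usual convention (assumption here).
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based position of Asn, tripeptide) for each N-x-S/T site."""
    return [
        (match.start() + 1, sequence[match.start():match.start() + 3])
        for match in SEQUON.finditer(sequence)
    ]


# Hypothetical toy sequence containing three sequons (illustration only).
toy_sequence = "AANVSGGNQTLLNESPP"

sites = find_sequons(toy_sequence)
print(sites)  # [(3, 'NVS'), (8, 'NQT'), (13, 'NES')]
```

A hit from such a scan marks only a potential site; whether it is actually glycosylated cannot be inferred from sequence alone, which is why the text speaks of putative motifs.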
By mass spectrometry, we identified a B-chain C-terminally flanked by a rare monobasic Lys-cleavage site but also two short PRPs directly upstream from the B-chain on the precursor (Table 1, Suppl. Tables S1, S3, Supporting Information, Figure 1A,D). Interestingly, the DappuIRP2-PRP-2 exhibits some similarity to a so-called locust IRP-precursor-related copeptide identified previously as the glycogenolysis-inhibiting peptide89 (Figure 12C). DappuIRP2 does not have any N-glycosylation sites, and when aligned with decapod and isopod crustacean IRPs distinct differences are detectable. These differences comprise not only the lack of an additional cystine-bridge as in isopod AGHs but also an additional amino acid between the cysteines of the B-chain and a 4aas-gap between the second and the third cysteine in the A-chain. However, the DappuIRP2 precursor clearly shares the latter two characters with typical insect IRPs and a molluscan IRP (Suppl. Figure S6, Supporting Information). The 4-exon Dappu irp1-gene encodes a precursor that does not have a C-peptide but a single putative dibasic cleavage site (RR) between the very long B-chain (50aas) and possibly a 24aas A-chain, if the Arg110 downstream of the last cysteine is used as a cleavage site. Thus, the precursor and the peptide are more reminiscent of D. melanogaster DromeILP6 and similar ILPs of other dipteran and hymenopteran insects than related to any decapod AGH except for the occurrence of a typical N-glycosylation site (NVS) at the beginning of the B-chain (Figure 12D, Suppl. Table S1, Suppl. Figure S6, Supporting Information). If not cleaved at Arg110, there would be even two more such N-glycosylation sites (NQT, NES). Because of the lacking C-peptide, the long C-terminus with putative D- and E-domains and the similarity to DromeILP6, DappuIRP1 should thus be considered IGF-like88,90 as is likely also the case for the remaining DappuIRPs. The Dappu irp3- and irp4-genes are obvious paralogs of the Dappu irp1-gene and both have even more similar 4-exon gene structures and occur about 102kb apart from each other in tail-to-tail arrangement on the same scaffold 18. The derived precursors are most similar to each other (49% amino acid identity, 62% conservation score) but also to the DappuIRP1 precursor (Figure 12D). When applying the rules for insect peptide hormone precursor cleavages,19 the DappuIRP3-precursor resembles typical IGF-like precursors, in which the short C-peptide is not cleaved, since it has only unlikely dibasic (RK) and monobasic (R) cleavage sites.

Figure 10. Alignments of (A) DappuGPA2- and (B) DappuGPB5-subunit precursors with those of other arthropod and mammalian cystine-knot proteins. Similarities are higher in the cases of GPA2 than of GPB5 proteins. Most highly conserved are the actual cystine-knot sequences embraced by the first and the tenth cysteine with some parts of lower conservation between the fourth and the fifth cysteine in the GPA2s or the fifth and the sixth cysteine in the GPB5s (asterisks indicate ring parts with conserved aas of the cystine-knot built by cysteines 37 and 48). Insect (green) GPA2, GPB5 from: mosquitoes Anopheles gambiae AnogaGPA2 XM_317164.4, AnogaGPB5 XM_555160.3, Culex quinquefasciatus CulquGPA2 XM_001863270.1, CulquGPB5 XM_001863269, moth B. mori BommoGPA2 NM_001130903.1, BommoGPB5II BN001262.2, fruit flies Drosophila virilis DroviGPA2 XM_002051238.1, DroviGPB5 XM_002051240.1, DromeGPA2 AY940435.1, DromeGPB5A NM_001110865.2, louse Pediculus humanus PedhuGPA2 XM_002427773.1, PedhuGPB5 BN001256.1; beetle TricaGPA2 NM_001170773.1; TricaGPB5 BN001258.1; mammalian (gray) GPA2, GPB5: Homo sapiens HomsaGPA2 BC093962.1, HomsaGPB5 AF467770.1, mouse Mus musculus MusmuGPA2 NM_130453.3, MusmuGPB5 NM_175644.3; ∼ indicate ends of SPs.

Figure 11. Comparison of oxytocin/vasopressin-like genes and preprohormones of D. pulex, other invertebrates and mouse. (A) The 3-exon genes for DappuInotocin (DappuIT), jewel wasp Nasonia vitripennis inotocin (NasviIT; NW_001819015), and mouse Mus musculus oxytocin (OT; OTTMUSG00000015526) are highly similar; note that both introns interrupt the neurophysin (NP) sequences twice in similar positions in all genes. (B) DappuIT and other oxytocin/vasopressin-like preprohormones from insects (green), mollusks, an earthworm (Eisenia fetida annetocin EisfoAT; BAA36458) and a vertebrate (Mus musculus; gray) align particularly well with regard to the oxytocin-like nonapeptides and the 14 cysteines of the neurophysins. Note longer stretches in the DappuIT-precursors flanking the first and the second neurophysin domains containing four and three cystine-bridges, respectively. (Insect ITs from: T. castaneum TricaIT GLEAN_06626, NasviIT NW_001819015; snails: Aplysia kurodai Lys-conopressin AplkuLCP BAB40371.1, Lymnaea stagnalis Lys-conopressin LymstCP AAA29289.1); intron positions are indicated by triangles, ∼ and diamond indicate ends of SPs.

Ion Transport Peptides

Previous studies showed that water fleas Daphnia magna do not have a typical XOSG-system but only brain CHH-interneurons and peripheral CHH-ir neurons.91 The latter neurons are reminiscent of shore crab pericardial organ (PO) neurons expressing one of two alternative splice forms derived from a 4-exon CHH-gene encoding a 72 aas long amidated “short” sinus gland XOSG-CHH and a nonamidated 73aas POCHH.92 Similar alternative splice forms also occur for several insect short and long ion transport peptides (ITP/ITPLs), originally discovered as antidiuretic factors in locusts,93 and are expressed by the Dappu itp-gene. We used these names here instead of CHH/CHHL to adequately refer to their striking sequence similarities with ITP/ITPLs from insects (Suppl. Figure S7, 4491

dx.doi.org/10.1021/pr200284e |J. Proteome Res. 2011, 10, 4478–4504

Journal of Proteome Research

ARTICLE

Figure 12. Dappu insulin-related peptide-2 (irp2) gene and comparisons of its gene products with insect IRPs and other DappuIRPs. (A) Complex 5-exon structure of the Dappu irp2-gene shows a long stretch between the SP and two identified precursor-related peptides (PRPs: 1, 2) preceding the B- and C-chains. The latter are both split by introns i3 and i4, respectively. The B- and A-chains of DappuIRP2 are closely related to those of insect IRPs (green). (C) DappuIRP2-PRP2 shows some similarities to locust glycogenolysis-inhibiting peptide (GIP89). (D) DappuIRP1, -3, and -4 precursors are very similar, particularly in the cases of DappuIRP3 and DappuIRP4 (note the DappuIRP3:4 comparison; putative cleavage sites in blue). Triple asterisks indicate possible N-glycosylation sites at NxS sites in DappuIRP1, -3, -4 or a NxT site in DappuIRP1; diamond indicates end of SP.

Supporting Information). Two further genes in the D. pulex genome encode clearly ITP-related but more elongated peptides. The Dappu itp-gene is a type-I CHH/ITP 4-exon gene.2,93 It gives rise to two mRNAs encoding either a short or a long alternative splice form of DappuITP (72aas) or DappuITPL (79aas), respectively, that we have confirmed by RT-PCR (Suppl. Table S2, Supporting Information). Only the short DappuITP has an amidated C-terminus. Both peptides lack the N-terminal pyroglutamate protective group characteristic for most decapod crustacean CHHs,2,14 but instead begin with a SFF-group as is typical for many insect ITPs. Further characteristics typical for all CHH-like ITP-precursors are the occurrence of a short SP, a short part of it usually being encoded by the first exon.2,93 The rest of the SP and a ITP-PRP, which precedes the

ITP-sequences, and the first half (here: aas 140) of the ITP in the mRNA-parts are derived from the second exon as is the case for all short and long CHHs and ITPs; the alternatively spliced exons 3 plus 4 or exon 4 encode the second half of all CHHs/ ITPs, respectively.2,93 The same is the case for the 4-exon Dappu itp-gene, in which exon-2 encodes a 12 aas DappuITP-PRP (MSALSSGHHSLS) that is similar in length to many insect ITP-PRPs but much shorter than in most decapod CHHprecursors3,93 (Suppl. Figure S7AC, Supporting Information). Two further CHH/ITP-like genes, the Dappu itpln-gene and the Dappu itpn-gene, exhibit clear-cut type-I CHH/ITP gene characteristics2,93 (Suppl. Figure S7A, Supporting Information), although these D. pulex genes are unusual 4-exon and 2-exon genes, respectively. Derived precursors of both genes do not contain an ITP-PRP but instead exhibit elongated N-terminal 4492

dx.doi.org/10.1021/pr200284e |J. Proteome Res. 2011, 10, 4478–4504

sequences following right after the predicted SP (Suppl. Figure S7B,C, Supporting Information). The DappuITPLN-precursor is, thus, a 99aas-long isoform that contains typical CHH-subfamily motifs2 but a very different N-terminus with a KH-sequence in a position in which one would have expected a KR processing site (Suppl. Figure S7B, Supporting Information). Similar exchanges may be represented by the QQ-site instead of a KR in an EST previously claimed to encode an MIH (EST FE418183), which instead clearly resembles ITPs (Suppl. Figure S7C, Supporting Information). DappuITPLN hardly aligns with products of type-II MIH/MOIH-like genes but aligns much better with the long isoforms DappuITPL and SchgrITPL (Suppl. Figure S8, Supporting Information; max. 35% vs 55–57% identities, respectively). Another distinct feature of DappuITPLN is the lack of a short DappuITP-like half of such a splice form, usually encoded by exon 4 in type-I CHH/ITP-genes.2 Thus, a clear-cut type-II MIH/VIH/GIH-like peptide apparently does not exist in the D. pulex genome, contrary to a previous claim of the possible existence of a DappuMIH.37 The ESTs encode a variant with clearly DappuITP-like characteristics which, however, with the above-mentioned N-terminal characteristics is not present in the genome of D. pulex (Suppl. Figures S7C, S8, Supporting Information). Thus, D. pulex has neither an MIH/MOIH- nor a VIH/GIH-peptide with all the required molecular signatures, that is, (i) a typical additional Gly at position 5 after the first cysteine, (ii) a Val in position 4 before the second cysteine when aligned with CHHs/ITPs, (iii) the N-terminal so-called A1′- and A5′-helix motifs,2 and (iv) typically 3-exon genes2 (Suppl. Figure S8, Supporting Information). The DappuITPN-precursor derived from the third CHH/ITP-like gene type clearly shows a putative deletion of the N-terminal 6aas SFFDIN (Suppl. Figure S7C, Supporting Information).
This precursor, on the other hand, aligns best, with the highest sequence identity, with the DappuITP-precursor and shares three distinct characteristics with it: (i) an SP that is more than 2/3 identical, (ii) a nearly identical DappuITP-PRP-like part (2 aa-exchanges only), and (iii) highest identities from the first cysteine onward (6 aa-exchanges only; Suppl. Figure S7C, Supporting Information).

Myosuppressin

Myosuppressins are conserved arthropod decapeptides with the consensus motif X1DVX2HX3FLRFamide. These peptides have their own specific receptors that are not activated by the above-mentioned FMRFamides.94,95 The first peptide of this type was discovered as a factor inhibiting hindgut contraction in the cockroach Leucophaea maderae and named leucomyosuppressin, pQDVDHVFLRFamide.96 Later on, several isoforms with Thr1-, Pro1-, or other substitutions were discovered in a variety of insects.97,98 To date, the only isoform identified in several crustaceans and known to decrease heart beat frequency in the lobster H. americanus is pQDLDHVFLRFamide.99,100 The Dappu ms-gene typically shows two exons and gives rise to a 112aas precursor (with a 36aas SP) similar to those of insects and decapods (Suppl. Tables S1, S2, Supporting Information). Fragmentation analysis confirmed the sequence of DappuMS (Figure 9B). Both the post-translationally modified pGlu-MS and the noncyclized Gln1-MS were identified by mass spectrometry and occur in a ratio of about 30:70 (Table 1, Figure 9B). The insect and crustacean precursors share aa-identities over large parts; in particular, two cysteines in the precursor-related peptide directly after the SP stand out as

a conserved feature apart from the mature peptide part itself. DappuMS (pQDVDHVFLRFamide) is identical to the myosuppressin from cockroaches, locusts, honeybee, and beetle, but differs by one aa from the MS of decapods (Leu3, H. americanus, GQ303179, P. clarkii, BAG68789.1) and dipteran insects (Thr1, A. gambiae, AGAP001474-PA, D. melanogaster, CG6440-PA).

Neuroparsin
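The myosuppressin comparison made above (DappuMS against the consensus motif X1DVX2HX3FLRFamide and against the decapod isoform) can be retraced with a few lines of Python. This is only a minimal sketch: the regular expression, the helper names `residues` and `mismatches`, and the handling of the "p"/"amide" notation are illustrative assumptions and not part of the methods used in this study.

```python
import re

# Consensus X1DVX2HX3FLRFamide as given in the text; each X accepts any residue.
# Writing it as a regular expression is an illustrative assumption.
MS_CONSENSUS = re.compile(r"^.DV.H.FLRF$")


def residues(peptide: str) -> str:
    """Return the bare residues, dropping the 'p' of pGlu (pQ) and the 'amide' suffix."""
    core = peptide[1:] if peptide.startswith("pQ") else peptide
    return core[:-5] if core.endswith("amide") else core


def mismatches(a: str, b: str) -> list[int]:
    """1-based positions at which two equal-length peptides differ."""
    if len(a) != len(b):
        raise ValueError("peptides must have equal length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Sequences quoted in the text.
dappu_ms = residues("pQDVDHVFLRFamide")    # D. pulex / leucomyosuppressin
decapod_ms = residues("pQDLDHVFLRFamide")  # isoform reported for decapods

if __name__ == "__main__":
    print(len(dappu_ms), bool(MS_CONSENSUS.match(dappu_ms)))
    print(bool(MS_CONSENSUS.match(decapod_ms)))
    print(mismatches(dappu_ms, decapod_ms))
```

Under these assumptions, the single differing position between the two decapeptides corresponds to the Leu3 exchange noted for decapods, which also explains why the decapod isoform does not satisfy the strict "DV" part of the consensus.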

Neuroparsins (NPs) belong to a family of structurally related large peptides (>100aas) of arthropods. NPs are characterized by conserved positions of 12–14 cysteine residues forming 6 cystine-bridges and share structural and functional similarities with vertebrate insulin-like growth factor binding proteins (IGFBP).101–103 NPs were discovered in Locusta migratoria as products of pars intercerebralis neurosecretory cells of the brain (hence the name) and are now known to be multifunctional neurohormones.101 While usually only one NP-precursor has been found in many insects with annotated genomes,48,57,80 except for sophophoran Drosophila species, which lack NPs,103 locusts have up to four NP splice forms,101 and the Chagas bug Rhodnius prolixus even has three np-genes.103 In crustaceans, NP-like peptides have so far only been predicted from ESTs for some decapods and a copepod.59–61,103 In the D. pulex genome, there is one single 3-exon Dappu np-gene encoding a 78aas-long neuroparsin with 14 cysteines (allowing for seven putative cystine-bridges), as we have confirmed by RT-PCR (Suppl. Table S2, Supporting Information). Strongest aa-identities with decapod and insect NPs occur in some core regions from the fourth to the thirteenth cysteine of DappuNP. Overall aa-similarities reach 49–54% but appear to be slightly higher when compared with decapod NPs (Suppl. Figure S9, Supporting Information).

Neuropeptide F

Neuropeptide F (NPF)-related peptides are commonly considered vertebrate neuropeptide Y (NPY) homologues (strictly 36aas in vertebrates) containing precursor signatures such as the carboxy-terminal peptide of NPY (CPON), a PRP following NPY within its precursor.104–106 Almost all NPFs bear a C-terminal RPRFamide motif and two tyrosine residues at positions 9/10 and 17 from their C-termini.105,107 Recently, a novel definition of two different NPF-types (NPF1 and NPF2) has been proposed, which subdivides all known arthropod NPFs, some of which may even arise from two different genes in a given species.106 The NPF2 peptides usually carry C-terminal GX1X2RYamide motifs.107 The Dappu npf-gene gives rise to two mRNAs by alternative splicing (Suppl. Figure S10A, Supporting Information) as evidenced by the occurrence of two distinct amplicons after RT-PCR (Suppl. Table S2, Supporting Information). The DappuNPF1 mRNA encodes a typical vertebrate NPY-like amidated DappuNPF1 of 39aas length followed by a DappuCPON of 19aas in length. We have identified both peptides by mass spectrometry (Table 1, Suppl. Table S3, Supporting Information, Figure 1A). A predicted D. magna NPF (EG565358; ref 108) is almost identical to DappuNPF1 but lacks the Gly3 (Suppl. Figure S10B, Supporting Information). The DappuNPF1 splice form appears to be favored, judging from confirming ESTs37 and our RT-PCR data; the latter show the strongest amplicon corresponding to this mRNA (Suppl. Table S2, Supporting Information). Except for the existence of the additional exon 2 in the Dappu npf-gene, its exon-intron structure is remarkably similar to that of the human npy-gene104

(Suppl. Figure S10A, Supporting Information). These close similarities provide further support for the close relationship between invertebrate NPFs and vertebrate NPY proposed earlier.105 As a second product of the Dappu npf-gene, the DappuNPF2 mRNA retains exon 2 and encodes a 66aas-long DappuNPF2 and the same 19aas-long DappuCPON, as partly revealed by RT-PCR (Suppl. Figure S10A, Suppl. Tables S2, S3, Supporting Information). Shorter peptides such as a 37aas-long DappuNPF2 fragment could be derived from the former protein, since it contains several monobasic cleavage sites. We were not able to find these peptides by mass spectrometry. However, these smaller peptides may be processed from their precursor in locations other than the brain-optic ganglia complex, for example, gut endocrine cells or ovaries. In some insects, NPF type-1 and -2 or smaller peptides, such as the YS/GQVARPRFamides, even arise from different NPF genes and are posttranslationally processed.106,109 DappuNPF1 and DappuNPF2, however, likely arise by alternative splicing of a single gene, a phenomenon recently reported for the silk moth's BommoNPF1a and BommoNPF1b48 as well. Both DappuNPFs align best with NPFs corresponding to the two NPF subgroups from several insects and single decapod and copepod crustaceans. However, a recently proposed invertebrate consensus sequence106 is largely conserved only in the Daphnia NPF1 sequences (Suppl. Figure S10B, Supporting Information).

Orcokinin and Orcomyotropin

The myoactive tridecapeptide Asn13-orcokinin NFDEIDRSGFGFA was first discovered in the crayfish Orconectes limosus8 and further orcokinin (OK) isoforms were identified later in various decapods and other crustaceans.13 Insect orcokinins, first identified in cockroaches and locusts,110,111 are 14aas long in most insects and a chelicerate and are closely related to crustacean OKs. Most carry the signature sequence NXDEIDR (X being F or L). However, to our knowledge, no 13aas-long OK of the typical length and structure of decapod OKs occurs in insects. Another myoactive peptide, orcomyotropin FDAFTTGFamide (OMT), was discovered as a potent hindgut stimulatory factor in the crayfish O. limosus.9 The first precursors for decapod crustacean orcokinins contain 7 and 8 copies of four different orcokinin isoforms, and one copy of the OMT-like undecapeptide (OMTL) FDAFTTGFGHS.112 OMTs and OMTLs have been identified from several decapods99,113 but a separate OMT-precursor has never been found. In insects, OK-precursors usually give rise to OMTL peptides in similar positions but only 2–3 single copies of different OKs. We describe here the structures and products of two orcokinin-genes that occur in the D. pulex genome in different positions about 191kb apart from each other on the same scaffold (Suppl. Table S2, Supporting Information). The Dappu okI-gene encodes the isoforms DappuOK1 NLDEIDRSNFGTFA and DappuOK2 NLDEIDRSDFGRFV following two OMTL-related peptides (DappuOMTL1 and DappuOMTL2) located directly upstream (Suppl. Figure S11A, Supporting Information). The Dappu okII-gene is a 3-exon gene encoding DappuOK1 and a closely related third isoform DappuOK3 NLDEIDRSDFSRFV following a further OMTL-related peptide (DappuOMTL3; Suppl. Figure S11B, Supporting Information). Mass matches have confirmed both DappuOK1 and DappuOK2 but not yet DappuOK3 (Table 1; Figure 1D). The sequences of both DappuOK precursors containing only two OKs closely resemble

those of insects such as the honeybee A. mellifera, the beetle T. castaneum, the moth B. mori, and the mosquito A. gambiae. However, the full length SP of the DappuOKII precursor is not clear yet because of 5′ upstream gaps in the genomic sequence. The exon/intron arrangement of the Dappu okII-gene, however, as inferred by bioinformatics, is clearly EST-supported, at least for the second exon (Suppl. Tables S1, S2, Supporting Information). Evidence from peptide alignments suggests that the DappuOKs are more closely related to the OK-tetradecapeptides of insects, a copepod crustacean, and a chelicerate than to all OK-tridecapeptides of decapods. This is similarly the case for DappuOMTLs which, however, exhibit more divergent sequences. Thus, both decapod crustacean OKs and OMTLs clearly lack one amino acid in their sequence compared to DappuOKs and other arthropod OKs (Suppl. Figures S11C, S11D, Supporting Information).

Periviscerokinins
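The orcokinin comparison above (signature sequence NXDEIDR with X being F or L, tridecapeptide versus tetradecapeptide length, and the close relationship of the three DappuOK isoforms) can be checked directly on the sequences quoted in the text. The following Python lines are a minimal sketch only; the regular expression and the helper names `summarize` and `differing_positions` are illustrative assumptions, not a description of the bioinformatics used in this study.

```python
import re

# Signature NXDEIDR (X = F or L) as stated in the text, anchored at the N-terminus.
OK_SIGNATURE = re.compile(r"^N[FL]DEIDR")

# Orcokinin sequences quoted in the text.
PEPTIDES = {
    "Asn13-orcokinin (O. limosus)": "NFDEIDRSGFGFA",
    "DappuOK1": "NLDEIDRSNFGTFA",
    "DappuOK2": "NLDEIDRSDFGRFV",
    "DappuOK3": "NLDEIDRSDFSRFV",
}


def summarize(peptides: dict[str, str]) -> dict[str, tuple[int, bool]]:
    """Map each peptide name to (length, carries the NXDEIDR signature)."""
    return {
        name: (len(seq), OK_SIGNATURE.match(seq) is not None)
        for name, seq in peptides.items()
    }


def differing_positions(a: str, b: str) -> list[int]:
    """1-based positions at which two equal-length peptides differ."""
    if len(a) != len(b):
        raise ValueError("peptides must have equal length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    for name, (length, has_signature) in summarize(PEPTIDES).items():
        print(f"{name}: {length} aas, signature={has_signature}")
    print(differing_positions(PEPTIDES["DappuOK2"], PEPTIDES["DappuOK3"]))
```

A position-by-position comparison like this only works for peptides of equal length; comparing the 14aas DappuOKs with the 13aas decapod orcokinin requires a gapped alignment, as in the supplementary alignments referred to above.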

Periviscerokinins (PVKs) and CAPA-pyrokinins (PKs) were discovered in abdominal perivisceral or perisympathetic neurohemal organs of the American cockroach.114 These peptides are very likely encoded by the same gene, as was first described for the capability (capa)-gene in D. melanogaster.115 Its name was coined as a reminder of the “abilities” known for related neuropeptides that act as cardio-acceleratory peptides (CAPs). The structure of this capa-gene is similar to those of many other insects, usually encoding 2–4 PVKs followed by only one PK. While CAPA-PVKs typically have a FPRVamide C-terminus, CAPA-PKs usually show a much more conserved C-terminus of WFGPRLamide.114 FXPRLamides of insects are also encoded in a second gene expressing precursors that give rise to peptides designated as diapause hormones, pheromone-biosynthesis-activating neuropeptides (PBAN116) or PKs.114 Similar PKs were isolated from the shrimp Litopenaeus vannamei117 and later from several other decapod crustaceans,59,60,118,119 but PVKs have never been identified in decapods. The Dappu pvk-gene is a 4-exon gene encoding a 188aas-long precursor with three different PVKs but no PK (Suppl. Table S1, Supporting Information). We identified the three DappuPVKs by fragmentation analysis (Table 1; Figure 1A,B). Although the general gene structure is similar to those of insect capa-genes, only one putative dodecapeptide SNSDERTAVFIP-(RRW) is encoded close to the C-terminus of the DappuPVK-precursor, where one would have expected a pyrokinin. Interestingly, similarly to the DappuPVK precursor, two slightly different precursors from another branchiopod species, the brine shrimp Artemia franciscana, contain similar ArtfrPVKs but also lack a pyrokinin (ES499501.1, ES514145.1; Suppl. Figure S12, Supporting Information).

Pigment Dispersing Hormone

The 2-exon Dappu pdh-gene occurs as a single gene that differs with regard to length and an intron interrupting the PRP from the closely related but intron-less gene for the fruit fly pigment dispersing factor (DromePDF; CG6496; Suppl. Figure S13A, S13B, Supporting Information). The derived precursor structure of DappuPDH is similar to that of the CarmaPDH and DromePDF precursors in that the beta-PDH-type octadecapeptide sequence of NSELINSLLGLPRFMKVVamide appears at the 3′ end of the DappuPDH-ORF, as confirmed here by RT-PCR (Suppl. Table S2, Supporting Information). DappuPDH is identical in D. pulex and D. magna as previously shown by means of mass spectrometry,120 and as also confirmed here (Table 1, Figure 1A). In the two water fleas, it occurs in almost identical

Figure 13. Comparison of the two novel D. pulex and the Drosophila melanogaster proctolin genes and preprohormones. (A) Proctolin genes are similar with regard to the large introns splitting only the precursor-related peptides. However, the Dappu M4proct-gene and the Drome proct-gene share an additional first intron splitting the 5′-untranslated regions (M4, Met in position 4; SP or S, signal peptide, light gray; P, proctolin codons, dark gray). (B) Proctolin preprohormones of D. pulex are highly similar to each other (66% aai, 78% scr), but differ considerably from the DromeProct precursor (green) in length and in lacking polyQ and polyS stretches. Note the identical proprotein convertase (>