Analysis of Major Ampullate Silk cDNAs from Two Non-Orb-Weaving Spiders

May/June 2004

Published by the American Chemical Society

Volume 5, Number 3

© Copyright 2004 by the American Chemical Society

Silks: Biology, Genetics, Biomaterials, Materials Science, and Engineering

Analysis of Major Ampullate Silk cDNAs from Two Non-Orb-Weaving Spiders

Maozhen Tian,† Congzhou Liu,† and Randolph Lewis*

Department of Molecular Biology, University of Wyoming, Laramie, Wyoming 82071-3944

Received October 2, 2003; Revised Manuscript Received March 25, 2004

Compared to other arthropods, spiders are unique in their use of silk throughout their life span and in the extraordinary mechanical properties of the silk threads they produce. Studies on orb-weaving spider silk proteins have shown that silk proteins are composed of highly repetitive regions, characterized by alanine- and glycine-rich units. We have isolated and sequenced four partial cDNA clones representing major ampullate spider silk gene transcripts from two non-orb weavers: three for Kukulcania hibernalis and one for Agelenopsis aperta. These cDNA sequences were compared to each other, as well as to the previously published orb-weaver silk gene sequences. The results indicate that the repeats encoding conserved amino acid motifs such as polyA and polyGA that are characteristic of some orb-weaving spider silks are also found in some of the cDNAs reported in this study. However, we also found other motifs such as polyGS and polyGV in the cDNA sequences from the two non-orb-weaving spiders. The amino acid composition of the silk gland extracts shows that alanine and glycine are the major components of the silk of these two non-orb weavers, as is the case in orb-weaver silks. Sequence alignment shows that A. aperta’s cDNA displays a C-terminal encoding region that is about 44% similar to the one present in N. clavipes’s MaSp1 cDNA. In addition, as previously observed for spider silk sequences, the analysis of the codon usage for these four cDNAs demonstrates a bias for A or T in the wobble base position.

* To whom correspondence should be addressed. Department of Molecular Biology, University of Wyoming, 1000 E. University Avenue, Laramie, WY 82071. Phone: 307-766-2147. Fax: 307-766-5098. E-mail: [email protected].
† These authors contributed equally to this work.

10.1021/bm034391w CCC: $27.50 © 2004 American Chemical Society. Published on Web 04/10/2004

Biomacromolecules, Vol. 5, No. 3, 2004

1. Introduction

The ability to produce silk threads is one of the characteristic features of spiders. Researchers have shown great interest in spider silk fibers, largely because of their extraordinarily high tensile strength and elasticity.1,2 These properties give spider silk exceptional toughness (energy absorbed before breakage per unit mass), which is superior to that of other natural and synthetic fibers.1,3 Because of their unique combinations of desirable mechanical properties, spider silk proteins provide promising biomaterial options for numerous medical and industrial applications. In addition, the presence of a multi-gene family encoding the spider silk proteins makes them an ideal system for studying protein structure-function relationships.4,5

Recent cloning and comparison of araneoid (orb weaver) silk cDNAs revealed that spider silk proteins share a common hierarchic organization.6,7 Spider silks are highly repetitive proteins, composed of alternating repetitive regions with nonrepetitive amino and carboxyl termini. Some spider silks also have nonrepetitive spacer regions whose functions are not yet clear. The amino acid sequences of the repetitive regions can be generalized as a set of tandemly arrayed consensus repetitive units. Comparison of individual repetitive units of spider silk proteins demonstrates that they are organized from iterations of several shared amino acid motifs: polyA, GA, GGX, and/or GPG(X)n, where X represents a small set of amino acids and n ranges from 2 to 5.8 The relative abundance of these amino acid motifs in each individual silk protein is believed to correspond directly to the formation of distinct domains of secondary structure, which may be responsible for the unique mechanical properties of the spider silk fibers.7,9,10

The orb weavers represent only a fraction of spider species. Compared to the wealth of data obtained from the orb-weaving spiders in the past 10 years, the non-orb-weaving spiders were not very well characterized at the DNA level until the report from Hayashi and co-workers (personal communication).11 This was partly due to the difficulty in harvesting silk and the shortage of comparative sequence data from the non-orb-weaving spiders. Few similarities were observed when the partial cDNA sequences were compared between the non-araneoid spiders and the araneoid spiders. The only shared amino acid motifs are polyA and GGX, where X represents A, Y, L, Q, and S. This divergence may be the result of millions of years of evolution, since most of the non-orb weavers are located in the basal part of the phylogenetic tree.11 To expand the silk sequence data from the more primitive spiders, we constructed cDNA libraries from two primitive spider species and obtained four different partial cDNA sequences from these two libraries.

2. Materials and Methods

2.1. Spiders. The spiders studied in this research, Kukulcania hibernalis and Agelenopsis aperta, were chosen based on their phylogenetic location, availability, and sizes for easy dissection. Kukulcania and Agelenopsis spiders were obtained from Halari Invertebrates (Portal, AZ) and Spider Pharm Inc. (Yarnell, AZ), respectively. The spiders were maintained at 24 °C on a 12 h/12 h light/dark cycle in individual chambers and fed frozen crickets weekly.

2.2. mRNA Extraction.
Major ampullate glands from adult female spiders were identified and isolated as described.12,13 mRNA was extracted from spider glands using Dynabeads oligo (dT)25 (Dynal Inc., Lake Success, NY). The concentration of the isolated mRNA was determined using a spectrophotometer (Beckman Instruments, Inc., Fullerton, CA).

2.3. Amino Acid Composition Analysis. Protein samples saved from mRNA isolation were purified by precipitation prior to amino acid composition analysis following a standard procedure, using the fluorescent reagent AQC in the AccuTag System (Waters Chromatography Division, Millipore Corp., Milford, MA; Cohen & Michaud, 1993) and HPLC.

2.4. cDNA Library Construction. cDNA was synthesized from isolated mRNAs using the SuperScript Choice System kit (Invitrogen Corp., Carlsbad, CA). To reduce mRNA secondary structure, 0.5% Tween-20 was used in the synthesis of the cDNA second strand. The synthesized blunt-ended cDNA was size-fractionated on a Chroma Spin+TE-1000 gel filtration column (Clontech Laboratories Inc., Palo Alto, CA), and large fragments (>500 bp) were ligated into the EcoRV site of the pZErO-2 vector (Invitrogen Corp.). The recombinant DNA was used to transform electrocompetent E. coli TOP10 cells (Invitrogen Corp.), and libraries ranging from 1200 to 2400 recombinant clones were constructed.

2.5. cDNA Library Screening. Colonies were replicated, and the DNA was fixed on Hybond-N nylon membranes (Amersham Pharmacia Biotech UK Ltd., Amersham, England) for screening. The cDNA libraries were screened by colony hybridization with γ-32P-dCTP end-labeled probes, designed based on known spider silk sequence data and sequences obtained from screening the library. The following probes (Ransom Hill Bioscience, Inc., Ramona, CA) were used in cDNA library screening (W = A+T, N = A+G+C+T, H = A+C+T, Y = C+T, R = A+G, K = T+G, S = G+C, M = A+C):

5′-GCNGCNGCNGCNGCNGC-3′
5′-GGWGGATATGGKGGAGGATACGGATACGGA-3′
5′-CCAGCWCCAGCWCCTGCWCC-3′
5′-HCCRCCNKSHCCRCCNKSHCCRCC-3′
5′-ACCRTAWCCWCCTAGACCACC-3′
5′-GGTGGAGCAGGCGGMTATGGACGT-3′
5′-CCTAATGGTAAYTTTAAYTTA-3′
5′-GGWGCAGGWGCWGGAACYGGWTCY-3′

Hybridization was carried out in QuikHyb according to the manufacturer’s protocol (Stratagene, La Jolla, CA).

2.6. Southern Blot Analysis. Southern blot analysis was performed to further identify potential positive clones. Plasmid DNA was digested with NsiI. There is one NsiI site at each end of the pZErO-2 vector, and no NsiI site is expected in the silk insert, based on our knowledge of spider silk sequences. Samples were electrophoresed through a 0.7% agarose gel, denatured, neutralized, and blotted to a Hybond-N nylon filter (Amersham Pharmacia Biotech) by capillary transfer in 10× SSC buffer. Hybridization was carried out at 42 °C using the QuikHyb method described above. Membranes were washed in 2× SSC/0.1% SDS once at room temperature and once at 42 °C, and then once in 0.1× SSC/0.1% SDS at 42 °C. The membranes were blotted dry and subjected to autoradiography. Each probe was used on a separate blot.

2.7. DNA Sequencing. Plasmid DNA from positive clones was isolated by a modified mini alkaline-lysis/PEG precipitation procedure (http://dnasc.byu.edu) and sequenced with both the M13 forward primer (5′-CTGGCCGTCGTTTTAC-3′) and the Sp6 reverse primer (5′-ATTTAGGTGACACTATAG-3′) by the DNA sequencing center at Brigham Young University. The results were analyzed with MacVector 6.5.3 (Oxford Molecular). A transposon insertion reaction was used to complete sequencing of longer transcripts (GPS-1 Genome Priming System, New England Biolabs, Inc., Beverly, MA).

3. Results and Discussion

We obtained three variants of major ampullate silk cDNAs from the Kukulcania hibernalis cDNA library, named cDNA1 (2282 bp), cDNA2 (2551 bp), and cDNA3 (600 bp). The only cDNA clone for major ampullate silk from Agelenopsis aperta is 2694 bp in length.


Table 1. Amino Acid Composition of Major Ampullate Silk Proteins from K. hibernalis and A. aperta

amino acid    K. hibernalis    A. aperta
glycine       44.2%            45.5%
alanine       21.6%            29.3%
serine        15.4%            11.7%

Figure 1. Consensus repetitive regions of spider silk protein sequence determined from K. hibernalis and A. aperta cDNAs. The symbol “-” is used where the amino acid sequence differs among the repetitive units. The symbol “‚” is used to represent nonidentified sequences. GenBank accession numbers: K. hibernalis, AY571307, AY571308, AY571309, AY571310; A. aperta, AY566305.
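The motif content of a consensus repeat such as those in Figure 1 can be tallied with a few regular expressions. The following is a minimal Python sketch, not the procedure used in this work: the run-length thresholds, the dictionary and function names, and the toy sequence are all assumptions made for illustration, and the X set for GGX is the one listed in the Introduction for non-araneoid silks.

```python
import re

# Illustrative regexes for motifs named in the text. The minimum run lengths
# (four alanines, two dipeptide repeats) are assumptions, not from the paper.
MOTIF_PATTERNS = {
    "polyA": r"A{4,}",
    "polyGA": r"(?:GA){2,}",
    "polyGS": r"(?:GS){2,}",
    "polyGV": r"(?:GV){2,}",
    "GGX": r"GG[AYLQS]",  # X = A, Y, L, Q, or S
}


def motif_residues(protein):
    """Count residues covered by non-overlapping matches of each motif.

    Each motif is scanned independently, so a residue may be counted
    under more than one motif.
    """
    covered = {}
    for name, pattern in MOTIF_PATTERNS.items():
        covered[name] = sum(len(m.group(0)) for m in re.finditer(pattern, protein))
    return covered


# Hypothetical toy sequence, not taken from any of the reported cDNAs.
toy_repeat = "AAAAAAGAGAGAGGYGVGVGSGS"
counts = motif_residues(toy_repeat)
for name, n in counts.items():
    print(f"{name}: {n} residues ({100 * n / len(toy_repeat):.1f}% of sequence)")
```

Reporting residues covered rather than match counts makes the result comparable to statements about what fraction of a repetitive region a motif occupies.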

Silk proteins are characterized by a high content of glycine and alanine due to the presence of highly repetitive amino acid motifs: polyA, polyGA, GPGXX, and GGX.7,9 It has been shown that polyA and polyGA sequences appear in β-sheet regions of major and minor ampullate silks, which are presumably responsible for the high tensile strength of the spider silk.14,15 The GPGXX repeat regions are thought to build a β-spiral conformation that may contribute to the elasticity of the fibers.8,16 The various combinations of these amino acid motifs provide the spider silks with special mechanical properties, which are observed in different types of silks with distinct functions.

The glycine-alanine prevalence in spider silk proteins is preserved in the two non-orb weavers, K. hibernalis and A. aperta. Amino acid compositions of the major ampullate silk gland extracts in both spider species follow the same trend as shown in orb weavers, with glycine being the most abundant amino acid, followed by alanine, and then serine (Table 1).

The consensus repetitive amino acid sequences derived from the four identified silk cDNAs are listed in Figure 1. Comparison of individual repeats within each sequence shows some variation: the repeats differ by the insertion, deletion, or substitution of very few amino acids. Although the identified silk protein sequences from non-orb weavers retain the important amino acid motifs (GA, polyA, and GGX) that compose orb weaver fibroins, these motifs are distributed sparsely. In orb weavers such as Argiope, it has been shown that the content of the above motifs can be as high as 70% in terms of polyA and polyGA and over 20% when GGX is taken into account (unpublished data). The amino acid sequences of the repetitive regions from non-orb weavers are not as conserved as those from the orb weavers, whether they are compared within the same gene family or among different species. For example, the predicted amino acid sequence of K. hibernalis cDNA1 has a high percentage of glycine and serine, whereas the sequence derived from cDNA2 is dominated by alternating polyGA and polyA, which is clearly different from that of cDNA1. The amino acid sequence translated from K. hibernalis cDNA3 is composed of polyGA and (GV)n motifs for the majority of the repetitive region; however, (GV)n is rarely seen in other silk sequences. The repeat unit for A. aperta is essentially a complex mixture of GS and GA motifs. There are a few GX motifs scattered in the sequence, where X represents L, T, V, and Y.

Of the four partial cDNAs reported in this work, only A. aperta’s has the C-terminal encoding sequence following the ensemble repetitive region. This region is distinct from the bulk of the sequence because it is nonrepetitive and is composed of amino acids that do not conform to the typical amino acid composition of the spider silk proteins. When the C-terminal sequence of A. aperta is aligned with that of MaSp1 from Nephila clavipes, one of the most studied orb weavers, we find 44% identity between the two sequences (Figure 2). Because the C-terminal regions are much more conserved between orb weavers and non-orb weavers than the repetitive regions, we propose that the C-terminus plays an important role that has been preserved among different spider species through evolution, although its function has not yet been determined.

Figure 2. Spider silk protein carboxyl-terminal amino acid alignment. A.a Major, A. aperta major ampullate spidroin; N.c MaSp1, N. clavipes major ampullate spidroin.
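A percent identity figure such as the one reported for the alignment in Figure 2 depends on how gapped columns are counted. The sketch below is one common convention, assumed here for illustration and not necessarily the one used by the alignment software in this work: identical residues divided by the number of columns in which neither sequence has a gap. The function name and the toy aligned strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: gapped columns are excluded from the denominator. Other
    conventions (alignment length, shorter sequence length) give other values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy alignment, not the sequences shown in Figure 2.
toy_a = "GSA-LVSS"
toy_b = "GSAALISS"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.1f}% identity")
```

Stating the denominator alongside the percentage avoids ambiguity when identity values from different studies are compared.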


Table 2. Codon Usage (%) of the Three Most Abundant Amino Acids in A. aperta and K. hibernalis cDNAs

         Gly             Ala             Ser
         GGT     GGA     GCT     GCA     TCA     TCT     AGT
A.a      41.91   46.15   37.44   47.18   11.90   37.30   29.39
K.h 1    44.79   31.55   47.91   37.67   26.14   50.00    5.68
K.h 2    42.49   34.20   16.21   81.08   31.75   53.97    3.17
K.h 3    52.94   35.29   75.00    2.50    0.00   40.00   40.00
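Every codon listed in Table 2 ends in A or T, so the A/T wobble-base percentage for an amino acid in a given cDNA is the sum of its listed codon frequencies. The following Python sketch reproduces that arithmetic for glycine and for alanine in K. hibernalis cDNA2; the values are copied from Table 2, and the dictionary layout and function name are illustrative assumptions.

```python
# Codon usage (%) copied from Table 2; keys follow the table's row labels.
CODON_USAGE = {
    "A.a": {
        "Gly": {"GGT": 41.91, "GGA": 46.15},
        "Ala": {"GCT": 37.44, "GCA": 47.18},
        "Ser": {"TCA": 11.90, "TCT": 37.30, "AGT": 29.39},
    },
    "K.h 1": {
        "Gly": {"GGT": 44.79, "GGA": 31.55},
        "Ala": {"GCT": 47.91, "GCA": 37.67},
        "Ser": {"TCA": 26.14, "TCT": 50.00, "AGT": 5.68},
    },
    "K.h 2": {
        "Gly": {"GGT": 42.49, "GGA": 34.20},
        "Ala": {"GCT": 16.21, "GCA": 81.08},
        "Ser": {"TCA": 31.75, "TCT": 53.97, "AGT": 3.17},
    },
    "K.h 3": {
        "Gly": {"GGT": 52.94, "GGA": 35.29},
        "Ala": {"GCT": 75.00, "GCA": 2.50},
        "Ser": {"TCA": 0.00, "TCT": 40.00, "AGT": 40.00},
    },
}


def at_wobble_percent(cdna, amino_acid):
    """Sum the usage of codons whose third (wobble) base is A or T."""
    codons = CODON_USAGE[cdna][amino_acid]
    return sum(pct for codon, pct in codons.items() if codon[-1] in "AT")


gly = {cdna: at_wobble_percent(cdna, "Gly") for cdna in CODON_USAGE}
for cdna, pct in gly.items():
    print(f"{cdna}: Gly A/T wobble = {pct:.1f}%")
print(f"K.h 2: Ala A/T wobble = {at_wobble_percent('K.h 2', 'Ala'):.1f}%")
```

The remainder up to 100% in each case is the share of codons ending in G or C, which Table 2 does not list individually.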

Besides the overall similarity in the sequence patterns, the identified cDNA sequences share another noteworthy feature with the previously published silk sequences: codon usage bias. In most cases, the codons used in spider silk genes carry A or T in the wobble position. Codon usage frequencies of the three most abundant amino acids, glycine, alanine, and serine, for the four identified cDNAs are listed in Table 2. The preference for A and T as the third nucleotide in glycine (GGN) ranges from 76.3% to 88.2% in the four cDNAs, and it is as high as 97% for alanine (GCN) in K. hibernalis cDNA2. For serine, which is encoded by six codons, the percentage of A and T as the wobble base is up to 81.8% within the four cDNAs. It has been suggested that this codon usage bias can be explained by considering the secondary structure of the silk mRNA.9,17,18 Since the silk sequences are highly GC-rich, it would be much easier to form a hairpin loop secondary structure between nearby amino acid coding regions if the wobble base were G or C. The selection of A or T in the third base of a codon may be necessary to minimize the formation of secondary structure that would affect the efficiencies of DNA transcription and RNA translation.

Since the discovery of the first spider silk gene,18 the information relating to spider silk sequences, structures, and mechanical properties has greatly increased. However, there is still much to be learned in both orb-weaving and non-orb-weaving spiders to correlate the evolutionary diversification of silk protein sequences with changes in the function of silk fibers.

Acknowledgment. We thank the U.S. ARO (DAAD1902-1-023) and the AFOSR (F49620-03-1-0341) for support of this research. We thank all of the reviewers for improving the manuscript.

References and Notes

(1) Gosline, J. M.; DeMont, M. E.; Denny, M. W. Endeavour 1986, 10, 37-43.
(2) Tirrell, D. A. Science 1996, 271, 37-40.
(3) Stauffer, S. L.; Coguill, S. L.; Lewis, R. V. J. Arachnol. 1994, 22, 5-11.
(4) Guerette, P. A.; Ginzinger, D. G.; Weber, B. H.; Gosline, J. M. Science 1996, 272, 112-5.
(5) Altman, G. H.; Diaz, F.; Jakuba, C.; Calabro, T.; Horan, R. L.; Chen, J.; Lu, H.; Richmond, J.; Kaplan, D. J. Biomaterials 2003, 24, 401-16.
(6) Colgin, M. J.; Lewis, R. V. Protein Sci. 1998, 7 (3), 667-72.
(7) Hayashi, C. Y.; Lewis, R. V. J. Mol. Biol. 1998, 275, 733-84.
(8) Hayashi, C. Y.; Shipley, N. H.; Lewis, R. V. Int. J. Biol. Macromol. 1999, 24 (2-3), 271-5.
(9) Lewis, R. V. Acc. Chem. Res. 1992, 25, 392-8.
(10) Casem, M. L.; Turner, D.; Houchin, K. Biol. Macromol. 1999, 24, 103-8.
(11) Gatesy, J.; Hayashi, C. Y.; Motriuk, D.; Woods, J.; Lewis, R. V. Science 2001, 291, 2603-5.
(12) Glatz, L. Z. Morphol. Tiere. 1972, 72, 1-25.
(13) Hajer, J. Acta Entomol. Bohemoslov. 1989, 8, 401-13.
(14) Parkhe, A.; Seeley, S.; Gardener, K.; Thompson, L.; Lewis, R. V. J. Mol. Recogn. 1997, 10, 1-6.
(15) Simmons, A.; Ray, E.; Jelinski, L. Macromolecules 1994, 27, 5235-5237.
(16) Lombardi, S. J.; Kaplan, D. L. J. Arachnol. 1990, 18, 297-306.
(17) Hinman, M. B.; Lewis, R. V. J. Biol. Chem. 1992, 267 (27), 19320-4.
(18) Xu, M.; Lewis, R. V. Proc. Natl. Acad. Sci. U.S.A. 1990, 87 (18), 7120-4.
