Biomacromolecules 2005, 6, 2563-2569


Structural Study of Irregular Amino Acid Sequences in the Heavy Chain of Bombyx mori Silk Fibroin

Sung-Won Ha,† Hanna S. Gracz,‡ Alan E. Tonelli,† and Samuel M. Hudson*,†

Fiber and Polymer Science Program, College of Textiles, and Department of Molecular and Structural Biochemistry, North Carolina State University, Raleigh, North Carolina 27695-8301

Received April 27, 2005; Revised Manuscript Received May 26, 2005

Recently, genetic studies have revealed the entire amino acid sequence of Bombyx mori silk fibroin. X-ray diffraction studies have shown that the β-sheet crystalline structure (silk II) of fibroin is composed of the repeating hexapeptide sequence GAGAGS. However, the heavy chain of B. mori silk fibroin also contains 11 irregular sequences of about 31 amino acid residues each (irregular GT∼GT sequences), whose structure and role have remained unknown. The most frequently occurring of these irregular sequences was synthesized, and its 3-D solution structure was studied by high-resolution 2-D NMR techniques. The 3-D structure determined for this peptide is a loop (a distorted Ω shape), which implies that this sequence changes the direction of the preceding backbone by 180°, i.e., reverses it. This may facilitate β-sheet formation between the crystal-forming building blocks, the GAGAGS/GY∼GY sequences, in the fibroin heavy chain.

Introduction

X-ray diffraction studies1,2 have shown that the antiparallel β-sheet crystalline silk II structure is composed mainly of the hexapeptide sequence GAGAGS. Besides this repeating sequence, the fibroin heavy chain also contains several types of irregular amino acid sequences that cannot be accommodated in silk II crystals. The regularity of amino acid sequences in proteins arises from the conservation of the corresponding genes, and sequences are conserved because they have specific functional or structural roles. Silk fibroin is not a functional protein like an enzyme but a structural material, like the keratin in hair or nails. Therefore, the repetitive and conserved amino acid sequences in the Bombyx mori silk fibroin heavy chain should have important structural roles.
Conformational transitions in model silk peptides have been described.3 Fibroin A (GAGAGY) was synthesized and shown to undergo a silk I to silk II transition upon the addition of methanol. The peptide AHGGYSGY has been suggested to act as a pH-sensitive conformational switch for B. mori fibroin.4 Asakura et al.5 studied the structure and role of a small irregular unit, GAAS, in the fibroin heavy chain using solid-state 13C CP/MAS NMR spectroscopy and WAXS, and reported that this sequence may be one of the factors that induce the structural transition from silk I to silk II during silk fiber formation. The irregular but conserved GT∼GT sequences in the fibroin heavy chain may also play a major role in the overall conformation adopted by silk fibroin molecules. For this reason, we attempt here to determine the solution conformation of one of the conserved GT∼GT irregular sequences in the heavy chain of silk fibroin by means of 2-D NMR and molecular modeling techniques. It is not yet known whether this solution structure matches that of the solid protein, but the solution phase is of most interest to us because it is the precursor to the silk fiber.

* To whom correspondence may be addressed. Present address: Faculty, Fiber and Polymer Science Program, North Carolina State University, Raleigh, NC 27695-8301. E-mail: [email protected].
† Fiber and Polymer Science Program, College of Textiles.
‡ Department of Molecular and Structural Biochemistry.

Experimental Section

Structural Study of an Irregular Amino Acid Sequence of B. mori Silk Fibroin Heavy Chain. The entire amino acid sequence of the B. mori silk fibroin heavy chain was obtained from GenBank.6 The amino acid sequences in the heavy chain were categorized into several groups according to their regularity and amino acid content.

Peptide Synthesis. One of the chemically irregular but evolutionarily conserved GT∼GT sequences [31 residues; GTGSSGFGPYVANGGYSGYEYAWSSESDFGT, from the N (amino)-terminus to the C (carboxyl)-terminus] was synthesized on a peptide synthesizer using Fmoc chemistry (9-fluorenylmethoxycarbonyl, a temporary base-labile α-amino protecting group) and purified by high-performance liquid chromatography (HPLC) in the Department of Microbiology & Immunology at the University of North Carolina, Chapel Hill. This sequence was chosen because it is the most frequent of the GT∼GT irregular sequences (six of the 11 in each fibroin heavy chain). The N-terminus and C-terminus were blocked by acetylation and by incorporation of an amide group, respectively. However, incorporating these protecting groups at the chain ends severely decreased the solubility of the synthesized peptide. As a result,

10.1021/bm050294m CCC: $30.25 © 2005 American Chemical Society Published on Web 07/12/2005


Biomacromolecules, Vol. 6, No. 5, 2005

Ha et al.

Figure 1. Rearrangement of the amino acid sequence of the B. mori silk fibroin heavy chain. (The sequence starts at the N-terminus and ends at the C-terminus. Blue, GAGAGS; green, GY∼GY; bold red, GT∼GT; shaded, GAAS; gray, completely nonrepeating sequences at each chain end; bold black, Cys.)

over 90% of the synthesized peptide was intractable and could not be dissolved, so less than 100 mg of the peptide could be used. Because of this limited amount, natural-abundance 1H and 13C high-resolution two-dimensional (2-D) NMR techniques, rather than X-ray crystallography of the solid peptide, were chosen to investigate the three-dimensional (3-D) structure of the irregular GT∼GT sequence in solution.

Two-Dimensional NMR Spectroscopy. The synthesized GT∼GT silk peptide was dissolved in 100% deuterated dimethyl sulfoxide (DMSO-d6). DMSO-d6 was chosen as the solvent instead of D2O to reduce exchange of the labile amide protons, which are important for correct 1H chemical shift assignment. The sample concentration was 3.2% w/v (about 16 mM). NMR data were obtained on a 500-MHz Bruker DRX NMR spectrometer, and all spectra were acquired at 298 K. A combination of 2-D homonuclear (1H-1H) NMR experiments, namely correlated spectroscopy (COSY), total correlation spectroscopy (TOCSY), nuclear Overhauser effect spectroscopy (NOESY), and rotating-frame Overhauser enhancement spectroscopy (ROESY), and 2-D heteronuclear (1H-13C) correlated NMR experiments, namely heteronuclear multiple quantum correlation (HMQC) and heteronuclear multiple bond correlation (HMBC), was applied to study the 3-D solution structure of the short silk peptide. COSY and TOCSY were used to obtain precise 1H chemical shift assignments, and HMBC and HMQC provided supplementary data to confirm these assignments. NOESY and ROESY experiments were used to investigate 1H-1H through-space interactions.

Three-Dimensional Molecular Modeling. The Crystallography & NMR System (CNS) and PyMol software packages were used to obtain the 3-D structure of the synthesized silk peptide: CNS to generate models satisfying the NMR restraints through molecular dynamics and energy minimization, and PyMol to visualize the 3-D molecular structure. An extended conformational model was generated from the amino acid sequence of the peptide in Protein Data Bank (PDB) file format using CNS. Through-space 1H-1H interactions obtained from the NOESY and ROESY experiments (a total of 103 NOE connectivities) were used as NMR restraints in the simulated annealing regularization and distance geometry calculations. Because the peptide was not enriched in 13C or 15N (both were at natural abundance), the NOESY and ROESY intensities were considered unreliable. For this reason, instead of distance bounds calibrated from the intensities, every NOE restraint was given the full range of possible distances, from 1.8 to 4.8 Å. The CNS calculations generated several possible low-energy 3-D models of the peptide as PDB files, and their structures were visualized with PyMol.
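To make the uniform-bound strategy concrete, the following is a minimal Python sketch of how NOE connectivities could be written as distance restraints in the X-PLOR/CNS `assign` style. It assumes the common `d d-minus d-plus` convention, in which the allowed range is [d - d-minus, d + d-plus]; the atom pairs in `noes` are hypothetical placeholders rather than the 103 connectivities used in this work, and the helper name `cns_assign` is illustrative.

```python
# Hypothetical NOE connectivities: (residue_i, atom_i, residue_j, atom_j)
noes = [(10, "HN", 13, "HN"), (16, "HA", 21, "HD1")]

# Uniform bounds used for every restraint, in angstroms (from the text)
LOWER, UPPER = 1.8, 4.8


def cns_assign(res_i, atom_i, res_j, atom_j, lower=LOWER, upper=UPPER):
    """Return one restraint line in X-PLOR/CNS 'assign' style.

    Assumes the 'd d-minus d-plus' convention (allowed range is
    [d - d_minus, d + d_plus]); d is set to the lower bound so that
    d_minus is zero and d_plus spans the whole interval.
    """
    d, d_minus, d_plus = lower, 0.0, upper - lower
    return (f"assign (resid {res_i} and name {atom_i}) "
            f"(resid {res_j} and name {atom_j}) "
            f"{d:.1f} {d_minus:.1f} {d_plus:.1f}")


lines = [cns_assign(*noe) for noe in noes]
for line in lines:
    print(line)
# first line: assign (resid 10 and name HN) (resid 13 and name HN) 1.8 0.0 3.0
```

Because every restraint shares the same wide interval, the restraints mainly encode which protons are close in space, not how close; intensity-calibrated bounds (for example, strong/medium/weak classes) would be the usual alternative when NOE intensities are trusted.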


There are some regularities in the amino acid sequence of the fibroin heavy chain:
(1) The crystalline GAGAGS segments are separated by 60 segments containing GY (GY∼GY sequences). There are 13 different types of GY∼GY sequences, and the five most abundant are (i) GY(GX)3GYGXGY(GX)3GY, 24 times; (ii) GAGAGY, seven times; (iii) GYGXGY(GX)3GY(GX)3GY(GX)3GY, seven times; (iv) GY(GX)3GY, six times; and (v) GYGXGY(GX)3GY, six times, where X is usually A and sometimes V or I (the positions of V and I are irregular).
(2) GY∼GY sequences are usually followed by the small GAAS segment.
(3) GAAS segments always appear at the same positions, except at the chain ends.
(4) The GAGAGS/GY∼GY crystalline building blocks are usually composed of Gly, Ala, Ser, and Tyr, which are, or may be, the main components of the β-sheet crystals of silk fibroin. Val, Thr, Ile, and Phe are minor residues that sometimes appear in GAGAGS/GY∼GY blocks.
(5) Two to eight GAGAGS/GY∼GY blocks appear between consecutive GT∼GT irregular sequences, of which there are 11.
(6) Except for the last one, GT∼GT irregular sequences are immediately followed by GAGAGS sequences.
(7) The evolutionarily conserved GT∼GT irregular sequences are usually composed of 31 residues, except for the first (32 residues) and the last (34 residues).
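The motif notation above, where (GX)3 means three consecutive GX dipeptides, maps directly onto regular expressions. The following Python sketch encodes the five most abundant GY∼GY motifs and reports which of them match a given segment in full. It assumes X is limited to A, V, or I, as stated in item (1); the names `GY_MOTIFS` and `classify` are hypothetical, and the segments in the usage lines are illustrative, not taken from the GenBank entry.

```python
import re

# X is usually A and sometimes V or I (item 1); this character class is an assumption
X = "[AVI]"

# The five most abundant GY~GY motifs, with (GX)n written as a regex repeat
GY_MOTIFS = {
    "GY(GX)3GYGXGY(GX)3GY": rf"GY(?:G{X}){{3}}GYG{X}GY(?:G{X}){{3}}GY",
    "GAGAGY": r"GAGAGY",
    "GYGXGY(GX)3GY(GX)3GY(GX)3GY":
        rf"GYG{X}GY(?:G{X}){{3}}GY(?:G{X}){{3}}GY(?:G{X}){{3}}GY",
    "GY(GX)3GY": rf"GY(?:G{X}){{3}}GY",
    "GYGXGY(GX)3GY": rf"GYG{X}GY(?:G{X}){{3}}GY",
}


def classify(segment):
    """Return the notation of every motif that matches the whole segment."""
    return [name for name, pattern in GY_MOTIFS.items()
            if re.fullmatch(pattern, segment)]


print(classify("GYGAGVGAGY"))      # matches GY(GX)3GY
print(classify("GYGAGYGAGAGAGY"))  # matches GYGXGY(GX)3GY
print(classify("GSGSGS"))          # no GY~GY motif
```

Scanning the full heavy-chain sequence with `re.finditer` would locate candidate blocks, but the shorter motifs also occur inside the longer ones (GY(GX)3GY is embedded in the first pattern), so overlapping hits would need to be resolved, for example by preferring the longest match.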

Results and Discussion

Structural Study of an Irregular Amino Acid Sequence of B. mori Silk Fibroin Heavy Chain. The amino acid sequence of the B. mori silk fibroin heavy chain is highly repetitive.7 X-ray diffraction studies1,2 have established that the hexapeptide GAGAGS is the main component of the typical β-sheet crystals of silk fibroin, which form the monoclinic unit cell of silk II (a = 9.40 Å, hydrogen-bonding direction; b = 6.97 Å, fiber axis; c = 9.20 Å, intersheet distance; β = 90°). The amino acid sequence of the heavy chain from GenBank was rearranged according to its sequence regularities (Figure 1), starting at the N-terminus and ending at the C-terminus. The crystalline GAGAGS segments are marked in blue. Besides the GAGAGS sequences, there are several types of irregular sequences. One type (GT∼GT), which appears 11 times in the heavy chain, is marked in bold red; another type, containing GY (GY∼GY), is marked in green; and the small irregular GAAS units are shaded. The amino acid residues at the N-terminus and C-terminus of the heavy chain are also irregular and are shown in gray. Cys residues at each chain end, which are capable of forming covalent disulfide bonds with other Cys residues, are marked in bold.
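As a quick worked check of the silk II cell parameters quoted above, the volume of a monoclinic cell is V = abc sin β, which reduces to abc because β = 90°. This short Python sketch evaluates it; the function name `monoclinic_cell_volume` is illustrative.

```python
import math


def monoclinic_cell_volume(a, b, c, beta_deg):
    """Volume of a monoclinic cell, V = a*b*c*sin(beta); lengths in angstroms."""
    return a * b * c * math.sin(math.radians(beta_deg))


# Silk II cell parameters quoted in the text
a, b, c, beta = 9.40, 6.97, 9.20, 90.0
volume = monoclinic_cell_volume(a, b, c, beta)
print(f"V = {volume:.1f} Å^3")  # prints V = 602.8 Å^3
```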

(8) GT∼GT irregular sequences always contain Pro residues, which can be a major factor in changing the backbone direction.
The C-terminal and N-terminal regions are composed of completely nonrepeating amino acid residues. The fibroin light chain and P25 are also composed of completely nonrepeating amino acid sequences, which suggests that these subunits are globular and may be part of the amorphous regions of the silk fiber. The completely irregular N-terminal and C-terminal regions of the fibroin heavy chain contain a few Cys residues (two in the N-terminal region and three in the C-terminal region) that do not appear in GAGAGS, GY∼GY, or GT∼GT sequences. This suggests that disulfide bond(s) between the heavy chain and the light chain could form at either


Figure 2. Proposed 2-D schematic of fibroin heavy chain and light chain. The 3-D structure is most likely lamellar.

end of the heavy chain. Inoue et al.7 showed that the heavy chain and the light chain are covalently connected by a disulfide bond between the C-terminal regions of the two subunits. Since each chain end contains several Cys residues, intramolecular disulfide bond(s) between the C-terminal and N-terminal ends, as well as intermolecular disulfide bond(s) between heavy chains, are also possible. Not much is known about the detailed structure and function of the GY∼GY irregular sequences. Zhou et al.8 reported that most of the GX dipeptide units, where X is A in 65%, S in 23%, and Y in 10% of the repeats, are present as parts of two hexapeptides, GAGAGS (432 copies) and GAGAGY (120 copies). Although no detailed crystal structure has been reported for the GAGAGY sequence, X-ray fiber patterns1,2 show evidence of crystalline structures that differ from the unit cell formed by the GAGAGS sequence, particularly in the intersheet distance (c-axis). Asakura et al.9,10 studied the structural role of Tyr residues and reported that they favor a β-sheet structure in silk II and a random coil in silk I. Their recent study11 showed that Tyr residues significantly affect the intermolecular chain arrangement, indicating long-range packing effects in the semicrystalline regions of silk II. Therefore, these aromatic-containing GY∼GY sequences may also be crystal-forming segments that cannot fit into the crystalline unit cell formed by the aliphatic GAGAGS segments. They may form β-sheet crystals whose unit cell differs from that formed by GAGAGS sequences, particularly in the intersheet distance (c-axis). The irregular but conserved GT∼GT sequences always contain a Pro residue, whose side chain forms a five-membered ring with the backbone nitrogen and strongly favors reverse turns.12 These GT∼GT sequences are particularly interesting,

because of their sequence irregularity, which implies that they are not part of the β-sheet crystals of the fibroin heavy chain. They may form folds, turns, helices, or random coils between the GAGAGS/GY∼GY crystalline building blocks of the β-sheet crystals. If they form turns, one heavy chain molecule would have 12 intramolecular antiparallel β-strands connected by 11 turns; a 2-D schematic of this arrangement is shown in Figure 2. X-ray diffraction studies of chain-folded antiparallel β-sheet assemblies indicate that the 3-D structure corresponding to Figure 2 would most likely be lamellar.13

3-D Modeling of Silk Peptide. In this study, the 3-D solution conformation of the most frequently occurring irregular GT∼GT sequence was investigated using high-resolution homonuclear (1H-1H) and heteronuclear (1H-13C) 2-D NMR experiments. The primary sources of experimental structural information were the NOESY (Figure 3) and ROESY spectra (see Supporting Information for details). Two 3-D models were generated (Figure 4): one from energy minimization and molecular dynamics simulated annealing alone, and the other additionally constrained to satisfy the conformational and distance geometry restraints derived from the NMR observations. Both models showed very similar overall 3-D structural motifs. After the final calculations, no geometric violations were found for either model, indicating that the NOE data were reliable. NMR spectroscopy and X-ray crystallography are the principal methods for determining the 3-D structures of folded biopolymers, and in many cases the two techniques yield very similar solution and solid-state structures. If this is also the case for silk fibroin, the conformation of this irregular but conserved sequence would be similar in the solid state. Because each end of the peptide is flexible and carries terminal groups that are absent from the continuous chain of the native molecule, the backbone geometry at each end may differ from that in silk fibroin.
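The statement that neither model showed violations can be made concrete with a simple check: compute each restrained interproton distance in a model and flag any that falls outside its bounds. The Python sketch below does this for plain Cartesian coordinates. The coordinates, atom pairs, and the function name `violations` are hypothetical placeholders, and the check uses single-atom distances only; it is not CNS's own violation analysis, which can, for example, average over equivalent protons.

```python
import math

# Hypothetical model coordinates (angstroms), keyed by (residue, atom name)
coords = {
    (10, "HN"): (0.0, 0.0, 0.0),
    (13, "HN"): (3.1, 0.5, 1.2),
    (16, "HA"): (6.0, 2.0, 0.0),
    (21, "HD1"): (6.5, 4.5, 2.0),
}

# (atom_i, atom_j, lower, upper) with the uniform 1.8-4.8 A bounds from the text
restraints = [
    ((10, "HN"), (13, "HN"), 1.8, 4.8),
    ((16, "HA"), (21, "HD1"), 1.8, 4.8),
]


def violations(coords, restraints, tolerance=0.0):
    """List restraints whose model distance lies outside [lower, upper]."""
    found = []
    for atom_i, atom_j, lower, upper in restraints:
        d = math.dist(coords[atom_i], coords[atom_j])  # Euclidean distance
        if d < lower - tolerance or d > upper + tolerance:
            found.append((atom_i, atom_j, round(d, 2)))
    return found


print(violations(coords, restraints))  # prints [] (both distances are within bounds)
```

The `tolerance` parameter mirrors the common practice of reporting only violations larger than a small threshold (for example, 0.5 Å); its value here is an assumption.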


Figure 3. 1H-1H long-range through-space interactions observed in NOESY. (a) NH:NH, NH:aromatic-H, and aromatic-H:aromatic-H interactions; (b) NH:OH, CαH:aromatic-H, and CαH:CβH interactions.

However, a loop structure for the residues internal to the peptide sequence is clearly shown, which suggests that the general role of these irregular but conserved GT∼GT sequences may be to form turn structures (distorted Ω shape). These irregular GT∼GT sequences may facilitate β-sheet formation by the GAGAGS/GY∼GY crystalline building blocks of the fibroin heavy chain by imparting loop or turn structures, as shown in Figure 2. It is unlikely that these irregular GT∼GT sequences are included in the β-sheet crystals of silk fibroin, because of their sequence irregularity (especially the Pro residues) and their long side chains, which do not fit in the silk II crystalline unit cell formed by the GAGAGS/GY∼GY building blocks.
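The composition argument can be checked directly on the 31-residue peptide studied here. This Python sketch counts residues, locates the Pro, and lists the residues other than Gly, Ala, Ser, and Tyr, the four residues that item (4) of the sequence regularities identifies as the usual components of the GAGAGS/GY∼GY building blocks. Treating exactly those four residues as crystal-forming is a simplifying assumption, and the variable names are illustrative.

```python
from collections import Counter

# The 31-residue GT~GT peptide studied here (one-letter codes, N- to C-terminus)
PEPTIDE = "GTGSSGFGPYVANGGYSGYEYAWSSESDFGT"

# Assumed set of residues typical of the GAGAGS/GY~GY crystalline blocks
CRYSTAL_RESIDUES = set("GASY")

composition = Counter(PEPTIDE)
pro_positions = [i + 1 for i, aa in enumerate(PEPTIDE) if aa == "P"]  # 1-based
other = {aa: n for aa, n in composition.items() if aa not in CRYSTAL_RESIDUES}

print(len(PEPTIDE))           # 31
print(pro_positions)          # [9]
print(sorted(other.items()))  # [('D', 1), ('E', 2), ('F', 2), ('N', 1), ('P', 1), ('T', 2), ('V', 1), ('W', 1)]
```

In this peptide, 11 of the 31 residues fall outside the assumed crystal-forming set, including the Pro at position 9 and the Asn at position 13 that is replaced by His in some GT∼GT copies.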

In three of the 11 irregular GT∼GT sequences, the Asn residue at position 13 (N13) is replaced by His (H13), and the first and last irregular GT∼GT sequences contain one and three additional residues with functional side chains, respectively. This may imply that the irregular GT∼GT sequences interact with one another so that a single fibroin molecule forms a 3-D stacked-sheet structure rather than a platelike single sheet. Clearly, further study is needed to delineate the relationships and interactions among these irregular yet conserved sequences.

Conclusions

For synthetic polymers with regular repeating units, it is uncertain which specific repeating units will make


For example, Jin and Kaplan14 recently proposed a mechanism of silk processing in insects and spiders. They calculated the hydrophobicity of each amino acid residue and identified hydrophilic and hydrophobic blocks. The two largest hydrophilic blocks are the completely irregular amino acid sequences at each chain end. In the silk gland, fibroin molecules are mixed with sericin, a hydrophilic protein, and in this environment the fibroin molecules assemble into micellar structures. During silk spinning, as the water concentration decreases, the globular micellar structures collapse and the fibroin molecules crystallize into β-sheets. In addition to the two large hydrophilic blocks at the chain ends, they recognized six small hydrophilic blocks within the hydrophobic region. These appear to correspond to the 11 GT∼GT irregular sequences discussed in our study; the discrepancy between the number of hydrophilic blocks and the number of GT∼GT sequences seems to arise from the method they used to calculate the hydrophobicity of the amino acid residues. In any event, a macroscopic understanding of the mechanism of silk processing in vivo, a microscopic understanding of the silk fibroin molecule (the structural transition from silk I to silk II and the role of each segment), and improved material processing techniques should expand the use of this excellent structural material. Furthermore, the knowledge accumulated from studies of B. mori silk fibroin can be applied to other silks that are now becoming available through improved biotechnologies.15-20
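Because the number of blocks found depends on how hydrophobicity is calculated, a sliding-window hydropathy profile illustrates the idea. The Python sketch below uses the Kyte-Doolittle scale as a stand-in: the scale and window used by Jin and Kaplan are not given here, so both the scale and the default window of 9 residues are assumptions, and the sketch does not reproduce their analysis. The function name `hydropathy_profile` is illustrative.

```python
# Kyte-Doolittle hydropathy scale (a stand-in; not necessarily the scale Jin and Kaplan used)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# The 31-residue GT~GT peptide studied here
PEPTIDE = "GTGSSGFGPYVANGGYSGYEYAWSSESDFGT"


def hydropathy_profile(seq, window=9):
    """Mean Kyte-Doolittle score of every full window along seq."""
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


# Whole-sequence averages: the GT~GT peptide versus one GAGAGS repeat
print(round(hydropathy_profile(PEPTIDE, window=len(PEPTIDE))[0], 3))
print(round(hydropathy_profile("GAGAGS", window=6)[0], 3))
```

On this scale the GT∼GT peptide averages below zero while a GAGAGS repeat averages above it. Changing `window` or the scale shifts where a profile along the full heavy chain crosses zero, which is one way two analyses of the same chain can report different numbers of hydrophilic blocks.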

Figure 4. 3-D models of the synthesized silk peptide. (a) 3-D model of silk peptide calculated by simulated annealing only (backbone geometry is shown in thicker sticks). (b) 3-D model of silk peptide calculated using NMR distance geometry restraints with simulated annealing (backbone geometry is shown in thicker sticks).

folds to form lamellar crystals. Silk fibroin, however, has irregular yet conserved GT∼GT sequences, which may have been designed by nature to form turn or loop structures that facilitate crystal formation in silk fibroin molecules. High-resolution 2-D NMR observations in solution and 3-D modeling of an irregular but conserved GT∼GT sequence in the fibroin heavy chain have given us a better understanding of this protein molecule. Some caution is warranted regarding these results, since they were obtained in DMSO, and it has not been proven that DMSO is an appropriate solvent to represent the native state. At present, it is very difficult to study the conformation and morphology of the entire silk fibroin molecule because of its high molecular weight. However, it can be studied segment by segment, progressively expanding the segment size from 30 to 50 residues, from 50 to 100 residues, and so on. This will lead to an understanding of the roles and functions of each regular (crystal-forming) and irregular (amorphous) segment. In the end, although possibly incomplete, this approach will certainly give us an improved understanding of the entire fibroin molecule.

Acknowledgment. We are grateful to Mr. David Klapper of the Department of Microbiology & Immunology at the University of North Carolina, Chapel Hill, for synthesizing and providing the silk peptide, and to Mr. Douglas J. Kojetin of the Department of Molecular & Structural Biochemistry at North Carolina State University for assistance with the CNS and PyMol molecular modeling work.

Note Added after ASAP Publication. At initial Web posting on July 12, 2005, the descriptions for parts A and B of Figure 4 were inadvertently placed in the Figure 3 caption, and the descriptions for parts A and B of Figure 3 were shifted to the Figure 2 caption. This was corrected and the article reposted on July 26, 2005.

Supporting Information Available. Details about the 2-D NMR experiments, 1H chemical shift assignments, and 1H-1H through-space interactions found in NOESY and ROESY are presented. This material is available free of charge via the Internet at http://pubs.acs.org.

References and Notes
(1) Marsh, R. E.; Corey, R. B.; Pauling, L. Biochim. Biophys. Acta 1955, 16, 1-34.
(2) Takahashi, Y.; Gehoh, M.; Yuzuriha, K. Int. J. Biol. Macromol. 1999, 24, 127-138.
(3) Wilson, D.; Valluzzi, R.; Kaplan, D. Biophys. J. 2000, 78, 2690-2701.
(4) Zong, X.; Zhou, P.; Shao, Z.; Chen, S.; Chen, X.; Hu, B.; Deng, F.; Yao, W. Biochemistry 2004, 43 (38), 11932-11941.
(5) Asakura, T.; Sugino, R.; Okumura, T.; Nakazawa, Y. Protein Sci. 2002, 11, 1873-1877.
(6) Fibroin heavy chain precursor (Fib-H) (H-fibroin), gi|9087216|sp|P05790|FBOH_BOMMO[9087216], http://www.gl.iit.edu/frame/genbank.htm,

(7) Inoue, S.; Tanaka, K.; Arisaka, F.; Kimura, S.; Ohtomo, K.; Mizuno, S. J. Biol. Chem. 2000, 275 (51), 40517-40528.
(8) Zhou, C.; Confalonieri, F.; Medina, N.; Zivanovic, T.; Esnault, C.; Yang, T.; Jacquet, M.; Janin, J.; Duguet, M.; Perasso, R.; Li, Z. Nucleic Acids Res. 2000, 28, 2413-2419.
(9) Asakura, T.; Yamane, T.; Nakazawa, Y.; Kameda, T.; Ando, K. Biopolymers 2001, 58, 521-525.
(10) Asakura, T.; Sugino, R.; Yao, J.; Takashima, H.; Kishore, R. Biochemistry 2002, 41, 4415-4424.
(11) Asakura, T.; Suita, K.; Kameda, T.; Afonin, S.; Ulrich, A. S. Magn. Reson. Chem. 2004, 42, 258-266.
(12) Creighton, T. E. Proteins: Structures and Molecular Properties, 2nd ed.; W. H. Freeman and Company: New York, 1997.
(13) Krejchi, M.; Cooper, S.; Deguchi, Y.; Atkins, E.; Fournier, M.; Mason, T.; Tirrell, D. Macromolecules 1997, 30 (17), 5012-5024.
(14) Jin, H. J.; Kaplan, D. Nature 2003, 424, 1057-1061.

(15) Fahnestock, S. R.; Irwin, S. I. Appl. Microbiol. Biotechnol. 1997, 47, 23-32.
(16) Fahnestock, S. R.; Bedzyk, L. A. Appl. Microbiol. Biotechnol. 1997, 47, 33-39.
(17) O’Brien, J. P.; Fahnestock, S. R.; Termonia, Y.; Gardner, K. H. Adv. Mater. 1998, 10 (15), 1185-1195.
(18) Scheller, J.; Gührs, K. H.; Grosse, F.; Conrad, U. Nat. Biotechnol. 2001, 19, 573-577.
(19) Lazaris, A.; Arcidiacono, S.; Huang, Y.; Zhou, J. F.; Duguay, F.; Chretien, N.; Welsh, E. A.; Soares, J. W.; Karatzas, C. N. Science 2002, 295, 472-476.
(20) Kaplan, D.; Fossey, S.; Mello, C. M.; Arcidiacono, S.; Senecal, K.; Muller, W.; Stockwell, S.; Viney, C.; Kerkam, K. MRS Bull. 1992, 41-47.
