Chapter 10
Three-Dimensional Energy-Minimized Model of Human Type II "Smith" Collagen Microfibril
James M. Chen (1,3) and Adrian Sheldon (2)
Downloaded by SUFFOLK UNIV on January 19, 2018 | http://pubs.acs.org Publication Date: December 14, 1994 | doi: 10.1021/bk-1994-0576.ch010
Departments of Chemistry and Enzyme Biochemistry, OsteoArthritis Sciences, Inc., 1 Kendall Square, Building 200, Cambridge, MA 02139

The development of a molecular model of a type II collagen "Smith" microfibril is described. The model is a complex of five collagen triple-helical molecules and is based on structural parameters known for collagen. An advantage of such three-dimensional models is that the stereochemistry of all sidechain groups is accounted for, so that specific atomic contacts and interactions can be studied. The model is useful for the development of therapeutics for collagen-related diseases; the development of synthetic collagen tissues; the design of chemical reagents (e.g., tanning agents) to treat collagen-related products; and understanding the structure-function aspects of collagen folding, stability and interaction.
(3) Current address: DuPont Agricultural Products, Stine-Haskell Research Center, P.O. Box 30, Building 300, Newark, DE 19714
0097-6156/94/0576-0139$10.16/0 © 1994 American Chemical Society
Kumosinski and Liebman; Molecular Modeling ACS Symposium Series; American Chemical Society: Washington, DC, 1994.

1. Background
Collagens, whose main function is to provide an extracellular scaffold, are the most abundant mammalian proteins. These macromolecules are the major structural components of skin, cartilage, tendons and ligaments, blood vessels, cornea and bone. Nineteen types of collagen molecules with varying amino acid sequences have been described. Types I, II and III are known as fibril-forming collagens; type I is found primarily in skin, type II in cartilage, and type III in blood vessels (1-4). These three collagen types form triple-helical molecules, unlike the other types, which also contain non-triple-helical regions. Each fibril-forming collagen polypeptide chain is about 1050 amino acid residues in length, including telopeptides, and assumes a left-handed helical secondary conformation with 3.3 residues per turn. Type I collagen triple helices contain two α1(I) chains and one homologous but distinct α2(I) chain, described as [α1(I)]2α2(I), wound into a right-handed triple helix, whereas the type II collagen triple helix is composed of three α1(II) polypeptides. This triple helix is a semi-flexible rod approximately 300 nm long and 1.3 nm in diameter. The triple helices pack together to form microfibrils (approximately 3 nm in diameter), which in turn associate to form collagen fibers (20-200 nm in diameter). The microfibril unit exists in vitro, and
is proposed to be an intermediate during the formation of fibers (5-7). The variability in fiber size is thought to result mainly from tissue-specific differences in intermolecular crosslinking and chemical structure. Although the amino acid sequences of collagen polypeptides are complex, a repeated Gly-X-Y tripeptide motif is apparent (for example, in types I, II and III); approximately 33% of the residues are glycine and 25% are proline or hydroxyproline. Glycine is important to the structure because it is compact enough to pack inside the central portion of each triple helix. Hydroxyproline, formed by post-translational modification, is also important in stabilizing both the triple helix and the fiber via intrachain, interchain, or water-bridged hydrogen bonds (8). These interactions help make the collagen molecule stable at mammalian body temperature.

Collagens exhibit a high degree of polymorphism (1, 9, 10). Because they are ubiquitous, multipurpose structural proteins, they can form a diverse range of fibrillar structures in vivo. In vitro, collagen has been observed to form various structures such as segment long spacing crystals, fibrillar long spacing aggregates, and obliquely banded and nonbanded fibrils. Many of these forms have been analyzed by x-ray diffraction, freeze fracture and electron microscopy (9, 11-13). These and other studies have provided much information on the three-dimensional packing of collagen. The longitudinal packing arrangement has been characterized, but the lateral packing of the triple helices within the microfibrils is not yet well defined. Although x-ray diffraction and electron microscopic analyses indicate that the order of packing may depend on the type and function of the tissue (10, 12) and on sample preparation (14), previous studies have shown that the lateral packing has crystalline properties (15-18).
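The 33% glycine figure follows directly from the Gly-X-Y repeat: if every third residue is glycine, glycine makes up one third of the chain. The short Python sketch below is a hypothetical helper (not part of SYBYL or the authors' workflow) that summarizes how well a sequence, given as a list of three-letter codes read in register from its first residue, follows this pattern.

```python
def gly_x_y_stats(seq):
    """Summarize the Gly-X-Y character of a list of three-letter residue codes.

    Returns a tuple:
      (fraction of complete triplets whose first residue is Gly,
       fraction of all residues that are Gly,
       fraction of all residues that are Pro or Hyp).
    Assumes the repeat is in register starting at seq[0].
    """
    n_triplets = len(seq) // 3
    if n_triplets == 0:
        raise ValueError("sequence must contain at least one complete triplet")
    triplets = [seq[3 * i:3 * i + 3] for i in range(n_triplets)]
    in_register = sum(1 for t in triplets if t[0] == "Gly") / n_triplets
    gly = seq.count("Gly") / len(seq)
    imino = sum(1 for r in seq if r in ("Pro", "Hyp")) / len(seq)
    return in_register, gly, imino


# An idealized (Gly-Pro-Hyp)12 chain and a short hypothetical Gly-X-Y fragment.
ideal = gly_x_y_stats(["Gly", "Pro", "Hyp"] * 12)
fragment = gly_x_y_stats(["Gly", "Pro", "Hyp", "Gly", "Ala", "Arg", "Gly", "Glu", "Hyp"])
```

For the idealized chain every triplet starts with Gly and the imino-acid fraction is two thirds; native sequences, with other residues in the X and Y positions, show the lower proline/hydroxyproline content cited above.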
In light of this, models of the microfibril have been proposed, such as the five-stranded helical microfibril of Smith (19) and the four-stranded model of Veis and Yuan (20). In these models, adjacent collagen molecules are staggered by one D-period. A D-period is approximately 67 nm long, corresponding to about 234 amino acid residues (21, 22). Within the microfibril, each triple helix is staggered by slightly less than one-quarter of the molecular length relative to its neighbors; that is, each triple helix is displaced by 0, 1, 2, 3 or 4 D-periods relative to laterally packed adjacent molecules, where D = 1/4.4 of the molecular length (see 23, 24 for review and diagrams). The Smith model can explain the negative-staining banding patterns of transverse collagen fibril sections. The light bands correspond to regions of denser lateral packing, where adjacent collagen molecules overlap laterally; the length of this "overlap" is about 0.4D (19, 25). The dark bands correspond to "gap" regions, domains of low-density molecular packing noted by Hodge and Petruska (25), where a separation exists between consecutive collagen molecules along the same longitudinal axis; this end-to-end separation is approximately 0.6D (19, 25). Thus, no end-to-end interactions occur between collagen molecules that follow one another along the longitudinal axis. The model describes the microfibril as a rope-like structure with an overall left-handed supercoil of pitch 20D/11 (i.e., between 115 and 200 nm in length; 26-28). Other microscopy studies imply, however, that the lateral packing of collagen has properties that are less crystalline and more liquid-like (21, 29). The octafibril model is one such fluid model (27). This model proposes that there is no intermediate substructure, a view supported by NMR data (including 13C NMR) showing significant mobility in the intermolecular collagen interactions (30-32).
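The dimensions quoted for the Smith model are linked by simple arithmetic. The Python sketch below recomputes them from the three values given in the text (D of about 67 nm, about 234 residues per D, molecular length 4.4D); the dictionary keys are illustrative names only. It also makes one reasoning step explicit: one molecule (4.4D) plus one gap (0.6D) spans exactly 5D, which is why five staggered molecules (offsets of 0 to 4 D) fill one axial repeat.

```python
D_NM = 67.0           # D-period length in nm (approximate value from the text)
D_RESIDUES = 234      # residues per D-period (approximate value from the text)
MOLECULE_IN_D = 4.4   # molecular length in D units (D = 1/4.4 of the molecular length)
OVERLAP_IN_D = 0.4    # overlap zone length in D units
GAP_IN_D = 0.6        # gap zone length in D units


def smith_geometry():
    """Derived Smith-model dimensions; all lengths in nm unless the key says residues."""
    return {
        "rise_nm_per_residue": D_NM / D_RESIDUES,
        "molecule_length_nm": MOLECULE_IN_D * D_NM,
        "molecule_length_residues": MOLECULE_IN_D * D_RESIDUES,
        "overlap_nm": OVERLAP_IN_D * D_NM,
        "gap_nm": GAP_IN_D * D_NM,
        # molecule + gap = 4.4D + 0.6D = 5D, the axial repeat for five strands
        "axial_repeat_in_D": MOLECULE_IN_D + GAP_IN_D,
        "supercoil_pitch_nm": 20.0 * D_NM / 11.0,
        # axial offsets of the five strands relative to the first one
        "stagger_nm": [k * D_NM for k in range(5)],
    }


g = smith_geometry()
```

With these inputs the molecule is about 295 nm (about 1030 residues) long, the overlap and gap are about 27 nm and 40 nm, and the 20D/11 supercoil pitch is about 122 nm, inside the 115-200 nm range quoted above.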
Models that emphasize the crystalline properties of collagen packing include the quasihexagonal model (33-35) and the five-stranded "compressed" microfibril model (36). The quasihexagonal molecular crystal model was proposed based on the observed
periodicity from optical diffraction analyses of electron micrographs, which suggested concentrically oriented crystalline domains. Collagen molecules packed in a simple cylindrical, hexagonal lateral array do not produce the correct X-ray reflections, but modifying the lattice spacings and tilting the collagen molecules by 4-5° to the fibril axis ("quasihexagonal" packing) does give rise to the observed X-ray diffraction pattern. The compressed microfibril model mentioned above contains five laterally compressed collagen strands; this distortion results in the microfibril occupying a unit cell similar to that of the quasihexagonal arrangement. In 1991, Chen et al. (37) developed an energy-minimized three-dimensional collagen microfibril model using molecular modeling techniques. Their goal was to develop a model, based on the earlier "Smith" microfibril model (19), that could describe both intra- and inter-fibrillar interactions. This energy-minimized model consisted of five triple helices symmetrically packed in a left-handed superhelical arrangement; the polypeptide sequence used was 15(Gly-Pro-Hyp)12. The analysis demonstrated that van der Waals interactions are important in microfibril formation, and that electrostatic interactions are important in microfibril stability and in the specificity with which collagen molecules pack within the structure. A preliminary fibrillar model of bovine type I collagen, incorporating the primary amino acid sequence of this collagen, was described by Chen et al. (23, 24). To date, no analogous computer model has been developed for type II collagen. This collagen type is the primary form found in cartilage; other collagens (e.g., type IX) are minor components of this tissue. Modeling type II collagen is attractive because its three helical chains are identical; i.e., the structure is described as [α1(II)]3.
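The notation 15(Gly-Pro-Hyp)12 denotes 15 polypeptide chains (five triple helices of three chains each), each built from 12 Gly-Pro-Hyp repeats. The Python sketch below expands that notation into per-chain residue lists and groups them into triple helices; grouping consecutive chains is a bookkeeping convention assumed here for illustration, and the chain-composition table simply restates the [α1(I)]2α2(I) and [α1(II)]3 stoichiometries given in the Background.

```python
def expand_template(n_chains, n_repeats, motif=("Gly", "Pro", "Hyp")):
    """Expand notation such as 15(Gly-Pro-Hyp)12 into per-chain residue lists."""
    return [list(motif) * n_repeats for _ in range(n_chains)]


def group_triple_helices(chains):
    """Group consecutive chains into triple helices of three chains each (assumed convention)."""
    if len(chains) % 3:
        raise ValueError("number of chains must be a multiple of 3")
    return [chains[i:i + 3] for i in range(0, len(chains), 3)]


# Chain stoichiometry of one triple helix, restated from the text.
CHAIN_COMPOSITION = {
    "I": ("α1(I)", "α1(I)", "α2(I)"),   # heterotrimer [α1(I)]2α2(I)
    "II": ("α1(II)",) * 3,              # homotrimer [α1(II)]3
}

chains = expand_template(15, 12)
helices = group_triple_helices(chains)
total_residues = sum(len(c) for c in chains)
```

For the 15(Gly-Pro-Hyp)12 template this gives five triple helices and 15 x 36 = 540 residues. Because type II collagen is a homotrimer, one native α1(II) sequence can be placed on all three chains of every helix, which is what makes it a convenient first target.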
Determining the structure of type II collagen is of intense interest because pathological conditions such as arthritis involve the degradation of this structure, leading to loss of integrity of this load-bearing tissue. A microfibril model for type II collagen (and other forms) can be created by substituting the native amino acid sequence into linked 15(Gly-Pro-Hyp)12 templates of the kind mentioned above. Furthermore, since a single D-spacing is the repeating unit making up collagen fibers, a native microfibril model can be constructed by using this unit as a building block; importantly, all of the possible intra- and inter-molecular interactions in collagen can be incorporated into this repeating unit. Such a model permits a three-dimensional analysis of the amino acid sidechain interactions that contribute to triple-helical and microfibril interaction and stability, of important crosslink sites, and of exterior surface contours and properties. For example, study of the exterior surface could reveal regions or sites where proteolytic enzymes (e.g., collagenase) and other molecules or synthetic compounds can interact with type II collagen. This chapter describes the construction of a microfibril model containing the repeating motif, a D-spacing unit, of native type II collagen. The approach was to construct a 15(Gly-Pro-Hyp)300 template model, substitute the primary amino acid sequence of human type II collagen into it, and then apply structural refinement methods. Each polypeptide chain within the final fibrillar structure is 300 amino acids long, corresponding to about 1.3D; this length was chosen because it allowed construction of a unit containing a single gap region flanked by overlap regions. The characteristics of this model are described, and its features are compared with experimental data.
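Substituting a native sequence into a template amounts to "threading": each template position keeps its backbone coordinates while its residue identity is replaced. The Python sketch below illustrates the idea with a hypothetical `Residue` record that stores only a residue name and one backbone coordinate; it is not SYBYL's Biopolymer interface, and the sidechains of the new residues would still have to be built and relieved of bad contacts, as described under Methods. For scale, a 300-residue chain spans 300/234, or about 1.28, D-periods.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Residue:
    name: str    # three-letter residue code
    ca: tuple    # hypothetical: backbone C-alpha coordinate inherited from the template


def thread_sequence(template, native):
    """Assign native residue names to template positions, keeping the template backbone."""
    if len(template) != len(native):
        raise ValueError("template and native segment must have the same length")
    return [replace(res, name=aa) for res, aa in zip(template, native)]


def changed_positions(template, threaded):
    """Indices whose residue identity differs from the Gly-Pro-Hyp template."""
    return [i for i, (a, b) in enumerate(zip(template, threaded)) if a.name != b.name]


# Six-residue (Gly-Pro-Hyp)2 template and a hypothetical native segment.
template = [Residue(n, (0.0, 0.0, 0.286 * i))
            for i, n in enumerate(["Gly", "Pro", "Hyp"] * 2)]
native = ["Gly", "Ala", "Hyp", "Gly", "Pro", "Arg"]
threaded = thread_sequence(template, native)
```

Because native type II sequences keep glycine at every third position, the Gly register of the template is preserved and only the X and Y positions change; in this toy example positions 1 and 5 are substituted.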
2. Methods
2.1. Potential Energy Function. Molecular modeling was performed on an SGI Crimson Elan workstation (Silicon Graphics, Inc.) using the molecular modeling software SYBYL, version 5.5 (TRIPOS Associates, Inc., 1992). SYBYL v.5.5 combines computational tools for modeling both large proteins and small molecules; for this work, the Biopolymer methods within SYBYL were used to model the protein structures. Molecular mechanics with the Kollman force field (38) was used to refine the constructed protein structures, and the conjugate gradient method was used to minimize their potential energy. For the collagen fibril structures, the united-atom approach was used to improve the efficiency of all calculations, since a single fibrillar model contains over thirty thousand atoms (38; SYBYL, v.5.5). Nonbonded interactions were not computed beyond a distance of 8 Å from each atom. The 1-4 interaction terms were scaled by a factor of 0.5, in accordance with Weiner et al. (39). All calculations were carried out in vacuo and water molecules were not included explicitly; instead, a distance-dependent dielectric function, D = (Rij + 1), where D is the dielectric function and Rij is the distance between atoms i and j, was used to account implicitly for solvation effects. The amino and carboxyl termini were capped with N-acetyl and NHCH3 groups, respectively, to minimize possible end effects. The convergence criterion for all energy minimizations was a root-mean-square (rms) derivative of 0.01 kcal/(mol·Å).
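The electrostatic settings above can be illustrated for a single atom pair. The Python sketch below applies the dielectric function exactly as written in the text, D = Rij + 1, together with the 8 Å cutoff and the 0.5 scaling of 1-4 terms, and adds an rms-gradient convergence test. Several details are assumptions rather than SYBYL behavior: the conversion factor of about 332 kcal·Å/(mol·e²), applying the 1-4 factor to the electrostatic term only, and taking the rms over all Cartesian gradient components.

```python
import math

COULOMB = 332.06   # approx. kcal*Å/(mol*e^2); assumed conversion constant
CUTOFF = 8.0       # nonbonded cutoff in Å (from the text)
SCALE_14 = 0.5     # 1-4 scaling factor (from the text)


def dielectric(r):
    """Distance-dependent dielectric as written in the text: D = Rij + 1."""
    return r + 1.0


def coulomb_pair(qi, qj, r, is_14=False):
    """Electrostatic energy (kcal/mol) of one pair with charges in e and distance r in Å."""
    if r > CUTOFF:
        return 0.0                      # pairs beyond the cutoff are not computed
    e = COULOMB * qi * qj / (dielectric(r) * r)
    return SCALE_14 * e if is_14 else e


def rms_gradient(grad):
    """RMS over all Cartesian gradient components, in kcal/(mol*Å) (assumed definition)."""
    comps = [c for g in grad for c in g]
    return math.sqrt(sum(c * c for c in comps) / len(comps))


def converged(grad, tol=0.01):
    """Convergence test using the 0.01 kcal/(mol*Å) rms criterion from the text."""
    return rms_gradient(grad) < tol


e_pair = coulomb_pair(1.0, -1.0, 2.0)   # opposite unit charges 2 Å apart
```

Swapping `dielectric` for another screening function changes only one line, which makes it easy to compare this form with the more common choice of a dielectric proportional to Rij.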
2.2. Microfibril Modeling Strategy. The general modeling strategy involved constructing a "Smith" 15(Gly-Pro-Hyp)300 microfibril model starting from a shorter 15(Gly-Pro-Hyp)12 microfibril model (37), followed by incorporating the human type II collagen amino acid sequence into the 15(Gly-Pro-Hyp)300 model. The final stages consisted of structural refinement using both interactive graphics manipulations and energy minimization. Details of these procedures are given below.

2.3. Backbone and Sidechains. An intermediate step in constructing a model of the type II fiber was the incorporation of the complete type II collagen amino acid sequence into the 15(Gly-Pro-Hyp)300 template model. Each amino acid type was substituted into all of its respective positions along the three-dimensional fibril template. Before the modified model was energy-minimized, unfavorable steric contacts between sidechain atoms were removed by an algorithm that rotated all sidechain torsion angles iteratively until no further bad contacts were found (SYBYL, v.5.5). Energy minimization of the structure with its sidechains was then carried out in two steps. First, all polypeptide backbone atoms were constrained to their original positions while only the sidechains were allowed to move. Second, the backbone constraints were removed and the potential energy of the backbone and sidechains was minimized.
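The two-step minimization, sidechains first with the backbone held fixed and then everything together, can be sketched on a toy system. The Python code below is illustrative only: it uses a hypothetical two-dimensional "molecule" with harmonic bonds and plain steepest descent, not the Kollman force field or SYBYL's conjugate gradient minimizer. The `free` index list plays the role of the backbone constraint, and the stopping rule is the rms-gradient criterion of 0.01 from the text.

```python
import math


def energy_grad(coords, bonds):
    """Toy energy sum k*(d - d0)^2 over bonds (i, j, k, d0) and its 2-D Cartesian gradient."""
    e, g = 0.0, [[0.0, 0.0] for _ in coords]
    for i, j, k, d0 in bonds:
        dx, dy = coords[i][0] - coords[j][0], coords[i][1] - coords[j][1]
        d = math.hypot(dx, dy)
        e += k * (d - d0) ** 2
        f = 2.0 * k * (d - d0) / d          # (dE/dd) / d
        g[i][0] += f * dx; g[i][1] += f * dy
        g[j][0] -= f * dx; g[j][1] -= f * dy
    return e, g


def minimize(coords, bonds, free, step=0.05, tol=0.01, max_iter=20000):
    """Steepest descent that moves only atoms in `free`; all other atoms stay fixed."""
    x = [list(c) for c in coords]
    for _ in range(max_iter):
        _, g = energy_grad(x, bonds)
        comps = [c for i in free for c in g[i]]
        if math.sqrt(sum(c * c for c in comps) / len(comps)) < tol:
            break
        for i in free:
            x[i][0] -= step * g[i][0]
            x[i][1] -= step * g[i][1]
    return x, energy_grad(x, bonds)[0]


# Hypothetical system: atoms 0-2 are "backbone", 3-4 a "sidechain".
backbone, sidechain = [0, 1, 2], [3, 4]
start = [(0.0, 0.0), (1.5, 0.0), (3.0, 0.0), (1.5, 2.0), (1.5, 2.5)]
bonds = [(0, 1, 1.0, 1.5), (1, 2, 1.0, 1.4), (1, 3, 1.0, 1.0), (3, 4, 1.0, 1.0)]

stage1, e1 = minimize(start, bonds, free=sidechain)             # backbone fixed
stage2, e2 = minimize(stage1, bonds, free=backbone + sidechain)  # constraints removed
```

In the first stage the strained backbone bond (1.5 versus an ideal 1.4) cannot relax, so some energy remains; only the second, unconstrained stage removes it. Relaxing sidechains first keeps large initial sidechain forces from distorting the backbone.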
2.4. Molecular Dynamics Simulations. As part of the structural refinement procedure, molecular dynamics was performed at different stages of the microfibril construction in order to "repair" and refine regions of the model that had been modified. For example, when amino acid substitutions were made at specific positions in the models, both the backbone and sidechain parameters of the new residues had to be reminimized. Energy minimization alone may not allow a modified structure to escape from a local energy minimum defined for the unmodified structure; molecular dynamics was therefore used to explore more thoroughly the conformations available to each new substitution or modification. Constrained molecular dynamics was also performed on the sidechain groups of the type II microfibril model to optimize sidechain contacts. As described in the Potential Energy Function section, the Kollman force field with the united-atom parameter set was used (SYBYL, v.5.5). The non-bonded interaction list was updated every 25 femtoseconds, and the non-bonded cutoff distance was set at 8.0 Å. The time step was 1 femtosecond, and the temperature was varied according to the "annealing" procedure (SYBYL, v.5.5). Simulations were carried out until the system had equilibrated, and the final equilibrated model was reminimized to give the final minimized model. Because of the large size of these molecular models, no attempt was made to perform rigorous molecular dynamics simulations (i.e., longer than 100 picoseconds after equilibration), as this was not the purpose of this initial study.
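The run parameters above (1 fs time step, non-bonded list updated every 25 fs, 8.0 Å cutoff, annealing) fit together as in the Python sketch below. The text does not specify SYBYL's annealing protocol, so the linear cooling ramp from 600 K to 300 K enforced by velocity rescaling is a hypothetical stand-in, and the force field (harmonic bonds plus an r^-12 repulsion with made-up parameters) is a toy, not the Kollman united-atom force field. Integration uses the standard velocity Verlet scheme.

```python
import math
import random

KB = 0.0019872041              # Boltzmann constant, kcal/(mol*K)
KCAL_TO_AMU = 1.0 / 2390.057   # 1 kcal/mol expressed in amu*Å^2/fs^2
DT = 1.0                       # time step, fs (from the text)
LIST_EVERY = 25                # steps between non-bonded list updates (25 fs / 1 fs)
CUTOFF = 8.0                   # non-bonded cutoff, Å (from the text)


def temperature(vel, mass):
    """Kinetic temperature (K) from velocities (Å/fs) and masses (amu), 3N degrees of freedom."""
    ke = 0.5 * sum(m * sum(c * c for c in v) for m, v in zip(mass, vel)) / KCAL_TO_AMU
    return 2.0 * ke / (3 * len(mass) * KB)


def pair_list(pos, bonds):
    """Non-bonded pairs within CUTOFF; rebuilt only every LIST_EVERY steps."""
    n = len(pos)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if (i, j) not in bonds and math.dist(pos[i], pos[j]) <= CUTOFF]


def forces(pos, bonds, pairs, kb=100.0, r0=1.5, eps=0.1, sigma=3.0):
    """Toy forces in kcal/(mol*Å): harmonic bonds plus r^-12 repulsion (hypothetical values)."""
    f = [[0.0, 0.0, 0.0] for _ in pos]

    def add(i, j, dE_dr):
        d = [pos[j][a] - pos[i][a] for a in range(3)]
        r = math.sqrt(sum(c * c for c in d))
        for a in range(3):
            f[i][a] += dE_dr * d[a] / r    # F_i = +dE/dr * (r_j - r_i)/r
            f[j][a] -= dE_dr * d[a] / r

    for i, j in bonds:
        add(i, j, 2.0 * kb * (math.dist(pos[i], pos[j]) - r0))
    for i, j in pairs:
        r = math.dist(pos[i], pos[j])
        if r <= CUTOFF:
            add(i, j, -12.0 * eps * sigma ** 12 / r ** 13)
    return f


def anneal(pos, mass, bonds, n_steps=200, t_hot=600.0, t_cold=300.0, seed=0):
    """Velocity Verlet with a linear cooling ramp imposed by velocity rescaling."""
    rng = random.Random(seed)
    pos = [list(p) for p in pos]
    vel = [[rng.gauss(0.0, 0.005) for _ in range(3)] for _ in pos]
    rebuilds = 0
    for step in range(n_steps):
        if step % LIST_EVERY == 0:               # true at step 0, so pairs and f exist
            pairs = pair_list(pos, bonds)
            rebuilds += 1
            f = forces(pos, bonds, pairs)
        for i, m in enumerate(mass):             # half kick, then drift
            for a in range(3):
                vel[i][a] += 0.5 * DT * f[i][a] * KCAL_TO_AMU / m
                pos[i][a] += DT * vel[i][a]
        f = forces(pos, bonds, pairs)
        for i, m in enumerate(mass):             # second half kick
            for a in range(3):
                vel[i][a] += 0.5 * DT * f[i][a] * KCAL_TO_AMU / m
        target = t_hot + (t_cold - t_hot) * step / (n_steps - 1)
        scale = math.sqrt(target / temperature(vel, mass))
        vel = [[scale * c for c in v] for v in vel]
    return pos, temperature(vel, mass), rebuilds


chain = [(1.5 * i, 0.0, 0.0) for i in range(6)]   # hypothetical 6-atom chain
bond_set = {(i, i + 1) for i in range(5)}
final_pos, final_t, n_lists = anneal(chain, [12.0] * 6, bond_set)
```

With 200 one-femtosecond steps the pair list is rebuilt 8 times (every 25 steps). Updating the list less often than the forces saves the quadratic pair search, which matters for a model with over thirty thousand atoms.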
2.5. Construction of the 15(Gly-Pro-Hyp)300 Microfibril Model. The energy-minimized "Smith" model of the 15(Gly-Pro-Hyp)12 microfibril (23, 37) was used as the starting template for building the final 15(Gly-Pro-Hyp)300 microfibril template model. The 15(Gly-Pro-Hyp)12 molecule was duplicated and the two copies were aligned longitudinally. After the appropriate end groups (see the Potential Energy Function section) were removed, the molecules were docked and positioned so that no unfavorable van der Waals contacts existed. The length of each polypeptide chain was modified slightly (i.e., some C-terminal residues were "cropped" to bring each chain's twist into register) in order to maintain the left-handed helical conformation characteristic of collagen polypeptides. A covalent amide bond was then created joining the two molecules. The resulting intermediate was a 15(Gly-Pro-Hyp)22 extended microfibril, which was refined by energy minimization and molecular dynamics (SYBYL, v.5.5). In the next stage, the 15(Gly-Pro-Hyp)22 structure was extended in the same manner into a 15(Gly-Pro-Hyp)42 structure. This procedure was repeated until an energy-minimized microfibril of slightly over 300 residues per chain had been constructed (Scheme 1a). As mentioned above, some unnecessary Gly-Pro-Hyp tripeptide segments were then removed; this is shown in Scheme 1b and described in the Results and Discussion section. Because the fiber was built on the Smith model of collagen packing, the final product exhibited symmetrical packing (C5 rotational symmetry about the longitudinal axis).
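C5 rotational symmetry means that rotating the assembly by 360°/5 = 72° about the longitudinal axis maps each strand onto the next. The Python sketch below generates five copies of one strand by successive 72° rotations about the z axis and checks that property. The single-strand coordinates (radius 0.9 nm, 0.286 nm rise) are hypothetical placeholders, and the sketch deliberately ignores the axial stagger and supercoiling of the real microfibril.

```python
import math


def rotate_z(p, angle):
    """Rotate point p = (x, y, z) by `angle` radians about the z (fibril) axis."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def c5_assembly(strand):
    """Five copies of one strand related by successive 72-degree rotations about z."""
    return [[rotate_z(p, 2.0 * math.pi * k / 5) for p in strand] for k in range(5)]


def has_c5_symmetry(strands, tol=1e-6):
    """True if rotating strand k by 72 degrees reproduces strand k+1 (mod 5) within tol."""
    step = 2.0 * math.pi / 5
    for k, strand in enumerate(strands):
        target = strands[(k + 1) % 5]
        for p, q in zip(strand, target):
            if math.dist(rotate_z(p, step), q) > tol:
                return False
    return True


# Hypothetical strand: ten points on a line 0.9 nm from the fibril axis.
strand = [(0.9, 0.0, 0.286 * i) for i in range(10)]
assembly = c5_assembly(strand)
symmetric = has_c5_symmetry(assembly)
```

Perturbing any single strand breaks the check, which makes this kind of test a convenient sanity check after building or editing one strand of a symmetric model.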
2.6. Incorporation of the Native Human Type II Sequence. The model of the human type II collagen fiber was constructed by substituting the primary amino acid sequence in place of the Gly-Pro-Hyp template (Scheme 1b). The type II collagen sequence is known to contain most of the standard amino acids in addition to some post-translationally modified residues (40). The complete sequence, containing 1014 amino acids per α1(II) chain, is represented in a single D-space unit of this model.

Scheme 1a. Five triple helices (TH1-TH5), each 300 residues in length.
Scheme 1b. The five 300-residue triple helices (TH1-TH5), with residue positions 78 and 235 marked and regions labeled OVERLAP(n), GAP(n) and OVERLAP(n+1).

It is important to note that because this model is a repeating unit of the collagen fiber, it is possible to study all of the relevant inter- and intra-molecular sidechain interactions. Since type II collagen contains 3 identical