Amino acid sequence diversity in proteins

David Blackman, Federal City College, Washington, D.C. 20005


It is usual, in the undergraduate biochemistry course, to note the diversity of structures and functions ascribed to the proteins. This class of macromolecule serves as cellular structural components, locomotive agents, enzymes, hormones, antibodies, and transporters of oxygen, electrons, and nutrients. Even so rudimentary an organism as the bacterium E. coli contains as many as 3000 different proteins (1). It has been estimated that the total number of different proteins appearing in all current forms of life (2) is of the order 10^10 to 10^12. The question arises as to whether it is reasonable that this number of sequence isomers can be generated from the set of 20 naturally occurring amino acids. The answer, of course, is in the affirmative, since even a relatively small protein of 100 amino acid residues can have 20^100 possible sequence isomers. The number 20^100 (which is more conveniently dealt with as 10^130) is, however, so large as to be almost beyond comprehension unless it can be related to some familiar reference points. This paper contains a number of simple calculations whose aim is to make this quantity more meaningful.

If 10^130 molecules of water were collected at 25°C, they would occupy a volume of 3.01 × 10^92 km^3 (7.23 × 10^91 mi^3), corresponding to a sphere 8.31 × 10^30 km (5.17 × 10^30 miles) in diameter. Traveling at the speed of light, 299,793 km/sec (3), 8.80 × 10^17 years would be required to traverse the diameter of this sphere.

One can compare 10^130 atoms of hydrogen to our sun. Taking the solar mass as 1.99 × 10^33 g, and its density as 1.41 g/cm^3 (4), gives a volume of 1.41 × 10^18 km^3, corresponding to a sphere 1.39 × 10^6 km in diameter. Assuming the sun to consist entirely of hydrogen, it would contain 1.18 × 10^57 atoms of that element. In contrast, 10^130 atoms of hydrogen, at the same density as the sun, would occupy a sphere 2.83 × 10^30 km in diameter and 1.19 × 10^91 km^3 in volume.
Traveling, as before, at the speed of light, one could (in principle, at least) traverse the diameter of the sun in just under 5 sec. In order to cross our hypothetical super-sun at its diameter, however, one would have to travel (again at the speed of light) for 2.99 × 10^17 yr.
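The water and hydrogen comparisons can be reproduced with a few lines of arithmetic. The sketch below is only illustrative: the solar mass, solar density, and speed of light are the values quoted in the text, while Avogadro's number, the molar masses, the density of water at 25°C, and the length of the year are assumed values that the article does not state. All variable and function names are hypothetical.

```python
import math

AVOGADRO = 6.022e23          # particles per mole (assumed value)
C_KM_PER_S = 299_793         # speed of light quoted in the article, km/sec
SECONDS_PER_YEAR = 3.156e7   # assumed: a year of 365.25 days
CM3_PER_KM3 = 1e15           # cubic centimeters in one cubic kilometer

N = 1e130                    # the article's round figure for 20^100


def sphere_diameter(volume):
    """Diameter of a sphere with the given volume (V = pi * d^3 / 6)."""
    return (6.0 * volume / math.pi) ** (1.0 / 3.0)


def light_crossing_s(diameter_km):
    """Seconds needed to cross a distance in km at the speed of light."""
    return diameter_km / C_KM_PER_S


# Sequence isomers of a 100-residue protein: 20^100 = 10^(100 * log10 20)
log10_isomers = 100 * math.log10(20)

# N molecules of water (assumed 18.015 g/mol and 0.997 g/cm^3 at 25 C)
water_volume_km3 = N / AVOGADRO * 18.015 / 0.997 / CM3_PER_KM3
water_diameter_km = sphere_diameter(water_volume_km3)
water_years = light_crossing_s(water_diameter_km) / SECONDS_PER_YEAR

# The sun, using the mass and density quoted in the article
SOLAR_MASS_G = 1.99e33
SOLAR_DENSITY_G_CM3 = 1.41
H_MOLAR_MASS = 1.008         # g/mol (assumed value)
sun_volume_km3 = SOLAR_MASS_G / SOLAR_DENSITY_G_CM3 / CM3_PER_KM3
sun_diameter_km = sphere_diameter(sun_volume_km3)
sun_h_atoms = SOLAR_MASS_G / H_MOLAR_MASS * AVOGADRO
sun_crossing_s = light_crossing_s(sun_diameter_km)

# A "super-sun" of N hydrogen atoms at the same density as the sun
super_volume_km3 = N * H_MOLAR_MASS / AVOGADRO / SOLAR_DENSITY_G_CM3 / CM3_PER_KM3
super_diameter_km = sphere_diameter(super_volume_km3)
super_years = light_crossing_s(super_diameter_km) / SECONDS_PER_YEAR

print(f"log10(20^100)       = {log10_isomers:.1f}")
print(f"water sphere        = {water_volume_km3:.3g} km^3, {water_diameter_km:.3g} km across")
print(f"light crossing time = {water_years:.3g} yr")
print(f"sun                 = {sun_volume_km3:.3g} km^3, {sun_h_atoms:.3g} H atoms")
print(f"super-sun           = {super_volume_km3:.3g} km^3, {super_diameter_km:.3g} km across")
print(f"crossing times      = {sun_crossing_s:.2f} sec (sun), {super_years:.3g} yr (super-sun)")
```

Small differences in the last digit relative to the figures in the text are to be expected, since the constants assumed here may not be exactly those the author used.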

170 / Journal of Chemical Education

These calculations can also be cast in the dimensions of protein molecules. Suppose there was one molecule of each of the 10^130 types of protein, and that every one of them existed as the classical α-helix. Taking the pitch (i.e., rise) of the helix as 1.5 Å/residue (5), each molecule would then be 150 Å long. Laying these helices end-to-end would create a chain 1.50 × 10^119 km, or 1.58 × 10^106 light years (4.86 × 10^105 parsecs; see footnote 1), long. Alternatively, if all these protein molecules assumed the β, or pleated sheet, configuration, having an interaxial distance of 4.7 Å and a repeat distance of 7.0 Å/residue (7), they could be laid out to cover an area of 3.68 × 10^81 square light years (3.46 × 10^80 square parsecs). Taking the observable universe as a sphere 2.6 × 10^10 light years in diameter (see footnote 2), its surface (about 2 × 10^21 square light years) could be completely covered some 2 × 10^60 times by such protein sheets, assuming negligible thickness of the sheet.

It is clear that an astronomical number of distinct proteins is possible, using only the twenty naturally occurring amino acids. If one takes into account the fact that these calculations were carried out for small proteins of only 100 residues (proteins several times this size are not uncommon (9)) or the additional complications wrought by quaternary aggregation of independent polypeptide chains, the number of possible distinct proteins becomes virtually limitless.

An estimate of the length of time required for nature to "try" all possible protein sequences can be obtained in the following way. Assume the surface of the earth (5.10 × 10^8 km^2 (4)) to be covered to a depth of 100 km with a "saturated" bacterial growth medium (i.e., one containing about 10^8 cells/cm^3). If each cell contains 3000 different proteins, each of which contains 100 amino acid residues, and these mutated (see footnote 3) at the rate of one amino acid per protein per second (1.53 × 10^37 mutations/sec worldwide), it would require 2.07 × 10^85 years to synthesize a sample of each of the possible protein molecules. Contrast this with the estimated (11) remaining lifetime of the sun, approximately 10^11 years. There may be some comfort in the realization that the sun will probably burn itself out long before nature depletes its store of possible protein sequences.

Literature Cited

(1) Lehninger, A. L., "Biochemistry," 2nd ed., Worth Publishers, Inc., New York, 1975, p. 5.
(2) Ref. (1), p. 6.
(3) Menzel, D. H., Whipple, F. L., and de Vaucouleurs, G., "Survey of the Universe," Prentice-Hall, Englewood Cliffs, N.J., 1970, p. 756.
(4) Weast, R. C. (Editor), "Handbook of Chemistry and Physics," 54th ed., CRC Press, Cleveland, 1973, p. F-160.
(5) Edsall, J. T., and Wyman, J., "Biophysical Chemistry," Vol. 1, Academic Press, Inc., New York, 1958, p. 110.
(6) Ref. (3), p. 431.
(7) Ref. (5), p. 106.
(8) Asimov, I., "The Universe: From Flat Earth to Quasar," rev. ed., Walker and Co., New York, 1971, p. 201.

Footnote 1. Astronomical distances, with which we are clearly dealing here, are often given in terms of the parsec, or parallax-second. The parallax of a star is defined as the angle, measured at the star, which is subtended by the radius of the earth's orbit. A parsec, then, is the distance from the earth to a star whose parallax is one second, or 1/3600 of a degree (6). One parsec is equal to 3.086 × 10^13 km, or 1.92 × 10^13 miles, or 3.262 light years.

Footnote 2. The size of the universe is difficult, if not impossible, to estimate. It is possible, however, to obtain an upper limit on the size of the observable universe from Hubble's law, which states that the velocity, V, with which a galaxy recedes from the earth is directly proportional to its distance, D, from the earth, or V = kD (8). A galaxy receding at the speed of light would be theoretically unobservable because its light would never reach earth. Consequently, the limit of the observable universe would be that distance which is sufficiently far from earth to allow a galaxy to recede at the speed of light. This is just V/k. Since k, Hubble's constant, is usually taken as being in the range of 50-100 km sec^-1 megaparsec^-1 (3), one can estimate k = 75 km sec^-1 Mpc^-1. Carrying out the calculation yields the observable universe as a geocentric sphere of radius about 1.3 × 10^10 light years.
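The helix, pleated-sheet, and Hubble-law figures can be checked with a short script. This is a minimal sketch under stated assumptions: the helix rise, sheet dimensions, parsec conversions, speed of light, and k = 75 km sec^-1 Mpc^-1 are taken from the article, the light year is derived from the article's own parsec conversions, and every variable name is hypothetical.

```python
import math

N_PROTEINS = 1e130                 # one molecule of each possible sequence
RESIDUES = 100                     # residues per protein, as in the article
KM_PER_ANGSTROM = 1e-13
KM_PER_PARSEC = 3.086e13           # from the article's parsec footnote
LY_PER_PARSEC = 3.262              # from the same footnote
KM_PER_LIGHT_YEAR = KM_PER_PARSEC / LY_PER_PARSEC   # derived, about 9.46e12 km
C_KM_PER_S = 299_793               # speed of light quoted in the article

# alpha-helix: 1.5 angstrom rise per residue, all molecules laid end to end
helix_length_angstrom = RESIDUES * 1.5
chain_km = N_PROTEINS * helix_length_angstrom * KM_PER_ANGSTROM
chain_ly = chain_km / KM_PER_LIGHT_YEAR
chain_pc = chain_km / KM_PER_PARSEC

# pleated sheet: each chain treated as a strip 4.7 angstrom wide,
# with 7.0 angstrom of length per residue (the article's figures)
strip_area_angstrom2 = 4.7 * (RESIDUES * 7.0)
sheet_km2 = N_PROTEINS * strip_area_angstrom2 * KM_PER_ANGSTROM ** 2
sheet_ly2 = sheet_km2 / KM_PER_LIGHT_YEAR ** 2
sheet_pc2 = sheet_km2 / KM_PER_PARSEC ** 2

# Hubble's law V = kD: the distance at which V equals the speed of light
HUBBLE_K = 75.0                    # km/sec per megaparsec
radius_pc = C_KM_PER_S / HUBBLE_K * 1e6
radius_ly = radius_pc * LY_PER_PARSEC

# Surface of the observable universe and the number of times it is covered
surface_ly2 = 4.0 * math.pi * radius_ly ** 2
coverings = sheet_ly2 / surface_ly2

print(f"helix chain    = {chain_km:.3g} km = {chain_ly:.3g} ly = {chain_pc:.3g} pc")
print(f"sheet area     = {sheet_ly2:.3g} ly^2 = {sheet_pc2:.3g} pc^2")
print(f"Hubble radius  = {radius_ly:.3g} ly")
print(f"surface        = {surface_ly2:.3g} ly^2, covered {coverings:.2g} times")
```

The coverage count is sensitive to the choice of k: anywhere in the quoted range of 50-100 km sec^-1 Mpc^-1 it remains of the order 10^60.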


Footnote 3. This mutation rate is actually at least 3 × 10^10 times greater than the estimated spontaneous rate of 4 × 10^-10 mutations per DNA base per replication in E. coli (10).
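The estimate of the time needed to "try" every sequence follows from the quantities given in the text, and can be sketched as below. The earth's surface area, the depth of the medium, the cell density, the number of proteins per cell, and the mutation rate are the article's figures; the length of the year is an assumed value, the variable names are hypothetical, and the calculation takes for granted (as the article implicitly does) that every mutation produces a sequence not seen before.

```python
EARTH_SURFACE_KM2 = 5.10e8       # surface of the earth, from the article
DEPTH_KM = 100.0                 # depth of the hypothetical growth medium
CM3_PER_KM3 = 1e15               # cubic centimeters in one cubic kilometer
CELLS_PER_CM3 = 1e8              # "saturated" culture density, from the article
PROTEINS_PER_CELL = 3000         # different proteins per cell
MUTATIONS_PER_PROTEIN_S = 1.0    # one amino acid change per protein per second
N_SEQUENCES = 1e130              # possible 100-residue sequences
SECONDS_PER_YEAR = 3.156e7       # assumed: a year of 365.25 days

# Total volume of medium and the number of cells it holds
medium_cm3 = EARTH_SURFACE_KM2 * DEPTH_KM * CM3_PER_KM3
total_cells = medium_cm3 * CELLS_PER_CM3

# Worldwide rate at which new sequences appear
mutations_per_s = total_cells * PROTEINS_PER_CELL * MUTATIONS_PER_PROTEIN_S

# Time to sample every sequence once, assuming no sequence is ever repeated
years_needed = N_SEQUENCES / mutations_per_s / SECONDS_PER_YEAR

print(f"mutations per second worldwide = {mutations_per_s:.3g}")
print(f"years to try every sequence    = {years_needed:.3g}")
```

Because repeats are ignored, the result is best read as a lower bound on the time required.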

Volume 54, Number 3, March 1977 / 171