Computer Series, 84: Protein Structure Prediction in Color
Lawrence C. Davis and Gary A. Radke
Department of Biochemistry, Kansas State University, Manhattan, KS 66506
Biochemists are interested in applying findings in basic chemistry to living systems in order to understand the ways in which those systems do chemistry efficiently at room temperature and pressure. Enzymes, the catalysts of most biological reactions, have to maintain very specific structures to carry out their function. Three-dimensional structures of some enzymes have been revealed by methods of crystallography, but many more, for one reason or another, have remained refractory. Alternative means of probing their structure and organization have been devised. Simple, empirical methods for prediction of secondary structure of polypeptide chains are especially useful when the amino acid sequence of a protein is derived from a nucleic acid sequence and the enzyme itself is not available in large amounts for detailed crystallographic or kinetic study. Two quite popular methods for structure prediction are the Chou and Fasman (1) method for prediction of secondary structure and the Kyte and Doolittle (2) method for predicting "hydropathic character". To enhance the usefulness of these programs, we have implemented them on an Apple IIe computer with color graphics. Use of color provides an additional visual input mode for the observer and facilitates comparison of patterns of structure between similar proteins.
Methods
Chou and Fasman (1) have devised a set of rules for predicting the probable secondary structure of a protein on the basis of its amino acid sequence. Using the known structure of 29 proteins as determined through X-ray crystallography, they established a hierarchical order of the α-helix, β-sheet, and β-turn "conformation parameters" for each amino acid. The conformational parameter was defined (1) as the frequency with which a particular residue is found in a structure relative to the average frequency for all amino acids being found in that structure. Within the α-helix and β-sheet categories, the amino acids are classified as being strong formers, weak formers, indifferent, weak breakers, and strong breakers. Using this classification, we have assigned colors in both the α-helix and β-sheet categories to each of the amino acids. The amino acids, their conformational parameters for α helix and β sheet, and their assigned colors are listed in Table 1.

To predict structure and assign colors to a given residue, the program uses a four-point moving average and calculates the average Pα (probability that it resides in a helical region) using the parameters for the residue in question and the next three residues, and the parameters for the residue in question and the previous three residues. If the larger of the two averages is greater than 1.03 and greater than the calculated Pβ for the same residue, it is predicted to be within an α helix and is assigned the color orange. The average β-sheet potential is calculated in the same way, but it must be greater than 1.05 and be greater than the residue's Pα to qualify as being within a β sheet and have the color purple assigned to it. Those residues that do not meet the above restrictions are
Journal of Chemical Education
edited by JOHN W. MOORE, Eastern Michigan University, Ypsilanti, MI 48197
Table 1. Conformational Parameters(a) and Color Assignments for α-Helical and β-Sheet Residues of 29 Proteins

α-Helical                 α-Helical Color  |  β-Sheet                   β-Sheet Color
Residue   Pα    Class     Assignment       |  Residue   Pβ    Class     Assignment
Glu-      1.51  Hα        RED              |  Val       1.70  Hβ        PURPLE
Met       1.45  Hα        RED              |  Ile       1.60  Hβ        PURPLE
Ala       1.42  Hα        RED              |  Tyr       1.47  Hβ        PURPLE
Leu       1.21  Hα        RED              |  Phe       1.38  hβ        BLUE
Lys+      1.16  hα        ORANGE           |  Trp       1.37  hβ        BLUE
Phe       1.13  hα        ORANGE           |  Leu       1.30  hβ        BLUE
Gln       1.11  hα        ORANGE           |  Cys       1.19  hβ        BLUE
Trp       1.08  hα        ORANGE           |  Thr       1.19  hβ        BLUE
Ile       1.08  hα        ORANGE           |  Gln       1.10  hβ        BLUE
Val       1.06  hα        ORANGE           |  Met       1.05  hβ        BLUE
Asp-      1.01  Iα        YELLOW           |  Arg+      0.93  iβ        GREEN
His+      1.00  Iα        YELLOW           |  Asn       0.89  iβ        GREEN
Arg+      0.96  iα        GREEN            |  His+      0.87  iβ        GREEN
Thr       0.83  iα        GREEN            |  Ala       0.83  iβ        GREEN
Ser       0.77  iα        GREEN            |  Ser       0.75  bβ        YELLOW
Cys       0.70  iα        GREEN            |  Gly       0.75  bβ        YELLOW
Tyr       0.69  bα        BLUE             |  Lys+      0.74  bβ        YELLOW
Asn       0.67  bα        BLUE             |  Pro       0.55  Bβ        ORANGE
Pro       0.57  Bα        PURPLE           |  Asp-      0.54  Bβ        ORANGE
Gly       0.57  Bα        PURPLE           |  Glu-      0.37  Bβ        ORANGE

(a) Conformational parameters based on 29 proteins from Chou and Fasman (1).
assigned the color green, meaning they are not reliably predicted to be either α helix or β sheet.

Predictions for β turns at a given residue use the product of positional probability (Table 2) for a residue plus the next three residues, and the average β-turn potential. Those residues with a turn product probability greater than 7.5 × 10⁻⁵ and with an average β-turn potential greater than 1.0 and greater than both the average α-helix potential and β-sheet potential are predicted to be possible turn regions and are assigned the color blue. Those residues with a turn product probability greater than 1 × 10⁻⁴ and an average β-turn potential greater than 1.0 and greater than both the average α-helix potential and β-sheet potential are predicted to be probable turn regions and are assigned the color white. An example of all the calculations for a simple peptide is given in Table 3. This particular peptide, the C terminal of bovine rhodopsin, cannot clearly be assigned either α or β structure, nor does it contain a turn by the criteria of Chou and Fasman.

The hydropathic character of a protein is predicted and displayed in the following manner. Kyte and Doolittle (2) developed a hydropathy index for amino acids based on relative hydrophilicity-hydrophobicity for each residue. This was based both on the free energy of transfer to aqueous phase and on observed amino acid locations in the structure of crystalline proteins, with some subjective adjustments of scale. Using the amino acid hydropathic index values assigned by Kyte and Doolittle (2) (Table 4), a nine-point moving sum is calculated beginning at the amino terminus. If this sum is greater than 10, purple is assigned to the i + 4 residue. A sum greater than 0 but less than 10 assigns blue to residue i + 4. A value less than 0 but greater than -10 assigns green, while a value of less than -10 assigns orange.
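The nine-point moving sum and its color cutoffs can be sketched in a few lines of Python. This is an illustration only, not the Apple IIe program: the index values are the published Kyte and Doolittle scale in one-letter code, the names `HYDROPATHY`, `color_for_sum`, and `hydropathy_colors` are hypothetical, and two details are assumptions because the text does not state them: a sum exactly equal to a cutoff falls into the lower class, and the four residues at each end of the sequence receive no color.

```python
# Published Kyte-Doolittle hydropathy index values, keyed by one-letter code.
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9,
    "R": -4.5,
}

SPAN = 9  # nine-point moving sum, assigned to the middle (i + 4) residue


def color_for_sum(total):
    """Map a nine-residue hydropathy sum to a display color.

    Assumption: a sum exactly on a cutoff (10, 0, -10) takes the lower class.
    """
    if total > 10:
        return "purple"   # hydrophobic
    if total > 0:
        return "blue"     # transitional
    if total > -10:
        return "green"    # transitional
    return "orange"       # hydrophilic


def hydropathy_colors(sequence):
    """Return one color per residue; None where no full window is centered."""
    colors = [None] * len(sequence)
    for start in range(len(sequence) - SPAN + 1):
        window = sequence[start:start + SPAN]
        total = sum(HYDROPATHY[residue] for residue in window)
        colors[start + SPAN // 2] = color_for_sum(total)
    return colors
```

Calling `hydropathy_colors` on a sequence shorter than nine residues returns a list of `None`, since no window fits.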
Table 2. Frequency Hierarchy of Amino Acids in the β Turns of 29 Proteins from Chou and Fasman(a)

    f(i)            f(i+1)          f(i+2)          f(i+3)
Asn  0.161      Pro  0.301      Asn  0.191      Trp  0.167
Cys  0.149      Ser  0.139      Gly  0.190      Gly  0.152
Asp  0.147      Lys  0.115      Asp  0.179      Cys  0.128
His  0.140      Asp  0.110      Ser  0.125      Tyr  0.125
Ser  0.120      Thr  0.108      Cys  0.117      Ser  0.106
Pro  0.102      Arg  0.106      Tyr  0.114      Gln  0.098
Gly  0.102      Gln  0.096      Arg  0.099      Lys  0.095
Thr  0.086      Gly  0.085      His  0.093      Asn  0.091
Tyr  0.082      Asn  0.083      Glu  0.077      Arg  0.085
Trp  0.077      Met  0.082      Lys  0.072      Asp  0.081
Gln  0.074      Ala  0.076      Thr  0.066      Thr  0.079
Arg  0.070      Tyr  0.065      Phe  0.065      Leu  0.070
Met  0.068      Glu  0.060      Trp  0.064      Pro  0.068
Val  0.062      Cys  0.053      Gln  0.037      Phe  0.065
Leu  0.061      Val  0.048      Leu  0.036      Glu  0.064
Ala  0.060      His  0.047      Ala  0.035      Ala  0.058
Phe  0.059      Phe  0.041      Pro  0.034      Ile  0.056
Glu  0.056      Ile  0.034      Val  0.028      Met  0.055
Lys  0.055      Leu  0.025      Met  0.014      His  0.054
Ile  0.043      Trp  0.013      Ile  0.013      Val  0.053

(a) f(i), f(i+1), f(i+2), and f(i+3) represent the frequencies of the first, second, third, and fourth residues, respectively, in a reverse β turn. From Chou and Fasman (1).
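The turn criteria can be illustrated with a short Python sketch. It is not the authors' program. The frequencies are those printed in Table 2, restricted to the five residues needed for two example windows, and the names `F_TURN`, `turn_product`, and `classify_turn` are hypothetical. The cutoffs are written as 7.5 × 10⁻⁵ and 1 × 10⁻⁴; these exponents are an assumption chosen to agree with the × 10⁵ scale of the turn products in Table 3. Which four-residue averages are compared is also an assumption, since the text does not say, so the caller supplies them.

```python
import math

# Positional frequencies as printed in Table 2, for five residues only
# (one-letter codes: K = Lys, T = Thr, E = Glu, S = Ser, Q = Gln).
F_TURN = [
    {"K": 0.055, "T": 0.086, "E": 0.056, "S": 0.120, "Q": 0.074},  # f(i)
    {"K": 0.115, "T": 0.108, "E": 0.060, "S": 0.139, "Q": 0.096},  # f(i+1)
    {"K": 0.072, "T": 0.066, "E": 0.077, "S": 0.125, "Q": 0.037},  # f(i+2)
    {"K": 0.095, "T": 0.079, "E": 0.064, "S": 0.106, "Q": 0.098},  # f(i+3)
]

POSSIBLE_CUTOFF = 7.5e-5  # assumed exponent; "possible" turn, shown blue
PROBABLE_CUTOFF = 1.0e-4  # assumed exponent; "probable" turn, shown white


def turn_product(window):
    """Product of the positional frequencies for a four-residue window."""
    return math.prod(F_TURN[pos][residue] for pos, residue in enumerate(window))


def classify_turn(product, avg_turn, avg_alpha, avg_beta):
    """Apply the turn rules to a window's product and average potentials."""
    dominant = avg_turn > 1.0 and avg_turn > avg_alpha and avg_turn > avg_beta
    if not dominant:
        return "none"
    if product > PROBABLE_CUTOFF:
        return "probable"
    if product > POSSIBLE_CUTOFF:
        return "possible"
    return "none"
```

For the window Glu-Thr-Ser-Gln, `turn_product("ETSQ")` comes to about 7.4 × 10⁻⁵, just under the lower cutoff, so the window is not called a turn whatever its average potentials are.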
Table 3. Sample Calculation for a Small Peptide

                                           Turn-Product
Residue  Amino acid   Pα     Pβ     Pt     Probability (× 10⁵)
1        Lys          1.16   0.74   1.01   3.6
2        Thr          0.83   1.19   0.96   3.5
3        Glu          1.51   0.37   0.74   7.4
4        Thr          0.83   1.19   0.96   2.4
5        Ser          0.77   0.75   1.43   1.9
6        Gln          1.11   1.10   0.98   0.84
7        Val          1.06   1.70   0.50   0.93
8        Ala          1.42   0.83   0.66   -
9        Pro          0.57   0.55   1.52   -
10       Ala          1.42   0.83   0.66   -

Residue Prediction(a): [column entries (α, β, or -) not legible in the source]

(a) Based on four-point moving averages forward and backward compared by the rules described in the text.
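The four-point moving-average rules can be sketched in Python for the ten-residue peptide of Table 3, whose Pα and Pβ values correspond to Lys-Thr-Glu-Thr-Ser-Gln-Val-Ala-Pro-Ala (KTETSQVAPA in one-letter code). This is an illustration under stated assumptions, not the authors' implementation: the parameters are those printed in Table 1, the names `best_average` and `predict` are hypothetical, the "calculated Pβ" in the helix rule is taken to be the averaged β-sheet potential, and a window that would run past either end of the sequence is simply skipped. The calls it produces follow from those assumptions and may differ from the published prediction column.

```python
# Conformational parameters as printed in Table 1 (one-letter residue codes).
P_ALPHA = {
    "E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16, "F": 1.13,
    "Q": 1.11, "W": 1.08, "I": 1.08, "V": 1.06, "D": 1.01, "H": 1.00,
    "R": 0.96, "T": 0.83, "S": 0.77, "C": 0.70, "Y": 0.69, "N": 0.67,
    "P": 0.57, "G": 0.57,
}
P_BETA = {
    "V": 1.70, "I": 1.60, "Y": 1.47, "F": 1.38, "W": 1.37, "L": 1.30,
    "C": 1.19, "T": 1.19, "Q": 1.10, "M": 1.05, "R": 0.93, "N": 0.89,
    "H": 0.87, "A": 0.83, "S": 0.75, "G": 0.75, "K": 0.74, "P": 0.55,
    "D": 0.54, "E": 0.37,
}

HELIX_CUTOFF = 1.03
SHEET_CUTOFF = 1.05
WINDOW = 4


def best_average(values, i):
    """Larger of the forward (i..i+3) and backward (i-3..i) averages.

    Assumption: windows that do not fit inside the sequence are skipped;
    None is returned when neither window fits.
    """
    averages = []
    if i + WINDOW <= len(values):
        averages.append(sum(values[i:i + WINDOW]) / WINDOW)
    if i - WINDOW + 1 >= 0:
        averages.append(sum(values[i - WINDOW + 1:i + 1]) / WINDOW)
    return max(averages) if averages else None


def predict(sequence):
    """Return 'helix' (orange), 'sheet' (purple), or 'neither' (green) per residue."""
    alpha = [P_ALPHA[residue] for residue in sequence]
    beta = [P_BETA[residue] for residue in sequence]
    calls = []
    for i in range(len(sequence)):
        a = best_average(alpha, i)
        b = best_average(beta, i)
        if a is None or b is None:
            calls.append("neither")
        elif a > HELIX_CUTOFF and a > b:
            calls.append("helix")
        elif b > SHEET_CUTOFF and b > a:
            calls.append("sheet")
        else:
            calls.append("neither")
    return calls
```

Under these assumptions `predict("KTETSQVAPA")` alternates between short helix and sheet calls with two unassigned residues, which is in keeping with the remark that this peptide cannot clearly be assigned either α or β structure.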
Purple designates regions predicted to be hydrophobic in character and orange designates hydrophilic regions, while blue and green designate transitional regions.

The program displays predictive information in four modes. The first mode displays the same sequence five times in parallel with each line of the display relating a different piece of information. The first line color codes the α-helix parameter values of the residues in the sequence, as defined in Table 1. The second line codes the β-sheet values, also defined in Table 1, and the third the turn parameter values, from Table 2. In the fourth line, red denotes regions predicted to be α-helix, purple denotes predicted β-sheet regions, and green designates residues that are neither. The fifth row is largely blank with characters appearing only in predicted turn regions. In this line white denotes a higher predictive index than blue. The second display mode shows the helix/sheet and turn prediction lines of two sequences in parallel, while the third mode displays the helix/sheet prediction lines of four sequences in parallel. The final display mode shows the helix/sheet prediction and the hydropathic character prediction lines for a single sequence.

We find it convenient to use a C. Itoh color printer (Model
Table 2 (continued). Pt, the conformational potential of a residue in a β turn, based on all four positions of a reverse turn.

Residue  Pt        Residue  Pt
Asn      1.56      Trp      0.96
Gly      1.56      Arg      0.95
Pro      1.52      His      0.95
Asp      1.46      Glu      0.74
Ser      1.43      Ala      0.66
Cys      1.19      Met      0.60
Tyr      1.14      Phe      0.60
Lys      1.01      Leu      0.59
Gln      0.98      Val      0.50
Thr      0.96      Ile      0.47

Color assignment: ORANGE and BLUE groups [group boundaries not legible in the source]

Table 4. Residue Assignments on the Hydropathy Scale

Amino Acid: Ile, Val, Leu, Phe, Cys, Met, Ala, Gly, Thr, Trp, Ser, Tyr, Pro, His, Glu, Gln, Asp, Asn, Lys, Arg, Mean
Hydropathy Index(a): [values not legible in the source]
(a) Hydropathy index values from Kyte and Doolittle (2).

8510SCP) for producing hard copy of significant findings. This is useful in one-to-one teaching but the character size is rather small for projection slides, except to show overall patterns of color when comparing sets of sequences. It would be possible to use character form, rather than color, to convey the structural information for a protein sequence. As an example using character form we show the predicted structure of bovine rhodopsin and its hydropathy plot (see Fig. 1). The discrepancy between predicted helices and predicted membrane-spanning regions is readily apparent.

Both the Chou and Fasman (1) and the Kyte and Doolittle (2) methods find favor with biochemists in making structure predictions for proteins. Unfortunately, the statistical nature of the predictors is often neglected and results are interpreted as absolutes rather than as probabilities. Our aim is to facilitate teaching of these methods so that students and faculty can gain a feeling for their reliability and sensitivity. Data entry and editing are sufficiently easy that it is possible for a student to enter quickly, for instance, four variants of a particular sequence, and to see how the amino acid substitutions alter the predicted conformation of a polypeptide. We have found this of use in designing synthetic peptides in studies of bovine rhodopsin (3) and in comparing nucleotide binding domains in nitrogenase (4).

The use of color in presenting a structure prediction of a sequence has a decided advantage over other methods because the human eye is able to take in and process more information with a range of colors than with simple black-and-white presentation. This allows more compact presentation of results. Earlier presentations of Chou and Fasman predictions (5) typically showed the protein sequence on one line and on a separate line some series of symbols indicating probable conformation. Using color, it is possible to contain the same information on a single line, facilitating comparisons of different sequences one above the other. The eye then detects regularities in the color distribution of the four sequences as a gestalt, rather than character by character. Earlier presentations of the Kyte and Doolittle (2) prediction typically showed a graph of hydropathy vs. sequence position number. With color, the pattern of alternating domains becomes readily apparent while still showing the amino acid sequence. The nature of the moving average means that there are no rapid fluctuations, so blocking out the scores arbitrarily as we have done produces no significant loss of information. The cutoff values that we have assigned for purple (hydrophobic) and red (hydrophilic) correspond very closely to known crystallographic domains, either buried or exposed, as discussed by Kyte and Doolittle (2) and give very few false positive predictions.

Within the transitional score regions (green and blue) there is much less certainty, and no reliable prediction can be made without further information. When comparing predicted helical regions and hydrophobic regions for a protein such as bovine rhodopsin, it is apparent that the Chou and Fasman method fails to predict the membrane-spanning helices of this molecule (6), presumably because the method was based on crystal structures of mostly globular proteins (7). Argos et al. (7) showed, using bacteriorhodopsin, that the membrane-spanning helices had an amino acid distribution that correlated only weakly with either predicted α or β regions using the Chou and Fasman criteria. They developed an empirical set of predictors, similar to that used by Kyte and Doolittle but slightly different in rank order. In a similar way it would be simple to replace the probability table of Chou and Fasman with another table based on more recent crystal structures of specific classes of proteins if one wished to obtain more refined probabilities for structure prediction within a class. Simply adding more proteins to this same table did little to improve predictive accuracy (8). Using both Chou and Fasman (1) and Kyte and Doolittle (2) methods together one can attempt to make some intelligent guesses about regions with ambiguous scores. For instance, alternating hydrophilic and hydrophobic residues will give an intermediate hydropathy value but may well be strongly predicted as α-helical. This might represent a helix lying along the surface of a protein with one side buried and the other exposed.

The program Protein Structure Prediction is available from Project SERAPHIM at $5 for a 5¼-in. disk plus $2 domestic or $10 foreign postage. Make checks payable to Project SERAPHIM. To order or to get a Project SERAPHIM Catalogue write to Project SERAPHIM, Department of Chemistry, Eastern Michigan University, Ypsilanti, MI 48197.

Volume 64 Number 7 July 1987

Figure 1. Structure predictions for bovine rhodopsin compared using the Chou and Fasman (1) method and the Kyte and Doolittle (2) method. The underlined regions are those predicted by Hargrave et al. (6) to be membrane-spanning helices. In the top line upper case italic is the helix, lower case italic is undetermined, and upper case bold is β-sheet. For the lower line the upper case italic is most hydrophilic, lower case italic is moderately hydrophilic, lower case bold is moderately hydrophobic, and upper case bold is strongly hydrophobic.

Acknowledgment

This work was supported in part by NIH Grant GM 23039 and by the Kansas Agricultural Experiment Station. This is contribution 86-117j from the Kansas Agricultural Experiment Station.
Computer Programming in General Chemistry
G. L. Breneman and O. J. Parker
Eastern Washington University, Cheney, WA 99004
Computer availability has increased to the point where most general chemistry students have easy access to computers. Many of these students also know something about programming. It may be time to take advantage of this and have students start programming computers to solve problems as an integral part of their general chemistry course. The possibility of students doing their own programming and how to introduce this into the curriculum are discussed in this paper.
Are Pocket Computers for Real?
Most people scoff at the mention of pocket computers; they envision the old programmable calculator. But there have been great advances recently in producing a true computer the size of a calculator. These have an alphanumeric keyboard, an excellent version of BASIC in ROM (more sophisticated than Applesoft), and enough memory in RAM to handle all the programming assignments we have in our Physical Chemistry Lab course easily. The Sharp EL-5500 II with a 10K ROM and 4.2K RAM is typical. It allows several programs to be stored in memory at the same time and has a separate calculator mode with keys for all of the usual scientific functions including standard deviation and linear least squares. It has been selling for $70, and a combination small printer and cassette recorder interface is available for $55. The inexpensive pocket computer will lead to everybody having his or her own computer just as everybody had to have a calculator during the last decade.

Implementation
In a typical school a mixture of terminals, microcomputers, and pocket computers furnished by the school and/or the students will be available. However, this could involve as many as 10 different types of computers and student programming ability from none to expert. An approach that gives everybody access to the programming projects must be carefully designed.
BASIC is the only computer language generally available on a wide range of computers. Fortunately it is also one of the easiest to learn and allows programs of modest length to be quite significant. These factors should not be underestimated when we are dealing with nonprogrammers. We need to convince them that some programming is worth their time, and that means they must see significant results with reasonable effort on their part.

A chemistry classroom is not a computer science classroom. Thus most of the exercises should not involve the student writing all of the programs needed to do the work. Initially the student is furnished with programs that he enters, debugs, and runs. Subsequent problems would require small changes in the program to handle some of the exercises and finally, depending on the group of students, there could be a few situations where the students write their own programs. As the ability of incoming students increases, assignments can shift toward more actual program writing.

Our Approach

We have a number of terminals on a VAX minicomputer available in our department and in several other campus locations. This is the main resource we provide to the students. Some of our students have their own microcomputers and increasingly their own pocket computers. During a classroom demonstration showing how to use the chemistry department's program library, we show students how to access BASIC and then enter, debug, and edit a simple program for calculating density. Students follow this demonstration step by step on a TV monitor and on a printed handout that they keep for future reference. Each programming exercise shows the program listing followed by a sample run with typical data. This sample run is used by the student to determine whether his program is running correctly after he has entered it into a computer. We assume there is no way to save the programs after they are used.
Our class is not provided this option on our VAX, and many students' own computers may not have disk or tape storage. This, along with some students' inexperience, makes short programs and modest-length assignments desirable. If storage is available, students could build up a substantial library of chemistry programs. We have started out with three extra-credit assignments per quarter, often giving students a choice between two different assignments so they can choose a topic of more interest to them. As computer access becomes even more available, we expect to make these assignments a regular part of the course work.
7. [authors and journal title truncated in the source] Biochem. 1982, 128, 565.
8. Argos, P.; Hanei, M.; Garavito, R. M. FEBS Lett. 1978, 93(1), 19.

Table 5. Listing of Weak Acid Titration Curve Program

10 PRINT "WEAK ACID TITRATION"
20 INPUT "KA=";KA
30 INPUT "CONC=";CA
40 INPUT "VOL=";VA
50 KW=1E-14
55 KB=KW/KA
60 CB=.1
65 NA=CA*VA
70 PRINT " 0 2 4 6 8 10 12 14 PH"
80 PRINT "
90 FOR VB=0 TO 65 STEP 5
100 IF VB=0 THEN H=SQR(KA*CA)\GOTO 150
110 NB=CB*VB
120 IF NB
[The remainder of line 120 and lines 130-250 are truncated in the source.]
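Because the listing breaks off at line 120, the calculation it begins can only be sketched, not reconstructed. The Python sketch below keeps what the surviving lines show (Kw = 1E-14, a 0.1 M base, moles of acid from concentration times volume, base volumes from 0 to 65 in steps of 5, and the starting approximation H = SQR(KA*CA)) and fills in the rest with the usual textbook approximations for a weak acid titrated with a strong base. Those later branches, and the names `titration_ph` and `titration_curve`, are assumptions and are not taken from the missing BASIC lines.

```python
import math

KW = 1.0e-14  # ion product of water, as in line 50 of the listing


def titration_ph(ka, ca, va, vb, cb=0.1):
    """Approximate pH after adding vb mL of base (cb M) to va mL of acid (ca M)."""
    na = ca * va      # mmol of weak acid initially present
    nb = cb * vb      # mmol of strong base added
    total = va + vb   # total volume in mL
    if vb == 0:
        h = math.sqrt(ka * ca)           # starting point, as in line 100
    elif math.isclose(na, nb):
        kb = KW / ka
        oh = math.sqrt(kb * na / total)  # hydrolysis of the conjugate base
        h = KW / oh
    elif nb < na:
        h = ka * (na - nb) / nb          # buffer region approximation
    else:
        oh = (nb - na) / total           # excess strong base
        h = KW / oh
    return -math.log10(h)


def titration_curve(ka, ca, va, cb=0.1):
    """(volume, pH) pairs for base volumes 0 to 65 mL in 5-mL steps."""
    return [(vb, titration_ph(ka, ca, va, vb, cb)) for vb in range(0, 70, 5)]
```

For example, `titration_curve(1.8e-5, 0.1, 25)` tabulates 25 mL of 0.1 M acetic acid; the buffer approximation is crude for the first few percent of the titration, which a student exercise could explore.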
Electronic Journal To Be Produced

The Journal of Chemical Education and Project SERAPHIM will be jointly producing an electronic journal under the sponsorship of the Dreyfus Foundation. Details are given in a note on page 644 of this issue.