Stephen D. Daubert and Stephen F. Sontum, Department of Biochemistry and Biophysics, University of California, Davis, 95616
Computer Simulation of the Determination of Amino Acid Sequences in Polypeptides
We have written a program that provides the experimental setting in which biochemistry students can apply the methodology of protein sequencing. With the aid of the computer, the students learn in an active, participatory mode, just as they would if the material were presented to them in the laboratory, but at far less expense. The program encourages the students to teach themselves, since it contains mechanistic descriptions of all the reagents (Fig. 1); these can be seen by the user upon request. Thus, the program has the potential for bringing all the students to the same level of understanding, and it releases lecture time which would otherwise be used for listing the specificities of sequencing reagents. This gives the instructor the chance to explore the subject in greater depth. Exposure of the students to the computer comes as a side benefit.

MASTERMIND/PROTEIN, as we have named the program, was constructed by analogy with the game of logic commonly called "Mastermind", in which the object is to deduce the order of a short sequence of characters using the fewest possible steps. The difficulty increases as a power of the number of possible characters. In the conceptual frame of MASTERMIND/PROTEIN, with 20 possible characters for each position, it is very nearly impossible to arrive at the correct sequence without the aid of the clues provided in the form of results from the application of various protein sequencing tools. These tools generate small, and ideally overlapping, oligomer sequences from a larger polymer. The simulated application of these chemical or enzymatic reagents enables the student to deduce the sequences of given unknown heteropolymers, using exactly the same logic that biochemists use in the laboratory.

The program generates a string of amino acid characters, the length of which is chosen by the student.
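To make the size of the guessing problem concrete, the following minimal Python sketch counts the possible sequences for a chosen length and draws one random peptide. It is an illustration only, not the authors' program: the names `AMINO_ACIDS`, `sequence_space`, and `random_peptide` are hypothetical, and independent, uniform selection of each residue is an assumption.

```python
import random

# Three-letter codes for the 20 standard amino acids.
AMINO_ACIDS = [
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
]


def sequence_space(length, alphabet_size=len(AMINO_ACIDS)):
    """Number of distinct sequences of the given length."""
    return alphabet_size ** length


def random_peptide(length, rng=None):
    """Return a list of residues, each drawn uniformly (an assumption)."""
    rng = rng or random.Random()
    return [rng.choice(AMINO_ACIDS) for _ in range(length)]


if __name__ == "__main__":
    print(sequence_space(7))            # 20**7 = 1280000000
    print("-".join(random_peptide(7)))  # differs from run to run
```

With 20 residues per position, even a heptapeptide has more than a billion candidate sequences, which is why blind guessing is impractical.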
The order of the amino acids in the string is set by a function of the product of the output of the computer's random number generator and the time of day, in microseconds. Thus it is highly unlikely that any two students will be given identical peptides. The main operation in the program is the scanning of the string of characters for the selection of the subset of characters at which the string is cleaved. Some of the other algorithms in the program are listed in the Appendix. The program can provide a hard copy of the student's sequences, including a record of how close to correct the final prediction came.

MASTERMIND/PROTEIN does not contain an operation that simulates an automated Edman degradation, because merely asking for the sequential listing of the first 35 residues in the chain would not be challenging. The computer generates the type of data that a protein chemist who could not afford a sequenator would be likely to encounter. Short-range manual Edman degradation data are analogous to the results of the exopeptidase reactions.

The user can command operations in a sequence. If a given operation generates several fragments from the original parent polymer, subsequent commands can be directed to a fragment whose number is designated within parentheses. (Undesignated commands refer to the largest fragment produced from the previous command; if no cleavage has occurred, that largest fragment is the parent polymer.) The numbers refer to the set of fragments generated by the penultimate operation, and they are assigned according to fragment size. For example, Jane Student asks the program to generate a heptapeptide for
THE AVAILABLE OPERATIONS ARE:
CARBOXYPEPTIDASES A AND B
CYANOGEN BROMIDE
N-BROMO-SUCCINIMIDE @ PH 3.5
N-BROMO-SUCCINIMIDE @ PH 4.5
SODIUM METAL IN LIQUID AMMONIA
TRYPSIN
HYDROXYL-AMINE
LEUCINE AMINO PEPTIDASE
SANGER'S REAGENT
ISOELECTRIC POINT OF PEPTIDE
COMP-AMINO ACID COMPOSITION OF THE PEPTIDE

OPERATIONS (THIS LIST)
MECHANISM OF ACTION OF REAGENTS
GUESS

PANCREATIC RNA-ASE (RNA-ASE A)
RNA-ASE A + CHEMICAL MODIFICATION
T1 RNA-ASE
U2 RNA-ASE
SNAKE VENOM PHOSPHODIESTERASE
SPLEEN PHOSPHODIESTERASE
NEAREST NEIGHBOR ANALYSIS
NUCLEOTIDE COMPOSITION
Figure 1. Operations available to (a) the protein sequencing program as they appear upon request; the two-letter codes seen at the left are all that need be entered by the user. (b) Common operations. (c) Different nomenclature for selected operations in (a). These operations would use the same programming, but would be used for the potential revision called MASTERMIND/NUCLEIC ACIDS.
her to sequence. The program comes back to her saying that a heptapeptide could have any of 20^7 possible sequences, so she had better not try to solve the problem by just guessing. She agrees, and asks: 1) the identity of the N-terminal amino acid, 2) how many fragments are generated by trypsin treatment, 3) how many fragments are generated by cyanogen bromide treatment, 4) how many fragments are generated by pepsin treatment of the largest fragment that was generated by cyanogen bromide treatment, and 5) the composition of the largest fragment generated in 4). This question is posed in the form of the string of commands: SANGER'S, TRYPSIN, CNBR, PEPSIN(1), COMPOSITION(1). The output appears as in Figure 2.

From this output she deduces: 1) N-terminal ALA, 2) that if there is ARG or LYS present, it would have to be on the C-terminal end, 3) an internal MET, and 4) a pepsin cleavage site (LEU) which must contribute the N-terminal member of the trimer produced by pepsin from the C-terminal end of her original peptide. At this point, Jane asks for the composition of the other peptic fragment from the large cyanogen bromide product by keying CN, PE(1), CO(2), and finds that it contains two glycine residues. She then observes that she can now unambiguously assign one sequence, out of more than a billion possibilities, to her unknown heptapeptide. (As you have probably figured out, her sequence is ALA-MET-GLY-GLY-LEU-THR-LYS.)

MASTERMIND/PROTEIN should provide a workable platform upon which to generate a sister program dealing with

Volume 54, Number 1, January 1977 / 35
SA
SANGER'S REAGENT
N-TERMINAL RESIDUE OF FRAGMENT 1 IS ALA
TR
TRYPSIN
OPERATING ON FRAGMENT 1 GENERATES 1 FRAGMENT
CN
CYANOGEN BROMIDE
OPERATING ON FRAGMENT 1 GENERATES 2 FRAGMENTS
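The worked example with Jane's heptapeptide can be retraced with a short Python sketch of the scan-and-cleave idea. This is a hedged illustration, not the program's actual code: the names `RULES`, `cleave`, `n_terminal`, and `composition` are hypothetical, and the cleavage rules are deliberately simplified assumptions (trypsin after LYS or ARG, cyanogen bromide after MET, pepsin before LEU and the aromatic residues). Fragments are numbered by size with the largest first, as the command PEPSIN(1) in the text implies.

```python
from collections import Counter

# Simplified cleavage rules (assumptions for illustration only).
# "after"  = cut on the C-terminal side of the listed residues.
# "before" = cut on the N-terminal side of the listed residues.
RULES = {
    "TRYPSIN": ({"LYS", "ARG"}, "after"),
    "CNBR": ({"MET"}, "after"),
    "PEPSIN": ({"LEU", "PHE", "TRP", "TYR"}, "before"),
}


def cleave(peptide, reagent):
    """Scan the residues once and split at every site for the reagent."""
    sites, side = RULES[reagent]
    fragments, current = [], []
    for residue in peptide:
        if side == "before" and residue in sites and current:
            fragments.append(current)
            current = []
        current.append(residue)
        if side == "after" and residue in sites:
            fragments.append(current)
            current = []
    if current:
        fragments.append(current)
    # Fragment 1 is the largest; ties keep their order along the chain.
    return sorted(fragments, key=len, reverse=True)


def n_terminal(peptide):
    """Residue that an N-terminal labeling reagent would identify."""
    return peptide[0]


def composition(peptide):
    """Residue counts, without any order information."""
    return Counter(peptide)


if __name__ == "__main__":
    jane = "ALA-MET-GLY-GLY-LEU-THR-LYS".split("-")
    print(n_terminal(jane))               # ALA
    print(len(cleave(jane, "TRYPSIN")))   # 1
    cnbr = cleave(jane, "CNBR")
    print(len(cnbr))                      # 2
    peptic = cleave(cnbr[0], "PEPSIN")    # PEPSIN(1)
    print(composition(peptic[0]))         # COMPOSITION(1)
    print(composition(peptic[1]))         # COMPOSITION(2)
```

Under these assumed rules the sketch gives the same fragment counts as Figure 2 and yields two glycine residues for the second peptic fragment, matching the deduction in the text.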
is calculated by varying the pH by increment