Computer Series, 94
edited by JOHN W. MOORE Eastern Michigan University, Ypsilanti, MI 48197

Programming Style
G. Scott Owen
Department of Mathematics and Computer Science, Georgia State University, Atlanta, GA 30303

Most scientists are, like myself, self-taught programmers. This is changing since many students now take programming courses in computer science departments, but the older breed "did it on their own". Many of us used FORTRAN IV on mainframes and then BASIC on the early microcomputers, such as the Apple II. The books we used gave us all of the tricks necessary to make maximum use of limited memory (maybe only 20 KB or less) and slow processors. The languages had very limited control constructs, frequently only a GOTO statement and a very limited IF-THEN statement. The machines supported upper case characters only, since after all "If God had wanted us to use lower case he would have put it on the ASR Teletype 33" (1).

The problem is that computer technology has changed greatly but many of us have not. We would feel disgraced as professionals if we refused to accept new developments in our scientific field the way we refuse them in computer technology. I feel that it is our professional responsibility as teachers to provide a good example for our students in all technical areas (I will leave our private lives out of this), including computer science. In this article, I will discuss why we should use good programming style and provide several tips on how to accomplish this.

Rationale
The primary reason to use good coding style in writing programs is clarity and readability. Very few programs are written and then never changed. In large software systems the maintenance phase, which consists of finding errors that surface after the system is delivered, enhancing the system, and moving the system to other hardware, costs an estimated 60% of the total amount of money spent on the system. Thus, over the total lifetime of the system, maintenance is more expensive than the initial development. One of the best ways to reduce this maintenance cost is to make the program code easy to read and understand, since an estimated 50-60% of maintenance time is spent in comprehension of the old program. These statistics are valid for both microcomputer and mainframe programming projects.

You may never develop a large system, but the same principle holds true for smaller programs. Frequently you may write a program, use it, and then, six months later, want to change something in the original program, use part of the code in a new program, or move it to a new hardware/software environment. If it is particularly useful, you may want to have others use your program, and so you might write a paper based on it or else give it away to others. For all of the above reasons, you should make your program as easy to understand as possible.

Style Guidelines
In this section I will briefly discuss several style issues. Some languages, for example Pascal as opposed to BASIC, lend themselves more naturally to good coding style, but you can write bad programs in any language and good programs in almost any language (I have heard it said that APL programmers never modify their programs since it is easier to write a new program than to understand and modify an old one). I will try not to be too rigid or pedantic in my presentation, and most of the remarks are language independent. For a general discussion on choice of a language, read reference 2.

Modular Programming
One of the most important components of style is the overall program structure. It is well established (3) that building programs in a modular fashion makes the development process easier and simplifies later program modifications. Studies have shown (4) that humans are unable to think simultaneously about more than about five to nine chunks of information. Beyond this level of complexity, humans begin to lose track. The implications of this are that we cannot solve complex problems in one large effort but, rather, we must use the "divide and conquer" strategy of breaking up intractable complex problems into simpler subproblems, which can be handled by our limited brains. This means that in designing a nontrivial computer program we should break the program into modules rather than viewing it as a monolithic entity. We can then construct the program, one module at a time, testing each module as we build it. This makes the debugging process much easier and leads to a better structured and more reliable program. We can do this by initially writing the main program, which primarily consists of calls to subroutines or procedures that perform the different tasks. For each subroutine we write a "stub", which is just the name and an output statement that indicates we have indeed reached that subroutine. We can then compile and test this initial version to ensure that we are calling all of the subroutines and in the correct order. We then implement each of the subroutines, one by one, testing each one individually. It may be necessary to write some extra testing code for the different modules, called a "test harness". This is analogous to the scaffolding that surrounds a new building as it is being constructed and is removed once the building is complete. 
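The stub-first approach described above can be sketched as follows. This is only an illustration, written in Python rather than in one of the languages discussed in this article; the names read_input, process, report, and call_log are hypothetical and chosen just for the sketch.

```python
# Minimal sketch of a stub-first program skeleton.
# All names here are hypothetical; each stub only announces that it was reached.

call_log = []  # records the order in which the stubs are reached


def read_input():
    """Stub: will later read the raw data."""
    print("in read_input")
    call_log.append("read_input")
    return []


def process(data):
    """Stub: will later perform the calculations."""
    print("in process")
    call_log.append("process")
    return data


def report(results):
    """Stub: will later write out the results."""
    print("in report")
    call_log.append("report")


def main():
    """Main program: little more than calls to the subprograms, in order."""
    data = read_input()
    results = process(data)
    report(results)


if __name__ == "__main__":
    main()
```

Running this first version confirms that every subprogram is reached and in the intended order; each stub can then be replaced by its real implementation and tested on its own. The call_log list plays the role of a very small test harness and can be removed once the program is complete.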
As an example of this process, let us assume that we are going to write a program that will read in some experimental data from an external file, perform some calculations on the data (for example, a least-squares analysis or digital filtering), plot the data, and then write the massaged data out to another external file. We would first write the main program, which would have calls to the following subroutines:

EnterData: subroutine that reads in the data from an external file.
ComputeData: subroutine that performs the calculations on the data.
PlotData: subroutine that plots the data.
WriteData: subroutine that writes the results to an external file.

In the first program version, each of the above subroutines would be just a stub, for example, for EnterData:

    procedure EnterData(var Data {out}: DataType);
    {procedure to read in the data from an external file}
    begin
      writeln('in procedure EnterData')
    end; {EnterData}

This version of the program can be compiled and checked to ensure that all syntax errors are removed and that the subroutines are being called in the correct order. The next step would be to implement each subroutine, recompile the program, and test the subroutine. The first to be implemented would be the EnterData procedure. To test this procedure we would include some code to write the data to the screen as it is read from the external file. This code could be removed later. Then, one by one, the other subroutines would be implemented and the program recompiled and tested for each one. For testing some subroutines, such as WriteData (writing the data to an external file), it may be necessary to have an extra temporary subroutine that would read back the file and display it to ensure that WriteData was correct (or we could use the EnterData subroutine for this if the file format was the same).

Another advantage of modular programming is reusable code. For example, you can use the EnterData subroutine in other programs, either as is or with minor modifications. It is much easier and quicker to construct a new program from building blocks than from scratch. The resultant program will also be more reliable since it is constructed from previously tested components. Serious programmers all have their own libraries of subroutines that they use regularly.

Comments

You should always use comments to tell what the program is supposed to be doing. Every subprogram (procedures, functions, subroutines, etc.), including the main program, should have a prologue consisting of the following:

1. The purpose of the subprogram.
2. The input and output variables of the subprogram; any restrictions on these variables, for example, only positive integers for a factorial function; and error conditions.
3. If a complex algorithm is used, a brief description of the algorithm and perhaps a literature reference.
4. The author of the subprogram and the creation date.
5. A log of all changes to the subprogram giving the change, author, and date of the change.

The main prologue should also give the language processor and the version used, for example, "Turbo Pascal version 3.01". Some of the above information may be implicit in the subprogram specification and does not have to be in the prologue, as for example:

    procedure compute_factorial(number: {in positive} integer; var {out} answer: integer);

In addition to this prologue, use comments in the body of the program to illuminate obscure steps. This is more important for BASIC or FORTRAN than for Pascal since Pascal tends to be more self-documenting. Another very important use of comments is when you have done something unusual. People have had great difficulty in interpreting Apple II programs because of the many undocumented tricks others have used. Some examples of this follow (some of these pertain to the IBM PC world as well).

(1) Changes to DOS: either do not make them or document them very thoroughly, and be sure to restore the original state when you exit.
(2) Graphics: explain how you have packed and unpacked screens and any other weird things you have done; also be sure to get out of graphics mode and back to text mode when the program terminates.
(3) If you use utilities and/or other things that are in files separate from the program, be sure to indicate which other files must be transferred with your program in order for it to work.
(4) Machine-language calls: document them very carefully and tell people what you are doing.
(5) Document any changes in the memory map that your program makes and restore the normal situation when you exit.
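As an illustration of the prologue recommended in this section, here is a minimal sketch for the factorial example. It is written in Python, which is an assumption of this sketch and not a language used in the article; the author name, dates, and language-processor line are placeholders, and the docstring layout is only one possible arrangement of the five prologue items.

```python
# Language processor: Python 3 (assumed for this sketch; record the exact
# version you actually used, as the main prologue should).


def compute_factorial(number):
    """Return the factorial of a positive integer.

    Purpose:     compute number! = 1 * 2 * ... * number.
    Input:       number, an integer; restricted to positive values (>= 1).
    Output:      the factorial, returned as an integer.
    Errors:      raises ValueError if number is not a positive integer.
    Algorithm:   simple iterative product; no literature reference needed.
    Author:      A. N. Author (placeholder), created 1988-08-01 (placeholder).
    Change log:  none yet; list each change with its author and date.
    """
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(number, bool) or not isinstance(number, int) or number < 1:
        raise ValueError("number must be a positive integer")
    answer = 1
    for factor in range(2, number + 1):
        answer *= factor
    return answer
```

For example, compute_factorial(5) returns 120, while compute_factorial(0) raises ValueError, which is the error condition the prologue promises.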
Identifier Names
Use good mnemonics for identifier names; for example, molar_concentration is easier to understand than X. Do not use all upper case letters. Studies have shown that lower case letters are much easier to read than upper case letters. Humans are excellent at pattern recognition, and lower case letters provide a richer pattern since they have height variation, whereas upper case letters are all the same height. Use either the underscore character, as shown above, or mixed upper/lower case for multiple word identifiers, for example, MolarConcentration.

You may have two objections to the above suggestions. The first is that long names require too much typing. One answer to this is that, except for trivial programs, the typing time is much less than the debugging time. A second answer is to use a text editor with find and replace capabilities. Thus, the above long name could originally be typed as mc and then be replaced by the full name. The second objection, which pertains only to programmers using interpreted BASIC, is that long variable names take up too much memory. Except for those who are still trying to write large programs on a 64KB Apple II, this should not be a concern. If your program will not fit into a 64KB space then you need a different language. Even for Apple II programmers, there are freeware packing programs that will condense a good readable program into a smaller, faster, and totally unreadable program. So you can use the good program for maintenance and then pack it for execution.

Control Constructs
A program is much easier to understand if the dynamic flow of control mirrors the static text. This means that the dynamic flow starts at the top and, except for subprogram calls, moves down the text page. This can be accomplished by the use of modern control constructs and avoidance of undisciplined use of the GOTO statement. If you program in BASIC or FORTRAN it may be necessary to use the GOTO, but use it only to emulate modern control constructs, as in the following examples:

While-Do statement

    while (something is true) do
    begin
      (block of statements)
    end

Using a GOTO to emulate a While-Do statement

    890 rem while-do loop
    900 if (something is not true) goto 1000
        (block of statements)
    990 goto 900
    1000 rem

Repeat-Until statement

    repeat
      (block of statements)
    until (something becomes true)

Using a GOTO to emulate a Repeat-Until statement

    890 rem repeat-until loop
    900 (block of statements)
    990 if (something is not true) goto 900
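The same discipline applies in any language that lacks one of these constructs. As a hedged illustration in Python (an assumption of this sketch, since Python is not discussed in this article), the while statement is built in, but there is no repeat-until, so it is emulated here with one fixed pattern: an endless loop whose only exit is a test at the bottom. The function names are hypothetical.

```python
def count_down_while(start):
    """While-do: the test comes first, so the body may run zero times."""
    visited = []
    n = start
    while n > 0:
        visited.append(n)
        n -= 1
    return visited


def count_down_repeat(start):
    """Repeat-until emulation: the body always runs at least once."""
    visited = []
    n = start
    while True:          # "repeat"
        visited.append(n)
        n -= 1
        if n <= 0:       # "until n <= 0"
            break
    return visited
```

The difference between the two constructs shows up when the condition is already satisfied on entry: count_down_while(0) returns an empty list, whereas count_down_repeat(0) executes its body once and returns [0].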
Journal of Chemical Education, Volume 65, Number 8, August 1988
Use IF-THEN statements rather than GOTOs for decision making. The following BASIC code checks two arrays for the number of different elements and also marks the position of the first different element. The first example is from a program submitted for publication (it has been slightly disguised to protect the guilty) and uses GOTOs; the second version uses a nested IF-THEN. The second version also dispenses with the unnecessary variable NERR. The code fragment is in all upper case since IBM PC BASIC converts all noncomments to upper case.

Using GOTO

    9338 FOR I = 1 TO ARRAY_LENGTH
    9340 IF ARRAY_1(I) = ARRAY_2(I) GOTO 9348
    9342 IF NERR <> 0 GOTO 9346
    9344 NERR = 1: FIRST_ERROR = I
    9346 MISSED = MISSED + 1
    9348 NEXT I

Using nested IF-THEN

    9336 FIRST_ERROR = 0
    9338 FOR I = 1 TO ARRAY_LENGTH
    9340 IF ARRAY_1(I) <> ARRAY_2(I) THEN MISSED = MISSED + 1: IF FIRST_ERROR = 0 THEN FIRST_ERROR = I
    9348 NEXT I
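For comparison, the nested IF-THEN logic can be sketched in a language with block-structured control constructs. The version below is in Python, which is an assumption of this sketch; the function name compare_arrays is hypothetical, positions are counted from 1 to match the BASIC fragment, and the two sequences are assumed to have the same length.

```python
def compare_arrays(array_1, array_2):
    """Count differing elements and note the 1-based position of the first.

    Assumes both sequences have the same length (zip stops at the shorter).
    A first_error of 0 means that no difference was found.
    """
    missed = 0
    first_error = 0
    for position, (a, b) in enumerate(zip(array_1, array_2), start=1):
        if a != b:
            missed += 1
            if first_error == 0:
                first_error = position
    return missed, first_error
```

With the inputs [1, 2, 3, 4] and [1, 9, 3, 8], the function returns (2, 2): two elements differ, and the first difference is at position 2.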
The nested IF-THEN version is easier to follow than the GOTO version because the GOTO statements cause the reader to skip forward and backward, whereas with the nested IF-THEN the flow is continuous through the loop.

White Space
Unlike space in a journal, space in a computer or on the screen (called white space) is quite cheap; use the space to make the program more readable. Several guidelines are given below for the use of white space.

1. Put blanks on both sides of operators:

    a := b;        not    a:=b;
    a + b - c      not    a+b-c

2. Indent blocks by 2-4 characters.

    begin
      for i := 1 to count do
        write(count);
    end

not

    begin
    for i := 1 to count do
    write(count);
    end

3. Put blank lines between subprograms and after subprogram headings. In BASIC programs a blank line can be emulated by

    1000 REM

Conclusion
Building a program in modules speeds up the development process and results in a more understandable and reliable program. It is also an excellent way to build up your own library of reusable subroutines. Using good coding style adds only slightly to the work of writing a program, but it greatly increases the readability of the program. This means that the program can be understood and modified by others (and by yourself at some later date) much more easily, and it is well worth the extra effort. For a more detailed discussion of programming style, read the references given below (2, 3, 5, 6).

Literature Cited

1. Blinn, J. Presented at the Association for Computing Machinery SIGGRAPH meeting, Anaheim, California, June, 1987.
2. Owen, G. S. J. Chem. Educ. 1984, 61, 440.
3. Kernighan, B.; Plauger, P. The Elements of Programming Style; McGraw-Hill: New York, 1974.
4. Miller, G. "The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information"; Psych. Review, 1956, (March).
5. Ledgard, H. F.; Hueras, J. F.; Nagin, P. A. Pascal with Style: Programming Proverbs; Hayden Book, 1979.
6. Nevison, J. M. The Little Book of BASIC Style; Addison-Wesley: Reading, MA, 1978.

Journal of Chemical Education: Software

First MS-DOS/PC Compatible Issue

JCE: Software announces its first issue in Series B (for MS-DOS/IBM PC compatible computers). Volume IB, Number 1 is titled "KC? Discoverer" and was written by Aw Feng, John Moore, William Harwood, and Robert Gayhart. A 50-page workbook for students and faculty (written by William Harwood, Elizabeth Moore, John Moore, and Tamar Y. Susskind) accompanies the program. It is described in detail in the Abstract on page 695.

Volume IB, Number 1, "KC? Discoverer" will be shipped in October 1988. It has a special introductory price of $20 ($22 foreign) for individuals until September 30, 1988. Subsequently the price will be $35 ($37 foreign). To reserve a disk for delivery in October: (1) complete the form below (photocopy is acceptable); (2) make a check payable to JCE: Software; (3) send both to JCE: Software, c/o Project SERAPHIM, Department of Chemistry, Eastern Michigan University, Ypsilanti, MI 48197. Payment must be made in U.S. funds drawn on a U.S. bank or by international money order or magnetically encoded check.

I would like the floppy disk Vol. IB, No. 1 of JCE: Software ("KC? Discoverer") shipped to the address below. My check or money order for $20 (or $22) accompanies this request. ($35 or $37 after September 30, 1988.)
City    State    Zip