Computers as Tools for Synthesis, Experimentation, and Information Handling

WILLIAM ORCHARD-HAYS Corporation for Economic & Industrial Research, 1200 Jefferson Davis Highway, Arlington, Va.

Computers are becoming more accessible and useful to us through new techniques of communication: problem-oriented “languages,” specialized coding systems, and an emerging data processing discipline.

It is sometimes said that computers have created more problems than they have solved. While this is facetious, there is an element of truth to it. The use of ever more powerful computers has created or intensified problems to which little thought was previously given. Oddly enough, these are the problems which seem to hold the greatest promise for future applications of the machines themselves.

Computing machines were developed to their present high state primarily for scientific and engineering calculations. Their application to commercial problems has led to the development of computers designed for that purpose. However, problems which do not fall easily within either of these classes occur in operations research, management, and economics studies. These present many of the same computing difficulties as business and commercial problems despite their mathematical nature. Significantly, large computers are now called electronic data processing systems regardless of whether they are designed for scientific or commercial applications. A strict dichotomy between the two is not possible; there is really a whole continuous spectrum of problem types.

Generally speaking, scientific calculations are characterized by being well defined and concisely stated, usually in mathematical symbolism. It is often sufficient to use stereotyped formats for input and simple tabular formats for output. Such calculations seem designed for mechanization. In fact, a great deal of such computation was done a decade ago on punched card equipment. However, there are situations where computations become almost secondary to the logic involved in tying the various parts together. Farther along the spectrum, the logic becomes more and more dominant. At the far end there are problems which cannot rightfully be called computing at all, such as storage and retrieval of information and translation of languages.

Modern large-scale computers have three unique characteristics. First, they are electronic and thus can achieve speed and accuracy which would be impossible with mechanical equipment. Second, they have the ability to carry out long strings of predetermined instructions automatically. Actually this idea is not new. It was developed over a hundred years ago by Charles Babbage and even before him by Jacquard with his punched-card controlled looms. Third, there is Von Neumann’s idea of storing the instructions to the machine within the machine itself, in the same form as other information such as numbers or text. This allows the machine to change its own instructions on the basis of predetermined tests. Thus, although the instructions read into the machine are deterministic, the ultimate course of calculations or other manipulations in any particular case can be made to depend upon a very large number of complicated relationships of the input data. Many existing computer programs contain literally tens of thousands of possible paths within their logical structure. It is clearly impractical to test every one of these possible courses of action separately and completely. Thus it is necessary to develop new techniques not only for preparing such programs but also for debugging and controlling them.

Three areas in which computers are currently used are discussed in the sequel. The first is that of simulation, mathematical models, and such techniques of operations research and economic analysis. The second area is in designing experimental tests by taking advantage of the computer’s ability to combine, organize, relate, and make decisions with large masses of numbers. The third involves the problem of information storage and retrieval. Obviously, these areas are not separate and distinct but have considerable overlap.

Operations Research

Many tools of operations research are practical only with large-scale computers: for example, linear programming; correlation, regression, and other
statistical analyses; and large-scale matrix manipulations of various kinds. One way to provide for running such problems is to have ready-made codes for specific problem types, and many such routines exist. They can be called specialized service techniques, and they have certain advantages and certain difficulties.

The main trouble is that the user of such specialized service techniques must learn the operation of the programs and adapt his problem to their peculiarities. For this trouble, however, several advantages are obtained. Precoding a routine or system of routines for any problem of a particular type allows heavy programming costs to be spread over many jobs. Furthermore, there is reasonable assurance of obtaining the best available programming effort for this problem while at the same time being restrained to stay within the limits deemed necessary by those with experience in the matter. The codes can be made to operate with maximum efficiency for the given problem and the given machine, and realistic estimates of elapsed time between submitting data and getting back results can be given.

Another advantage of these standardized routines is that they provide a basis for judging what future developments are needed. If everyone complains about the same feature of a system, it obviously needs improving. Furthermore, the very difficulty of communication is eased by having many people use and discuss the same method. Last, but certainly not least, the programming of such routines sparks an educational process among programmers. As both the size and variety of applications of computers are increasing, it becomes necessary to have programmers and analysts who are specialists in various other fields.

Mathematical Models. A term basic to operations research and economic analysis is “model” or “mathematical model.” These differ in nature from physical models. For example, a linear programming model of production in an oil refinery probably will not contain equations which can be identified with any particular piece of physical equipment. Rather, an operations research model is designed to simulate the functional aspects of a process or group of processes.

A mathematical model involves three or perhaps four phases of development. First, there must be a conceptualization of the type of model which is appropriate to the real-life problem under consideration. Only after this can a particular model structure be formulated. However, these two activities may go together, and the distinction is often a matter of expediency. The formulation of a model has qualitative value in clarifying and singling out the various elements to be considered. Quantitative value comes from what is called implementation of a model: the introduction of actual numerical values into the structure. This process is usually laborious and often difficult. Although some values will be of doubtful accuracy, this does not necessarily make the whole model suspect. Some testing is in order, and implementation may require side analyses to determine numerical relationships and to test the validity of certain assumptions.

However, a model is usually constructed for the purpose of performing the necessary computations to answer specific questions. It is easy to formulate and even to implement models which lead to gargantuan computations. Often models have to be modified to reduce the computing load, although this usually cannot be done without destroying some of the original conditions. At best a model only approximates reality, and there is always need for testing and critical review of a final model to see that it does represent the real situation with an acceptable measure of detail.

Strictly speaking, the use of a model is a simulation. However, there is usually a technical distinction made between a mathematical model and a simulator.
The former refers to methods in which analytical techniques are available for obtaining the desired results. When analytic techniques are not available, a simulator is often used. Here a combination of logical and arithmetic operations is programmed which simulates the carrying out of certain operations or decisions. Quite often an element of chance is introduced into the selection of courses of action by means of random numbers.

In general, simulators are more difficult to program than analytical models. Of the latter, one of the best known and most widely used is linear programming. There are elaborate systems of codes for linear programming computations which illustrate very well the idea of specialized service techniques. At CEIR such a system is used a great deal for the IBM Type 704 in connection with large economic models. These models are basically input-output matrices of the Leontief type embedded in larger models requiring an optimal choice of alternatives.

The basic concept of input-output analysis is simple and has long been in use in industry under other names. An input-output chart merely states that a given amount of a product requires so much of other materials; for example, a ton of steel requires so much iron and coal. This technique of interindustry analysis is, in many respects, an engineering approach to economics because it rests on physical interrelationships. First, all inputs needed to produce a commodity or service are assembled; when this is done for all goods and services the result is called an “interindustry flow grid.” The next step is to reduce to simple mathematical equations all these technological relationships between industries; finally, the equations are used to calculate, on an electronic computer, the answers to various questions. The grid need not be for an entire economy; it may be prepared for only one industry or group of industries. Relations with other industries are then handled essentially as imports and exports. By considering in consistent and balanced fashion all industries which relate to a problem, input-output provides a framework to keep the analyst from going far astray.

A unique contribution of interindustry analysis is the ability to trace throughout the economy the chain of requirements of a given change in the demand of final consumers. Such interconnections can be traced starting with almost any industry. To get the totality of such impacts, one needs to invert a matrix of the size of the grid. This requires a computer. The results are often surprising. For example, the motor vehicle industry purchases directly only 0.2% (1953 figures) of the coke and related products produced, but the coke indirectly required by the motor vehicle industry is 12.5%. Similarly, although it buys only 11.7% of synthetic rubber and miscellaneous rubber products, it indirectly requires another 12%.
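The matrix inversion step can be illustrated on a very small scale. The following Python sketch is only an illustration under stated assumptions, not the CEIR system: the two-sector coefficient matrix `A` and the final demand figures are invented, and `solve_total_output` is a hypothetical helper that solves (I - A)x = d by Gauss-Jordan elimination, which gives the same total outputs as multiplying the demand by the inverse of (I - A).

```python
def solve_total_output(A, d):
    """Solve (I - A) x = d for total output x by Gauss-Jordan elimination.

    A[i][j] is the amount of product i needed per unit of product j;
    d[i] is the final consumer demand for product i.
    """
    n = len(d)
    # Build the augmented matrix [I - A | d].
    M = [[(1.0 if i == j else 0.0) - A[i][j] for j in range(n)] + [float(d[i])]
         for i in range(n)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry into the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("I - A is singular; no unique solution")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [rv - f * cv for rv, cv in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


# Hypothetical two-sector grid (illustrative numbers only).
A = [[0.2, 0.3],
     [0.4, 0.1]]
final_demand = [10.0, 20.0]

total_output = solve_total_output(A, final_demand)
# Inputs bought directly to make the final demand, ignoring further rounds.
direct_inputs = [sum(A[i][j] * final_demand[j] for j in range(2)) for i in range(2)]
print("total output:", [round(x, 3) for x in total_output])
print("direct inputs:", direct_inputs)
```

In this invented grid the output required of each sector beyond final demand is well above the direct inputs alone, because each round of inputs calls for further inputs in turn. This is the kind of indirect requirement the coke and rubber figures describe.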

Experimental Design

There are two viewpoints in using computers for designing experimental tests. The computer might be used in statistical analyses for finding correlations in data so that experiments could be designed to find the causes of these correlations. Recently a chemical industry client required such a technique for determining where controls are necessary in the production of a new product. Another interpretation is to use the computer to simulate a desired system: one determines the parts of the real system which are either inadequate or misunderstood and hence need further research and development work. It might appear inconsistent that a system is simulated for which the specifications are not complete. However, in practice this is sometimes the only way to discover the true nature of the problem. There is no substitute for experience, but when experience is either impossible or prohibitively expensive, the computer can sometimes provide synthetic experience.

Because the aim here is not to develop an operational program but rather to try various methods and observe the results, it is important that programming costs and elapsed time be kept to a minimum. Depending upon the nature of the problem, it is sometimes possible to use automatic programming techniques. A recent job required the solution of three simultaneous, second-order differential equations and plots of various functions of the variables and their derivatives, both against the independent variable and as implicit functions. The object was to determine the behavior of a certain physical system under different assumptions of initial conditions and system parameters. An elaborate, ready-made routine for solving differential equations was available, but providing adequate input for the different parameter combinations and output suitable for plotting purposes would have required a great deal of coding. Because time was important, FORTRAN (the IBM formula translating coding system for the IBM 704 EDPM) was used for providing input to this differential equation solver and for postediting the output for plotting.

Such a method would not be satisfactory for production on a continuing basis. However, in this case there was no point in creating a polished program to carry through the entire process. Rush jobs of considerable complexity can be handled with existing techniques, even admitting that they are as yet imperfect.
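The pattern of such a job, a ready-made solver wrapped in generated input cases and post-edited output, might look roughly like the Python sketch below. Everything specific in it is an assumption made for illustration: a single damped oscillator stands in for the three simultaneous equations, `integrate` is a hypothetical fourth-order Runge-Kutta routine rather than the solver actually used, and the parameter values are invented.

```python
import itertools


def integrate(damping, stiffness, x0, v0, dt=0.01, steps=500):
    """Stand-in solver: RK4 for x'' + damping*x' + stiffness*x = 0.

    The equation is rewritten as x' = v, v' = -damping*v - stiffness*x.
    Returns a list of (t, x, v) rows.
    """
    def deriv(x, v):
        return v, -damping * v - stiffness * x

    rows = [(0.0, x0, v0)]
    x, v = x0, v0
    for n in range(1, steps + 1):
        k1x, k1v = deriv(x, v)
        k2x, k2v = deriv(x + 0.5 * dt * k1x, v + 0.5 * dt * k1v)
        k3x, k3v = deriv(x + 0.5 * dt * k2x, v + 0.5 * dt * k2v)
        k4x, k4v = deriv(x + dt * k3x, v + dt * k3v)
        x += dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
        v += dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
        rows.append((n * dt, x, v))
    return rows


def run_cases(dampings, stiffnesses, initial_conditions):
    """Pre-edit: expand every parameter combination. Post-edit: thin for plotting."""
    table = {}
    for c, k, (x0, v0) in itertools.product(dampings, stiffnesses, initial_conditions):
        rows = integrate(c, k, x0, v0)
        # Keep every 100th row: x against t, and v against x for an implicit plot.
        table[(c, k, x0, v0)] = rows[::100]
    return table


cases = run_cases(dampings=[0.0, 0.5], stiffnesses=[1.0, 4.0],
                  initial_conditions=[(1.0, 0.0)])
for key, rows in cases.items():
    print(key, [(round(t, 2), round(x, 4)) for t, x, _ in rows])
```

The solver itself is left untouched; only the thin layers around it change from one experiment to the next, which is where a formula-translating language saves the most coding.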
The combination of specialized service techniques, automatic coding devices, and computer know-how can often get the job done with a minimum of elapsed time and within a reasonable budget.

VOL. 50, NO. 11, NOVEMBER 1958

Automatic Programming Language. An electronic computer carries out instructions which are coded in a form called machine language. A machine language program embodies a degree of detail which is completely unnatural to most people, since it is much greater than in instructions given to other human beings. For example, if a person is given a list of numbers and told to add them up and divide the sum by the number of entries, he is expected to be able to do it. But if a computer is to accomplish the same task, instructions must be provided for picking up each number, for counting the number of entries, for testing when the process is finished, and for specifying what to do with the results. Before beginning, means must be provided for getting the numbers into the machine and for outputting the answer. Furthermore, all such detailed instructions must be 100% correct before the machine will carry out the computations properly. Only very rarely is a human being capable of such perfection, and it is not normally required. We are continually correcting ourselves by taking advantage of our ability to scan material and make judgments concerning consistency and credibility. But the machines are implacable; they do exactly what they are told even if it is wrong.

An automatic programming language is designed to use statements and expressions more nearly like those we commonly use so that we are able to think more naturally, write more surely, and catch mistakes more readily than in straight machine language. A program written in this automatic coding language is then entered in the machine and is processed by a special program called a compiler, which produces from the automatic coding language a machine language program. Special provisions are also made for input and output by means of very concise specifications. Although these are as yet somewhat stilted and unnatural, they nevertheless enable a programmer to condense one of the most tedious and difficult parts of programming while retaining considerable flexibility for format.

The FORTRAN language, for example, consists of four different kinds of statements. There are those which look very much like mathematical formulas; the only differences are those imposed by mechanical recording modes. There are statements which look very much like flow charts. There is a sublanguage for specifying input and output, and finally there are some statements which refer to specific features of the IBM 704.
When FORTRAN is suitable, it can cut programming time by a factor of 5 to 10 or even more. There are features of the machine which are not accessible through FORTRAN, and when these become important, machine language programming must be used. Although high-speed arithmetic capability is essential to many applications of the computer, it is equally important that large masses of data be handled with convenience and facility, even at the cost of some slight inefficiency. At present, a fairly good rule is that automatic programming is useful only for experimenting with systems involving standard, well-defined procedures of a classical variety. It has not been used often at CEIR in economic and strategic studies, with the exception of pre- and postediting data. FORTRAN often makes a good output generator.
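The averaging example in this section can be made concrete. The Python sketch below is an illustration only: Python is itself a high-level language, so the first function merely imitates the elementary steps a machine language program must spell out (picking up each number, counting entries, testing for the end, disposing of the result), while the second states the task roughly as one would to a person. Both function names are invented for this example.

```python
def average_step_by_step(numbers):
    """Average a list using only the elementary steps a machine must be told."""
    total = 0.0      # accumulator for the running sum
    count = 0        # number of entries picked up so far
    index = 0        # position of the next number to pick up
    while True:
        if index >= len(numbers):        # test whether the process is finished
            break
        total = total + numbers[index]   # pick up one number and add it
        count = count + 1                # count the entry
        index = index + 1                # advance to the next number
    if count == 0:
        raise ValueError("cannot average an empty list")
    return total / count                 # dispose of the result: return the quotient


def average_high_level(numbers):
    """The same task stated the way one would tell a person."""
    return sum(numbers) / len(numbers)


data = [4.0, 8.0, 15.0, 16.0, 23.0, 42.0]
print(average_step_by_step(data), average_high_level(data))
```

A slip in any one of the explicit steps, such as forgetting to advance the index, gives a wrong answer or a loop that never ends, whereas the short form leaves far fewer places for such mistakes.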

Information Storage

The problem of information storage and retrieval is one in which arithmetic plays a very minor role and in which it is necessary to take advantage of every facility of the machine for speed. Hence it is not possible to use a system such as FORTRAN. No existing machines or techniques are fully adequate to many realistic problems which exist today. The military has some really staggering problems in this area. The chemical industry also has an interest in this field, for example, in the matter of patent searches. Although current machines are admittedly inadequate for these jobs, they can be used more than they are, and machines of the 704, 1103A class are adequate for worth-while experimental studies of this kind of problem. To be sure, the costs are somewhat high, but the ultimate benefits of automatic data processing equipment may very well lie largely in this field in the future.

An idea of the interest in this field is given in a summary compiled by the National Science Foundation (2), which lists 43 projects reported by 41 companies, bureaus, and independent groups. Eight of these are on chemical structure notation and codes; 24 are on searching or selecting devices of various kinds, including standard computers.

In applying standard computers, information storage and retrieval problems range all the way from simple installment loan accounting to complex intelligence file analysis systems. These jobs all have three parts: input of new information, updating of files, and preparation of reports. The formats and relative importance of these vary widely, but there are three universal aspects: sorting is required somewhere in the process, speed is essential, and parts of the propositional calculus are used explicitly (as opposed to mere “bookkeeping”).

Sorting. All of these lead to programming complications, but the sorting problem is the most severe. Speed is limited by machine characteristics, and equipment is constantly improving, though it often takes great coding skill to exploit it. Logical operations are already powerful, and much more can and will be done with automatic coding techniques. Sorting is really a special case of a more general problem which might be called classification and search, or compilation. The problem turns up in all nonnumerical data processing work, including the manipulation of data in large scientific computing jobs.

The following procedure describes either a sort or the compilation of a report. The steps may be done in a different order, but these three tasks must be accomplished:

EXTRACTION OF THE SORT KEY, which is the set of pertinent descriptors contained in a record, ordered according to main heading, subheading, intermediate heading, minor heading. Not only must the proper classes of descriptors be selected, but often only subclasses or logical functions of subclasses are wanted. Thus, within each record there is a selecting and sorting process to set up the sort key.
SELECTION OF ELIGIBLE RECORDS or parts thereof from the main files. This can be a time-consuming job even on electronic equipment if the files are large.
REORDERING OF SELECTED RECORDS to sort key order.
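The three tasks named in this section (sort key extraction, selection of eligible records, and reordering to sort key order) can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the author's sort generator: the record layout, the field names, and the patent-search data are all invented, and Python's built-in sort stands in for whatever sorting method a real system would use.

```python
def extract_sort_key(record, key_fields):
    """Build the sort key: the chosen descriptors, in heading order."""
    return tuple(record[field] for field in key_fields)


def compile_report(records, is_eligible, key_fields):
    """Select eligible records, attach sort keys, and reorder to key order."""
    selected = [r for r in records if is_eligible(r)]
    keyed = [(extract_sort_key(r, key_fields), r) for r in selected]
    keyed.sort(key=lambda pair: pair[0])
    return [r for _, r in keyed]


# Hypothetical patent-search file; field names and values are invented.
file_records = [
    {"class": "polymer", "subclass": "rubber", "year": 1956, "id": "P3"},
    {"class": "catalyst", "subclass": "nickel", "year": 1957, "id": "P1"},
    {"class": "polymer", "subclass": "nylon", "year": 1953, "id": "P2"},
    {"class": "polymer", "subclass": "nylon", "year": 1957, "id": "P4"},
]

report = compile_report(
    file_records,
    # Eligibility is a logical function of the descriptors.
    is_eligible=lambda r: r["class"] == "polymer" and r["year"] >= 1955,
    key_fields=("class", "subclass", "year"),
)
print([r["id"] for r in report])
```

Passing the eligibility test and the key fields as parameters keeps the selection logic and the key layout separate from the sort itself, so each can vary from one report to the next.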

In connection with a complex file maintenance system, the author recently wrote a sort generator. Its main feature is a routine to generate, for any particular case, a routine for extracting complicated sort keys. Tied in with this is a sort program to sort blocks of information on the sort keys. It works very well for small files contained within high-speed memory. For larger files, either a merge or a predistribution of blocks is required. Although not needed for this job, various methods of handling these sort blocks are being studied. This is essentially the same as the problem of selecting eligible records from a file.

There are certain necessary conditions for application of computers in these and other areas. The machines have proved themselves in strictly scientific computations, but some have experienced disappointment in business applications. According to a survey made by Diebold (1), the most common mistakes made in the past involve failure to define objectives, improper organization of the data processing program, inaccurate estimates of real costs involved in conversion and installation, ill-defined standards of performance, poor personnel practices, and poor selection of applicants. Computers demand a degree of discipline often lacking in traditional approaches to problems. In spite of this, they can add enormously to productive capacity if already available techniques are investigated and exploited.

Literature Cited

(1) Diebold, Controllers Inst. of America, Bull., March 14, 1958.
(2) Natl. Science Foundation, Office of Scientific Information, “Current Research and Development in Scientific Documentation,” 1957.

RECEIVED for review April 16, 1957. ACCEPTED July 29, 1958. Division of Industrial and Engineering Chemistry, Symposium on Computers in the Chemical World, 133rd Meeting, ACS, San Francisco, Calif., April 1958.

INDUSTRIAL AND ENGINEERING CHEMISTRY