W. H. WALDO and E. H. BARNETT, Organic Chemicals Division, Monsanto Chemical Co., St. Louis 4, Mo.
An Electronic Computer as a Research Assistant
Don't be a slave to a slide rule - spend your time on creative thinking and let the computer be the pencil pusher
The literature has been full of pleas for more creativity and for methods of obtaining it. These authors miss the point. Research people are frequently too busy with routine, repetitive operations to be creative. Some of them have studied their day-to-day functions to see whether certain operations can be relegated to machines, and have discovered that computers can relieve research men of unnecessary details. The operations most susceptible to computer application are repetitive calculation, routine report preparation, and statistical correlation.

Some examples of laboratory calculations better left to the computer are:

Solution of simultaneous equations where the number of unknowns is less than the number of equations. Matrix inversion, iteration, and regression analysis are done rapidly with computers, whereas they are extremely time consuming for technical people with a desk calculator.

Repetitive copying of data for calculations, memos, progress reports, and final reports. If the data are useful, they may be copied still more times to send to engineers, contractors, or sales. Using computers to store data as they are developed and to compile them on call is a relatively simple problem.

Correlation, significance, meaning, and direction can be obtained from a mass of data with a computer when the mass is so large that one or even several men are not able to remember where everything is. With the immense memories found in computers, and recall being relatively easy, meaning can be evolved where there was only confusion before. Operations research problems are frequently of this type.

These three types of laboratory work occur in many fields. A large number of studies on computer applications have appeared in the recent literature in engineering, chemistry, and mathematics (1, 7-9). These all have one thing in common: computation volume or complexity, which otherwise would have meant thousands of man-hours of calculation.
of analog signals representing temperatures, pressures, feed and product flows, mass spectrometer scans, vapor fractometer scans, continuous gas analyses, and the like. It logs them, compares the readings against predetermined control limits, signals violations of critical limits, and computes from the primary data such reduced values as conversions, yields, productivities, material balances, and heat balances. With a control computer of adequate memory, it would be possible to store a complete experimental program of perhaps 64 statistically designed experiments, and run through the entire program as fast as the establishment of steady-state conditions and the logging of data for a fixed test period will permit. The computer itself will decide when to start the collection of product and when to proceed to the next set of conditions. Because computers can be applied to this wide variety of problems, it is inevitable that different companies have started to relieve their research men of many routine, tedious tasks and staggering computations on different problems. Phillips Petroleum Co. is using a medium-scale computer for calculations in process design, equipment design, operations research, quality control, and even oil production (6). Shell Development has used computers since 1954 for problems such as data reduction, evaluation of elaborate integrals, solution of differential equations, and matrix calculations.
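The logging, limit checking, and data reduction described above can be sketched in a modern language. The following Python is only an illustration under stated assumptions: the signal names, the control limits, the 1:1 stoichiometry, and the flow figures are all hypothetical and are not taken from any actual installation.

```python
from dataclasses import dataclass


@dataclass
class Limit:
    low: float
    high: float


# Hypothetical control limits for three logged signals.
LIMITS = {
    "temperature_C": Limit(180.0, 220.0),
    "pressure_atm": Limit(1.0, 5.0),
    "feed_mol_per_hr": Limit(50.0, 150.0),
}


def check_limits(reading):
    """Return the names of signals that fall outside their control limits."""
    return [name for name, lim in LIMITS.items()
            if not (lim.low <= reading[name] <= lim.high)]


def reduce_run(feed_in, feed_out, product_out):
    """Compute reduced values from molar flows (1:1 stoichiometry assumed)."""
    reacted = feed_in - feed_out
    return {
        "conversion": reacted / feed_in,       # fraction of feed consumed
        "yield": product_out / feed_in,        # product per unit of feed
        "selectivity": product_out / reacted,  # product per unit reacted
    }


# Two hypothetical logged readings; the second violates the temperature limit.
log = [
    {"temperature_C": 200.0, "pressure_atm": 3.0, "feed_mol_per_hr": 100.0},
    {"temperature_C": 225.0, "pressure_atm": 3.1, "feed_mol_per_hr": 100.0},
]
violations = [check_limits(reading) for reading in log]
reduced = reduce_run(feed_in=100.0, feed_out=20.0, product_out=72.0)
```

Keeping the limits in a table separate from the checking routine means a new experimental program needs only new data, not new instructions, which is in the spirit of storing a whole program of designed experiments in the machine.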
Problems Solved by Monsanto’s IBM 702
Instrumental color inspection
New vapor-liquid equilibrium formula
Mass spectrometer matrix inversion
Sulfuric acid reactor design
Terpolymer composition
Resistance-temperature table for platinum thermometer
Heat-transfer correlation for viscous liquids
Correlation of chemical structure with detergency
Analysis of gaseous hydrocarbon mixtures
Particle size analysis
Granular fertilizer formulation
Evaluation of infrared analysis
Optimum conditions in analytical procedure
Comprehensive economic evaluations
Chemical-biological coordination and correlation
Some of these problems could have been solved by a man with a slide rule or desk calculator in a couple of years; others probably could never have been solved by human beings because of their size and complexity. Research at Monsanto has been more efficient since a computer assistant was hired for its scientists.

Report Writing

The last problem, chemical-biological coordination and correlation, illustrates two of the three major functions: routine report preparation and statistical correlation. The solution to this problem is equivalent to adding a man and a half to the research staff, in addition to giving answers to some questions that it would have been foolish to ask in the years before computers.

The data processing part of this problem is currently in operation as part of a report-writing system. Screening reports are prepared by scientists in certain application research groups for chemicals prepared in the laboratories. These reports are one-page forms giving the name, structure, and serial number of the compound, the name of the chemist who prepared it and his location, the raw data obtained in the screening operation, and the evaluation of the compound as a result of the screening test. The one-page reports are prepared in six copies by the computer from the carbon page of the application research man's notebook and data about the compound from the chemist.

This system has been in use for over a year. Thousands of screening reports have been printed, and dozens of scientists have been relieved of the daily chore of filling out forms and transcribing data. A backlog of chemicals to be screened has been eliminated. These scientists have been provided with more time to be creative, imaginative, and inventive.

VOL. 50, NO. 11 NOVEMBER 1958 1641

Screening reports are prepared by the IBM 702. Researchers' notes are transcribed to punch cards, and these are fed to the machine together with instructions and lists of chemical names and structures.

Studies are in progress on the use of these data stored in the computer memory. As time goes on, tables of data for certain types of final reports will be written by the computer directly onto duplicating mats from data stored for screening reports. Use has already been made of these data by searching for specific answers, and the procedure has been found efficient, convenient, and rewarding.

The screening-report writing procedure consists of three steps. A carbon copy of a page from the research man's notebook is removed when completed and transcribed to punch cards. A deck of these punch cards, another deck of cards containing machine instructions, a magnetic tape containing an up-to-date list of chemical names, and another tape containing the chemical structures corresponding to the names are all made available to the machine (see above). In a few minutes the results of a week's work by more than a dozen scientists are printed in six copies by the machine's output typewriter. The six copies are then merely put into the mail, and a screening laboratory has reported its week's work to supervision, management, development, and the originator of the request.

The banal chore of writing reports by hand may soon be replaced in even greater measure by machine methods. Research work is under way in several organizations, including the International Business Machines Corp. and Monsanto, to use machines to prepare many types of tables, graphs, and ordinary prose found in the thousands of research reports. This research work
is not directed at usurping the prerogative of scientists to reduce research problems to words in the introduction of their reports. Nor is it the intention of this work to interpret or draw conclusions from the data presented in these reports. The agony in writing reports is setting down on paper the meticulous detail of the experimental procedure, the analytical methods, the calculations involved, and the reams of data collected. Computers are available to do this work for the multitude of skilled research workers in this country.

Knowledge is growing so fast today that we no longer hire chemists, physicists, and engineers; they must be geophysicists, cadastral engineers, or rare earth chemists. Also, there is the problem of cross-communication when the electronic engineer cannot understand the human engineer, and this problem lends itself to solution by computer.

The work at Monsanto in chemical-biological correlation is beginning to bear fruit. It is hoped that the computer can soon tell, by purely empirical and statistical methods, which moieties or fragments of an organic chemical structure have botanical and zoological effects. The influence of chemicals on plants and animals is so complex, both in its chemistry and in its biology, that the computer has been called upon to reduce the mass of data to significance.

Calculations
In contrast with these problems of handling data in large volume, the computer is also used to make any calculation which can be done by hand or desk machine. These problems fall in two classes: simple calculations done many times, and complex calculations.
INDUSTRIAL AND ENGINEERING CHEMISTRY
The former are exemplified by the reduction of pilot plant run data. The latter include the inversion of large matrices for multicomponent analysis by infrared or mass spectrometer, and statistical calculations involving least-squares fits in n dimensions, correlations, and analysis of variance.

Electronic data-processing machinery has been very useful in reducing the amount of time required to make run calculations. In many cases, when a continuous converter is being operated, the calculations by which yield and throughput are obtained are quite laborious. In a typical case, a run requires one day to complete and a half day to calculate. In cases of this kind the calculations are programmed on the electronic machine and the results are tabulated in a few minutes. Once the data are in the machine, it is easy to perform additional calculations which might be too time consuming if done by hand. The machine can answer bonus questions, too. For example, it might present data in the form of a plot of yield against time. If desired, run data can be stored on magnetic tape at the same time, with a great saving in storage space and much easier accessibility. For the more extensive research projects, it is likely that data will later be examined in some additional manner. This is easy to do if some elementary precautions are observed when the data are stored.

The computer has also been utilized to calculate tables. One example is the calibration tables for platinum resistance thermometers. A thermometer is calibrated at 0° and 100° C. and at the sulfur point. The calibration is not linear, but the general shape is known and the equation can be handled easily by the machine.

Electronic data processing machinery fills a very important and long-standing need for equipment which will make it possible to handle these mathematical and statistical problems in a realistic way. In developing a new manufacturing process the best conditions of operation must be determined.
In addition, more than one property of the product is of interest. For example, a high yield is desired of a chemical which has a certain minimum crystallizing point and a color at least as good as a standard. For a given process these criteria may be determined by arriving at a certain combination of pressure, flow, temperature, and concentration. It has been customary in the past to make experiments in which one variable is changed at a time and thus arrive at an acceptable process. In the new approach, data are treated mathematically according to the general relation $y = f(x_1, x_2, \ldots, x_n)$ where y is a dependent variable such as yield. The x's are independent variables such
as temperature or pressure. For a given chemical process a function can be found, often of second order, which will reasonably fit the data. The equation is an empirical representation of the process which can be used to select the point at which to operate. The function often involves three independent variables, and a most usable form of presentation is a three-dimensional model or response surface (4). The technique has been known for years, but chemists have been reluctant to use it because the calculations may require several man-days or weeks on a desk calculator. Electronic machinery does the job in a few minutes. Computer instructions have been prepared for obtaining the two-dimensional contours which are required; a three-dimensional model has been constructed recently for three dependent variables in about 10 minutes of computer time and two hours of construction time.

Data are utilized much more efficiently in this manner. It is possible to identify those runs which contain errors, and the ordinary random variability is smoothed out to a great extent by making a least-squares fit of the data. Because the curvature is known, a moderate amount of extrapolation can be done beyond the experimental region. An experiment of this type is usually set up in consultation with a statistician so the most efficient design is obtained at the same time.

A logical extension of this method is the obtaining of response surfaces on an operating plant (2, 3). This is desirable for two reasons: optimum operating conditions obtained in the laboratory may not be the same as for the full-scale plant, and when an operating plant is available, research becomes much cheaper on it than in the laboratory. These statements are somewhat shocking to most research people. However, they are true in general.
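The second-order response-surface fit discussed above can be illustrated with a short sketch. It assumes two independent variables in coded units, a 3 x 3 factorial design, and noise-free yields generated from a made-up surface (the coefficients in TRUE_COEFFS are hypothetical). The least-squares fit is done through the normal equations with a small Gaussian-elimination routine; this is one common way to do it and is not necessarily the procedure used on the IBM 702.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def terms(x1, x2):
    """Terms of a full second-order model in two variables."""
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]


def fit_surface(runs):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    rows = [terms(x1, x2) for x1, x2, _ in runs]
    ys = [y for _, _, y in runs]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


def stationary_point(b):
    """Point where both partial derivatives of the fitted surface vanish."""
    # dy/dx1 = b1 + 2*b3*x1 + b5*x2 = 0 and dy/dx2 = b2 + b5*x1 + 2*b4*x2 = 0
    return solve([[2 * b[3], b[5]], [b[5], 2 * b[4]]], [-b[1], -b[2]])


# Hypothetical "true" surface used only to manufacture example yields.
TRUE_COEFFS = [80.0, 2.0, 3.0, -4.0, -3.0, 1.0]
levels = (-1.0, 0.0, 1.0)
runs = [(x1, x2, sum(c * t for c, t in zip(TRUE_COEFFS, terms(x1, x2))))
        for x1 in levels for x2 in levels]

coeffs = fit_surface(runs)
optimum = stationary_point(coeffs)  # coded settings of the two variables
```

With real run data the fitted coefficients would not reproduce any "true" surface exactly; the residuals of individual runs from the fitted surface are what allow suspect runs to be identified.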
Research done in the laboratory, and especially research done on the bench, usually results in different rates of heat transfer and different degrees of agitation than in the full-scale unit. These factors are often critical to the process. Turbulence and back mixing are also very difficult to duplicate. Certain phases of some processes have never been successfully duplicated in the laboratory. Concerning the second factor, it is necessary to change conditions in the plant by only a relatively small amount so that an acceptable product is always produced. In this case the use of statistical methods has made it possible to distinguish between sets of conditions which may result in only a slight difference in yield or quality. It was the availability of a computer
which led to the exploration of these methods in the plant and an evaluation of their worth. Response surfaces already obtained on some of Monsanto’s largest plants have made possible a significant increase in productivity and have provided knowledge of how to operate most efficiently under bottleneck conditions or other limitations.

Some Monsanto operations involve the use of an excellent though expensive catalyst. It has recently been possible to distribute the catalyst better within the reactor through data obtained on the machine. Kinetic data and operating correlations were programmed to give outlet gas composition and throughput as a function of the operating variables. A large number of combinations were tried, as one would do when running an extensive experimental program on a plant. The best values of the variables were selected, and the improvement was confirmed by plant runs. A research program of this magnitude would probably not have been attempted in the plant or in the laboratory.
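Trying a large number of combinations on a programmed model, as described above, amounts to an exhaustive search over a grid of operating variables. The sketch below shows the idea only. The reactor model is a toy invented for illustration (a rate that rises with temperature and an equilibrium limit that falls with it), and every constant, range, and the 90% conversion specification is a hypothetical assumption rather than Monsanto data.

```python
import math
from itertools import product

CATALYST = 100.0       # hypothetical catalyst charge, arbitrary units
MIN_CONVERSION = 0.90  # hypothetical product specification


def conversion(temp_c, flow):
    """Toy outlet conversion for an exothermic, reversible reaction."""
    rate = math.exp(0.02 * (temp_c - 400.0))              # faster when hotter
    equilibrium = 1.0 / (1.0 + math.exp((temp_c - 520.0) / 30.0))  # lower when hotter
    approach = 1.0 - math.exp(-rate * CATALYST / flow)    # less contact at high flow
    return equilibrium * approach


def search(temps, flows, min_conversion=MIN_CONVERSION):
    """Try every combination; keep the highest production that meets the spec."""
    best = None
    for temp_c, flow in product(temps, flows):
        x = conversion(temp_c, flow)
        if x < min_conversion:
            continue  # combination gives off-specification outlet gas
        production = flow * x
        if best is None or production > best["production"]:
            best = {"temp_c": temp_c, "flow": flow,
                    "conversion": x, "production": production}
    return best


TEMPS = range(380, 521, 10)  # inlet temperature grid, deg C
FLOWS = range(50, 201, 10)   # gas flow grid, arbitrary units
best = search(TEMPS, FLOWS)
```

A grid search is the simplest possible strategy and is chosen here because it mirrors the text; as in the original work, the selected conditions would still have to be confirmed by plant runs.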
Other Computers

Recent acquisition of an analog machine has greatly increased Monsanto’s capacity to handle problems involving reaction rates or other differential equations (5). The machine performs experiments in which plant or pilot plant runs can be simulated and data on the relation between the dependent and independent variables obtained. This computer is operated by systems engineers who use it also in their studies of control functions and indeed of entire plants.

Monsanto’s experience has been entirely with large computers. These are versatile, fast, and cheapest per computation provided they can be adequately loaded. Smaller companies which cannot justify their cost can also use machine methods by having their work done at one of the several computation “centers” throughout the country. One Monsanto plant uses such a service to eliminate the time required to transmit data to and from St. Louis.

Another solution, and one which will be common in a few years, is to install one of the medium-sized digital computers. Although slower and of limited storage capacity, these can handle most of the calculations we have mentioned. Because the programming is simplified, less training of personnel is required, and they do not require large floor area or elaborate air conditioning. Maintenance may be a problem, although the trend is toward transistorized components and greater reliability, and most large research laboratories are in or near a metropolitan area where specialists can be obtained.
The cost of a “desk-side” computer is of the order of $100,000, and it could be used by a chemist after a few hours of instruction. Once a problem is solved, the program is recorded on punched paper tape, cards, or plug boards for repeated use.

The selection of data handling techniques or computer models cannot be made by any ready-made formulas or pre-established criteria, because selection is in the nature of an art rather than a science. The best approach is to consider the selection as a research decision in itself, in which each of the factors of the over-all problem is studied and each is adapted so that it fits into the research work of the organization. As a guide to making such a decision, the following steps have been found useful by the U. S. Patent Office in its major effort to automatize the tremendous job of patent searching:

Thoroughly study the use and characteristics of the data to be handled.
Acquire a general knowledge of machine theory and operation and have access to persons having a detailed knowledge thereof.
Make an approximation or “model” of what is thought to be a proper operating procedure.
Try this procedure using actual data on any available computing equipment.
Work over the “model” and the equipment until a satisfactory or at least a useful operation is afforded.

Those who employ a computer as a laboratory assistant must do so carefully, with design and understanding.
Literature Cited
(1) Benge, J., Petrol. Refiner 36, 313-14 (1957).
(2) Box, G. E. P., “Evolutionary Operation,” Proceedings of Symposium on Design of Industrial Experiments, Institute of Statistics, Consolidated University of North Carolina (under contract with Air Force Office of Scientific Research), Raleigh, N. C. (November 1956).
(3) Box, G. E. P., Hunter, J. S., Ann. Math. Statist. 28, 195-241 (1957).
(4) Box, G. E. P., Wilson, K. B., J. Roy. Statist. Soc., Ser. B 13, 1 (1951).
(5) Chem. Week 82, 42-4 (Jan. 4, 1958).
(6) Cobb, J. R., McIntire, R. L., “Mathematical Engineering at Phillips Petroleum Co.,” 35th Annual Convention of the Natural Gasoline Association of America, Fort Worth, Tex., April 1956.
(7) Opler, A., Norton, T. R., Chem. Eng. News 34, 2812-16 (1956).
(8) Ray, L. C., Kirsch, R. A., Science 126, 814-19 (1957).
(9) Schulz, H. W., Chem. Eng. Progr. 53, 548-50 (1957).
RECEIVED for review March 5, 1958
ACCEPTED July 29, 1958
Division of Industrial and Engineering Chemistry, Symposium on Computers in the Chemical World, 133rd Meeting, ACS, San Francisco, Calif., April 1958.