science
Pattern recognition combats data explosion

Teamed with a computer, the technique can help chemists cull irrelevant data when dealing with large quantities of data

Systematic methods of determining which of millions of chemical candidates would actually perform as miracle drugs, or as catalysts for industrial reactions, or for many other applications have been sorely lacking. In the cancer field alone, at least half a million compounds have been tested for anticancer activity in order to find a few that work, since the screening system has amounted mainly to a methodical testing of almost everything in sight.

Pattern recognition techniques may offer one way for the chemist to escape the tidal waves of data with which he is being inundated by modern instrumentation as he conducts such screening operations as well as other chemical research problems. The goal of pattern recognition is to detect or predict obscure but perhaps highly desirable properties of substances by discerning patterns among seemingly unrelated or very indirect and multitudinous data.

A good way of accomplishing that goal, according to a number of investigators, is for the scientist, with his uniquely human pattern recognition capabilities, to interact with a computer that is capable of comprehending n-dimensional problems. Bruce Kowalski of Colorado State University and Charles Bender of the University of California's Lawrence Livermore Laboratory have worked up a broad introduction to pattern recognition techniques [J. Amer. Chem. Soc., 94, 5632 (1972)]. And Stanford's Herman Chernoff has programed a computer to present chemical data by drawing droll faces, taking advantage of the fact that human beings are used to noting differences in facial characteristics whereas they often find rows and columns of numbers confusing.
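As a rough illustration of the face-drawing idea, the sketch below rescales each measured variable and assigns it to one facial feature. It is not Dr. Chernoff's program: the feature names, the rescaling rule, and the sample numbers are all assumptions made for the example, and the actual drawing step is left out.

```python
# Sketch: assign each measured variable to one facial feature, rescaled to 0..1.
# Feature names and sample numbers are illustrative assumptions, not from the article.

FACE_FEATURES = [
    "upper_face_outline", "lower_face_outline", "mouth_size", "mouth_curve",
    "nose_length", "eye_size", "eye_position", "eyebrow_slant",
]

def rescale_columns(rows):
    """Rescale every variable (column) to the range 0..1 across all samples."""
    columns = list(zip(*rows))
    scaled_columns = []
    for column in columns:
        low, high = min(column), max(column)
        span = high - low
        # A variable that never changes carries no contrast; park it mid-range.
        scaled_columns.append(
            [0.5 if span == 0 else (value - low) / span for value in column]
        )
    return [list(row) for row in zip(*scaled_columns)]

def to_face_parameters(rows, features=FACE_FEATURES):
    """Return one dict of facial-feature settings per sample."""
    if any(len(row) > len(features) for row in rows):
        raise ValueError("more variables than facial features")
    return [dict(zip(features, row)) for row in rescale_columns(rows)]

# Hypothetical samples, three variables each.
SAMPLES = [
    [10.0, 40.0, 5.0],
    [20.0, 40.0, 15.0],
    [30.0, 40.0, 10.0],
]
FACES = to_face_parameters(SAMPLES)
```

Each dictionary in `FACES` could then be handed to whatever plotting routine draws the face, so that a change in one variable shows up as a change in one feature.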
A basic thrust of the pattern recognition approach is aimed at avoiding inundation by data that are either totally irrelevant to the problem at hand or unnecessary to its solution, so as to become more receptive to patterns among the pertinent data. A computer can be programed to evaluate contributions that various data make to the solution of a problem, to observe whether it has wasted its time, and to compile useful information into n-dimensional patterns that relate to properties that may not be directly measurable. Investigators in the field say that computers can perform in a manner analogous to the telephone operator who, when given a person's name, address, hair color, job description, and hobby, will quickly offer the feedback observation that the name is always necessary, the address is necessary if the name is common, and the other information is irrelevant.

C&EN August 28, 1972

Dr. Kowalski and Dr. Bender cite a number of recent applications of one particular pattern recognition method, involving the linear learning machine, to the analysis of various types of spectroscopic data. For example, they say, computers have become 95% accurate in mass spectroscopic determination of hydrocarbon structure using only five to 10 parameters, compared to some 500 parameters typically examined in conventional techniques.
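The article does not spell out how a linear learning machine works. A minimal sketch, assuming the common perceptron-style error-correction rule, is given below: a weight vector is adjusted whenever a training pattern lands on the wrong side of the decision surface. The function names and the two-parameter data are hypothetical and stand in for real spectroscopic measurements.

```python
# Minimal sketch of a linear learning machine (perceptron-style rule).
# All data and names here are illustrative assumptions, not from the article.

def train_linear_machine(samples, labels, epochs=100):
    """Learn weights w so that the sign of w . x matches each label (+1 or -1)."""
    n_features = len(samples[0])
    weights = [0.0] * (n_features + 1)  # last entry is the bias weight
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            augmented = list(x) + [1.0]
            score = sum(w * xi for w, xi in zip(weights, augmented))
            if y * score <= 0:  # misclassified: apply error-correction feedback
                weights = [w + y * xi for w, xi in zip(weights, augmented)]
                errors += 1
        if errors == 0:  # every training pattern is now classified correctly
            break
    return weights

def classify(weights, x):
    """Return +1 or -1 depending on which side of the decision surface x falls."""
    augmented = list(x) + [1.0]
    score = sum(w * xi for w, xi in zip(weights, augmented))
    return 1 if score > 0 else -1

# Hypothetical two-parameter patterns; the two classes are linearly separable.
SAMPLES = [(2.0, 1.0), (3.0, 1.5), (2.5, 0.5), (-1.0, -2.0), (-2.0, -1.0), (-1.5, -0.5)]
LABELS = [1, 1, 1, -1, -1, -1]

WEIGHTS = train_linear_machine(SAMPLES, LABELS)
PREDICTIONS = [classify(WEIGHTS, x) for x in SAMPLES]
```

A weight that stays near zero after training marks a parameter that contributes little to the decision, which is one simple way such a machine can hint at which measurements are dispensable.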
Another example of the technique's applicability illustrates spectacular savings in dollars as well as in time. At Lawrence Livermore Laboratory, large blocks of very expensive high explosives, which were to be machined into special shapes and used to generate shock waves for physics research, were cracking, apparently because of some compositional defect. Chemists tried unsuccessfully to relate the substance's cohesive properties to several physical and chemical characteristics. Finally the pattern recognition team was called in on the problem. Since the blocks were cracking at the rate of $500,000 worth per month, the pattern recognition team was told to take whatever steps seemed necessary, with the assumption that the time required to solve the problem might be in the man-year order of magnitude. That was on a Friday. On Saturday, the computer found relationships between defective compositions and the problem was virtually solved.

The property in question can be fundamental, such as atomic or molecular structure, or less fundamental, such as reactivity, permeability, or absorptivity. A great many chemical problems are, of course, amenable to a direct experimental approach. And in some cases, a known theoretical relationship between measurements and the property can be used. For example, studies of orbital symmetry and steric hindrance can shed light on reactivities of molecules. But suppose a chemist wants some idea of reactivity in a situation where the direct approach is prohibitively expensive, time-consuming, or dangerous, and the known theoretical approach is impossible because of the complexity of a mixture of compounds. He is usually forced into using a third method: the educated guess. Though this approach should not be thought of as unscientific, Dr. Kowalski remarks, very little research has been done to systematize the process in such a way as to provide general solutions to problems.

Dr. Kowalski and Dr. Bender illustrate the broad applicability of the pattern recognition approach with a study of 68 elements of the periodic table using melting point, most important valence, covalent radius, ionic radius, electronegativity, and enthalpy of fusion as data to determine whether the higher valence oxide of an element is acidic, amphoteric, or basic. According to values recorded in the literature, the separation of acidic oxides and basic oxides was 100% accurate. It's difficult to determine the accuracy of the technique with respect to the amphoterics, the investigators say, since the sample was not statistically large enough.

Dr. Chernoff, professor of statistics at Stanford, emphasizes that his work in pattern recognition runs counter to the general trend in applications of artificial intelligence. That is, his goal is to get the human more involved in the process through interaction with the computer. He agrees with Dr. Bender and Dr. Kowalski that man is the best pattern recognizer known today. The difficulty comes when the measurements and objects are many. At this point, computer techniques should be used but carefully supervised by the scientists.

Faces (see diagram) illustrate the use of Dr. Chernoff's program by a doctoral student in geology, Terence Elliott, to interpret data from a core drilling that drove 4500 feet into a Colorado mountainside to locate a rich deposit of molybdenum. Percentages of different minerals, including quartz, pyrite, topaz, and others, were measured in core samples taken approximately every 100 feet. In all, there were 53 samples, each represented by 12 percentage numbers.

Dr. Chernoff fed the 12 percentages of each sample into the computer, one number to represent the upper half outline of the face, another the lower half, another the size of the mouth, another for the mouth curve, and so on for nose, eyes, eyebrows. The program shows up to 18 variables in a single face. And slight differences, similarities, and groupings in the data become quickly apparent in such a series of faces.

Pattern recognition techniques have been used to solve data processing problems in a number of diverse areas including handwritten and printed character recognition, weather prediction, medical diagnosis, and speech analysis. One area of chemical research in which the pattern recognition technique seems immediately applicable is the screening of compounds for specific desirable properties. There is certainly no paucity of readily available chemicals. Modern synthetic methods bring many more within easy reach. Modern instrumentation provides a vast number of facts concerning every one of this multitude of compounds, many of which are surely irrelevant to the immediate problem.

That the computer is supposed to save research workers from such data explosions is not a new idea. But pattern recognition is rapidly acquiring a creditable track record that indicates just such a capability. It does seem to promise a method more systematically scientific than the approach whereby one rearranges methyl groups while rhythmically puffing on a pipe.

[Figure] Mineral analysis represented in facial features. "Funny faces" like these represent computerized mineral analysis data from core samples drilled from a Colorado mountainside in a program devised by Stanford's Dr. Herman Chernoff. Significant changes in data are more quickly seen when the variables are represented by facial features, he says, as illustrated by the more drastic changes beginning in faces 6 and 10. In this example they show where the core divides into three major zones of mineral content. Each face represents 12 different percentages of minerals found in core samples taken at regular intervals. Length and curve of the mouth show two different percentages, for example, as do the sizes, positions, and shapes of the rest of the facial features. Altogether, 18 variables can be shown in each face drawn by the computer's plotter.

[Photo] LLL's Dr. Kowalski (left) and Dr. Bender investigate pattern recognition

EDUCATION

New admissions method at Worcester

A new educational program calls for a new admissions procedure. So reasoned admissions officials at Worcester Polytechnic Institute, Worcester, Mass. As a result, beginning in 1973, prospective students will admit themselves to WPI. Dean of admissions Kenneth A. Nourse is quick to point out that the procedure, called negotiated admissions, should not be confused with some universities' open admissions policies. An applicant to WPI, he explains, is allowed to enroll only after a comprehensive campus visit.

In September 1971, the college instituted its WPI Plan, a complete restructuring of the academic program (C&EN, Nov. 16, 1970, page 42). In part, for example, the plan calls for students to earn degrees on the basis of demonstrated competence rather than by earning a required number of credits. There are no required courses. Each student plans his own academic program with his faculty adviser.

Such a program, WPI feels, calls for self-motivated students. Reasoning from this basis, WPI sees the new applications procedure as allowing an applicant a chance to determine his own level of motivation.

In practice, the procedure calls for a prospective student to visit the WPI admissions office, where he is first interviewed by an admissions officer. An interview with a faculty member or upperclass student in his chosen field follows, along with videotapes showing aspects of study under the WPI Plan, a student-guided tour, and attendance at a class in session. A final interview including the applicant's parents explains the negotiated admissions procedure, and the prospective student is supplied with any additional information needed to make his decision: College Board score ranges and averages of the previous entering class, for example, to evaluate himself against.

If the prospective student decides to apply, he returns a completed application along with an essay on how he relates to the WPI Plan, a procedure to assure the college that the applicant correctly understands the school's academic program. The applicant then receives a self-acceptance letter on which he indicates his choice of a starting date (September, November, February, April, or June, for any of the five terms under the WPI Plan) and returns it along with a deposit. If no financial aid is required, acceptance is confirmed by return mail. For those requiring financial aid, two additional forms are required. Subject to later confirmation, WPI will notify an applicant within a week what aid it will offer.