High Performance Liquid Chromatography: Theory and Practice

Jan 1, 1992 - High Performance Liquid Chromatography: Theory and Practice. Anal. Chem., 1992, 64 (1), pp 59A–59A. DOI: 10.1021/ac00025a745...
The American Chemical Society Presents

Sprouter seems to handle symbolic descriptions well, but our world is described by both symbols and numbers. Because numbers are the traditional domain of the statistician, a natural question is, "What about numeric attributes?"

The following approach is commonly used for handling a continuous numeric attribute. The instances are sorted by the numeric value of the attribute. From this sort, a list of possible thresholds is produced, each of which corresponds to the midpoint between adjacent values. For each possible threshold, the sets on both sides are tested for disorder. The best threshold value for the entire process is retained as a reference point for a test on that attribute. Thus this approach produces rules with "greater than" and "less than" relationships.

To illustrate, we extend our example to show the simple way in which Sprouter can deal with a numeric attribute, in this case pH values. Another group of samples causes a problem. By developing a decision tree using attributes of Client and pH as in Figure 1, we can determine that NTEX samples with pH values of 5-6 and 7-8 are P. This result can be identified as an OR relationship: pH = 5-6 OR 7-8. An astute reader of the decision tree might then notice that the numeric ranges 5-6 and 7-8 are adjacent and deduce that the P representation should be 5-8.

Consider a slightly modified data set in which some of the NTEX samples with pH values in the range 5-6 are S whereas others are P. The effect on the decision tree is quite interesting, as shown in Figure 2. There is additional branching from the pH 5-6 node. If more specific pH measurements were available, it would be possible to determine a threshold that has real meaning to a chemist, for example, pH > 6.3. The decision tree depicted in Figure 3 illustrates this situation and the power of our basic rule, "The world should be simple."

Uncertainty.
One of the endearing features of inductive learning systems is their capacity to tolerate uncertainty or noise. Consider the examples for which data are given in Table V. In the set of Noisy Data I both NTEX Mine samples and NTEX River samples are P. Is this a noise problem? Not necessarily. Here again statistics come into play. If only a small fraction of the data set supports this branch of the tree, then it might be pruned. If it is included in the data set it will cause an OR condition so that the rule will read

If Client is NTEX and Source is (Mine OR River), then Class is P

The viability of an OR condition could be determined by a supervising expert or statistical methods. The situation is slightly different in the set of NTEX Noisy Data II in which two Mine samples are P and one is S. This is also a viable possibility, and, once again, one could resort to statistics or expert advice to decide whether this branch should be pruned or included. The effect on the rule set is somewhat different, however. Two rules result:

High Performance Liquid Chromatography: Theory and Practice
Tuesday-Friday, February 11-14, 1992
Tuesday-Friday, June 23-26, 1992

If Client is NTEX and Source is Mine, then Class is P (confidence 66%)
and
If Client is NTEX and Source is Mine, then Class is S (confidence 33%)

Assuming that this subset of the full set has all of the NTEX Mine cases, then 66% of the time NTEX Mine samples are P, and 33% of the time they are S. This example illustrates the synergistic relationship that can exist between inductive learning systems and rule-based expert systems. The inductive learning system develops the decision trees with appropriate weightings for the branches. The rule-based expert system is designed to handle the "fuzzy logic" (weightings or confidence factors) so often required to describe real-world situations. It is not surprising that inductive learning systems are often integral components of modern expert systems.
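The midpoint threshold search for a numeric attribute and the confidence weighting of conflicting rules, both described in the text, can be sketched in a few lines of Python. This is a minimal illustration and not Sprouter's implementation. It assumes Shannon entropy as the measure of "disorder" (the text does not name one), the function names `disorder`, `best_threshold`, and `rule_confidences` are hypothetical, and the pH readings are invented for the example.

```python
from collections import Counter
from math import log2


def disorder(labels):
    """Shannon entropy of a list of class labels (an assumed disorder measure)."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_threshold(values, labels):
    """Return the midpoint threshold with the lowest size-weighted disorder."""
    pairs = sorted(zip(values, labels))
    best_score, best_cut = None, None
    for (low, _), (high, _) in zip(pairs, pairs[1:]):
        if low == high:
            continue  # identical adjacent values give no midpoint to test
        cut = (low + high) / 2
        left = [c for v, c in pairs if v <= cut]
        right = [c for v, c in pairs if v > cut]
        score = (len(left) * disorder(left) + len(right) * disorder(right)) / len(pairs)
        if best_score is None or score < best_score:
            best_score, best_cut = score, cut
    return best_cut


def rule_confidences(classes):
    """Fraction of the instances in one branch that fall into each class."""
    total = len(classes)
    return {c: n / total for c, n in Counter(classes).items()}


# Hypothetical pH readings and classes, invented for illustration only.
ph = [5.2, 5.8, 6.1, 6.5, 7.0, 7.6]
cls = ["S", "S", "S", "P", "P", "P"]
threshold = best_threshold(ph, cls)  # the midpoint between 6.1 and 6.5

# Two NTEX Mine samples are P and one is S, as in Noisy Data II.
confidence = rule_confidences(["P", "P", "S"])
```

With these invented readings the search settles on the cut that separates the S samples from the P samples, giving a "pH greater than" test of the kind described in the text. The confidence dictionary assigns two thirds to P and one third to S, the proportions behind the two weighted rules above.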

Other techniques

We have described just two algorithms for extracting regularity from data; there are many more. Some other algorithms are inspired by the seductive desire to imitate nature, or at least what we think we see in nature. Among the most popular of these are algorithms based on networks of neuron-like elements.

The important point is that no one algorithm is right for all circumstances, so an expert in machine learning must be well versed in a variety of techniques, and any system for helping chemists must include an armamentarium of techniques.

We argue that AI will bring about an important change in instrument design. Within the next 10 years or so, leading-edge instruments will incorporate regularity-spotting learning algorithms in their ubiquitous microprocessors so as to supply not only data but also interpretations of the data and of their own health.

An intensive four-day short course providing practical hands-on experience with the techniques and instrumentation of HPLC

Here's How You'll Benefit from This Course:
• Learn how to solve tough separation problems
• Find out how to perform quantitative and qualitative analyses
• Be able to interpret and troubleshoot from chromatograms
• Learn sample preparation techniques
• Become familiar with new techniques and equipment

Register Today! Enrollment is limited to 24 participants.

For more information, phone the Continuing Education Short Course Office at (800) 227-5558 (TOLL FREE) or (202) 872-4508. Or, mail the coupon below to: American Chemical Society, Dept. of Continuing Education, Meeting Code VPI9202, 1155 Sixteenth Street, N.W., Washington, DC 20036.

YES! Please send me a FREE brochure on the ACS Short Course, High Performance Liquid Chromatography: Theory and Practice, to be held February 11-14, 1992, and June 23-26, 1992, in Blacksburg, Virginia.

NAME TITLE ADDRESS

CITY, STATE, ZIP

ANALYTICAL CHEMISTRY, VOL. 64, NO. 1, JANUARY 1, 1992 · 59 A