AC

Interface

Hybrid Artificial Intelligence Tools for Assessing GC Data

An expert system and artificial neural network combined has advantages over the individual techniques

John W. Elling, Sharbari Lahiri, Jason P. Luck, Randy S. Roberts
Los Alamos National Laboratory

Susan I. Hruska, Kristin L. Adair, Alan P. Levis, Robert G. Timpany
Florida State University

John J. Robinson
Varian Chromatography Systems

0003-2700/97/0369-409A/$14.00/0 © 1997 American Chemical Society
Analytical Chemistry News & Features, July 1, 1997

Automated instruments have become essential components of modern laboratories; instruments with autosamplers can process tens to hundreds of samples unattended, greatly reducing the work and time involved in generating data. Laboratory automation provides numerous benefits, including the potential to reduce the cost and to increase the speed, throughput, and consistency of sample analysis. Unfortunately, however, unattended operation can result in delayed detection of instrument faults and improperly prepared samples. When such faults occur, in many cases the instrument cannot successfully analyze the current sample or the remaining samples in the batch. An overconcentrated sample, for example, may contaminate the instrument and affect the data collected from subsequent samples. Continuing to run an instrument after a fault has occurred can exacerbate the problem, create new problems, and damage the instrument. For instance, continuing to inject samples into a gas chromatograph after the carrier gas pressure has been lost will contaminate the injector and column.

Ideally, instruments should automatically assess the data and detect problems with sample preparation or instrument operation during or immediately after each sample is analyzed. If a fault is detected, the instrument should stop processing and alert the operator. When no faults are detected, the instrument can pass the validated data to automated data interpretation software and can begin analyzing the next sample in the batch (2). Some faults could potentially be automatically corrected when detected. For example, if the assessment system determines that the most recently processed sample was too concentrated, the instrument could automatically process a series of blanks. More sophisticated error recovery may also be possible—the overconcentrated sample can be successfully reanalyzed if the instrument can automatically direct an autosampler to inject a smaller sample volume or to dilute the sample before reinjecting it (2). As laboratory automation capabilities improve, the range of faults that can be automatically repaired will increase (3).

Although automated data assessment is certainly desirable, it is difficult to implement. Currently, when the instrument malfunctions during analysis or the sample is not prepared properly, the chemist must detect the problem from the appearance of the data and use the data to diagnose the fault—a process that is often accomplished with trial-and-error heuristics developed by experienced operators through years of problem solving. Automation of such heuristic, knowledge-intensive tasks can be accomplished using artificial intelligence (AI) techniques. In this article we describe the development and use of a hybrid AI that automates the processes of validating routine GC data and diagnosing instrument malfunctions. This GC data assessment system is an expert network, a hybrid of an expert system and an artificial neural network (ANN), that has advantages over traditional expert system and neural network AI techniques (4).

Expert systems

The wide range of possible instrument configurations, operating conditions, and sample types makes it difficult to envision a static, algorithmic computer program that could effectively diagnose GC analysis faults. An alternative is a heuristic approach based on troubleshooting information from instrument guides, human experts, and previous work (5, 6). Expert systems can be used to reason with such heuristic, nonalgorithmic information.

Rule-based expert systems. Traditional expert systems are composed of a knowledge base and an inference engine. The knowledge base contains a set of logical rules, usually in the form "if x, then y (cf)", where x and y are assertions and cf is the certainty factor of the rule (7). Assertions can be system inputs or outputs or intermediate conclusions in an inference chain, and the certainty factor is a measure of the belief that y will appear as a consequence of x. With rules in this form, if x is true with some degree of certainty, assertion y is concluded with a certainty calculated by multiplying the certainty factor for x and the certainty factor of the rule, cf. Multiple rules may have y as a conclusion. The inference engine processes inputs and certainty factors through all the rules in the knowledge base to form conclusions (8).

A developer typically creates the knowledge base for a traditional expert system by interviewing one or more human experts. After collecting, implementing, and testing the knowledge base, the developer and the experts adjust the rules, thresholds, or certainty values to refine areas in which the system performs poorly. This iterative process results in the well-known knowledge acquisition bottleneck, because it is difficult to identify and quantify the experts' decision-making steps and intuitive knowledge (8).

The key advantage of expert systems is that the if-then representation of knowledge is a natural way to capture expert reasoning. The if-then representation also provides a built-in explanation capability. Operators can easily determine how the system reached a conclusion by tracing the inference chain (that is, the deductive reasoning) from the inputs, through the if-then statements, to the conclusion. Tracing the reasoning and explaining the conclusion increase the users' confidence in the results and make it easier to refine the system's performance as knowledge is acquired iteratively.

One disadvantage of traditional rule-based expert systems is that tracing and validating the inference chain becomes more difficult as the number of rules increases and the logical structure becomes more complex. This tendency makes it difficult, in a large rule base, to find redundant, inconsistent, or logically contradictory paths to the same conclusion (9). This problem is especially prevalent in rule bases constructed from knowledge gleaned from several human experts.
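To make the certainty arithmetic concrete, the short sketch below propagates a certainty through a rule and combines evidence from two rules concluding the same assertion. The combining formula is the standard one used in EMYCIN-style systems for two positive certainty factors; the function names and the input values are ours, chosen for illustration, not taken from the article.

```python
# Sketch of certainty-factor inference in a rule-based expert system.
# Rule format: "if x, then y (cf_rule)".

def conclude(cf_x: float, cf_rule: float) -> float:
    """Certainty of y from one rule, given the certainty of x."""
    return cf_x * cf_rule

def combine(cf1: float, cf2: float) -> float:
    """Combine evidence for y from two independent rules (both positive)."""
    return cf1 + cf2 * (1.0 - cf1)

# Two hypothetical rules concluding the same assertion y:
#   if x1 then y (0.8)    and    if x2 then y (0.6)
cf_y1 = conclude(0.9, 0.8)      # evidence from rule 1, about 0.72
cf_y2 = conclude(0.5, 0.6)      # evidence from rule 2, about 0.30
cf_y = combine(cf_y1, cf_y2)
print(round(cf_y, 3))           # → 0.804
```

Note how the combined certainty exceeds either single-rule conclusion but never reaches 1.0, which is what makes this combining function suitable for accumulating partial evidence.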
Another drawback to expert systems is the rapid decline in performance caused by degraded, noisy, or missing data or changes in the data or the environment. Typically, a small subset of rules in the knowledge base is devoted to each situation the expert system will address; as a result, the information and processing are highly localized in the system. For example, if as a result of noise in one input, the input no longer closely matches the antecedent on the left-hand side of the appropriate rule or rules, the system will fail to reach a conclusion from those rules. To improve the performance of an expert system, the developer must modify the knowledge base and/or the inference engine. However, changing an established knowledge base and expert system can be problematic; before introducing any new or modified knowledge, the developer must evaluate how it will interact with all of the existing rules to avoid introducing logical contradictions.

Artificial neural networks. ANNs are another tool for artificially encoding intelligence, typically in the realm of pattern-recognition types of decision-making processes. The ANN technique takes its inspiration from our basic understanding of how biological brains work, simplifying and abstracting this understanding into a simulated system of layers of neuronlike nodes and a network of connections between them (10). ANNs process inputs through the network of nodes, producing a composition of weighted signals at each interior node and ultimately firing nodes in the output layer.

The central feature of ANNs is that training the weights of the network connections allows an ANN to learn associations between input patterns and desired output patterns. A series of example input patterns associated with desired outputs, called the training set, is presented to the network. The training algorithm usually involves some mechanism for comparing the outputs of the untrained network with the desired output patterns in the training set and systematically adjusting the weights of the connections to minimize the error between the actual outputs and the desired outputs. From this training, the ANN learns to associate the correct output patterns more accurately with the training inputs. Ideally, the training leads to the network's ability to generalize, that is, to correctly recognize patterns in input data that were not in the training set.

The key advantage of ANNs is a simpler knowledge acquisition process, because the relationship between inputs and outputs is learned rather than specified. Once the appropriate ANN architecture is established and a training set has been acquired, the network can be trained to recognize the relationship between the inputs and the outputs in the training set. ANNs have an additional advantage in that, given a well-designed and well-trained network, the output of a network is the composite of many nodes firing throughout the network. The result is that an ANN system typically retains better performance in the face of degraded, noisy, or missing data than a traditional rule-based expert system. Also, if changes in the data or environment cause the performance of the network to degrade to an unacceptable level, the network can often be retrained with new data.

A primary drawback to the ANN approach is that it is difficult to trace the reasoning to justify a conclusion. The relationship that the network learns between the inputs and outputs is difficult to discern and difficult to relate to the problem domain. Another disadvantage is that the network may require an extensive training set, which is sometimes difficult and/or expensive to obtain.

Figure 1. Expert network representations. (a) An "if x, then y (cf)" rule, in which x and y are assertions and cf is the rule certainty factor. (b) An "if (a or b) and c, then d (cf)" rule. In this representation, a, b, c, and d are assertions, and OR and AND are operation nodes. The unlabeled connections carry a weight of 1.0.

Expert networks. Expert network techniques can represent a rule-based expert system as a type of ANN (4). The translated rule-based expert system is called an expert network, because the network uses specialized nodes and functions uncommon to traditional ANNs. This hybrid approach uses traditional rule-based expert knowledge, while providing a mechanism for data-driven ANN-style training.
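As a minimal sketch of this hybrid representation, the rule of Figure 1b can be evaluated as a small network of operation nodes whose single trainable weight plays the role of the rule's certainty factor. The article leaves the node functions to the choice of inference engine; the max/min operators below are a common fuzzy-logic choice and are an assumption on our part, as are the input certainties.

```python
# Illustrative expert-network evaluation of "if (a or b) and c, then d (cf)".
# OR/AND operation nodes combine assertion certainties; the connection
# into d carries the rule certainty factor as its weight.

def or_node(*cfs: float) -> float:
    """Disjunction of assertion certainties (fuzzy max, an assumed choice)."""
    return max(cfs)

def and_node(*cfs: float) -> float:
    """Conjunction of assertion certainties (fuzzy min, an assumed choice)."""
    return min(cfs)

def rule_edge(cf_in: float, weight: float) -> float:
    """Connection whose weight is the rule certainty factor (trainable)."""
    return cf_in * weight

# Hypothetical input certainties for assertions a, b, c:
a, b, c = 0.9, 0.2, 0.7
cf_rule = 0.8                     # the weight ANN-style training would adjust

d = rule_edge(and_node(or_node(a, b), c), cf_rule)
print(round(d, 2))                # → 0.56
```

In a trained expert network, `cf_rule` (and any parameters inside the node functions) would be adjusted from example data rather than fixed by hand, which is exactly the learning capability the hybrid approach adds.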

Figure 2. GC data assessment system process.

Translating knowledge from a rule base to an expert network requires mapping the assertions and certainty factors of rules onto specialized network nodes and connections. The nodes represent the assertions, the connections between them represent the if-then rule, and the weight of the connection represents the certainty factor. Figure 1a shows the expert network representation of the rule "if x, then y (cf)". Rules with multiple assertions and logical operators give rise to multiple nodes in the network. These multiple nodes include regular nodes, which represent simple assertions, and operation nodes, which represent logical conjunctions, disjunctions, or negations of assertions (complex assertions). Figure 1b shows the network representation of the rule "if (a or b) and c, then d (cf)". The unlabeled connections between nodes implicitly carry a weight of 1.0.

Translating an expert system into an expert network also requires that the functions of the system inference engine be mapped into the network. For example, an expert system's inference engine function that combines evidence from multiple rules becomes a combining function for the incoming signals to a node, and rule-firing functions become node activation functions. The specific node activation and combining functions used in an expert network depend on the inference engine that is translated into the expert network representation. Any inference engine used with an expert system can be converted into an expert network. An example of mapping EMYCIN-type inferencing (7) to an expert network has been described in detail (11).

Once the expert system is translated into an expert network format, a developer can train the network with ANN-style, data-driven learning that optimizes the node connection weights as well as the parameters of the internal node functions. The key advantage of expert networks is the preservation of the natural knowledge representation and explanation capability of rule-based systems, while providing the learning capability of network systems. With this learning capability, the network can be trained on data to increase the network's performance. In addition, the network can be periodically retrained to adapt to changes in the data or the environment, increasing the robustness of the system. Training the expert network can also be used to validate the rules in the knowledge base. Finally, the learning ability of the network representation can be used to discover new knowledge from the training data.

GC assessment system

To validate the data from gas chromatograms, experienced chromatographers evaluate the overall appearance of the chromatogram to see if it meets expectations, identifying symptoms such as an elevated, oscillating, or noisy baseline; a lack of peaks; or abnormal peak shapes. If they suspect that a problem exists, they attempt to identify the underlying fault by using the entire set of symptoms they have identified. The GC data assessment system emulates this two-step process. We developed an algorithm set that detects features in the chromatogram that correspond to the symptoms for which experienced chromatographers look (12). The output of all these algorithms becomes the input for an expert network that reasons about the symptoms and diagnoses the underlying sample or instrument fault (Figure 2).

Most symptom detection algorithms produce a real-valued output that measures the severity of each symptom. The output is mapped to a continuous scale ranging from 0.0 to 1.0, in which 1.0 is the maximum possible severity of the symptom. The exceptions are algorithms that detect the binary presence (1.0) or absence (0.0) of a symptom, such as clipped peaks. Sometimes individual symptom detection algorithms cannot be executed on a particular chromatogram. For example, if there are no peaks in a chromatogram, the system does not execute the algorithms that analyze peak shapes and instead sets the outputs of these algorithms to -2.0, indicating that this information is missing. After a chromatogram has been processed, the outputs of all the symptom detection algorithms are compiled into an ASCII symptom table that is used as the input to the expert network.

Figure 3. The portion of the expert network for the Contaminated Sample fault.

The knowledge base. The first step in developing an expert network that operates on the list of detected symptoms is to compile a knowledge base. Tabulated expert knowledge for GC instrument troubleshooting is available from the troubleshooting reference guides provided by several instrument manufacturers. Third-party instrument troubleshooting guides are also available (13). An additional knowledge source is "GCdiagnosis", an expert system developed to help diagnose faults (5, 6). These sources provide true-false associations between symptoms and faults: a "true" indicates that the fault does produce the given symptom, whereas a "false" indicates that the fault does not produce it.

Although valuable as a starting point, the crisp true-false knowledge representation available from these sources cannot use symptom severity information to discriminate between different faults and contains no information about the relative significance of each symptom's presence or absence in making each fault diagnosis. In addition, expert systems built with strict true-false relationships are not robust in the presence of missing information, which occurs in our system when a symptom detection algorithm cannot be executed. To overcome these restrictions we created a knowledge table (Table 1) that uses linguistic qualifiers—Always, Usually, Sometimes, Infrequently, and Never—as fuzzy values to reflect the associations between symptoms and faults. A blank indicates that a symptom and the fault are unrelated and that particular symptom is not used in reasoning about the fault. Rules in this form capture the importance of each symptom's presence for diagnosing a fault.

The No Fault diagnosis in Table 1 provides a trainable positive diagnosis of acceptable data. The symptom set that defines No Fault connects most symptoms with a Never qualifier, but the No Fault diagnosis is not connected to symptoms that frequently appear in normal chromatograms. These non-connected symptoms are commonly detected by algorithms that have high sensitivities and/or low thresholds and so often return false positives. For example, some symptoms relating to peak shape (Leading Peaks, Tailing Peaks, and Unresolved Peaks) are not connected to the No Fault diagnosis because peak overlap in acceptable chromatograms can affect peak shape and thus produces small positive outputs from these algorithms.
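To make the qualifier scheme concrete, here is one hypothetical way the linguistic qualifiers could be mapped to numbers and compared against detected symptom severities. The numeric values, the Contaminated Sample row, and the scoring rule are illustrative assumptions on our part; in the actual system these associations become trainable network weights rather than fixed numbers.

```python
# Sketch: scoring a fault against detected symptom severities using a
# knowledge-table row of linguistic qualifiers.  All values are assumed.

QUALIFIER = {"Always": 1.0, "Usually": 0.75, "Sometimes": 0.5,
             "Infrequently": 0.25, "Never": 0.0}

# Hypothetical excerpt of one fault's row: symptom -> qualifier.
# A symptom absent from the row (a blank in Table 1) is ignored.
contaminated_sample = {"Elevated Baseline": "Usually",
                       "Extra Peaks": "Always",
                       "No Peaks": "Never"}

def fault_score(row, severities):
    """Agreement between detected severities (0.0-1.0, or -2.0 when a
    detector could not run) and a fault's qualifier row."""
    total, n = 0.0, 0
    for symptom, qualifier in row.items():
        s = severities.get(symptom, -2.0)
        if s == -2.0:            # missing information: skip, stay robust
            continue
        total += 1.0 - abs(QUALIFIER[qualifier] - s)
        n += 1
    return total / n if n else 0.0

severities = {"Elevated Baseline": 0.8, "Extra Peaks": 0.9, "No Peaks": 0.0}
print(round(fault_score(contaminated_sample, severities), 3))   # → 0.95
```

Skipping the -2.0 sentinel rather than treating it as evidence is what gives this style of scoring the robustness to missing information that strict true-false rules lack.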

expert network and use a reinforcement factor of 1.0 for all Filter node