Chapter 7
Downloaded by UNIV OF MICHIGAN ANN ARBOR on November 7, 2017 | http://pubs.acs.org Publication Date: April 17, 1998 | doi: 10.1021/bk-1998-0690.ch007
Neural Networks and Self-Referent Acoustic-Wave Sensor Signaling Lan Bui and Michael Thompson Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, Ontario M5S 3H6, Canada
This paper presents the application of artificial neural networks (ANN) to the self-referent calibration of thickness-shear mode (TSM) acoustic wave chemical sensors. Spectrum analysis of impedance measurements affords complete characterization of the TSM sensor, which includes the use of the Butterworth - Van Dyke (BVD) equivalent circuit to quantify the electrical responses. The multidimensional nature of this method and a novel weight-adjustment procedure, applied to A N N calculations, are utilised to effect a method of calibration in the presence of interferents. A network is trained, using exemplary I/O data acquired for a potassium chloride (KCl) system, to predict unknown outputs ie. concentration, given four sets of measured inputs ie. series resonant frequency (Fs), parallel resonant frequency (Fp), motional capacitance (Cm) and motional resistance (Rm). The trained and tested network achieved a high predictive efficiency with errors in the range of 2%-6%. A n interferent, ethanethiol, is added to test the robustness of the trained network and was found to adversely affect the predictive ability of the network. The magnitudes of the weights, which are associated with the set of inputs deemed to be most affected by the interferent (Fs) are adjusted to minimise this deterioration. The resultant network, calibrated for the interferent, achieved the same predictive efficiency for adulterated samples as that achieved by the original network for unadulterated samples.
A major problem since the inception of chemical and biosensor technology has been the quality of the signal-concentration relationship with respect to the time course of
78
©1998 American Chemical Society Akmal and Usmani; Polymers in Sensors ACS Symposium Series; American Chemical Society: Washington, DC, 1998.
Downloaded by UNIV OF MICHIGAN ANN ARBOR on November 7, 2017 | http://pubs.acs.org Publication Date: April 17, 1998 | doi: 10.1021/bk-1998-0690.ch007
79 measurements. Lack of reproducibility in concentration values can be traced to changes in the performance characteristics of the device concerned over time and possible perturbations caused by matrix interferences. The incorporation of an artificial neural network (ANN's) into a sensor system may assist in solving this problem by providing a self-referent calibration mechanism for matrix effects. The main advantage of the neural network calculation over classical regression analyses is its ability to model at an implicit level any number of complex, non-linear data sets that would be impossible, or at least extremely difficult, to fit to an explicit mathematical model. This advantage has resulted in the application of neural network analyses to numerous fields among them, spectral interpretation (1-4), process control (5-7), prediction of protein structure (8-9) and various sensor systems (10-17). A N N ' s have also been utilised in piezoelectricbased sensor systems but the focus has been primarily on arrays of single-output sensors, with the output usually being the series resonance frequency (Fs) (18-22). This single-output array architecture, in addition to generating extra complexity in theory and application, has served to limit the utility of neural networks in pattern recognition and classification. In the present work, spectrum analysis of the impedance measurements of a single sensor, using a network analyser, provides multidimensional information thus effectively overcoming these drawbacks. Such multi-dimensionality, in conjunction with a novel weight-adjusment procedure, may provide a calibration mechanism that can be used to eliminate a number of matrix effects. Network Analysis To Provide Multi-dimensionality. A typical response pattern derived from network analysis of a T S M device, operated in liquid, is depicted in Figure 1. The theory behind network analysis of the T S M sensor has been discussed thoroughly by various investigators and, thus, will only be briefly summarised here (2328). Several different parameters can be obtained from the plots of phase angle and impedance versus frequency. These can be classified into two groups: frequency-based parameters such as the series resonant frequency and the parallel resonant frequency, and impedance-based quantities such as the maximum impedance (Zmax), the minimum impedance (Zmin) and the maximum phase angle (6max). Approximations in analysis of the electrical behaviour of the device, using the Butterworth-Van Dyke (BVD) equivalent circuit (Figure 1), can provide four more useful parameters: the static capacitance (Co) between the two parallel electrodes on the two surfaces of the quartz crystal as well as the motional resistance (Rm), the motional capacitance (Cm) and the motional inductance (Lm), all of which arise from electromechanical perturbation induced by the piezoelectric effect in the resonant region. Since neural calculations are inherently complex and data-intensive, this study is purposely focussed on a relatively simple system; the responses of the T S M device to an electrolyte, potassium chloride (KC1). For the same reason, only four of the more important parameters are selected to be used in the initial phase of this project. The four chosen quantities are the series resonant frequency (Fs) which represents mass loading and other interfacial effects, the parallel resonant frequency (Fp) which characterises the capacitive loading effect, the motional resistance (Rm) which is related to the energy dissipation of the oscillating sensor and the static capacitance (Co) which reflects
Akmal and Usmani; Polymers in Sensors ACS Symposium Series; American Chemical Society: Washington, DC, 1998.
Downloaded by UNIV OF MICHIGAN ANN ARBOR on November 7, 2017 | http://pubs.acs.org Publication Date: April 17, 1998 | doi: 10.1021/bk-1998-0690.ch007
80
I—ΑΛΛ—υϋϋϋ^-
Rm'
W
Cm'
Figure 1. (a) Network analysis traces and (b) equivalent circuit parameters.
Akmal and Usmani; Polymers in Sensors ACS Symposium Series; American Chemical Society: Washington, DC, 1998.
81 changes in the interfacial capacitance. Exemplary I/O data, comprising the chosen parameters is used to train a network, followed by appropriate testing. Once the optimal architecture is found, the trained and tested network is used to predict concentrations of various "unknown" solutions, given the respective inputs. In the second phase of the project, an interfèrent, hypothesised to affect only one of the parameters, is added to the "unknown" samples in order to test the robustness of the trained network. The adulterant of choice was ethanethiol (CH CH SH), which is postulated to affect mainly the series resonant frequency (Fs) through its absorption to the surface of the gold electrode through the well-known Au-S interaction. It has been shown that Rm and Co are not affected by a change in surface mass since the deposited layer can be regarded as an extension of the material comprising the electrode (24). Thus, power dissipation, reflected by Rm, and the dielectric properties of the quartz plate, reflected by Co, should remain constant. The parallel resonant frequency, Fp, is also affected by mass deposition at the surface but it is not as sensitive to mass changes as Fs (29). Adjustments of the weights associated with affected nodes are then performed to calibrate for the effect of the adulterant. A n analogous calibration scheme may also be possible for an array of single-output sensors, but this methodology is expected to be far more complex with respect to theory and application.
Downloaded by UNIV OF MICHIGAN ANN ARBOR on November 7, 2017 | http://pubs.acs.org Publication Date: April 17, 1998 | doi: 10.1021/bk-1998-0690.ch007
3
2
Experimental AT-Cut quartz piezoelectric crystals coated with gold electrodes were obtained from Classic Frequency Control, Inc., Oklahoma City, OK, USA. The resonant frequencies of the crystals were ~ 9 M H Z and the electrode surfaces were polished to < 1 μπι. The T S M device was clamped in a cell with O-rings on both sides. One side of the crystal was kept under nitrogen while the other side was immersed in the analyte solution. Potassium chloride and ethanethiol were used as received from Aldrich. Doubly distilled, deionised water was used as the reference and as the solvent for the standard solution. Network analysis was performed using a Hewlett Packard HP 4195A network/spectrum analyzer, equipped with an HP 41951A impedance test kit and an HP 16092A spring clip fixture. A l l data were transferred to an IBM-compatible computer through an IEEE-488 bus. Two different neural network packages were utilised in this project. The first, A N N A T , originates from collaborators in Abo Akademi, Finland. This program uses the Levenberg-Marquardt algorithm for optimization which makes use of an interpolation between the steepest descent and the Newtonian directions, based on a "trust region" where a linearised model is assumed to apply. The software is simple-touse, fast and robust but it does possess some drawbacks, mainly that the number of weights (connections) should be kept below 100 and that the architecture of a network must have the same number of nodes in each hidden layer. The second program is a commercial package, NeuralWorks Professional II/Plus v. 5.22, from NeuralWare, Pittsburgh, PA, USA. Several algorithms for RMS optimization are offered but only back-propagation was utilised. This package does not possess the limitations of the in-
Akmal and Usmani; Polymers in Sensors ACS Symposium Series; American Chemical Society: Washington, DC, 1998.
82
Downloaded by UNIV OF MICHIGAN ANN ARBOR on November 7, 2017 | http://pubs.acs.org Publication Date: April 17, 1998 | doi: 10.1021/bk-1998-0690.ch007
house program but the typical back-propagation disadvantages apply here, that is, longer training and testing times and a tendency towards local minima. Neuralnet Results for Unadulterated Systems. Training and testing data sets are constituted of 100 and 50 instances, respectively, covering the range of concentrations between 10^*M and 1M. The concentrations of the "unknown" samples are also in this range. A typical network, with a 4,6,1 architecture: 4 inputs, 6 nodes in one hidden layer and 1 output, is shown in Figure 2. A bias is used to ensure the nodal outputs are in the optimal range and the window in the top left hand corner tracks the decreasing RMS error. It can be seen that for a typical viable network, the RMS decreases in a near exponential fashion. A typical output of a trained A N N A T network is depicted in Table I. This represents a 4,5,1 network with 30 weights reaching convergence in 11000+ iterations. Each of the weights can be related back to a specific connection in the network. The relevance of this is that the weights can be traced back to a specific input and this will be important for the weight adjustment process. The sum of squares error (SSQ) shown was converted to RMS for meaningful comparison. The optimal architecture of any network is usually derived through a trial and error process. Figure 3 illustrates the relationship between RMS and the number of hidden nodes for a network with one hidden layer. The RMS shows an initial rapid improvement with increasing number of hidden nodes, followed by a slower and decreasing phase of improvement. After a certain number, the slight improvements no longer justify the extra computing time and effort, and the network is set. If testing procedures prove the network to be unsuitable then a different architecture may be chosen. Using the A N N A T package, the best architecture obtained is a 4,7,1 net with no further improvement after 10000+ iterations. The RMS for the test set can be seen to be of the same magnitude as that of the training set, validating the network (Table II). For the commercial package from NeuralWare, the best architecture is 4,9,1 and the number of iterations required is approximately doubled. The RMS for the training and testing sets are also of similar magnitudes, indicating a useful architecture. Slightly lower training RMS was observed for NeuralWorks but the test RMS for this program was actually slightly higher than that of A N N A T . The discrepancy is not large but it could be argued that the larger NeuralWorks network, although it may have trained better, may not generalise as well as the smaller A N N A T architecture. Twenty solutions of varied concentrations in the ΙΟ" M to 1 M range were chosen to be the unadulterated "unknown" set. The response sets for these "unknowns", comprising Fs, Fp, Co and Rm, were used to test the predictive ability of the trained networks. Using A N N A T , these predictions resulted in errors in the range of 3-6%, compared to the true values (Table III). This can be considered to be acceptable given the large range of concentrations being studied. A network optimised for a smaller range of concentrations will undoubtedly yield a corresponding reduction in the error range. The predictive ability of the commercial package, with respect to the unadulterated "unknown" data sets, is slightly better than that of A N N A T (Table IV). This is somewhat surprising in light of the higher test RMS for NeuralWorks. The 4
Akmal and Usmani; Polymers in Sensors ACS Symposium Series; American Chemical Society: Washington, DC, 1998.
Downloaded by UNIV OF MICHIGAN ANN ARBOR on November 7, 2017 | http://pubs.acs.org Publication Date: April 17, 1998 | doi: 10.1021/bk-1998-0690.ch007
83
Input Layer
Figure 2. Architecture of a neural network extracted from NeuralWorks.
Table I. Typical Output for Annat Training Run Number of input and output nodes: Number of hidden layers : Number of nodes in each hidden layer: Activation function for output nodes: Activation function for hidden nodes: Gain term - beta: File with input patterns and outputs: Print option (0/1/2): Maximum number of iterations: Analytical/numerical derivatives : Limit for initial weight-guesses: Seed:
4 1 1 5 1 1 1.0000 kcl.dat 0 20000 A .00000E+00 9823
Number of patterns in the input file: The weights W » > 1 to 30 : 7.6818 -.31707E-01 -48.077 -.12729E-01 7.304 .22442E-01 1096.1 -1097.0 -25.790 .90235E-01 3.1639 -66.097
100
SSQ:
.69635E-02 -.42298E-02 -.13044E-02 -1.0025 -6.0004 .36891E-01
-.14397 .23473 .86784E-01 -4.8776 -.34710E-0 .10492
3.6978 7.0486 -3.7655 1.0893 291.772 -69.200
.44755E-03
Number of iterations : 11124 Akmal and Usmani; Polymers in Sensors ACS Symposium Series; American Chemical Society: Washington, DC, 1998.
Downloaded by UNIV OF MICHIGAN ANN ARBOR on November 7, 2017 | http://pubs.acs.org Publication Date: April 17, 1998 | doi: 10.1021/bk-1998-0690.ch007
84
Table II. Comparison of Software Packages Parameter Best architecture Number of iterations RMS (train) RMS (test) Predictive error (unadulterated) Predictive error (thiol, unadjusted) Optimal weight reduction Predictive error (thiol, adjusted) - A l l errors are absolute.
ANNAT 4,7,1 10000+ 0.00142 0.00407 3%-6% 8%-12% 18% 3%-7%
NeuralWorks 4,9,1 20000+ 0.00129 0.00445 2%-5% 7%-14% 21% 3%-6%
Akmal and Usmani; Polymers in Sensors ACS Symposium Series; American Chemical Society: Washington, DC, 1998.
85
Downloaded by UNIV OF MICHIGAN ANN ARBOR on November 7, 2017 | http://pubs.acs.org Publication Date: April 17, 1998 | doi: 10.1021/bk-1998-0690.ch007
Molarity
Table III. Predictive Results for ANNAT % Error (adulterated) % Error (unadulterated)
(M Kcl) 0.0005 6.2 0.005 5.9 5.1 0.015 0.025 4.7 0.035 3.6 0.045 4.5 0.055 3.7 0.065 3.5 3.4 0.075 0.085 3.6 0.095 3.3 3.7 0.150 0.250 3.0 0.350 4.2 0.450 4.7 0.550 5.5 0.650 3.9 0.750 4.9 0.850 4.3 0.950 3.6 - All errors are absolute.
Molarity
non-adjusted 11.5 11.0 11.9 10.6 9.4 9.7 9.1 9.2 8.4 7.9 8.7 8.1 8.9 9.5 8.6 10.0 9.2 8.8 9.5 9.0
weight-adjusted 6.7 7.0 7.4 6.0 5.2 5.1 4.5 5.7 4.0 3.5 3.1 4.2 2.9 3.5 4.0 3.7 4.5 3.9 4.5 3.5
Table IV. Predictive Results for NeuralWorks % Error (adulterated) % Error (unadulterated)
(M KC1) 0.0005 0.005 0.015 0.025 0.035 0.045 0.055 0.065 0.075 0.085 0.095 0.150 0.250 0.350 0.450 0.550 0.650 0.750 0.850 0.950
5.4 4.1 4.8 3.8 4.3 3.5 3.9 3.0 2.4 3.6 3.1 1.9 3.8 3.0 4.2 3.5 2.9 4.0 4.3 3.6
non-adjusted 13.5 13.9 12.7 11.6 10.2 9.9 9.9 9.2 7.6 6.9 8.7 7.1 9.9 9.1 9.6 11.0 7.7 9.8 9.2 10.0
weight-adjusted 5.8 6.0 5.4 5.1 5.2 4.1 3.5 2.7 4.0 3.9 3.7 4.2 4.9 3.8 4.0 4.7 4.6 4.9 4.2 3.9
- All errors are absolute. Akmal and Usmani; Polymers in Sensors ACS Symposium Series; American Chemical Society: Washington, DC, 1998.
86
Downloaded by UNIV OF MICHIGAN ANN ARBOR on November 7, 2017 | http://pubs.acs.org Publication Date: April 17, 1998 | doi: 10.1021/bk-1998-0690.ch007
differences are small enough that they can be attributed to normal distribution and/or experimental errors. For both systems, no clear trend in the % error can be observed, with respect to concentration, but the highest % errors seem to occur for lower concentrations. Neuralnet Results for an Adulterated System. Neural network analysis has been shown to be effective in handling ideal data sets but this alone does not justify the time and effort spent creating the appropriate architectures. The next step involves the exposure of the trained networks to data generated by samples that have been adulterated with an interfèrent, ethanethiol (CH CH SH). This compound was chosen since the thiol functional group will interact with the gold electrodes of the TSM device to effect a change in the series resonant frequency. The thiol was introduced into samples of the same KC1 concentrations as those in the unadulterated "unknown" set. Although a larger molecule will cause a larger deviation in Fs, ethanethiol was chosen in view of its solubility in an aqueous system. The shorter chain length also precludes formation of a well-packed monolayer. This, combined with a purposely low concentration of thiol (