Anal. Chem. 1998, 70, 1249-1254
Classification and Quantitation of ¹H NMR Spectra of Alditol Binary Mixtures Using Artificial Neural Networks

Salvator Roberto Amendolia,† Angelino Doppiu,‡ Maria Luisa Ganadu,*,‡ and Giuseppe Lubinu‡
Istituto di Matematica e Fisica and Dipartimento di Chimica, Università di Sassari, via Vienna 2, 07100 Sassari, Italy

† Istituto di Matematica e Fisica. ‡ Dipartimento di Chimica.
A pattern recognition method based on artificial neural networks (ANNs) to analyze and quantify the components of binary mixtures of six alditols is presented. The method is suitable for classifying the spectra of the 15 mixtures obtained from the six alditols and for producing quantitative estimates of the component concentrations. The system is user-friendly and helps solve the problem of strongly overlapping signals, often encountered in NMR spectroscopy of carbohydrates. A "classification" ANN uses 200 intensity values of the ¹H NMR spectrum in the range 3.5-4 ppm. Once the correct mixture has been identified, quantification is carried out by a specific ANN assigned to each mixture. These ANNs use the same 200 values of the spectrum and output the two concentrations. The error in the ANN responses is studied, and a method is developed to estimate the accuracy in determining the concentrations. The networks' ability to recognize previously unseen mixtures is tested. When the classification ANN (trained on the 15 binary mixtures) is exposed to complex (i.e., more than binary) mixtures of the six known alditols, it successfully identifies the components if their minimum concentration is 10%. Given the precision of the results and the small number of errors reported, we believe that the method can be used in all fields in which the recognition and quantification of components are necessary.

There is a continuing need for more rapid, precise, and accurate analyses of the chemical composition of biological systems. Many chemometrics techniques adopted in practical applications of analytical spectroscopy [1-3] use a pattern recognition approach. An ideal method would require minimum sample preparation, would analyze samples directly (i.e., without reagents), and would be rapid, automated, quantitative, and relatively inexpensive. An analysis based on artificial neural networks
(ANNs) satisfies these requirements. ANNs are used in chemometrics and are well-known to be both user-friendly and robust. For example, an extensive review on chemometrics by Brown et al. [4] stressed the importance of ANNs as the most novel research instrument in pattern recognition for the classification step, and several applications were reported [5-16]. The ANN is very often regarded merely as an efficient "black-box" tool for data classification, but we emphasize here that ANN analysis is also a powerful instrument for data quantification, as reported elsewhere for some recent applications [17-19].

Thomsen and Meyer [20] have shown that ANNs can recognize the ¹H NMR spectra of alditols. In our work, we have extended the application to mixtures of alditols. We demonstrate how ANNs can be used to classify spectra of mixtures and to produce quantitative estimates of the relative concentrations of the components in the mixtures. This application addresses the common NMR spectroscopy problem of quantifying strongly overlapping signals.

In general, the analysis of organic mixtures is a two-step procedure requiring the isolation of the mixture components, followed by identification of the pure materials. Computer-assisted methods are commonly used which employ chromatographic separation, followed by detection using infrared, NMR, or mass spectrometry. These provide information-rich spectra for library searches of spectral databases. However, these methods suffer from a number of shortcomings [21]. In ANN analysis, the chromatographic separation and the subsequent identification by matching against a spectroscopic database are not required for quantitative analysis of mixture components. ANNs have a great advantage over conventional spectroscopic library searches in that they require neither rules defined by scientists nor explicit knowledge of the form of the model function. Instead, they extract the characteristic differences between the spectra during the training process. Furthermore, the retrieval of information from an ANN system is significantly faster than a conventional library search. Once the ANNs have been trained, the resulting protocol is very fast, since the ¹H NMR measurements can be performed on a time scale of minutes and the ANN analysis is instantaneous.

This paper considers six alditols (D-arabinitol, D-galactitol, myo-inositol, D-mannitol, D-ribitol, D-glucitol) and their binary mixtures. The NMR signals fall in the narrow range between 3.5 and 4 ppm, thus stressing the validity of the technique in the presence of overlapping signals. Also, to prove the generalization capability and the robustness of the method, we use not only high-quality spectra to train and test the ANNs but also spectra with low signal-to-noise ratio and low resolution. Thus, the ANNs are trained to tolerate variations in the input patterns, an important characteristic for the wide applicability of this technique.
EXPERIMENTAL SECTION

Artificial Neural Networks. ANN models [22, 23] have received increasing attention in recent years. Aimed at achieving humanlike performance in cognitive tasks, these models are composed of a highly interconnected mesh of nonlinear computing elements whose structure is drawn from our knowledge of biological neural systems. The fundamental processing element of an ANN is the neuron, functionally analogous to those found in biological neural networks. Neurons are connected by links, and a coefficient (the "weight") is associated with each link. The neurons receive input from the external world or from other neurons, perform a weighted sum of the incoming inputs, process the resulting signal with a linear or nonlinear transfer function, and then give an output to other neurons or to the external world.

ANNs constitute a means of pattern classification in which an input pattern (a vector whose values represent a data object) is transformed into an output pattern (a vector whose values represent the desired classification of the data object). One feature of simulated neural networks is that the correct values of the coefficients required to transform a given input pattern into the desired output pattern need not be known in advance. They can instead be developed by a process known as "training" or "learning". Starting from random values of the weights, output vectors are calculated from a set of input vectors for which the "right answers" are known a priori (the right answers being encoded in a so-called target vector). After the calculation, an algorithm based on the differences between the output values obtained and the target values is used to modify the weights. As this process is repeated, the weights gradually converge to values which transform each input pattern into an output pattern closely conforming to its target. ANNs have the remarkable ability to robustly process information which contains a degree of uncertainty due to variable, noisy, or incomplete input. In fact, any type of variation in the inputs may be accounted for simply by including training patterns containing such variations in the training process.

The ANN architectures implemented in this paper are three-layer (input, hidden, and output layer) feed-forward networks trained with back-propagation [24]. The notation i-j-k will be used to label an ANN with i input, j hidden, and k output neurons. In our application, the input neurons receive a spectrum as an i-dimensional vector whose elements represent the intensity at a specific frequency. Before training, all the weights were chosen randomly in the interval [-1, +1]. The activation function for all the neurons is the sigmoid, with the slope set to 1.0. The learning parameter, which relates the variation of the weights to the gradient of the error function, was optimized on a trial basis and set to 0.1 for all the ANNs used. The mean squared error (MSE) was used as the error function. The MSE is defined as

MSE = (1/P) Σ_p Σ_j (t_pj - o_pj)²    (1)

where t_pj is the target output of output neuron j on pattern p, o_pj is the actual output computed by neuron j, and P is the total number of patterns presented to the network in the learning process. A learning cycle during which all the training patterns are presented randomly to the network is called an epoch.

The performance of all the ANNs used in this work was evaluated on a test set (a subset of all the available patterns) and monitored during the learning session to determine when the learning phase had to be stopped to prevent overtraining of the network. Overtraining [25] occurs if the ANN is trained for an excessive number of learning epochs. When overtrained, an ANN performs an overconstrained fit to the training data, thus losing its capacity to classify unseen patterns.

The simulation of the networks was carried out with the SNNS (Stuttgart Neural Network Simulator) program, version 4.0, which is distributed by the University of Stuttgart (Institute for Parallel and Distributed High Performance Systems) as free software. The calculations were made on an IBM RISC 6000 workstation running AIX 3.2.5 and X-Windows.
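As an aside for the reader, the training scheme described above can be condensed into a few lines of code. The following is a minimal NumPy sketch (ours, not the authors' SNNS setup) of a three-layer i-j-k feed-forward network with sigmoid activations, on-line back-propagation, and the MSE of eq 1; the layer sizes match the 200-12-6 classifier discussed later, the learning rate is the 0.1 quoted above, and the training data are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

class FeedForwardNet:
    """Three-layer i-j-k feed-forward network trained with back-propagation."""

    def __init__(self, i, j, k, lr=0.1):
        # Weights drawn uniformly from [-1, +1], as stated in the text.
        self.w1 = rng.uniform(-1, 1, (j, i))     # input -> hidden
        self.w2 = rng.uniform(-1, 1, (k, j))     # hidden -> output
        self.lr = lr                             # learning parameter

    @staticmethod
    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))          # sigmoid with slope 1.0

    def forward(self, x):
        self.h = self.sigmoid(self.w1 @ x)       # hidden activations
        self.o = self.sigmoid(self.w2 @ self.h)  # output activations
        return self.o

    def train_pattern(self, x, t):
        """One back-propagation step on a single (pattern, target) pair."""
        o = self.forward(x)
        delta_o = (o - t) * o * (1 - o)                        # output error term
        delta_h = (self.w2.T @ delta_o) * self.h * (1 - self.h)
        self.w2 -= self.lr * np.outer(delta_o, self.h)
        self.w1 -= self.lr * np.outer(delta_h, x)
        return np.sum((t - o) ** 2)

# Toy training loop: one epoch = all patterns presented in random order.
net = FeedForwardNet(i=200, j=12, k=6)
X = rng.random((30, 200))                  # placeholder "spectra"
T = np.eye(6)[rng.integers(0, 6, 30)]      # placeholder one-hot targets
for epoch in range(100):
    order = rng.permutation(len(X))
    mse = sum(net.train_pattern(X[p], T[p]) for p in order) / len(X)
print("final MSE:", mse)
```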
Spectra and Patterns. The ¹H NMR spectra of the six substances were recorded on a Varian VXR300S NMR spectrometer, using a spectral width of 2500 Hz and a digital resolution of 0.7 Hz/point. For each substance, 35 FIDs were obtained. The FIDs differ from each other in the number of transients (NT = 2^n, n ∈ [2, 8]) and in the number of acquired points (NP = 2^n K, n ∈ [1, 5]). From the 210 FIDs, a total of 360 spectra (60 for each substance) were produced using various weighting functions and zero-filling. The spectral transformations were carried out on a Sun SPARCstation 10 with the software package Varian VNMR 5.1. Sixty spectra for each compound are adequate to test the ANN approach to the problem. Each compound was represented both by high-quality spectra and by very poor quality spectra with low resolution and spectral distortion. In fact, training an ANN on deliberately imperfect data increases its ability to correctly recognize spectra that inherently contain similar imperfections. The ANN learns to accommodate variable data and can thus deal with spectral variations caused by experimental conditions. Examples of the spectra obtained after preprocessing are displayed in Figure 1.

Figure 1. Examples of spectra of arabinitol. (A) 32K data points, 4 transients, Gaussian weighting function; (B) 2K data points, 128 transients, exponential weighting function, zero-filling.

From the six pure substances, 15 types of binary mixtures were obtained. The ¹H NMR spectra of the mixtures were produced as linear combinations of the spectra of the pure compounds. If f_1(x) and f_2(x) represent any two pure spectra, a mixture is given by
f_12(x | a, b) = a f_1(x) + b f_2(x)    (2)
with a and b in the range [0, 1] and b = 1 - a. An example of a spectrum obtained with this procedure is displayed in Figure 2. The mixture patterns obtained in this way contain examples in which the contributing spectra have well-resolved peaks and examples in which both contributing spectra have broad peaks. We also chose to include in the learning set some patterns obtained by summing spectra with significantly different levels of line broadening. This may not seem realistic, but in this way we could test the robustness of the method toward very strong spectral distortion. The binary mixture spectra were normalized to lie in the range [0, 1] after assembly.

Figure 2. Spectra of arabinitol (A), galactitol (B), and their binary 50% + 50% mixture (C).

Only a finite number of relative concentrations of the two substances were used to train and test the networks. In the learning phase, the coefficients a and b were varied in steps of 0.1; in the test phase, they were varied in steps of 0.05. This was done to verify the ability of the ANNs to interpolate between the data used as learning input. A pattern was constructed by sampling the spectral region from 3.5 to 4.0 ppm. It was presented as a set of intensities, in the form of a list of numerical values, each value representing the intensity at a frequency determined by its position in the list; therefore, spectra obtained with different techniques (e.g., quadrature detection) could equally have been used as input patterns, though this was not done in this work.
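The mixture-generation procedure lends itself to an equally compact illustration. The sketch below (ours, not the authors' VNMR workflow) builds binary mixture patterns according to eq 2, varies a in steps of 0.1 for the learning set and 0.05 for the test set, and normalizes each assembled pattern to [0, 1]; the two 200-channel "pure spectra" are synthetic placeholders.

```python
import numpy as np

def make_mixtures(f1, f2, step=0.1):
    """Binary mixture patterns per eq 2: f12 = a*f1 + b*f2, b = 1 - a,
    with each pattern rescaled to the range [0, 1] after assembly."""
    patterns = []
    for a in np.arange(0.0, 1.0 + 1e-9, step):
        f12 = a * f1 + (1.0 - a) * f2
        lo, hi = f12.min(), f12.max()
        patterns.append((f12 - lo) / (hi - lo))   # normalize to [0, 1]
    return np.array(patterns)

# Placeholder "pure spectra": 200 intensity values spanning 3.5-4.0 ppm.
rng = np.random.default_rng(1)
ppm = np.linspace(3.5, 4.0, 200)
f1 = np.exp(-(((ppm - 3.65) / 0.01) ** 2)) + 0.01 * rng.random(200)
f2 = np.exp(-(((ppm - 3.85) / 0.01) ** 2)) + 0.01 * rng.random(200)

learning = make_mixtures(f1, f2, step=0.1)   # 11 patterns per spectrum pair
test = make_mixtures(f1, f2, step=0.05)      # 21 patterns per spectrum pair
print(learning.shape, test.shape)            # (11, 200) (21, 200)
```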
RESULTS AND DISCUSSION

The ANN approach was first used to classify the six pure substances. This allowed the optimum number of input neurons to be set to 200. Indeed, using 400 input neurons (the same granularity as the recorded spectral range in ppm) leads to too many weights and, consequently, to overtraining of the ANN, while using 100 input neurons (which implies summing four adjacent channels of the original spectrum) results in the loss of too much information (e.g., narrow double-peak structures were merged into a single peak). The number of hidden neurons was optimized on a trial basis. The configuration which led to the minimum MSE was the 200-12-6 ANN, where each of the six output neurons was related to one of the six pure substances. As a result of the network operation, the relevant output neuron had a value greater than 0.9 for all the patterns presented to the ANN, while all the other neurons were consistently below 0.1 (the output pattern was normalized to 1). The 200-12-6 ANN was 100% efficient on the test sample. Overtraining occurred only if the ANN was trained for more than 10 000 epochs. Optimum performance was achieved by training for 1000 epochs.

The same ANN architecture was used to train the network on the mixture recognition problem. Here, the goal was to identify the two substances forming the mixture. The learning set was composed of 10 065 spectra (671 for each mixture), and the test set was composed of 3150 spectra (210 for each mixture). The two sets were formed from combinations of the available 360 spectra: a subset of 300 spectra was chosen to form the learning set, and a subset of 60 spectra was chosen for the test set. The subsets contained examples of both high- and low-resolution spectra. The target vectors used in the back-propagation learning required that the two relevant neurons be equal to 1 and that the remaining ones be equal to 0. The result of the network operation on the test sample gave values above 0.9 for the relevant output neurons and below 0.1 for all the others, provided that the lower of the two concentrations present in the mixture was at least 10%. For even lower concentrations of one of the two components, the value of the relevant output neuron was measured to be as low as 0.2, but it was always at least 2 orders of magnitude higher than the values of the neurons associated with the substances not present in the mixture (Figure 3). In this way, a fairly simple threshold algorithm allows the detection of the type of mixture with 100% efficiency.

Figure 3. Responses of the 200-12-6 network for the identification of mixtures.
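The text does not spell the threshold algorithm out, so the sketch below is one plausible reading of it: a component is accepted when its output neuron exceeds a fixed threshold, or when it exceeds every neuron outside the accepted set by roughly the 2 orders of magnitude reported above. The threshold and ratio values are our assumptions.

```python
import numpy as np

def classify_mixture(outputs, names, threshold=0.5, ratio=100.0):
    """Infer the mixture type from the six output-neuron values."""
    outputs = np.asarray(outputs, dtype=float)
    order = list(np.argsort(outputs)[::-1])        # neurons, strongest first
    present = [i for i in order if outputs[i] > threshold]
    for i in order:                                # rescue weak components
        if i in present:
            continue
        rest = np.delete(outputs, present + [i])   # neurons still unaccepted
        if rest.size and outputs[i] >= ratio * rest.max():
            present.append(i)
    return sorted(names[i] for i in present)

names = ["arabinitol", "galactitol", "inositol", "mannitol", "ribitol", "glucitol"]
# Relevant neurons: 0.95 (strong) and 0.2 (weak but far above the rest).
print(classify_mixture([0.95, 0.002, 0.001, 0.2, 0.001, 0.002], names))
# -> ['arabinitol', 'mannitol']
```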
Note that this 200-12-6 ANN, trained on the mixtures, subsumes the pure-substance classification network described above and gives the same efficiency in that particular case. The learning curve, which represents the evolution of the error of the ANN during training, reached its minimum after 600 epochs, yielding the best performance and generalization ability. The latter was tested by exposing the network to complex mixtures (up to five components) of the six alditols. The ANN was still capable of identifying the components if the lowest percentage of any of them was at least 10%. Note that the network had not "seen" any example of ternary or more complex mixtures during training.

The quantitative measurement of the percentages in the mixtures could not be carried out by the 200-12-6 ANN with sufficient precision. Instead, 15 dedicated ANNs, configured as 200-12-2, were trained, each one specialized on one of the 15 types of mixtures classified by the 200-12-6 ANN. The sum of the two output-neuron values was normalized to 1, each value representing the percentage of the corresponding component in the mixture. The learning sample for each mixture was made of 5720 patterns, while the test sample was made of 210 patterns (namely, 10 sets of 21 spectra, where the percentage of each component varies from 0% to 100% in steps of 5%). From the 60 spectra available for each substance, we chose two subsets, of 50 and 10 spectra, to form the learning set and the test set, respectively.

Table 1 reports the performance of the ANNs in terms of the MSE measured on the learning sample and on the test sample. The number of epochs needed to train the networks is also shown.

Table 1. Performance of the 200-12-2 Networks

                                            MSE
mixture                  learning time   test      learning
                         (epochs)        sample    sample
arabinitol-galactitol      200           0.352     0.00084
arabinitol-inositol        300           0.342     0.00052
arabinitol-mannitol        500           0.213     0.00050
arabinitol-ribitol        1000           0.269     0.00035
arabinitol-glucitol       1000           0.0722    0.00028
galactitol-inositol        200           0.296     0.00061
galactitol-mannitol        500           0.166     0.00033
galactitol-ribitol        1000           0.256     0.00028
galactitol-glucitol        400           0.100     0.00034
inositol-mannitol          400           0.202     0.00025
inositol-ribitol          1000           0.454     0.00018
inositol-glucitol          400           0.147     0.00026
mannitol-ribitol          1000           0.232     0.00025
mannitol-glucitol         1000           0.0967    0.00019
ribitol-glucitol          1000           0.0927    0.00019
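A short sketch of the quantification step may also help: the two output-neuron values are normalized to unit sum to give the component percentages (as described above), and for each input concentration the ten test-spectrum predictions are summarized by their mean c_m and standard deviation ∆c_m, the quantities plotted in Figure 4 and used in the error analysis below. The network outputs here are invented placeholders.

```python
import numpy as np

def percentages(o1, o2):
    """Normalize the two output-neuron values to unit sum -> percentages."""
    total = o1 + o2
    return 100.0 * o1 / total, 100.0 * o2 / total

def summarize(predictions):
    """Mean c_m and standard deviation dc_m over the test spectra
    available for one given input concentration c_i."""
    predictions = np.asarray(predictions, dtype=float)
    return predictions.mean(), predictions.std(ddof=1)

# Placeholder outputs of a 200-12-2 network on 10 spectra of a 30%/70% mixture.
raw = [(0.29, 0.71), (0.33, 0.67), (0.27, 0.73), (0.31, 0.69), (0.30, 0.70),
       (0.28, 0.72), (0.32, 0.68), (0.29, 0.71), (0.31, 0.69), (0.30, 0.70)]
first_component = [percentages(o1, o2)[0] for o1, o2 in raw]
cm, dcm = summarize(first_component)
print(f"c_m = {cm:.1f}% +/- {dcm:.1f}%")
```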
Differences in these values reflect the different complexities of the mixtures from the point of view of pattern recognition, i.e., how much the spectrum of a mixture overlaps that of a pure substance. A typical example is the arabinitol-galactitol mixture, where the two components exhibit a triplet structure in the very same region (Figure 2). This is consistent with the fact that the ANN associated with this mixture shows a lower performance in terms of the MSE measured on the test sample.

Figure 4 shows the network response for two of the mixtures: the response is approximated by a linear dependence on the concentration, as it should be for a perfect ANN. The values reported are the means (c_m) of the values measured on the 10 spectra of the test set corresponding to any given input concentration (c_i). The error bar reports the standard deviation (∆c_m) of the distribution around this mean. The correlation between the network's response and the concentration is very high. This is particularly true for the arabinitol-glucitol network, which has the best performance; the responses of this network are not only very precise but also accurate (linear dependence very close to the expected one). The inositol-ribitol network is the one with the worst performance: the MSE of this network on the test set is 6 times larger than that of the arabinitol-glucitol network (Table 1); although it is less precise, the linear dependence is good.

Figure 4. Predicted concentration values of two 200-12-2 networks (averaged over the test set) versus the real concentration value. (A) Arabinitol-glucitol network, with the best performance. (B) Inositol-ribitol network, with the worst performance. Within the error bars, the expected linear dependence is obtained.

An indication of the accuracy of this procedure in determining the concentration in a binary mixture is the relative error, defined as the ratio ∆c_m/c_m between the standard deviation and the measured mean for any given input concentration. This relative
error is closely connected to the concentration c_i, being higher for lower concentration values (Figure 5). The absolute error is not constant within the range of concentrations used but shows a maximum close to a concentration of 50%. This is consistent with the fact that, at this concentration, there are many signals of similar intensity in the spectrum of a mixture.

Figure 5. (A) Plot of the relative error on the concentration as a function of the concentration itself. (B) An example of a linear fit of the relative error in the range 50%-100%. The error refers to the more concentrated component.
Since the responses of the two output neurons are normalized to 1, the absolute errors of the two output neurons are equal and opposite; therefore, what actually determines the quality of the measurement is the error on the higher of the two concentrations. A coarse fit to the relative error can be achieved with a straight line (Figure 5) of equation ∆c_m/c_m = a + b c_i, where a and b are fit parameters specific to each mixture (b is negative, so the relative error decreases with increasing concentration). For every network, we performed two separate fits, one associated with the range of concentrations where the first component is predominant and the other associated with the range where the second component is predominant; thus, we obtain two sets of fit parameters for each mixture. Making the assumption that c_i is well approximated by c_m (which is true within the error bars), the absolute error on the measurement can be expressed as

∆c_m = a c_i + b c_i²    (3)

where c_i is the concentration of the predominant component. In this way, we can give an estimate of the accuracy of the concentrations computed by the networks. Table 2 reports the a and b parameters for the 15 mixtures considered.

Table 2. Coefficients To Evaluate the Accuracy of the Networks' Responses

                            0-50%*             50-100%*
mixture                   a        b          a        b
arabinitol-galactitol     0.15    -0.13       0.12    -0.11
arabinitol-inositol       0.11    -0.088      0.11    -0.12
arabinitol-mannitol       0.066   -0.045      0.051   -0.041
arabinitol-ribitol        0.06    -0.035      0.045   -0.028
arabinitol-glucitol       0.031   -0.016      0.028   -0.020
galactitol-inositol       0.11    -0.11       0.10    -0.085
galactitol-mannitol       0.05    -0.028      0.062   -0.057
galactitol-ribitol        0.11    -0.11       0.14    -0.13
galactitol-glucitol       0.072   -0.068      0.057   -0.047
inositol-mannitol         0.11    -0.11       0.083   -0.074
inositol-ribitol          0.18    -0.17       0.19    -0.18
inositol-glucitol         0.088   -0.087      0.061   -0.055
mannitol-ribitol          0.073   -0.058      0.088   -0.070
mannitol-glucitol         0.059   -0.054      0.055   -0.047
ribitol-glucitol          0.079   -0.075      0.066   -0.065

* These concentration ranges refer to the first component (first output neuron). When calculating the error with formula 3, the concentration to be used is that of the more concentrated component.
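For concreteness, eq 3 and the Table 2 coefficients reduce to a one-line error model. The sketch below evaluates the expected absolute and relative errors for the arabinitol-glucitol network in the 50-100% range, entering b with its negative sign as tabulated; the chosen concentrations are arbitrary examples.

```python
def absolute_error(c_i, a, b):
    """Estimated absolute error dc_m = a*c_i + b*c_i**2 (eq 3),
    with c_i the concentration of the predominant component."""
    return a * c_i + b * c_i ** 2

a, b = 0.028, -0.020    # arabinitol-glucitol, 50-100% range (Table 2)
for c_i in (0.5, 0.7, 0.9):
    dc = absolute_error(c_i, a, b)
    print(f"c_i = {c_i:.0%}: dc_m = {dc:.4f} ({dc / c_i:.1%} relative)")
```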
CONCLUDING REMARKS

A method for the analysis of binary mixtures of six alditols, based on artificial neural networks, was developed and evaluated. The ANNs (after the training phase, which requires a specific software environment) constitute a set of 16 software procedures embedded in C code directly interfaced with the NMR spectral analysis software: after data acquisition, the software outputs the type and composition of the analyzed mixture and computes the uncertainties on the measurement.

In this work, we chose to use only pure compounds and pure binary mixtures, i.e., made up of the six alditols and without any impurity. In the presence of impurities, the 200-12-6 network was still capable of recognizing the two components of a mixture if the impurity had a concentration 2 orders of magnitude smaller than that of the less concentrated component. In real-world situations, where the impurity could be the predominant component, it would be necessary to train the ANN also with examples in which impurities are present.

ACKNOWLEDGMENT

We thank Dr. Valeria Maida for technical assistance. This study was supported by the Regione Autonoma Sardegna (RAS).
Received for review August 13, 1997. Accepted January 16, 1998.

AC970868G

REFERENCES
(1) Oshima, K.; Oka, K.; Pishva, D. In Computer Aided Innovation New Materials 2; Doyama, M., Ed.; North-Holland: Amsterdam, 1993; pp 931-4.
(2) Kemsley, E. K.; Ruault, S.; Wilson, R. H. Food Chem. 1995, 54 (3), 321-6.
(3) Twomey, M.; Downey, G.; McNulty, P. B. J. Sci. Food Agric. 1995, 67 (1), 77-84.
(4) Brown, S. D.; Sum, S. T.; Despagne, F.; Lavine, B. K. Anal. Chem. 1996, 68, 21R-61R.
(5) Otto, M. Spec. Publ. - R. Soc. Chem. 1994, 154 (Reviews on Analytical Chemistry - Euroanalysis VIII), 195-213.
(6) Li, Y.; van Espen, P. Chemom. Intell. Lab. Syst. 1994, 25 (2), 241-8.
(7) Werther, W.; Lohninger, H.; Stanci, F.; Varmuza, K. Chemom. Intell. Lab. Syst. 1994, 22 (1), 63-76.
(8) Visser, T.; Luinge, H. J.; van der Maas, J. H. Anal. Chim. Acta 1994, 296 (2), 141-54.
(9) Luinge, H. J.; van der Maas, J. H.; Visser, T. Chemom. Intell. Lab. Syst. 1995, 28 (1), 129-38.
(10) Sharma, A. K.; Sheikh, S.; Pelczer, I.; Levy, G. C. J. Chem. Inf. Comput. Sci. 1994, 34, 1130-9.
(11) Sato, T. J. Near Infrared Spectrosc. 1993, 1 (4), 199-208.
(12) Schulze, H. G.; Blades, M. W.; Bree, A. V.; Gorzalka, B. B.; Greek, L. S.; Turner, R. F. B. Appl. Spectrosc. 1994, 48 (1), 50-7.
(13) Hare, B. J.; Prestegard, J. H. J. Biomol. NMR 1994, 4 (1), 35-46.
(14) Goodacre, R.; Kell, D. B. Anal. Chim. Acta 1993, 279 (1), 17-26.
(15) Goodacre, R.; Neal, M. J.; Kell, D. B. Anal. Chem. 1994, 66, 1070-85.
(16) Meyer, M.; Meyer, K.; Hobert, H. Anal. Chim. Acta 1993, 282 (2), 407-15.
(17) Hiltunen, Y.; Heiniemi, E.; Ala-Korpela, M. J. Magn. Reson. 1995, B106, 191-4.
(18) Long, J. R.; Gregoriou, V. G.; Gemperline, P. J. Anal. Chem. 1990, 62, 1791-7.
(19) Bos, M.; Bos, A.; van der Linden, W. E. Analyst 1993, 118, 323-8.
(20) Thomsen, J. U.; Meyer, B. J. Magn. Reson. 1989, 84, 212-7.
(21) Leherfeld, J. Anal. Chem. 1984, 56, 1803-6.
(22) Radomski, J. P.; van Halbeek, H.; Meyer, B. Nat. Struct. Biol. 1994, 1 (4), 217-8.
(23) Zupan, J.; Gasteiger, J. Anal. Chim. Acta 1991, 246, 1-30.
(24) McClelland, J. L.; Rumelhart, D. E. Parallel Distributed Processing; MIT Bradford Press: Cambridge, 1986; Vol. 1.
(25) Munk, M. E.; Madison, M. S.; Robb, E. W. Mikrochim. Acta 1991, II, 505-14.