Anal. Chem. 1980, 52, 1120-1123

Pattern Recognition by Audio Representation of Multivariate Analytical Data

Edward S. Yeung

Ames Laboratory, USDOE, and Department of Chemistry, Iowa State University, Ames, Iowa 50011

We present an objective recipe, based on the statistical distribution of data on each axis in k-dimensional space, for audio display of multivariate analytical data. Each measurement in the data vector is translated into an independent property of sound. Scaling is performed in a continuous manner. The case for k ≤ 9 is treated here, but it should be possible to approach k = 20 with slight loss of resolution and with somewhat increased risk of nonorthogonality. A significant feature of such a presentation of analytical data is that good acoustical standards are available so that the trained ear can associate each sound with a good estimate of the value in the original measurement. Excellent results were obtained when this method was applied to the pattern recognition of a test data set. Advantages over visual (graphical) representation schemes are discussed.

With the recent vast advances in instrumental methods of analysis, the analyst often has to deal with a large collection of data from experimental measurements on a given sample. In typical situations, one is not interested in the individual data points, but is interested in the data set taken as a whole. Some sort of multivariate data analysis is needed to extract distinctive features in the sample and to classify the sample into well-defined categories. There exist several extensive statistical procedures for such applications, most notably ARTHUR (1), SPSS (2), and BMD (3). While statistical methods rank high in objectivity and in handling highly dimensionalized data sets, there is a need for pattern recognition by the human mind. The latter functions particularly well when computation costs are of concern, to provide the initial data analysis and reduction before the statistical treatment, and to key on whatever features are extracted by the statistical treatment for subsequent routine use by the analyst. Furthermore, the chemical basis for the extracted features must be established eventually to increase the confidence level in the data analysis, and this can only be done with the aid of the analyst.

It is not trivial to properly present multivariate analytical data to the human mind, which normally cannot process more than two or three dimensions at any one time. The availability of sophisticated computer graphics has made visual presentations popular. Some published attempts include rays of various lengths extending from circles of fixed radii (4), triangles of varying sizes and orientations (5), polygons based on differences from the mean of each variable (6, 7), plots of the Fourier series of the principal components, and scaling various features of a cartoon face (8). The basis for such visual representations is that we are trained to recognize many different shapes and classify them according to similarity without going through specific mental analysis of details of the shapes. The most impressive trait is our ability to recognize faces of different people without the need to recall individual facial features, and this has been exploited recently (8).

There are several problems associated with visual presentations, at least in the present state of development. Firstly,

good standards are generally not available for the visual displays. This is best illustrated by the tedious nature of creating facial composites of suspects in police work, usually requiring many trials and errors and the help of many witnesses. On the other hand, identification is much easier through photos in criminal record books or from police stand-in lines. The difficulty in describing the features that led to the recognition of the pattern, i.e., the lack of good standards, must be solved to broaden the scope of application in chemical analysis.

Secondly, resolution of the representation is generally poor, probably not much better than one part in ten for each dimension in the data set. For representation of analytical data that have precision much beyond that, one will be introducing unnecessary degradation of the data.

Thirdly, there are problems with true orthogonality of the axes in visual representations. In the use of cartoon faces (8), it is clear that if the "width of the mouth" is small in value, then one can hardly detect the "position of the mouth" or the "curvature of the mouth". Again, in the rays-from-a-circle representation (4), slight variations in two lengths are more easily recognized when the rays are in the proximity of each other and in the same direction than when they are far separated and in different directions.

Finally, even when definite and orthogonal dimensions in the representation are found, there remains the problem of scaling the measurements in each dimension. While our judgments of lengths are usually good, so that a linear transform is adequate, properties such as "curvature" are ill defined and a linear transform into the radius seems arbitrary.

Most of the other human sensations are not suitable channels for data presentation because our perception of them is not quantitative. Among these are taste, smell (9), temperature, and pressure. Sound, however, seems to be a useful medium for data presentation. We can certainly recognize many different sounds, and can even distinguish subtle differences such as the voices of different people. What is more interesting is that quantitation of the various properties of sounds is a mature science. Voiceprints have found applications in forensic work, and computer simulation of music is well known. Speeth (10) has used seismograms to distinguish between earthquakes and nuclear explosions. In what follows, we shall describe certain orthogonal properties of sounds, quantitation of these properties, derivation of standards and scaling factors, and an actual test of this method for representing multivariate analytical data for application in pattern recognition.

MASTER SCHEME

Before going through the actual translation of analytical data to properties of sound, one needs to standardize the data set, hopefully without introducing any bias. It is convenient, vide infra, to have all measurements scaled to range from 0 to 1. The range is selected to fit our particular computational scheme so that the corresponding audio property will always be well-defined and always be in the range of perception of the typical human ear. Several accepted scaling procedures have been suggested (11), and ours is merely a variation of the standard autoscaling method in ARTHUR (1).

Briefly, we first decide whether the raw data on a linear scale make sense. For example, absorbance is preferred over transmittance if the chemical basis is the reflectance of the sample. The mean and standard deviation are then determined for all data corresponding to each type of measurement, regardless of category and regardless of whether the vector belongs to the training set, evaluation set, or test set. Or, if an alternate estimate of the precision of the data can be found, that value can be used. The mean is given a value of 0.5, the mean plus one standard deviation is given a value of 0.7, and the mean minus one standard deviation is given a value of 0.3. These particular values are used to ensure that 95% of the data (assuming a Gaussian distribution) fall between 0.1 and 0.9. In addition, calculated values larger than 1.0 and smaller than 0.0 are artificially set at 1.0 and 0.0, respectively. The pre-scaling process employed here is probably as statistically sound as any of the other ones (11).

We now need to define and quantify each of the dimensions of sound that should be used. Our choices of the dimensions are based on the availability of quantitative standards, continuity in scaling, independence from each other, resolution, and relative ease in perception even by the untrained ear. Nine such dimensions are described below, but the number can approach 20 if one becomes less critical about resolution, ease in perception, etc., as we shall describe in a later section. Naturally, one would have to be much better trained in that case.

Pitch. One of the most distinct properties of sound is its pitch. Even to people that are tone-deaf, a "high" note is quite different than a "low" note. The mathematical representation of pitch is nothing more than the frequency of the pressure wave that has a cosine form. A good standard is available, namely the 440-Hz musical "A". To people with absolute pitch, this standard is internal and an immediate association of the pitch with the original value of the scaled variable is possible. The resolution is extremely good, since the trained ear can distinguish differences of the order of 1 Hz in the mid-ranges. The dynamic range is large, with human perception spanning 30-15 000 Hz. And, as we shall see later, the pitch is truly independent of any of the other chosen dimensions. The number of dimensions one can derive from the pitch is arbitrary, the limit being the resolution desired. Specifically, each octave (factor of two in frequency) can easily accommodate a resolution of one part in 24, i.e., a quarter of a tone. In principle, therefore, one can establish 9 dimensions based on pitch alone. Symphony conductors certainly can simultaneously recognize all nine tones, but the average person may only digest four or five. In the demonstration presented here, we use pitch to represent only two dimensions in the data, with the range 100-1000 Hz for one and 1000-10000 Hz for the other. Since musical intervals are ratios of frequencies, it is appropriate to relate the logarithms of the frequencies to the magnitudes of the variables. We use the form

$$A_t = A_0 (\cos 2\pi f_1 t + \cos 2\pi f_2 t) \qquad (1)$$

where

$$f_1 = 100 \times 10^{X_1}\ \mathrm{Hz} \qquad (2)$$

$$f_2 = 1000 \times 10^{X_2}\ \mathrm{Hz} \qquad (3)$$

to scale two dimensions in the data vector, X1 and X2.
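As a concrete illustration of this mapping (a minimal Python sketch under our reading of Equations 1-3; the function names are our own, and the original programs were written in RT-11 FORTRAN), a scaled value between 0 and 1 enters the exponent directly, so that equal steps in the data correspond to equal musical intervals:

```python
import math

def pitch_pair(x1, x2):
    """Map two scaled measurements (each 0..1) onto their pitch decades:
    100-1000 Hz for x1 and 1000-10000 Hz for x2, on a logarithmic scale."""
    f1 = 100.0 * 10.0 ** x1      # 100 Hz at x1 = 0, 1000 Hz at x1 = 1
    f2 = 1000.0 * 10.0 ** x2     # 1 kHz at x2 = 0, 10 kHz at x2 = 1
    return f1, f2

def two_tone(t, x1, x2, a0=1.0):
    """Equation 1: the pressure wave as the sum of the two cosines."""
    f1, f2 = pitch_pair(x1, x2)
    return a0 * (math.cos(2 * math.pi * f1 * t) + math.cos(2 * math.pi * f2 * t))
```

For example, x1 = 0.5 places the first tone near 316 Hz, halfway through its decade in logarithmic (musical) terms.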

Loudness. Loudness is another well-documented and well-quantified property of sound. The standard acoustical quantity is the decibel, which is the logarithm of the magnitude of the pressure wave. The range from the human perception threshold to the point of safe short-term exposure is about 100 dB, and one can distinguish differences of about 3 dB. So, good resolution is available. The standard in this case is established by, e.g., sound-level meters, because the human ear normally is not good as an internal standard, other than for rough estimates such as "jet plane take-off" at 100 dB and "quiet audience in a movie house" at 40 dB. The ear also tends to adjust itself over long exposures to a particular sound level, and must be continually recalibrated to preserve an internal standard of loudness. It is also known that the same sound level measured in dB is interpreted as louder at the mid-range than at the high and the low frequency ranges. To preserve independence in the dimensions, one needs to correct for the response of the ear. This is well documented in the so-called Fletcher-Munson curves (12). Ideally one would modify Equation 1 to assign different values A1, A2, etc., to the individual frequency terms instead of a common term A0. To minimize computation in the present work, this correction was not incorporated. So we have the amplitude A0 given by Equation 4, where N1 is one-half the range of the particular D/A converter used to generate sound, and N2 = log10(N1). We note that the voltage applied to an earphone or loudspeaker is proportional to the square root of the power delivered as sound. This merely changes the scaling constant and does not affect the representation.

Damping. So far we have been concerned with sounds at steady intensities as a function of time. It is possible to use the damping of the amplitude in time as another dimension. This is what distinguishes a steady note played on the violin with a bow from that of a plucked string. There is a great deal of subtlety in this audio property, e.g., the distinction between a piano and a harpsichord. Even though the ear is not normally trained to quantify this property, a standard representation is possible. We choose the scale with undamped loudness on one end and damping to 1/10 of the voltage (20 dB in loudness) in 0.1 s (at least 10 cycles of the tone) on the other end. We simply replace A0 in Equation 1 by a damped amplitude A0', as specified by Equation 5. This particular scale is chosen so that, even at the lowest amplitude given by Equation 4, there is still enough resolution in our particular D/A converter to make use of Equation 5.

Direction. We are trained to recognize the directions from which sounds originate. In accordance with our perception in space, we can readily use directions to represent three of the dimensions in the data vector. We can imagine that the listener is in the middle of a cube of unit dimensions, with the sounds originating from distinct points within the cube. One can choose any arbitrary corner, e.g., the lower-left-rear corner, as the origin, and place the point prescribed by the variables X5, X6, and X7 within the cube. The only slight confusion is that each "direction" is in fact a series of values for X5, X6, and X7 unless the "distance" is specified. This can be done by alternating the sound at the center of the listener and then at the desired point. The inverse-square law of sound pressure allows us to judge the distance without biasing the variable that describes loudness, X3. Our internal standardization of this property of sound is very good. To simulate the point source of the sound, one needs six loudspeakers, at the top, bottom, front, back, left, and right of the listener. Then it is a simple projection of the point (X5, X6, X7) onto the axes of the speakers, the listener being the origin. The projected distance along each axis is used to determine the acoustic power (inverse-square law) to be delivered to each speaker.

Duration/Repetition. The duration of the sound can be used for one of the dimensions in the representation. This makes use of our internal sense of time. For sounds without damping, the duration can readily be established quantitatively.

For sounds with damping, the perception of duration may be masked by the loudness level dropping below an audible level. It is therefore more appropriate to use repetition to represent the magnitude. For our particular choice of D/A conversion, 1/10 of a second is a convenient duration of each simulated sound. We simply repeat the sound consecutively for a total of M times, where

$$M = 10^{2X_8} \qquad (6)$$

The duration is then from 0.1 to 10 s, with an inherent time marker if the damping constant X4 is not zero. We choose a logarithmic scale because musical scores use logarithmic timing. This minimum duration is about the limit of our perception in time and the maximum duration is based on convenience.

Rest. To properly recognize a sound and its duration/repetition, one normally needs several repetitions of the entire sequence. The pause, or rest period, between the sequences can be used as an extra dimension in the representation. Again, we choose 0.1 to 10 s as the range of the pause. Thus, the number of consecutive 0.1-s intervals the rest lasts is

$$L = 10^{2X_9} \qquad (7)$$

The correspondence of these rest periods with musical rests is obvious.

The above establishes the nine dimensions for audio representation of multivariate data. In practice, to avoid confusing X5 through X7 with X8 and X9, the sequence used may be (a) duration M at the origin, (b) duration M at the point (X5, X6, X7), and (c) rest period L. The listener can easily determine that the apparent duration of M is twice the actual duration of M in the limit that X5 through X7 are all close to 0.5, and no confusion will arise.
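A compact sketch of the whole translation is given below in Python (the author's programs were in RT-11 FORTRAN; the names here are our own, and the loudness, damping, and direction terms are only carried along as scaled values, since their working forms in Equations 4 and 5 and in the speaker projection depend on the particular converter and speaker arrangement):

```python
import numpy as np

def prescale(raw, mean, std):
    """Autoscale one measurement: the mean maps to 0.5 and one standard
    deviation to +/-0.2, so about 95% of Gaussian data fall between 0.1
    and 0.9; values outside 0-1 are clipped."""
    return float(np.clip(0.5 + 0.2 * (raw - mean) / std, 0.0, 1.0))

def sound_parameters(x):
    """Translate a scaled 9-vector x (entries in 0..1) into audio settings.
    x[0], x[1]: pitch; x[2]: loudness (Eq. 4); x[3]: damping (Eq. 5);
    x[4:7]: direction within the unit cube; x[7]: duration; x[8]: rest."""
    return {
        "f1_hz": 100.0 * 10.0 ** x[0],             # 100-1000 Hz
        "f2_hz": 1000.0 * 10.0 ** x[1],            # 1000-10000 Hz
        "loudness": x[2],                          # feeds Equation 4
        "damping": x[3],                           # feeds Equation 5
        "direction": tuple(x[4:7]),                # point in the unit cube
        "repeats_M": round(10.0 ** (2.0 * x[7])),  # 1-100 repeats of 0.1 s (Eq. 6)
        "rest_L": round(10.0 ** (2.0 * x[8])),     # 1-100 rest intervals of 0.1 s (Eq. 7)
    }
```

The presentation sequence described above then plays the M repeats at the origin, the M repeats at the direction point, and finally rests for L intervals.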

TEST PROCEDURE

A good evaluation of the present scheme is to apply the concept to the pattern recognition of a well-studied data set, such as the ARCH obsidian data published earlier (13, 14). The data set consists of elemental concentrations of ten metals in 63 obsidian samples from four sites near San Francisco, Calif., obtained by X-ray fluorescence. We were limited by not having available six truly high-fidelity loudspeakers (and the associated amplifiers) to take advantage of the three dimensions described earlier, or a room with good acoustics (no reverberations and no noise). Sound generated by the data is simply presented to a stereo headphone (SONY Model DR-5A) through a preamplifier (SONY Model NR-335). The headphones have a fairly flat frequency response and can shield out much of the noise in the room, but they restrict the dimensionality of our representation to seven. We have therefore neglected the measurements of Mn, Rb, and Y in the data set, based on the univariate Fisher and variance weights determined by the ARTHUR program (1). The seven measurements remaining (Fe, Ti, Ba, Ca, K, Sr, and Zr) were randomly associated with the variables X1 through X5, X8, and X9. Calculations were performed on a DEC PDP 11/10 minicomputer with dual floppy disk and LPS-11 laboratory peripheral system.

We chose 10 data vectors known to belong to each of the four categories (a total of 40 data vectors) for both training and evaluation. A computer program first normalizes the data vectors (see Master Scheme section) to values between 0 and 1. A second program calculates, according to the equations in the same section, values in an array of 5000 each for the left and the right channels of the headphone, to be stored on disk. The maximum D/A conversion rate of the present system is 100 kHz, so that each array represents a 0.1-s duration of the sound at a time resolution of 50 kHz per channel. This should be more than adequate for our range of frequencies from 100 to 10 000 Hz. A third program recalls any file on the disk and translates it to a series of voltages on each of the two channels in the D/A converter, which then goes through a voltage-dividing network into the audio preamplifier. All of the programs were written in RT-11 FORTRAN, except for the immediate D/A conversion routine, which is written in machine

language. The construction of data files via a second program avoids lengthy calculations, and thus long delays, in the actual generation of sound. The resolution in D/A conversion is 12 bits, which explains the values of the constants in Equations 4 and 5.
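The file-generation step can be sketched as follows (Python with NumPy; this is our reconstruction of the idea rather than the original FORTRAN, and for simplicity it writes the two channels identically):

```python
import numpy as np

RATE = 50_000            # samples per channel per second (100 kHz total D/A rate)
SEGMENT_S = 0.1          # each stored array covers 0.1 s of sound
HALF_RANGE = 2047        # half the span of a 12-bit converter

def build_segment(f1, f2, amplitude=1.0):
    """Precompute one 0.1-s two-tone segment (Equation 1) as 5000 integer
    samples per channel, ready to be streamed to the D/A converter."""
    t = np.arange(int(RATE * SEGMENT_S)) / RATE
    wave = 0.5 * amplitude * (np.cos(2 * np.pi * f1 * t) + np.cos(2 * np.pi * f2 * t))
    samples = np.round(HALF_RANGE * wave).astype(np.int16)
    return samples, samples.copy()   # left channel, right channel
```

Repeating such a segment M times, and following it with L silent segments, reproduces the duration and rest dimensions with no arithmetic left to do at playback time.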

The operator of the third program can elect to be "trained" by listening to each category of data one at a time, or to be "tested" by requesting a random file. After the operator responds at the terminal with a guess as to the category of the random file, the correct result is printed to check the prediction.
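The testing session amounts to a simple loop; a minimal sketch follows (the helper names play and ask are hypothetical stand-ins for playing the stored file through the D/A converter and prompting at the terminal):

```python
import random

def test_session(files, play, ask, n_trials=20):
    """files maps a file name to its true category; play(name) renders the
    stored sound and ask(name) returns the operator's guessed category.
    The true category is printed after each guess, and the fraction of
    correct classifications is returned."""
    correct = 0
    for _ in range(n_trials):
        name = random.choice(list(files))
        play(name)
        guess = ask(name)
        print("correct category:", files[name])
        correct += int(guess == files[name])
    return correct / n_trials
```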

RESULTS AND DISCUSSION

Test Results. The evaluation of the present scheme for pattern recognition applications gave excellent results. Three operators (S.D.W., J.C.K., and W.G.T.) who were not aware of the details of the programs were trained once, i.e., listened to all 40 data files once. Each attained an accuracy of 90% in classifying the vectors immediately. After a second training, W.G.T. and J.C.K. attained an accuracy of 98%. Another operator (E.S.Y.), who is familiar with the details of the programs and who has moderate musical training, achieved an accuracy of 100% immediately. More interestingly, certain "features" of the sounds from a particular category can be identified. In one random choice of audio properties for the measurements, one finds that group I has a particularly high frequency component (hiss), group II is particularly short and soft, group III is particularly pure (tuning-fork-like), and group IV has a particularly low component (rough noise). These features allowed the operator to retain the "training" over a week's time without suffering in accuracy. Other random choices of the dimensions for each measurement gave similar results, although we did not come close to testing all 7! combinations. It is true that different people will key on different "features" in trying to classify the vectors. Also, different people effectively have higher resolution for certain audio properties. We found that in one series of tests, W.G.T. consistently misclassified one of the vectors, even after retraining, for a particular choice of dimensions. It was subsequently discovered that there was an indexing error in the computer program, and W.G.T.'s classification was in fact correct. Overall, the utility of this scheme in actual pattern recognition seems extremely promising.

Extensions. Naturally, one would like to be able to represent as many dimensions as possible. As indicated in the Master Scheme section, seven more dimensions X10 through X16 can be derived from the pitch alone. Along the same lines of reasoning as for damping, one can use a rise time to maximum loudness to represent X17. In musical terms, this corresponds to the "attack" of a note. In the determination of direction, it is assumed that the sounds from all six loudspeakers are in phase but different in intensity. A phase factor can then be introduced to represent X18. This is a known quality of sound for stereophonic applications. Equal intensities from the left and the right channels can sound different depending on whether the two are in phase or out of phase. In the former case, one perceives sound originating from the center, while in the latter case one perceives two distinct sounds, one from each of the two channels. We therefore have close to 20 dimensions that seem fairly independent.

Some of the finer qualities of sound turn out to be difficult to simulate. The difference between a note (same pitch, loudness) played on a violin and one played on a trumpet, and sometimes even between two violins, is readily recognizable. Mathematically, the difference is in the distribution of the various overtones of the note. We attempted to use an exponential and then a cosine function to distribute the overtones in relative intensity. The resulting sounds did show recognizable differences, but these subtle differences were hard to quantify subjectively. Further, if one builds in harmonics, the division of the pitch into several dimensions will not be possible.

We thus use simultaneous notes described by Equation 1 rather than introduce harmonics. Finally, tapping our ability to recognize speech as a series of sounds and music as a series of notes, one can use sequences of sounds to increase the dimensionality of the representation. For example, two consecutive sequences can already handle 36 dimensions.

A Synthesizer. The use of a computer to generate sounds in this work is simply to allow data manipulation and changing of the representation for testing. The computer is really not necessary, and its elimination may even be advantageous. A faster D/A converter with more bits will improve the resolution achieved here, but demands a corresponding increase in computation for the cosine function in Equation 1. This is particularly serious when 6 loudspeakers and 9 pitches are used. The alternative is to make use of discrete components. Each pitch can simply be derived from a voltage-to-frequency converter with appropriate output smoothing to convert to a cosine wave. A summing amplifier can then perform the function of Equation 1. The exponential forms in the Master Scheme section can be represented by anti-log amplifiers. Division of the amplitude into each of the six loudspeakers can be accomplished with standard analog division networks. Scaling of the data vectors (most likely directly as input voltages from the instruments) requires simple op-amp circuitry, once the standard deviation or precision has been determined. Damping can be performed by a voltage-controlled amplifier tied to an anti-log circuit, which in turn monitors the discharge of a capacitor by a constant current specified by X4. Analog timing circuits can suffice in determining durations. The entire package probably will be significantly more convenient and less expensive than the corresponding computer-controlled version.

Comparisons. We feel that audio representation of multivariate analytical data, as shown here, is superior to the known visual methods. While audio methods do not require

a computer, visual graphics always do. The simplicity, dynamic range, orthogonality of the parameters, and presence of good standards are real pluses for the present scheme. Detailed statistical computation will always have its place in pattern recognition studies, but the present scheme is a viable alternative.

ACKNOWLEDGMENT

The author thanks William G. Tong, J. C. Kuo, and Steven D. Woodruff for participating in the evaluation process, Larry Steenhoek for discussions concerning voltage-to-frequency converters, and W.G.T. for helping in parts of the computer programming.

LITERATURE CITED

(1) Harper, A. M.; Duewer, D. L.; Kowalski, B. R.; Fasching, J. L. In "Chemometrics: Theory and Application"; Kowalski, B. R., Ed.; American Chemical Society: Washington, D.C., 1977; p 14.
(2) Nie, N. H.; Bent, D. H.; Hull, C. H. "Statistical Package for the Social Sciences", 2nd ed.; McGraw-Hill: New York, 1975.
(3) Dixon, W. J., Ed. "BMD, Biomedical Computer Programs"; University of California Press: Berkeley, Calif., 1971.
(4) Anderson, E. Technometrics 1960, 2, 387.
(5) Pickett, R.; White, B. W. "Constructing Data Pictures"; Proceedings of the VII National Symposium of the Society for Information Display, 1966; p 75.
(6) Siegal, J. H.; Goldwyn, R. M.; Friedman, H. P. Surgery 1971, 70, 232.
(7) Daetz, D. "A Graphical Technique to Assist in Sensitivity Analysis"; unpublished report, 1972.
(8) Chernoff, H. J. Am. Stat. Assoc. 1973, 68, 361.
(9) McGill, J. R.; Kowalski, B. R. Anal. Chem. 1977, 49, 596.
(10) Speeth, S. D. J. Acoust. Soc. Am. 1961, 33, 909.
(11) Kowalski, B. R.; Bender, C. F. J. Am. Chem. Soc. 1972, 94, 5632.
(12) Eargle, J. "Sound Recording"; Van Nostrand: New York, 1976; p 34.
(13) Kowalski, B. R.; Schatzki, T. F.; Stross, F. H. Anal. Chem. 1972, 44, 2176.
(14) Stevenson, D. F. Archaeometry 1971, 13, 17.

RECEIVED for review November 1, 1979. Accepted March 24, 1980. This work was supported by the U.S. Department of Energy, Contract No. W-7405-Eng-82, Office of Basic Energy Sciences, Division of Chemical Sciences (AK-01-03-02-3).
