Chemometrics Sunny-Side Up

Software

Unscrambler
CAMO USA
722 Port Walk Pl.
Redwood Shores, CA 94065
Version 5.03
415-598-9860; 415-595-2321 (fax)
$3600 commercial; $2445 noncommercial; $1650 educational

Extracting information from spectral data for qualitative and quantitative purposes requires extensive chemometrics. Sometimes multilinear regression (MLR) works best, sometimes principal component regression (PCR) or partial least squares (PLS) works best and, at other times, classification schemes are more applicable. Unfortunately, neither understanding the mathematics nor having knowledge of the data is adequate for making, a priori, specific recommendations about which particular multivariate pathway to follow. Consequently, most modern software packages, including Unscrambler, contain several multivariate alternatives for use in different situations.

Unscrambler has evolved over a period of ~10 years. Developed originally by Harald Martens for analyzing near-IR data, Unscrambler was introduced by CAMO in 1987 and has been expanded over the years into a rather extensive chemometric package that includes principal component analysis (PCA), SIMCA classification, PCR, PLS, and neural nets. The neural net routines were not included in version 5.03; neural-UNSC is a separate product and sells for $1075 (industry), $945 (research), or $830 (educational). In its present configuration, Unscrambler will assist a user in the analysis of any type of spectral data.

One of the frustrating aspects of designing a universal multivariate software package today is coping with the variety of available file formats. Every manufacturer of scientific instruments has its own file format, and every spreadsheet has its own file format.

CAMO's approach to this problem differs little from that of other vendors. There are five different ways to get data into the Unscrambler: APC (standard Unscrambler format), ASCII (with and without a header), JCAMP-DX, and spreadsheets (Excel and Lotus). CAMO indicates that import routines are available for ISI (InfraSoft International), NSAS (Near Infrared Systems), and Perkin-Elmer formats, although none of these routines were included in the evaluation copy. CAMO, like most software vendors, assumes the user can write data in ASCII or JCAMP-DX formats. We chose to write our own routine to convert our near-IR data from a CSAS (Computerized Spectral Analytical System by North Carolina State University) format to the X matrices (spectral data) and Y matrices (chemical data) required by Unscrambler.

Unscrambler is not recommended for 286-based machines and it runs slowly on a 386SX/20 machine; therefore, we decided to switch to a 486-based machine. Performance tests for this review were conducted on an 80486 AT clone with 8 MB of extended memory, a VGA adapter, and a 325-MB SCSI hard drive.

Residual variance shows how individual objects fit in the model, how well the model predicts new values, or how much of the model error is explained after each factor.
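
The residual variance described in this caption can be illustrated with a small principal component regression. The sketch below is not Unscrambler's algorithm: it uses NumPy, synthetic two-band spectra, and hypothetical names (`pcr_residual_variance`, `simulate`) simply to show how the validation residual variance of one constituent might be tallied after each factor.

```python
import numpy as np


def pcr_residual_variance(X_cal, y_cal, X_val, y_val, max_factors):
    """Mean squared validation residual of y after 0..max_factors factors."""
    x_mean, y_mean = X_cal.mean(axis=0), y_cal.mean()
    Xc = X_cal - x_mean
    # Principal components of the centered calibration spectra via SVD.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = []
    for a in range(max_factors + 1):
        P = Vt[:a].T                        # loadings, shape (points, a)
        if a == 0:
            q = np.zeros(0)                 # zero factors: predict the mean
        else:
            T = Xc @ P                      # calibration scores
            q = np.linalg.lstsq(T, y_cal - y_mean, rcond=None)[0]
        y_hat = y_mean + (X_val - x_mean) @ P @ q
        variances.append(float(np.mean((y_val - y_hat) ** 2)))
    return variances


# Synthetic stand-in data (an assumption, not the reviewers' tobacco spectra).
rng = np.random.default_rng(0)
wavelengths = np.arange(910, 2590, 2)       # 840 points at 2-nm steps


def band(center):
    return np.exp(-((wavelengths - center) / 60.0) ** 2)


pure = np.vstack([band(1700), band(1760)])  # two overlapping bands


def simulate(n):
    conc = rng.uniform(0.0, 1.0, size=(n, 2))
    noise = rng.normal(0.0, 0.001, size=(n, wavelengths.size))
    return conc @ pure + noise, conc[:, 0]  # calibrate for component 1


X_a, y_a = simulate(100)                    # plays the role of a calibration file
X_b, y_b = simulate(100)                    # plays the role of a validation file
res_var = pcr_residual_variance(X_a, y_a, X_b, y_b, max_factors=5)
for factors, value in enumerate(res_var):
    print(factors, value)
```

With only two underlying components in the synthetic data, the residual variance should drop sharply by the second factor and then level off, which is the shape such a plot is meant to reveal.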

Analytical Chemistry, Vol. 66, No. 10, May 15, 1994

Installing the software went without a hitch. The program takes 2.4 MB of hard disk space, including the space for tutorials. You need a minimum of 4 MB of extended RAM, but you will be ahead of the game with 8 MB of extended memory. It is the extended memory that determines how many spectra or constituents can be included in the calibration/prediction matrices.

We were a little disappointed that installation was not more automatic. Our autoexec.bat and config.sys files had to be extensively modified to optimize the computer for running Unscrambler. After the modifications, some of our other programs would not run. Loading the options according to the manual left only 437 K in which to run programs. However, this is not an insurmountable problem, because the autoexec.bat and config.sys files can easily be switched by using the DOS copy command. DOS 6.2 provides a boot-up option that allows you to select an autoexec.bat/config.sys pair.

MLR was not included as an option in Version 5.03. This was a disappointment for us because MLR has been the backbone of near-IR spectroscopy for more than 30 years. It still is a very workable alternative and provides calibrations that perform as well as or better than PCA or PLS for many applications. MLR needs to be included, if for no other reason than to provide the user with a benchmark for evaluating the performance of PLS and PCA.

Our first application of the software was to import data formatted in JCAMP-DX and generated by a Nicolet FT near-IR spectrometer. Version 5.03 had a problem reading the header of these data. As it turned out, the import routine, being somewhat restrictive, was looking for a space on either side of the equal sign (=) throughout the JCAMP header. This bug was unraveled by CAMO's software experts, and we were able to quickly modify the JCAMP-DX file and pull in our data. Their response was courteous and timely, and they indicated that a future version of the software would correct this problem.
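
The header problem described above comes down to how strictly a reader splits each labeled record at the equal sign. As a minimal sketch (this is not CAMO's import routine; the function name `parse_jcamp_header` and the sample header lines are hypothetical), a more tolerant reader in Python might accept any amount of whitespace around the `=`:

```python
import re

# Labeled data records in JCAMP-DX start with "##", then a label, then "=".
# Allow optional whitespace on either side of the equal sign.
_LDR = re.compile(r"^##\s*([^=]+?)\s*=\s*(.*?)\s*$")


def parse_jcamp_header(lines):
    """Return {LABEL: value} for header records, stopping at the data table."""
    header = {}
    for line in lines:
        match = _LDR.match(line)
        if not match:
            continue  # skip anything not shaped like a labeled record
        label = match.group(1).upper()
        header[label] = match.group(2)
        if label in ("XYDATA", "XYPOINTS", "PEAK TABLE"):
            break  # tabular data follows; the header is finished
    return header


# Hypothetical header mixing several spacing styles.
sample = [
    "##TITLE=Tobacco sample 1",
    "##JCAMP-DX = 4.24",
    "##XUNITS= NANOMETERS",
    "##NPOINTS =840",
    "##XYDATA=(X++(Y..Y))",
    "910 0.1234 0.1250",
]
header = parse_jcamp_header(sample)
print(header)
```

Normalizing the spacing at read time, rather than requiring one exact layout, is what spares the user from hand-editing files written by different instruments.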

Our second objective was to develop a calibration on a large file (A) and to test the calibration on yet another large file (B). File A was made up of 100 spectra (each spectrum contained 840 points at 2-nm intervals from 910 to 2588 nm) from pulverized tobacco samples. Six constituent analyses were performed by wet chemistry: nicotine, sugars, nitrogen, water-soluble nitrogen, potassium, and calcium. File B consisted of 100 spectra of the same dimensions as File A, and the same analyses were used to validate the calibration. We had no problem importing the data and creating the X and Y matrices. Initially, we could not predict 100 samples. However, increasing the unscmem attribute in the config.sys file to 400,000 provided enough extended memory for the predicted matrices.

The graphics section of the package could use a few enhancements. Screen dumps have the same resolution as the screen (which was expected but does not have to be). However, direct drivers for high-resolution hard copies are lacking in version 5.03. To get high-resolution plots the user must make an HPGL file or PostScript file, which then must be imported into a word processor or drawing program for delivery to a plotter, laser, or PostScript printer. If the plot is in a PostScript file, it can be sent directly to a PostScript printer by using the DOS command "copy filename lpt1," assuming the printer is attached to lpt1. These drivers need to be provided so that high-resolution plots can be sent directly to a laser printer.

Labeling of axes is limited to what is on the screen plot; more flexibility in relabeling of axes is needed. Notations are easily added with the current package. Screen graphics need a cursor that can be used to obtain point-by-point X and Y matrix information as well as sample identification on calibration and prediction plots.

Unscrambler has several nice features. The screens are clearly labeled for operating in a multivariate analysis environment. Some of the terminology took a little getting used to, but this is typical of any software package. The more we used the software, the easier it was to find our way around. Actually, after we finished the tutorials, the various menus were familiar and we were able to move on to our own data.

The response surface shows the yield as a function of the design variables. Optimum conditions can be located and stability and critical directions studied.

Multivariate Calibration (by Martens and Tormod Naes, John Wiley and Sons, 1991), a textbook that may be used to supplement the manual, is available. The book describes in detail the algorithms used for PCA, PCR, and PLS. We found the book to be useful in understanding the philosophy of calibration and prediction followed in the manual, particularly the procedures implemented in the tutorials.

It is well known that multivariate software packages from various vendors do not always produce the same results from an analysis of the same data. Furthermore, some software vendors do not even reveal the algorithms used in their packages. Consequently, the serious student of multivariate statistics is left in a befuddled condition. It was refreshing to find that CAMO was forthright in its presentation of the algorithms at the end of the manual and that the book by Martens and Naes elaborated further on the algorithms.

CAMO provides four tutorial sessions in the manual. Each takes ~1 h to study and understand. The first three present increasing challenges to the learner. Tutorial A is a simple analysis of a single component from spectral data that consists of two overlapping absorbance bands. Tutorial B goes deeper into interference problems, using data from a 19-filter spectrometer to develop PLS models. Tutorial C deals with qualitative effects and teaches the user how to interpret PCA factors. Tutorial D ends with a study of experimental design using a fractional factorial design procedure. By conducting the four tutorials we gained sufficient experience to undertake analyses of our own data set. In general, of all of the packages we have used, this one was the easiest to learn and operate.

One of the more powerful and impressive features of Unscrambler is its macro facility, which makes possible the execution of a series of commands by pressing one key. This facility has several uses, including making a series of calibrations automatically without operator intervention, automating your favorite procedure, and allowing for unskilled operators to perform intricate analyses. The macro is generated by recording (storing) keystrokes in a file, which may be edited as needed. The macro-generating procedure is clearly outlined in Chapter 17 of the manual.

We set up a macro to develop 24 constituent calibrations based on 5 principal components for the data set given above [i.e., 100 spectra (840 points per spectrum) for calibration and 100 spectra (840 points per spectrum) for prediction]. It took < 15 min to do the whole job.

Since this software package was reviewed, CAMO has released version 5.5, which addresses some of the quirks (with file import and print-outs) found in version 5.03. In addition, several new features have been introduced, including classification (SIMCA), central composite design (response surface method), splitting of matrices into submatrices, and improved sort speed (40-200 times faster). With the addition of these new features, Unscrambler will be a valuable tool for extracting information from spectral data.

Reviewed by W. F. McClure, W. R. Cox, and Jing Lu, North Carolina State University, Raleigh, NC
