2 Identification of the Components of Complex Mixtures by GC-MS
Downloaded by UNIV OF MASSACHUSETTS AMHERST on March 25, 2016 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0054.ch002
J. E. BILLER, W. C. HERLIHY, and K. BIEMANN, Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139
Smith; Computer-Assisted Structure Elucidation ACS Symposium Series; American Chemical Society: Washington, DC, 1977.
It has been well known for many years that Gas Chromatography-Mass Spectrometry is a powerful tool for the qualitative identification of the components of complex mixtures. It has also been well established that the computer is needed in all aspects of this process, beginning with the initial acquisition of the data from the laboratory instrument, through the processing and "crunching" of the data, to the presentation of the data to the human interpreter. In recent years, techniques to improve and enhance the data, as well as the many methods of interpretation and identification, have also fallen almost totally into the province of the computer. Our laboratory has been intimately involved in the development and use of many of these techniques. Our needs are somewhat unusual in the very large number of chemical problems, from very diverse sources, which require identifications or structure determinations. We currently acquire and analyze approximately one-half million spectra a year from such varied sources as drugs in body fluids, geochemical extracts, metabolism studies, organic synthetic studies, and many others. We have even had to extend the capability of our data enhancement and interpretive algorithms and routines to the analysis of spectra returned from the GC-MS instruments in the two Viking landers on the surface of Mars (1). The need to analyze and interpret such a large volume of data has many implications. The most important requirement is a fully automated system capable of maximum performance and efficiency from end to end. This means that the GC-MS instrumentation must produce the best quality data possible under the fast scan conditions necessary for reasonable resolution of complex mixtures. The man-machine interface must be simple and human-engineered to limit the probability of operator error; the computer-instrument interfaces (both data and control) have to be accurate and highly reliable.
The basic processing of the data must be efficient, fast, economical, and complete, and the presentation of such vast amounts of information must be accomplished in an equally efficient, yet convenient, form. Perhaps most importantly (and certainly of the greatest complexity), the computer must be able to purge the spectra of background and other unavoidable interferences so that it can reliably interpret the spectra to accomplish the primary task: the identification of the components of interest. In addition, the routine analysis of such large amounts of data requires practical approaches to the design of routines that both enhance and interpret spectra. Very elegant and time-consuming algorithms, which improve the results by some small fractional margin towards perfection at the expense of either large computer power or time, are obviously not useful or appropriate. In a similar way, the presentation of data in such huge quantities necessarily precludes impressive but impractical interactive graphic systems which require inordinate amounts of both human and computer time. We feel that we have developed a practical system able to meet all of these needs. We would like to briefly describe how this has been accomplished and, in addition, describe a new and powerful technique for the automated and intelligent identification of oligopeptides in complex mixtures in support of our work on protein sequencing.
A Comprehensive System for the Analysis of Complex Mixtures
As mentioned above, the first necessary component in a balanced system for the identification of the constituents of a complex mixture is the source of the data.
The GC-MS instrument must be designed and optimized for fast, continuous acquisition of data over the entire GC analysis. The scan function must be accurate and reproducible, and the data must be acquired with high sensitivity, low electronic noise, and wide dynamic range to accommodate the large concentration differences common in complex organic mixtures. This is accomplished in our laboratory on a highly modified Hitachi RMU-6L Mass Spectrometer interfaced to a Perkin Elmer 990 Gas Chromatograph via a Watson-Biemann fritted glass separator (2). Almost all of the RMU-6L electronics have been replaced with reliable solid-state circuitry, and the magnet is controlled digitally to produce a fast, reproducible scan. The output of a solid-state electrometer is digitized at three separate gain levels to achieve a very accurate signal with large dynamic range, high signal-to-noise ratio, and high sensitivity (3). The system is diagrammed in Figure 1.
Figure 1. Basic GC-MS system configuration: gas chromatograph and Hitachi RMU-6L mass spectrometer (electron multiplier, electrometer amplifier, electronics console, operator digital scan control/display console) connected to an IBM 1800 DACS (1802 CPU, 32K core memory, analog-to-digital converter, 1851 multiplexer, DMA data channel controls, digital input/output, auxiliary processing unit), with analog FM magnetic tape, typewriter, card reader and punch, line printer, plotter, disks (3), digital tapes (2), and CRT display.
The data processing includes the normal mass-intensity reduction and, in addition, the routine generation of all mass chromatograms (3,4). To print or plot this amount of information in the conventional ways for even a single sample would be very time-consuming and difficult. To solve the data presentation problem, all spectra and mass chromatograms are directly microfilmed (5,6). Thus, all of the information produced in the course of the experiment is permanently available to the user independently of the computer. This approach is very economical and, more importantly, very efficient with respect to human and computer time and resources.
To prepare the data for the several automated interpretive techniques available (including both library searching and several more intelligent and specialized interpretive algorithms for specific classes of compounds), the technique of reconstruction of the spectra was developed and is being constantly improved in this laboratory. Since this technique has been described previously (7), a brief review of the concept will be sufficient. Spectral data from a GC-MS analysis are acquired continually and at a constant rate. Thus, the normal two-dimensional view of the data and its information content (mass and intensity) can be expanded to three dimensions to include the time element. Components only partially resolved by the gas chromatograph will normally show separate maxima in the mass chromatograms characteristic of those components at the times their individual concentrations are greatest. An enhanced set of spectra is generated by performing a full peak profile analysis on the entire mass chromatogram for every m/e value and then reconstructing spectra at each scan by assembling only those m/e values which maximize at that scan.
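As a rough illustration of the reconstruction idea, the sketch below substitutes a much simpler test for the full peak profile analysis of reference (7): a scan counts as a maximum of a mass chromatogram if its intensity exceeds the previous scan, is at least that of the next scan, and exceeds a threshold. Each reconstructed spectrum then contains only the m/e values whose chromatograms maximize at that scan. The function names, the data layout (m/e mapped to a list of intensities, one per scan), and the `min_intensity` threshold are hypothetical choices made for this example, not the laboratory's actual routines.

```python
from collections import defaultdict


def local_maxima(trace: list[float], min_intensity: float = 0.0) -> list[int]:
    """Scan indices where a mass chromatogram has a local maximum.

    Simplified stand-in for a full peak profile analysis: a scan is a maximum
    if it rises above the previous scan, is not exceeded by the next scan, and
    is above min_intensity. On a flat top only the first scan is reported.
    """
    peaks = []
    n = len(trace)
    for i in range(n):
        left = trace[i - 1] if i > 0 else float("-inf")
        right = trace[i + 1] if i < n - 1 else float("-inf")
        if trace[i] > min_intensity and trace[i] > left and trace[i] >= right:
            peaks.append(i)
    return peaks


def reconstruct_spectra(
    chromatograms: dict[int, list[float]], min_intensity: float = 0.0
) -> dict[int, dict[int, float]]:
    """Assemble, at each scan, only the m/e values whose chromatogram maximizes there."""
    spectra: dict[int, dict[int, float]] = defaultdict(dict)
    for mz, trace in chromatograms.items():
        for scan in local_maxima(trace, min_intensity):
            spectra[scan][mz] = trace[scan]
    return dict(spectra)


if __name__ == "__main__":
    # Hypothetical toy data: two partially resolved components.
    demo = {
        91: [0, 1, 4, 9, 6, 3, 1, 0],   # characteristic of component 1
        105: [0, 0, 1, 3, 6, 9, 5, 1],  # characteristic of component 2
        43: [2, 3, 5, 8, 7, 9, 4, 1],   # ion shared by both components
    }
    print(reconstruct_spectra(demo))
    # prints {3: {91: 9, 43: 8}, 5: {105: 9, 43: 9}}
```

In the toy data, each characteristic ion lands only in its own component's reconstructed spectrum, while the shared ion, which maximizes twice, appears in both. A real implementation would need the more careful peak profile analysis the text refers to, since a naive neighbor test is sensitive to noise.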
These reconstructed spectra are practically free of the contributions of unresolved companion substances, tailing fractions, column bleed, and other sources of interference. The Mass-Resolved Gas Chromatogram (the equivalent of a Total Ionization Plot generated from this new set of reconstructed data) is generated, and the entire set of data is microfilmed. The Mass-Resolved Gas Chromatogram is now a well-resolved indication of the number, location, and relative intensity of each of the components in the mixture. The reconstructed spectra are relatively free of the interferences which often make the correct identification of the various components difficult, if not impossible. Automated identification techniques, such as library searching and the algorithm for the interpretation of peptide mixtures described below, are greatly facilitated, and the reliability of their results is greatly improved.
An Interpretive Algorithm for Complex Peptide Mixtures
The determination of the amino acid sequence of polypeptides is of great interest and has been shown to be amenable to analysis by gas chromatography-mass spectrometry (8). In order to clarify the ensuing discussion of an algorithm to automate the interpretation of oligopeptide mass spectra, the peptide sequencing strategy used in this laboratory will be briefly described. A polypeptide, such as Subunit I of the sweet protein Monellin (9), is partially hydrolyzed with weak acid or enzymes to produce a complex mixture of di- to pentapeptides. These peptides are derivatized to enhance their volatility (10) (Figure 2) and injected into a gas chromatograph-mass spectrometer system; the analysis yields 200-300 mass spectra of the 15-50 oligopeptides in the mixture. One must now locate and identify all of these oligopeptides and finally reassemble them to determine the sequence of the original polypeptide. Since the manual identification of all the components of these complex mixtures is a time-consuming and difficult task, an automated procedure was needed. Searching the unknown spectra against a library of standard spectra is not feasible, since a library of all possible di- to pentapeptides would contain approximately 3.4 million spectra. An interpretive algorithm is feasible, however, since the polyaminoalcohol derivatives (Figure 2) used in this laboratory display predictable fragmentation patterns. By examining a large number of standard peptide spectra, we have found that dipeptides always exhibit an intense Z1 ion, tri- to pentapeptides always show a prominent A2 ion, and tetra- and pentapeptides always exhibit an intense A3 ion. In addition to these fragmentation rules, the algorithm makes use of the amino acid composition of the original polypeptide and of the retention index for each scan, which is calculated by a previously described method (11).
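The counting argument and the fragmentation rules above lend themselves to a short sketch. This is a hypothetical illustration, not the authors' program. It assumes an alphabet of 20 standard amino acids (the chapter does not state the alphabet behind its estimate); under that assumption the number of di- to pentapeptide sequences is 20^2 + 20^3 + 20^4 + 20^5 = 3,368,400, consistent with "approximately 3.4 million". It also encodes the stated diagnostic-ion rules and enumerates unordered compositions that the parent polypeptide could supply; treating the parent's residue counts as an upper bound is our assumption about how the composition is used. The retention-index constraint is omitted because its formula (12) is not given here, and all function names are made up for the example.

```python
from collections import Counter
from itertools import combinations_with_replacement

# Assumption: 20 standard amino acids; the chapter does not state the alphabet.
N_AMINO_ACIDS = 20


def library_size(min_len: int = 2, max_len: int = 5, alphabet: int = N_AMINO_ACIDS) -> int:
    """Number of distinct sequences with lengths min_len..max_len."""
    return sum(alphabet ** n for n in range(min_len, max_len + 1))


def required_ions(length: int) -> set[str]:
    """Diagnostic ions the stated rules require for a peptide of this length."""
    if length == 2:
        return {"Z1"}          # dipeptides: intense Z1 ion
    if length == 3:
        return {"A2"}          # tri- to pentapeptides: prominent A2 ion
    if length in (4, 5):
        return {"A2", "A3"}    # tetra- and pentapeptides also: intense A3 ion
    raise ValueError("rules cover di- to pentapeptides only")


def candidate_compositions(polypeptide: str, length: int) -> list[tuple[str, ...]]:
    """Unordered compositions of `length` residues that the parent could supply.

    Order is ignored, so every sequence with the same composition maps to a
    single entry. Assumption: a fragment cannot contain more copies of a
    residue than the parent polypeptide does.
    """
    available = Counter(polypeptide)
    residues = sorted(available)
    result = []
    for combo in combinations_with_replacement(residues, length):
        need = Counter(combo)
        if all(need[r] <= available[r] for r in need):
            result.append(combo)
    return result


if __name__ == "__main__":
    print(library_size())                      # 3368400
    print(required_ions(4) == {"A2", "A3"})    # True
    # Hypothetical parent using placeholder residue letters A, B, C.
    print(candidate_compositions("ABBC", 2))
```

Working with unordered compositions keeps the candidate list small; only the compositions that survive such filters would then need to be expanded into ordered sequences and tested against the observed A2, A3, or Z1 ions.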
Also, as has been previously shown (12), we can calculate the expected retention index for any oligopeptide; it is a function of the peptide's composition but relatively independent of its amino acid sequence. Based on this information, the algorithm shown in Table I was developed. It should be noted that for the entries in the peptide list in steps 1 and 3, the order of the amino acids is not significant. Thus, all tetrapeptides which have the composition (A,B2,C) will be represented by a single entry. In steps 4-7, however, the order of the amino acids is specific, so that these steps refer to sequences and not just combinations of amino acids. Steps 1 and 3 are filters based on the amino acid composition of the original polypeptide and the retention index of the unknown spectra. Step 4 is a filter based on the reliable A2 ion (Z1 for dipeptides). Similarly, step 6 is a filter based on the known presence of an A3 ion in the spectra of tetra- and pentapeptides. For the hypothetical example shown in Table I, an A2 ion is assumed to have been found for the partial sequences AB.. and BA.., and an A3 ion is assumed to have been found only for the sequences ABBC and BABC. The calculation in step 7 is complex and will be presented in detail elsewhere (13). It should be noted that correct identification of an unknown spectrum depends neither on the presence of a molecular ion nor on any specific sequence ion (except the reliable A2, A3, and Z1 ions described above). This algorithm was tested on the data from the analysis of a mixture of oligopeptides generated in an acid hydrolysis experi-