Toward automated assignment of nuclear magnetic resonance spectra


Anal. Chem. 1985, 57, 2510-2516


Toward Automated Assignment of Nuclear Magnetic Resonance Spectra: Pattern Recognition in Two-Dimensional Correlation Spectra

Peter Pfändler, Geoffrey Bodenhausen,† Beat U. Meier, and R. R. Ernst*

Laboratorium für Physikalische Chemie, Eidgenössische Technische Hochschule, 8092 Zürich, Switzerland

Computer analysis of the multiplet structure of cross peaks in phase-sensitive two-dimensional (2D) NMR correlation spectra allows one to trace out networks of coupled spins, to measure the magnitudes and signs of the scalar coupling constants, and to determine the number of magnetically equivalent spins at each site. Applications to mixtures of small molecules show that pattern recognition is feasible even if the signal-to-noise ratio is low, if the multiplets are barely resolved, or if the patterns are partly disguised because of accidentally overlapping cross peaks.

†Current address: Institut de Chimie Organique, Université de Lausanne, rue de la Barre 2, 1005 Lausanne, Switzerland.

High-resolution nuclear magnetic resonance (NMR) spectra of coupled spins such as protons in isotropic solution lend themselves to a rigorous theoretical analysis. Although one-dimensional (1D) spectra can be analyzed with programs such as LAOCOON (1, 2) if the number of spins in the system is known, it is far from trivial to interpret spectra of mixtures, particularly if one has no prior knowledge of the type of spin systems contained in the mixture (number of spins and magnetically equivalent groups). It is necessary to identify pairs of coupled nuclei with the aid of double resonance or two-dimensional (2D) spectroscopy, and it is essential to distinguish multiplets from accidental juxtapositions of chemically shifted singlets. The analysis is facilitated if the system under investigation is composed of known "building blocks", such as amino acids in proteins or nucleotides in nucleic acids. In such systems, it is possible to search for characteristic patterns in 2D spectra by comparison with a library (3). However, automated analysis should not be limited to such systems, and mixtures of compounds with extended networks of coupled protons (steroids, porphyrins, etc.) should also be amenable to interpretation.

Two-dimensional correlation spectroscopy (COSY) and related methods (4-9) have become established techniques for investigating complex systems. Much of the current research on proteins and nucleic acids relies on 2D NMR spectroscopy (10-16). The information content of 2D spectra is sufficiently high that most of the ambiguities inherent in one-dimensional spectra can be avoided. The distinction between multiplets and chemically shifted lines, for example, is rather straightforward in 2D spectra. Such spectra are amenable to a logical step-by-step analysis and are therefore suitable for the application of computer procedures.

The processing of 2D spectra has much in common with standard image processing applied in the image sciences (17-22) to enhance sensitivity or contrast and to bring out characteristic features. As long as linear processes are applied, the specific origin of the 2D data is not relevant. However, as soon as the logical structure of an image is to be analyzed by nonlinear techniques, the distinctive properties of the image become important. In fact, we have found few similarities between 2D NMR spectra and the type of images that are normally considered in other applications of pattern recognition. Two-dimensional spectra fulfill some rather unusual symmetry properties, as discussed below. Furthermore, it is possible to "tailor" 2D spectra by defining suitable pulse sequences designed to yield characteristic patterns. Thus the experimentalist has more freedom to generate "images" of the spin systems than in most other situations. The reduction from patterns to spin systems involves an understanding of coherence transfer (9) and of the connectivity of spectral transitions.
Our approach, which has evolved from a procedure described in a preliminary paper (23), combines features of pattern recognition and network analysis. We wish to stress the preliminary nature of the work described in this paper. It is concerned with the analysis of mixtures of small molecules with weakly coupled spins. Extensions to more complex systems such as proteins and nucleic acids are in progress.

© 1985 American Chemical Society

Figure 1. Schematic patterns of the cross peaks between two spins A and M which have two common coupling partners K and X. For illustration, the same signs have been given to J_AX, J_MK, and J_MX, while J_AK has an opposite sign (sense of arrows). The active coupling J_AM is indicated by a double-headed arrow, since its sign cannot be determined from these cross peaks. Positive and negative 2D absorption peaks are represented by filled and open symbols, respectively. Only dominant peaks are shown. The two complementary 2D correlation spectra I and II can be obtained with the sequence (90°)-t1-(β)-t2 with β = 45° and β = 135°. The same patterns are obtained with double-quantum filtered correlation spectroscopy with the sequence (90°)-t1-(45°)(β)-t2, or with linear combinations of double- and triple-quantum filtered correlation spectra (E.COSY) (26, 27). (c) and (d): Reduced peaks derived from (a) and (b).

BASIC PATTERNS

In the basic form of 2D correlation spectroscopy (COSY) (4, 9), a cross-peak multiplet centered at frequency coordinates (ω1, ω2) = (Ωk, Ωl) appears whenever a resolved coupling Jkl exists between two spins Ik and Il that have chemical shifts Ωk and Ωl. If the digital resolution is sufficient, and if the spectra are obtained so as to feature phase-sensitive pure absorption peak shapes (8, 24, 25), it is possible to identify individual multiplet components within cross-peak patterns in 2D spectra. Figure 1 shows typical cross-peak multiplets that appear in a system with at least four coupled spins. The signals in these multiplets have peak shapes described by the product of two Lorentzian lines (2D absorptive peaks) with amplitudes that are positive or negative (filled and open symbols in Figure 1). The patterns can be decomposed into squares and rectangles, which appear repeatedly with relative displacements that depend on the magnitudes and signs of the coupling constants. As may be appreciated in Figure 1, the distances between the peaks within the pattern correspond to sums and differences of J couplings. Such multiplets can be resolved in spectra of molecules with molecular masses of up to at least 6000 (24). In larger molecules, or in systems with especially effective relaxation mechanisms (viscous solutions, paramagnetic species, couplings to quadrupolar nuclei, exchange broadening, etc.), the line widths may be so broad that the antiphase multiplet components partly cancel, thus limiting the use of 2D correlation spectroscopy.

In conventional COSY spectra obtained with a single mixing pulse with a flip angle β ≠ 90°, all cross-peak multiplets can be decomposed into superpositions of only three basic patterns: squares, horizontal rectangles, and vertical rectangles, with signals bearing alternating signs at their corners. These patterns must occur symmetrically on both sides of the diagonal, but, with the exception of two-spin systems, the patterns appear at different frequency coordinates in the spectra obtained with β = 45° and 135°. The size of the patterns is given by the "active coupling constant" J_act, which is responsible for the appearance of a cross peak at frequency coordinates (ω1, ω2) = (Ωk, Ωl) and must be distinguished from "passive coupling constants" to nonparticipating spins, which merely lead to a duplication of the basic patterns shifted in frequency (see Figure 1). In spin (sub)systems of the type AXn, the squares and rectangles necessarily have length-to-width ratios, expressed in units of J_act, of 1:1, 2:1, 3:1, ..., n:1. The same patterns are found in double-quantum filtered correlation spectra obtained with the sequence (90°)-t1-(45°)(β)-t2 with β = 45° or 135°, although in this case the signals no longer appear in pure 2D absorption. Both techniques yield multiplets with weak additional peaks (not shown in Figure 1). The E.COSY technique, which combines double- and triple-quantum filtered correlation spectra, yields cleaner patterns with pure phase peak shapes (26, 27) and may be the preferred method for future applications.

Figure 2. Flow chart of procedure for automated analysis of 2D NMR spectra. Steps shown in the chart: (1) experiment: record two complementary 2D spectra I and II (e.g., two 2D correlation spectra with different mixing pulses); (2) search for basic patterns in spectra I and II; (3) determine coordinates of centers of gravity of centerings (see Figure 6); (4) search for spin systems, combining information from spectra I and II (see Figure 7); (5) plot results in 2D format.

ANALYTICAL CHEMISTRY, VOL. 57, NO. 13, NOVEMBER 1985

Figure 3. 2D correlation spectrum which was used to test pattern recognition procedures, consisting of the sum of two experimental double-quantum filtered COSY spectra obtained with the sequence (90°)-t1-(45°)(45°)-t2 with a home-built spectrometer at 300 MHz from samples (a) and (b) described in the text. A complementary spectrum, where the last pulse has a flip angle β = 135° (not shown), was also used in the pattern recognition procedure. The chemical shifts (some of which are folded) and the multiplet structures of the four spin systems with N = 6, 4, 3, and 2 spins are shown on top. Two cross-peak multiplets that overlap are emphasized by frames. Gaussian random noise was added to the experimental spectra. In the cross section shown below, a few small cross peaks are indicated by arrows.

In this paper, we focus attention on the analysis of experimental 2D spectra of molecules which have been selected to present suitable challenges, including small long-range couplings, accidental degeneracies between two couplings, and magnetically equivalent spins. The problems that occur in larger molecules have been explored by introducing artificial line broadening (to simulate short T2 relaxation times), by coadding different experimental 2D spectra (to obtain partly overlapping cross peaks), and by adding computer-generated Gaussian noise (to simulate the effect of low concentrations). In the preliminary studies reported here, strongly coupled systems have not been investigated, and cross peaks lying very close to the diagonal have been ignored. Systems with up to four distinct chemical shifts (including groups of magnetically equivalent spins) can be identified with the current version of the computer program.
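The geometry of the three basic patterns (squares, horizontal rectangles, vertical rectangles with alternating corner signs, repeated in mirror image across the diagonal) can be made concrete with a short sketch. The Python below is illustrative only and is not the authors' program: the function names, the convention for which diagonal of a pattern is positive, and the tolerance used to classify k:1 rectangles are assumptions. It lists the corners of a pattern of size J_act (or k·J_act) around a cross-peak center, together with its transposed counterpart.

```python
def basic_pattern(c1, c2, j_act, ratio=1, orientation="square"):
    """Corners (omega1, omega2, sign) of a basic pattern centred at (c1, c2)
    and of its mirror image across the diagonal (frequencies in Hz).

    orientation: "square", "horizontal" (long side along omega2) or
    "vertical" (long side along omega1); ratio is k for a k:1 rectangle.
    Which diagonal carries the positive sign is an arbitrary convention here.
    """
    height = width = j_act
    if orientation == "horizontal":
        width = ratio * j_act
    elif orientation == "vertical":
        height = ratio * j_act
    # Alternating signs around the corners: + - / - +
    corners = [(-1, -1, +1), (-1, +1, -1), (+1, -1, -1), (+1, +1, +1)]
    pts = [(c1 + s1 * height / 2, c2 + s2 * width / 2, sign)
           for s1, s2, sign in corners]
    # The same pattern appears transposed on the other side of the diagonal
    mirror = [(w2, w1, sign) for w1, w2, sign in pts]
    return pts + mirror


def ratio_in_units_of_j(width, height, j_act, tol=0.2):
    """Return k if a rectangle is k:1 in units of J_act (as in AX_k), else None."""
    long_side, short_side = max(width, height), min(width, height)
    k = round(long_side / j_act)
    if (k >= 1 and abs(short_side / j_act - 1) <= tol
            and abs(long_side / j_act - k) <= tol):
        return k
    return None


square = basic_pattern(100.0, 40.0, 7.0)
ax3 = basic_pattern(100.0, 40.0, 7.0, ratio=3, orientation="horizontal")
print(square[:4])
print(ratio_in_units_of_j(21.0, 7.0, 7.0), ratio_in_units_of_j(10.0, 7.0, 7.0))
```

The 3:1 rectangle generated for the AX3-like case is recognized as k = 3, while a 10 Hz × 7 Hz rectangle is rejected because its long side is not a whole multiple of J_act.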

OUTLINE OF PROCEDURE

The automated assignment of NMR spectra can be broken down into five consecutive steps, which are summed up in Figure 2.

1. Spectra. A judiciously chosen set of at least two complementary 2D spectra must be obtained from the solution under investigation. At present, only double-quantum filtered 2D COSY spectra with different flip angles are used. A representative example is shown in Figure 3. One could also envisage modified 2D techniques that yield unusual patterns, such as bilinear COSY (28) or E.COSY (26), and complementary experiments such as 2D J spectra (29, 30), relayed coherence transfer spectra (31), multiple-quantum spectra (32), and 2D nuclear Overhauser effect spectra (NOESY) (9-12). While the human eye is quite powerful in recognizing patterns in a single 2D spectrum, a computer has a greater capacity for comparing a set of 2D spectra, a feature which should be exploited to make pattern recognition reliable.

2. Basic Patterns. Once the spectra are stored, the program must search for basic patterns. A computer routine first checks whether the signs of the amplitudes S(ω1, ω2) in the experimental spectra agree with those of the expected patterns. For this purpose, a gridlike mask with a spacing that corresponds to the trial value for the coupling constant J_act is moved through the 2D array, as explained below. Spectra with broad lines and overlapping peaks can be analyzed if one allows for partly distorted patterns, where, for example, a corner in both of the symmetrically disposed squares or rectangles is disguised. Once agreement is found, the actual amplitudes of the 2D spectrum at the eight corners of the two symmetrical patterns are combined (sum of four positive peaks minus sum of four negative peaks), and this information is deposited in a "record" (a data structure of the Pascal programming language) which contains the amplitude A, the frequencies ω1, ω2, and ωJ, a code for the type of pattern (square, horizontal, or vertical rectangle), plus a pointer which allows one to order the record entries according to decreasing frequencies or amplitudes. A cluster of record entries which lie close together in frequency space is referred to as a "centering".
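The sign test with a gridlike mask and the combination of corner amplitudes into a record entry can be illustrated with a minimal Python sketch. It assumes the 4 × 4 mask and the two 16-bit sign words described under Practical Implementation (Search for Basic Patterns); the spectrum is a plain nested list indexed in data points, and the helper names, the corner-sign convention, and the `PatternRecord` fields (modeled on the Pascal record described above, without the ordering pointer) are hypothetical. Partly distorted patterns and the scan over all mask positions and trial spacings are left out for brevity.

```python
from dataclasses import dataclass

MASK = 4  # 4 x 4 grid of mask points


def bit(i, j):
    """Bit for mask point (row i, column j) in a 16-bit word."""
    return 1 << (MASK * i + j)


def sign_words(spec, r0, c0, d):
    """Pack the signs under a 4 x 4 mask with spacing d (points) anchored at
    (r0, c0). A bit is set only if the point and its mirror image across the
    diagonal have the same sign."""
    pos = neg = 0
    for i in range(MASK):
        for j in range(MASK):
            x, y = r0 + i * d, c0 + j * d
            a, b = spec[x][y], spec[y][x]
            if a > 0 and b > 0:
                pos |= bit(i, j)
            elif a < 0 and b < 0:
                neg |= bit(i, j)
    return pos, neg


def template(h, w):
    """Required positive and negative bits for an h x w pattern (units of d):
    + at (0, 0) and (h, w), - at (0, w) and (h, 0)."""
    return bit(0, 0) | bit(h, w), bit(0, w) | bit(h, 0)


def matches(pos, neg, h, w):
    """Bitwise AND test of both sign words against the pattern template."""
    tp, tn = template(h, w)
    return (pos & tp) == tp and (neg & tn) == tn


@dataclass
class PatternRecord:
    amplitude: float  # sum of 4 positive minus sum of 4 negative corners
    w1: float         # pattern centre along omega1 (points)
    w2: float         # pattern centre along omega2 (points)
    wj: int           # trial active coupling (points)
    kind: str         # "square", "horizontal" or "vertical"


def make_record(spec, r0, c0, d, h, w):
    plus = [(r0, c0), (r0 + h * d, c0 + w * d)]
    minus = [(r0, c0 + w * d), (r0 + h * d, c0)]
    # Each corner is counted on both sides of the diagonal: 8 corners in all
    both = lambda pts: sum(spec[x][y] + spec[y][x] for x, y in pts)
    kind = "square" if h == w else ("horizontal" if w > h else "vertical")
    return PatternRecord(both(plus) - both(minus),
                         r0 + h * d / 2, c0 + w * d / 2, d, kind)


# Symmetric test spectrum containing one square pattern with d = 2 points
n, d, r0, c0 = 16, 2, 1, 6
spec = [[0.0] * n for _ in range(n)]
for x, y, v in [(1, 6, 1.0), (1, 8, -1.0), (3, 6, -1.0), (3, 8, 1.0)]:
    spec[x][y] = spec[y][x] = v
pos, neg = sign_words(spec, r0, c0, d)
print(matches(pos, neg, 1, 1), matches(pos, neg, 1, 2))  # True False
print(make_record(spec, r0, c0, d, 1, 1))
```

Packing the signs into integers means that each candidate pattern costs only two AND operations, which is presumably why a bit-level test was attractive on the hardware of the time.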
Each centering lies in the middle of the patterns spanned by the active couplings. (A centering is a "temporary framing used to support an arch, dome, etc. while under construction" (33). As we shall see below, our centerings are indeed used only temporarily as an aid in pattern recognition.) Note that the centerings have an amplitude that is a function of three frequency variables, and may thus be considered to be three-dimensional peak shapes in the parlance of NMR. The peaks in this 3D domain have a shape which can be approximately described by the product of three Lorentzian absorption lines:

$$S(\omega_1,\omega_2,\omega_J) = a_1(\omega_1)\,a_2(\omega_2)\,a_J(\omega_J) \qquad (1)$$

where each $a_i$ ($i = 1, 2, J$) is a Lorentzian absorption line in the offset $\Delta\omega_i$, with

$$\Delta\omega_1 = \omega_1 - \omega_1^0, \qquad \Delta\omega_2 = \omega_2 - \omega_2^0, \qquad \Delta\omega_J = \omega_J - 2\pi J_{\mathrm{act}}$$
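Equation 1 can be evaluated directly. The sketch below assumes unit-height Lorentzians parametrized by their half-width at half-height, because the normalization of the a_i is not specified; all frequencies are angular (rad/s), and the function names are hypothetical.

```python
import math


def lorentzian(dw, half_width):
    """Unit-height Lorentzian absorption line; equals 0.5 at dw = +/- half_width."""
    return half_width ** 2 / (half_width ** 2 + dw ** 2)


def centering_shape(w1, w2, wj, w1_0, w2_0, j_act, widths):
    """Eq 1: S(w1, w2, wJ) = a1(dw1) * a2(dw2) * aJ(dwJ), dwJ = wJ - 2*pi*J_act."""
    hw1, hw2, hwj = widths
    return (lorentzian(w1 - w1_0, hw1)
            * lorentzian(w2 - w2_0, hw2)
            * lorentzian(wj - 2 * math.pi * j_act, hwj))


hw = 2 * math.pi * 1.0        # assumed 1 Hz half-width, in rad/s
widths = (hw, hw, 1.4 * hw)   # broader along wJ, as noted in the text
j = 7.0                       # active coupling in Hz
print(centering_shape(10.0, 20.0, 2 * math.pi * j, 10.0, 20.0, j, widths))  # 1.0
```

Because the three factors are independent, the same function also gives the "equatorial" and "meridional" cross sections of a centering by holding one coordinate fixed.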

Figure 4. (a) Computer-generated square pattern with positive and negative 2D Lorentzian absorption peaks at the four corners (solid and dashed lines, respectively). The digital resolution is 1 point/Hz. (b) Contour plot of the amplitude of an "equatorial" 2D cross section parallel to the (ω1, ω2) plane at ωJ = 2πJ_act through the 3D "centering" derived from (a). (c) Contour plot of the amplitude of a "meridional" 2D cross section of the same centering parallel to the (ω1, ωJ) plane at the ω2 frequency of the maximum.

Figure 5. (a) Computer-generated spectrum consisting of three partially overlapping squares with positive and negative pure 2D absorption peaks at their corners (solid and dashed lines, respectively). The digital resolution was 1 point/Hz. Contours are drawn at 1, 10, 35, 60, and 85% of the maximum peak height. (b) "Centerings" derived from (a), which appear in the centers of the three squares. The contours represent the amplitude of the 3D centerings projected onto the (ω1, ω2) plane (integral over ωJ). The lines indicate the positions of the reduced peaks (centers of gravity of the centerings), found by recursive subtraction of a standard 3D Lorentzian peak shape. The frequencies of the reduced peaks agree with those used to generate the patterns in (a).

In actual fact, the computer-simulated centering of Figure 4 shows that the dependence of the amplitude on ωJ is not accurately described by eq 1, although the dependence on ω1 and ω2 is typically Lorentzian. Since the peaks are often poorly digitized, the assumed line shape is not critical in practice, and the form of eq 1 is used for simplicity. The line widths in the ω1 and ω2 domains are equal to those in the experimental 2D spectrum, while the width at half-height in the ωJ frequency domain is approximately 1.4 times as large. In the presence of inhomogeneous broadening, the peaks in the experimental spectra, and hence the centerings, appear elongated along a plane which bisects the ω1 and ω2 axes (9), as a result of the formation of coherence transfer echoes (34). Such effects can be taken into account by a suitable modification of eq 1. Figure 5 shows how a computer-generated pattern consisting of three overlapping squares leads to three partially overlapping centering peaks (in actual fact, these are partly separated in the ωJ domain, although this is not apparent in the projection of Figure 5). In the case shown here, the overlaps
in the spectrum lead to distorted centerings, but the centers of gravity of these centerings nevertheless correspond, within numerical accuracy, to the actual chemical shifts.

3. Reduced Patterns. Once the centering peaks are identified, it is necessary to determine the frequency coordinates of their centers of gravity. This was achieved with a method described for 2D NMR by Shaka, Keeler, and Freeman (35), which has been adapted from a technique used in radio interferometry (36). In the present application (see the flow diagram in Figure 6), a standard centering peak shape, with line widths in the three dimensions that are estimated from the experimental spectrum, is iteratively subtracted from the centering signal derived from the experimental data. The centers of gravity are consolidated if they coincide within given tolerance limits. The resulting "reduced patterns" are stored in another record. The entries in this record include the signal amplitude (integral of the intensity in the 3D peak) and the frequencies ω1, ω2, and ωJ. The record entries also contain a "connectivity string" which encodes the relative positions of neighboring reduced peaks that are likely to belong to the same cross-peak multiplet; this allows the next step to search for connectivities with increased speed.

4. Search for Spin Systems. Once the reduced peaks have been obtained, one must identify cross peaks belonging to one and the same spin system. The procedure is outlined in the flow chart of Figure 7. If none of the cross-peak multiplets in the original spectrum is closer than the sum of all J couplings, one can easily find clusters of neighboring reduced patterns that belong to the same multiplet. In this case, the frequency coordinates (ω1, ω2) of the center of the cluster simply correspond to the chemical shifts of the two coupled nuclei. The distances between the reduced patterns within a cluster correspond to sums and differences of passive coupling constants, which must appear as active coupling constants in other cross peaks belonging to the same spin system. However, if the cross peaks partly overlap, some clusters may be found that contain reduced peaks belonging to different subsystems. The test spectrum shown in Figure 3 indeed contains two entangled patterns (framed for clarity), which must be unraveled.

Figure 6. Flow chart of the program used to derive reduced patterns from centerings, adapted from the procedure of Shaka et al. (35). Typical parameters are λ = 0.3 and γ = 0.5 (for strongly overlapping centerings, a smaller γ value is advisable). Steps shown in the chart: starting from overlapping centerings, find the maximum intensity A_max and define the local threshold λ·A_max; subtract a 3D Lorentzian with amplitude γ·A_max at the position of A_max and store the coordinates of the subtracted centering; find the maximum intensity A_max of the remaining centering in spectra I and II and repeat; finally, sum all Lorentzians with the "same" ω1, ω2, and ωJ values.
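The recursive subtraction of Figure 6 can be sketched in one dimension. This is a simplification, not the authors' 3D implementation: the sketch assumes a unit-height Lorentzian as the standard shape, uses the typical parameters λ = 0.3 and γ = 0.5 quoted for Figure 6, updates A_max after every subtraction, and merges subtracted components into reduced peaks by an amplitude-weighted average. The names `clean_1d`, `consolidate`, and the merging tolerance `tol` are hypothetical; a 3D version would replace the index `i` by a coordinate triple.

```python
def lorentz(x, x0, half_width):
    """Unit-height Lorentzian centred at x0."""
    return half_width * half_width / (half_width * half_width + (x - x0) ** 2)


def clean_1d(signal, half_width, gamma=0.5, lam=0.3, max_iter=1000):
    """Repeatedly subtract gamma * A_max times a standard Lorentzian at the
    position of the current maximum until it falls below lam * (initial A_max)."""
    residual = list(signal)
    threshold = lam * max(residual)
    components = []
    for _ in range(max_iter):
        k = max(range(len(residual)), key=residual.__getitem__)
        a_max = residual[k]
        if a_max < threshold:
            break
        amp = gamma * a_max
        for i in range(len(residual)):
            residual[i] -= amp * lorentz(i, k, half_width)
        components.append((k, amp))  # store coordinates of the subtracted peak
    return components, residual


def consolidate(components, tol):
    """Merge components closer than tol into reduced peaks:
    (amplitude-weighted center of gravity, total amplitude)."""
    groups = []
    for pos, amp in sorted(components):
        if groups and pos - groups[-1][-1][0] <= tol:
            groups[-1].append((pos, amp))
        else:
            groups.append([(pos, amp)])
    peaks = []
    for group in groups:
        total = sum(a for _, a in group)
        peaks.append((sum(p * a for p, a in group) / total, total))
    return peaks


# Two overlapping peaks at points 15 and 30 with different heights
signal = [lorentz(i, 15, 2.0) + 0.6 * lorentz(i, 30, 2.0) for i in range(50)]
components, residual = clean_1d(signal, half_width=2.0)
for center, amplitude in consolidate(components, tol=2):
    print(f"reduced peak at {center:.2f} (amplitude {amplitude:.3f})")
```

A smaller `gamma` removes less signal per iteration, which is why it helps when centerings overlap strongly: the maximum is less likely to jump onto a shoulder produced by over-subtraction.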
A cluster that mixes reduced peaks from overlapping cross peaks can be recognized by comparing the patterns in the two complementary experiments and by noting that the frequency coordinates of its center do not agree with any of the chemical shifts that are found in other cross peaks.

5. Output. The program searches for spin systems in a recursive manner, starting with systems with n_max chemical shifts and allowing for magnetically equivalent nuclei. When a complete spin system is found, the parameters (shifts and couplings) are plotted in a format that matches the experimental correlation spectrum, as shown in Figure 8. The corresponding reduced peaks are then deleted from the record. If no further systems with n_max shifts can be found, the search is resumed with n = n_max - 1, etc., as shown in Figure 7. The results are presented together with the experimental 2D correlation spectrum (lower triangles in Figure 8) to allow the user to judge whether the parameters found by the program are likely to be significant.

Figure 7. Flow chart for the search of spin systems with a decreasing number of chemical shifts. The maximum and minimum numbers of chemical shifts n_max and n_min expected to be present in the mixture can be chosen by the operator. For each value of n, the program searches for spin systems with a decreasing number m of nonvanishing couplings. Steps shown in the chart: set the number of active J couplings m := n(n - 1)/2; define a hypothetical spin system with m couplings and n chemical shifts; verify all properties that the spin system must fulfill (including equivalence); delete the clusters of a located spin system from the files; n := n - 1.

Figure 8. The result of the pattern recognition procedure applied to the spectrum in Figure 3. Two systems with four distinct chemical shifts have been found, the AMKX system of 1-bromo-3-nitrobenzene (left) and the AMK3X system of 1,2-dibromopropane (right). The chemical shifts that have been determined are shown on the left of each spectrum; the values of the active couplings with their signs are printed at the locations of the corresponding cross peaks. The number of magnetically equivalent nuclei is given on the left (e.g., K3 in (b)). In the lower triangles, the experimental spectra are shown to allow visual verification. The lower triangles in (a) and (b) are taken from spectra obtained with β = 45° and 135°, respectively.

PRACTICAL IMPLEMENTATION

1. Experimental Spectra. The analysis focused on two samples. Sample (a) consisted of 1,2-dibromopropane (AMK3X system) in benzene-d6. The three magnetically equivalent protons lead to characteristic patterns. Two of the couplings are nearly equal in magnitude but opposite in sign (J_AX ≈ 10.1 Hz and J_MX ≈ -10.2 Hz, the latter value being typical for geminal protons), and two couplings are vanishingly small, leading to degenerate transitions and vanishing cross peaks. Sample (b) consisted of an approximately equimolar mixture of 2,3-dibromothiophene (AX system), 1,1,2-trichloroethane (A2X system), and 1-bromo-3-nitrobenzene (AMKX system), dissolved in benzene-d6. The analysis of sample (b) focused on the aromatic protons in the substituted benzene, which feature nearly degenerate couplings (J_H2H4 ≈ J_H2H6 ≈ 2.3 Hz and J_H4H5 ≈ J_H5H6 ≈ 8.5 Hz), one small coupling (J_H4H6 ≈ 1.2 Hz), and one vanishing coupling (J_H2H5 = 0). These properties lead to a variety of cross peak patterns
that present a challenge to automated analysis. In addition, the program must discriminate against the signals that stem from the two- and three-spin systems in the mixture. The 2D spectra were obtained at 300 MHz with a spectral width of 200 Hz in both domains, with 512 × 512 data points in the frequency domain (2.5 points per hertz). In practical applications, one normally uses fewer points per hertz, and the search should therefore be faster (fewer trial values for J_act). Some signals at large offsets from the carrier were folded; these have to be identified as such by the user (marked with an asterisk in the outputs), because the signs of the signals are reversed if the cross peak is folded an odd number of times. Folding was permitted in order to reduce the size of the data matrix. Future applications on more powerful computer systems should preferably use larger spectral widths to avoid the ambiguities associated with folding. From each sample, two double-quantum filtered COSY spectra were obtained with the sequence (90°)-t1-(45°)(β)-t2, one spectrum with β = 45° and the other with β = 135°. The analysis was based on the patterns in these two complementary spectra. To check the discriminating power of pattern recognition procedures, Gaussian random noise, generated as described in ref 37, was added to both spectra in such a way that the peak-to-peak noise amplitude (N_ptp) was about 80% of the peak-to-peak difference of the smallest significant antiphase multiplet components. The purpose of the procedure was not to identify peaks that are "buried" in the noise but rather to check that no patterns would be found accidentally in random noise. In order to explore the effects of overlapping cross-peak multiplets, the experimental spectra of samples (a) and (b) were added together, both for β = 45° and β = 135°. The sum of the two spectra obtained with β = 45° is shown in Figure 3.
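The sign rule for folded signals, and the digital resolution of the 512-point, 200-Hz spectra, can be expressed in a few lines. The sketch assumes that folding maps a true offset back into the window [0, SW) by whole multiples of the spectral width SW; how a signal actually folds depends on the detection scheme, so this mapping is an assumption, and only the parity rule (sign reversed for an odd number of folds) is taken from the text. The function name `fold` is hypothetical. The last line converts the 1 to 12 Hz range of trial J_act values quoted under Search for Basic Patterns into data points.

```python
import math

SW_HZ = 200.0                    # spectral width in both domains (from the text)
POINTS = 512                     # data points per domain (from the text)
POINTS_PER_HZ = POINTS / SW_HZ   # 2.56, quoted in the text as about 2.5


def fold(freq_hz, sw_hz=SW_HZ):
    """Return (apparent frequency, number of folds, sign factor) for a true offset.

    Assumes wrap-around into [0, sw_hz); the sign is reversed when the
    signal has been folded an odd number of times.
    """
    k = math.floor(freq_hz / sw_hz)
    apparent = freq_hz - k * sw_hz
    sign = -1 if k % 2 else 1
    return apparent, k, sign


for f in (150.0, 230.0, -10.0, 450.0):
    print(f, fold(f))

# Trial spacings for J_act between 1 and 12 Hz, rounded to whole data points
trial_points = range(round(1 * POINTS_PER_HZ), round(12 * POINTS_PER_HZ) + 1)
print(len(trial_points), trial_points[0], trial_points[-1])
```

Note that Python's `%` returns a non-negative remainder, so a single fold downward (k = -1) is also treated as odd.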
By shifting the spectra of samples (a) and (b) relative to each other, using different carrier frequencies, it is possible to bring about various "accidental" coincidences of cross peaks at will, thus providing a stringent test for pattern recognition procedures.

2. Search for Basic Patterns. Basic patterns were searched for by shifting a gridlike mask of 4 × 4 points, with a spacing corresponding to the trial value for J_act, through the experimental spectrum. The sign-bit pattern of the 16 points in the spectrum is packed into two 16-bit words, one for positive amplitudes, the other for negative amplitudes. In the first word, a bit is set to 1 if the corresponding signal amplitude is positive at symmetrical positions on both sides of the diagonal; in the second word, the bits are set if the two signals are negative. The occurrence of significant patterns (squares and horizontal and vertical rectangles up to 1:4 J) can be tested with a bitwise AND operation on both words. Typically, when the program searched 29 trial values for J_act between 1 and 12 Hz (equivalent to 3 to 31 points in the present study, with 2.5 data points per hertz), and computer-generated Gaussian random noise was added to the spectra, the pattern search yielded 650 squares and 400 horizontal and 400 vertical rectangles in the 512 × 512 data matrix. The original matrix required 0.25 Mword = 0.5 Mbyte of storage space, while the 1450 record entries of 13 bytes each required about 20 kbyte. To speed up the search, signals with an amplitude of less than 0.5% of the maximum peak were disregarded. This must be compared with the amplitude of the least significant cross peak in the test spectrum (ca. 2% of the maximum peak) and the peak noise amplitude (about 1.6%). With the current program running on an IBM Instruments System 9000 computer with 2 Mbyte of memory, the whole procedure requires 8 min for each of the two complementary spectra.

3. Fitting of Centerings in 3D Frequency Space. With a digital resolution of 2.5 points/Hz and a line width on the order of a few hertz, significant centerings must have a finite line width in all three frequency dimensions. All isolated single points (δ functions) can therefore be discarded. The centers of gravity of the remaining centerings are then found by the procedure described by Shaka et al. (35): a standard centering with a width of 0.8 Hz in ω1 and ω2 and a width of 1.2 Hz in ωJ, placed at the coordinates of the maximum and with an amplitude γ = 0.5 times the amplitude A_max of the centering derived from the experimental data, is subtracted from the experimentally obtained centering. The maximum of the remaining peak is then located, and the subtraction procedure is repeated recursively until the residual amplitude is less than a threshold λ·A_max. The procedure required ca. 1.5 min for each of the two complementary spectra, including the definition of a record of reduced peaks which contains information on the relative positions of neighboring reduced patterns. In our example, we obtained some 150 reduced patterns (70 squares and 40 horizontal and 40 vertical rectangles). The record of reduced peaks required a storage space of about 10 kbyte (65 bytes/entry).

4. Search for Connected Cross Peaks. The program first searches for a spin system of the maximum expected size with n = n_max chemical shifts, where all possible m = n(n - 1)/2 couplings are resolved. A set of m clusters associated with n chemical shifts is selected by trial and error, and the program checks in both complementary spectra whether the active and passive J couplings in these clusters are compatible with one and the same spin system. At the same time, magnetically equivalent spins are identified by checking for rectangular patterns. If no (further) systems with m clusters can be found, m is decreased and the search is resumed. If m = n - 1, one obtains a "linear" coupling network; if m < n - 1, the network will appear to break up into two different fragments, hence the need to reduce n in the next iteration (Figure 7). If the experimental spectra do not contain any overlapping cross peaks, the identification of clusters of reduced peaks is straightforward, and the chemical shifts and coupling constants can be found in minutes. Misleading clusters due to overlaps may, however, lead to time-consuming trial-and-error steps.
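The bookkeeping behind the search order of Figure 7 and the remark about linear networks can be sketched with a small graph check, treating chemical shifts as nodes and resolved couplings as edges. The union-find connectivity test and the function names are assumptions for illustration, not the authors' program. A network of n shifts can only be connected if it has at least n - 1 couplings, which is why the search for a given n stops at m = n - 1 before n itself is reduced.

```python
def max_couplings(n):
    """Number of couplings m = n(n - 1)/2 when every pair of shifts is coupled."""
    return n * (n - 1) // 2


def search_order(n_max, n_min):
    """(n, m) pairs in the order of Figure 7: decreasing n, then decreasing m
    from n(n - 1)/2 down to n - 1."""
    for n in range(n_max, n_min - 1, -1):
        for m in range(max_couplings(n), n - 2, -1):
            yield n, m


def is_connected(n, couplings):
    """True if shifts 0..n-1 form a single network through the coupled pairs."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for a, b in couplings:
        parent[find(a)] = find(b)
    return len({find(i) for i in range(n)}) == 1


# 1-bromo-3-nitrobenzene: H2, H4, H5, H6 -> 0, 1, 2, 3; J(H2,H5) = 0 is omitted
benzene = [(0, 1), (0, 3), (1, 2), (2, 3), (1, 3)]
print(len(benzene), max_couplings(4), is_connected(4, benzene))  # 5 6 True
print(list(search_order(3, 2)))  # [(3, 3), (3, 2), (2, 1)]
```

For the aromatic AMKX system, five of the six possible couplings are resolved, so it would be found at (n, m) = (4, 5) in this ordering.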

DISCUSSION
Although the results presented in this paper are of a preliminary nature, pattern recognition methods appear to hold considerable promise for the automated assignment of NMR spectra. The procedure outlined here represents just one of many possible strategies. At a later stage, it may be useful to include some knowledge of the spin systems that are likely to occur (amino acids, nucleotides, etc.). Special attention has to be paid to strongly coupled systems, where ambiguous situations may be resolved by allowing for interaction with the user.


ACKNOWLEDGMENT
The authors are indebted to IBM Instruments for providing two CS 9000 systems.

Registry No. 1,2-Dibromopropane, 78-75-1; 1-bromo-3-nitrobenzene, 585-79-5; 2,3-dibromothiophene, 3140-93-0; 1,1,2-trichloroethane, 79-00-5.

LITERATURE CITED
(1) Bothner-By, A. A.; Castellano, S. M. In "Computer Programs for Chemistry"; DeTar, D. F., Ed.; W. A. Benjamin: New York, 1968; Vol. 1.
(2) Quantum Chemistry Program Exchange Catalog, Indiana University Chemistry Department.
(3) Neidig, K. P.; Bodenmueller, H.; Kalbitzer, H. R. Biochem. Biophys. Res. Commun. 1984, 125, 1143-1150.

(4) Aue, W. P.; Bartholdi, E.; Ernst, R. R. J. Chem. Phys. 1976, 64, 2229-2246.
(5) Bax, A.; Freeman, R. J. Magn. Reson. 1981, 44, 542-561.
(6) Piantini, U.; Sørensen, O. W.; Ernst, R. R. J. Am. Chem. Soc. 1982, 104, 6800-6801.
(7) Shaka, A. J.; Freeman, R. J. Magn. Reson. 1983, 51, 169-173.
(8) Rance, M.; Sørensen, O. W.; Bodenhausen, G.; Wagner, G.; Ernst, R. R.; Wüthrich, K. Biochem. Biophys. Res. Commun. 1983, 117, 479-485.
(9) Ernst, R. R.; Bodenhausen, G.; Wokaun, A. "Principles of Nuclear Magnetic Resonance in One and Two Dimensions"; Oxford University Press: London, in press.
(10) Kumar, Anil; Ernst, R. R.; Wüthrich, K. Biochem. Biophys. Res. Commun. 1980, 95, 1.
(11) Kumar, Anil; Wagner, G.; Ernst, R. R.; Wüthrich, K. Biochem. Biophys. Res. Commun. 1980, 96, 1156.
(12) Wagner, G.; Kumar, Anil; Wüthrich, K. Eur. J. Biochem. 1981, 114, 375.
(13) Scheek, R. M.; Russo, N.; Boelens, R.; Kaptein, R.; van Boom, J. H. J. Am. Chem. Soc. 1983, 105, 2914.
(14) Hare, D. R.; Wemmer, D. E.; Chou, S. H.; Drobny, G.; Reid, B. R. J. Mol. Biol. 1983, 171, 319.
(15) Wüthrich, K.; Wider, G.; Wagner, G.; Braun, W. J. Mol. Biol. 1982, 155, 311.
(16) Havel, T.; Wüthrich, K. Bull. Math. Biol. 1984, 46, 673.
(17) Serra, J. "Image Analysis and Mathematical Morphology"; Academic: London, 1982.
(18) Pavlidis, T. "Algorithms for Graphics and Image Processing"; Springer: Heidelberg, 1982.
(19) Niemann, H. "Pattern Analysis"; Springer: New York, 1981.
(20) Nilsson, Nils J. "Principles of Artificial Intelligence"; Springer: New York, 1982.
(21) Rosenfeld, A. Pattern Recognit. 1984, 17, 3-12.
(22) Kropatsch, W., Ed. "Mustererkennung"; Springer: Berlin, 1984.
(23) Meier, B. U.; Bodenhausen, G.; Ernst, R. R. J. Magn. Reson. 1984, 60, 161-163.
(24) Marion, D.; Wüthrich, K. Biochem. Biophys. Res. Commun. 1983, 113, 967.
(25) Bodenhausen, G.; Kogler, H.; Ernst, R. R. J. Magn. Reson. 1984, 58, 370-388.
(26) Griesinger, C.; Sørensen, O. W.; Ernst, R. R. J. Am. Chem. Soc., in press.
(27) Griesinger, C.; Sørensen, O. W.; Ernst, R. R., manuscript in preparation.
(28) Levitt, M. H.; Radloff, C.; Ernst, R. R. Chem. Phys. Lett. 1985, 114, 435-440.
(29) Aue, W. P.; Karhan, J.; Ernst, R. R. J. Chem. Phys. 1976, 64, 4226.
(30) Bodenhausen, G.; Freeman, R.; Morris, G. A.; Turner, D. L. J. Magn. Reson. 1978, 31, 75-95.
(31) Eich, G.; Bodenhausen, G.; Ernst, R. R. J. Am. Chem. Soc. 1982, 104, 3731-3732.
(32) Braunschweiler, L.; Bodenhausen, G.; Ernst, R. R. Mol. Phys. 1983, 48, 535-560.
(33) Oxford English Dictionary, 1976.
(34) Maudsley, A. A.; Wokaun, A.; Ernst, R. R. Chem. Phys. Lett. 1978, 55, 9.
(35) Shaka, A. J.; Keeler, J.; Freeman, R. J. Magn. Reson. 1984, 56, 294-313.
(36) Högbom, J. A. Astron. Astrophys. Suppl. Ser. 1974, 15, 417.
(37) Ralston, A.; Wilf, H. S. "Mathematical Methods for Digital Computers"; Wiley: New York, 1967; Vol. 2.

RECEIVED for review May 20, 1985. Accepted July 5, 1985. This research was supported by the Swiss National Science Foundation.