ARTICLE pubs.acs.org/ac

Higher-Order Mass Defect Analysis for Mass Spectra of Complex Organic Mixtures

Patrick J. Roach,†,§ Julia Laskin,*,† and Alexander Laskin‡

†Chemical and Materials Sciences Division, and ‡William R. Wiley Environmental and Molecular Sciences Laboratory, Pacific Northwest National Laboratory, P.O. Box 999, MSIN K8-88, Richland, Washington 99352, United States

ABSTRACT: Higher-order mass defect analysis is introduced as a unique formula assignment and visualization method for the analysis of complex mass spectra. This approach is an extension of the concepts of Kendrick mass transformation widely used for identification of homologous compounds differing only by a number of base units (e.g., CH2, H2, O, CH2O, etc.) in complex mixtures. We present an iterative renormalization routine for defining higher-order homologous series and multidimensional clustering of mass spectral features. This approach greatly simplifies visualization of complex mass spectra and increases the number of chemical formulas that can be confidently assigned for given mass accuracy. The potential for using higher-order mass defects for data reduction and visualization is shown. Higher-order mass defect analysis is described and demonstrated through third-order analysis of a deisotoped high-resolution mass spectrum of crude oil containing nearly 13 000 peaks.

High-resolution mass spectrometry (MS) capabilities are increasingly common in modern laboratories.1,2 The impressive mass resolving power of Fourier transform (FT) mass analyzers3,4 and the ongoing development of ambient pressure ionization (API) technologies5–9 together are being used to ionize and analyze mixtures of remarkable complexity. Chemical characterization of crude oil,10–12 the molecular characterization of organic aerosol,13–15 and the analysis of dissolved organic matter16–18 commonly require chemical formula assignment and comparison of mass spectra containing hundreds or thousands of unique m/z peaks. The analysis of such complex spectra requires data processing methodologies19,20 that are capable of logically grouping the observed species such that the researcher can perceive significant attributes and unique characteristics of different samples.

Mass accuracy and mass resolution determine the ability to unambiguously assign unique chemical formulas to the observed features in mass spectra. For example, mass accuracy of ∼0.1 mDa is required for unique elemental composition assignment of molecules up to 500 Da.21 Mixtures resulting from digestion or degradation processes, such as atmospheric aerosol or crude oil, often contain families of compounds that differ by a specific atomic building block (e.g., CH2, H2, O, CH2O, etc.). To identify homologous compounds differing only by a number of base units (e.g., CH2), an analysis has been developed by Kendrick22 that considers mass defect in a mass scale that has been cogently renormalized. Identification of one member of a homologous series with the same mass defect allows the remaining members to be confidently assigned chemical formulas.23 Experimental data often contain multiple series of homologous compounds with similar renormalized mass defects, which complicates the identification of unique Kendrick series. In such cases a secondary criterion can be employed. For example, a method called z* sorting24 is often used before CH2-based mass defect grouping is performed. In that method, species comprising the same CH2 homologous series have the same z* and mass defect.

Higher-order mass defect analysis is introduced here as a new methodology for mass spectral data analysis that extends the concept of Kendrick transformation to multiple bases. Using Kendrick's mass scale renormalization routine, homologous series are grouped by mass defect. In addition to being used to group homologous series, the mass defects contain further chemical information, as has been previously discussed.25 Specifically, addition of another base group (e.g., O or H2) to all members of one homologous series results in a systematic shift in the mass defect, which provides the basis for the higher-order renormalization described in this study. We show herein a routine for renormalization of the mass defects, which enables the identification of higher-order homologous series that are grouped based on secondary or tertiary compositional differences. Similar to the first-order Kendrick analysis, identification of one member of the group is sufficient for unambiguous formula assignment of all peaks in the group.

After chemical formulas are assigned, existing tools, such as van Krevelen plots18,26 and other approaches,27–30 are available to allow visualization of relative compositional differences and facilitate comprehension. We show here how higher-order MD plots can facilitate visualization and aid in the digestion of complex data sets. We now describe higher-order MD analysis, beginning with a brief overview of the concepts proposed by Kendrick, and include a demonstration of the methodology for a high-resolution mass spectrum of crude oil containing nearly 13 000 peaks.

Received: March 15, 2011. Accepted: April 28, 2011. Published: April 28, 2011.
© 2011 American Chemical Society. dx.doi.org/10.1021/ac200654j | Anal. Chem. 2011, 83, 4924–4929

METHOD DESCRIPTION AND DISCUSSION

First-Order Mass Transformation. First-order mass transformation, commonly called Kendrick analysis, is used to identify compounds in a mass spectrum that differ by a chemical base and to group them into series of homologous compounds.22 This is accomplished by (1) generating a peak list from a mass spectrum, (2) choosing a base for renormalization, (3) renormalizing the mass scale such that the base has an integer value, and (4) grouping species with similar mass defects calculated on the new mass scale. A base for renormalization is a molecular fragment that can be added or removed from a chemical formula without creating radicals. For example, adding or removing the CH2 base changes the length of an alkane chain; addition of a double bond or ring to the structure of a hydrocarbon occurs with the removal of H2 from the formula, while oxidation results in a net addition of an oxygen atom to the molecular formula. Selection of different bases for Kendrick analysis is motivated by the specific chemical similarities in the systems being investigated.

Kendrick showed that normalizing the mass scale so the base has an integer mass value could be used to identify homologous series of compounds differing only by the number of base units in the formula. Addition of the base does not change the decimal value of the renormalized mass, allowing identification of homologous series by the coinciding Kendrick mass defects of each member. As mass spectrometers are commonly calibrated for the IUPAC mass scale, the first-order Kendrick transformation normalizes the m/z scale for a specific base ($B$), using the ratio of the nominal value of the base mass and the IUPAC exact mass, $M^0(B)$:

$$M^1_B(\mathrm{peak}) = \frac{\mathrm{round}(M^0(B))}{M^0(B)}\, M^0(\mathrm{peak}) \qquad (1)$$

where $M$ is mass and the superscript identifies the order of renormalization, 0 indicating the IUPAC value. In the renormalized mass scale, two species with chemical formulas that differ only by $B$ will have the same first-order mass defect ($\mathrm{MD}^1_B$) given by eq 2:

$$\mathrm{MD}^1_B = \mathrm{ceiling}(M^1_B, 1) - M^1_B \qquad (2)$$

where the ceiling function rounds $M^1_B$ up to the smallest following integer. Here we used the ceiling function in place of rounding to obtain the nominal mass in eq 2. The use of the ceiling function is necessary to avoid discontinuities in the first-order homologous series that originate from rounding of $M^1_B$ values while preserving the original convention used by Kendrick. Homologous series can then be identified and grouped within an appropriately selected tolerance by coinciding $\mathrm{MD}^1_B$ values. The tolerance is chosen based on the mass accuracy. We note that a generalized notation, $M^i_B(\mathrm{peak})$ and $\mathrm{MD}^i_B(\mathrm{peak})$, where $B$ is the base and $i$ refers to the order of the transformation, has been introduced here that differs from that in previous publications for the purpose of clarification when using alternate bases (e.g., CH2, O, H2) and for the ensuing description of higher-order mass transformation. Within the notation used here, $\mathrm{MD}^1_{\mathrm{CH_2}}(\mathrm{peak})$ represents the original CH2-based Kendrick mass defect (KMD).

Figure 1. A first-order mass defect plot (1MD plot) of $\mathrm{MD}^1_{\mathrm{CH_2}}$ as a function of $M^0$(set A). The CH2 first-order series shown in different colors are aligned through the first-order mass transformation. Periodic shift in $\mathrm{MD}^1_{\mathrm{CH_2}}$ values by the value of $\mathrm{MD}^1_{\mathrm{CH_2}}(\mathrm{H_2})$ apparent in this plot corresponds to changes in z, with z = 1, 2, 3, 4, and 5 shown in red, black, green, blue, and black, respectively.

Second-Order Mass Transformation. The concepts of mass scale renormalization and mass defect analysis originally proposed by Kendrick can be extended to higher-order transformations to reveal additional homologous connections that exist between the first-order homologous series. Within a mass scale that has been renormalized for a first base, a second base transformation will modify the mass defect by an additional constant value, as has been noticed previously.25 The mass defects identified by the first-order mass transformation can be renormalized such that all of the homologous series that differ by only a second base will have the same second-order mass defect, allowing their subsequent grouping and identification. An example of this is provided using an empirical data set, Set A, composed of hydrocarbons with varying degree of saturation defined as follows:

$$\text{Set A: } \mathrm{C}_{12+n}\mathrm{H}_{26+2n-2z} \quad (n = 0\text{--}10;\ z = 1\text{--}5) \qquad (3)$$

where $n$ is an index and $z$ is the double bond equivalent (DBE; the number of double bonds and rings). The first-order CH2-based mass defect, $\mathrm{MD}^1_{\mathrm{CH_2}}$(set A), is obtained using eqs 1 and 2, where $B$ = CH2. A first-order mass defect plot (1MD) of $\mathrm{MD}^1_{\mathrm{CH_2}}$ as a function of $M^0$(set A) is shown in Figure 1. The points plotted in Figure 1 can be grouped into five homologous series based on their $\mathrm{MD}^1_{\mathrm{CH_2}}$ values. The index $z$ in eq 3 defines each homologous series and also corresponds to the DBE. Modification of the chemical formula by two hydrogen atoms results in a systematic change in the $\mathrm{MD}^1_{\mathrm{CH_2}}$ value by the mass defect of the second base, H2, on this scale. The second-order mass transformation renormalizes the mass defects obtained from the first-order transformation to the mass defect of a second base, $B2$, on this scale. Compounds that differ only by the two bases, $B1$ and $B2$, have equivalent values of $\mathrm{MD}^2_{B1,B2}$ on this scale, which enables their grouping as a second-order homologous series.
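As a rough illustration of eqs 1–3, the sketch below builds Set A, applies the CH2-based first-order transformation, and bins the resulting defects into series. The monoisotopic mass used for H, the function names, the 1e-9 guard inside the ceiling, and the four-decimal binning tolerance are assumptions made for this example and are not taken from the study.

```python
import math
from collections import defaultdict

# Monoisotopic masses on the IUPAC scale (12C = 12 exactly); H value assumed here.
M_C = 12.0
M_H = 1.00782503207
M_CH2 = M_C + 2 * M_H


def first_order_mass(m0, base_m0):
    """Eq 1: rescale the mass so that the base has an integer mass."""
    return m0 * round(base_m0) / base_m0


def first_order_md(m0, base_m0, tol=1e-9):
    """Eq 2: ceiling(M1) - M1.

    tol keeps floating-point noise at exact integers from flipping the
    ceiling; it is a guard added for this sketch.
    """
    m1 = first_order_mass(m0, base_m0)
    return max(0.0, math.ceil(m1 - tol) - m1)


# Set A (eq 3): C(12+n)H(26+2n-2z) with n = 0..10 and z = 1..5.
set_a = [
    (n, z, (12 + n) * M_C + (26 + 2 * n - 2 * z) * M_H)
    for n in range(11)
    for z in range(1, 6)
]

# Bin peaks whose defects agree to four decimals (a hypothetical tolerance).
series = defaultdict(list)
for n, z, m0 in set_a:
    series[round(first_order_md(m0, M_CH2), 4)].append((n, z))

for md1, members in sorted(series.items()):
    z_values = sorted({z for _, z in members})
    print(f"MD1 = {md1:.4f}  z = {z_values}  members = {len(members)}")
```

With these inputs the 55 species fall into five bins, one per value of z, which is the grouping described for Figure 1. For measured spectra the binning tolerance would have to reflect the mass accuracy rather than a fixed number of decimals.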


Figure 2. Panel A: three-dimensional plot of IUPAC mass, $M^0$(set A), first-order mass defects, $\mathrm{MD}^1_{\mathrm{CH_2}}$(set A), and second-order mass defects, $\mathrm{MD}^2_{\mathrm{CH_2,H_2}}$(set A), for a group of organic species defined as set A: $\mathrm{C}_{12+n}\mathrm{H}_{26+2n-2z}$ ($n$ = 0–10; $z$ = 1–5). Panel B: two-dimensional plot of second-order mass defects (2MD plot), $\mathrm{MD}^2_{\mathrm{CH_2,H_2}}$(set A), versus first-order mass defects, $\mathrm{MD}^1_{\mathrm{CH_2}}$(set A). The series coloring is the same as in Figure 1.

Figure 3. Negative ESI spectrum of heavy crude oil containing 12 997 peaks.

A second-order mass defect is calculated from a first-order mass defect by first obtaining a quantity termed a second-order mass ($M^2_{B1,B2}$):

$$M^2_{B1,B2}(\mathrm{peak}) = \frac{\mathrm{MD}^1_{B1}(\mathrm{peak})}{\mathrm{MD}^1_{B1}(B2)} \qquad (4)$$

Note that the second-order mass of $B2$, $M^2_{B1,B2}(B2)$, equals unity. The second-order mass defect ($\mathrm{MD}^2_{B1,B2}$) is given by eq 5:

$$\mathrm{MD}^2_{B1,B2} = M^2_{B1,B2}(\mathrm{peak}) - \mathrm{ceiling}(M^2_{B1,B2}(\mathrm{peak}), 1) \qquad (5)$$
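The second-order step in eqs 4 and 5 can be sketched for Set A as follows. Two points are assumptions of this example rather than statements from the study: the per-H2 step is taken as the distance of the renormalized H2 mass from its nearest integer (about 0.0134 on the CH2 scale), and the defect is reported as the non-negative distance up to the next integer, mirroring eq 2. The function names, the H mass, and the tolerance are likewise illustrative.

```python
import math

M_C = 12.0
M_H = 1.00782503207            # assumed monoisotopic mass of 1H
M_CH2 = M_C + 2 * M_H
M_H2 = 2 * M_H
TOL = 1e-9                     # guard against floating-point noise


def first_order_mass(m0, base_m0):
    """Eq 1: mass on the scale where the base is an integer."""
    return m0 * round(base_m0) / base_m0


def first_order_md(m0, base_m0):
    """Eq 2: ceiling(M1) - M1, clamped at zero."""
    m1 = first_order_mass(m0, base_m0)
    return max(0.0, math.ceil(m1 - TOL) - m1)


def md1_step(base2_m0, base1_m0):
    """Assumed MD1 shift per unit of the second base (see the lead-in)."""
    m1 = first_order_mass(base2_m0, base1_m0)
    return abs(m1 - round(m1))


def second_order_md(md1_peak, step):
    """Eq 4 followed by the wrap of eq 5, reported in [0, 1)."""
    m2 = md1_peak / step                       # second-order mass, eq 4
    return max(0.0, math.ceil(m2 - TOL) - m2)  # sign convention assumed


step_h2 = md1_step(M_H2, M_CH2)

# Second-order defects for every member of Set A (eq 3).
md2_set_a = []
for n in range(11):
    for z in range(1, 6):
        m0 = (12 + n) * M_C + (26 + 2 * n - 2 * z) * M_H
        md2_set_a.append(second_order_md(first_order_md(m0, M_CH2), step_h2))

print(f"MD1 step per H2: {step_h2:.4f}")
print(f"largest MD2 in Set A: {max(md2_set_a):.6f}")
```

Because the members of Set A differ only by CH2 and H2 units, every second-order mass in this sketch is an integer to within numerical noise, so all 55 defects collapse to essentially zero.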

The compounds that comprise Set A all share the value of $\mathrm{MD}^2_{\mathrm{CH_2,H_2}}$ = 0.000, because their chemical formulas differ only by a number of CH2 and H2 bases. Figure 2 shows a three-dimensional plot of $M^0$ vs $\mathrm{MD}^1_{\mathrm{CH_2}}$ and $\mathrm{MD}^2_{\mathrm{CH_2,H_2}}$. The vertical xy projection of Figure 2a can be recognized as a standard $\mathrm{MD}^1$ mass diagram, known as a Kendrick mass diagram. Figure 2b shows $\mathrm{MD}^2_{\mathrm{CH_2,H_2}}$(Set A), introduced here as a second-order mass defect diagram (2MD), which is also the yz projection of Figure 2a. The 2MD plot facilitates identification of homologous series and provides a practical visualization tool when viewing multiple second-order Kendrick groups within a single plot, as generally occurs when examining experimental MS data.

Higher-Order Mass Transformation Applied to a Crude Oil Sample. Crude oil is an excellent example of a complex mixture of homologous organic compounds that are suitable for identification using second-order mass defect analysis with $B1$ = CH2 and $B2$ = H2 bases. Electrospray ionization (ESI) mass spectra of crude oil commonly contain compounds of the formula CnHmX, where X represents various combinations of N, O, and S.11,12,24,31 For such compounds, the second-order mass defect, $\mathrm{MD}^2_{\mathrm{CH_2,H_2}}$, can be used to group various homologous series differing by the identity of X. As a result, species with different chemical functionalities determined by various combinations of O, N, and S can be readily distinguished and identified. We show now a higher-order mass defect analysis through the third-order transformation using the bases $B1$ = CH2, $B2$ = H2, $B3$ = O for a negative mode ESI mass spectrum of heavy crude obtained from Dr. Ryan Rodgers of the National High Magnetic Field Laboratory. The deisotoped crude oil spectrum shown in Figure 3 consists of 12 997 unique m/z species over a 200–1100 m/z range.

First-order mass defect analysis using $B1$ = CH2 results in the 1MD plot shown in Figure 4. Distinct CH2 homologous series are observable that are aligned horizontally. However, this plot contains as many points as the spectrum in Figure 3 and therefore improves data assessment only marginally. Although the first-order mass defect analysis serves as a tool for facilitating formula assignment, it does little to reduce the visual complexity of the plot.

Figure 4. 1MD plot of $\mathrm{MD}^1_{\mathrm{CH_2}}(M^0(\text{heavy crude}))$. Note: The mass defect of each first-order homologous series is plotted as the average value of the series to reduce the scatter and aid visualization.

The 2MD plot in Figure 5 was created using $B1$ = CH2 and $B2$ = H2 for the first- and second-order mass defect analysis, respectively. The second-order transformation was applied to $\mathrm{MD}^1_{\mathrm{CH_2}}$ values averaged over individual first-order homologous series. As a result, each of the CH2 first-order series is now represented as a single point in the 2MD plot. In the 2MD plot, the entire spectrum is represented by 480 points, corresponding to only 3.7% of the number of points present in the 1MD plot (Figure 4). Horizontal series represent groups of compounds that differ in composition only by the number of CH2 and H2 base units, thus allowing the simplified presentation of families containing unique combinations of N, O, and S elements. The extent of bond saturation can be easily identified by the abscissa of points plotted in Figure 5 because adjacent points of the same series differ by one double bond, increasing to the left.

Figure 5. 2MD plot showing the average value of $\mathrm{MD}^2_{\mathrm{CH_2,H_2}}$ as a function of $\mathrm{MD}^1_{\mathrm{CH_2}}$ for the crude oil data. Adjacent series are plotted alternately in red and black to aid visualization. Numeric indices of homologous series that correspond to the unique functional families are identified in Table 1. The 12 997 points plotted are reduced to 480 distinct points, each corresponding to a first-order homologous CH2 series. The number of points is reduced from the original spectrum and the 1MD plot (Figure 4) by 96.3%.

From Figure 5 and Table 1, it can be observed that the addition of an oxygen atom to a functional group family increases the value of $\mathrm{MD}^2_{\mathrm{CH_2,H_2}}$ by ∼0.286 in the experimental data. However, there is also a clear irregularity in the values of $\mathrm{MD}^2_{\mathrm{CH_2,H_2}}$. Specifically, when the value of $\mathrm{MD}^2_{\mathrm{CH_2,H_2}}$ obtained by the addition of oxygen to a functional group (addition of ∼0.286) exceeds unity, the resulting value of the second-order mass defect must be reduced by 1 (i.e., the $\mathrm{MD}^2_{\mathrm{CH_2,H_2}}$ values roll at unity).

Table 1. Statistical Summary of the Homologous Groups Identified in the Crude Oil Sample through the Second-Order Analysis^a

group ID    X         MD2      size
14          -         0.542    681
6           O         0.828    881
22          O2        0.115    584
15          O3        0.399    240
23          N         0.106    1290
16          NO        0.392    900
8           NO2       0.678    652
2           NO3       0.944    258
7           S         0.796    460
24          OS        0.080    830
17          O2S       0.367    443
10          O3S       0.652    300
18          NS        0.353    969
11          NOS       0.642    722
3           NO2S      0.928    376
9           N2        0.667    507
1           N2O       0.952    311
25          S2        0.041    283
19          OS2       0.330    543
12          O2S2      0.618    223
13          NS2       0.603    743
5           NOS2      0.890    386
21          NO2S2     0.191    5
4           N2S       0.918    329
20          N2OS      0.202    73

^a Group ID corresponds to the group numbering shown in Figure 5. X denotes the functional group present in individual CnHmX species. MD2 values are empirical $\mathrm{MD}^2_{\mathrm{CH_2,H_2}}$ averages calculated for each of the series. Size denotes the number of members that have been grouped in the series.

For example, consider the S series in Table 1:

$$\mathrm{MD}^2_{\mathrm{CH_2,H_2}}(\mathrm{S}) = 0.796 \qquad (6a)$$

Addition of oxygen gives the SO series with the nominal mass defect given by eq 6b:

$$\mathrm{MD}^2_{\mathrm{CH_2,H_2}}(\mathrm{S}) + 0.286 = 1.082 \qquad (6b)$$

However, because mass defect values roll at unity, the resulting mass defect is calculated as follows:

$$\mathrm{MD}^2_{\mathrm{CH_2,H_2}}(\mathrm{SO}) = 1.082 - 1 = 0.082 \qquad (7)$$
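The wrap-around in eqs 6 and 7 amounts to addition modulo 1. The short check below applies it to a few neighbouring rows of Table 1; the 0.286 increment and the MD2 values are those quoted in the text and table, while the function name and the choice of rows are illustrative.

```python
O_SHIFT = 0.286  # approximate MD2 increase per added oxygen, from the text


def add_oxygen(md2):
    """Add one oxygen to a family and roll the result at unity (eqs 6b, 7)."""
    return (md2 + O_SHIFT) % 1.0


# (parent X, parent MD2, product X, product MD2) taken from Table 1
pairs = [
    ("S", 0.796, "OS", 0.080),
    ("O", 0.828, "O2", 0.115),
    ("N", 0.106, "NO", 0.392),
    ("N2S", 0.918, "N2OS", 0.202),
]

for parent, md2_parent, product, md2_table in pairs:
    predicted = add_oxygen(md2_parent)
    print(f"{parent} -> {product}: predicted {predicted:.3f}, table {md2_table:.3f}")
```

For these four pairs the predicted and tabulated values agree to within a few thousandths, including the three cases where the sum passes unity and rolls over.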

These observations facilitate third-order mass transformation and the next level of grouping of the 2MD homologous series identified in Table 1. The transformation of $\mathrm{MD}^2_{\mathrm{CH_2,H_2}}$ is subject to the constraint exemplified by eqs 6 and 7. Specifically, if the modified value of the mass defect increases above unity, the value of $\mathrm{MD}^2_{\mathrm{CH_2,H_2}}$ obtained from the transformation is reduced by 1. To create a 3MD plot, a modified $\mathrm{MD}^2$ value must be created that accounts for this issue of looping above unity, which we designate as

$$\mathrm{MD}^{2*}_{\mathrm{CH_2,H_2}} = (\mathrm{MD}^2_{\mathrm{CH_2,H_2}} + 0, 1, 2, \text{etc.}) \qquad (8)$$

A modified $\mathrm{MD}^{2*}_{\mathrm{CH_2,H_2}}$ value is calculated for all members of a 3MD group that have looped above unity. In systems that have undergone excessive oxidation, such as for ozonolysis products in organic aerosol,32 the values may roll at unity multiple times.

Figure 6. 3MD plot representing the 12 997 peaks of the heavy crude oil spectrum shown originally in Figure 3 using only 25 points, a data reduction of 99.8%. The horizontal series represent compounds that can be designated CnHmO(0–3)X, where X is labeled on the graph. The leftmost point of each series is the O0, and oxygen atoms are added to the formula successively to the right. Each point represents a second-order series listed in Table 1. Adjacent series are plotted alternately in red and black to aid visualization.

The third-order mass defect ($\mathrm{MD}^3_{B1,B2,B3}$) is then calculated using eq 9:

$$\mathrm{MD}^3_{B1,B2,B3} = \mathrm{mod}(\mathrm{MD}^{2*}_{B1,B2}, \mathrm{MD}^2_{B1,B2}(B3)) \qquad (9)$$
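Eqs 8 and 9 can be tried on the sulfur-containing families of Table 1. In this sketch the number of times each family has rolled past unity is supplied by hand, and the experimental increment of about 0.286 stands in for the second-order mass defect of oxygen; both are assumptions of the example, as are the function and variable names.

```python
MD2_O = 0.286  # stand-in for MD2(B3) with B3 = O, from the observed increment


def third_order_md(md2, loops, md2_base=MD2_O):
    """Eq 8 (undo the roll at unity) followed by eq 9 (remainder on division)."""
    md2_star = md2 + loops          # eq 8: add 0, 1, 2, ...
    return md2_star % md2_base      # eq 9


# (X, MD2 from Table 1, assumed number of rolls past unity)
sulfur_family = [
    ("S", 0.796, 0),
    ("OS", 0.080, 1),
    ("O2S", 0.367, 1),
    ("O3S", 0.652, 1),
]

md3 = {x: third_order_md(md2, loops) for x, md2, loops in sulfur_family}
for x, value in md3.items():
    print(f"{x}: MD3 = {value:.3f}")
```

The four families land within a few thousandths of one another, consistent with their being drawn as a single series in a 3MD plot. How the number of rolls is determined automatically is not covered by this sketch.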

Here, the mod function returns the remainder after dividing one number by another. We routinely assign formulas after the initial de novo grouping up to the 2MD level. 3MD plots are used at this time primarily as a data reduction and visualization tool. The 3MD plot in Figure 6 employs 25 points to represent 12 997 peaks in the heavy crude oil mass spectrum from Figure 3, providing a 520-fold simplification of the complex mass spectral data using only 0.2% of the 12 997 points present in the original 1MD plot. Identification of only 25 molecular formulas is necessary for complete assignment of all peaks in the spectrum. The 3MD plot in Figure 6 could be easily modified to facilitate visualization of a chosen property of the data, e.g., for comparing the extent of oxidation of different crude oil samples.

Clustering of all peaks in a complex spectrum into several large groups facilitates the assignment of the first-order Kendrick series that originate at high m/z values. For example, 1290 peaks with one nitrogen atom (CxHyN, group 23 in Table 1) were assigned based on the elemental composition of the leading peak at m/z 222.1288. More than 30% of the peaks in this group belong to CH2 Kendrick series originating at m/z > 500, which cannot be confidently assigned using traditional approaches.21 It follows that, for a given mass measurement accuracy, the method presented here allows us to assign more peaks in the complex spectrum. Preliminary studies in our group indicate that this approach can be successfully used for formula assignment in high-resolution complex spectra recorded at a mass resolution of 60 000 at m/z 400 and mass accuracy of