
Article

Multivariate Curve Resolution-Alternating Least Squares Analysis of High Resolution Liquid Chromatography-Mass Spectrometry Data
Melanie M. Sinanian, Daniel Wesley Cook, Sarah C. Rutan, and Dayanjan S. Wijesinghe
Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.6b03116 • Publication Date (Web): 18 Oct 2016
Downloaded from http://pubs.acs.org on October 18, 2016


Analytical Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Analytical Chemistry

Multivariate Curve Resolution-Alternating Least Squares Analysis of High Resolution Liquid Chromatography-Mass Spectrometry Data

Melanie M. Sinanian1, Daniel W. Cook1, Sarah C. Rutan1*, Dayanjan S. Wijesinghe2

1. Department of Chemistry, Virginia Commonwealth University, Richmond, VA 23284
2. Department of Pharmacotherapy and Outcomes Science, Virginia Commonwealth University, Richmond, VA 23284

ABSTRACT: Methods such as liquid chromatography coupled with high resolution mass spectrometry (LC-HRMS) are crucial for differentiating compounds with highly similar masses. This is a necessity when analyzing highly complex samples; however, the size of high resolution LC-HRMS datasets can cause difficulties when applying advanced data analysis techniques. In this work, LC-HRMS analyses of known amphetamine samples and unknown bacterial lipid samples were carried out and multivariate curve resolution-alternating least squares (MCR-ALS) was applied to the data to obtain mathematical separation of overlapped analyte signals. In order to minimize computational strain, a novel strategy was developed which minimizes the number of irrelevant masses analyzed at full resolution. To do this, data were first binned to unit mass resolution and MCR-ALS was performed. This provided mathematical components for each analyte present plus background components. In the resolved spectral profiles of analyte components, masses above a preset intensity threshold were extracted, discarding all other masses, and expanded to successively higher levels of resolution, applying MCR-ALS at each level. These steps were repeated until 0.001 amu resolution was achieved, as dictated by the resolution of the instrument; in this case, a time-of-flight mass spectrometer. This strategy allowed for the accurate recovery of all known amphetamine compounds and select bacterial lipid extracts while minimizing the size of the data, therefore minimizing computational analysis time and data storage requirements. This relatively simple strategy enables the effective coupling of LC-HRMS with MCR-ALS.

The combination of liquid chromatography with high-resolution mass spectrometry (LC-HRMS) offers a powerful tool for the resolution of peaks in two modes: temporal and spectral. This offers the potential for qualitative and quantitative analysis of highly complex mixtures. Despite the temporal and spectral resolving power of LC-HRMS, overlapping and interfering signals are often present in both domains, especially in cases where the sample is highly complex. Another complication is the inclusion of irrelevant signals in the data. Because of the sensitivity of these instruments, much of the data collected in LC-HRMS experiments corresponds to noise and background rather than true analyte signals. It is therefore important to efficiently differentiate analyte signals from irrelevant signals, a process known as peak detection. Many software packages implement peak detection algorithms, and these take different approaches. Many rely on the Gaussian-like peak shapes in both the chromatographic and spectral modes, including direct fitting of Gaussian or exponentially modified Gaussian peak models to the extracted ion chromatograms.1,2 Another approach is the use of derivatives to remove noise and linear background signals while defining peak start and end points.3–6 While these approaches are widely used, they often struggle to detect low-level analytes because of the intensity thresholds they require. Furthermore, while such approaches can often resolve slightly overlapped signals, they are often unable to resolve highly overlapped signals.

An alternative to peak detection is the use of multi-way chemometric methods to resolve individual analyte signals from both background and other analyte signals. The data resulting from LC-HRMS experiments are considered to be ‘second-order’ data, meaning that the data can be represented by a matrix with time in one mode and mass-to-charge (m/z) in the second mode. Chemometric data analysis methods employ mathematical techniques that are able to handle second-order and higher dimensional data, allowing the entire dataset to be analyzed in a single analysis rather than analyzing a single m/z channel at a time (e.g., extracted ion chromatograms). One method that has been developed to resolve chromatographic data is multivariate curve resolution-alternating least squares (MCR-ALS).7–9 MCR-ALS has been widely used for second-order data arising from LC with ultraviolet-visible detection (UV-Vis)10–14, LC with fluorescence spectroscopy12,15,16, and LC with low resolution mass spectrometry.17,18 MCR-ALS has also recently been utilized for LC-HRMS analyses, including metabolomics;19 however, the advantages that the high resolution data provide have not been fully realized. In almost all cases, the LC-HRMS data analyzed by MCR-ALS are subjected to binning, a process of grouping the mass intensities into bins within a specific range.3 This is done to reduce the size of the LC-HRMS data, which is very large due to the number of masses in the dataset. For example, a mass spectrum with a range from 50-1000 amu at intervals of 0.001 amu contains a possible 9 × 10⁶ mass-to-charge (m/z) values. Binning to intervals of 0.1 amu, for example, reduces the number of possible m/z values to 9 × 10³.

Even though unmanageable in its raw form, the information contained in HRMS data is often necessary to differentiate between compounds with very similar masses. In a lipidomic study of placental cells by Gorrochategui et al., MCR-ALS was performed on binned LC-HRMS data. After the data were resolved at unit mass resolution, the authors examined the raw, high-resolution data at masses found to correlate to potential biomarkers. From the raw data, masses at 0.0001 amu precision were assigned.20 This approach can be problematic if chromatographically overlapped species contain spectral peaks which share the same mass at lower precision, because MCR-ALS would be unable to resolve these peaks in low precision data.13,21,22

An alternative approach to binning is the use of wavelet transforms; however, when MCR-ALS is to be used, the Haar wavelet must be selected to retain non-negativity in the data. The effect of the Haar wavelet transformation is a pair-wise averaging effect which is practically equivalent to the binning process, with a loss of resolution accompanying the compression.

Recently, Tauler et al. have published a new protocol outlining a different approach to LC-HRMS data analysis using MCR-ALS.23 Their approach defines “regions of interest” (ROI) prior to MCR-ALS analysis, allowing for data compression to take place.24 These regions of interest are chosen based on several parameters including a signal-to-noise threshold, which if set incorrectly may lead to the exclusion of compounds at low intensities, particularly if a low-level analyte is present in the vicinity of a much higher concentration analyte.

The work described in the current paper presents a new strategy which can analyze LC-HRMS data by finding relevant regions in the binned mass spectra using MCR-ALS and discarding all other masses. These regions are used for a second round of MCR-ALS at a 0.1 amu bin level. This process is repeated until data at 0.001 amu precision are analyzed. This allows for the resolution of compounds which overlap in the chromatographic mode and share masses at even 0.01 amu precision, while greatly reducing the size of the data. In contrast to the ROI approach, the current approach does not make use of any thresholds prior to MCR-ALS analysis, allowing for low level analytes to be captured without risk of mistakenly eliminating them due to incorrect thresholding.

THEORY

Multivariate Curve Resolution-Alternating Least Squares (MCR-ALS)

MCR-ALS is an iterative optimization method that mathematically resolves signals arising from chemical species and background without needing prior information about the spectra or concentrations of the compounds present in the sample.19,8,22 MCR-ALS can be viewed as a multicomponent Beer’s law relationship given as the equation below.

$$\mathbf{X} = \mathbf{C}\mathbf{S}^{\mathrm{T}} \qquad (1)$$

In this relationship, $\mathbf{X}$ is the raw second-order data resulting from an LC-MS run, $\mathbf{C}$ is a matrix consisting of vectors representing the pure chromatographic profiles, and $\mathbf{S}^{\mathrm{T}}$ is the corresponding matrix of pure mass spectra.3,8 An initial guess for either the spectral or chromatographic profiles allows for the solution of Eq. 1 for either $\mathbf{C}$ or $\mathbf{S}^{\mathrm{T}}$ using alternating least squares algorithms.4,25 Most often, spectral initial guesses are used, which can be obtained through methods such as SIMPLISMA (ACD/Labs, Toronto, Canada)26 or iterative orthogonal projection analysis (IOPA)27–29, both of which aim to extract the most dissimilar spectra from the raw data. In this work, IOPA was used. Briefly, IOPA extracts a user-defined number of spectra from the raw data that are the most orthogonal to one another.

Constraints

The defining step in MCR-ALS is the application of constraints to drive the solution towards the correct, chemically relevant answer for the pure, resolved components. Commonly employed constraints include nonnegativity, selectivity, unimodality (one maximum per component), closure (mass balance), smoothness30, component correspondence31, area correlation32 and hard modeling constraints8,9. Of the constraints listed above, nonnegativity and selectivity were applied in this procedure. Nonnegativity ensures that the chromatographic and mass spectral intensities do not go below zero, as negative values are not physically reasonable. Selectivity informs the algorithm of previously known values, such as regions of time known to contain no compounds or regions of the mass spectra known to contain no signals.25,33 These constraints were applied only to those components believed to correspond to true chemical species. Additional constraints beyond these provided no improvement to the resolution results.

Extended MCR-ALS

As shown in Eq. 1, MCR-ALS operates on second-order data (i.e., a matrix); however, MCR-ALS is also capable of analyzing multiway (e.g., multiple samples, multi-dimensional chromatography34,35, etc.) and multiset datasets (e.g., data fusion36). Making use of this higher order data can greatly reduce the amount of rotational ambiguity often associated with MCR-ALS.37 For example, we often want to include multiple samples, creating a multi-way array (i.e., a cube) of data. In order to analyze all samples simultaneously, augmentation of the third-order array into a second-order array is necessary. This process is illustrated in Figure 1. Essentially, every sample is concatenated along the time mode to create a single augmented time mode containing all samples, while conserving the spectral mode. When this augmented data matrix is analyzed with MCR-ALS, it is called extended MCR-ALS.25

Figure 1. Graphical representation of data rearrangement process for reshaping a third-order data array to a second-order data array.

STRATEGY

The overall strategy proposed for the analysis of LC-HRMS data is outlined in Figure 2. The approach includes selection of retention time windows, binning the spectra to unit mass, MCR-ALS analysis, followed by selection of relevant masses from this analysis and examining these at 10-fold higher mass precision. MCR-ALS is then carried out on these data, and the procedure is continued until analysis at the resolution precision of the instrument is achieved.

1. Selection of retention time windows
2. Binning mass spectra to unit mass
3. MCR-ALS analysis
4. Expansion of relevant masses to next m/z level and extraction of intensities from raw data
5. Repeat steps 3 and 4 until final desired m/z level is reached

Figure 2. Summary of strategy for analysis of LC-HRMS data by MCR-ALS.

1. Selection of retention time windows

The first step in most MCR-ALS analyses, including the present one, is to select windows along the time mode in order to minimize the complexity of the data submitted to the MCR-ALS analysis (i.e., number of compounds, data size). This is shown in Figure 3. Time windows are chosen such that they are the same across all samples so that they ideally contain the same analyte peaks in each sample. The following steps were applied to each window individually. If windows cannot be chosen such that no peaks are split across a window edge, overlapping windows can be chosen to ensure that each peak is completely contained in at least one window. This procedure of segmentation is inevitably used in MCR-ALS applications in chromatography because of the localization of the component information within narrow time windows.

Figure 3. The total ion chromatogram (TIC) (A) and contour plot (B) for an amphetamine chromatogram depict the partition of data in the time mode into the analyzed windows, differentiated by numbered boxes. The two peaks between 150-190 s were diastereomers and therefore are unable to be resolved via MCR-ALS because they have identical spectra.

2. Binning mass spectra to unit mass

As described in the previous section, analyzing the entire mass range in LC-HRMS data requires binning to decrease the number of data points, and therefore the size of the data, that must be analyzed. In this strategy, the data are initially binned to unit mass resolution. The binning procedure collects masses around a certain value, sums their intensities, and assigns the sum to that value. Because small molecules tend to have a mass defect of ~0.1 amu, an asymmetric binning process was utilized for the initial, unit mass binning procedure. For example, at unit mass, the intensities at all masses between 149.6000 and 150.5999 amu were summed and assigned to 150 amu. This process is often performed prior to MCR-ALS analysis of LC-HRMS data, eliminating the advantages of high-resolution data. The current strategy makes use of several binning steps at sequentially higher resolutions.

3. MCR-ALS analysis

The binned data are then augmented to include all samples, as described in the Theory section above, and an initial guess is obtained using IOPA to initiate MCR-ALS analysis. For the initial unit mass resolution analysis, the number of components is chosen based on the expected number of compounds plus one or two background components. Scree plots38 are often used as a starting point for determination of the number of components in MCR-ALS. For chromatographic data, however, scree plots are often difficult to interpret due to a lack of distinct “shoulders” or breaks in the graph which are used to estimate the number of components.28 In the case of untargeted analyses, several numbers of components should be tried and the most reasonable number of components (based on peaks split between components and overall realistic looking background/analyte profiles) should be chosen. Because the initial round of MCR-ALS should eliminate the majority of background ions, the number of components used in analyses at subsequent bin levels should be smaller than at unit mass resolution. In many cases, the number of components at these subsequent levels will equal the number of compounds present, without any background components, because of the elimination of the background masses in the initial round of MCR-ALS. The chromatographic profiles and the mass spectral profiles of each component are reviewed and those which appear to correspond to true chemical components are chosen for submission to the next step. Any components which are difficult to assign as a chemical component versus a background should be included in the next step to prevent falsely excluding true analyte peaks.

4. Expansion of relevant masses to next m/z level and extraction from raw data

To prepare for higher resolution analysis in subsequent steps, masses with intensities greater than a set threshold percentage


of the spectral peak intensity within each analyte’s resolved spectral profile are chosen as significant. In this work, a threshold of 5% of the maximum was used. It is important to note that the threshold is applied to each component individually, negating any possibility of excluding low-intensity compounds. Compounds at low intensities should resolve into their own component and their spectra will be normalized to the maximum intensity; thus, masses significant to those compounds should dominate their corresponding spectral profile regardless of their intensity in the raw, unresolved data. The significant masses are then expanded to the next bin level (e.g., from 0.1 to 0.01 amu) and signals corresponding to those masses are extracted from the raw data and subsequently binned to the new bin size, similar to the binning procedure in step 2. For example, if a component contains a significant mass at 163 amu, for the next step of the analysis, the masses selected for binning cover a range from 162.6-163.5 amu at 0.1 amu intervals. From the raw data, signals at masses 162.5500-162.6499 amu are binned to 162.6 amu, signals at 162.6500-162.7499 amu are binned to 162.7 amu, and so on. This process is portrayed graphically in Figure 4. Only data at these extracted and expanded masses (above the 5% threshold in this work) are analyzed in the next round, keeping the size of the overall data set to a level that is reasonable for analysis. This process eliminates masses that do not correspond to chemical compounds and therefore are irrelevant for analysis.

5. Repeat steps 3 and 4 until desired final m/z level is reached

MCR-ALS analysis (with IOPA initial guesses each time) and extraction of masses as described in steps 3 and 4 are repeated, stepping to a higher resolution level after each iteration. Once the data are expanded to the desired final resolution level as dictated by the instrumental data (e.g., 0.001 amu), the spectral peaks can then be used for compound identification, while the resolved chromatograms can be used for quantitative analysis and/or pattern recognition.
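The binning, thresholding, and expansion operations described in steps 2 and 4 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the in-house MATLAB program used in this work. The function names (`bin_unit_mass`, `bin_symmetric`, `significant_masses`, `expand_unit_mass`) and the toy peak list are hypothetical. Masses are handled internally as integers in units of 0.0001 amu, an implementation choice made here to avoid floating-point problems at bin edges. The MCR-ALS step itself is not computed: the binned intensities simply stand in for a resolved spectral profile.

```python
from collections import defaultdict

SCALE = 10_000  # integer units of 0.0001 amu (assumption made for this sketch)


def bin_unit_mass(peaks):
    """Asymmetric unit-mass binning: 149.6000-150.5999 amu -> 150 amu."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        k = round(mz * SCALE)
        binned[(k + 4_000) // SCALE] += intensity
    return dict(binned)


def bin_symmetric(peaks, step, keep=None):
    """Symmetric binning to `step` amu, e.g. 162.5500-162.6499 -> 162.6.

    If `keep` is given, only bins centered on those masses are retained.
    """
    step_k = round(step * SCALE)
    keep_k = None if keep is None else {round(m * SCALE) for m in keep}
    binned = defaultdict(float)
    for mz, intensity in peaks:
        center_k = ((round(mz * SCALE) + step_k // 2) // step_k) * step_k
        if keep_k is None or center_k in keep_k:
            binned[center_k / SCALE] += intensity
    return dict(binned)


def significant_masses(spectrum, threshold=0.05):
    """Masses whose intensity exceeds `threshold` times the spectrum maximum."""
    cutoff = threshold * max(spectrum.values())
    return sorted(m for m, i in spectrum.items() if i > cutoff)


def expand_unit_mass(mass):
    """Expand an integer unit mass to ten 0.1 amu bins, e.g. 163 -> 162.6 ... 163.5."""
    start_k = mass * SCALE - 4_000
    return [(start_k + j * 1_000) / SCALE for j in range(10)]


# Toy (m/z, intensity) peak list; the values are invented for illustration.
raw = [(162.62, 10.0), (162.64, 5.0), (163.07, 200.0), (163.12, 50.0), (150.30, 1.0)]

unit_spectrum = bin_unit_mass(raw)            # stand-in for a resolved profile
kept = significant_masses(unit_spectrum)      # 5% threshold, per component
next_masses = [m for mass in kept for m in expand_unit_mass(mass)]
next_level = bin_symmetric(raw, 0.1, keep=next_masses)

print(unit_spectrum)  # {163: 265.0, 150: 1.0}
print(kept)           # [163]
print(next_level)     # {162.6: 15.0, 163.1: 250.0}
```

In this toy case the signal at 150 amu falls below 5% of the maximum and is discarded, so only the ten 0.1 amu bins around 163 amu are carried into the next round. In the actual strategy the threshold would be applied separately to each resolved component rather than to a single binned spectrum.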

The application of this general strategy is demonstrated in the following sections. Examples of both a targeted analysis of amphetamines and an untargeted analysis of bacterial lipid samples are shown.

Figure 4. From the binned data, MCR-ALS extracts relevant masses. These masses are then expanded to a higher resolution level and MCR-ALS again extracts relevant masses at this resolution level. The blue and red boxes represent masses in the true chemical components and the gray boxes represent background or masses with insignificant intensities.

EXPERIMENTAL

Software

All programs were written in-house using MATLAB (Mathworks, Inc., Natick, MA) version R2013a on a Dell Precision T3600 with an Intel Xeon E5-1620 CPU at 3.60 GHz and 32.0 GB of RAM and version R2015b on a Dell Optiplex 9020 with an Intel Core i7-4790 CPU at 3.60 GHz and 32.0 GB of RAM. Both systems were running Windows 7 Enterprise. Most of the calculations were run on the latter computer while the former was used for data translation and program development. The Bioinformatics Toolbox by Mathworks was used for importing data into MATLAB.

Data Collection

Two datasets were analyzed with the strategy described above to demonstrate its applicability to both targeted analyses and untargeted, discovery type analyses. Both sample sets were analyzed with a Shimadzu LC system (Nexera series, Kyoto, Japan) coupled to an AB Sciex TripleTOF 5600 mass spectrometer (Concord, Ontario, Canada). The chromatographic conditions for each sample set are listed in their respective sections below. The data were collected in profile mode, rather than being centroided. Data were converted from AB Sciex .wiff files to mzXML files using msConvert, which is contained in the ProteoWizard suite.9 These data were then imported into MATLAB using the Bioinformatics Toolbox mzxmlread function and peak lists were extracted. These were then ready to be processed using the in-house binning program.

Amphetamine Samples

Amphetamine standards were purchased from Grace Discovery Services (Columbia, Maryland). The names, abbreviations, and structures of the amphetamines used are listed in Figure 5. The compounds were divided into three groups. For each group, four calibration mixtures and two test mixtures were created. The concentrations and further sample information are given in supplemental information Table S1. The data from all three groups were analyzed as a single dataset. This analysis represents an ideal analytical experiment and is used to demonstrate the feasibility of our strategy. For the amphetamine analysis, an Accucore C18 column (2.1 × 100 mm, 2.6 µm; Thermo Scientific, Waltham, MA) was used. Acetonitrile was used as mobile phase B and 10 mM formic acid was used as mobile phase A. Gradient elution was used starting at 2.5% B increasing to 35% B over 10 min.

Figure 5. Structures and abbreviations of amphetamines contained in the amphetamine standard solutions analyzed.

Bacterial Lipid Analysis

To demonstrate the utility of our strategy for complex analyses, five replicates from three different strains of bacteria were analyzed. To prepare for analysis, the samples were freeze-thawed three times in 200 µL phosphate buffered saline followed by probe sonication. Then, 1 mL of methanol was added followed by bath sonication. Another round of sonication was performed with 0.5 mL of chloroform. After a 2 h incubation at 48 °C, 1 mL of chloroform and 3 mL of water were added followed by vortexing and centrifugation. The organic layer was extracted. A second extraction was performed with an additional 2 mL of chloroform. The organic extract was dried via vacuum centrifugation and resuspended in 100 µL of methanol for analysis.

These samples were analyzed using an Acquity HSS T3 C18 column (2.1 × 150 mm, 1.8 µm; Waters, Milford, MA) at 55 °C. Gradient elution was used with mobile phase A consisting of 60:40 water:methanol with 10 mM ammonium formate and 0.1% formic acid and mobile phase B consisting of 90:10 isopropanol:acetonitrile with 10 mM ammonium formate and 0.1% formic acid. The analysis began with 100% A and increased to 100% B from 1-21 min and held at 100% B for 4 min.

MCR-ALS analysis

For this work, spectral initial estimates were obtained using IOPA to initiate MCR-ALS. This was performed using an in-house MATLAB program. MCR-ALS was also performed using an in-house MATLAB program, which was based on previously described programs by Allen and Rutan28 and Bezemer and Rutan.39 The raw three-way LC-MS data, including the sample mode, are input along with the initial estimates of the component spectra. The sample augmentation is performed within the program. The maximum number of iterations and the convergence criterion are defined and the constraints are set for the selected components. For the current work, non-negativity was used in both chromatographic and spectral modes while selectivity was used in the chromatographic mode. This selectivity set certain regions in the chromatographic profiles to zero intensity. Specifically, this was used at the edges of the retention windows where no peaks were present to ensure resolution of background signals. A smoothing constraint based on Eilers’ perfect smoother40 was investigated for use in the chromatographic mode, but showed no obvious improvement for these data. An important consideration when using MCR-ALS is the degree of rotational ambiguity present in the results. For this work, the application of constraints along with the selectivity provided by mass spectrometry minimized rotational ambiguity. This is supported by the observation that using additional constraints provided no significant differences in the resolved analyte profiles.

RESULTS AND DISCUSSION

Known Amphetamine Data

To demonstrate the ability of the proposed algorithm to analyze LC-HRMS data with MCR-ALS, an analysis of amphetamines (Figure 3) was carried out. As shown in Figure 3, the peaks between 150-190 s were not selected for analysis. This was due to these analytes, ephedrine and pseudoephedrine, being diastereomers with identical mass spectra. MCR-ALS requires different spectra to be able to resolve analyte signals. Otherwise, all of the windows shown in Figure 3 were analyzed using the methods described here; however, this discussion primarily focuses on the analysis of the second window.

First, the data were binned to unit mass and MCR-ALS was performed. Two components were observed to contain realistic chromatographic peak shapes. This agreed with the two known compounds in this retention time window, PEA and PPA. The resolved mass spectral profiles are shown in Figure 6A. Three masses were found above the 5% intensity threshold in this window for both of these components. These masses are listed in Table 1. These masses were expanded as described in the Strategy section and, at the 0.1 amu bin level, three masses were again determined to be significant using a 5% intensity threshold. The masses at this level are more precise than those at the unit mass bin level. This process was repeated until the 0.001 amu bin level was reached. It is important to note that at the unit mass and 0.1 amu bin levels each of the masses was represented by a single data point (i.e., a spike) in the spectrum, whereas at the 0.01 and 0.001 amu bin levels, the peaks are represented by several data points creating a spectral peak shape; thus, more masses were found to be significant. At the final level of resolution in Table 1, only the masses corresponding to the maximum intensity of each peak are listed.

Because the m/z axis was irregular with intervals at approximately 0.0015 amu, at the final level of binning (0.001 amu), some bins contained no m/z values. In several instances, this caused spectral peaks to contain false zero intensities, as indicated by a discontinuous peak. To account for this, a cubic spline interpolation was performed subsequent to binning but prior to MCR-ALS analysis to ensure a continuous m/z axis for all spectra in all samples for visualization purposes. The resolved chromatographic profiles for the two compounds at the final bin level are shown in Figure 7, with the statistics for the final MCR-ALS analysis and calibrations given in Table S2. The non-Gaussian and jagged characteristics of these peaks are caused by injection solvent mismatch41,42 and natural fluctuations in electrospray ionization sampling (especially for the highly aqueous mobile phase used for the elution of the amphetamines), respectively. These characteristics are also seen in the


Figure 6. Resolved mass spectral profiles for components in window 2 at bin levels of 1 amu (A) and 0.001 amu (B). Below each spectral plot the masses corresponding to components 1 (PEA; blue) and 2 (PPA; red) are listed.

raw data indicating they are not artifacts of MCR-ALS processing. To determine the quantitative performance of this methodology, peak areas were calculated by integrating over the entire resolved chromatographic profiles, shown in Figure 7, and calibration curves were constructed. Concentrations were predicted for each compound in the two test mixtures. The percent errors in the predictions for the test unknowns are listed in Table 2. These errors are all less than 20 %, which is comparable to the expected precision of LC-HRMS with no internal standard.

Table 1. All extracted masses for compounds resolved within window 2 collected above the 5 % intensity threshold at each bin level.

Bin Level   PPA                                               PEA
Unit        134, 135, 152                                     105, 106, 122
0.1         134.1, 135.1, 152.1                               105.1, 106.1, 122.1
0.01        134.09, 134.10, 134.11, 135.10, 152.10, 152.11    105.07, 105.08, 106.07, 122.09, 122.10
0.001*      134.096, 135.099, 152.106                         105.070, 106.074, 122.096

*Masses at this level represent the maxima of the spectral peaks.

Figure 7. Resolved chromatographic profiles (overlaid samples) for PEA and PPA at the 0.001 amu bin level.

Table 2. Percent errors in the known concentrations of test samples from the final bin level of 0.001 m/z obtained from this method.

Compound   Test sample 1   Test sample 2
PE         0.75 %          15 %
PPA        12 %            -3.0 %
PEA        9.6 %           -*
MDA        10 %            16 %
Phent      1.1 %           11 %
MDE        15 %            4.5 %
Mamp       17 %            11 %
MDMA       2.6 %           9.3 %
Amp        1.8 %           -*
Moxy       6.3 %           -*
PMMA       4.8 %           -*
MTA        -0.93 %         -3.4 %
mCPP       12 %            -*
Bromo      2.6 %           4.5 %
BP         2.0 %           11 %
Traz       -1.6 %          -*

*Blank cells are due to the absence of these compounds from test sample 2.

Unknown Bacterial Lipid Data

In order to demonstrate the utility of this strategy for untargeted analyses of complex samples, it was applied to a bacterial lipid dataset, shown in Figure 8. For brevity, the results of a single time window (Figure 8 inset) are reported here. The complexity of the data is evident in both the TIC and the contour plots of the data, with crowded and overlapping features as well as background attributes, as shown in Figure 8. Analysis with our strategy allowed for the resolution of four analyte signals with their respective spectral and chromatographic profiles shown in Figure 9. As described in the above amphetamine results, spline interpolation was employed prior to the MCR-ALS analysis at the final bin level. The final MCR-ALS analysis resulted in the compound represented in component 1 being split across two components (not shown here). This was clear from the presence of nearly identical mass spectra (r² > 0.99). When added together, the reconstructed component showed a chromatographic peak shape as seen in Figure 9, row 1. This is likely to have been caused by a slight shift in the mass spectra across the duration of the chromatographic peak, causing MCR-ALS to resolve the single analyte into two components.

Figure 8. (A) This panel shows the TICs for the total bacterial lipid dataset. The inset shows the window chosen for analysis. (B) The contour plot shows the LC-HRMS data for the window selected. It can be seen that background ions are present in addition to the analytes of interest.

The resolved spectral profiles from the final round of analysis allowed recovery of precise analyte masses, which can aid in compound identification. As seen in Figure 9, the resolved spectral profiles in components 3 and 4 showed many significant masses. This is due to the relatively low intensities of these analyte signals as seen in their corresponding chromatographic profiles (maximum intensities of 300-400 ion counts, as opposed to intensities of 8000 and 2500 counts for components 1 and 2, respectively). This reduces the signal-to-(residual) background ratio of the masses corresponding to the compounds. Because the spectral profiles are normalized, the residual background ions are intensified in the spectral profiles. Table 3 lists masses believed to be significant in the spectral profiles of all four components. Also listed in Table 3 are potential molecular formulas assigned using the LIPID MAPS structure search.43 It is interesting to note that in each case an isotope peak was recovered, and in the case of compound 4, two isotope peaks were recovered. The resolved chromatographic profiles allowed for the determination of retention times as well as relative quantitation between the bacterial strains. While not performed in this work, these resolved components can also be submitted to pattern recognition algorithms such as principal components analysis (PCA) to aid in distinguishing differences between the bacterial strains.

Table 3. Found masses in the bacterial dataset and their associated molecular formulas.

Peak   Average Retention Time (s)   Associated Masses (m/z)              Corresponding Molecular Formula (within ± 0.005 m/z tolerance)*
1      588                          211.168, 212.172                     C13H23O2
2      589                          199.169, 200.173                     C12H23O2
3      592                          293.210, 294.213, 447.130, 448.135   C18H29O3, C22H23O10
4      595                          313.214, 314.219, 315.193            C21H29O2

*Calculated from LIPID MAPS structure search43

Figure 9. Resolved mass spectral profiles (left column) and chromatographic profiles (right column) of the found components, labeled 1-4, of the analyzed window at the final bin level. Background components were found in the analysis, but are not shown here for clarity. Component 1 required combination of two components as described in the text.

CONCLUSIONS

A novel method for resolving analyte signals in LC-HRMS data, while conserving the information in those data, was developed. In the targeted amphetamine analysis, all known amphetamine components in each specified window were recovered using MCR-ALS at every resolution level from unit mass to 0.001 amu, allowing for the facile quantitation of each compound. The utility of this procedure is clearly demonstrated by the finding that the final MCR-ALS step was completed with as little as 0.55 % of the original high resolution data, with all relevant high resolution information being preserved for total analysis of amphetamine data. The application of this procedure to unknown, discovery type analyses was also demonstrated through the analysis of a bacterial lipid dataset. The samples analyzed here were chosen to demonstrate the feasibility of the proposed strategy; however, this general strategy can easily be applied to many types of analyses utilizing LC-HRMS. While a comprehensive comparison between the strategy outlined in this paper and the ROI approach described previously23 was not performed, a brief comparison was performed in which prediction errors were similar to those found in Table 2. In order to make a meaningful comparison, a more comprehensive study should be undertaken with varying sample conditions. It is our belief that the present strategy will be more useful in cases of signals with low signal-to-noise. This is due to the ROI approach requiring a threshold prior to MCR-ALS analysis, whereas the present strategy only requires a threshold that is relative to each resolved compound’s most intense spectral peak. The present strategy also includes fewer tunable parameters, which may allow for more robust operation despite requiring significant user interaction. In its current form, this strategy took up to several minutes per window for complete analysis, including loading data, with user interaction throughout. While it was not a specific goal of this work, several steps of this work would lend themselves well to automation requiring minimal input by the user. Further optimization of the code may also provide significant reduction in analysis time. These further refinements will allow this strategy to be easily implemented by analysts with limited chemometrics training.
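The relative threshold mentioned above can be illustrated with a short NumPy sketch. It assumes the threshold is applied as a fraction (5 %, as in Table 1) of the most intense peak in each resolved spectrum; the function name `significant_masses` and the intensities in the example are hypothetical and are not taken from the data in this work.

```python
import numpy as np

def significant_masses(mz, S, rel_threshold=0.05):
    """For each resolved component (column of S), return the m/z values whose
    intensity exceeds rel_threshold times that component's largest peak."""
    mz = np.asarray(mz, dtype=float)
    S = np.asarray(S, dtype=float)
    kept = []
    for k in range(S.shape[1]):
        spectrum = S[:, k]
        cutoff = rel_threshold * spectrum.max()   # cutoff scales per component
        kept.append(mz[spectrum > cutoff])
    return kept

# Made-up resolved spectra for two components on a unit-mass axis.
mz = np.array([105, 106, 122, 134, 135, 152])
S = np.array([[1.00, 0.01],
              [0.09, 0.00],
              [0.40, 0.02],
              [0.01, 1.00],
              [0.00, 0.10],
              [0.03, 0.30]])
masses = significant_masses(mz, S)

# Scaling one component down by a factor of 100 does not change which of its
# masses are kept, because the cutoff is relative to its own maximum.
S_weak = S * np.array([1.0, 0.01])
masses_weak = significant_masses(mz, S_weak)
```

Because the cutoff is computed separately for each component, a low-intensity compound retains its characteristic masses, whereas a single absolute cutoff applied before resolution could discard them.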


Acknowledgements


The authors would like to thank the Lipidomics/Metabolomics Core Facility at Virginia Commonwealth University for the LCMS analysis of amphetamines used in this work. The authors acknowledge financial support from NSF CHE-1507332. DWC is supported by an Altria Graduate Student Fellowship.


Author Information

Corresponding Authors
*Email: [email protected]

The authors declare no competing financial interest.


Supporting Information

Table S1. Amphetamine sample table; Table S2. Calibration and fit statistics for MCR-ALS.


References

(1) Wei, X.; Shi, X.; Kim, S.; Zhang, L.; Patrick, J. S.; Binkley, J.; McClain, C.; Zhang, X. Anal. Chem. 2012, 84 (18), 7963–7971.
(2) Tautenhahn, R.; Böttcher, C.; Neumann, S. BMC Bioinformatics 2008, 9, 504.
(3) Danielsson, R.; Bylund, D.; Markides, K. E. Anal. Chim. Acta 2002, 454 (2), 167–184.
(4) Cook, D. W.; Rutan, S. C. J. Chemom. 2014, 28 (9), 681–687.
(5) Smith, C. A.; Want, E. J.; O’Maille, G.; Abagyan, R.; Siuzdak, G. Anal. Chem. 2006, 78 (3), 779–787.
(6) Tsugawa, H.; Cajka, T.; Kind, T.; Ma, Y.; Higgins, B.; Ikeda, K.; Kanazawa, M.; VanderGheynst, J.; Fiehn, O.; Arita, M. Nat. Methods 2015, 12 (6), 523–526.
(7) Tauler, R. Chemom. Intell. Lab. Syst. 1995, 30 (1), 133–146.
(8) Rutan, S. C.; de Juan, A.; Tauler, R. In Comprehensive Chemometrics; Brown, S. D., Tauler, R., Walczak, B., Eds.; Elsevier, 2009; Vol. 2, pp 249–259.
(9) de Juan, A.; Jaumot, J.; Tauler, R. Anal. Methods 2014, 6 (14), 4964.
(10) Peré-Trepat, E.; Hildebrandt, A.; Barceló, D.; Lacorte, S.; Tauler, R. Chemom. Intell. Lab. Syst. 2004, 74 (2), 293–303.
(11) Gargallo, R.; Tauler, R.; Cuesta-Sánchez, F.; Massart, D. L. TrAC - Trends Anal. Chem. 1996, 15 (7), 279–286.
(12) Bortolato, S. A.; Olivieri, A. C. Anal. Chim. Acta 2014, 842, 11–19.
(13) Peré-Trepat, E.; Tauler, R. J. Chromatogr. A 2006, 1131 (1–2), 85–96.
(14) Pérez, R. L.; Escandar, G. M. Anal. Chim. Acta 2014, 835, 19–28.
(15) De Llanos, A. M.; De Zan, M. M.; Culzoni, M. J.; Espinosa-Mansilla, A.; Cañada-Cañada, F.; De La Peña, A. M.; Goicoechea, H. C. Anal. Bioanal. Chem. 2011, 399 (6), 2123–2135.
(16) Bortolato, S. A.; Arancibia, J. A.; Escandar, G. M. Anal. Chem. 2009, 81 (19), 8074–8084.
(17) Dantas, C.; Tauler, R.; Ferreira, M. M. C. Anal. Bioanal. Chem. 2013, 405 (4), 1293–1302.
(18) Peré-Trepat, E.; Lacorte, S.; Tauler, R. J. Chromatogr. A 2005, 1096 (1–2), 111–122.
(19) Navarro-Reig, M.; Jaumot, J.; García-Reiriz, A.; Tauler, R. Anal. Bioanal. Chem. 2015.
(20) Gorrochategui, E.; Porte, C.; Lacorte, S.; Tauler, R.; Casas, J.; Porte, C.; Lacorte, S.; Tauler, R. Anal. Chim. Acta 2015, 854, 20–33.
(21) Peré-Trepat, E.; Lacorte, S.; Tauler, R. Anal. Chim. Acta 2007, 595 (1–2), 228–237.
(22) Sánchez Pérez, I.; Culzoni, M. J.; Siano, G. G.; Gil García, M. D.; Goicoechea, H. C.; Martínez Galera, M. Anal. Chem. 2009, 81 (20), 8335–8346.
(23) Tauler, R.; Gorrochategui, E.; Jaumot, J.; Tauler, R. Protoc. Exch. 2015, http://dx.doi.org/10.1038/protex.2015.102.
(24) Bedia, C.; Tauler, R.; Jaumot, J. J. Chemom. 2016, 1–14.
(25) Tauler, R.; Maeder, M.; de Juan, A. In Comprehensive Chemometrics; Brown, S. D., Tauler, R., Walczak, B., Eds.; Elsevier, 2009; Vol. 2, pp 473–505.
(26) Sánchez, F. C.; Massart, D. L. Anal. Chim. Acta 1994, 298 (3), 331–339.
(27) Cook, D. W.; Rutan, S. C.; Stoll, D. R.; Carr, P. W. Anal. Chim. Acta 2014, 859, 87–95.
(28) Allen, R.; Rutan, S. Anal. Chim. Acta 2012, 723, 7–17.
(29) Sánchez, F. C.; Toft, J.; van den Bogaert, B.; Massart, D. L.; Sanchez, F. Anal. Chem. 1996, 68 (1), 79–85.
(30) Hugelier, S.; Devos, O.; Ruckebusch, C. J. Chemom. 2015, 29, 448–456.
(31) Parastar, H.; Radović, J. R.; Bayona, J. M.; Tauler, R. Anal. Bioanal. Chem. 2013, 405 (19), 6235–6249.
(32) Neves, A. C. de O.; Tauler, R.; de Lima, K. M. G. Anal. Chim. Acta 2016, 937, 21–28.
(33) van Stokkum, I. H. M.; Mullen, K. M.; Mihaleva, V. V. Chemom. Intell. Lab. Syst. 2009, 95 (2), 150–163.
(34) Porter, S. E. G.; Stoll, D. R.; Rutan, S. C.; Carr, P. W.; Cohen, J. D. Anal. Chem. 2006, 78 (15), 5559–5569.
(35) Omar, J.; Olivares, M.; Amigo, J. M.; Etxebarria, N. Talanta 2014, 121, 273–280.
(36) Mas, S.; Tauler, R.; de Juan, A. J. Chromatogr. A 2011, 1218 (51), 9260–9268.
(37) Golshan, A.; Abdollahi, H.; Beyramysoltan, S.; Maeder, M.; Neymeyr, K.; Rajkó, R.; Sawall, M.; Tauler, R. Anal. Chim. Acta 2016.
(38) Otto, M. Chemometrics, 2nd ed.; Wiley-VCH, 2007.
(39) Bezemer, E.; Rutan, S. C. S. Chemom. Intell. Lab. Syst. 2006, 81 (1), 82–93.
(40) Eilers, P. H. C. Anal. Chem. 2003, 75 (14), 3631–3636.
(41) Hsu, S.-H.; Raglione, T.; Tomellini, S. A.; Floyd, T. R.; Sagliano, N.; Hartwick, R. A. J. Chromatogr. A 1986, 367, 293–300.
(42) Jeong, L. N.; Sajulga, R.; Forte, S. G.; Stoll, D. R.; Rutan, S. C. J. Chromatogr. A 2016, 1457, 41–49.
(43) Sud, M.; Fahy, E.; Cotter, D.; Brown, A.; Dennis, E. A.; Glass, C. K.; Merrill, A. H.; Murphy, R. C.; Raetz, C. R. H.; Russell, D. W.; Subramaniam, S. Nucleic Acids Res. 2007, 35, D527–D532.


For TOC graphic only
