Global Spectral Deconvolution Based on Non-Negative Matrix

Jan 8, 2015 - E-mail: [email protected]. ... Post-modified non-negative matrix factorization for deconvoluting the gene expression profiles of ...
0 downloads 15 Views 746KB Size
Subscriber access provided by ERCIYES UNIV

Article

Global spectral deconvolution based on nonnegative matrix factorization in GC × GC–HRTOFMS Yasuyuki Zushi, Shunji Hashimoto, and Kiyoshi Tanabe Anal. Chem., Just Accepted Manuscript • Publication Date (Web): 08 Jan 2015 Downloaded from http://pubs.acs.org on January 9, 2015

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Analytical Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 11

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

Global spectral deconvolution based on non-negative matrix factorization in GC × GC–HRTOFMS Yasuyuki Zushi*, Shunji Hashimoto and Kiyoshi Tanabe Center for Environmental Measurement and Analysis, National Institute for Environmental Studies, Tsukuba, Japan ABSTRACT: A global spectral deconvolution, based on non-negative matrix factorization (NMF) in comprehensive twodimensional gas chromatography high-resolution time-of-flight mass spectrometry, was developed. We evaluated the ability of various instrumental parameters and NMF settings to derive high-performance detection in non-target screening using a sediment sample. To evaluate the performance of the process, a NIST library search was used to identify the deconvoluted spectra. Differences of the instrumental scan rates (25 Hz and 50 Hz) in deconvolution were evaluated and results show that a high scan rate enhanced the number of compounds detected in the sediment sample. A higher mass resolution in the range of 1,000 to 10,000 and a higher m/z precision in the deconvolution were needed to obtain an accurate mass database. After removal of multiple duplicate hits, which occurred in batch processes of NIST library search on the deconvolution result, 62 unique assignable spectra with a match factor ≥ 900 were obtained in the deconvoluted chromatogram from the sediment sample, including 54 spectra that were refined by the deconvolution. This method will help to detect and build up well-resolved reference spectra from various complex mixtures, and will accelerate non-target screening.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34

35 36 37 The identification and quantification of substances in com38 plex mixtures have been important issues in forensics, food 39 and flavor analysis, development and chemical management of 40 industrial and medical products, and environmental research 41 for many years. A recently developed, powerful analytical instrument, the comprehensive two-dimensional gas chroma- 42 tography high-resolution time-of-flight mass spectrometer 43 (GC × GC–HRTOFMS), has great potential to separate com- 44 plex mixtures into their two capillary columns with different 45 polarities and detect separated components with a high mass 46 resolution1. This advanced analytical technique is also able to 47 quantify each compound2, and to characterize chromatograph- 48 ic peaks3,4; therefore, this technology has the potential to be 49 used widely in various analytical fields. However, the instru- 50 ment detects all ions within a certain mass range in the matrix- 51 enriched samples during full scan mode analysis, resulting in 52 unresolved peaks from noise and co-eluted components, even 53 in the two-dimensional chromatogram5,6. Extracting specific 54 compounds from the total ion chromatogram based on target 55 accurate mass records allows us to achieve target screening, 56 even in unresolved chromatograms7-9. However, this process is 57 not similarly beneficial in non-target screening that tries to 58 identify unknown chromatographic peaks. This is because the 59 mass spectra that overlap because of noise or co-eluted com- 60 ponents inhibit the identification of the original compounds 61 from such disturbed data acquired in the non-target screening. 62 Furthermore, the situation will be more problematic for auto- 63 64 mating the data screening in samples of complex mixtures. 65 One way to solve this drawback in non-target screening is to 66 apply signal deconvolution methods. To date, several studies 67 have used deconvolution based on multivariate analysis for 68 GC–MS10, and LC-diode array detection11,12. Several peak 69 Introduction

model-based deconvolution methods have also been developed13,14. There are many types of deconvolution, such as intensity threshold cut-off signals, baseline correction for noise subtraction, and resolution of overlapping peaks. In this paper, deconvolution is defined as a method for resolving overlapping peaks that reconstructs several separate components from signals of mixed components, including noise. A tensor factorization method called parallel factor analysis (PARAFAC)15, which solves three-way arrays of acquired data (i.e., mass spectra for each data point of a chromatographic line with different retention in GC1) by alternating least squares, has been applied in GC × GC–MS to obtain resolved mass spectra16. One of the drawbacks of the tensor factorization is that it requires consistent variation in the focus component in different arrays. Retention time (RT) shifts of each peak in GC × GC, which are caused by correlation of the oven temperature control by a single oven, affect the spectral deconvolution results. This issue can be avoided by being very careful with the RT shift by PARAFAC217. Another drawback of the tensor factorization PARAFAC is that the outputs consist of a mix of positive and negative values as a result of a simple iterative calculation process. This requires an additional step, such as simple baseline correction subtraction, to avoid negative values16. Spectrum bias by subtraction is a concern for small peaks. There are several methods for general factorization (but not tensor factorization), such as non-negative matrix factorization (NMF)18 and independent component analyses (ICAs)19, that apply non-negative constraints in their iterative calculation processes (not all of ICA methods include the non-negative constraints). ICA was mostly developed in acoustics to separate sources of sound20,21. As ICA has developed, several functions including non-negative constraint have been added to ICA, resulting in the diffusion of several software applications or program codes to perform ICA, such as, among others, fast ICA algorithms, JADE algorithms, and

ACS Paragon Plus Environment

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62

extended Infomax algorithms22,23. NMF has been developed in 63 image analysis to extract common features in pictures, such as 64 eyes, nose, and ears in human faces18,24. NMF is built on the 65 intuitive concept that signals in the real world are basically the 66 addition of positive values. Therefore, it is becoming more 67 popular and feasible to realistically separate overlapping sig- 68 nals in various study fields25-27. The release of the NMF appli- 69 cation for the free statistical software R28 has also increased its 70 popularity. 71 To date, there have been various studies that apply NMF to 72 GC–MS data29,30. Several of them are to do with the develop- 73 ment of tensor factorization with non-negative constraints, so74 called three-dimensional non-negative matrix approxima30,31 tion . This has not been applied to GC × GC–MS data such 75 as the study applying PARAFAC15, but it has been applied to 76 several already-analyzed GC–MS datasets to construct three- 77 way arrays (MS spectra, RTs, analytical samples) 31. These 78 studies focused on just one or a few intense mixed signals to 79 introduce the deconvolution methods, therefore their perfor- 80 mance in various patterns of signal mixing is still unclear. 81 The automation of deconvolution so that it can be applied to 82 an entire chromatogram, namely global spectral deconvolution 83 method, also remains a challenge. The Automated Mass Spec- 84 tral Deconvolution and Identification System (AMDIS), which 85 was developed for deconvolution of GC–MS mass spectra, 86 identifies the shape of each peak in mixed signals by individu- 87 ally calculating the derivation of each recorded ion13. AMDIS 88 includes a function to select chromatogram peaks and then 89 automatically deconvolutes all the selected peaks (i.e., peak 90 model-based deconvolution). The peak selection method and 91 approach for calculating the derivation are not directly appli- 92 cable to two-dimensional peaks. Hoggard et al. challenged the 93 application of automated peak deconvolution in GC × GC–MS 94 using PARAFAC32. In their study, peak selection was not ap- 95 plied, but regular rectangular divisions of the entire chromato- 96 gram were used. The use of regular rectangles includes several 97 resolved peaks. Thus, the issue of needing a calculation pro- 98 cess to decide the appropriate number of factors (the number 99 of compounds in the region) remains in the application, and100 results in increased calculation cost. They also discussed that101 the RT shift needs to be modified. 102 The development of high-resolution mass spectrometry103 (HRMS) reinforces the detection power of analytes both in104 target and non-target analysis because of their high-resolution105 mass spectrum analysis. However, to date, the deconvolution106 method based on multivariate analysis for HRMS has not been107 studied. One of the reasons for this is that HRMS has been108 developed recently, therefore the HRMS peak deconvolution109 technique is not yet well known. Another reason is that the110 accurate mass obtained by HRMS will dramatically increase111 the size of the measured data and factorization of the data matrix. It leads to an unrealistic calculation burden for the decon-112 volution of HRMS data in older laptop or notebook computers.113 Therefore, there have been no studies on these methods for114 115 HRMS data. In this study, we developed a spectral deconvolution method116 based on NMF for two-dimensional chromatograms of accu-117 rate masses (i.e., GC × GC–HRTOFMS). We developed an118 automated process for the deconvolution of the entire chroma-119 togram (global spectral deconvolution) in free software and120 evaluated its performance in various instrumental mass resolu121 tions, ion scan rates, and also in various NMF calculation sce-

Page 2 of 11

narios. The research goal was to provide a practical solution for performing NMF-based spectral deconvolutions in real-life samples of complex mixtures by GC × GC–HRTOFMS, and demonstrate their ability to enhance the qualitative detection of non-target analytes. This procedure is expected to enhance the performance of non-target screening. In addition, it will be useful to store the accurate mass spectra from actual samples of complex mixtures, as it will enhance post-screening of the target compound in a later process. Methods Experimental data A 10 mL aliquot of extract in toluene from sea sediment (5 g dry weight), collected in a small Japanese harbor on the Pacific Ocean coast, was analyzed using a 7890GC (Agilent Technologies, Santa Clara, CA, USA) with a KT2006 GC × GC system (Zoex, Houston, TX, USA) coupled with a JMS– T100GCV 4G-N HRTOFMS (JEOL, Tokyo, Japan). The first GC column was an InertCap 5MS/Sil (45 m × 0.25 mm I.D., 0.1 µm film thickness, GL Sciences, Tokyo, Japan) and the second GC column was a SGE BPX–50 (1 m × 0.1 mm I.D., 0.1 µm film thickness, Sigma-Aldrich (St. Louis, MO, USA). The injection volume was 1 µL in splitless mode at 246 kPa (70 °C), with He as the carrier gas. The GC oven temperature was held at 70 °C for 1 min, ramped to 180 °C at a rate of 50 °C min-1, then ramped to 230 °C at 3 °C min-1, then ramped again to 300 °C at 5 °C min-1, and then was held at 300 °C for 16.133 min; the total time for this process was 50 min. The modulation period during the analysis was 4 s (releasing: 0.25 s) to split peaks in the first dimension of the GC. The ionization voltage for electron impact (EI) was set at 45 eV for the HRTOFMS detector. The ionization voltage generally does not change the fragment patterns from the typical setting of 70 eV33. The voltage of the microchannel plate detector was set at 2000 V. To evaluate the effect of changing the parameters on the spectral deconvolution, the sample was analyzed six times using a mass range between 35 and 600 m/z, different mass resolutions of 1,000, 5,000, and 10,000 by manual instrumental parameter tuning, and different scan rates of 25 Hz and 50 Hz. Measurement data of a matrix-rich sample of air-dried lake sediment CRM from Lake Ontario (WMS-01, Wellington Laboratories) were processed at a mass resolution of 5000 and a scan rate of 25 Hz to allow partial evaluation of the spectral deconvolution. The details of the analytical conditions are found elsewhere8. Theory of NMF The NMF estimates the basis matrix W (n × r non-negative matrices) and the coefficient matrix H (r × p non-negative matrices) for the original matrix Y (n × p non-negative matrix) so that the sum of the loss function D (Y, WH) and the regularization function R (W, H) are minimized28. For practical purposes, the factorization rank r is r