AQuA: An Automated Quantification Algorithm for High-Throughput


Article

AQuA – an automated quantification algorithm for high-throughput NMR-based metabolomics and its application in human plasma Hanna Eriksson Röhnisch, Jan Eriksson, Elisabeth Müllner, Peter Agback, Corine Sandström, and Ali A. Moazzami Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.7b04324 • Publication Date (Web): 20 Dec 2017 Downloaded from http://pubs.acs.org on December 22, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Analytical Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Analytical Chemistry

AQuA – an automated quantification algorithm for high-throughput NMR-based metabolomics and its application in human plasma

Hanna E. Röhnisch¹*, Jan Eriksson¹, Elisabeth Müllner¹, Peter Agback¹, Corine Sandström¹, Ali A. Moazzami¹*

¹ Department of Molecular Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden

ABSTRACT: A key bottleneck in high-throughput NMR-based metabolomics is the lack of rapid and accurate tools for the absolute quantification of many metabolites. We developed, implemented, and evaluated an algorithm, AQuA (Automated Quantification Algorithm), for targeted metabolite quantification from complex 1H-NMR spectra. AQuA operates on spectral data extracted from a library consisting of one standard calibration spectrum for each metabolite. It uses one pre-selected NMR signal per metabolite to determine absolute concentrations and effectively accounts for interferences caused by other metabolites. AQuA was implemented and evaluated using experimental NMR spectra from human plasma. The accuracy of AQuA was tested and confirmed by comparison with a manual spectral fitting approach using the ChenomX software: for 61 out of 67 metabolites quantified in 30 human plasma spectra, the goodness-of-fit (r2) between the two approaches was close to or exceeded 0.9. In addition, three quality indicators generated by AQuA, namely occurrence, interference, and positional deviation, were studied. These quality indicators permit evaluation of the results each time the algorithm is run. The efficiency was tested and confirmed by applying AQuA to quantify 67 metabolites in a large dataset comprising 1342 experimental spectra from human plasma; the whole computation took less than one second.

Recent advances in metabolomics have enabled investigation of human diseases at the metabolite level, revealing mechanistic and diagnostic information.1 High-throughput metabolic phenotyping in epidemiological studies has identified several molecular species present in complex biofluids as being associated with disease risk.2-5 Metabolomics therefore provides a useful tool in systems biology research. Owing to its quantitative nature, nuclear magnetic resonance (NMR) spectroscopy is considered one of the key analytical platforms for metabolomics.6 The 1H-NMR spectra of some biofluids, e.g., plasma, are highly complex because 1) signals from many different metabolites are present and 2) signals from different metabolites often have very similar chemical shift values, leading to interferences. These interferences make it difficult to quantify metabolites accurately from their respective signals.7 Metabolic fingerprinting is an untargeted approach to signal pattern analysis, which differs from the absolute quantification of metabolites, often referred to as targeted metabolic profiling. In metabolic fingerprinting, spectral bucketing is often combined with multivariate statistics.8 This procedure is straightforward and rapid, and thereby suitable for high-throughput applications, but it does not permit absolute quantification, because signal interferences remain unaccounted for. Efforts in targeted metabolic profiling have focused on improving the accuracy of metabolite quantification. Here, the quantitative analysis relies on deconvolution of experimentally observed signals.7 In order to account for signal interference, the intensities (concentrations) and positions of signals in a metabolite library must be adjusted manually until the sum of these signals fits the signals in the experimental spectrum. Although time-consuming, manual spectral fitting can generate reliable metabolite concentration estimates suitable for subsequent statistical analysis.7 Algorithms for deconvolution and metabolite quantification have been introduced recently.9-14 These algorithms are specifically designed to automate the procedure and thus avoid manual spectral fitting. Although automated algorithms have been used successfully,10,15 they still require considerable computational effort, especially for spectral fitting of more complex 1H-NMR spectra, because all spectral data points (experimental signals) are incorporated in the computational procedure.10,11 As a result, the quantification of metabolites may take a long time, which limits the number of spectra and metabolites that can be processed (e.g., 5 min for quantification of 50 metabolites in each spectrum, or 13 min for quantification of 24 metabolites in one spectrum).10,11 As metabolite quantification based on deconvolution (manual or automated) is a key bottleneck in high-throughput NMR-based metabolomics, there is still a need for higher-throughput procedures for metabolite quantification.

ACS Paragon Plus Environment


Figure 1. Workflow for quantitative 1H-NMR metabolomics with the time-consuming manual spectral fitting step replaced by AQuA.

This is especially important because improved methods for sample preparation and better instrumentation have resulted in higher-quality experimental spectra and increased numbers of identified metabolite signals.16 In this study, we developed an automated quantification algorithm (AQuA) that can account for signal interferences in the complex experimental spectra typically observed in metabolomics studies. AQuA introduces a novel approach to reducing the amount of NMR spectral data used in the accurate computation of metabolite quantities: it uses one specific signal to quantify each metabolite. This reduces the extent of computation, which is the key to the superior efficiency and speed of AQuA compared with, e.g., algorithms that use curve fitting and incorporate all spectral data points (experimental signals).10,11 As a result, AQuA is not limited by the number of spectra and metabolites to be processed. As proof of concept, we analyzed 1342 human plasma samples using an established workflow for NMR-based metabolomics and then implemented and evaluated AQuA for the rapid quantification of 67 metabolites identified in human plasma by NMR. The robustness and accuracy of the algorithm were assessed by (1) comparing the results of quantification using AQuA with those obtained from a manual quantification procedure using the ChenomX software and (2) examining a set of quality indicators that can be derived directly from AQuA.

EXPERIMENTAL SECTION

We designed an automated quantification algorithm (AQuA) for high-throughput 1H-NMR-based metabolomics. AQuA was implemented by substituting the final step of an established workflow for manual metabolite quantification (Figure 1). Quantification with AQuA was implemented and evaluated for two sets of samples from human plasma.

Sample preparation and collection of 1H-NMR data. Two sets of human plasma samples (sets A and B) were employed for the metabolomics experiment. Set A included 1342 plasma samples from fasting adult male volunteers who participated in a population-based health survey including blood sampling and storage of heparin plasma in a ‘biobank’.17 Set B consisted of quality control samples: 81 aliquots of pooled human plasma collected in heparin tubes. All blood samples were taken in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and its later amendments. Sample preparation and 1H-NMR analysis were performed as described previously,18,19 with some minor modifications that enabled the use of smaller sample volumes. Ultrafiltration was used to remove plasma proteins from each sample. To remove glycerol from the filter membranes, the filters (Nanosep, 3 kDa cut-off, Pall Life Science, Port Washington, NY) were washed eight times with water (500 µl, 36 °C, 2000 g, 8 min) prior to sample filtration (60 µl plasma, 4 °C, 13,000 g, 30-40 min). Each sample solution for 1H-NMR analysis was prepared by mixing 40 µl of sample filtrate with phosphate buffer (50 µl, 0.4 mol/l, pH 7.0), water (55 µl), D2O (15 µl), and the internal standard trimethylsilyl-d4-propionic acid (TSP, 10 µl, 5.8 mmol/l). Each sample solution (170 µl) was transferred to a 3 mm NMR tube, and a 1H-NMR spectrum was acquired for each sample using a Bruker Avance III spectrometer operating at 600 MHz proton frequency and equipped with a cryogenically cooled probe and an autosampler. Each spectrum was recorded (25 °C, 512 transients, 4 s relaxation delay) with the zgesgp pulse sequence (Bruker BioSpin), which uses excitation sculpting with gradients to suppress the water resonance. For each spectrum, 65,536 data points were collected over a spectral width of 17,942 Hz. Spectral quality was assessed directly after acquisition, based on the shape of the internal TSP signal after applying a line-broadening factor of 0.3 Hz, to ensure that the full width at half maximum (FWHM) was < 1.0 Hz.

Spectral processing. Each experimental 1H-NMR spectrum was processed manually using the NMR Suite Professional software package (version 7.5; ChenomX Inc., Edmonton, Canada). Processing included phase correction, baseline correction, and line broadening using an exponential window function. The line-broadening factor was tuned for each spectrum to give an internal TSP signal FWHM of 1.0 Hz. Linewidth adjustments were made to enable the use of spectral signal heights in quantitative analyses.20

Identification of metabolites.
Previously, Gowda et al.16 identified 67 metabolites in spectra from human serum using extensive 1D/2D NMR analyses, searches in publicly available databases, and spiking with authentic standards. Of these 67 metabolites, characteristic signals of 57 were positively identified in at least one of the spectra from set A. The identities of three of these low-concentration metabolites (2-hydroxybutyric acid, 2-oxoisocaproic acid, and 3-methyl-2-oxovaleric acid) were further confirmed by spiking with standards. Ten additional metabolites were identified in the present study based on previous publications and information from the Human Metabolome Database (HMDB).16,21,22 Therefore, in total, 67 metabolites were chosen for targeted quantification using AQuA. The identification of each metabolite was supported by signal pattern recognition analyses, in which the spectra from set A were compared with a pure standard 1H-NMR spectrum of each metabolite from a commercial library (NMR Suite Professional software package, version 7.5, ChenomX Inc., Edmonton, Canada) and the HMDB.

AQuA – general model design and preparation for computations. AQuA was developed to provide a rapid yet accurate means of computing target metabolite quantities. A unique feature of the AQuA design is that the NMR spectral data are reduced. AQuA operates on spectral data from two sources: 1) a library that consists of one calibration spectrum for each of the metabolites to be quantified and 2) the experimental spectra. Each calibration spectrum in the metabolite library and each experimental spectrum are first subjected to data reduction, as described below. For the automated quantification of m metabolites using AQuA, a library containing m calibration spectra is employed, i.e., one calibration spectrum for each metabolite to be quantified. In each calibration spectrum, one signal is selected as the metabolite reporter signal; its unique chemical shift value is referred to as the target position. Each calibration spectrum is then normalized so that its signal height at its target position is 1. By considering only the signal heights at the target positions, each calibration spectrum is reduced to a calibration vector of length m, in which each element represents the signal height at one of the respective target positions. Hence, as a result of data reduction, each normalized calibration spectrum is converted to a calibration vector in which one element is 1 (the reporter signal) and the other elements are the relative signal heights (≥ 0) observed at the remaining m − 1 target positions. These normalized vectors are organized as columns in a matrix A (Figure 2a). The m×m matrix A describes the interferences between the metabolites at the target positions when each reporter signal has been normalized to 1.

The signals at the target positions are those that are examined when attempting to quantify each metabolite in an experimental spectrum. Therefore, the signal heights at all target positions are determined in all experimental spectra. The reduction of an experimental spectrum yields a vector y, where each element yi describes the experimentally observed signal height at a target position (Figure 2b).

AQuA – computations. Once an experimental spectrum has been reduced to a vector y, the vector x can be computed by utilizing the matrix A derived from the calibration spectra and solving Equation 1 (Figure 2c).

$$\mathbf{A}\,\mathbf{x} = \mathbf{y} \qquad (1)$$

The vector x is the model's representation of the quantitative mixture of individual metabolite spectra that yields the experimental spectrum (Figure 2c). Each element in x is proportional to the concentration of one metabolite in the sample. Figure 2 shows an example of the components of the model when AQuA is prepared for quantifying m = 3 metabolites. The experimentally observed signal height at the target position for metabolite i (element yi) can be viewed as the sum of the signal height of the corresponding reporter signal (element xi) and the signal height contributions from all the interfering metabolites (yi − xi). To evaluate the quality of the respective metabolite quantifications, we define the interference (∆i) as the relative contribution to each element yi from interfering metabolite signals (Equation 2, Figure 2d).

$$\Delta_i = \frac{y_i - x_i}{y_i} \qquad (2)$$
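The data reduction and Equations 1 and 2 can be illustrated with a minimal NumPy sketch. It assumes that the signal heights of each calibration spectrum at the m target positions have already been extracted; the function names and the three-metabolite numbers are hypothetical and only mimic the example of Figure 2. It is not the authors' MATLAB script (Table S5).

```python
import numpy as np


def build_calibration_matrix(calib_heights):
    """Return the m x m matrix A from raw calibration heights.

    calib_heights[j, i] is the (hypothetical) signal height of the calibration
    spectrum of metabolite j at target position i. Row j is divided by its
    reporter height calib_heights[j, j] and stored as column j of A, so every
    diagonal element of A equals 1.
    """
    h = np.asarray(calib_heights, dtype=float)
    reporter = np.diag(h)                 # reporter signal height of each metabolite
    return (h / reporter[:, None]).T      # normalized calibration vectors as columns


def quantify(A, y):
    """Solve A x = y (Equation 1) for one reduced experimental spectrum y."""
    return np.linalg.solve(A, np.asarray(y, dtype=float))


def interference(y, x):
    """Delta_i = (y_i - x_i) / y_i (Equation 2); NaN where y_i == 0."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    delta = np.full_like(y, np.nan)
    np.divide(y - x, y, out=delta, where=y != 0)
    return delta


# Hypothetical calibration heights: three metabolites (rows) at three targets.
calib = [[4.0, 0.4, 0.0],
         [0.5, 2.5, 0.0],
         [0.0, 0.6, 2.0]]
A = build_calibration_matrix(calib)       # columns [1, .1, 0], [.2, 1, 0], [0, .3, 1]
y = np.array([2.2, 1.35, 0.5])            # reduced experimental spectrum
x = quantify(A, y)
print(x, interference(y, x))
```

Here the reporter of metabolite 2 sits under contributions from metabolites 1 and 3, so its interference is the largest of the three, while metabolite 3 has no interfering signal at its target position.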

AQuA – requirements for implementation. The successful implementation of AQuA requires that the samples are prepared in a way that permits quantification. This includes removal of proteins, so that their signals do not interfere with metabolite signals, and the addition of an internal standard of known concentration. Phase correction, baseline correction, and adjustment of the linewidth of the internal standard signal to the same value in all spectra must be performed to allow quantification using signal heights. Adjusting the linewidth effectively corrects for shimming irregularities.
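As a worked example of the dilution bookkeeping implied by the sample preparation in the Experimental Section (40 µl filtrate in a 170 µl NMR sample containing 10 µl of 5.8 mmol/l TSP), the short sketch below computes the dilution factor and the TSP concentration in the NMR tube. It assumes that volumes add ideally and that ultrafiltration does not change the concentration of small-molecule metabolites; the variable and function names are hypothetical.

```python
# Volumes (µl) of each component of one NMR sample, taken from the
# sample preparation described in the Experimental Section.
VOLUMES_UL = {
    "filtrate": 40.0,
    "phosphate_buffer": 50.0,
    "water": 55.0,
    "D2O": 15.0,
    "TSP_stock": 10.0,
}
TSP_STOCK_MMOL_PER_L = 5.8

total_ul = sum(VOLUMES_UL.values())                  # 170 µl, assuming volumes add
dilution_factor = total_ul / VOLUMES_UL["filtrate"]  # filtrate is diluted 4.25-fold
tsp_in_tube_mmol_per_l = TSP_STOCK_MMOL_PER_L * VOLUMES_UL["TSP_stock"] / total_ul


def plasma_concentration_uM(sample_concentration_uM):
    """Scale a concentration measured in the NMR tube back to the filtrate,
    which is assumed here to equal the plasma concentration of small molecules."""
    return sample_concentration_uM * dilution_factor


print(total_ul, dilution_factor, round(tsp_in_tube_mmol_per_l, 3))
```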

Figure 2. Principles of data reduction of 1H-NMR spectra and the computational procedures used in AQuA, illustrated by an example where three hypothetical metabolites are quantified. a) Calibration spectra from a library for the three metabolites. One reporter signal is selected for each metabolite. The positions of the reporter signals are termed target positions (dashed lines). Each calibration spectrum is normalized so that its reporter signal height is 1 and is thereafter reduced to a vector containing only the signal intensities at the three target positions. Each reduced calibration spectrum is organized as a column in a matrix, A, where each column contains the intensity 1 for the reporter signal and two intensities ≥ 0 for the other target positions. A is employed as the model parameter for quantification of the metabolites in different experimental spectra. b) An experimental spectrum with the intensities at the respective target positions indicated by colored dots. The experimental spectrum is reduced to a vector containing only these intensities: y = (y1, y2, y3). c) The vector x = (x1, x2, x3) is the result of solving A x = y. x is the quantitative mixture of individual metabolite spectra that yields the experimental spectrum y. An element xi in x represents the contribution from metabolite i (colored squares) to the observed signal yi. Each element in x is proportional to the concentration of one metabolite in the sample. d) The interference, ∆i, is the relative contribution from interfering metabolites to the element yi. ∆i can be employed to indicate the quality of the metabolite quantification xi.


Figure 3. Implementation of AQuA for determining the concentrations of m different metabolites in a dataset including many experimental spectra (e.g., set A or B). First, a metabolite library including calibration spectra for all metabolites to be quantified is created. Each calibration spectrum is reduced to a vector containing normalized signal height elements at m target positions. The m vectors are compiled to generate the model parameter matrix, A, which describes the signal interferences between the metabolites. Using similar data reduction strategies, each experimental spectrum is reduced to a vector, y_n, containing height elements at m target positions. By solving the equation A x_n = y_n separately for each experimental spectrum (n), the quantitative mixture of metabolites, x_n, is determined for each sample.

In addition to appropriate sample preparation and spectral processing, the metabolite library must be optimized by: 1) including only the metabolites detected by NMR in the given biospecimen investigated, 2) matching the linewidths and positions of the library signals with the experimental spectra (for example, in the present study, superimposed spectra from set A were used as reference for positional adjustment of the library signals at target positions prior to generating matrix A), and 3) selecting suitable reporter signals (Table S1). The metabolite library used in AQuA can be created in different environments, e.g., by utilizing spectral data in commercial, open-access, or in-house metabolite libraries. Note that AQuA is flexible and allows the metabolite library to be changed depending on the sensitivity of the NMR spectrometer used or the biospecimen studied. For example, different libraries would be created for plasma, cerebrospinal fluid, etc. Therefore, AQuA is not limited to the 67 metabolites used for human plasma in the present implementation.

AQuA – implementation for human plasma. AQuA was implemented for targeted quantification of the human plasma metabolites detected and identified in sets A and B. AQuA implementation for large datasets that include many experimental spectra (such as sets A and B) is illustrated in Figure 3. In order to generate the model parameter matrix (A), a metabolite library was first created (for a detailed description, see Table S1, Table S2). The metabolite library contained calibration spectra of 67 human plasma metabolites. A total of 67 reporter signals were selected and their corresponding target positions were determined (for how reporter signals were chosen, see Table S1, Figure S1). Each calibration spectrum in the metabolite library was reduced to a calibration vector with 67 signal height elements at 67 target positions (for a detailed description, see Table S1, Table S3).
Each calibration vector was normalized to the height of its reporter signal, and these normalized vectors were then arranged as columns in a matrix (A). In order to create a vector y for each spectrum, the signal height at all 67 target positions was also determined in each experimental spectrum. Spectral binning, using a narrow bin size, and automated peak picking were employed to convert each experimental spectrum to a vector (y) (Table S1, Table S3). The in-house peak-picking routine searches for the peak within a small chemical shift window around a target position and assigns the height of the peak within the window. This height is referred to as the target signal height. Each target signal was used as an entry in vector y and was assumed to have the same chemical shift value as the respective reporter signal. This is referred to as target position alignment. After creating the y vector for each experimental spectrum, the two datasets (spectra from sets A and B) were represented by 1342 and 81 positionally aligned column vectors, respectively. In order to quantify the 67 metabolites, repeated computations were performed using the same model parameters (A) but a new set of input variables (y_n) to generate a new set of model output variables (x_n) for each spectrum n (Figure 3). Each element xi in x_n was then converted to the sample concentration, in µM, of the corresponding metabolite i in spectrum n (c_{i,n}^{auto}) using a calibration factor for each metabolite (Table S1, Table S4). Final plasma concentrations were obtained after accounting for sample dilution. The automated quantification algorithm was implemented in MATLAB (version R2012b, MathWorks Inc.). The algorithm and the script used for these automated quantifications are presented in Table S5.

Manual quantification procedure. Manual quantification was performed using the ChenomX NMR Suite (version 7.5; ChenomX Inc., Edmonton, Canada). The manual quantification relied on the same reporter signal for each metabolite as AQuA (Figure S1). The pure standard 1H-NMR spectra of the 67 metabolites identified in the NMR spectra of human plasma were extracted directly from the 600 MHz commercial library and superimposed on experimental spectra (set A) using the Profiler module of the ChenomX NMR Suite. The position of each reporter signal was adjusted to be in close agreement with the experimentally observed target signals in the experimental spectra (Figure S1). The positions of signals interfering with the reporter signals were also adjusted to be in close agreement with those in the experimental spectra.
After adjusting the positions of reporter and interfering signals, the superimposed calibration spectra were saved and are referred to as the plasma profiler. For quantification of metabolites, the plasma profiler was superimposed on each experimental spectrum and the intensity of each reporter signal was adjusted so that the sum of intensities for each reporter signal and its interfering signals was equal to the corresponding experimental target signal. In order to account for the interference, which is important for accurate quantification, the intensity of each reporter signal was adjusted in a predetermined order (Table S4, Figure S1): for metabolite i, the intensity of the reporter signal was adjusted after adjusting the intensities of the reporter signals from other metabolites with interfering signals under the reporter signal of metabolite i. In this step-wise procedure, further positional adjustment of signals was performed when necessary. The manual procedure ultimately resulted in 67 deconvoluted spectra and the corresponding metabolite concentrations for each experimental spectrum. The manual quantification procedure has been used previously in human studies, in which it generated outcomes comparable to those of targeted mass spectrometry-based metabolomics.22,23
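The reduction of experimental spectra and the repeated solution of Equation 1 with a fixed matrix A, as described under AQuA – implementation for human plasma, can be sketched as follows. The peak picking simply takes the maximum intensity within a fixed window around each target position; this is an assumed stand-in for the in-house routine, and the window width, the multiplicative calibration factors, and all names are hypothetical. Because A is the same for every spectrum, the y_n vectors can be stacked as columns and solved in a single call.

```python
import numpy as np


def pick_target_heights(ppm, intensity, targets_ppm, half_window_ppm=0.005):
    """Reduce one spectrum to y: the maximum intensity within +/- half_window_ppm
    of each target position (hypothetical stand-in for the in-house routine)."""
    ppm = np.asarray(ppm, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    y = np.zeros(len(targets_ppm))
    for i, target in enumerate(targets_ppm):
        in_window = np.abs(ppm - target) <= half_window_ppm
        if in_window.any():
            y[i] = intensity[in_window].max()
    return y


def quantify_batch(A, Y, calibration_factors, dilution_factor):
    """Solve A X = Y for all spectra at once (column n of Y is y_n) and convert
    each x_i with an assumed multiplicative calibration factor per metabolite
    and the sample dilution factor."""
    X = np.linalg.solve(A, np.asarray(Y, dtype=float))   # m x N, column n is x_n
    factors = np.asarray(calibration_factors, dtype=float)[:, None]
    return X * factors * dilution_factor


# Synthetic example: three target positions on a 0.001 ppm grid.
ppm = np.linspace(0.0, 1.0, 1001)
spectrum = np.zeros_like(ppm)
spectrum[[200, 501, 799]] = [2.2, 1.35, 0.5]   # peaks slightly off some targets
y1 = pick_target_heights(ppm, spectrum, [0.2, 0.5, 0.8])

A = np.array([[1.0, 0.2, 0.0],
              [0.1, 1.0, 0.3],
              [0.0, 0.0, 1.0]])
Y = np.column_stack([y1, [1.2, 1.1, 0.0]])      # two reduced spectra
conc = quantify_batch(A, Y, calibration_factors=[1.0, 1.0, 1.0], dilution_factor=1.0)
```

The window search is what makes target position alignment tolerant to small chemical shift drifts: a peak at 0.501 ppm is still assigned to the 0.5 ppm target.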

Figure 4. Mean relative deviation (MRD, red) and goodness-of-fit (r2, blue) derived from linear regression analysis for each of the 67 metabolites quantified manually and automatically in 30 experimental spectra from set A. The metabolites are arranged (1-67) along the horizontal axis in order of increasing r2 value. Sample concentrations, MRDs, and r2 values are listed in Table S6. For most metabolites, the MRD is within ± 0.05 and r2 > 0.9, which shows that the rapid analysis with AQuA yields quantification results in agreement with those of the time-consuming manual quantification procedure.
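The two agreement statistics in Figure 4 can be computed per metabolite as sketched below. The sketch assumes the signed form of the mean relative deviation (AQuA minus manual, relative to manual, averaged over spectra) and takes r2 as the squared Pearson correlation, which equals the coefficient of determination of a simple linear regression. Function names and values are hypothetical.

```python
import numpy as np


def mean_relative_deviation(c_auto, c_manual):
    """MRD_i: mean over spectra of (c_auto - c_manual) / c_manual."""
    c_auto = np.asarray(c_auto, dtype=float)
    c_manual = np.asarray(c_manual, dtype=float)
    return float(np.mean((c_auto - c_manual) / c_manual))


def r_squared(c_auto, c_manual):
    """Goodness-of-fit of a simple linear regression between the two methods,
    i.e., the squared Pearson correlation coefficient."""
    r = np.corrcoef(c_manual, c_auto)[0, 1]
    return float(r ** 2)


# Hypothetical concentrations (µM) of one metabolite in three spectra.
manual = [10.0, 20.0, 40.0]
auto = [11.0, 19.0, 40.0]
print(mean_relative_deviation(auto, manual), r_squared(auto, manual))
```

Because the MRD is signed, over- and underestimates can cancel; r2 complements it by measuring how well the two methods track each other across samples.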

RESULTS AND DISCUSSION

Accuracy. The accuracy of an automated quantification algorithm can be evaluated in different ways. One approach is to closely examine the metabolites with respect to different quality indicators derived directly from the algorithm (see below). Another option is to compare the results from the algorithm with results obtained from a manual quantification procedure using an established software package (ChenomX) that is based on step-wise fitting of the contributions from individual metabolite signals to the signals in each experimental spectrum. Here, we analyzed a subset of 30 randomly selected experimental spectra from set A with both the automated and the manual procedures and derived the sample concentration values c_{i,n}^{manual} and c_{i,n}^{auto} for each metabolite i in each spectrum n. The respective values of the mean relative deviation, MRDi (Equation 3), are displayed in Figure 4.

$$\mathrm{MRD}_i = \frac{1}{30}\sum_{n=1}^{30}\frac{c_{i,n}^{\mathrm{auto}} - c_{i,n}^{\mathrm{manual}}}{c_{i,n}^{\mathrm{manual}}} \qquad (3)$$

The goodness-of-fit (r2) values from linear regression analysis of c_{i,n}^{manual} and c_{i,n}^{auto} in all samples are also shown in Figure 4 and Table S6. Most r2 values exceeded 0.9 (61 out of 67 metabolites) (Figure 4, blue dashed line), and most of the MRDs were within ± 0.05 (Figure 4, red dashed line). The few metabolites that displayed higher deviations (Table S6) also had low target signal intensities in a large fraction of the spectra (Figure S2). As the results in Figure 4 show, the rapid automated algorithm yields quantification results in excellent agreement with those of the time-consuming manual quantification procedure. Therefore, it can be concluded that all the steps in algorithm implementation, including establishment of the model input variables (y vectors, Figure S2), model parameters (A matrix, Figure S3), and calibration factors (Table S4), are accurate.

Efficiency. Manual quantification procedures are slow and therefore not suitable for high-throughput analyses. For example, spectral fitting of 67 metabolites in a single spectrum takes at least 30 minutes with the procedure described above. Automated quantification using BATMAN requires about 13 minutes for quantification of 24 metabolites in one spectrum.10 Automated quantification using BAYESIL typically requires about 5 minutes for quantifying approximately 50 metabolites in one experimental spectrum.11 The superior efficiency of AQuA, compared with existing manual and automated alternatives, was demonstrated by applying the algorithm for quantitative analysis of all experimental spectra from set A (1342 spectra). These quantifications, including data reduction and model computations, were performed on a standard personal computer in less than 1 second, i.e., > 10^5 times faster than automated alternatives.10,11 The high efficiency of AQuA is due to: 1) the minor computational effort required for reducing the data, i.e., the spectral binning and peak picking that yield the y_n vectors, and 2) the minor computational effort required in the process of repeatedly solving Equation 1 to obtain the x_n vectors using a fixed A matrix.

Quality indicators. Although the accuracy of an automated quantification algorithm can be estimated by comparison with the manual quantification procedure, this approach has limitations. Since the manual procedure is time-consuming, for practical reasons the comparison would typically be made for only a subset of the experimental spectra. However, it is possible to utilize the entire experimental dataset and examine the quality of each quantification result via the quantities inherently generated by the algorithm (e.g., yi and xi). The following paragraphs describe how to employ an entire dataset to derive and interpret three different indicators of quality for each metabolite, namely the occurrence, the degree of interference, and the positional deviation. To clarify their usefulness, each quality indicator was compared with the accuracy parameters shown in Figure 4.

Figure 5. Top: Occurrence level (bar color) for each metabolite in set A together with the corresponding CV values (bar height). Color code key: occurrence ≥ 95% (blue), 50% ≤ occurrence [...]

[...] 3 × noise level. We computed the occurrence for each metabolite i in set A and in the subset of 30 spectra employed for the comparison with manual quantification (see Table S7). The distribution of occurrence values for the different metabolites in set A and the distribution observed for the subset of 30 spectra were highly similar (Table S7). The metabolites with low occurrence values were those that yielded poor r2 values and high MRDs in the comparison with the manual quantification procedure (Figure 4). Hence, investigating the occurrence obtained by applying AQuA to a whole experimental dataset (set A) provides similar information to comparing the algorithm with a manual quantification procedure for a subset of the data. Quality control samples (from pooled plasma) can be used to derive the analytical coefficient of variation (CV) of the concentration of each metabolite i. If the analytical variation is below a pre-set threshold, a metabolite can be robustly quantified. We further assume that it can be useful to compare the analytical CV with the total CV (reflecting both biological and analytical variation) of each metabolite in an experimental dataset. If the variation in the experimental dataset is much larger than the analytical variation, a metabolite can be robustly quantified. Figure 5 presents CV results from such an analysis of the quality control data (set B) and the experimental dataset (set A), together with the occurrence levels of all metabolites in set A. The metabolites that displayed a low analytical CV in set B also displayed the highest level of occurrence (100%) in set A (Figure 5).
Most of those metabolites also displayed a high total CV in set A, and would therefore provide robust quantitative information about the metabolite concentration.
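The occurrence and CV checks described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: occurrence is taken as the fraction of spectra in which the target signal height exceeds 3 × the noise level (only that criterion survives in the text), CV is the sample standard deviation divided by the mean, and the thresholds in the hypothetical `is_robust` helper (analytical CV ≤ 10%, total CV at least 3 × the analytical CV) are placeholders for the "pre-set threshold" and "much larger" criteria.

```python
import numpy as np


def occurrence(target_heights, noise_levels, factor=3.0):
    """Fraction of spectra whose target signal height exceeds factor x noise level.

    Assumed definition: the text only preserves the '3 x noise level' criterion.
    """
    heights = np.asarray(target_heights, dtype=float)
    noise = np.asarray(noise_levels, dtype=float)
    return float(np.mean(heights > factor * noise))


def cv_percent(concentrations):
    """Coefficient of variation in percent: sample standard deviation / mean * 100."""
    c = np.asarray(concentrations, dtype=float)
    return float(np.std(c, ddof=1) / np.mean(c) * 100.0)


def is_robust(experimental_conc, qc_conc, max_analytical_cv=10.0, min_cv_ratio=3.0):
    """Hypothetical rule: low analytical CV (QC samples, set B) and a total CV in the
    experimental set (set A) that is much larger than the analytical CV."""
    analytical_cv = cv_percent(qc_conc)
    total_cv = cv_percent(experimental_conc)
    robust = analytical_cv <= max_analytical_cv and total_cv >= min_cv_ratio * analytical_cv
    return analytical_cv, total_cv, robust


if __name__ == "__main__":
    print(occurrence([10.0, 2.0, 8.0, 0.5], [1.0, 1.0, 1.0, 1.0]))  # prints 0.5
    qc = [1.00, 1.02, 0.98, 1.01, 0.99]   # toy QC concentrations (set B)
    exp = [0.5, 1.0, 1.5, 2.0, 0.8]       # toy experimental concentrations (set A)
    print(is_robust(exp, qc))
```

In practice, any threshold values would have to be chosen per study; the two criteria simply mirror the reasoning in the text that a metabolite is robustly quantified when its analytical variation is low and small relative to the variation across the experimental dataset.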

Therefore, similar information about which metabolites can be robustly quantified can be obtained by evaluating either the occurrence or the analytical variation in quality control samples. Hence, the occurrence of a given metabolite is a good quality indicator. However, analyzing CV values from quality control samples can be useful for detecting quantification problems due to, e.g., sample preservation, such as those previously described for alcohols (Figure 5).²¹ For example, residual amounts of glycerol from filter membranes and the inherent volatility of other alcohols (e.g., methanol) result in large analytical variation.

Interference. The target signals can be influenced by interferences from other metabolite signals. In AQuA, each target signal is initially selected to minimize interferences, but interferences are still encountered for many target signals. The algorithm accounts for these interferences via a matrix (Table S8, Figure S3) that describes the relative contribution from the different metabolite signals to each target signal when the reporter signal heights are normalized to one.
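The role of the interference matrix can be illustrated as a linear system. The sketch below assumes, for illustration only, that the observed target-signal heights y are a linear combination y = A x of the reporter-signal heights x, where A[i, j] is the contribution of metabolite j at target position i when its reporter signal height is normalized to one. The function names and the linear calibration in `to_concentrations` are hypothetical and are not taken from AQuA.

```python
import numpy as np


def solve_reporter_heights(target_heights, interference_matrix):
    """Solve y = A x for the reporter-signal heights x of one spectrum.

    target_heights (y): observed height at each metabolite's target position.
    interference_matrix (A): A[i, j] is the height that metabolite j contributes at
        target position i when its reporter signal height is normalized to one,
        so A[i, i] = 1 under this (assumed) convention.
    """
    A = np.asarray(interference_matrix, dtype=float)
    y = np.asarray(target_heights, dtype=float)
    return np.linalg.solve(A, y)


def to_concentrations(reporter_heights, calibration_factors):
    """Hypothetical linear calibration: concentration_i = k_i * x_i."""
    k = np.asarray(calibration_factors, dtype=float)
    return k * np.asarray(reporter_heights, dtype=float)


if __name__ == "__main__":
    A = np.array([[1.0, 0.2],   # metabolite 1 contributes 0.2 at target position 0
                  [0.1, 1.0]])  # metabolite 0 contributes 0.1 at target position 1
    x_true = np.array([2.0, 5.0])
    y = A @ x_true              # simulated target heights: [3.0, 5.2]
    x = solve_reporter_heights(y, A)
    print(x, to_concentrations(x, [0.5, 0.8]))
```

Because only one target height per metabolite enters the system, solving it is cheap, which is consistent with the data-reduction argument made for AQuA; a real implementation would also have to decide how to handle negative or noise-level solutions, which this sketch ignores.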

Figure 7. a) F0.05 values for all spectra from set A (red) and a subset of spectra selected from set A for manual quantification (n = 30, grey) for 57 metabolites. Each F0.05 value gives the fraction of spectra in a dataset where the interference of a given metabolite i exceeds 0.05. b) F0.5 values for all spectra from set A (blue) and a subset of spectra selected from set A for manual quantification (n = 30, grey) for 57 metabolites. Each F0.5 value gives the fraction of spectra where the interference for a given metabolite i exceeds 0.5. Fq values for 57 metabolites are presented in Table S9.

Figure 8. a) Median-centered positional distribution of histidine signals in 1342 experimental spectra from set A. The computed positional deviation for histidine is indicated by dashed lines. b) Experimentally observed signals around the target position of histidine in different spectra from set A. The signal maximum in each respective spectrum (determined by peak-picking) is indicated by a dot. c) Alignment of the histidine signals to the target position.

The respective contributions from interfering metabolites to the height of a target signal vary among experimental spectra, depending on the concentrations of the metabolites involved. The interference for metabolite i can be derived using Equation 2 for each experimental spectrum. As an example, Figure 6a shows the target signal of creatine for a selection of spectra from set A, i.e., some with low intensity (blue) and some with high intensity (red). Figure 6b shows the corresponding interference values (black) for the target signals in Figure 6a. As can be seen, the interferences are higher for low-intensity target signals (blue) than for high-intensity target signals (red). Therefore, for an entire dataset, there is a distribution of target signal heights and hence also a distribution of interferences. The distribution of interferences for creatine in spectra from set A is shown in Figure 6c. The fraction of spectra with interferences close to the extremes (< 0.05 and > 0.5) is small for creatine (Figure 6c). The quantification of a given metabolite i might be hampered if the interference, (yi − xi)/yi, is large compared with the relative contribution from the reporter signal, xi/yi, in a large fraction of the spectra. Hence, we tested whether analyzing the interferences in a dataset can indicate the quality of quantification to be expected for each metabolite. We define Fq as the fraction of spectra in a dataset where the interference for a given metabolite exceeds a pre-set value q. We computed Fq for q = 0.05 and q = 0.5 for 57 different metabolites (Figure 7, Table S9). Fq values were not computed for the 10 metabolites with occurrence < 5% in set A. The calculations were performed for all spectra from set A and for the subset of spectra (n = 30) chosen from set A for the manual quantification. For several metabolites, F0.05 was close to zero, i.e., the interference exceeded 0.05 in few or no spectra (Figure 7).
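The interference and Fq definitions above translate directly into array operations. In this minimal sketch, `interference` evaluates (yi − xi)/yi for each spectrum and metabolite, and `f_q` returns the fraction of spectra whose interference exceeds q. The array layout (spectra in rows, metabolites in columns), the toy numbers, and the assumption that all target heights are positive are illustrative choices, not details from the paper.

```python
import numpy as np


def interference(target_heights, reporter_heights):
    """Interference (y - x) / y for every spectrum and metabolite.

    Both arrays have shape (n_spectra, n_metabolites); target heights are assumed > 0.
    """
    y = np.asarray(target_heights, dtype=float)
    x = np.asarray(reporter_heights, dtype=float)
    return (y - x) / y


def f_q(interference_values, q):
    """F_q per metabolite: fraction of spectra in which the interference exceeds q."""
    return np.mean(np.asarray(interference_values, dtype=float) > q, axis=0)


if __name__ == "__main__":
    Y = np.array([[3.0, 5.2], [2.5, 1.0], [10.0, 4.0]])  # toy target signal heights
    X = np.array([[2.0, 5.0], [1.0, 0.4], [9.8, 3.9]])   # toy reporter contributions
    I = interference(Y, X)
    f005, f05 = f_q(I, 0.05), f_q(I, 0.5)
    flagged = f05 > 0.5  # interference dominates the target signal in > 50% of spectra
    print(f005, f05, flagged)
```

Run on a full dataset, the `flagged` mask would single out the high-interference cases discussed in the text, i.e., metabolites for which the interference outweighs the reporter contribution in most spectra.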
For the metabolites with F0.05 close to zero, the target signals displayed a low degree of interference. In contrast, F0.5 exceeded 0.5 for only a few metabolites (acetylglycine, betaine, succinic acid, and trimethylamine-N-oxide), meaning that, in more than 50% of the spectra, the interference contributed more to the target signal intensity than the reporter signal did. These four metabolites displayed target signals with a high degree of interference and also yielded relatively poor r2 values in the comparison with the manual quantification procedure (Figure 4). Hence, these investigations show that interference may cause problems for quantification, but only for metabolites with large F0.5. Importantly, most metabolites that displayed F0.05 > 0 but F0.5 < 0.5 (Figure 7), i.e., target signals with an intermediate degree of interference, also yielded good r2 values (> 0.99) in the comparison with the manual quantification (Figure 4). It can therefore be concluded that the automated algorithm yields results of high quantitative accuracy despite the broad range of interferences, and that F0.05 and F0.5 are good quality indicators for each metabolite.

Positional deviation. For each target signal, there may be a difference between its actual position in the experimental spectra and the target position pre-determined by AQuA. For example, the signal positions of some metabolites may vary between spectra due to differences in ionic strength or pH. For an entire dataset, the positional differences of the target signals for metabolite i can be described by a median-centered positional distribution. We define the positional deviation for metabolite i as the value that accounts for 95% of its positional distribution. We computed the positional deviation for the 57 metabolites with occurrence ≥ 5% in set A (see Table S10) and found that the vast majority of these metabolites had a very small positional deviation (± 1 bin, i.e., ± 0.0002 ppm). Hence, positional deviation was not a relevant issue for human plasma (set A, filtered, prepared, and analyzed over more than two months), presumably because deproteinized and buffered plasma samples are subject to negligible variations in ionic strength or pH. When using AQuA, a large positional deviation could be an issue if it occurs in combination with a high degree of interference (a large F0.5 value). However, none of the metabolites displayed this combination in set A.
For example, the target signal for histidine showed an unusually large positional deviation between spectra (Figure 8a, Table S10), but was still quantified accurately (r2 > 0.99) owing to its low degree of interference (Table S9). AQuA handles the positional variation of histidine (Figure 8a) successfully through the combined use of the peak-picking routine, which searches for the target signal within a given chemical shift window (Figure 8b), and the alignment of each target signal with its target position (Figure 8c). Although the positional deviation does not appear to be an important quality indicator for quantification of metabolites in human plasma, it should still be investigated if AQuA is applied to experimental spectra from other biospecimens.
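Peak-picking within a chemical shift window, alignment to the target position, and the positional deviation can be sketched as follows. This is an assumption-laden illustration rather than AQuA's own routine: spectra are 1-D arrays of bins, the search window is a hypothetical ± `half_window` bins, alignment is a simple integer shift with `np.roll`, and the positional deviation is interpreted as the 95th percentile of the absolute median-centered offsets. Only the bin width of 0.0002 ppm is taken from the text.

```python
import numpy as np

BIN_WIDTH_PPM = 0.0002  # from the text: ±1 bin corresponds to ±0.0002 ppm


def pick_peak_offset(spectrum, target_idx, half_window):
    """Offset (in bins) of the signal maximum inside the search window around the target."""
    lo = max(target_idx - half_window, 0)
    hi = min(target_idx + half_window + 1, len(spectrum))
    peak_idx = lo + int(np.argmax(spectrum[lo:hi]))
    return peak_idx - target_idx


def align_to_target(spectrum, offset):
    """Shift the spectrum so the picked maximum lands on the target position.

    np.roll wraps values around the array ends; acceptable for this small sketch.
    """
    return np.roll(spectrum, -offset)


def positional_deviation(offsets, coverage=95.0):
    """Median-center the offsets and return the absolute deviation covering `coverage` %."""
    offsets = np.asarray(offsets, dtype=float)
    centered = offsets - np.median(offsets)
    return float(np.percentile(np.abs(centered), coverage))


if __name__ == "__main__":
    bins = np.arange(200)
    target = 100
    shifts = [0, 1, -1, 0, 2]  # simulated peak displacements in bins
    spectra = [np.exp(-0.5 * ((bins - (target + s)) / 1.5) ** 2) for s in shifts]
    offsets = [pick_peak_offset(sp, target, half_window=5) for sp in spectra]
    aligned = [align_to_target(sp, off) for sp, off in zip(spectra, offsets)]
    dev_bins = positional_deviation(offsets)
    print(offsets, dev_bins, dev_bins * BIN_WIDTH_PPM)
```

Restricting the maximum search to a window is what keeps a displaced signal, such as the histidine signal in Figure 8, from being confused with a neighboring signal; the window width would need to exceed the expected positional deviation.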

CONCLUSIONS

We designed, implemented, and evaluated an automated quantification algorithm (AQuA) for calculating metabolite concentrations from NMR spectra. AQuA is solely a tool for targeted quantification in NMR spectra; therefore, other necessary steps in quantitative NMR-based metabolomics, e.g., adding an internal standard, spectral processing, and metabolite identification, must be performed before AQuA is applied. AQuA yielded quantification results in excellent agreement with those of a manual quantification procedure using the established ChenomX package, but with much higher efficiency. In AQuA, the spectral data are reduced to the signal height of one carefully selected target signal per metabolite, and the interferences from other metabolites are accounted for in the quantification via a library-based model. This data reduction leads to fast computations: AQuA quantifications are at least 10⁵ times faster than algorithms that quantify via spectral curve-fitting. Different quality indicators, i.e., occurrence, interference, and positional deviation, were defined and used as tools to evaluate the reliability of AQuA. These indicators are easily derived from AQuA, permitting evaluation of the algorithm each time it is run. Accurate quantification results can be expected for all metabolites except those that display low occurrence, extreme interference values (i.e., F0.5 > 0.5), or a combination of interference and large positional deviation. Application of AQuA to a large dataset of NMR spectra from human plasma generated useful information regarding analytical and biological variations, limit of detection, and spectral characteristics (interference and positional deviation of signals). This demonstrates how AQuA can facilitate the use of NMR for metabolomics analysis of human plasma.

AQuA is flexible and can easily be adapted to spectra from other NMR instruments or other biospecimens, once the NMR signals have been identified, using the same implementation strategy as described here. The unique combination of speed, robustness, and flexibility that characterizes AQuA removes a bottleneck that currently hampers throughput in large-scale NMR-based metabolomics.

ASSOCIATED CONTENT

Supporting Information

Explanation of terms used in AQuA; Metabolite library; Selected reporter signals; Chemical shift windows used in the peak-picking routine; Calibration factors and target positions for the reporter signals; The AQuA algorithm; Sample concentrations, r2 and MRD values for comparison between manual and automated quantifications; Region around the target signals and peak-picking results; The matrix used in the AQuA computations; Results of applying AQuA in different sets of spectra; Interferences accounted for by AQuA; Fq values for q = 0.05 and q = 0.50; Positional deviation for the target signals (PDF). This material is available free of charge via the Internet at http://pubs.acs.org.

AUTHOR INFORMATION

Corresponding Authors

* E-mail: [email protected]. * E-mail: [email protected]. Phone: +4618672048

Author Contributions

The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.

Notes

The authors declare no competing financial interest.

ACKNOWLEDGMENT

This work was supported by grants from FORMAS (222-20141341), Science Life Lab and SLU strategic funding for infrastructure.

REFERENCES

(1) Medina, S.; Dominguez-Perles, R.; Gil, J. I.; Ferreres, F.; Gil-Izquierdo, A. Curr Med Chem 2014, 21, 823-848.
(2) Floegel, A.; Stefan, N.; Yu, Z.; Muhlenbruch, K.; Drogan, D.; Joost, H. G.; Fritsche, A.; Haring, H. U.; Hrabe de Angelis, M.; Peters, A.; Roden, M.; Prehn, C.; Wang-Sattler, R.; Illig, T.; Schulze, M. B.; Adamski, J.; Boeing, H.; Pischon, T. Diabetes 2013, 62, 639-648.
(3) Ganna, A.; Salihovic, S.; Sundstrom, J.; Broeckling, C. D.; Hedman, A. K.; Magnusson, P. K.; Pedersen, N. L.; Larsson, A.; Siegbahn, A.; Zilmer, M.; Prenni, J.; Arnlov, J.; Lind, L.; Fall, T.; Ingelsson, E. PLoS Genet 2014, 10, e1004801.
(4) Vouk, K.; Hevir, N.; Ribič-Pucelj, M.; Haarpaintner, G.; Scherb, H.; Osredkar, J.; Möller, G.; Prehn, C.; Rižner, T. L.; Adamski, J. Hum Reprod 2012, 27, 2955-2965.
(5) Wang, T. J.; Larson, M. G.; Vasan, R. S.; Cheng, S.; Rhee, E. P.; McCabe, E.; Lewis, G. D.; Fox, C. S.; Jacques, P. F.; Fernandez, C.; O'Donnell, C. J.; Carr, S. A.; Mootha, V. K.; Florez, J. C.; Souza, A.; Melander, O.; Clish, C. B.; Gerszten, R. E. Nat Med 2011, 17, 448-453.
(6) Lenz, E. M.; Wilson, I. D. J Proteome Res 2007, 6, 443-458.
(7) Weljie, A. M.; Newton, J.; Mercier, P.; Carlson, E.; Slupsky, C. M. Anal Chem 2006, 78, 4430-4442.
(8) Worley, B.; Powers, R. ACS Chem Biol 2014, 9, 1138-1144.
(9) Zheng, C.; Zhang, S.; Ragg, S.; Raftery, D.; Vitek, O. Bioinformatics 2011, 27, 1637-1644.
(10) Hao, J.; Liebeke, M.; Astle, W.; De Iorio, M.; Bundy, J. G.; Ebbels, T. M. Nat Protoc 2014, 9, 1416-1427.
(11) Ravanbakhsh, S.; Liu, P.; Bjorndahl, T. C.; Mandal, R.; Grant, J. R.; Wilson, M.; Eisner, R.; Sinelnikov, I.; Hu, X.; Luchinat, C.; Greiner, R.; Wishart, D. S. PLoS One 2015, 10, e0124219.
(12) Alm, E.; Slagbrand, T.; Aberg, K. M.; Wahlstrom, E.; Gustafsson, I.; Lindberg, J. Anal Bioanal Chem 2012, 403, 443-455.
(13) Schleif, F. M.; Riemer, T.; Borner, U.; Schnapka-Hille, L.; Cross, M. Bioinformatics 2011, 27, 524-533.
(14) Tulpan, D.; Leger, S.; Belliveau, L.; Culf, A.; Cuperlovic-Culf, M. BMC Bioinformatics 2011, 12, 400.
(15) Behrends, V.; Bell, T. J.; Liebeke, M.; Cordes-Blauert, A.; Ashraf, S. N.; Nair, C.; Zlosnik, J. E. A.; Williams, H. D.; Bundy, J. G. J Biol Chem 2013, 288, 15098-15109.
(16) Nagana Gowda, G. A.; Gowda, Y. N.; Raftery, D. Anal Chem 2015, 87, 706-715.
(17) Norberg, M.; Wall, S.; Boman, K.; Weinehall, L. Glob Health Action 2010, 3.
(18) Tiziani, S.; Emwas, A. H.; Lodi, A.; Ludwig, C.; Bunce, C. M.; Viant, M. R.; Gunther, U. L. Anal Biochem 2008, 377, 16-23.
(19) Hwang, T. L.; Shaka, A. J. J Magn Reson, Ser A 1995, 112, 275-279.
(20) Hays, P. A.; Thompson, R. A. Magn Reson Chem 2009, 47, 819-824.
(21) Psychogios, N.; Hau, D. D.; Peng, J.; Guo, A. C.; Mandal, R.; Bouatra, S.; Sinelnikov, I.; Krishnamurthy, R.; Eisner, R.; Gautam, B.; Young, N.; Xia, J.; Knox, C.; Dong, E.; Huang, P.; Hollander, Z.; Pedersen, T. L.; Smith, S. R.; Bamforth, F.; Greiner, R., et al. PLoS One 2011, 6, e16957.
(22) Shrestha, A.; Mullner, E.; Poutanen, K.; Mykkanen, H.; Moazzami, A. A. Eur J Nutr 2017, 56, 671-681.
(23) Moazzami, A. A.; Shrestha, A.; Morrison, D. A.; Poutanen, K.; Mykkanen, H. J Nutr 2014, 144, 807-814.
