Simultaneous Single-Channel Multiplexing and Quantification of


Article

Simultaneous Single-Channel Multiplexing and Quantification of Carbapenem-Resistant Genes using Multidimensional Standard Curves
Jesus Rodriguez-Manzano, Ahmad Moniri, Kenny Malpartida-Cardenas, Jyothsna Dronavalli, Frances Davies, Alison Holmes, and Pantelis Georgiou
Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.8b04412 • Publication Date (Web): 09 Jan 2019


Analytical Chemistry

Simultaneous Single-Channel Multiplexing and Quantification of Carbapenem-Resistant Genes using Multidimensional Standard Curves

Jesus Rodriguez-Manzano§‡*, Ahmad Moniri§‡, Kenny Malpartida-Cardenas§, Jyothsna Dronavalliͳ, Frances Daviesͳ, Alison Holmes║, and Pantelis Georgiou§

§Centre for Bio-Inspired Technology, Department of Electrical and Electronic Engineering, Imperial College London, London, SW7 2AZ, United Kingdom
ͳImperial College Healthcare NHS Trust, St. Mary's Hospital, Praed Street, London, W2 1NY, United Kingdom
║National Institute for Health Research Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, Imperial College London, Hammersmith Campus, W12 0NN, London, United Kingdom

ABSTRACT: Multiplexing and quantification of nucleic acids each have, in their own right, significant and extensive use in biomedical fields. Currently, the effort needed to detect several nucleic acid targets in a single reaction scales linearly with the number of targets, which makes it expensive and time-consuming. Here, we propose a new methodology based on multidimensional standard curves that extends the use of real-time PCR data obtained by common qPCR instruments. By applying this methodology, we achieve simultaneous single-channel multiplexing and enhanced quantification of multiple targets using only real-time amplification data. This is achieved without the need for fluorescent probes, agarose gels, melting curves or sequencing analysis. Given the importance of, and demand for, tackling challenges in antimicrobial resistance, the proposed method is applied to four of the most prominent carbapenem-resistant genes: blaOXA-48, blaNDM, blaVIM and blaKPC, which account for 97% of the UK's reported carbapenemase-producing Enterobacteriaceae.

This work demonstrates simultaneous multiplex qPCR and absolute quantification using standard curves, employing only a single fluorescence channel and no post-PCR analysis, so that it can be used with conventional qPCR instruments. This is achieved by extending the quantification framework proposed by Moniri et al.1 to establish that multidimensional standard curves (MSCs) can also be used for target identification. The proposed method is validated for the detection of the β-lactamase genes blaOXA-48, blaNDM, blaVIM and blaKPC using bacterial isolates from clinical samples in a single reaction, without fluorescent probes, agarose gels, melting curves or sequencing analysis. Table 1 summarises the breakdown of confirmed carbapenemase-producing Enterobacteriaceae (CPE) cases in the UK from 2003 to 2015. The drug-resistant genes chosen in this study cover over 97% of the total reported cases. Diagnostic instruments that incorporate our methodology will greatly expand the applicability of emerging molecular technologies2,3.

Table 1. Laboratory-confirmed cases of carbapenemase-producing Enterobacteriaceae from UK laboratories (2003–2015).

Carbapenemases    blaOXA-48   blaNDM   blaVIM   blaKPC   Others
Total cases       1325        1129     491      3260     189
Percentage (%)    20.73       17.67    7.68     51.01    2.96

Data obtained from Public Health England's Antimicrobial Resistance and Healthcare Associated Infections Reference Unit.4

Invasive infections with carbapenemase-producing strains are associated with high mortality rates (up to 40–50%) and represent a major public health concern worldwide5,6. Rapid and accurate screening for carriage of CPE is essential for successful infection prevention and control strategies as well as bed management7,8. However, routine laboratory detection of CPE based on carbapenem susceptibility is challenging9: (i) culture-based methods have limited sensitivity and long turnaround times; (ii) nucleic acid amplification techniques (NAATs) are often too expensive and require equipment too sophisticated to be used as a screening tool in healthcare systems; and (iii) multiplexed NAATs have not been able to meet the demand for high-level multiplexing using available technologies. There is an unmet clinical need for new molecular tools that can be successfully adopted within existing healthcare settings. The proposed method allows existing technologies to benefit from the advantages of multiplex PCR assays whilst reducing the complexity of CPE screening, resulting in a time- and cost-effective solution. This is enabled by changing the fundamental approach of current data analytic techniques for the quantification of nucleic acids from unidimensional to multidimensional. Figure 1 compares (a) the conventional approach with (b) the proposed method for single-channel multiplexing. In the first, conventional standard curves are constructed, but multiple targets cannot be differentiated and quantified without post-PCR processing. In contrast, the proposed method constructs multidimensional standard curves, extracting more information from the amplification curves and allowing for simultaneous quantification and multiplexing in a single channel. This work uses CPE as a clinically relevant case study; however, the authors invite researchers to explore other targets and amplification chemistries in order to expand the capabilities of current state-of-the-art technologies.

[Figure 1 (workflow diagram). An unknown DNA sample undergoes multiplex qPCR for Target 1, Target 2 and Target 3, giving a real-time amplification curve (fluorescence versus cycles). (a) Unidimensional analysis: conventional standard curves (Std 1, Std 2, Std 3; feature α versus [target]) followed by post-PCR processing (melting curves, −dF/dT versus temperature over 75 °C to 95 °C; agarose gels; sequencing) for target identification and quantification. (b) Multidimensional analysis: feature extraction (α, β, γ), multidimensional standard curves (Clusters 1 to 3 in feature space, with the unknown sample placed among them) and 𝑀0 versus [target] for target identification and enhanced quantification.]

Figure 1. Illustration of the experimental workflow for single-channel multiplex quantitative PCR using the unidimensional and multidimensional approaches. An unknown DNA sample is amplified by multiplex qPCR for targets 1, 2 and 3. Features denoted by the dummy variables 𝛼, 𝛽 and 𝛾 are extracted from the amplification curve. It is important to stress that any number of targets and features could have been chosen. (a) Unidimensional analysis. Three conventional standard curves are generated through serial dilution of the known targets using a single feature. Because the target cannot be identified from these standard curves, post-PCR analysis is required for target identification and quantification. (b) Multidimensional analysis. Three multidimensional standard curves are constructed through serial dilution of the known targets using multiple features. The unknown samples can be confidently classified, and enhanced quantification can be achieved by combining all the features into a unified feature called 𝑀0.1

EXPERIMENTAL SECTION

Primers and amplification reaction conditions. All oligonucleotides used in this study were synthesised by IDT (Integrated DNA Technologies, Germany) with no additional purification. Primers were previously reported by Monteiro et al.10 (see Table 2). Each amplification reaction was performed in a final volume of 5 µL containing: 2.5 µL of 2× FastStart Essential DNA Green Master (Roche Diagnostics, Germany), 1 µL of PCR-grade water, 0.5 µL of 10× multiplex PCR primer mixture containing the four primer sets (5 µM of each primer) and 1 µL of synthetic DNA or bacterial genomic DNA at different concentrations. PCR amplification consisted of 10 min at 95 °C followed by 45 cycles of 95 °C for 20 s, 68 °C for 45 s and 72 °C for 30 s. In order to validate the proposed method, the results were compared against melting curve analyses. One melting cycle was performed at 95 °C for 10 s, 65 °C for 60 s and 97 °C for 1 s (continuous reading from 65 °C to 97 °C). Each experimental condition was run 5 to 8 times, loading the reactions into LightCycler 480 Multiwell Plates 96 (Roche Diagnostics, Germany) and using a LightCycler 96 Real-Time PCR System (Roche Diagnostics, Germany). Appropriate negative and positive controls were included in each experiment.
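As a quick arithmetic check of the recipe above, the sketch below totals the per-reaction volumes, derives the final concentration of each primer and sums the programmed hold times. The helper names and the 10% pipetting overage are illustrative assumptions, not values from the paper; ramp times between temperatures are ignored, so a real run takes longer than the hold time computed here.

```python
# Per-reaction volumes in microlitres, taken from the recipe in the text.
COMPONENTS_UL = {
    "2x FastStart Essential DNA Green Master": 2.5,
    "PCR-grade water": 1.0,
    "10x multiplex primer mixture": 0.5,
    "DNA template": 1.0,
}
PRIMER_STOCK_UM = 5.0  # concentration of each primer in the 10x mixture


def final_concentration(stock, added_ul, total_ul):
    """Dilution rule: C_final = C_stock * V_added / V_total."""
    return stock * added_ul / total_ul


def master_mix(n_reactions, overage=0.10):
    """Bulk volumes (template excluded) for n reactions.

    The overage is a hypothetical bench allowance, not a value from the paper.
    """
    scale = n_reactions * (1.0 + overage)
    return {name: round(volume * scale, 2)
            for name, volume in COMPONENTS_UL.items()
            if name != "DNA template"}


total_ul = sum(COMPONENTS_UL.values())
primer_final_um = final_concentration(
    PRIMER_STOCK_UM, COMPONENTS_UL["10x multiplex primer mixture"], total_ul)

# Programmed hold times only: 10 min at 95 C, then 45 cycles of 20 s + 45 s + 30 s.
hold_seconds = 10 * 60 + 45 * (20 + 45 + 30)

print(f"reaction volume: {total_ul} uL")
print(f"final primer concentration: {primer_final_um} uM each")
print(f"programmed hold time: {hold_seconds / 60:.2f} min")
print(master_mix(8))
```

Under these assumptions the 10× primer mixture is diluted tenfold, so each primer is present at 0.5 µM in the 5 µL reaction.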

Synthetic DNA and bacterial isolates. Four synthetic double-stranded DNA fragments (gBlock Gene Fragments) were purchased from IDT and resuspended in TE buffer to 10 ng/µL stock solutions (stored at −20 °C until further use). The concentrations of all DNA stock solutions were determined using a Qubit 3.0 fluorimeter (Life Technologies). The synthetic templates contained the DNA sequences from the blaOXA-48, blaNDM, blaVIM and blaKPC genes required for the multiplex qPCR assay (Table S1). Standard curves were constructed using the following DNA concentrations: blaOXA-48 (10^8 to 10^4 copies/reaction), blaNDM (10^7 to 10^1 copies/reaction), blaVIM (10^8 to 10^3 copies/reaction) and blaKPC (10^8 to 10^3 copies/reaction). Pure bacterial cultures from clinical isolates were used in this study, as described in Table 3. One loop of colonies from each pure culture was suspended in 50 µL of digestion buffer at pH 8.0 (Tris-HCl 10 mmol/L, EDTA 1 mmol/L and 5 U/µL lysozyme) and incubated at 37 °C for 30 min in a dry bath. Subsequently, 0.75 µL of proteinase K at 20 µg/µL (Sigma) was added, and the solution was incubated at 56 °C for 30 min. Afterwards, the solution was boiled for 10 min to inactivate proteinase K, the samples were centrifuged at 10,000 × g for 5 min, and the supernatant was transferred to a new tube and stored at −80 °C before use. Sample 9 was generated by mixing samples 6 and 8 in equal proportions. The non-CPE producers Klebsiella pneumoniae and Escherichia coli were included as control strains.

Table 2. Primers used for the multiplex qPCR assay

Target      Primer   Sequence                    Size
blaOXA-48   F        TGTTTTTGGTGGCATCGAT         177
            R        GTAAMRATGCTTGGTTCGC
blaNDM      F        TTGGCCTTGCTGTCCTTG          82
            R        ACACCAGTGACAATATCACCG
blaVIM      F        GTTTGGTCGCATATCGCAAC        382
            R        AATGCGCAGCACCAGGATAG
blaKPC      F        TCGCTAAACTCGAACAGG          785
            R        TTACTGCCCGTTGACGCCCAATCC

Primer sequences were previously reported by Monteiro et al.10 Sequences are given in the 5' to 3' direction. Size is given in base pairs and denotes PCR amplification products.
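The reverse primer for blaOXA-48 in Table 2 contains the letters M and R, which are IUPAC ambiguity codes (M = A or C, R = A or G), so the oligonucleotide is a mixture of sequences rather than a single one. The sketch below enumerates that mixture; the function name is hypothetical and the code lookup table is the standard IUPAC nucleotide alphabet.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer):
    """Return every unambiguous sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(bases) for bases in product(*choices)]


# Reverse primer for blaOXA-48, copied from Table 2.
variants = expand_degenerate("GTAAMRATGCTTGGTTCGC")
for sequence in variants:
    print(sequence)
```

With one two-fold position (M) and another two-fold position (R), the primer expands to 2 × 2 = 4 distinct sequences.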

Table 3. Bacterial isolates used in this study

Sample ID   Bacterial isolate          Carbapenemase genes
1           Klebsiella pneumoniae      blaOXA-48
2           Escherichia coli           blaOXA-48
3           Citrobacter freundii       blaVIM
4           Escherichia coli           blaNDM
5           Klebsiella pneumoniae      blaOXA-48
6           Klebsiella pneumoniae      blaNDM
7           Pseudomonas aeruginosa     blaVIM
8           Klebsiella pneumoniae      blaKPC
9           Klebsiella pneumoniae      blaNDM + blaKPC
10          Klebsiella pneumoniae      non-producer
11          Escherichia coli           non-producer

Multidimensional standard curves. The data analysis for simultaneous quantification and multiplexing is achieved by extending the framework described in Moniri et al.1 This framework provides a generalisation of the approach for quantification of nucleic acids using standard curves. The stages of data analysis are as follows: pre-processing, curve fitting, multiple feature extraction, high-dimensional line fitting, similarity measure, feature weighting, and dimensionality reduction. The major difference from the conventional approach to quantification is that multiple features are extracted from the amplification curves. The stages used in this work, also referred to as the instance of the framework, are described below and summarised in Table 4.

Table 4. Instance of the framework proposed in Moniri et al.1

Data analysis stage        Method                                  Ref.
Pre-processing             Baseline correction                     -
Curve fitting              5-parameter sigmoid                     11
Feature extraction         𝐶𝑡, 𝐶𝑦 and −log10(𝐹0)                   12–14
Line fitting               Method of least squares                 15
Similarity measure         Mahalanobis distance: 𝑑                 1,16
Feature weights            Minimise figure of merit: 𝑄             1
Dimensionality reduction   Principal component regression: 𝑀0      15
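The first two stages in Table 4 can be sketched with NumPy and SciPy. This is a minimal illustration, not the authors' implementation: it assumes the usual Richards form of the 5-parameter sigmoid, uses SciPy's `curve_fit` with `method="trf"` (a trust-region reflective solver) as a stand-in for whichever trust-region routine the authors used, and takes the parameter bounds quoted in the curve-fitting description. The starting guess `p0` and the synthetic curve are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit


def richards(x, f_b, f_max, c, b, d):
    """5-parameter sigmoid: F(x) = F_b + F_max / (1 + exp(-(x - c) / b)) ** d."""
    return f_b + f_max / (1.0 + np.exp(-(x - c) / b)) ** d


def baseline_correct(fluorescence, n_baseline=5):
    """Subtract the mean of the first n_baseline readings from the curve."""
    fluorescence = np.asarray(fluorescence, dtype=float)
    return fluorescence - fluorescence[:n_baseline].mean()


def fit_richards(cycles, fluorescence):
    """Bounded least-squares fit of the Richards curve.

    Bounds are ordered as [F_b, F_max, c, b, d], as quoted in the text.
    """
    lower = [-0.5, -0.5, 0.0, 0.0, 0.7]
    upper = [0.5, 0.5, 50.0, 100.0, 10.0]
    # Hypothetical starting guess: plateau height, half-maximum cycle, unit shape.
    half_max_cycle = cycles[np.argmax(fluorescence >= 0.5 * fluorescence.max())]
    p0 = [0.0, min(fluorescence.max(), 0.49), float(half_max_cycle), 1.0, 1.0]
    popt, _ = curve_fit(richards, cycles, fluorescence, p0=p0,
                        bounds=(lower, upper), method="trf")
    return popt


# Synthetic amplification curve (illustrative values, not data from the study).
cycles = np.arange(1, 46, dtype=float)
raw = 0.03 + richards(cycles, 0.0, 0.2, 18.0, 1.5, 1.5)

corrected = baseline_correct(raw)
params = fit_richards(cycles, corrected)
residual = corrected - richards(cycles, *params)
print("fitted [F_b, F_max, c, b, d]:", np.round(params, 3))
print("largest absolute residual:", float(np.max(np.abs(residual))))
```

The individual parameters 𝑐, 𝑏 and 𝑑 are strongly correlated, so a sketch like this is better judged by the residual of the fit than by how closely each parameter matches a reference value.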

i) Pre-processing. The only pre-processing common to all features in this instance of the framework is background subtraction. This is accomplished by subtracting the mean of the first 5 fluorescence readings from every amplification curve.

ii) Curve fitting. The chosen model for curve fitting is the 5-parameter sigmoid (Richards curve) given by:

$$F(x) = F_b + \frac{F_{max}}{\left(1 + e^{-(x - c)/b}\right)^{d}} \tag{1}$$

where 𝑥 is the cycle number, 𝐹(𝑥) is the fluorescence at cycle 𝑥, 𝐹𝑏 is the background fluorescence, 𝐹𝑚𝑎𝑥 is the maximum fluorescence, 𝑐 is the fractional cycle of the inflection point, 𝑏 is related to the slope of the curve and 𝑑 allows for an asymmetric shape (Richards coefficient). The optimisation algorithm used to fit the curve to the data is the trust-region method based on the interior-reflective Newton method17,18. The lower and upper bounds for the 5 parameters, [𝐹𝑏, 𝐹𝑚𝑎𝑥, 𝑐, 𝑏, 𝑑], are [-0.5, -0.5, 0, 0, 0.7] and [0.5, 0.5, 50, 100, 10] respectively.

iii) Feature extraction. The features extracted from the amplification curves are 𝐶𝑡, 𝐶𝑦 and −log10(𝐹0). Therefore each point in the feature space is a vector in 3-dimensional space, i.e. $\boldsymbol{p} = [C_t, C_y, -\log_{10}(F_0)]^T$, where $[\cdot]^T$ denotes the transpose operator. Note that by convention, for the formulas in this paper, vectors are denoted using bold lowercase letters and matrices using bold uppercase letters. In order to compute the cycle threshold, 𝐶𝑡, the amplification curve is first fitted with the 5-parameter sigmoid in equation 1. The fit is then normalised with respect to the maximum fluorescence, and 𝐶𝑡 is the cycle at which the fit exceeds 0.2 (i.e. 20% of its maximum fluorescence). The second feature, proposed by Guescini et al.13 and referred to as 𝐶𝑦, also uses the 5-parameter sigmoidal curve fit and takes 𝐶𝑦 as the intersection between the abscissa and the tangent at the inflection point of the fitted Richards curve. The final feature, proposed by Rutledge14 and referred to in this paper as 𝐹0, fits the sigmoid up to a "cut-off cycle" and takes 𝐹0 as the fluorescence at cycle 0. Each feature has an underlying assumption and, therefore, the combination of the features is expected to increase the amount of information obtained from the amplification curve. For example, the 𝐶𝑡 approach assumes the PCR efficiency to be constant between reactions and cycles. The 𝐶𝑦 approach allows for different efficiencies between reactions but assumes a constant efficiency between cycles. The third feature, 𝐹0, allows for different efficiencies between reactions and additionally assumes that the efficiency decreases from cycle to cycle. The reader may wish to review these papers to understand each feature in greater depth.12–14

iv) Line fitting. In this work, the line fitting, which essentially constructs the MSC, is achieved by using the first principal direction in principal component analysis (PCA).

v) Similarity measure. The similarity measure used is the Mahalanobis distance, 𝑑, as seen in equation 3.18 This is a measure of similarity between a test point, 𝒑, and the distribution of training points from a specific MSC.

$$P = \phi(\boldsymbol{p}, \boldsymbol{q}_1, \boldsymbol{q}_2) = \frac{(\boldsymbol{p} - \boldsymbol{q}_1)^T (\boldsymbol{q}_2 - \boldsymbol{q}_1)}{(\boldsymbol{q}_2 - \boldsymbol{q}_1)^T (\boldsymbol{q}_2 - \boldsymbol{q}_1)} \tag{2}$$

$$d = \sqrt{\left(\boldsymbol{p} - P(\boldsymbol{q}_2 - \boldsymbol{q}_1)\right)^T \boldsymbol{\Sigma}^{-1} \left(\boldsymbol{p} - P(\boldsymbol{q}_2 - \boldsymbol{q}_1)\right)} \tag{3}$$

where 𝒒1 and 𝒒2 are two distinct points that lie on the MSC and 𝚺 is the covariance matrix of the training data. Note that under the assumption that the data are normally distributed, the squared Mahalanobis distance follows a 𝜒2-distribution.

vi) Feature weights. In order to maximise quantification performance, different weights, 𝜶, can be assigned to each feature. To accomplish this, a simple optimisation algorithm can be used to minimise an error measure. In this study, the error measure is the figure of merit described in the following subsection. The optimisation algorithm is the Nelder-Mead simplex algorithm19,20 with weights initialised to unity, i.e. beginning with no assumption about how good each feature is for quantification. This is a basic algorithm, and only 20 iterations are used to find the weights so that there is little computational overhead.

vii) Dimensionality reduction. In this study, principal component regression is used, i.e. 𝑀0 = 𝑃 from equation 2,17 and it is compared with projecting the standard curve onto each of the three dimensions, i.e. 𝐶𝑡, 𝐶𝑦 and −log10(𝐹0).

Evaluating standard curves. Consistent with the current literature on evaluating standard curves, relative error (RE) and average coefficient of variation (CV) are used to measure accuracy and precision respectively. The CV for each concentration is calculated after normalising the standard curves so that a fair comparison across standard curves is achieved. The formulas for the two measures are given by:

$$RE = 100 \times \left( \frac{\hat{x}_i}{x_i} - 1 \right) \tag{4}$$

where 𝑖 is the index of a given training point, $x_i$ is the true concentration of the 𝑖th training point and $\hat{x}_i$ is the estimate of $x_i$ using the standard curve.

$$CV = 100 \times \frac{\mathrm{std}(\hat{\boldsymbol{x}}_j)}{\mathrm{mean}(\hat{\boldsymbol{x}}_j)} \tag{5}$$

where 𝑗 is the index of a given concentration and $\hat{\boldsymbol{x}}_j$ is a vector of estimated concentrations for the concentration indexed by 𝑗. The functions std(·) and mean(·) return the sample standard deviation and sample mean of their vector arguments respectively. Borrowed from statistics, this paper also uses the leave-one-out cross-validation (LOOCV) error as a measure of stability and overall predictive performance15. Stability refers to the predictive performance when training points are removed. The equation for calculating the LOOCV is given as:

$$LOOCV = \sum_{i=1}^{N-1} \left( \boldsymbol{z}_i^{j} - \hat{\boldsymbol{z}}_i^{j} \right)^2 \tag{6}$$

where 𝑁 is the number of training points, 𝑖 is the index of a given training point, $\boldsymbol{z}_i$ is a vector of the true concentrations for all training points except the 𝑖th training point and $\hat{\boldsymbol{z}}_i$ is the estimate of $\boldsymbol{z}_i$ generated by the standard curve without the 𝑖th training point. In order for the optimisation algorithm that computes 𝜶 to minimise the three aforementioned measures simultaneously, it is convenient to introduce a figure of merit, 𝑄, that captures all of the desired properties. Therefore, 𝑄 is defined as the product of all three errors and can be used heuristically to compare the performance of quantification methods. The average 𝑄 across all training data points is the error measure that the optimisation algorithm minimises.

$$Q = RE \times CV \times LOOCV \tag{7}$$

Statistical Analysis. For sample classification, outliers were determined using a χ2-distribution with 2 degrees of freedom, and statistical significance was assumed for a p-value < 0.01. For assessing the significance between methods in absolute quantification, p-values were calculated using a paired, two-sided Wilcoxon signed-rank test. Statistically significant differences are denoted as * p-value < 0.05, ** p-value < 0.01, *** p-value < 0.001 and **** p-value < 0.0001.

RESULTS AND DISCUSSION

In this study, it is shown that simultaneous enhanced quantification and multiplexing of the blaOXA-48, blaNDM, blaVIM and blaKPC β-lactamase genes in bacterial isolates can be achieved by using multidimensional standard curves constructed from fluorescent amplification curves in qPCR. This section is divided into two parts: (i) target discrimination using multidimensional analysis, and (ii) enhanced quantification. First, it is demonstrated that single-channel multiplexing can be achieved. Once this has been established, the framework described in Moniri et al.1 can be applied for robust and enhanced quantification.

Target Discrimination using Multidimensional Analysis. Because it is not obvious that several targets can be multiplexed and differentiated using only fluorescent amplification data in a single channel, it is helpful to visualise an example. Figure 2 shows four amplification curves and their respective derived melting curves specific for the blaOXA-48, blaNDM, blaVIM and blaKPC genes. The four curves have been chosen to have similar 𝐶𝑡 (within 1.2 cycles). Using only this information, i.e. the conventional way of thinking, post-PCR processing such as melting curve analysis is needed to differentiate the targets. The same argument applies when solely observing 𝐶𝑦 or 𝐹0. This is an expected result given that these parameters are used for quantification and were not intended for target identification. However, can observing all three features together provide information for target discrimination?
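The three features used throughout (𝐶𝑡, 𝐶𝑦 and 𝐹0) can be written in closed form once the five parameters of equation 1 are known. The sketch below is one reading of the definitions given in the Experimental Section, under stated assumptions: the fit is normalised as (𝐹 − 𝐹𝑏)/𝐹𝑚𝑎𝑥 before applying the 0.2 threshold, 𝐶𝑦 is taken where the tangent at the inflection point crosses 𝐹 = 0, and 𝐹0 is evaluated from a fit over all cycles rather than up to a cut-off cycle as in Rutledge's method. The parameter values are illustrative, not fitted values from the study.

```python
import math


def richards(x, f_b, f_max, c, b, d):
    """Equation 1 evaluated at a single cycle x."""
    return f_b + f_max / (1.0 + math.exp(-(x - c) / b)) ** d


def ct_feature(c, b, d, threshold=0.2):
    """Cycle at which (F - F_b) / F_max reaches the threshold (assumed normalisation)."""
    return c - b * math.log(threshold ** (-1.0 / d) - 1.0)


def cy_feature(f_b, f_max, c, b, d):
    """Cycle where the tangent at the inflection point crosses F = 0."""
    x_flex = c + b * math.log(d)                         # F''(x) = 0
    f_flex = f_b + f_max * (d / (d + 1.0)) ** d          # F at the inflection point
    slope = (f_max / b) * (d / (d + 1.0)) ** (d + 1.0)   # F' at the inflection point
    return x_flex - f_flex / slope


def neg_log10_f0(f_max, c, b, d):
    """-log10 of the fitted fluorescence above background at cycle 0 (no cut-off cycle)."""
    f0 = f_max / (1.0 + math.exp(c / b)) ** d
    return -math.log10(f0)


# Illustrative parameters for a baseline-corrected curve.
f_b, f_max, c, b, d = 0.0, 0.2, 18.0, 1.5, 1.5
ct = ct_feature(c, b, d)
cy = cy_feature(f_b, f_max, c, b, d)
f0_feature = neg_log10_f0(f_max, c, b, d)
print(f"Ct = {ct:.2f}, Cy = {cy:.2f}, -log10(F0) = {f0_feature:.2f}")
```

The point of the sketch is that all three features are deterministic functions of the same fitted curve, yet each weights the curve's shape parameters (𝑏 and 𝑑) differently, which is why they can carry complementary information.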

[Figure 2 (four panels). Each panel shows an amplification curve (fluorescence versus cycles, 0 to 40) with an inset melting curve (−dF/dT versus temperature, 75 °C to 95 °C). Values annotated on the panels:

Panel   Target      Tm (°C)   Ct     Cy     F0            [DNA target] (copies/reaction)
(a)     blaOXA-48   80.4      19.6   17.4   3.2 × 10^-6   10^7
(b)     blaNDM      84.0      19.3   17.0   1.2 × 10^-6   10^5
(c)     blaVIM      89.5      18.7   16.9   2.1 × 10^-7   10^5
(d)     blaKPC      91.6      19.9   17.4   6.3 × 10^-6   10^6
]

Figure 2. Experimentally obtained amplification and melting curves by single-channel multiplex quantitative PCR for: (a) blaOXA-48, (b) blaNDM, (c) blaVIM and (d) blaKPC genes. Extracted features (𝐶𝑡, 𝐶𝑦 and 𝐹0) are shown on the respective plots. Each plot has been generated by amplifying standard synthetic DNA containing the target of interest. Background subtraction has been performed on all amplification curves and all samples contain SYBR Green I dye.

Moniri et al.1 showed that considering multiple features provides sufficient information gain to discriminate outliers from a specific target using an MSC. However, this raises the question: does the outlier lie on its own MSC? If so, can we take advantage of this property and build several multidimensional standard curves in order to discriminate multiple specific targets? To explore this new concept, MSCs are constructed using a single primer mix for the four target genes using 𝐶𝑡, 𝐶𝑦 and −log10(𝐹0), as shown in Figure 3. It is visually observed that the 4 standards are sufficiently distant in multidimensional space, also termed the feature space, to be distinguished. That is, an unknown DNA sample can potentially be classified as one of the specific targets (or as an outlier) solely using the features extracted from amplification curves in a single channel.

[Figure 3 (three-dimensional plot). Axes: 𝐶𝑡, 𝐶𝑦 and −log10(𝐹0), with an arrow indicating [DNA target].]

Figure 3. Multidimensional standard curves for detection of four carbapenemase genes: blaOXA-48 (purple line), blaNDM (red line), blaVIM (yellow line) and blaKPC (blue line). They were constructed using the 𝐶𝑡, 𝐶𝑦 and −log10(𝐹0) features extracted from real-time amplification curves obtained by amplifying 10-fold dilutions of synthetic DNA. From bottom left to top right, target concentrations range from 10^8 to 10^4 copies/reaction for blaOXA-48, from 10^7 to 10^1 copies/reaction for blaNDM, from 10^8 to 10^3 copies/reaction for blaVIM, and from 10^8 to 10^3 copies/reaction for blaKPC. Each concentration was repeated 5 to 8 times and the resulting average values are projected onto the standard curves. The computed features and curve-fitting parameters for each MSC are presented in Table S2.
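How one such curve can be built is sketched below: a line is fitted through the three-dimensional feature points using the first principal direction (the line-fitting stage), and each point is then reduced to the scalar 𝑃 of equation 2, which the paper uses as 𝑀0. The training data are synthetic and the function names are hypothetical; feature weighting is omitted, so this corresponds to unit weights.

```python
import numpy as np


def fit_msc(points):
    """Fit a line through 3-D feature points using the first principal direction.

    Returns two points on the line, q1 (the centroid) and q2 (one unit further along).
    """
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    # Rows of vt are the principal directions of the centred data.
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    return centroid, centroid + vt[0]


def project(p, q1, q2):
    """Equation 2: scalar position P of point p along the line through q1 and q2."""
    p, q1, q2 = (np.asarray(v, dtype=float) for v in (p, q1, q2))
    return float((p - q1) @ (q2 - q1) / ((q2 - q1) @ (q2 - q1)))


# Synthetic standards: 10^3 to 10^8 copies/reaction, 5 replicates each (illustrative).
rng = np.random.default_rng(0)
log_conc = np.repeat(np.arange(3.0, 9.0), 5)
ideal = np.column_stack([40.0 - 3.3 * log_conc,   # Ct-like feature
                         38.0 - 3.4 * log_conc,   # Cy-like feature
                         12.0 - 1.0 * log_conc])  # -log10(F0)-like feature
features = ideal + rng.normal(scale=0.1, size=ideal.shape)

q1, q2 = fit_msc(features)
m0 = np.array([project(p, q1, q2) for p in features])

# Quantification curve for M0: a straight line against log10(concentration).
slope, intercept = np.polyfit(m0, log_conc, 1)
estimated_log_conc = slope * m0 + intercept
print("largest error in log10(copies):",
      float(np.max(np.abs(estimated_log_conc - log_conc))))
```

The sign of the principal direction returned by the SVD is arbitrary, so 𝑀0 may increase or decrease with concentration; the regression step absorbs either orientation.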

In order to demonstrate the proposed method for multiplexing, the 11 samples (bacterial isolates) given in Table 3 were tested against the multidimensional standards. The specificity of all results was validated using melting curve analysis (Figure S1). As expected, samples 1 to 9 gave a positive outcome whereas samples 10 and 11 (control strains) showed no amplification. The similarity measure used to classify the unknown samples is the Mahalanobis distance, with a p-value of 0.01 as the threshold for deciding whether a sample is an outlier. The results from testing can be succinctly captured in the bar chart shown in Figure 4. There are two main observations: (i) the means of the test samples (bacterial isolates) that have a single resistance (samples 1-8) are correctly classified with a p-value < 0.01; (ii) the sample with multiple resistances (sample 9) is considered an outlier for all of the targets. Although multiple resistances are not currently common for CPE in the UK (accounting for less than 1.3% of confirmed cases),4 there is room to explore extending MSCs to other applications that require detecting multiple targets in a single reaction. However, this is outside the scope of this study.
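The classification rule can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the residual of a point is measured from the foot of its perpendicular on the line (𝒒1 + 𝑃(𝒒2 − 𝒒1)), residuals are expressed in a two-dimensional basis of the plane orthogonal to the MSC, and the covariance is estimated from the training residuals in that plane. With 2 degrees of freedom the χ2 survival function is exp(−x/2), so a p-value of 0.01 corresponds to a squared distance of −2 ln(0.01) ≈ 9.21. The two standards and the test points are synthetic.

```python
import math
import numpy as np


def fit_line(points):
    """Centroid and unit first principal direction of the training points."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    return centroid, vt[0]


def orthogonal_basis(direction):
    """Two orthonormal vectors spanning the plane orthogonal to a 3-D direction."""
    _, _, vt = np.linalg.svd(direction.reshape(1, 3))
    return vt[1:]


class Standard:
    """One multidimensional standard curve with its residual covariance."""

    def __init__(self, training_points):
        pts = np.asarray(training_points, dtype=float)
        self.q1, direction = fit_line(pts)
        self.basis = orthogonal_basis(direction)
        residuals = (pts - self.q1) @ self.basis.T   # component along the line drops out
        self.cov = np.cov(residuals, rowvar=False)

    def distance_sq(self, point):
        """Squared Mahalanobis distance of a point from the curve."""
        r = (np.asarray(point, dtype=float) - self.q1) @ self.basis.T
        return float(r @ np.linalg.solve(self.cov, r))


def chi2_threshold_2dof(p_value):
    """Chi-square quantile for 2 degrees of freedom: survival function is exp(-x / 2)."""
    return -2.0 * math.log(p_value)


def classify(point, standards, p_value=0.01):
    """Names of all standards for which the point is not an outlier."""
    limit = chi2_threshold_2dof(p_value)
    return [name for name, std in standards.items()
            if std.distance_sq(point) < limit]


# Two synthetic standards running parallel to each other in feature space.
rng = np.random.default_rng(1)
log_conc = np.repeat(np.arange(3.0, 9.0), 5)
line_a = np.column_stack([40.0 - 3.3 * log_conc, 38.0 - 3.4 * log_conc, 12.0 - log_conc])
line_b = line_a + np.array([0.0, 2.0, 0.0])
standards = {
    "target A": Standard(line_a + rng.normal(scale=0.1, size=line_a.shape)),
    "target B": Standard(line_b + rng.normal(scale=0.1, size=line_b.shape)),
}

on_a = [21.85, 19.3, 6.5]   # lies on the noise-free line of target A
far = [21.85, 25.0, 6.5]    # far from both lines
print(classify(on_a, standards))
print(classify(far, standards))
```

An empty list plays the role of the "outlier for all standards" outcome reported for sample 9.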

[Figure 4 (bar charts). One panel per standard: blaOXA-48, blaNDM, blaVIM and blaKPC.]

Figure 4. Average Mahalanobis distance between multidimensional standard curves and sample test points (bacterial isolates) used for target identification. Dots below the sample ID indicate that the test sample is classified to the standard of interest with a p-value < 0.01.

In addition to observing the average Mahalanobis distance, it is important to visualise the data in order to confirm that the Mahalanobis distance is a suitable similarity measure. Figure 5 shows the Mahalanobis space for the four standards. This visualisation is constructed by projecting all data points onto an arbitrary hyperplane orthogonal to each multidimensional standard curve, as described in Moniri et al.1 When the training data points in the feature space are approximately normally distributed, the distribution of the training data points in the Mahalanobis space is approximately circular, as seen in Figure 5. It can also be observed that the training points (synthetic DNA) from each standard curve are clustered together (i.e. not considered outliers) in their respective Mahalanobis space; however, they are considered outliers for the other MSCs. This corroborates that there is sufficient information in the 3 chosen features to distinguish the 4 standard curves.

Enhanced Quantification. Given that multiplexing has been established for this case study, quantification can be obtained using any conventional method such as the gold-standard cycle threshold, 𝐶𝑡. However, as shown in Moniri et al.1, enhanced quantification can be achieved using a feature, 𝑀0, that combines all of the features. This is enabled by weighting each feature through optimisation of an objective function and then applying a dimensionality reduction technique in order to create a quantification curve for 𝑀0. The objective function in this study is a figure of merit, 𝑄, that combines accuracy, precision, stability and overall predictive power, as described in the Experimental Section. Figure 6 shows the average figure of merit (with standard deviation and p-values) for each target using the 3 chosen features (𝐶𝑡, 𝐶𝑦 and −log10(𝐹0)) and 𝑀0. Please see Figure S2 for the quantification curves and details of the figure of merit. It can be observed from Figure 6 that quantification using 𝑀0 performs as well as, or better than, any single feature for every target. This is expected given the nature of the multidimensional framework, as 𝑀0 is constructed using a linear combination of the other features in order to minimise the average figure of merit. It is important to stress that any figure of merit could be selected. For blaOXA-48, blaVIM and blaKPC, the average figure of merit of 𝑀0 was reduced by 26.5%, 41.1% and 12.9% respectively compared with the best single feature, 𝐶𝑦, with all p-values < 0.05. Furthermore, for blaNDM, the optimisation algorithm showed that 𝑀0 converged to 𝐶𝑡 and that the difference between 𝐶𝑡 and 𝐶𝑦 was not significant. Therefore, in this case, it is acceptable to use either 𝐶𝑡, 𝐶𝑦 or 𝑀0. However, 𝑀0 provides an automated solution for quantification that is robust in the sense that it will always be the best performing method.
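To make the figure of merit concrete, the sketch below computes RE, CV, LOOCV and their product for a conventional single-feature standard curve. Several details are assumptions where the text leaves room for interpretation: RE is taken in absolute value, LOOCV errors are computed on log10 concentrations, the normalisation step before CV is omitted, and 𝑄 is formed as the product of the three averages. The data are synthetic and the function names are hypothetical.

```python
import numpy as np


def fit_curve(feature, log_conc):
    """Least-squares standard curve: feature = slope * log10(conc) + intercept."""
    slope, intercept = np.polyfit(log_conc, feature, 1)
    return slope, intercept


def estimate_log_conc(feature, slope, intercept):
    """Invert the standard curve to estimate log10(concentration)."""
    return (feature - intercept) / slope


def relative_error(feature, log_conc):
    """Equation 4 per training point (absolute value is an assumption)."""
    estimate = 10.0 ** estimate_log_conc(feature, *fit_curve(feature, log_conc))
    return 100.0 * np.abs(estimate / 10.0 ** log_conc - 1.0)


def coefficient_of_variation(feature, log_conc):
    """Equation 5 per concentration level, using the sample standard deviation."""
    estimate = 10.0 ** estimate_log_conc(feature, *fit_curve(feature, log_conc))
    return np.array([100.0 * estimate[log_conc == level].std(ddof=1)
                     / estimate[log_conc == level].mean()
                     for level in np.unique(log_conc)])


def loocv(feature, log_conc):
    """Squared error of the remaining points after refitting without point i."""
    errors = []
    for i in range(len(feature)):
        keep = np.arange(len(feature)) != i
        slope, intercept = fit_curve(feature[keep], log_conc[keep])
        estimate = estimate_log_conc(feature[keep], slope, intercept)
        errors.append(np.sum((log_conc[keep] - estimate) ** 2))
    return np.array(errors)


def figure_of_merit(feature, log_conc):
    """Q as the product of the three averaged error measures (assumed aggregation)."""
    return float(relative_error(feature, log_conc).mean()
                 * coefficient_of_variation(feature, log_conc).mean()
                 * loocv(feature, log_conc).mean())


# Same synthetic standard curve with a small and a large amount of replicate noise.
rng = np.random.default_rng(2)
log_conc = np.repeat(np.arange(3.0, 9.0), 5)
noise = rng.normal(size=log_conc.size)
ct_clean = 40.0 - 3.3 * log_conc + 0.05 * noise
ct_noisy = 40.0 - 3.3 * log_conc + 0.50 * noise

q_clean = figure_of_merit(ct_clean, log_conc)
q_noisy = figure_of_merit(ct_noisy, log_conc)
print(f"Q (low noise) = {q_clean:.3g}, Q (high noise) = {q_noisy:.3g}")
```

Because 𝑄 multiplies the three measures, a feature has to do reasonably well on accuracy, precision and stability at once to score well; a weighted combination such as 𝑀0 would be evaluated by passing its values in place of the single feature.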

[Figure 5 panels: (a) Mahalanobis space for blaOXA; (b) Mahalanobis space for blaNDM; (c) Mahalanobis space for blaVIM; (d) Mahalanobis space for blaKPC; (e-h) magnified views, each with a circle marked p = 0.01. Legend: standard of interest; other standards; test points; test points mean per sample (S1 – S8).]

Figure 5. Multidimensional analysis using the feature space for identification of unknown samples. (a-d) All data points, including the replicates for each concentration for the four MSCs (training standard points) and nine unknown samples (test points), have been projected onto arbitrary hyperplanes orthogonal to each MSC. (e-h) The previous plots are magnified to visualise the location of the samples relative to each standard of interest. The blue dots represent the data points for each standard of interest (5 to 8 replicates per concentration) and the black circle around them corresponds to a p-value of 0.01. Samples 1-8 are correctly classified with a p-value < 0.01 whereas sample 9 is considered an outlier for all standards. Please see Table S3 for details on the extracted features and sigmoidal curve fittings and Table S4 for the estimated quantification of samples using all methods.
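The projection and outlier test described for Figure 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the curve direction, origin and training offsets are synthetic, the helper names (`plane_basis`, `project`, `mahalanobis_sq`, `is_inlier`) are hypothetical, and the circle radius assumes that the squared Mahalanobis distance of normally distributed 2-D points follows a chi-square distribution with 2 degrees of freedom, whose upper tail is exp(-d²/2). Reading the p = 0.01 circle as the radius that leaves a tail probability of 0.01 is likewise an interpretation.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(v):
    norm = math.sqrt(dot(v, v))
    return tuple(x / norm for x in v)

def plane_basis(direction):
    """Two orthonormal vectors spanning the plane orthogonal to `direction`."""
    u = unit(direction)
    # Any helper axis not parallel to u works; the choice only rotates the plane.
    helper = (1.0, 0.0, 0.0) if abs(u[0]) < 0.9 else (0.0, 1.0, 0.0)
    e1 = unit(cross(u, helper))
    e2 = cross(u, e1)  # unit length because u and e1 are orthonormal
    return e1, e2

def project(point, origin, basis):
    """2-D coordinates of a 3-D feature point in the plane through `origin`."""
    rel = tuple(p - o for p, o in zip(point, origin))
    return dot(rel, basis[0]), dot(rel, basis[1])

def mahalanobis_sq(q, cloud):
    """Squared Mahalanobis distance from 2-D point q to a 2-D training cloud."""
    n = len(cloud)
    mx = sum(p[0] for p in cloud) / n
    my = sum(p[1] for p in cloud) / n
    # Sample covariance entries (n - 1 denominator).
    sxx = sum((p[0] - mx) ** 2 for p in cloud) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in cloud) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in cloud) / (n - 1)
    det = sxx * syy - sxy * sxy
    dx, dy = q[0] - mx, q[1] - my
    # d^2 = [dx dy] inv(S) [dx dy]^T with the 2x2 inverse written out.
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

def is_inlier(d_sq, alpha=0.01):
    """True if d_sq lies inside the chi-square (2 d.o.f.) circle for tail probability alpha."""
    return d_sq <= -2.0 * math.log(alpha)

# Synthetic example (assumed values, not data from the paper).
direction = (3.3, 3.3, 1.0)   # assumed direction of one standard curve
origin = (30.0, 28.0, 9.0)    # assumed point on that curve
u = unit(direction)
basis = plane_basis(direction)
offsets = [(-1, 0), (1, 0), (0, -1), (0, 1), (0.5, 0.5), (-0.5, -0.5)]
training = [
    tuple(o + t * ui + 0.1 * a * b1 + 0.1 * b * b2
          for o, ui, b1, b2 in zip(origin, u, basis[0], basis[1]))
    for t, (a, b) in enumerate(offsets)
]
cloud = [project(p, origin, basis) for p in training]
```

A test point is then projected with the same `origin` and `basis` and accepted for that standard only if `is_inlier(mahalanobis_sq(...))` holds; a point rejected by every standard would be treated as an outlier, as sample 9 is in Figure 5.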

[Figure 6 panels: blaOXA-48, blaNDM, blaKPC and blaVIM; y-axis: Figure of Merit (Q); bars: Ct, Cy, F0 and M0.]

Figure 6. Figure of merit comparing conventional features with 𝑀0 for absolute quantification. 𝐹0 denotes −log10(𝐹0). Bar plots represent the average figure of merit for each feature and the error bars indicate the standard deviation. Please see Figure S2 for details.
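The reason a combined feature cannot do worse than the best single feature on the data used for optimisation can be illustrated with a short sketch. Everything here is an assumption made for illustration: the figure of merit is a stand-in (mean absolute error of the back-calculated log10 concentration) rather than the 𝑄 defined in the experimental section, the weights are restricted to non-negative values summing to 1, a coarse grid search replaces the optimiser used by the authors, the weighting and dimensionality reduction steps are collapsed into a single weighted sum, and the data are synthetic.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

def figure_of_merit(values, log_conc):
    """Stand-in Q (lower is better): mean absolute error of the log10
    concentration recovered by inverting the fitted standard curve."""
    a, b = fit_line(log_conc, values)
    if b == 0.0:
        return math.inf  # a flat curve cannot be inverted
    return sum(abs((v - a) / b - lc) for v, lc in zip(values, log_conc)) / len(values)

def combine(features, weights):
    """Combined feature for each reaction: weighted sum of its feature vector."""
    return [sum(w * f for w, f in zip(weights, row)) for row in features]

def best_weights(features, log_conc, steps=20):
    """Coarse grid search over non-negative weights that sum to 1."""
    best = None
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            w = (i / steps, j / steps, (steps - i - j) / steps)
            q = figure_of_merit(combine(features, w), log_conc)
            if best is None or q < best[0]:
                best = (q, w)
    return best

# Synthetic standards (assumed values, not data from the paper).
log_conc = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
noise = [0.3, -0.2, 0.1, -0.3, 0.2, -0.1]
features = [
    (40.0 - 3.3 * lc + e,       # Ct-like feature
     38.0 - 3.3 * lc - e,       # Cy-like feature with opposite error
     12.0 - 1.0 * lc + 2 * e)   # -log10(F0)-like feature, noisier
    for lc, e in zip(log_conc, noise)
]
single_q = [figure_of_merit([row[k] for row in features], log_conc) for k in range(3)]
q_m0, w_m0 = best_weights(features, log_conc)
```

Because the grid contains the weights (1, 0, 0), (0, 1, 0) and (0, 0, 1), every single feature is a candidate, so the selected combination can only match or improve on the best of them for the chosen figure of merit. When the optimum coincides with one of those corners, the combined feature reduces to that single feature, which mirrors the blaNDM case reported in the text.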

CONCLUSION

There has been a long-standing goal to meet the demand for methods offering high-level multiplexing and enhanced quantification. Any advancement in this area would have a substantial positive impact on healthcare and patient outcomes. Alongside these challenges, there is growing concern around antimicrobial resistance; the past 10 years have seen an explosion in molecular methods and instruments for rapid screening of drug-resistant genes21. Here, we propose a novel method that allows for simultaneous single-channel multiplexing and enhanced quantification using existing technologies, without any increase in complexity or cost over using conventional singleplex qPCR reactions for detecting multiple targets. This method has been validated for the detection of four of the most prominent carbapenem-resistant genes: blaOXA-48, blaNDM, blaVIM and blaKPC.

Methods for multiplexing and quantifying typically involve fluorescent probes, melting curve analysis, agarose gels or sequencing, all of which are time-consuming or expensive processes. In the last few years there have been attempts to achieve simultaneous multiplexing and quantification in a single channel. For instance, single-channel multiplexing has been achieved without melting curve analysis by altering cycling conditions and reading fluorescence at different temperatures22. This provided sufficient information to discriminate two targets. However, this method still uses a unidimensional approach to data analytics, and increasing the number of targets is not trivial. By extracting multiple features from existing data, our methodology represents an opportunity to evolve existing approaches in order to significantly increase the number of targets.

To implement our methodology, three steps are required: (i) building multiple multidimensional standard curves. This is generally a one-time procedure; however, MSCs may be affected by variations between experiments (such as changing reagent batches or instruments). Therefore, as with conventional standard curves, MSCs may eventually have to be recalibrated; (ii) designing multiplex assays such that the MSCs for each target are sufficiently distant in the feature space. This can be achieved by tuning reaction conditions or primer design, for example by altering the annealing temperature, amplicon length or primer mix concentration, or by introducing mismatches; and (iii) performing data processing (such as multi-feature extraction), the cost of which is negligible given the power of computers today.

In addition, given the nature of the multidimensional framework, absolute quantification is enhanced through the use of 𝑀0 by optimising a figure of merit combining accuracy, precision, stability and overall predictive power. The authors invite researchers in this area to adopt 𝑀0 as an alternative to conventional standard curves for absolute quantification, as it guarantees improved performance by combining the benefits of all the features it is derived from. As a result, 𝑀0 offers a robust method of quantification in the sense that it provides the best quantification performance across targets. Furthermore, the capabilities of MSCs extend beyond quantification and allow for outlier detection and target identification.

Given the novelty of this work, there are many future directions and questions that can be addressed. In this paper we have applied the proposed method to the rapid screening of the four most prominent carbapenemase genes in the UK. In future studies, it would be interesting to explore other targets, in order to develop new multiplexing panels associated with the most significant healthcare challenges, or a larger number of targets, by incorporating additional MSCs into the feature space and/or using multiple fluorescent channels. It is also important to stress that the focus of this work was not on optimising the chemistry or data analytics for this specific set of targets. Thus, there is room to investigate whether the chemistry and the instance of the framework can be optimised in order to maximise the separation of MSCs in the feature space for carbapenem-resistant genes.

In conclusion, this work has shown that it is possible to simultaneously quantify and multiplex several targets in a single channel. This is achieved by changing the way we analyse amplification data obtained from existing technologies. We hope that by sharing these ideas, researchers and practitioners can implement and advance this work in order to provide novel and affordable tools that can be easily adopted by healthcare systems.
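Step (i) of the implementation requirements listed in this conclusion, building a multidimensional standard curve, can be illustrated with a minimal sketch. It assumes, as a simplification not spelled out in this section, that each feature varies linearly with log10 concentration, so that an MSC is a straight line in the feature space. The function names (`fit_line`, `fit_msc`, `point_on_msc`) are hypothetical and the standards are synthetic.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

def fit_msc(log_conc, features):
    """Fit one straight line per feature against log10 concentration.

    Returns (intercepts, slopes), one entry per feature.
    """
    fits = [fit_line(log_conc, list(column)) for column in zip(*features)]
    intercepts = tuple(a for a, _ in fits)
    slopes = tuple(b for _, b in fits)
    return intercepts, slopes

def point_on_msc(msc, lc):
    """Feature-space point on the curve at log10 concentration `lc`."""
    intercepts, slopes = msc
    return tuple(a + b * lc for a, b in zip(intercepts, slopes))

# Synthetic, noise-free standards (assumed values, not data from the paper).
log_conc = [3.0, 4.0, 5.0, 6.0, 7.0]
features = [(40.0 - 3.3 * lc, 38.0 - 3.2 * lc, 12.0 - 1.0 * lc) for lc in log_conc]
msc = fit_msc(log_conc, features)
```

Under this assumption the tuple of slopes is the direction of the curve in the feature space, which is the quantity a projection onto an orthogonal hyperplane would need, and recalibration amounts to refitting with new standards.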

ASSOCIATED CONTENT

Supporting Information
The Supporting Information is available free of charge on the ACS Publications website. Synthetic DNA sequences, extended numerical values of features and sigmoidal fittings of standards and samples, melting analysis of standards and samples, standard curves generated utilising different features and quantification of samples using different methods (PDF).

AUTHOR INFORMATION

Corresponding Author
* Phone: (+44) 2075940843; Email: [email protected]

Author Contributions
The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript. ‡These authors contributed equally.

Notes
The authors declare no competing financial interest.


ACKNOWLEDGMENT

We would like to acknowledge the Imperial Confidence in Concepts - Joint Translational Fund, the Wellcome Trust ISSF (PS3111EESA to PG and JRM), the EPSRC Pathways to Impact (PSE394EESA to PG and JRM) and the EPSRC Global Challenge Research Fund (EP/P510798/1 to PG and JRM) for supporting this work. The research was also partially funded by the National Institute for Health Research Health Protection Research Unit (NIHR HPRU) in Healthcare Associated Infection and Antimicrobial Resistance at Imperial College London in partnership with Public Health England (PHE) (HPRU-201210047 to AH). The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR, the Department of Health or Public Health England.

REFERENCES

(1) Moniri, A.; Rodriguez-Manzano, J.; Georgiou, P. bioRxiv. 2018.
(2) Moser, N.; Rodriguez-Manzano, J.; Lande, T. S.; Georgiou, P. IEEE Trans. Biomed. Circuits Syst. 2018, 1–12.
(3) Moser, N.; Rodriguez-Manzano, J.; Yu, L.-S.; Kalofonou, M.; de Mateo, S.; Li, X.; Lande, T. S.; Toumazou, C.; Georgiou, P. In 2017 IEEE International Symposium on Circuits and Systems (ISCAS); IEEE, 2017; pp 1–1.
(4) Carbapenemase-Producing Enterobacteriaceae: Laboratory Confirmed Cases; United Kingdom, 2016.
(5) van Duin, D.; Doi, Y. Virulence 2017, 8 (4), 460–469.
(6) Otter, J. A.; Burgess, P.; Davies, F.; Mookerjee, S.; Singleton, J.; Gilchrist, M.; Parsons, D.; Brannigan, E. T.; Robotham, J.; Holmes, A. H. Clin. Microbiol. Infect. 2017, 23 (3), 188–196.
(7) Otter, J. A.; Doumith, M.; Davies, F. J.; Mookerjee, S.; Dyakova, E.; Gilchrist, M.; Brannigan, E. T.; Bamford, K. B.; Galletly, T.; Donaldson, H.; Aanensen, D. M.; Ellington, M. J.; Hill, R. R.; Turton, J. F.; Hopkins, K. L.; Woodford, N.; Holmes, A. H. Sci. Rep. 2017.
(8) Carmeli, Y.; Akova, M.; Cornaglia, G.; Daikos, G.; Garau, J.; Harbarth, S.; Rossolini, G. M.; Souli, M.; Giamarellou, H. Clin. Microbiol. Infect. 2010, 16, 102–111.
(9) Viau, R.; Frank, K. M.; Jacobs, M. R.; Wilson, B.; Kaye, K.; Donskey, C. J.; Perez, F.; Endimiani, A.; Bonomo, R. A. Clin. Microbiol. Rev. 2016, 29 (1), 1–27.
(10) Monteiro, J.; Widen, R.; Pignatari, A. C. C.; Kubasek, C.; Silbert, S. J. Antimicrob. Chemother. 2012, 67, 906–909.
(11) Spiess, A.-N.; Feig, C.; Ritz, C. BMC Bioinformatics 2008, 9, 221–232.
(12) Wittwer, C. T.; Herrmann, M. G.; Moss, A. A.; Rasmussen, R. P. Biotechniques 1997, 54 (6), 314–320.
(13) Guescini, M.; Sisti, D.; Rocchi, M.; Stocchi, L.; Stocchi, V. BMC Bioinformatics 2008, 9, 326–337.
(14) Rutledge, R. Nucleic Acids Res. 2004, 32, 178.
(15) Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning; Springer-Verlag New York, 2009.
(16) De Maesschalck, R.; Jouan-Rimbaud, D.; Massart, D. L. Chemom. Intell. Lab. Syst. 2000, 50 (1), 1–18.
(17) Coleman, T. F.; Li, Y. SIAM J. Optim. 1996, 6 (2), 418–445.
(18) Coleman, T. F.; Li, Y. Math. Program. 1994, 67 (1–3), 189–224.
(19) Nelder, J. A.; Mead, R. Comput. J. 1965, 7 (4), 308–313.
(20) Lagarias, J. C.; Reeds, J. A.; Wright, M. H.; Wright, P. E. SIAM J. Optim. 1998, 9 (1), 112–147.
(21) Osei Sekyere, J.; Govinden, U.; Essack, S. Y. J. Appl. Microbiol. 2015, 119 (5), 1219–1233.
(22) Lee, Y. J.; Kim, D.; Lee, K.; Chun, J. Y. Sci. Rep. 2014, 4, 7439–7444.

TOC

[TOC graphic. Panel labels: Multidimensional standard curves (Std 1, Std 2, Std 3; axes α, β, γ); Multidimensional analysis (Cluster 1, Cluster 2, Cluster 3; unknown sample marked x); Target identification; Quantification (𝑀0 against [DNA target]).]