Article Cite This: Anal. Chem. 2019, 91, 9119−9128

pubs.acs.org/ac

LC−MS/MS Software for Screening Unknown Erectile Dysfunction Drugs and Analogues: Artificial Neural Network Classification, Peak-Count Scoring, Simple Similarity Search, and Hybrid Similarity Search Algorithms

Inae Jang,† Jae-ung Lee,† Jung-min Lee,† Beom Hee Kim,‡ Bongjin Moon,† Jongki Hong,*,‡ and Han Bin Oh*,†



†Department of Chemistry, Sogang University, Seoul 04107, Republic of Korea
‡College of Pharmacy, Kyunghee University, Seoul 02447, Republic of Korea



Supporting Information

ABSTRACT: Screening and identifying unknown erectile dysfunction (ED) drugs and analogues, which are often illicitly added to health supplements, is a challenging analytical task. The analytical technique most commonly used for this purpose, liquid chromatography−tandem mass spectrometry (LC−MS/MS), is based on the strategy of searching the LC−MS/MS spectra of target compounds against database spectra. However, such a strategy cannot be applied to unknown ED drugs and analogues. To overcome this dilemma, we have constructed a standalone software named AI-SIDA (artificial intelligence screener of illicit drugs and analogues). AI-SIDA consists of three layers: LC-MS/MS viewer, AI classifier, and Identifier. In the second layer (AI classifier), an artificial neural network (ANN) classification model, constructed by training on 149 LC−MS/MS spectra (27 sildenafil-type, 6 vardenafil-type, and 11 tadalafil-type ED drugs/analogues and 105 other compounds), classifies the LC−MS/MS spectrum of a query compound into one of four categories: the sildenafil, vardenafil, and tadalafil families and non-ED compounds. This ANN model showed 100% classification accuracy for the 187 LC−MS/MS modeling and test data sets. In the third layer (Identifier), three search algorithms (peak-count scoring, simple similarity search, and hybrid similarity search) are implemented. In particular, the hybrid similarity search was found to be very powerful in identifying unknown ED drugs/analogues that differ by a single modification from the library ED drugs/analogues. Altogether, the AI-SIDA software provides a very useful and powerful platform for screening unknown ED drugs and analogues.

Received: April 3, 2019
Accepted: June 19, 2019
Published: June 19, 2019
© 2019 American Chemical Society
DOI: 10.1021/acs.analchem.9b01643 Anal. Chem. 2019, 91, 9119−9128

Erectile dysfunction (ED) drugs are phosphodiesterase type 5 (PDE-5) inhibitors that are designed for the treatment of male erectile dysfunction.1,2 The most popular ED drugs on the market are Viagra (sildenafil), Levitra (vardenafil), Cialis (tadalafil), and Stendra (avanafil). In recent years, the expiration of the patents on the original ED drugs has spurred the development of other ED drugs worldwide, and consequently, a very large number of ED drugs are currently available on the market.3,4 The ED drugs are, however, known to have some side effects such as headache, muscle aches, redness of face, heartburn, diarrhea, visual damage, and hearing loss, particularly on coadministration with nitrates or α-blockers.5,6

During the past decade, it has been reported that ED drugs and their illegal analogues are illicitly added to food or health supplements.3,4,7,8 In particular, many illicit ED analogues have been found to be slightly modified from the original ED drugs to avoid the screening process while keeping the ED drug efficacy to some degree.9,10 Scheme 1 shows the general structures of the sildenafil, vardenafil, and tadalafil families, in which substituent groups can be added, removed, or replaced with only a slight influence on their pharmacological effect. Undoubtedly, the lack of clinical evaluation of these illegally modified ED analogues constitutes a potential health risk.

Scheme 1. Structures of the (a) Sildenafil, (b) Vardenafil, and (c) Tadalafil Families^a
^a Details of the substituent groups, denoted with #, X, R1, and R2, can be found in Table S1 in the Supporting Information.

Liquid chromatography−tandem mass spectrometry (LC−MS/MS) is a powerful analytical method for screening and identifying illegal drugs that are illicitly added to food or health supplements.11−14 Acquisition of high-accuracy m/z values of the precursor ions and their fragments allows the ready identification of the query compounds by searching against existing databases. However, the identification becomes a challenging task when the query compound is not included in any database.15−17 In this regard, screening of ED drug analogues, which are constantly modified for the purpose of circumventing the screening process without drastically altering the pharmacological ED drug effect, constitutes a challenge to food and drug regulation agencies worldwide.

For the identification of ED drug analogues, the library matching approach is useful and widely used. However, this approach suffers from the so-called “coverage problem” resulting from the absence of a query compound in the database, which prevents its correct identification.18 To address this issue, a number of new approaches have been suggested. Linear discriminant analysis (LDA) models were developed to characterize compounds according to their structural class on the basis of their mass spectral features. For example, new psychoactive substances such as synthetic phenethylamines and tryptamines were classified using the LDA model constructed from the GC−MS spectra of a training set of standard analogues.19 The successful classification rate was evaluated to be 93%. Bonetti also showed that principal component analysis (PCA) followed by LDA for GC−MS spectra successfully differentiated positional isomers of fluoromethcathinone and fluorofentanyl.20 Another interesting approach is to produce an augmented reference library using machine learning. For instance, the competitive fragmentation modeling−electron ionization (CFM−EI) method predicted EI−MS spectra using a probabilistic model of the probability of breaking molecular bonds under EI−MS conditions.21 In this method, a stochastic simulation was used to determine the frequency of each molecular fragment. More recently, it was shown that a bidirectional multilayer perceptron neural network model could be used to produce an augmented reference library, thus enhancing the library matching performance.18

In 2017, an innovative GC−MS hybrid similarity search (HSS) algorithm was introduced to identify illicit drugs that do not exist in the library.22−25 The HSS algorithm was capable of identifying compounds that differed from library compounds only in a single inert structural component. In this approach, a hybrid mass spectrum was created in such a way that each original peak was conserved or shifted by a delta mass: that is, by the mass difference between the query compound and the library compound. The similarity match score was calculated between the query compound mass spectrum and the generated hybrid mass spectrum. The scope of this approach was later expanded to cover the elucidation of unknown oligosaccharides, modifications in proteomics, and structural annotations in untargeted metabolomics.23−25

This study reports the development of an integrated, standalone software in which an artificial neural network (ANN) classification model for the LC−MS/MS spectra of ED drugs/analogues is constructed and used for the classification of ED drugs and analogues into three ED drug types: i.e., sildenafil, vardenafil, and tadalafil. Furthermore, the use of different search algorithms such as peak-count scoring (PCS), simple similarity search (SSS), and HSS in the software allows the identification of unknown ED analogues. Additionally, we evaluate, compare, and describe the capability of the individual classification and search algorithms for characterizing unknown ED analogues.



EXPERIMENTAL SECTION

Material. In total, 92 compounds, a list of which is included in Table S1 in the Supporting Information (1−92 in the table), were subjected to LC−MS/MS experiments. Among them, the following 29 ED drugs and analogue compounds were purchased from Toronto Research Chemicals (Toronto, ON, Canada): norneosildenafil, sildenafil, hydroxyhomosildenafil, udenafil, propoxyphenylthiosildenafil, thiosildenafil, thiohomosildenafil, chlorodenafil, gendenafil, hongdenafil, oxohongdenafil, nitrodenafil, imidazosagatriazinone, acetylvardenafil, norneovardenafil, pseudovardenafil, vardenafil, aminotadalafil, N-desmethyl tadalafil, octylnortadalafil, tadalafil, chloropretadalafil, avanafil, quinethazone, oxyphenbutazone, diethylpropion, fenfluramine, naltrexone, and phendimetrazine. Dimethylsildenafil and benzylsibutramine were obtained from TLC Pharmaceutical Standards (Vaughan, Ontario, Canada). The rest of the compounds, which were non-ED drugs, were provided by the Korea Ministry of Food and Drug Safety (KMFDS). A standard solution of each compound was prepared in methanol at a concentration of 1 mg/L and stored at −18 °C before use. Water and methanol of HPLC grade were purchased from Burdick & Jackson (Ulsan, Korea).

LC−MS/MS. The mixtures of ED drugs/analogues and non-ED compounds were separated by using an ultrahigh-performance liquid chromatograph (UHPLC, Agilent Technologies, Palo Alto, CA, USA). The chromatographic separation was carried out on a Waters ACQUITY UPLC BEH C18 column (150 × 2.1 mm i.d., 1.7 μm). The mobile phases consisted of solvent A (0.1% formic acid in water) and solvent B (0.1% formic acid in acetonitrile). The gradient elution mode was programmed as follows: 35%−45% B for 0.0−0.5 min, 45%−100% B for 0.5−13.0 min, and 100% B for 13.0−15.0 min. The flow rate, injection volume, and column temperature were set at 0.30 mL/min, 2 μL, and 40 °C, respectively.
LC−MS/MS experiments were performed in positive-ion mode on a hybrid quadrupole time-of-flight mass spectrometer (Q-TOF MS, 6530, Agilent Technologies, Palo Alto, CA, USA). LC−MS/MS spectra were acquired in triplicate, and the averaged spectra were used for the statistical analysis. The following experimental parameters were used: nitrogen sheath gas temperature, 350 °C; flow, 11 L/min; capillary voltage, +4000 V; nebulizer pressure, 45 psi; drying gas, 8 L/min; gas temperature, 300 °C; fragment voltage, +175 V; skimmer voltage, +65 V; Oct 1 RF Vpp voltage, +750 V; collision voltage, +65 V; mass scan range, m/z 50−800.



COMPUTATIONS

Computational Methods. The software, named here “AI-SIDA (artificial intelligence screener of illicit drugs and analogues)”, was coded in MATLAB R2017a (The MathWorks, Inc., Natick, MA, USA) equipped with the “Deep learning toolbox” and the “Statistics and machine learning toolbox”. Since the software was designed to use the mzXML data format only, all experimentally obtained LC−MS/MS raw files were converted into mzXML format using the MSConvert program (ProteoWizard, http://proteowizard.sourceforge.net/download.html) before being loaded into the AI-SIDA software. The classification model for categorizing the LC−MS/MS spectra into four classes, i.e., tadalafil, sildenafil, vardenafil, and non-ED compounds, was made by the ANN pattern recognition method. In addition, the following three search engines were included in the software: PCS, SSS, and HSS. Detailed descriptions of the ANN pattern recognition algorithm, PCS, SSS, and HSS are given below. The ANN modeling computations and the software code were executed on an Intel quad-core personal computer (i5−4440 CPU, 3.10 GHz) with the Windows 7 (64-bit) operating system.

Database. For this study, a temporary in-house database was constructed using network LC−MS/MS databases from the MassBank of North America Web site (http://mona.fiehnlab.ucdavis.edu/) and the mzCloud Web site (https://www.mzcloud.org/). The in-house database consists of 73080 LC−MS/MS spectra entries, including 10 representative ED LC−MS/MS spectra acquired experimentally (note that only 10 ED drugs and analogues are included in the database for evaluation purposes, whereas all of the ED drugs and analogues were included in the practical applications). These 10 entries are shown in Table 1: seven sildenafil derivatives (sildenafil, hydroxyhomosildenafil, imidazosagatriazinone, carbodenafil, hongdenafil, thiosildenafil, and propoxyphenylthiosildenafil), two vardenafil derivatives (desulfovardenafil and vardenafil), and one tadalafil derivative (nortadalafil). The in-house database with 73080 LC−MS/MS spectra represents 11051 compounds, wherein many spectra were acquired at different collision energies for the same compound.

Table 1. Ten ED Drug Compounds Included in the in-House Database

no. | compound name | chemical formula | [M] | [M + H]+
1 | sildenafil | C22H30N6O4S | 474.20 | 475.21
2 | hydroxyhomosildenafil | C23H32N6O5S | 504.22 | 505.22
3 | imidazosagatriazinone | C17H20N4O2 | 312.16 | 313.17
4 | carbodenafil | C24H32N6O3 | 452.25 | 453.26
5 | hongdenafil | C25H34N6O3 | 466.27 | 467.28
6 | thiosildenafil | C22H30N6O3S2 | 490.18 | 491.19
7 | propoxyphenylthiosildenafil | C23H32N6O3S2 | 504.20 | 505.21
8 | desulfovardenafil | C17H20N4O2 | 312.16 | 313.17
9 | vardenafil | C23H32N6O4S | 488.22 | 489.23
10 | nortadalafil | C21H17N3O4 | 375.12 | 376.13

ANN Classification Model for ED Drugs and Analogues.
An ANN classification model was built by training on the LC−MS/MS spectra of a preselected (training) subset of ED drugs/analogues and non-ED compounds. The 187 LC−MS/MS spectra were divided randomly into two subsets: a training and validation set (80%, 149 spectra) and an external test set (20%, 38 spectra). The total compound set consisted of 34 sildenafil-type, 8 vardenafil-type, and 14 tadalafil-type ED drugs/analogues and 131 non-ED compounds; the list is given in Table S1 in the Supporting Information, and the ED drug types are denoted therein. The training and validation set was designed to include the MS/MS spectra of 11 tadalafil-, 27 sildenafil-, and 6 vardenafil-type ED drugs as well as 105 non-ED drugs. Six atypical ED compounds (62−67 in Table S1), which do not share much structural similarity with tadalafil, sildenafil, or vardenafil, are not included in this subset. The LC−MS/MS spectra of 92 compounds (1−92 in Table S1) were obtained directly from LC−MS/MS experiments carried out in-house. For the other 90 compounds (98−187 in Table S1), LC−MS/MS spectra from MassBank were utilized, and 5 open-source LC−MS/MS spectra for ED drugs (hydroxythiovardenafil, N-desethyl vardenafil, 2-hydroxypropyl nortadalafil, tadalafil impurity A, and tadalafil impurity D) were obtained from the mzCloud Web site.

An LC−MS/MS ANN model for ED drugs and analogues was constructed using the “Deep learning toolbox” in MATLAB R2017a. To generate the descriptors for the ANN modeling, bar-code MS/MS spectra with a 1 m/z bin size in the m/z range of 50−800 were constructed, where “1” was given to a peak with an abundance higher than a certain threshold (S/N ratio ≥3), and otherwise “0 (null)” was assigned. The binning process has often been used in mass spectrometry data preprocessing.26 The bar-code spectra were used as input vectors with 751 elements (representing the m/z range of 50−800). Since the fragment abundances are quite dependent on the collision energies provided during the tandem mass spectrometry, the bar-code spectra, rather than the abundances, were used for the classification modeling and the prediction.

The ANN modeling calculations were carried out using a feed-forward network with one hidden layer (10 neurons) and four output nodes (tadalafil, 1; sildenafil, 2; vardenafil, 3; non-ED compounds, 4), whose structure is shown in Figure 1. The Levenberg−Marquardt back-propagation algorithm was used for the ANN training, and a hyperbolic tangent sigmoid was used as the transfer function. The maximum number of epochs was set to 100. Cross-validation was performed using the 5-fold cross-validation method. Classification accuracy was obtained by calculating the Matthews correlation coefficient (cc) as follows:27

$$\mathrm{cc} = \frac{NP - OU}{\sqrt{(N + O)(N + U)(P + O)(P + U)}} \tag{1}$$

where P, N, O, and U represent the number of true positive, true negative, false positive, and false negative results, respectively, and a perfect prediction corresponds to a correlation coefficient of 1.

Peak-Count Scoring (PCS) Algorithm. The PCS algorithm is included to show a simple peak matching method for comparison with SSS and HSS. The PCS algorithm was designed to calculate the match factors (scores) of the two LC−MS/MS spectra under comparison using the equation

$$\mathrm{pMF} = C \times \frac{\mathrm{score}}{\text{number of fragment ions} + 10} \tag{2}$$

where C is an arbitrary constant that was set to 999. To calculate the score, the following rule was applied. When the m/z value difference between the precursor ion of a query compound and the library compound in the database, i.e., DeltaMass (Δm), was ≤10 or ≤30 ppm, a score of 10 or 8 was given, respectively. When Δm was ≤1.0 Da, 6 was allotted, and otherwise 0 was given. For the fragment ions, when Δm was ≤10 or ≤30 ppm, a score of 1.0 or 0.8 was allotted, respectively. In the case of Δm being ≤1.0 Da, 0.6 was added to the total score. Finally, the summed score was divided by the number of total fragment ions (whose abundances were above the preset threshold value) + 10. PCS application results for a specific case (sulfamethoxazole) are illustrated in Figure S1 in the Supporting Information.

Figure 1. Structure of the feed-forward artificial neural networks used in this study.

Simple Similarity Search (SSS). In this study, the cosine similarity search was used as the SSS method. This method has long been used by NIST for EI mass spectrum searches.28,29 In general, the cosine similarity search measures the cosine of the angle between two vectors (in this study, the bar-code LC−MS/MS spectra, i.e., [1 × 751] input vectors; note that the bar-code LC−MS/MS spectrum has a 1 m/z bin (unit) size in the m/z abscissa). This metric measures the relative orientation of the two vectors, thus providing a comparison between the vectors on a normalized space. The similarity match factor (sMF) was calculated using the following modified cosine similarity equation:27,28

$$\mathrm{sMF}(Q, L) = C \, \frac{\left( \sum_i \sqrt{Q_i} \times \sqrt{L_i} \right)^2}{\sum_i Q_i \times \sum_i L_i} \tag{3}$$

where Q and L are the vectors for the query compound and the library compound (i.e., bar-coded LC−MS/MS spectra) under comparison, respectively, and Qi and Li are the corresponding vector components, which represent the peak abundances at the ith integral (nominal) masses of the precursor ion or fragments in the 751-dimensional space (acquired m/z range 50−800). The arbitrary constant C was set to 999, which is the maximum value that can be obtained when the two vectors under comparison only differ in their magnitude. This sMF is a metric value that provides an indication of the similarity of the two compared LC−MS/MS spectra, one being the query spectrum and the other the library spectrum in the database.

Hybrid Similarity Search (HSS). HSS is a variant of SSS in which the similarity match score is calculated between the bar-coded query spectrum and a bar-coded hybrid library spectrum, instead of the original library spectrum, according to the equation

$$\mathrm{hMF}(Q, L, \Delta_{m,\mathrm{int}}) = \mathrm{sMF}(Q, H) \tag{4}$$

where hMF and sMF represent the hybrid similarity match score and simple similarity match score, respectively, and Δm,int is an integral DeltaMass that is described below. H is a vector that represents a hybrid mass spectrum. The hybrid (bar-code) LC−MS/MS spectrum was mainly constructed according to the description provided in ref 22. In brief, the integral m/z value difference between the query compound and the library compound was calculated to give an integer DeltaMass (Δm,int = the nominal mass value of the query precursor ion − the nominal mass value of the library precursor ion). When a library peak matched a query peak without the application of Δm,int, this peak in the library spectrum was not shifted. Conversely, when the peaks only matched after the peak in the library spectrum was shifted by Δm,int, this library peak was shifted. Finally, when there were peak matches both with and without the application of Δm,int, the abundance of the library peak was apportioned between its original position and the shifted position.
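As an illustration of the bar-code, SSS, and HSS calculations, the following is a minimal Python sketch and not the AI-SIDA MATLAB code. The function names (`nominal`, `barcode`, `smf`, `hybrid`, `hmf`) are hypothetical, the `threshold` argument is an abundance cutoff standing in for the S/N ≥ 3 criterion, and the example peak lists are invented. Because bar-code entries are 0 or 1, the similarity score reduces to the squared number of shared bins divided by the product of the two peak counts. As a further simplifying assumption, a library peak that matches both with and without the shift is marked at both positions rather than having its abundance apportioned.

```python
import math

MZ_MIN, MZ_MAX = 50, 800           # acquired m/z range given in the text
N_BINS = MZ_MAX - MZ_MIN + 1       # 751 unit-mass bins
C = 999                            # scaling constant of the match factor


def nominal(mz):
    """Nearest-integer (nominal) mass of an m/z value."""
    return math.floor(mz + 0.5)


def barcode(peaks, threshold=0.0):
    """Turn (m/z, abundance) pairs into a 751-element 0/1 vector.

    `threshold` is a hypothetical abundance cutoff used here in place of
    the S/N >= 3 criterion.
    """
    vec = [0] * N_BINS
    for mz, abundance in peaks:
        idx = nominal(mz) - MZ_MIN
        if 0 <= idx < N_BINS and abundance >= threshold:
            vec[idx] = 1
    return vec


def smf(q, lib):
    """Simple match factor for two 0/1 bar codes."""
    denom = sum(q) * sum(lib)
    if denom == 0:
        return 0.0
    shared = sum(qi * li for qi, li in zip(q, lib))
    return C * shared ** 2 / denom


def hybrid(q, lib, delta):
    """Build a hybrid bar code from a library bar code and an integer DeltaMass."""
    h = [0] * N_BINS
    for i, li in enumerate(lib):
        if not li:
            continue
        j = i + delta
        direct = q[i] == 1
        shifted = 0 <= j < N_BINS and q[j] == 1
        if shifted:
            h[j] = 1               # peak matches only after (or also after) the shift
        if direct or not shifted:
            h[i] = 1               # peak matches unshifted, or matches nowhere
    return h


def hmf(q, lib, delta):
    """Hybrid match factor: simple match factor against the hybrid spectrum."""
    return smf(q, hybrid(q, lib, delta))


# Invented example: the query differs from the library entry by +14 Da on the
# precursor and on one fragment, while the m/z 100 fragment is unchanged.
library = barcode([(100.0994, 40), (283.1173, 100), (475.2100, 60)])
query = barcode([(100.0994, 35), (297.1330, 100), (489.2257, 55)])
delta = nominal(489.2257) - nominal(475.2100)

print(smf(query, library))         # 111.0 (one shared bin out of three)
print(hmf(query, library, delta))  # 999.0 (all peaks align after shifting)
```

The example shows why the hybrid score is useful for analogues: a single mass-shifting modification lowers the simple score sharply, whereas the hybrid score recovers a full match.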



RESULTS AND DISCUSSION LC−MS/MS Spectra of the Sildenafil, Vardenafil, and Tadalafil Families. The LC−MS/MS spectra of a large number of ED drugs and analogues were acquired. For the sake of conciseness, only three representative examples corresponding to sildenafil, vardenafil, and tadalafil are included, and their fragmentation pathways will be described in this section. Figure 2 shows the LC−MS/MS spectra acquired for singly protonated (a) sildenafil, (b) vardenafil, and (c) tadalafil. In Scheme 2 and Schemes S1 and S2 in the Supporting Information, their respective proposed fragmentation mechanisms are given. For sildenafil, protonation can occur on two different nitrogen positions, i.e., N1a and N7, and the corresponding protonation isomers are denoted as SH+−a

9122

DOI: 10.1021/acs.analchem.9b01643 Anal. Chem. 2019, 91, 9119−9128

Article

Analytical Chemistry

shown in Scheme S1 in the Supporting Information is very similar to that of sildenafil, and most of the characteristic peaks in Figure 2b, including V1−V8, could be readily explained in a manner similar to that depicted in Scheme 2. Meanwhile, tadalafil has a skeletal structure quite different from those of sildenafil and vardenafil. Tadalafil has three potential protonation sites: TH+−a, TH+−b, and TH+−c (Scheme S2 in the Supporting Information). The first precursor ion (TH+−a) designates the species in which the carbon at the para position in the C6-benzo[1,3]dioxole group is protonated. From TH+−a, T1 (m/z 268.1081) could be generated by the release of benzo[1,3]dioxole (TN1). The subsequent release of carbon monoxide from T1 would form T2 (m/z 240.1071), from which T3 (m/z 197.0709) would be generated by the loss of N-methylformimine (TN2), and T4 (m/z 169.0760) would be formed by the additional loss of carbon monoxide. The second precursor ion (TH+−b) represents the species in which the carbonyl group at C1 is protonated. TH+−b would be then isomerized through the cleavage of the C12 and C12-a bond. From the newly formed isomer, T5 (m/z 262.0863) would be generated by the proton transfer from the N8 atom in the indole moiety to the oxygen in the carbonyl group and the subsequent loss of TN3. Protonation on N8 would form the third precursor ion (TH+−c). Cleavage of the C6−C7 bond would result in another structural isomer, which would eventually generate T6 (m/z 135.0430) by the sequential release of TN4 and TN5. Graphic User Interfaces (GUIs). AI-SIDA software consists of three layers. The first layer, labeled as “LC-MS/ MS viewer”, is a viewer program for a chromatogram and mass spectrum. The second “AI classif ier” layer is for the classification of compounds, in which an LC−MS/MS spectrum is classified into one of the ED types on the basis of the ANN ED drug classification model. 
The last one, labeled as “Identifier”, contains the search engines and the associated search result window. Figure 3 shows the three layers of GUIs, in which the function keys are denoted with red numbered boxes. The specific functions for the keys are described in Table S2 in the Supporting Information. The “LC−MS/MS viewer” layer shows the selected chromatogram and mass spectrum. The chromatogram viewer window denoted as 4 in Figure 3a can optionally display four different chromatograms: total ion chromatogram, base-peak chromatogram, extracted ion chromatogram (EIC), and extracted common ion chromatograms (ECIC). For the EIC, a target m/z value and m/z width can be selected and adjusted. In our previous study, the ECIC was demonstrated to be very useful in identifying ED drugs and analogues from suspicious health supplements.14 This ECIC chromatogram viewer was designed to select multiple numbers of common ions (up to five). The mass spectrum viewer is denoted as 7 in Figure 3a. When a cursor placed on a specific peak on the chromatogram is clicked, this mass spectrum viewer displays the corresponding mass spectrum. This viewer window also provides information regarding the normalized intensity, retention time, type of mass spectrometer, polarity (positive or negative), and scan type (full mass MS or MS/MS). A peak list of the displayed mass spectrum can individually be saved as .csv, .xls, or .xlsx files and be searched against the database that will be described below. In the second “AI classifier” layer, the compounds under LC−MS/MS investigations can be categorized into four

Figure 2. LC−MS/MS spectra for singly protonated (a) sildenafil, (b) vardenafil, and (c) tadalafil. Spectrum (c) is reproduced with permission from ref 31. Copyright 2018 John Wiley and Sons.

and SH+−b, respectively, in Scheme 2. Isomer SH+−a, which is protonated at the N1a atom of 1-methylpiperazine, could undergo homolytic cleavage of the bond between N4a of 1methylpiperazine and the sulfur atom, which would produce a radical cation S1 at m/z 100.0994. On the other hand, SH+−b, which is protonated at the N7 atom of 1-methylpyrazole, could lead to three different fragmentation pathways. In the first pathway, bonds b1 (between N4 of pyrimidin-4-one and C5) and b2 (between N2 and C3 of pyrimidin-4-one) would be simultaneously broken with the concomitant release of SN1, leading to the production of S2 at m/z 166.0975. In the second pathway, 1-methylpiperazine (SN2) would be released as a neutral fragment, thus yielding S3 at m/z 377.1333. Subsequent neutral losses of ethane and sulfur monoxide from S3 would result in the formation of S4 at m/z 299.1139. Finally, the release of 1-methylpiperazine and sulfur dioxide would yield S5 (m/z 311.1481). Then, an ethyl group located in ethoxybenzene or pyrazole could be readily eliminated from S5 to produce S6 at m/z 283.1173, and further elimination of ethylene would lead to S7 at m/z 255.1221. The compounds belonging to the vardenafil family have a structure similar to that of sildenafil, and thus their fragmentation mechanisms bear a close similarity to each other. Specifically, as in the case of sildenafil, two protonated precursors, VH+−a and VH+−b, can be suggested to explain the fragments observed in the MS/MS spectra. In VH+−a, the N1 position of piperazine is protonated, while the N7 of imidazole is protonated in VH+−b. The proposed mechanism 9123

DOI: 10.1021/acs.analchem.9b01643 Anal. Chem. 2019, 91, 9119−9128

Article

Analytical Chemistry Scheme 2. Proposed Fragmentation Mechanism for a Protonated Sildenafil Precursor Ion

classes, i.e., sildenafil, tadalafil, and vardenafil derivatives and non-ED compounds, using the ANN model that was machinelearned from the 149 LC−MS/MS spectra of the standard ED drugs/analogues and non-ED compounds. In this layer, a single or multiple LC−MS/MS spectra can be loaded and converted into the digitized, binned spectrum (see 17 in Figure 3b) by clicking the button labeled as 12 in Figure 3b. As described in the Experimental Section, an MS/MS spectrum is converted into a [1 × 751] row vector whose entry is either 0 or 1, the so-called bar-code LC−MS/MS spectrum. Here, a specific ANN model can be called in by pressing the “ANN model” button labeled as 16. When the “Prediction” button (18) is pressed, the result viewer (20) shows the calculated ED drug types. In the third “Identifier” layer (Figure 3c), a peak list for the designated MS/MS spectrum can be searched against the inhouse database. For construction of a peak list, the threshold abundances of the fragment peaks can be adjusted to optimize the search results, using the abundance threshold control box labeled as 21. Alternatively, a peak list containing the m/z values of the precursor ion and fragments can be manually typed in using the window labeled as 22. The peak list can also

be visualized in the form of an MS/MS spectrum in the viewer window labeled as 31 (Figure 3c). Three search engines are available: PCS, SSS, and HSS. The search results are given in the table labeled as 28, which displays the name, chemical formula, and search scores for the candidates. For PCS and SSS, when the candidate is clicked in the table, the corresponding database MS/MS spectrum is shown below the query MS/MS spectrum in the window labeled as 31. In the case of HSS, a generated hybrid MS/MS spectrum is shown instead of the database spectrum. The chemical structure of the selected candidate compound can optionally be displayed by pressing the button 29. In addition, the search result can be saved in .xlsx Microsoft Excel file format using the export button labeled as 27. The software described here is now available free of charge on our Web site.30 On the Web site, a smaller version of the ED drug library is provided, which includes only our own ED drug/analogue library spectra. In addition, library editing software was coded and can be downloaded from this Web site. With this software, it is possible for users to expand or create their own library. Classification of ED Drugs and Their Analogues Using the ANN Model. Classification of ED drugs and analogues 9124

DOI: 10.1021/acs.analchem.9b01643 Anal. Chem. 2019, 91, 9119−9128

Article

Analytical Chemistry

coefficient were 100% in both cases. The generated ANN model was further tested with the other 54676 compounds of m ≤ 800 present in the database (note that all these compounds are non-ED drugs/analogues). For those data, the classification accuracy was found to be 97% (the corresponding confusion matrix is not shown). Although the overall accuracy was satisfactory, some LC−MS/MS spectra were found to be miscategorized; for example, 1227 compounds were wrongly predicted to belong to the sildenafil family (corresponding to a 2.2% false-positive rate). The generated ANN classification model was also evaluated for real samples, and Figure 5 shows a representative example of real

Figure 5. Real sample analysis: (a) base-peak chromatogram, where a peak for the target compound is indicated with an arrow; (b) MS/MS spectrum.

Figure 3. GUIs of AI-SIDA software: (a) first layer, LC−MS (MS/ MS) viewer; (b) second layer, AI classifier; (c) third layer, Identifier. The descriptions of the red numbered boxes are given in Table S2 in the Supporting Information.

sample analyses: (a) base-peak chromatogram and (b) MS/MS spectrum. Although the amount of a target compound was very small as indicated in Figure 5a, the spectral quality of its MS/ MS spectrum was decent, showing several fragments of a high signal to noise ratio. The projection of the MS/MS spectrum onto our ANN model indicated that it belonged to the sildenafil family. Indeed, further analyses including NMR spectroscopy revealed that the target compound was desmethyl piperazinyl propoxysildenafil (C18H22N4O5S). This ANN classification model was used in the second “AI classifier” layer to produce a first screening result, which was expected to provide a rough guideline for screening ED drugs and analogues. Identification using PCS, SSS, and HSS Scores. As described above, in the third “Identifier” layer, identification of target molecular ions could be made using three different search methods: PCS, SSS, and HSS. In PCS and SSS, a query spectrum under examination is compared with the library spectra, whereas comparison in HSS is made between the query spectrum and the constructed hybrid mass spectra (vide supra). To evaluate the identification efficiency of the three search engines for unknown ED drugs and analogues, LC− MS/MS spectra of 31 ED drugs and analogues given in Table S3 in the Supporting Information (but not included in the database) were searched against the in-house database. Note that the in-house database included only 10 representative ED drugs and analogues for evaluation purpose.

using their LC−MS/MS spectra was initially attempted using PCA, LDA, and classification-and-regression tree methods. However, the results were not satisfactory, which led us to test ANN classification. Figure 4 shows the confusion matrices that reflect the MS/MS pattern recognition results for the generated ANN model. For the limited validation and external test sets, the classification accuracies calculated by the Matthews correlation

Figure 4. Confusion matrices that show the ANN classification results for (a) training and validation set (80%) and (b) external test set (20%). Classes: 1, tadalafil; 2, sildenafil; 3, vardenafil; 4, others.
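As a rough illustration of how the figures of merit quoted for the ANN model can be derived from a confusion matrix, the sketch below computes the overall accuracy and the multiclass form of the Matthews correlation coefficient in plain Python. The 4 × 4 matrix is a hypothetical perfect-classification example, not the data behind Figure 4, and the function names are likewise illustrative. The last value reproduces the 2.2% false-positive rate from the counts given in the text (1227 of 54676 non-ED compounds).

```python
import math


def accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / total


def multiclass_mcc(cm):
    """Multiclass Matthews correlation coefficient (rows = true, columns = predicted)."""
    n = len(cm)
    s = sum(sum(row) for row in cm)                          # total samples
    c = sum(cm[k][k] for k in range(n))                      # correctly classified
    t = [sum(cm[k]) for k in range(n)]                       # true count per class
    p = [sum(cm[i][k] for i in range(n)) for k in range(n)]  # predicted count per class
    numerator = c * s - sum(pk * tk for pk, tk in zip(p, t))
    denominator = math.sqrt((s * s - sum(pk * pk for pk in p)) *
                            (s * s - sum(tk * tk for tk in t)))
    return numerator / denominator if denominator else 0.0


# Hypothetical confusion matrix (class order: tadalafil, sildenafil, vardenafil, others).
cm_perfect = [
    [9, 0, 0, 0],
    [0, 22, 0, 0],
    [0, 0, 5, 0],
    [0, 0, 0, 84],
]

# Counts quoted in the text for the test on non-ED database compounds.
false_positive_rate = 1227 / 54676

if __name__ == "__main__":
    print(f"accuracy = {accuracy(cm_perfect):.3f}")
    print(f"MCC = {multiclass_mcc(cm_perfect):.3f}")
    print(f"sildenafil false-positive rate = {false_positive_rate:.1%}")
```

A matrix with all counts on the diagonal gives an MCC of 1, which is the situation the text reports for both data splits; any off-diagonal entry lowers the coefficient.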

DOI: 10.1021/acs.analchem.9b01643 Anal. Chem. 2019, 91, 9119−9128

Article

Table 2. PCS, SSS, and HSS Search Results

| no. | name | representative ED | DeltaMass (Δm,int) | pMF | rank in PCS | sMF | rank in SSS | hMF | rank in HSS |
|---|---|---|---|---|---|---|---|---|---|
| 1 | norneosildenafil | sildenafil | −15 | 405 | >30 | 366 | 2 | 604 | 1 |
| 2 | homosildenafil | sildenafil | +14 | 368 | >30 | 304 | 4 | 815 | 1 |
| 3 | dimethylsildenafil | sildenafil | +14 | 420 | 11 | 437 | 1 | 835 | 1 |
| 4 | methylhydroxyhomosildenafil | hydroxyhomosildenafil | +14 | 239 | >30 | 192 | 30 | 821 | 1 |
| 5 | thiohomosildenafil | thiosildenafil | +14 | 325 | >30 | 222 | 3 | 870 | 1 |
| 6 | dimethylthiosildenafil | thiosildenafil | +14 | 427 | 6 | 370 | 1 | 703 | 1 |
| 7 | hydroxythiohomosildenafil | hydroxyhomosildenafil | +16 | 246 | >30 | 200 | 6 | 899 | 1 |
| 8 | propoxyphenylthiohomosildenafil | propoxyphenylthiosildenafil | +14 | 288 | >30 | 223 | 8 | 910 | 1 |
| 9 | propoxyphenylthioaildenafil | propoxyphenylthiosildenafil | +14 | 348 | 14 | 361 | 3 | 620 | 1 |
| 10 | propoxyphenylthiohydroxyhomosildenafil | propoxyphenylthiosildenafil | +30 | 264 | >30 | 260 | 2 | 671 | 1 |
| 11 | dimethylhongdenafil | hongdenafil | −14 | 607 | 1 | 307 | 1 | 887 | 1 |
| 12 | piperidinohongdenafil | hongdenafil | −29 | 547 | 1 | 183 | 14 | 567 | 1 |
| 13 | dimethylacetildenafil | hongdenafil | 0 | 675 | 1 | 705 | 1 | 705 | 1 |
| 14 | hydroxyhongdenafil | hongdenafil | +16 | 592 | 1 | 364 | 1 | 860 | 1 |
| 15 | desmethylcarbodenafil | carbodenafil | −14 | 411 | 14 | 992 | 1 | 991 | 1 |
| 16 | chlorodenafil | imidazosagatriazinone | +76 | 248 | >30 | 350 | 2 | 644 | 1 |
| 17 | gendenafil | imidazosagatriazinone | +42 | 193 | >30 | 421 | 10 | 760 | 1 |
| 18 | acetil acid | imidazosagatriazinone | +44 | 243 | >30 | 226 | >30 | 736 | 1 |
| 19 | nitrodenafil | imidazosagatriazinone | +45 | 220 | >30 | 128 | >30 | 714 | 1 |
| 20 | dichlorodenafil | imidazosagatriazinone | +94 | 39 | >30 | 0 | >30 | 533 | 1 |
| 21 | pseudovardenafil | vardenafil | −29 | 392 | 14 | 362 | 2 | 829 | 1 |
| 22 | hydroxyvardenafil | vardenafil | +16 | 407 | 10 | 310 | 2 | 936 | 1 |
| 23 | acetylvardenafil | vardenafil | −22 | 172 | >30 | 94 | >30 | 388 | 2 |
| 24 | norneovardenafil | desulfovardenafil | +44 | 209 | >30 | 553 | 2 | 866 | 1 |
| 25 | tadalafil | nortadalafil | +14 | 304 | >30 | 115 | >30 | 940 | 1 |
| 26 | homotadalafil | nortadalafil | +28 | 330 | >30 | 208 | >30 | 896 | 1 |
| 27 | N-butyltadalafil | nortadalafil | +56 | 268 | >30 | 71 | >30 | 920 | 1 |
| 28 | octylnortadalafil | nortadalafil | +112 | 329 | >30 | 114 | >30 | 832 | 1 |
| 29 | cis-cyclopentyltadalafil | nortadalafil | +68 | 233 | >30 | 58 | >30 | 852 | 1 |
| 30 | aminotadalafil | nortadalafil | +15 | 347 | >30 | 99 | >30 | 867 | 1 |
| 31 | acetaminotadalafil | nortadalafil | +57 | 264 | >30 | 143 | >30 | 640 | 1 |
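One way to read Table 2 at a glance is to count, for each search engine, how many of the 31 query spectra returned the expected representative compound at rank 1 or within the top three. The sketch below does this with the rank columns transcribed from the table; encoding ">30" as `None` and the helper name `hits_within` are choices made here for illustration.

```python
# Rank columns transcribed from Table 2 (queries 1-31); X (None) stands for ">30".
X = None
RANKS = {
    "PCS": [X, X, 11, X, X, 6, X, X, 14, X, 1, 1, 1, 1, 14, X,
            X, X, X, X, 14, 10, X, X, X, X, X, X, X, X, X],
    "SSS": [2, 4, 1, 30, 3, 1, 6, 8, 3, 2, 1, 14, 1, 1, 1, 2,
            10, X, X, X, 2, 2, X, 2, X, X, X, X, X, X, X],
    "HSS": [1] * 22 + [2] + [1] * 8,
}


def hits_within(ranks, k):
    """Count queries whose expected compound was ranked k or better."""
    return sum(1 for r in ranks if r is not None and r <= k)


if __name__ == "__main__":
    for engine, ranks in RANKS.items():
        print(f"{engine}: top-1 = {hits_within(ranks, 1)}/{len(ranks)}, "
              f"top-3 = {hits_within(ranks, 3)}/{len(ranks)}")
```

With the transcribed ranks, HSS places the expected compound first for 30 of the 31 queries, compared with 4 for PCS and 6 for SSS.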

Peak-Count Scoring (PCS). Table 2 shows the search results acquired with the PCS method. The pMF values were below 500 in all cases except for hongdenafil and its close analogues, and the search for the true compounds returned candidates below the third position in the rank list. Considering that our PCS scoring algorithm is built on scoring rules in which the m/z match of the precursor ion makes a strong contribution and the matches with the fragment m/z values are additional, these poor search results for (mock) unknown ED compounds are not unexpected. Upon introduction of substituents, i.e., R1, R2, and/or R3, in the structures depicted in Scheme 1, a DeltaMass (Δm) is created with respect to the ED compounds present in the database. Therefore, the shifted m/z values of the precursor ion and the fragments of the query compound are very unlikely to gain scores in the scoring calculations. Exceptionally, in the case of compounds of the hongdenafil family, the scoring results were excellent. This unexpected result arose from the fact that they share three common fragments at m/z 297, 325, and 341 and, furthermore, only a few additional fragments were observed in the LC−MS/MS spectra. In particular, dimethylacetildenafil is a structural isomer of hongdenafil with Δm = 0, and thus it could readily be ranked as the no. 1 candidate for the hongdenafil family. On the other hand, when all of the 41 ED drugs and analogues were included in the in-house database, the search scores were 999 in all cases, as expected. However, unlisted (unknown) ED analogues could not readily be screened (identified) with the PCS method.

Simple Similarity Search (SSS) and Hybrid Similarity Search (HSS). The search results for the LC−MS/MS spectra of the 31 ED drug analogues using SSS and HSS are also shown in Table 2. Note that the LC−MS/MS spectra of only 10 representative ED drugs are included in the database. In the case of SSS, the search results were, as expected, generally poor, with very low scores apart from several exceptions. As was the case for PCS, the shifts of the m/z values of the precursor ion and the fragments corresponding to the DeltaMass lowered the sMF values, thus resulting in very low accuracy. On the other hand, HSS hit the correct representative ED drug for every query compound except acetylvardenafil, for which vardenafil was returned as the second compound in the rank list. The hMF scores were calculated to be above 600 in most cases. As an illustration, Figure 6 shows the PCS, SSS, and HSS results for the LC−MS/MS spectrum of tadalafil as the query compound. As can be seen in Figure 6a, in the case of PCS and SSS, the pMF and sMF scores were very low (pMF = 304 and sMF = 115), with only a few matches with the fragments of the library compound nortadalafil. Since the precursor ion and a number of fragments were shifted by Δm,int = +14 due to the presence in tadalafil of a methyl group instead of hydrogen at the nitrogen position of piperazine-2,5-dione in comparison to nortadalafil, the PCS and SSS search results were not good. In contrast, Figure 6b shows the constructed hybrid spectrum of nortadalafil in comparison with the experimental LC−MS/MS tadalafil query spectrum. As can be seen, the number of matched fragment peaks increases with the concomitant increase of the hMF score, which becomes as high as 940 (cf. sMF = 115). Moreover, the highly abundant peak at m/z 254 in the original nortadalafil spectrum aligns well with the peak at m/z 268 in the tadalafil spectrum when it is shifted by Δm,int = +14.

Figure 6. (a) Top, MS/MS spectrum of tadalafil (query compound), and bottom, MS/MS spectrum of nortadalafil (library compound) with pMF and sMF scores denoted. (b) Top, MS/MS spectrum of tadalafil (query compound), and bottom, hybrid MS/MS spectrum of nortadalafil (library compound) with hMF score denoted. In (b), the shifted peaks are shown in red, and the peaks before the shift by a DeltaMass (Δm,int) are given in gray.

Figure 7 shows two more examples evidencing the increase of the match score (hMFs) in HSS in comparison with those in PCS and SSS (pMFs and sMFs). In the case of the query compound methylhydroxyhomosildenafil (Figure 7a), the hMF score was much improved to 821 in comparison with pMF = 239 and sMF = 192 (see Table 2), and its m/z value was found to be shifted from that of hydroxyhomosildenafil by Δm,int = +14. Likewise, the m/z value of pseudovardenafil was identified to be shifted from that of vardenafil by Δm,int = −29, with hMF = 829 (pMF = 392, sMF = 362) (see Table 2). The average HSS search time was observed to be 2.35 s per spectrum.

Figure 7. (a) Top, MS/MS spectrum of methylhydroxyhomosildenafil (query compound), and bottom, hybrid MS/MS spectrum of hydroxyhomosildenafil (library compound) with hMF score denoted. Note that pMF and sMF scores were 239 and 192, respectively. (b) Top, MS/MS spectrum of pseudovardenafil (query compound), and bottom, hybrid MS/MS spectrum of vardenafil (library compound) with hMF score denoted. Note that pMF and sMF scores were 392 and 362, respectively. In both (a) and (b), the shifted peaks are shown in red, and the peaks before the shift by a DeltaMass (Δm,int) are given in gray.

From these examples, it is clear that HSS can hit the ED drugs and analogues that PCS or SSS would miss when they are used as a single search engine. However, it is also noteworthy that the successful HSS search rate was shown to decrease when the query compound under examination has two or three modifications from the library compounds.22 We expect that this weakness will largely be overcome when a larger number of ED drugs and analogues are included in the database, not just the 10 entries used in this study for the proof of principle.
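To make the contrast between SSS and HSS concrete, the following minimal sketch scores a query spectrum against a library spectrum twice: once as is, and once after building a hybrid library spectrum in which a peak is moved by the integer DeltaMass when only the shifted position is present in the query. It assumes unit-resolution m/z values, a plain cosine similarity scaled to 999, and a deliberately simple shifting rule; none of these is claimed to be the scoring actually implemented in AI-SIDA or in the hybrid search of ref 22. The toy spectra are hypothetical apart from the nortadalafil peak at m/z 254 and its tadalafil counterpart at m/z 268 mentioned in the text: the precursor values 376 and 390 are nominal values assumed here, and the m/z 135 peak and all intensities are invented.

```python
import math


def match_factor(query, library):
    """Cosine similarity of two {m/z: intensity} spectra, scaled to 0-999."""
    dot = sum(inten * library.get(mz, 0.0) for mz, inten in query.items())
    norm_q = math.sqrt(sum(v * v for v in query.values()))
    norm_l = math.sqrt(sum(v * v for v in library.values()))
    if norm_q == 0 or norm_l == 0:
        return 0
    return round(999 * dot / (norm_q * norm_l))


def hybrid_spectrum(library, query, delta_mass):
    """Move a library peak by delta_mass when only the shifted m/z occurs in the query."""
    hybrid = {}
    for mz, inten in library.items():
        shifted = mz + delta_mass
        target = shifted if (mz not in query and shifted in query) else mz
        hybrid[target] = hybrid.get(target, 0.0) + inten
    return hybrid


# Toy unit-resolution spectra (hypothetical intensities; see the assumptions above).
query = {135: 40.0, 268: 100.0, 390: 20.0}     # tadalafil-like query
library = {135: 40.0, 254: 100.0, 376: 20.0}   # nortadalafil-like library entry
delta_mass = 390 - 376                         # integer DeltaMass from the precursors

simple_mf = match_factor(query, library)
hybrid_mf = match_factor(query, hybrid_spectrum(library, query, delta_mass))

if __name__ == "__main__":
    print("DeltaMass:", delta_mass)
    print("simple match factor:", simple_mf)
    print("hybrid match factor:", hybrid_mf)
```

In this toy case only the unshifted m/z 135 peak contributes to the simple score, whereas the hybrid spectrum also recovers the base peak and the precursor, which mirrors the behavior described for Figure 6.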

CONCLUSIONS

In the present study, the constructed “AI-SIDA” standalone software was demonstrated to be very powerful in screening unknown ED drugs/analogues. AI-SIDA consists of three different layers: a first layer called LC−MS/MS viewer, a second AI classifier layer, and a third layer named Identifier. The second AI classifier layer, which has a built-in ED drugs/analogues classification ANN model, was shown to effectively classify the query MS/MS spectra into four categories: i.e., three different classes of ED drugs/analogues and a fourth group consisting of non-ED drugs. However, caution should be exercised when this AI classifier is applied to the analysis of real samples. In a preliminary test with data sets acquired in different laboratories, it was found that the successful ED drugs/analogues classification rate did not reach 100%, although it was very high. This is due to the fact that the ANN model used in the present study was not optimized to the mass spectrometer conditions used in different laboratories. Therefore, it is highly recommended that the ANN classification model be built according to the specific data set, even including those acquired at different collision energies. The third Identifier layer includes three search engines: PCS, SSS, and HSS. In particular, HSS was shown to successfully identify the query compounds even for unlisted LC−MS/MS spectra. As demonstrated recently, HSS is a powerful search tool for screening compounds not listed in the database.22−25 In the near future, it is expected that further studies on the use of HSS for different types of compounds will be reported.

ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.analchem.9b01643.

Proposed fragmentation mechanisms for protonated vardenafil and tadalafil, a list of 56 ED drug compounds and 131 other compounds, function keys in AI-SIDA software, ED drug compounds used as query compounds, and PCS search results for sulfamethoxazole (PDF)

AUTHOR INFORMATION

Corresponding Authors

*J.H.: tel, (82) 2 961 9255; e-mail: [email protected].
*H.B.O.: tel, (82) 2 705 8444; e-mail: [email protected].

ORCID

Han Bin Oh: 0000-0001-7919-0393

Notes

The authors declare no competing financial interest.

ACKNOWLEDGMENTS

This study was financially supported by the Ministry of Food and Drug Safety of Korea (2018, 18182MFDS425). Furthermore, all authors are thankful to the Ministry of Food and Drug Safety of Korea for the kind donation of many ED drugs and analogues. H.B.O. is also thankful for the financial support by a grant of the Korea Health Technology R&D project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number HI17C1238).

REFERENCES

(1) Vickers, M. A.; Satyanarayama, R. Int. J. Impot. Res. 2002, 14, 466−471.
(2) Shindel, A. W. J. Sex. Med. 2009, 6, 2352−2364.
(3) Patel, D. N.; Li, L.; Kee, C.; Ge, X.; Low, M.; Koh, H. J. Pharm. Biomed. Anal. 2014, 87, 176−190.
(4) Venhuis, B. J.; de Kaste, D. J. Pharm. Biomed. Anal. 2012, 69, 196−208.
(5) Harte, C. B.; Meston, C. M. J. Sex. Med. 2012, 9, 1852−1859.
(6) Jackson, G.; Arver, S.; Banks, I.; Stecher, V. J. Int. J. Clin. Pract. 2010, 64, 497−504.
(7) Campbell, N.; Clark, J. P.; Stecher, V. J.; Goldstein, I. J. Sex. Med. 2012, 9, 2943−2951.
(8) Kern, S. E.; Nickum, E. A.; Flurer, R. A.; Toomey, V. M.; Litzau, J. J. J. Pharm. Biomed. Anal. 2015, 103, 99−103.
(9) Low, M.; Zeng, Y.; Li, L.; Bloodworth, R. L. B. Drug Saf. 2009, 32, 1141−1146.
(10) Park, H. K.; Lee, J. M.; Kim, J. Y.; Hong, J. K.; Oh, H. B. J. Liq. Chromatogr. Relat. Technol. 2017, 40, 790−797.
(11) Zou, P.; Oh, S. S.; Hou, P.; Low, M. Y.; Koh, D. H. J. Chromatogr. A 2006, 1104, 113−122.
(12) Ng, C. S.; Law, T. Y.; Cheung, Y. K.; Ng, P. C.; Choi, K. K. Anal. Methods 2010, 2, 890−896.
(13) Song, F.; El-Demerdash, A.; Lee, S. J. J. Pharm. Biomed. Anal. 2012, 70, 40−56.
(14) Kim, E. H.; Seo, H. S.; Ki, N. Y.; Park, N. H.; Lee, W. W.; Do, J. A.; Park, S. K.; Bae, S. Y.; Moon, B. J.; Oh, H. B.; Hong, J. K. J. Chromatogr. A 2017, 1491, 43−56.
(15) Böcker, S.; Dührkop, K. J. Cheminf. 2016, 8, 5.
(16) Aguilar-Mogas, A.; Sales-Pardo, M.; Navarro, M.; Guimerà, R.; Yanes, O. Anal. Chem. 2017, 89, 3474−3482.
(17) Fu, Y.; Zhang, Y.; Zhou, Z.; Lu, X.; Lin, X.; Zhao, C.; Xu, G. Anal. Chem. 2018, 90, 8454−8461.
(18) Wei, J. N.; Belanger, D.; Adams, R. P.; Sculley, D. Predicting electron-ionization mass spectrometry using neural networks, arXiv:1811.08545v1.
(19) Setser, A. L.; Smith, R. W. Forens. Chem. 2018, 11, 77−86.
(20) Bonetti, J. Forens. Chem. 2018, 9, 50−61.
(21) Allen, F.; Pon, A.; Greiner, R.; Wishart, D. Anal. Chem. 2016, 88, 7689−7697.
(22) Moorthy, A. S.; Wallace, W. E.; Kearsley, A. J.; Tchekhovskoi, D. V.; Stein, S. E. Anal. Chem. 2017, 89, 13261−13268.
(23) Burke, M. C.; Mirokhin, Y. A.; Tchekhovskoi, D. V.; Markey, S. P.; Thompson, J. H.; Larkin, C.; Stein, S. E. J. Proteome Res. 2017, 16, 1924−1935.
(24) Remoroza, C. A.; Mak, T. D.; De Leoz, M. L. A.; Mirokhin, Y. A.; Stein, S. E. Anal. Chem. 2018, 90, 8977−8988.
(25) Blaženović, I.; Kind, T.; Sa, M. R.; Ji, J.; Vaniya, A.; Wancewicz, B.; Roberts, B. S.; Torbašinović, H.; Lee, T.; Mehta, S. S.; Showalter, M. R.; Song, H. S.; Kwok, J.; Jahn, D.; Kim, J. Y.; Fiehn, O. Anal. Chem. 2019, 91, 2155−2162.
(26) Coombes, K. R.; Baggerly, K. A.; Morris, J. S. Pre-processing mass spectrometry data. In Fundamentals of Data Mining in Genomics and Proteomics; Dubitzky, W., Granzow, M., Berrar, D. P., Eds.; Springer: 2007; pp 79−102.
(27) Matthews, B. W. Biochim. Biophys. Acta, Protein Struct. 1975, 405, 442−451.
(28) Stein, S. E. J. Am. Soc. Mass Spectrom. 1994, 5, 316−323.
(29) Stein, S. E.; Scott, D. R. J. Am. Soc. Mass Spectrom. 1994, 5, 859−866.
(30) https://hanbinoh.sogang.ac.kr/hanbinoh/2179.html.
(31) Lee, J. M.; Park, M. J.; Hong, J. K.; Oh, H. B.; Moon, B. J. Bull. Korean Chem. Soc. 2018, 39, 190−196.