
Chem. Res. Toxicol. 2008, 21, 619–632


3D-MEDNEs: An Alternative “in Silico” Technique for Chemical Research in Toxicology. 2. Quantitative Proteome-Toxicity Relationships (QPTR) based on Mass Spectrum Spiral Entropy

Maykel Cruz-Monteagudo,†,‡ Humberto González-Díaz,*,⊥ Fernanda Borges,† Elena Rosa Dominguez,‡ and M. Natália D.S. Cordeiro§

Physico-Chemical Molecular Research Unit, Department of Organic Chemistry, Faculty of Pharmacy, University of Porto, 4150-047 Porto, Portugal, Applied Chemistry Research Center, Faculty of Chemistry and Pharmacy, Central University of Las Villas (UCLV), Santa Clara 54830, Cuba, Unit of Bioinformatics & Connectivity Analysis (UBICA), Institute of Industrial Pharmacy, and Department of Organic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain, and REQUIMTE, Department of Chemistry, Faculty of Sciences, University of Porto, 4169-007, Porto, Portugal

Received August 20, 2007

Low range mass spectra (MS) characterization of the serum proteome offers the best chance of discovering proteome-(early drug-induced cardiac toxicity) relationships, called here Pro-EDICToRs. However, because thousands of proteins are involved, finding the single disease-related protein can be a hard task, and the search for a model based on general MS patterns becomes a more realistic choice. In our previous work (González-Díaz, H., et al. Chem. Res. Toxicol. 2003, 16, 1318–1327), we introduced the molecular structure information indices called 3D-Markovian electronic delocalization entropies (3D-MEDNEs). In that work, quantitative structure-toxicity relationship (QSTR) techniques allowed us to link 3D-MEDNEs with blood toxicological properties of drugs. In this second part, we extend 3D-MEDNEs for the first time to numerically encode biologically relevant information present in the MS of the serum proteome. Using the same idea behind QSTR techniques, we can now seek by analogy a quantitative proteome-toxicity relationship (QPTR). The new QPTR models link MS 3D-MEDNEs with drug-induced toxicological properties from blood proteome information. We first generalized Randić’s spiral graph and lattice networks of protein sequences to represent the MS of 62 serum proteome samples with more than 370 100 intensity (Ii) signals with m/z bandwidth above 700–12000 each. Next, we calculated the 3D-MEDNEs for each MS using the software MARCH-INSIDE. After that, we developed several QPTR models using different machine learning and MS representation algorithms to classify samples as control or positive Pro-EDICToRs samples. The best QPTR models proposed showed accuracy values ranging from 83.8% to 87.1% and leave-one-out (LOO) predictive ability of 77.4–85.5%. This work demonstrates that the idea behind classic drug QSTR models may be extended to construct QPTRs with proteome MS data.
* To whom correspondence should be addressed. Tel: +34-981-563100. Fax: +34-981 594912. E-mail: [email protected] or qohumbe@usc.es.
† Physico-Chemical Molecular Research Unit, University of Porto.
‡ UCLV.
⊥ University of Santiago de Compostela.
§ REQUIMTE, University of Porto.

Introduction

The ability to predict the toxic effects of potential new drugs is crucial to prioritizing compound pipelines and eliminating costly failures in drug development. The inability to accurately predict toxicity early in drug development cost the pharmaceutical industry $8 billion in 2003, approximately one-third the cost of all drug failures. Indeed, predictive toxicology and “omics” technologies are of growing interest to government regulators, who have called for more predictive toxicology and toxicogenomics or toxicoproteomics approaches to be used in assessing drug safety. Predictive toxicology is still in its early stages, characterized by the use of gene or protein expression profiles to gain a basic understanding of whether a compound has a “clean” or “messy” profile. The tremendous advantages of these approaches, as well as pressure from the FDA to improve toxicology testing in drug development, indicate that advancements in predictive toxicology will play an increasing and accelerating role in drug development (1).

Specifically, cardiotoxicity is a serious adverse effect of chemotherapy ranging from relatively benign arrhythmias to potentially lethal conditions (2, 3), where the extent and severity of the necrosis can be monitored by the levels of bioactive markers (4). However, the number of new biomarkers reaching routine clinical use remains unacceptably low (5, 6). At the same time, body fluids are a protein-rich information reservoir that contains the traces of what the blood has encountered on its constant perfusion and percolation throughout the body (7). In this sense, the blood proteome is changing constantly as a consequence of the perfusion of the organ undergoing drug-induced damage, and this process then adds to, subtracts from, or modifies the circulating proteome (8, 9). So, a blood proteome represents a potential target for the detection of proteome-(early drug-induced cardiac toxicity) relationships called here Pro-EDICToRs (7). Thus, due to the optimal performance in the low mass range exhibited by mass spectra (MS), the use of this method applied to proteomics may offer the best chance for the study of Pro-EDICToRs type phenomena. However, due to the thousands of intact and cleaved proteins in the human serum proteome, finding the single

10.1021/tx700296t CCC: $40.75 © 2008 American Chemical Society. Published on Web 02/08/2008


Chem. Res. Toxicol., Vol. 21, No. 3, 2008

Cruz-Monteagudo et al.

Figure 1. Schematic representation of the MS spiral graph-based study of Pro-EDICToRs.

Figure 2. MARCH-INSIDE interface view of a serum proteome MS spiral graph.

Figure 3. Process of generation of a serum proteome mass spectrum spiral graph.

Figure 4. MARCH-INSIDE interface view of a serum proteome MS Cartesian 2D-lattice-like graph.

disease-related protein could be like searching for a needle in a haystack, requiring the separation and identification of each protein biomarker. In addition, the most commonly used toxicity biomarkers appear only when significant organ damage has occurred. For these reasons, identifying patterns in the overall serum proteome MS information, instead of directly identifying a single marker candidate, represents a more attractive and realistic choice (10, 11). In this sense, Petricoin et al. successfully identified patterns of low molecular weight biomarkers as ion peak features within the spectra and used these patterns as the diagnostic end point itself for the early detection of drug-induced cardiac toxicities (12) and ovarian (13) and prostate cancer (14). In any case, as noted above, the detection of single biomarker signals is often a hard-to-manage problem because of the very large amount of information that serum proteome MS contain. In addition, diseases and toxicity phenomena are often multifactorial, and consequently predictions based only on a few proteomics

Table 1. Results for the LDA-Based QPTR Model on Training and LOO Cross Validation

            parameter (%)            class   NCT   CT
training    accuracy       83.87     NCT     28    0
            sensitivity    70.59     CT      10    24
            specificity    100
LOO CV      accuracy       77.42     NCT     25    3
            sensitivity    67.65     CT      11    23
            specificity    89.29
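As a quick check on Table 1, the reported percentages can be recomputed from the class counts. The sketch below assumes that CT is the positive class and that, for each observed class, the counts give the number of samples assigned to NCT and to CT; this layout is an assumption inferred from the reported percentages, and the function name `classification_metrics` is illustrative only.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, and specificity as percentages."""
    total = tp + fn + tn + fp
    accuracy = 100.0 * (tp + tn) / total
    sensitivity = 100.0 * tp / (tp + fn)   # share of CT samples assigned to CT
    specificity = 100.0 * tn / (tn + fp)   # share of NCT samples assigned to NCT
    return accuracy, sensitivity, specificity

# Counts read from Table 1 (assumed layout: CT is the positive class).
training = classification_metrics(tp=24, fn=10, tn=28, fp=0)
loo_cv = classification_metrics(tp=23, fn=11, tn=25, fp=3)

print("training:", [round(v, 2) for v in training])
print("LOO CV:  ", [round(v, 2) for v in loo_cv])
```

Under these assumptions the recomputed values agree with the accuracy, sensitivity, and specificity entries of the table for both the training and the LOO series.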

Proteome MS Spiral Graph Detection of Toxicity

Figure 5. LDA model receiver operating characteristic (ROC) curve.

Figure 6. (A) Scatter plot of standardized residual vs sample order number and (B) histogram of residuals.

biomarkers may be overfitted due to nonrepresentative data selection. In this sense, one could expect that a predictive model based on information about multiple-biomarker patterns should be a more realistic alternative. We can call a model making this kind of prediction a quantitative proteome-toxicity relationship (QPTR) model, in close resemblance to the more classic quantitative structure–activity relationship (QSAR), quantitative structure–property relationship (QSPR), and quantitative structure-toxicity relationship (QSTR) models (15–39). Using the concept of entropy to encode information about MS patterns may be one way to tackle the QPTR problem. One of the pioneering works on the use of Shannon’s entropy to encode molecular structure in QSAR studies was published by Kier in 1980 (40, 41). In this work, the author states that a drug molecule is considered to be an information source. Quantification of the information content using Shannon’s equation gives negative entropy values, also known as molecular negentropy (40, 41). Many other authors have used Shannon’s entropy parameters to encode small molecule structure (17, 41–48). These concepts have been extended to describe proteins (49, 50), DNA sequences (51), or protein–protein interaction networks (52). More recently, the information properties of


organic molecules have been the subject of in-depth research by Graham et al. (53–58). Our group has introduced an approach to study the information content of different molecular systems in terms of entropy measures derived with a Markov model (MM). In general, our approach begins with the classic description of a complex system by means of a graph or network. This graph is composed of nodes (the parts of the system) and edges expressing some kind of relationship between pairs of nodes (59–62). We then construct the Markov matrix of the system. The elements of this matrix are the probabilities of direct interaction between the parts of the system. On the basis of Markov chain theory and the Chapman-Kolmogorov equations, we can obtain the probabilities of interaction between parts of the system that are not directly connected by computing the natural powers of the Markov matrix (63). The method is very flexible, depending mainly on the assignment of the roles of system, parts of the system (nodes), and relationships between parts (edges). This technique describes the system by using different numerical parameters derived from the Markov matrix. Among the numerical parameters that we can calculate to describe the system are spectral moments, potentials, coupling numbers, average atomic physico-chemical properties, and others (64–67). In a recent review, we presented the details of the method and reviewed other connectivity or topological graph theoretic methods used in chemistry, toxicology, medicine, and bioinformatics (68). Among the parameters used by our group, the Shannon entropy of the different powers of the Markov matrix is one of the most widely used. The entropies we have calculated cover systems such as small molecules of drugs and toxic substances, RNA secondary structures, protein sequences, and viral surfaces (69–75).
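The sequence of steps described above (graph, Markov matrix, natural powers of the matrix) can be sketched in a few lines. This is a minimal illustration and not the MARCH-INSIDE implementation: the four-node path graph, the equal weighting of all neighbors, and the helper names `markov_matrix`, `mat_mul`, and `mat_power` are assumptions made for the example.

```python
def markov_matrix(adjacency):
    """Row-normalize an adjacency matrix into one-step probabilities."""
    matrix = []
    for row in adjacency:
        total = sum(row)
        matrix.append([value / total for value in row])
    return matrix

def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def mat_power(matrix, k):
    """k-th natural power (k >= 1), i.e. the k-step probabilities."""
    result = matrix
    for _ in range(k - 1):
        result = mat_mul(result, matrix)
    return result

# Hypothetical path graph 0-1-2-3 (assumed example, not a spectrum graph).
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
one_step = markov_matrix(adjacency)
two_step = mat_power(one_step, 2)

# Nodes 0 and 2 are not directly connected, yet they interact in two steps.
print(one_step[0][2], two_step[0][2])
```

In this toy graph the one-step probability between nodes 0 and 2 is zero, while the second power of the matrix assigns them a nonzero probability, which is the sense in which the matrix powers describe interactions between parts that are not directly connected.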
This technique is topological in nature, being based mainly on the characterization of the connectivity of the system by numerical parameters. However, it is possible to incorporate 3D information without difficulty (76–78). From the first works (which were more centered on the use of spectral moments), this approach was referred to as the Markovian chemicals in silico design (MARCH-INSIDE) technique (79–81). In any case, applications went beyond the first theoretical definition and were confirmed many times by experimental outcomes in medicinal chemistry, molecular biology, and toxicology (67, 73–75, 82–90).

The application of graph theory to MS was first proposed by Bartels for peptide sequencing (91). The basic idea consists of transforming a MS into a graph called a “spectrum graph”. Each peak in the experimental MS is represented as a vertex (or several vertices) in the spectrum graph, and a directed edge is established between two vertices if their mass difference equals the mass of one or several amino acids (peptide loss) (91). An alternative approach based on graph theory represents a collection of several MS as a network or graph in which each MS as a whole is a node (92–95). In the former case, the MS of each protein is represented as a network (the nodes are fragmentation peptides), and in the latter case, the MS of many proteins are represented as a network (the nodes are the total MS of a protein). In any case, no serious steps have been taken to integrate proteome MS graphs or networks with QSAR/QSTR techniques to seek QPTR models based on MS information encoded by entropy indices. In the first paper of this series, we reported a new class of entropy parameters called the 3D-Markovian electronic delocalization negentropies (3D-MEDNEs). The 3D-MEDNEs are Shannon’s entropy type indices for a MM with experimentally demonstrated applications in QSTR studies

(96). In the present work, we decided for the first time to extend these indices to identify Pro-EDICToRs type patterns by generating a QPTR model. This QPTR model is based on a MS graph theoretical approach instead of directly identifying patterns within the high-throughput MS. In the first instance, we propose an alternative graph theoretical representation to the classic proteome spectrum graph that is more compact and easier to manage. The new proteome spectrum graph is constructed here based on the four-color spiral maps introduced by Randić et al. for DNA sequence representation (97). Next, we calculate the new 3D-MEDNEs, now referred to as 3D-Markovian electronic dissociation negentropies (pointing to the nature of MS processes). These MS-spiral 3D-MEDNEs (θk) are then used as inputs to derive QPTR models as an alternative method for Pro-EDICToRs study in 62 drug-induced cardiotoxicity and control serum proteome MS samples. A graphic representation of the approach proposed in this work for the early detection of drug-induced cardiac toxicities is shown in Figure 1.

Table 2 (caption not recoverable from the source)

vars         QPTR        CT          NCT
b0           0.34        -0.91       -1.25
O(sθ1*)      -0.68       -0.31       0.37
O(sθ3*)      -10.32      -4.66       5.66
O(sθ5*)      -38.6       -17.4       21.2
O(sθ6*)      -73.4       -33.2       40.3
O(sθ8*)      136.4       61.6        -74.8
O(sθ10*)     -81.7       -36.9       44.8

vars         QPTR        CT          NCT
b0           0.3         -0.9        -1.2
sθ1*         41.3        18.6        -22.6
sθ3*         -162.4      -73.3       89.0
sθ5*         622.1       280.9       -341.1
sθ6*         -682.1      -308.1      374.1
sθ8*         262.2       118.4       -143.8
sθ10*        -81.7       -36.8       44.8

vars         QPTR        CT          NCT
b0           -69.2       -2368       -2299
θ1           752.6       39409       38656
θ3           -2772.6     -168862     -166090
θ5           10355.9     544042      533686
θ6           -11245.8    -498177     -486931
θ8           4271.9      85633       81361
θ10          -1318.2     683         2001

Further entries of this table whose labels are not recoverable: 62, 0.28, -0.60, 0.81, 0.17.
a Kolmogorov–Smirnov test for normal distribution fit. b Shapiro–Wilk test for normal distribution fit.

Materials and Methods

[...] where Rij = 1 if and only if the two nodes ni and nj are neighbors placed at topological distance k = 1 in the spiral graph, Rij = [...] 0.5 for the MS region j represented by the node nj, otherwise oj = 0.5. The parameter oj and the Ijm/zj are then the discrete and the continuous forms, respectively, for the third dimension (z) of the present probabilities. That is to say, the 1pij are 3D parameters depending on the Cartesian coordinates (x, y) and on the spectral intensity weighted mass-charge values Ijm/zj or their discrete form oj. Consequently, the new entropies derived from 1pij (see below) are also 3D parameters.

(b) The zero-order absolute initial probabilities vector Aπ0 (see eq 3). This vector lists the absolute initial probabilities Ap0(j) to reach a node ni from a randomly selected node nj,

$$ {}^{A}p_{0}(j) = \frac{1}{N} \qquad (3) $$

where N represents the number of nodes (spectral regions) in the spiral graph. Owing to the particularities of the graph representation used here, Ap0(j) depends only on the total number of data points or spectral regions on the graph. Consequently, all the nodes in the graph have the same constant value of Ap0. Because the elements of the matrices kΠ (which are the k natural powers of the matrix 1Π) depend on the adjacency relationships between the nodes on the graph, the use of Markov chain (MCH) theory allows calculation of the spectral proteomic stochastic entropies (sθk) for any node nj that one can reach in the spiral graph by moving from any node ni throughout the entire graph using walks of length k:

$$ {}^{s}\theta_{k} = \mathrm{Sh} \cdot ({}^{1}\Pi)^{k} \cdot {}^{A}\pi_{0} = \mathrm{Sh} \cdot {}^{A}\pi_{k} = -k_{B} \sum_{j=1}^{n} {}^{A}p_{k}(j) \log {}^{A}p_{k}(j) \qquad (4) $$
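A numerical reading of eq 4 can be sketched as follows. The sketch starts from the uniform vector of eq 3, propagates it k steps with a row-stochastic matrix, and applies the -kB·p·log p function to each element. The three-node matrix is hypothetical, kB is set to 1 for readability, the natural logarithm and the row-vector convention are assumptions (the text fixes neither), and the function names are illustrative; the matrices used by MARCH-INSIDE depend on the spiral-graph weights defined in the paper.

```python
import math

def propagate(pi, matrix):
    """One Markov step: new_pi[j] = sum_i pi[i] * matrix[i][j]."""
    n = len(pi)
    return [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]

def shannon_entropy(pi, k_b=1.0):
    """Apply -k_b * p * log(p) to each element and sum (0 * log 0 taken as 0)."""
    return -k_b * sum(p * math.log(p) for p in pi if p > 0)

def stochastic_entropy(matrix, k, k_b=1.0):
    """Entropy of the absolute probabilities after walks of length k."""
    n = len(matrix)
    pi = [1.0 / n] * n            # uniform initial vector, as in eq 3
    for _ in range(k):
        pi = propagate(pi, matrix)
    return shannon_entropy(pi, k_b)

# Hypothetical row-stochastic matrix for a three-node graph.
one_step = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]

theta_0 = stochastic_entropy(one_step, 0)   # entropy of the uniform vector
theta_1 = stochastic_entropy(one_step, 1)   # entropy after one step
print(theta_0, theta_1)
```

The zero-order value is the entropy of the uniform distribution, and higher orders change only insofar as the matrix redistributes probability among the nodes, which is how the connectivity of the graph enters the index.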


Table 3. Machine Learning Classification Algorithms Based on MS Spiral Graph sθk Indices

ML parameters

ML scheme short name: LDA, Func.Log., Bayes Net, C.Naive.Bayes, Naïve.Bayes, Naïve.Bayes.M., Naïve.Bayes.S., Naïve.Bayes.U., ML.Perceptron, RBF.Network, Simple.Log., SMO, Perceptron, Lazy.IB1, IBk, KSpiral, LWL, Ada.Boost.M1, Att.Sel.Class., Bagging, Class.Via.Reg., CV.Param.Sel., Decorate, Filt.Classif., Grading, Log.Boost, M.Boost.AB, Multi.C.Classif., MultiScheme, Ord.Classif., Inc.Log.Boost, R&.Comm, Stacking, StackingC, Thresh.Sel., Vote, ADTree, Decis.Stump, J48, LMT, NB Tree, R&.Forest, REPTree, Hyper.Pipes, VFI, Conj.Rule, Decis.Tab., Jrip, Nnge, OneR, PART, Ridor

a

training accuracy

sensitivity

ct = 41.39sθ1 - 162.4sθ3 + 622.05sθ5 - 682.15sθ6 + 262.24sθ8 - 81.7sθ10 - 0.34
ct × 10^-4 = 0.3sθ1 - 1.3sθ3 + 4.9sθ5 - 5.9sθ6 + 3.9sθ8 - 1.9sθ10 - 0.3

83.87

70.59

87.1

76.5

54.8 72.6 62.9 54.8 62.9 62.9 71.0 71.0 75.8 53.2

100 61.8 38.2 100 38.2 38.2 85.3 85.3 70.6 94.1

54.8 64.5 87.1 79.0 75.8 74.2 74.2

100 100 76.5 70.6 70.6 91.2 67.6

0 21.4 100 89.3 82.1 53.6 82.1

79.0 75.8 54.8 79.0 54.8 54.8 82.3 71.0 87.1

70.6 70.6 100 70.6 100 100 76.5 79.4 76.5

89.3 82.1 0 89.3 0 0 89.3 60.7 100

54.8 74.2

100 67.6

0 82.1

54.8 87.1 54.8 54.8 87.1

100 76.5 100 100 76.5

54.8 82.3 71.0 74.2

100 76.5 85.3 67.6

0 89.3 53.6 82.1

70.6 79.4 76.5 67.6 100 0.59 91.2 100 85.3 76.5 70.6

82.1 60.7 100 85.7 0.36 96.4 50.0 0 53.6 100 85.7

72.6

91.2

50.0

77.4

73.5

82.1

ct = 11.3 + 38.1sθ1 - 42.95sθ10
ct = 0.16sθ1 + 0.52sθ3 + 0.67sθ5 + 0.71sθ6 + 0.72sθ8 + 0.74sθ10 - 3.04

if sθ1 ≤ 1.82 ct or if sθ1 > 1.82 if sθ1 ≤ 1.89 nct else ct

ct × 10^-4 = 0.3sθ1 - 0.1sθ3 + 0.5sθ5 - 0.6sθ6 + 0.4sθ8 - 0.2sθ10 - 252.6

if sθ1 ≤ 1.82 ct or if sθ1 > 1.82 sθ1 ≤ 1.89 nct else ct

ct × 10^-4 = 0.3sθ1 - 1.3sθ3 + 4.9sθ5 - 5.9sθ6 + 3.9sθ8 - 1.9sθ10 - 252.6

if sθ1 ≤ 1.82 ct or if sθ1 > 1.82 if sθ1 ≤ 1.89 nct or if sθ1 > 1.89 ct
ct = 11.3 + 38.1sθ1 - 42.95sθ10

if sθ10 > 1.95 and sθ1 ≤ 1.92 nct if sθ10 ≤ 1.95 nct else ct if sθ8 < 1.90 ct or 1.82 & sθ1 ≤ 1.89 and sθ10 > 1.95 nct else ct ct except if sθ10 > 1.95 and sθ1 ≤ 1.92 nct except if sθ5 ≤ 1.91 and sθ1 > 1.84 nct

ZeroR a

LOO cv

rule/tree/function

75.8 71.0 87.1 75.8 56.5 46.8 72.6 54.8 71.0 87.1 77.4

54.8

100

specificity

accuracy

sensitivity

specificity

100

77.42

67.65

89.29

100

85.5

73.5

72.6

61.8

85.7

61.3

67.6

53.6

72.6 71.0 61.3

67.6 61.8 61.8

78.6 82.1 60.7

61.3

64.7

57.1

59.7 58.1

64.7 79.4

53.6 32.1

72.6

64.7

82.1

67.7 64.5 85.5

67.6 73.5 73.5

67.9 53.6 100

61.3

67.6

53.6

72.6

67.6

78.6

72.6

70.6

75.0

71.0

61.8

82.1

61.3

67.6

53.6

59.7 64.5 69.4 56.5

64.7 70.6 70.6 61.8

53.6 57.1 67.9 50.0

62.9 67.7

73.5 61.8

50.0 75.0

64.5

64.7

64.3

0 85.7 92.9 0 92.9 92.9 53.6 53.6 82.1 0.36

0 100 0 0 100

100

0

For the details about all these schemes, see ref 136.

where Aπk is the vector listing the absolute probabilities Apk(j) to reach a node j moving throughout a walk of length k with respect to any node in the spectral graph. The operator Sh applies the -kB x log x function to each element Apk(j) of the vector Aπk, resembling Shannon’s entropy-like magnitudes. Shannon entropies are of major importance to encode biologically relevant information not only for small-sized drugs but also for large systems such as proteins. In this equation, kB is Boltzmann’s


Table 4. Comparing Spiral vs Lattice-like Graphs with Different Classification Techniques

                                         training parameters (%)              LOO cv parameters (%)
technique/(scheme)a                      accuracy  sensitivity  specificity   accuracy  sensitivity  specificity

Spiral-Like MS Graph Representation (sθk)
(1) sθk-LDA                              83.87     70.59        100           77.42     67.65        89.29
(2) sθk-MLC (logistic function)b         87.1      76.5         100           85.5      73.5         100
(3) sθk-PLS                              59.68     55.88        64.29         59.68     55.88        64.29
(4) PLS LV-LDA                           74.19     67.65        82.14         72.58     64.71        82.14
(5) PLS LV-MLC (random tree)b            87.1      76.5         100           72.58     67.6         78.6
(6) PCA factor-LDA                       74.19     67.65        82.14         72.58     64.71        82.14
(7) PCA factor-MLC (lazy Kspiral)b       87.1      76.5         100           77.41     70.6         85.7

Cartesian 2D-Lattice-like MS Graph Representation (lθk)
(8) lθk-LDA                              58.06     32.35        89.29         53.22     29.41        85.71
(9) lθk-MLC (lazy IBk)b                  71.0      100          35.7          62.9      94.1         25
(10) lθk-PLS                             56.45     35.29        82.14         53.23     29.41        82.14
(11) PLS LV-LDA                          54.83     32.35        82.14         54.84     32.25        82.14
(12) PLS LV-MLC (OneR)b                  66.13     94.1         32.1          64.52     94.1         28.6
(13) PCA factor-LDA                      53.23     32.35        78.57         53.23     32.35        78.57
(14) PCA factor-MLC (OneR)b              66.13     94.1         32.1          64.52     94.1         28.6

rule/tree/function (numbered as in the rows above)
(1) ct = 41.39sθ1 - 162.4sθ3 + 622.05sθ5 - 682.15sθ6 + 262.24sθ8 - 81.7sθ10 - 0.34
(2) ct = 2883.4sθ1 - 12705.1sθ3 + 49845.7sθ5 - 58993.7sθ6 + 38608.6sθ8 - 19449.5sθ10 - 252.6
(4) ct = 7492 PLS LV1 + 233 PLS LV2 + 4 PLS LV3 - 952
(5) PLS LV3 < 0.08
    |   PLS LV2 < -0.04
    |   |   PLS LV2 < -0.08
    |   |   |   PLS LV3 < -0.29: NCT (1/0)
    |   |   |   PLS LV3 ≥ -0.29: CT (4/0)
    |   |   PLS LV2 ≥ -0.08: NCT (2/0)
    |   PLS LV2 ≥ -0.04
    |   |   PLS LV3 < 0.06
    |   |   |   PLS LV1 < 0.13: CT (10/0)
    |   |   |   PLS LV1 ≥ 0.13
    |   |   |   |   PLS LV2 < -0.04: CT (6/0)
    |   |   |   |   PLS LV2 ≥ -0.04
    |   |   |   |   |   PLS LV3 < -0.18: CT (2/0)
    |   |   |   |   |   PLS LV3 ≥ -0.18: NCT (1/0)
    |   |   PLS LV3 ≥ 0.06
    |   |   |   PLS LV1 < 0.11: NCT (1/0)
    |   |   |   PLS LV1 ≥ 0.11: CT (2/0)
    PLS LV3 ≥ 0.08
    |   PLS LV2 < -0.04: NCT (17/3)
    |   PLS LV2 ≥ -0.04
    |   |   PLS LV3 < 0.13
    |   |   |   PLS LV3 < 0.09: NCT (13/5)
    |   |   |   PLS LV3 ≥ 0.09: CT (2/0)
    |   |   PLS LV3 ≥ 0.13: NCT (1/0)
(6) ct = 0.59 PCA factor1 - 0.93 PCA factor2 + 0.24
(8) ct = -80173lθ4 + 111184lθ6 + 169228lθ8 - 228212lθ9 + 28087lθ10 - 43
(11) ct = 361 PLS LV1 + 11 PLS LV2 + 4 PLS LV3 - 46
(12) if -0.0541 ≤ PLS LV2 < 0.1676 then class = nct; otherwise class = ct
(13) ct = 0.30 PCA factor1 - 0.37 PCA factor2 + 0.20
(14) if -0.4402 ≤ PCA factor1 < 1.2853 then class = nct; otherwise class = ct

a Linear discriminant analysis (LDA), partial least squares (PLS), partial least squares latent variable (PLS LV), principal components analysis factor (PCA Factor), machine learning classifier (MLC). b For details about these classification schemes, see ref 136.
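To make the use of such a function concrete, the sketch below evaluates the LDA function reported in Table 4 for the spiral graph (sθk-LDA) on a vector of entropy values. The input values are invented for illustration, this section does not restate whether the function expects raw or standardized entropies, and the decision rule (assign CT when the score is positive) is an assumption, since the cutoff applied to ct is not given here.

```python
# Coefficients of the spiral-graph LDA function as reported in Table 4.
COEFFICIENTS = {1: 41.39, 3: -162.4, 5: 622.05, 6: -682.15, 8: 262.24, 10: -81.7}
INTERCEPT = -0.34

def lda_score(theta):
    """Evaluate ct for a dict mapping the order k to an entropy value."""
    return sum(COEFFICIENTS[k] * theta[k] for k in COEFFICIENTS) + INTERCEPT

def classify(theta, cutoff=0.0):
    """Assumed rule: CT if the score exceeds the cutoff, otherwise NCT."""
    return "CT" if lda_score(theta) > cutoff else "NCT"

# Hypothetical entropy values, not taken from the paper's data set.
sample = {1: 1.85, 3: 1.88, 5: 1.90, 6: 1.91, 8: 1.92, 10: 1.94}
print(round(lda_score(sample), 4), classify(sample))
```

Because the coefficients are large and alternate in sign, small changes in the individual entropies move the score considerably, so the placeholder inputs should not be read as typical values.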

constant, which is used as a physical unit scaling value. The MS θk entropies encode in a stochastic manner the degree of disorder induced by the pool of proteins registered at specific spectral regions (nodes placed at different distances in the spiral reticule). As can be noted in eq 4, the sθk can be written using a MM as the product of Aπ0 and the natural powers of the matrix 1Π, based on the Chapman-Kolmogorov equations (42, 43, 49, 116).

Definition of Serum Proteome Cartesian 2D-Lattice-like Mass Spectrum Graphs. A second graph theoretical representation was developed in order to test another source of graph representation and, at the same time, the influence of the degree of compression on the performance in detecting Pro-EDICToRs of classifiers that use numerical indices derived from this type of representation. Hence, the number of data points in the

binned data files was now condensed to 71 by including in each new data point the averaged m/z and I values of 100 consecutive data points. Each new data point thus condenses the information encoded in 100 binned data points, producing a less condensed source of spectral information. As with the spiral graphs, the last data point was generated by using the last 105. All the averaged m/z and I values were replaced by their respective standardized values, generating a new averaged and standardized data file consisting of 71 data points, which is suitable for graph generation. As for the spiral graphs, a cutoff value of 0.5 is chosen for both the m/z and I values related to each averaged data point. This cutoff value is used to codify each data point according to its respective average m/z and I values, allowing its representation as a node on a Cartesian


2D space. Each data point in the averaged data file is placed in a Cartesian 2D space with the first data point at the (0, 0) coordinate. The coordinates of the successive data points are calculated as follows, in a manner similar to that used for DNA spaces (117): (a) increases in +1 (abscissas) if the absolute m/z value is 0.5 for a data point (rightward step); (b) decreases in -1 (abscissas) if the absolute m/z value is >0.5 and the absolute I value is 0.5 for a data point (rightward step, upward step); (d) decreases in -1 (ordinates) if the absolute m/z and I values are