On Predicting Medulloblastoma Metastasis by Gene Expression Profiling

Michael J. Korenberg*
Department of Electrical and Computer Engineering, Queen’s University, Kingston, Ontario, Canada K7L 3N6
Received August 29, 2003

Accurately predicting clinical outcome or metastatic status from gene expression profiles remains one of the biggest hurdles facing the adoption of predictive medicine. Recently, MacDonald et al. (Nat. Genet. 2001, 29, 143-152) used gene expression profiles, from samples taken at diagnosis, to distinguish between clinically designated metastatic and nonmetastatic primary medulloblastomas, helping to elucidate the genetic mechanisms underlying metastasis and suggesting novel therapeutic targets. The obtained accuracy of predicting metastatic status does not, however, reach statistical significance on Fisher’s exact test, although 22 training samples were used to make each prediction via leave-one-out testing. This paper introduces readily implemented nonlinear filters to transform sequences of gene expression levels into output signals that are significantly easier to classify, and thus to use in predicting metastasis. It is shown that when only 3 exemplars each from the metastatic and nonmetastatic classes were assumed known, a predictor was constructed whose accuracy is statistically significant over the remaining profiles set aside as a test set. The predictor was as effective in recognizing metastatic as nonmetastatic medulloblastomas, and may be helpful in deciding which patients require more aggressive therapy. The same predictor was similarly effective on an independent set of 5 nonmetastatic tumors and 3 metastatic cell lines also used by MacDonald et al.

Keywords: metastasis • prediction • clinical outcome • gene expression • DNA chips • microarray

* To whom correspondence should be addressed. Tel: (613) 533-2931. Fax: (613) 353-1729. E-mail: [email protected].
10.1021/pr034069s CCC: $27.50 © 2004 American Chemical Society
Journal of Proteome Research 2004, 3, 91-96
Published on Web 01/13/2004

1. Introduction

An unsolved problem in predictive medicine is to build effective classifiers of gene expression profiles based on only a few exemplars of the classes to be distinguished. In these circumstances, statistically based prediction methods may be at a disadvantage, because they rely on estimating means, standard deviations, etc. of gene expression levels for each class from very few representatives, to choose the “best” genes for distinguishing the classes. MacDonald et al.1 constructed predictors of metastatic status by adapting a “weighted voting” (WV) technique introduced by Golub et al.2 for distinguishing between various acute leukemia classes. Subsequently, this approach, along with several others, has been employed to predict clinical outcome of medulloblastoma patients.3 The MacDonald et al. study is intriguing and provides valuable clues to the genetic regulation of metastasis, identifying some genes critical to this process and suggesting possible targets for novel therapies, including use of specific inhibitors of platelet-derived growth factor receptor-α as possible new treatments of medulloblastoma.1 Moreover, prediction of metastasis has rarely been attempted elsewhere.

Using Affymetrix G110 Cancer Chips, MacDonald et al.1 obtained gene expression profiles for 14 nonmetastatic (M0) and 9 metastatic (M+) medulloblastoma tumors. Over this dataset, a leave-one-out protocol was employed, where each time a predictor was trained on 22 profiles and then used to predict metastatic status of the remaining profile, provided that a “prediction strength” measure exceeded a pre-set threshold. The procedure was repeated until an attempt had been made to classify each profile. Their predictors could make decisions about metastatic status for 18 of the 23 samples. Thirteen of the eighteen predictions were correct (4 of 8 metastatic, 9 of 10 nonmetastatic, Fisher’s exact test P < 0.0883, 1-tail). Although the predictions do not reach statistical significance on this test, the use of the prediction strength threshold had benefited the resulting accuracy. Indeed, it is clear from their Figure 2 that if vote decisions with low prediction strength had not been excluded, then only 4 of 9 M+, and 11 of 14 M0, profiles would have been correctly classified.

The present paper introduces nonlinear filters that convert input signals corresponding to gene expression profiles into output signals that are much easier to classify into metastatic and nonmetastatic classes than the original profiles. In particular, the effectiveness of this approach is investigated when very few training data are used. It is shown that when only the first three exemplars were assumed to be known from each class, a predictor was constructed that exhibits statistically significant performance over the test set. The nonlinear filters are found via a modeling technique known as parallel cascade identification (PCI),4 which has been previously applied5 to predict treatment response of a group of acute myeloid leukemia (AML) patients from their gene expression profiles. See also the review by Kirkpatrick.6 Subsequently, the approach was employed to predict failure/survivor outcome of medulloblastoma patients,7 rather than metastatic status of primary tumors as in this paper.

Figure 1. Parallel cascade model used to predict medulloblastoma metastasis. Each L is a dynamic linear element; each N is a polynomial static nonlinearity.

Indeed, there are several significant ways that the present paper differs from the earlier applications5,7 of PCI to interpret gene expression profiles. First, as noted above, a critical issue in predictive medicine is to develop accurate classifiers when there are only a few known examples of the classes to be distinguished. For example, after a Phase II trial, there may only be a few responders to an experimental cancer drug, but it would be helpful to use their profiles to select likely responders for Phase III. Certainly, there have been few demonstrations of gene-expression-based predictors developed on only three exemplars from each class to be distinguished that have shown statistically significant accuracy over an independent set. This is true for the PCI predictor created here, using only the first exemplar of each class to build the model, and the next two of each class to form references for classifying new profiles. Moreover, this predictor is at least as effective in recognizing metastatic as nonmetastatic tumors. This is an important development because the earlier predictor had much more difficulty in recognizing M+ than M0 tumors, perhaps because of the heterogeneity of the metastatic tumors.1 In the medulloblastoma outcome study,7 only one exemplar per class was used to identify the PCI model per se. However, to classify the remaining 58 profiles, a leave-one-out protocol was used in which 57 of the profiles provided reference output signals for classifying the held-out profile. Similarly, in the AML study,5 again in a leave-one-out protocol, 12 profiles provided certain PCI model architecture parameter values needed to classify the held-out profile. Of course, this is not possible when only three examples from each class are assumed to be known as in the main focus of the present paper.
Second, the present application does not search for the “best” parameter values, and number of genes to use, as is commonly done in other papers concerned with developing gene-expression-based predictors. Rather, all of these values were taken from a different study,7 in order to rigorously maintain independence between the 6-profile training set and the test set of remaining profiles. Third, even with only three exemplars assumed known from each class, it is possible to construct a voting scheme of PCI predictors to further improve accuracy, which was not shown in the two previous papers.5,7 Fourth, while the PCI model constructed with only the first training exemplar of each class is the main focus here, it is also shown how multiple expression profiles from each class can be employed to construct the training input and identify the model. Finally, a list of 22 genes is provided that are useful in predicting metastatic status of medulloblastomas.

The approach taken here has special advantage when few training exemplars are available, and represents a radical departure from statistically based methods such as weighted voting. Rather than attempting to select the “best” genes for distinguishing between classes, PCI seeks to accentuate, for the selected genes, the model’s differential response to exemplars from different classes. The model is identified from a training input derived from expression levels of a few class exemplars at selected genes, and a training output defined to have different values over portions of the input corresponding to different classes. The model’s memory, essentially the number of consecutive input values needed to calculate an output value, is typically chosen to be much shorter than each segment of the training input corresponding to one class exemplar. Consequently, as the model “slides” over the training input and attempts to approximate the corresponding training output, it encounters many more training examples than the number of exemplars used to create that input. Thus, the approach takes advantage of much more information inherent in gene expression profiles than other approaches are currently using.

The challenge here of constructing effective predictors from very few exemplars affords no opportunity to “tune” the model over additional known examples of M+ and M0 classes. Hence, all required architectural parameter values for the model, as well as the number of genes used, and the method of selecting genes, were implemented exactly as previously published for the outcome prediction study7 (that had involved 60 patients). Note that the Affymetrix HuGeneFL arrays used there, containing 7129 expression levels per profile, differ from microarrays analyzed in the present study with 2059 expression levels in each profile. The latter are the same profiles used by MacDonald et al.1 (i.e., the raw values given after each data set had been normalized by them), obtained from samples taken in the initial diagnostic biopsy. Each profile contains 2059 integer values, the gene expression data from an M0 or M+ medulloblastoma or a medulloblastoma cell line, indicating the degree of expression of selected genes in the medulloblastoma that are represented on the G110 Cancer Array. RNA from a medulloblastoma biopsy or cell line is converted into cDNA and thence into fluorophore-labeled cRNA that is fragmented to approximately 200 bp pieces and hybridized to the array.1 The raw expression values are derived for the genes by measuring the absolute fluorescence intensities for their respective probe-sets on the chip, and were scaled to normalize the data for comparison between arrays.1 A larger value indicates greater expression by the gene. Further details are provided by MacDonald et al.1 concerning how the gene expression profiles were obtained.

Figure 2. (a) Training input x(i) formed by splicing together the raw expression levels of genes from the first four metastatic (M+) profiles and first four nonmetastatic (M0) profiles. The genes used (Table 1) were the 22 having greatest difference in expression levels between the M+ and M0 training profiles. (b) Training output y(i) (solid line) defined as -1 over the M+ portions of the training input and 1 over the M0 portions. The training input and output were used to identify a parallel cascade model of the form in Figure 1. The dotted line represents calculated output z(i) when the identified model is stimulated by training input x(i). Note that z(i) is predominately negative over the M+ portions, and positive over the M0 portions, of the training input. The identified model’s ability to separate metastatic and nonmetastatic profiles is exploited by replacing the profiles with corresponding model output signals that are easier to classify and predict metastasis.

2. Building a Nonlinear Filter

Use of PCI for class prediction from gene expression profiles has been described recently,5 and is now summarized. In brief, given one or more exemplars from each class to be distinguished, the first step is to select genes that assist in distinguishing between the two classes. In the outcome prediction study,7 the most effective predictor was constructed by selecting the 22 genes with greatest difference in raw expression levels between the first failure and survivor profiles and using these two profiles to create the training input. The same strategy was followed here. Hence, the first M+ and M0 profiles, respectively designated as M1 and N2 by MacDonald et al.,1 were compared to find the 22 genes with greatest difference in raw expression levels between these profiles. The corresponding 22 raw values from profile M1 were appended, in the same order as in the profile, to form an M+ segment, and an M0 segment was similarly prepared from profile N2. The two segments were spliced together to form a 44-point training input, with the training output defined as -1 over the M+ segment and 1 over the M0 segment of the training input.7 Clearly, a nonlinear system having this input/output relation would function as a predictor of metastatic status, at least for the two training profiles, and its output values would be expected to be negative for M+ and positive for M0 profiles. The training input and output were then used to identify a parallel cascade model of the form in Figure 1 as explained earlier.4,5

In Figure 1, each linear element L is dynamic, i.e., its output at instant i depends not only on its input at instant i but also at instants i - 1,...,i - R, so the memory length is said to be R+1; each static nonlinearity N is a polynomial. Earlier, to model discrete-time nonlinear systems, Palm8 had introduced a parallel array of LNL cascades, where the static nonlinearities were logarithmic and exponential functions, rather than the polynomials used here.

In another case below, the first four exemplar profiles of each class (respectively designated1 as M1, M2, M3, M4 from M+, and N2, N3, N4, N5 from M0) were employed to create the training input. As described above, each profile consists of 2059 integer values reflecting the expression levels of specific genes represented on the microarray, and the 22 genes (Table 1) used in forming the training input were selected as follows. For each gene, the mean of its raw expression values was computed over the four M+ training profiles, and the mean was also computed over the four M0 training profiles. Then the absolute value of the difference between the two means was computed for the gene. The 22 genes having the largest of such absolute values were selected. Four segments each of M+ and M0 type were prepared, similarly to above, maintaining the same order of appending the raw values as they had in the original profiles. The eight segments were spliced together to form the 176-point training input x(i) in Figure 2a. The corresponding training output y(i) (Figure 2b, solid line) was defined as -1 over M+ and 1 over M0 segments of the input. The training input and output were used to identify a PCI model (Figure 1), and Figure 2b (dotted line) shows the resulting model output z(i) when stimulated by the training input. Thus, Figure 2, parts a and b, is for the case of training inputs constructed from multiple exemplars per family, whereas only single-exemplar figures were presented in the two earlier papers.5,7

To identify the PCI model, certain parameter settings related primarily to its architecture had to be pre-specified.7 These were as follows: the memory length of each linear element L, the degree of each polynomial N, the maximum number of cascades allowed in the model, and a threshold regulating the required reduction in mean-square error for allowing a candidate cascade into the model. As noted above, these parameter settings were not established by selecting values that are optimal for the present data set. In the main focus of this paper, only three known exemplars are assumed from each class. This necessitated employing published parameter settings tailored for a different study, concerned with predicting clinical outcome.7 There it was found that a memory length of 4, a polynomial degree of 5, two cascades allowed in the model, and a threshold of 6, resulted in good classification accuracy. Specifically, these values were found by systematically trying different parameter settings, obtaining the corresponding PCI model, and observing the resulting classification accuracy over a test set of 58 profiles.7 Although the microarrays differed between that study and the present one, and the class distinctions (failure/survivor outcome, metastatic/nonmetastatic status of primary tumors) are different, these values were directly adopted here so as to minimize the number of exemplars that had to be known from each class.

Table 1. Twenty-Two Genes Used to Predict Medulloblastoma Metastasis

position in profile (1-2059)    description
90      M33764cds human ornithine decarboxylase gene, complete cds
115     M11717mRNA human heat shock protein (hsp 70) gene, complete cds
219     D13748 HUM4AI human mRNA for eukaryotic initiation factor 4AI
467     D78577expanded D78576S2 human DNA for 14-3-3 protein eta chain; exon2 and complete cds
744     M55409 human pancreatic tumor-related protein mRNA, 3′ end
763     D11139exons#1-4 HUMTIMP human gene for tissue inhibitor of metalloproteinases; partial sequence
1078    X58965 H. sapiens RNA for nm23-H2 gene
1083    X73066cds Homo sapiens NM23-H1 mRNA
1138    M55914 HUMCMYCQ human c-myc binding protein (MBP-1) mRNA; complete cds
1168    L19182 HUMMAC25X human MAC25 mRNA; complete cds
1194    D17517 HUMSKY human sky mRNA for Sky; complete cds
1291    HG4322-HT4592 Tubulin, β
1423    V00567cds HSMGLO human messenger RNA fragment for the β-2 microglobulin
1570    M94250expanded human retinoic acid inducible factor (MK) gene exons 1-5, complete cds
1664    J03040 human SPARC/osteonectin mRNA, complete cds
1669    J04164 HUM927A human interferon-inducible protein 9-27 mRNA; complete cds
1684    J02783mRNA HUMTHBP human thyroid hormone binding protein (p55) mRNA; complete cds
1762    D00017 HUMLIC Homo sapiens mRNA for lipocortin II; complete cds
1822    U21689cds human glutathione S-transferase-P1c gene; complete cds
1863    M93311cds human metallothionein-III gene, complete cds
1871    M29386mRNA HUMPRLA human prolactin mRNA; 3′ end
1949    HG1980-HT2023 Tubulin, β 2
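The gene selection and splicing steps described in this section can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names `select_genes` and `build_training_signals` are hypothetical, profiles are assumed to be plain lists of raw expression values, ties between genes are assumed to be broken in favor of the earlier position in the profile, and M+ segments are assumed to precede M0 segments in the spliced input. Identification of the PCI model itself is not shown.

```python
def select_genes(m_plus, m0, n_genes=22):
    """Indices of the n_genes genes with the largest |mean(M+) - mean(M0)|.

    m_plus and m0 are lists of profiles (equal-length lists of raw values).
    Indices are returned in ascending order so that raw values can later be
    appended in the same order as they occur in the profile.
    """
    n = len(m_plus[0])

    def class_mean(profiles, g):
        return sum(p[g] for p in profiles) / len(profiles)

    diff = [abs(class_mean(m_plus, g) - class_mean(m0, g)) for g in range(n)]
    # Assumption: ties go to the earlier gene (Python's sort is stable).
    ranked = sorted(range(n), key=lambda g: diff[g], reverse=True)
    return sorted(ranked[:n_genes])


def build_training_signals(m_plus, m0, genes):
    """Splice one segment per profile into input x and target output y.

    The target is -1 over M+ segments and +1 over M0 segments.
    """
    x, y = [], []
    for target, profiles in ((-1.0, m_plus), (1.0, m0)):
        for profile in profiles:
            x.extend(profile[g] for g in genes)
            y.extend([target] * len(genes))
    return x, y
```

With one exemplar per class and 22 genes this yields a 44-point input, and with four exemplars per class a 176-point input, matching the lengths quoted above.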

3. Classifying a Test Profile

The classification procedure set out previously7 was followed exactly. The raw expression values of the 22 previously selected genes were appended in the same order used above to form an input signal corresponding to the test profile. This input signal was fed to the PCI model to obtain the output signal that was compared, using correlation, with reference output signals for both the M+ and M0 classes. In the principal case considered below, only the first three exemplars from each class were employed to construct the predictor, with all other profiles reserved for testing. In particular, M+ profile M1 and M0 profile N2 were used to create the training input for identifying the PCI model, and reference output signals from the model were prepared for known M+ exemplars M2, M3, and M0 exemplars N3, N4. Then, if $z^{(0)}(i)$ represents the output signal for the test profile, and $z^{(j)}(i)$ one of the reference output signals, with i = 1,...,22, the correlation coefficient

$$
r = \frac{\sum_{i=4}^{22} \left(z^{(j)}(i) - \bar{z}^{(j)}\right)\left(z^{(0)}(i) - \bar{z}^{(0)}\right)}{\left(\sum_{i=4}^{22} \left(z^{(j)}(i) - \bar{z}^{(j)}\right)^{2}\right)^{1/2} \left(\sum_{i=4}^{22} \left(z^{(0)}(i) - \bar{z}^{(0)}\right)^{2}\right)^{1/2}}
$$

was calculated, where $\bar{z}^{(0)}$ and $\bar{z}^{(j)}$ denote the average of $z^{(0)}(i)$ and $z^{(j)}(i)$ respectively over i = 4,...,22. Note that the first three points of each 22-point output signal were not used, to allow the model to “settle”, because the memory length was 4. The test output signal was assigned the class of the reference output signal it was most positively correlated with; i.e., the one for which the correlation coefficient was largest.
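The correlation-based assignment just described can be sketched as follows. This is an illustrative fragment under stated assumptions rather than the study's implementation: the names `correlation` and `classify` are hypothetical, output signals are assumed to be plain lists of model output values, and the `settle` argument (default 3) drops the first three points so that, for a 22-point signal, the sums run over i = 4,...,22.

```python
import math


def correlation(a, b, settle=3):
    """Correlation coefficient of two output signals, skipping settle points."""
    a, b = a[settle:], b[settle:]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    den = (math.sqrt(sum((u - mean_a) ** 2 for u in a))
           * math.sqrt(sum((v - mean_b) ** 2 for v in b)))
    if den == 0.0:
        raise ValueError("correlation undefined for a constant signal")
    return num / den


def classify(test_output, references, settle=3):
    """Return (label, r) for the most positively correlated reference.

    references is a list of (label, reference_output_signal) pairs.
    """
    best_label, best_r = None, -math.inf
    for label, ref in references:
        r = correlation(test_output, ref, settle)
        if r > best_r:
            best_label, best_r = label, r
    return best_label, best_r
```

For the first predictor, `references` would hold the model outputs for M2 and M3 labeled M+ and for N3 and N4 labeled M0.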

4. Predicting Metastatic Status

As noted above, the initial PCI predictor was constructed from only the first three M+ and M0 profiles (respectively designated1 as M1, M2, M3 and N2, N3, N4). None of the remaining profiles were used here; rather, they were reserved as an independent set for testing. First, the nonlinear filter, in the form of the parallel cascade model in Figure 1, was identified from a training input derived only from M1 and N2, as earlier set out. Recall that the same architectural parameter values and number (22) of genes were used as in the outcome prediction study.7 Second, the model was employed to obtain reference output signals corresponding to the remaining two known profiles from each class (M2, M3 and N3, N4). Using correlation with the reference outputs from the model to predict class, as explained above, yielded these test results: 5 of the 6 novel M+, and 8 of the 11 novel M0, profiles were correctly classified (Matthews9 correlation coefficient φ = 0.54, Fisher’s exact test P < 0.043 1-tail, P < 0.05 2-tail). Corresponding results for WV,1 from leave-one-out training with 22 profiles and predicting class of the held-out profile, were φ = 0.44, Fisher’s exact test P < 0.0883 1-tail, P < 0.118 2-tail.

Even with only the three exemplars assumed known from each class, it was possible to study how the choice of profiles for training the PCI model influenced the resulting accuracy. Here, it is important that the test set, consisting of the remaining profiles, stays the same for each of the models constructed. In addition to the above model, whose training input was derived from profiles M1 and N2, two other models were identified. Thus, a training input derived from M2 and N3 was used to identify a second model, with M1, M3 and N2, N4 employed to obtain reference output signals for classifying novel profiles via correlation. Another model was identified from M3 and N4, whereas exemplars M1, M2 and N2, N3 were used to obtain reference output signals. Although there were similarities in performance between the three models, there were some differences that benefited overall accuracy. For the first predictor, only M+ profile M20, and M0 profiles N15, N24, N25, were misclassified. It is of interest that the three predictors never unanimously misclassified a profile; rather, every unanimous decision (for profiles M4, N18, N22, and N2+box2) was correct. In addition, the second and third predictors reversed the misclassification of profile M20 by the first predictor, but both of the former erred on profile M28. Overall, no other classification of M+ profiles was affected, so that by majority vote there was still one M+ misclassification. Next, the second and third predictors corrected the misclassification of M0 profiles N24, N25 by the first predictor, but contributed one new error, profile N16. Overall, no other M0 classifications were affected, so by majority vote there were only two M0 misclassifications, namely N15, N16. In summary, constructing three PCI predictors to vote on the final classification decisions slightly raised accuracy: 5 of 6 remaining M+, 9 of 11 remaining M0 (φ = 0.63, Fisher’s exact test P < 0.018 1- or 2-tail). Thus, although the choice of exemplars used for training the PCI model does have some effect on the classification results, the differences may be used constructively to increase the accuracy over that of an individual predictor.

The first PCI predictor was next tested over a second dataset of five M0 “validation” medulloblastomas and three M+ cell lines (DAOY, D283, D341) also used previously.1 Recall that only profiles M1 and N2 had been employed to obtain the PCI model, with the next two profiles from each class used to obtain reference output signals. The predictor correctly classified 4 of 5 validation profiles, and 2 of 3 cell lines (misclassifying D283). This performance, starting with 6 known profiles, is roughly comparable to that of the WV predictor1 trained on all 23 profiles from the first set, which could make classification decisions (all correct) on 4 of 5 validation medulloblastomas, and 1 of 3 cell lines. Overall, the PCI predictor’s accuracy was 7 of 9 M+ and 12 of 16 M0 novel profiles (φ = 0.51, Fisher’s exact test P < 0.017, 1- or 2-tail).

Finally, when 4 known profiles from each class produced the training input (Figure 2a), and the remaining profiles from the first set produced reference outputs from the model, the PCI predictor correctly classified all 3 cell lines, and 4 of 5 validation medulloblastomas. Moreover, the sign was examined of the values of the 3 metastatic and 5 nonmetastatic output signals, using every 4th point of each output signal to avoid any overlap in gene expression levels used to obtain such output values (see Statistical Considerations). The sign of these output values correlated with actual metastatic status (Fisher’s exact test P < 0.0193, 1-tail).
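The Matthews correlation coefficients quoted in this section can be recomputed from the reported counts with the standard formula for a 2 x 2 confusion table. The short Python sketch below is offered only as a check on the arithmetic; the function name `matthews_phi` is hypothetical, and M+ is assumed to be the positive class.

```python
import math


def matthews_phi(tp, fn, fp, tn):
    """Matthews correlation coefficient for a 2x2 confusion table.

    tp: M+ classified as M+    fn: M+ classified as M0
    fp: M0 classified as M+    tn: M0 classified as M0
    """
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fn) * (tp + fp) * (tn + fp) * (tn + fn))
    if den == 0.0:
        raise ValueError("phi undefined when a row or column total is zero")
    return num / den


# Counts as reported in the text (correct / total per class).
first_predictor = matthews_phi(tp=5, fn=1, fp=3, tn=8)    # 5 of 6 M+, 8 of 11 M0
majority_vote = matthews_phi(tp=5, fn=1, fp=2, tn=9)      # 5 of 6 M+, 9 of 11 M0
both_test_sets = matthews_phi(tp=7, fn=2, fp=4, tn=12)    # 7 of 9 M+, 12 of 16 M0
weighted_voting = matthews_phi(tp=4, fn=4, fp=1, tn=9)    # 4 of 8 M+, 9 of 10 M0
```

Rounded to two decimals these evaluate to 0.54, 0.63, 0.51, and 0.44, in agreement with the values of φ given above.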

5. Statistical Considerations

In addition to calculating Fisher’s exact probability, the utility of the PCI model was checked by classifying the test 22-point input signals through correlation with reference input signals, rather than using the corresponding output signals from the model. Classification without first obtaining model output signals dropped the accuracy to 2 of 6 novel M+ and 6 of 11 novel M0 profiles in the original data set, a negative Matthews’ correlation (φ = -0.12) with actual metastatic status. This should be compared with the first predictor’s accuracy of 5 of 6 novel M+ and 8 of 11 novel M0, which was obtained using the output signals, showing the value of the PCI model.

All PCI models employed here had memory length of 4. Hence, when the sign was examined of the values of the output signals for the 3 cell lines and 5 validation medulloblastomas, every 4th point of each output signal was used to avoid any overlap in gene expression levels used to obtain such output values. This gave rise to 25 such output values corresponding to the 5 M0 validation medulloblastomas, of which 7 values were negative and 18 were positive. For the three M+ cell lines, there were 15 such output values with 10 negative and 5 positive. Indeed, the sign of these output values was correlated with actual metastatic status (Fisher’s exact test P < 0.0193, 1-tail). A 1-tail test was used here since, because of the way the model had been trained, output signals corresponding to M0 and M+ profiles are expected to have positive and negative values, respectively.
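The 1-tail Fisher’s exact probabilities reported here follow from the hypergeometric distribution with the table margins held fixed. The sketch below, using only the Python standard library (`math.comb`, Python 3.8 or later), is a check on the arithmetic and not the software used in the study; the function name `fisher_exact_one_tail` and the table layout are assumptions made for illustration.

```python
from math import comb


def fisher_exact_one_tail(a, b, c, d):
    """One-tailed Fisher's exact P for the 2x2 table [[a, b], [c, d]].

    Returns the probability, with all margins fixed, of observing a count
    in the top-left cell at least as large as a.
    """
    row1 = a + b          # size of the first group
    col1 = a + c          # total with the first outcome
    n = a + b + c + d
    upper = min(row1, col1)
    favourable = sum(comb(col1, k) * comb(n - col1, row1 - k)
                     for k in range(a, upper + 1))
    return favourable / comb(n, row1)


# Sign of every 4th output value: rows are M+ cell lines and M0 validation
# tumors, columns are negative and positive output values.
p_sign = fisher_exact_one_tail(10, 5, 7, 18)

# First PCI predictor on the test set: rows are actual M+ and M0, columns
# are predicted M+ and M0.
p_first = fisher_exact_one_tail(5, 1, 3, 8)
```

Under these assumptions `p_sign` is approximately 0.0193 and `p_first` approximately 0.043, each falling just below the bound quoted in the text.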

6. Conclusion

The above results show the capability of PCI to build accurate and reliable predictors from small amounts of training data, and suggest a new approach to microarray analysis. Certainly, achieving the above accuracy required that the three M+ and M0 exemplars actually be representative of their respective classes, otherwise no method could succeed. Moreover, if the training exemplars do not well cover the variety of profile types in each class, then additional exemplars will clearly be needed beyond the three per class used above. A key point of the present paper is that a predictor was developed that exhibited good accuracy in predicting medulloblastoma metastasis and in particular was effective in recognizing M+ tumors. The predictor can now be investigated as one component in a scheme for predicting which patients can safely forego radiation therapy, an important issue raised by MacDonald et al.1 It is notable that, given good class representatives for training, PCI is highly efficient at building models that separate M+ and M0 profiles, compared with weighted voting trained on almost four times the data. As further exemplars of the classes to be distinguished become available, the above parallel cascade filters can be used to obtain the corresponding output signals, and the known classes of these additional reference output signals lead to increased accuracy in classifying novel profiles. Also, although nearest neighbor was used here to classify output signals successfully, it remains to be investigated whether other predictors, e.g., based on support vector machines,10 artificial neural networks,11 K-means clustering,12 or PCI, are better suited to distinguishing these output signals.

References

(1) MacDonald, T. J.; Brown, K. M.; LaFleur, B.; Peterson, K.; Lawlor, C.; Chen, Y.; Packer, R. J.; Cogen, P.; Stephan, D. A. Expression profiling of medulloblastoma: PDGFRA and the RAS/MAPK pathway as therapeutic targets for metastatic disease. Nat. Genet. 2001, 29, 143-152. Datasets: http://microarray.cnmcresearch.org/datafiles/MacDonaldetal.xls.
(2) Golub, T. R.; Slonim, D. K.; Tamayo, P.; Huard, C.; Gaasenbeek, M.; Mesirov, J. P.; Coller, H.; Loh, M. L.; Downing, J. R.; Caligiuri, M. A.; Bloomfield, C. D.; Lander, E. S. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999, 286, 531-537.
(3) Pomeroy, S. L.; Tamayo, P.; Gaasenbeek, M.; Sturla, L. M.; Angelo, M.; McLaughlin, M. E.; Kim, J. Y. H.; Goumnerova, L. C.; Black, P. M.; Lau, C.; Allen, J. C.; Zagzag, D.; Olson, J. M.; Curran, T.; Wetmore, C.; Biegel, J. A.; Poggio, T.; Mukherjee, S.; Rifkin, R.; Califano, A.; Stolovitzky, G.; Louis, D. N.; Mesirov, J. P.; Lander, E. S.; Golub, T. R. Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature 2002, 415, 436-442.
(4) Korenberg, M. J. Parallel cascade identification and kernel estimation for nonlinear systems. Ann. Biomed. Eng. 1991, 19, 429-455.
(5) Korenberg, M. J. Prediction of treatment response using gene expression profiles. J. Proteome Res. 2002, 1, 55-61.
(6) Kirkpatrick, P. Look into the future. Nat. Rev. Drug Discovery 2002, 1 (5), 334.
(7) Korenberg, M. J. Gene expression monitoring accurately predicts medulloblastoma positive and negative clinical outcomes. FEBS Lett. 2003, 533, 110-114.
(8) Palm, G. On representation and approximation of nonlinear systems. Part II: Discrete time. Biol. Cybern. 1979, 34, 49-52.
(9) Matthews, B. W. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim. Biophys. Acta 1975, 405, 442-451.
(10) Yeang, C.-H.; Ramaswamy, S.; Tamayo, P.; Mukherjee, S.; Rifkin, R. M.; Angelo, M.; Reich, M.; Lander, E.; Mesirov, J.; Golub, T. Molecular classification of multiple tumor types. Bioinformatics 2001, 17, Suppl. 1, S316-S322.
(11) Khan, J.; Wei, J. S.; Ringnér, M.; Saal, L. H.; Ladanyi, M.; Westermann, F.; Berthold, F.; Schwab, M.; Antonescu, C. R.; Peterson, C.; Meltzer, P. S. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat. Med. 2001, 7, 673-679.
(12) Tavazoie, S.; Hughes, J. D.; Campbell, M. J.; Cho, R. J.; Church, G. M. Systematic determination of genetic network architecture. Nat. Genet. 1999, 22, 281-285.

PR034069S