SecretePipe: A Screening Pipeline for Secreted ... - ACS Publications

Article pubs.acs.org/jpr

SecretePipe: A Screening Pipeline for Secreted Proteins with Competence to Identify Potential Membrane-Bound Shed Markers

Wei-Sheng Tien,†,‡ Yen-Tsuen Chen,† and Kun-Pin Wu*,†

†Institute of Biomedical Informatics, National Yang Ming University, Taipei 112, Taiwan
‡Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei 115, Taiwan



Supporting Information

ABSTRACT: The identification of secreted protein markers has received great attention as part of the trend toward noninvasive biomarker discovery. Certain cell membrane proteins are also known to be released into the extracellular milieu via ectodomain shedding. Membrane proteins play an essential role in signaling pathways, and most of the cancer biomarkers approved by the FDA today are shed membrane proteins. A tool that can correctly predict this class of shed proteins is therefore valuable. In this study, we developed an in-house predictor, ShedP, to predict the ectodomain shedding events of membrane proteins. To our knowledge, ShedP is the first computational method for predicting shed membrane proteins. By integrating ShedP with other state-of-the-art predictors, we created a screening pipeline, SecretePipe. SecretePipe identifies secreted nonmembrane proteins on the basis of signal peptides and released membrane proteins on the basis of ectodomain shedding. Predictions on secretome data sets showed that SecretePipe outperformed other state-of-the-art secreted protein predictors. When evaluated against released membrane proteins, SecretePipe identified membrane-bound released proteins better than the other predictors, owing to the inclusion of ShedP. SecretePipe shows great potential for assisting the identification of membrane-bound shed markers in biomarker discovery.

KEYWORDS: biomarker discovery, secreted markers, shed membrane proteins, secretome



Received: September 25, 2012
Published: January 21, 2013
© 2013 American Chemical Society

INTRODUCTION

Biomarker discovery has become an emerging field in biological research and medical diagnostic practice. Various approaches have been implemented to identify predictive and prognostic biomarkers for the early detection and treatment of cancers and other diseases.1 As the number of identified cancer or disease marker candidates grows, it has become very important to screen a given candidate list efficiently and effectively before clinical validation. This need has led to a new trend toward the clinical use of secreted proteins as noninvasive and easily accessible biomarkers. The secretome consists of target proteins found in blood, saliva, and urine, and its study has since become a promising research direction within biomarker discovery.2,3 In this context, an important screening criterion is whether or not a candidate is secreted into the extracellular milieu. Our goal in this study was to develop a computational model that effectively screens for secreted proteins, to help identify noninvasive secreted markers. Most proteins that enter secretory pathways have signal peptides, but membrane-anchored molecules have also been identified in secretome profiling studies.4 As summarized by Ludwig and Weinstein, most of the protein cancer markers approved by the U.S. Food and Drug Administration (FDA) are membrane proteins.5 Moreover, membrane proteins can act as receptors in various signaling pathways. Membrane proteins therefore constitute a valuable resource for biomarker discovery. During ectodomain shedding, certain cell membrane proteins may be proteolytically released into the extracellular compartment. Shedding events have been found to regulate numerous cellular processes and pathologies, such as cell differentiation, development, degeneration, and cancer.6 Consequently, a screening system for secreted proteins should take into account the shedding of membrane proteins. Previous studies have shown that only about 4% of cell surface proteins undergo shedding, so not every membrane protein is released through proteolytic cleavage.7 Tools that can determine whether or not a membrane-anchored protein is released via shedding are therefore indispensable. Although numerous computational tools have been proposed to predict protein secretion and subcellular localization, to the best of our knowledge, none of these take into

dx.doi.org/10.1021/pr3009012 | J. Proteome Res. 2013, 12, 1235−1244

Journal of Proteome Research

Article

1401 proteins without any cleavage record were used as negative training samples. We used these proteins as negative samples because, to the best of our knowledge, no experimentally verified data sets of nonshed membrane proteins are available. Third, to avoid learning bias caused by information redundancy, we used the sequence clustering tool CD-HIT24 to remove proteins with a mutual sequence identity higher than 30%. This left 220 positive samples (shed membrane proteins) and 511 negative samples (nonshed membrane proteins). These 731 membrane proteins formed our learning data set (Table S1, Supporting Information). We also prepared an independent test set to evaluate the performance of ShedP. Twenty-eight human membrane proteins that were experimentally verified as being shed were collected from the literature.25−30 CD-HIT was used to remove those with a mutual sequence identity higher than 30%. Proteins used in the learning procedure of ShedP were also removed. The result was an independent test set of 19 positive samples (Table S1). To the best of our knowledge, no membrane proteins have been explicitly verified as not shed, so our independent test set contains no negative samples. Encoding of Protein Features. Proteins differ in length, so each protein sequence was encoded into a numeric vector of fixed size based on the pseudo amino acid composition (PseAAC).31 This vector served as the model features for ShedP. The PseAAC encoding captures both the amino acid composition and sequence-order information from the primary structure of a protein. In a PseAAC representation, each protein is transformed into a (20 + λ)-dimensional vector. The first 20 components denote the normalized frequencies of the 20 amino acids in the protein. The following λ components summarize the correlation between pairs of amino acids separated by a distance of i, for 1 ≤ i ≤ λ.
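The sketch below illustrates this encoding. It computes a Chou-style type-1 PseAAC vector in plain Python. It is not the pseb implementation used in this study. Several details are assumptions taken from the common PseAAC formulation:

- the weight factor `weight=0.05`;
- the z-score normalization of each property;
- averaging squared property differences to obtain the correlation terms.

The property table in the example is a placeholder, not the real hydrophobicity, hydrophilicity, and side-chain mass scales.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def standardize(props):
    """Z-score each property column over the 20 standard amino acids."""
    n_props = len(props[AMINO_ACIDS[0]])
    z = {aa: [0.0] * n_props for aa in AMINO_ACIDS}
    for k in range(n_props):
        column = [props[aa][k] for aa in AMINO_ACIDS]
        mean = sum(column) / len(column)
        sd = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
        for aa in AMINO_ACIDS:
            z[aa][k] = (props[aa][k] - mean) / sd
    return z

def pseaac(sequence, props, lam=2, weight=0.05):
    """Return a (20 + lam)-dimensional type-1 PseAAC vector (illustrative sketch)."""
    seq = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]  # drop nonstandard residues
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lambda")
    z = standardize(props)
    # First 20 components: normalized amino acid frequencies.
    freqs = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]
    # Next lam components: mean squared property difference of residues i positions apart.
    thetas = []
    for i in range(1, lam + 1):
        total = 0.0
        for j in range(len(seq) - i):
            a, b = z[seq[j]], z[seq[j + i]]
            total += sum((bk - ak) ** 2 for ak, bk in zip(a, b)) / len(a)
        thetas.append(total / (len(seq) - i))
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]

# Placeholder property table: NOT real hydrophobicity/hydrophilicity/side-chain mass scales.
toy_props = {aa: [float(i), float((7 * i) % 20), float(i % 5)]
             for i, aa in enumerate(AMINO_ACIDS)}
vec = pseaac("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", toy_props, lam=2)
print(len(vec))  # 22
```

With this formulation the 20 + λ components sum to 1. The λ correlation terms therefore trade off against the composition terms through the weight factor.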
The correlation is a numeric value defined by physicochemical properties of the amino acids, namely hydrophobicity, hydrophilicity, and side-chain mass. In this study, we used a stand-alone implementation of PseAAC (http://pseb.sourceforge.net/) to generate the numeric vectors of proteins.

Performance Evaluation. ShedP was evaluated on the basis of the four possible prediction outcomes: true positive (TP), false positive (FP), true negative (TN), and false negative (FN). The following measures were used.
(1) Accuracy, $\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$, is the probability that a sample is correctly predicted.
(2) Sensitivity, $\mathrm{Sen} = \frac{TP}{TP + FN}$, is the probability that a positive sample is correctly predicted.
(3) Specificity, $\mathrm{Spe} = \frac{TN}{TN + FP}$, is the probability that a negative sample is correctly predicted.
(4) The Matthews correlation coefficient, $\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$, gives a balanced evaluation of a binary classifier even when the two classes differ greatly in size. MCC ranges from −1 to +1. A value of +1 indicates perfect prediction, 0 a random guess, and −1 a prediction that is totally opposite to the observation.
(5) AUC denotes the area under the receiver operating characteristic (ROC) curve. An ROC curve plots sensitivity versus (1 − specificity) at different cutoff thresholds of a binary discriminative model. ROC curves are often used for model selection and

account proteolytic shedding as a feature of membrane protein release.8−11 In this study, we propose SecretePipe, a screening pipeline for secreted proteins. SecretePipe integrates several computational modules into a decision hierarchy. One of these modules is an in-house tool, ShedP, developed to predict shedding events involving membrane proteins. ShedP is a support vector machine-based classifier built to discriminate between shed and nonshed membrane proteins. By combining ShedP with other state-of-the-art predictors, SecretePipe identifies both secreted nonmembrane proteins and released membrane proteins. We evaluated the predictive performance of SecretePipe on four public secretome data sets:

- human A549 NSCLC cells;12
- 23 human cancer cell lines;2
- 6 human cancer cell lines;38
- thyroid hormone-regulated human hepatoma cells.39

SecretePipe outperformed other state-of-the-art secreted protein predictors in all aspects except sensitivity, where SignalP was slightly better. Its added ability to identify shedding events of membrane proteins gives SecretePipe great potential for biomarker discovery. SecretePipe and all its test data sets are publicly available at http://bal.ym.edu.tw/SecretePipe/.



EXPERIMENTAL SECTION

The proposed secreted protein screening pipeline, SecretePipe, integrates an in-house software tool and three publicly available tools into a decision-making flow. The in-house tool ShedP was developed to predict ectodomain shedding events of membrane proteins. The three public tools were used as follows:

- TMHMM13,14 for membrane protein prediction;
- Hum-mPLoc15−19 for membrane protein localization prediction;
- SignalP11 for signal peptide identification.

ShedP: An In-House Predictor of Ectodomain Shedding

ShedP was developed to identify membrane-bound markers that are released into the extracellular milieu via proteolytic cleavage, by predicting shedding events of membrane proteins. It is a supervised learning model that discriminates between shed and nonshed membrane proteins. Data Sets. First, membrane proteins that contain extracellular domains were selected from the UniProtKB/Swiss-Prot database20 to form an initial data set. The selection was based on two criteria: (1) in the “General annotation” field, the “Subcellular location” section contains the term ‘membrane’; and (2) in the “Sequence annotation” field, the “Topological domain” section contains the term ‘extracellular’. As of July 2011, 2919 proteins met these criteria and formed the initial data set. Second, each protein in the initial data set was queried against three databases, MEROPS,21 PMAP-SubstrateDB,22 and HPRD,23 to determine whether or not it might undergo proteolytic cleavage. Of the initial data set, 1518 proteins had cleavage records in at least one of the three databases. However, 1232 of these 1518 proteins had cleavage records only in MEROPS, where they were annotated as “non-physiological relevant protein substrates” and “unknown peptidase”. These 1232 proteins were removed because such annotations give little confidence that the proteins are actually shed. This left 286 membrane proteins as positive training samples, while the remaining


Figure 1. Design flow of ShedP. The training procedure of ShedP was divided into 10 iterations. In each of the 10 iterations, a balanced data set of 220 positive samples and 220 negative samples was prepared by random sampling, and SVM models based on PseAAC feature representation were built on the data set using 5-fold cross-validation. At the end of the training procedure, a final model, which served as our ShedP, was constructed on a distinct training set (220 positive and 220 negative samples) and evaluated by an independent test set (19 positive samples from the literature).

Publicly Available Tools Used in SecretePipe

discriminative threshold determination. The AUC ranges from 0 to 1. It can be interpreted as the probability that a binary classifier ranks a randomly chosen positive sample higher than a randomly chosen negative sample, so a larger AUC implies better predictive performance. A model that performs at the random-guess level has an AUC of 0.5. Predictive Model Construction. The design flow of ShedP is shown in Figure 1. ShedP was built by a supervised machine learning process, which began with the preparation of balanced data sets. Equal numbers of positive and negative samples are recommended for machine learning.32 Each balanced data set included all 220 positive samples and 220 of the 511 negative samples, chosen by random sampling. To keep the training process from favoring particular negative samples, we prepared 10 balanced data sets and used each in turn for model building. This gave 10 iterations of the training process. In each iteration, a predictive model for discriminating between shed and nonshed membrane proteins was built with a support vector machine (SVM) and evaluated by 5-fold cross-validation. An SVM is a supervised binary classification model that separates the two categories with a maximum-margin hyperplane in a high-dimensional feature space.33 SVMs have been applied with some success to a wide variety of problems. Our SVMs used the PseAAC representation of proteins as model features. In this study, SVM classifiers were built using the LIBSVM library34 with the nonlinear radial basis function (RBF) kernel. The resulting SVMs report the probability that a query protein is shed. The performance of our predictive models was assessed by 5-fold cross-validation.
The balanced data set was partitioned into 5 balanced subsets. Each subset was used in turn for testing, while the remaining 4 subsets were pooled for model training. The 10 iterations of model building with 5-fold cross-validation produced a total of 50 models. A final model, which served as our ShedP, was then constructed and evaluated using the independent test set of 19 positive samples.
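The sampling and partitioning scheme above can be sketched as follows. This is a minimal illustration with hypothetical protein identifiers. It builds only the 10 balanced data sets and the balanced 5-fold splits. The RBF-kernel SVM training, done with LIBSVM in this study, is left as a comment. Each class is shuffled and dealt into folds separately so that every fold holds 44 positive and 44 negative samples; this is one simple scheme, assumed here.

```python
import random

def balanced_sets(positives, negatives, n_sets=10, seed=0):
    """All positives plus an equal-size random draw of negatives, n_sets times."""
    rng = random.Random(seed)
    return [(list(positives), rng.sample(list(negatives), len(positives)))
            for _ in range(n_sets)]

def balanced_folds(pos, neg, k=5, seed=0):
    """Shuffle each class separately and deal it into k folds, keeping folds balanced."""
    rng = random.Random(seed)
    pos, neg = list(pos), list(neg)
    rng.shuffle(pos)
    rng.shuffle(neg)
    return [(pos[i::k], neg[i::k]) for i in range(k)]

def cv_splits(pos, neg, k=5, seed=0):
    """Yield ((train_pos, train_neg), (test_pos, test_neg)) for each of the k folds."""
    folds = balanced_folds(pos, neg, k, seed)
    for i, test in enumerate(folds):
        train_pos = [p for j, (fp, _) in enumerate(folds) if j != i for p in fp]
        train_neg = [n for j, (_, fn) in enumerate(folds) if j != i for n in fn]
        yield (train_pos, train_neg), test

positives = [f"shed_{i}" for i in range(220)]      # hypothetical identifiers
negatives = [f"nonshed_{i}" for i in range(511)]
n_predictions = 0
for it, (pos, neg) in enumerate(balanced_sets(positives, negatives)):
    for (train_pos, train_neg), (test_pos, test_neg) in cv_splits(pos, neg, seed=it):
        # An RBF-kernel SVM would be trained on train_* and scored on test_* here.
        n_predictions += len(test_pos) + len(test_neg)
print(n_predictions)  # 4400
```

Counting the test predictions reproduces the 10 × 5 × 88 = 4400 predicted results that are later pooled for ROC analysis.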

TMHMM. One step of SecretePipe is to determine whether or not a query protein is a membrane protein. We used the widely adopted program TMHMM13,14 to discriminate between membrane and nonmembrane proteins. TMHMM is built on a hidden Markov model (HMM) in which different states are assigned to different topological regions of a membrane protein. These regions include transmembrane helices, intracellular helix tails, extracellular helix tails, inside loops, outside loops, and globular domains within loops. Each residue is reported to be inside or outside a transmembrane helix on the basis of its state in the HMM. A query protein is regarded as a membrane protein if it is predicted to have at least one transmembrane helix; otherwise, it is regarded as a nonmembrane protein. TMHMM version 2.0 was used in this study.

Hum-mPLoc. TMHMM identifies membrane proteins in general, but we are interested only in membrane proteins located at the cell surface. These proteins are more likely to be released into the extracellular milieu by ectodomain shedding and so to serve as noninvasive marker candidates. To determine the localization of a membrane protein, we selected Hum-mPLoc, a benchmark predictor developed specifically for human proteins.15−19 Hum-mPLoc has 14 predefined subcellular localizations, and a query protein may receive more than one predicted localization. Hum-mPLoc works in a top-down fashion. The subcellular localization of a query protein is first determined using gene ontology (GO)35 annotations. Proteins without available GO annotation then go through a second round of prediction based on functional domains and evolutionary information. In our experience, Hum-mPLoc does not work for proteins longer than 3000 amino acids, possibly because of the computationally intensive matrix operations used in the second round of prediction.
Despite this limitation, we integrated Hum-mPLoc into SecretePipe because of its superior performance. The web service Hum-mPLoc 2.0 (http://www.csbio.sjtu.edu.cn/bioinf/hum-multi-2/) was used in this study.

SignalP. Signal peptides are the sequence feature that directs protein secretion via the classical secretory pathway. To assess their presence, the tool SignalP10,11 was incorporated into SecretePipe. SignalP is one of the most widely used tools in


annotated as “extracell” or “plasma membrane” by Hum-mPLoc, that is, when p is predicted to be located at an internal membrane or an organelle membrane. Otherwise, p is finally queried against ShedP to determine whether or not it could be released by a shedding event.

signal peptide prediction and is considered to perform considerably better than other prediction tools.36 Its classification model is generated using artificial neural networks (ANNs) and HMMs, and a final D-score is used to identify the presence of signal peptides. The location of each predicted signal peptide is also reported. SignalP version 4.0 was used in this study.
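Together, the three public tools and ShedP form the decision flow summarized in Figure 2. The sketch below expresses that logic in Python with each predictor passed in as a callable. The callables, their return types, and the exact label strings "extracell" and "plasma membrane" stand in for the real tool outputs; they are not the tools' actual interfaces. A later predictor is called only when the flow reaches it, mirroring how a query protein moves through the pipeline.

```python
from typing import Callable, Set

# Hum-mPLoc labels treated as cell-surface or extracellular (assumed label strings).
SURFACE_LABELS = {"extracell", "plasma membrane"}

def is_secreted(
    seq: str,
    count_tm_helices: Callable[[str], int],        # stand-in for TMHMM
    predict_locations: Callable[[str], Set[str]],  # stand-in for Hum-mPLoc (may return several labels)
    has_signal_peptide: Callable[[str], bool],     # stand-in for SignalP
    shed_probability: Callable[[str], float],      # stand-in for ShedP
    cutoff: float = 0.5,
) -> bool:
    # Step 1: any predicted transmembrane helix makes p a membrane protein.
    if count_tm_helices(seq) == 0:
        # Nonmembrane protein: secreted if and only if it carries a signal peptide.
        return has_signal_peptide(seq)
    # Step 2: membrane proteins not placed at the cell surface are regarded as nonsecreted.
    if not (predict_locations(seq) & SURFACE_LABELS):
        return False
    # Step 3: surface membrane proteins are secreted if predicted to be shed.
    return shed_probability(seq) >= cutoff

# Toy predictions for five hypothetical proteins; __getitem__ raises KeyError if a
# predictor is consulted for a protein the flow should never send to it.
tm = {"a": 0, "b": 0, "c": 1, "d": 2, "e": 1}
loc = {"c": {"plasma membrane"}, "d": {"endoplasmic reticulum"}, "e": {"extracell", "golgi apparatus"}}
sp = {"a": True, "b": False}
shed = {"c": 0.72, "e": 0.31}
calls = [is_secreted(p, tm.__getitem__, loc.__getitem__, sp.__getitem__, shed.__getitem__)
         for p in "abcde"]
print(calls)  # [True, False, True, False, False]
```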

Performance Evaluation of SecretePipe

To evaluate the predictive performance of SecretePipe, we used a public experimental secretome data set from human A549 NSCLC (non-small cell lung cancer) cells.36 It contained 382 secreted proteins and 1486 nonsecreted cellular proteins. Proteins used in the training and testing of ShedP were removed. So were proteins with a mutual sequence identity higher than 30%, as determined by CD-HIT. Experimental secretome data sets are prone to intracellular contamination, in which intracellular proteins leak into the cell medium during incubation. For this reason, proteins not annotated by UniProt as secreted, extracellular, or cell membrane were also removed. Because of the Hum-mPLoc limitation described above, proteins longer than 3000 amino acids were removed as well. The final test set contained 1346 proteins: 149 secreted proteins formed the positive set and 1197 nonsecreted proteins the negative set (Table S2, Supporting Information). The performance measures reported for this evaluation were accuracy, sensitivity, specificity, MCC, and false discovery rate (FDR). FDR is defined as FP/(FP + TP), the probability that a positive prediction is a false positive. Most secreted protein predictors tend to predict membrane proteins as secreted, because transmembrane topology and signal peptides are often falsely cross-predicted.37 We therefore used FDR to measure the ability of a predictor to distinguish released membrane proteins from nonreleased ones. To further examine the performance of SecretePipe on membrane proteins, we prepared three additional membrane protein test sets.
The first data set consists of 616 membrane proteins from the secretome profiles of 23 human cancer cell lines.2 The second consists of 80 membrane proteins from the secretome profile of 6 human cancer cell lines.38 The third consists of 180 membrane proteins from the cancer secretome profile of thyroid hormone-regulated human hepatoma cells.39 Because of the Hum-mPLoc limitation, every selected protein is shorter than 3000 amino acids. All of these proteins are released membrane proteins, so only the sensitivity of the tested predictors was evaluated on these three data sets.
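The test-set filters described above can be sketched as a single pass over candidate proteins. This is a hypothetical illustration, and it rests on these assumptions:

- The `Candidate` record and its fields are invented for the example.
- CD-HIT redundancy removal runs separately.
- The text does not say whether the UniProt annotation filter was applied to both classes. The sketch applies it only to experimentally secreted proteins, where leakage contamination would occur.

```python
from dataclasses import dataclass, field
from typing import List, Set

MAX_LENGTH = 3000  # Hum-mPLoc failed on longer sequences in the authors' experience
KEEP_TERMS = ("secreted", "extracellular", "cell membrane")

@dataclass
class Candidate:  # hypothetical record for one protein from a secretome study
    accession: str
    sequence: str
    observed_secreted: bool                                      # label from the secretome experiment
    uniprot_locations: List[str] = field(default_factory=list)  # UniProt subcellular-location text

def filter_test_set(candidates: List[Candidate], shedp_accessions: Set[str]) -> List[Candidate]:
    """Drop ShedP overlaps, over-long sequences, and likely contaminants (CD-HIT runs separately)."""
    kept = []
    for c in candidates:
        if c.accession in shedp_accessions:
            continue  # already used to train or test ShedP
        if len(c.sequence) > MAX_LENGTH:
            continue  # too long for Hum-mPLoc
        annotated = any(term in loc.lower() for loc in c.uniprot_locations for term in KEEP_TERMS)
        if c.observed_secreted and not annotated:
            continue  # assumed intracellular leakage into the medium
        kept.append(c)
    return kept
```

For example, a protein observed in the medium but annotated only as "Cytoplasm" is dropped. One annotated as "Cell membrane; Single-pass type I membrane protein" is kept.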

Decision-Making Flow of SecretePipe

The decision-making flow of SecretePipe for the screening of secreted proteins is depicted in Figure 2. This process tries to

Figure 2. Decision-making flow of SecretePipe. SecretePipe determines whether or not a query protein p is secreted using the following logic. If p is a membrane protein (tested by TMHMM) in an extracellular location (tested by Hum-mPLoc), p is regarded as secreted if it may undergo ectodomain shedding process (tested by ShedP). If p is not a membrane protein (tested by TMHMM), p is regarded as secreted if it contains a signal peptide (tested by SignalP).

test two different secretion mechanisms. Nontransmembrane proteins are checked for signal peptides, and those predicted to have one are regarded as secreted. Transmembrane proteins located at the cell surface are instead checked for whether they could be released into the extracellular milieu through proteolytic shedding. Note that transmembrane proteins may reside in internal or organelle membranes as well as in the outer (plasma) membrane. We regard the former two classes as nonsecreted because they are much less likely to be released from the cell surface. A query protein p passes through SecretePipe as follows. The protein p is first queried against TMHMM to determine whether or not it is a transmembrane protein. If p is a nontransmembrane protein, it is queried against SignalP: p is secreted if SignalP predicts a signal peptide and nonsecreted otherwise. If p is a transmembrane protein, it is queried against Hum-mPLoc to determine its subcellular localization. Protein p is nonsecreted if it is not



RESULTS

We evaluated in turn the performance of ShedP, our in-house predictor of ectodomain shedding events of membrane proteins, and of SecretePipe, a screening pipeline for secreted proteins.

The Parameter λ in the Model Feature of ShedP

The PseAAC representation was used as the model features of ShedP. PseAAC specifies a parameter λ that sets the largest distance between two amino acids whose physicochemical relationship is taken into consideration. A rank-based voting scheme was applied to determine the best λ value. We constructed predictive models using λ values ranging from 0 to 30, which gave 31 predictive results for each training set. Within each training set, the λ values were ranked and scored on the basis of their MCC values. The top-ranked λ received a score of


Figure 3. Accumulated scores of different λ values. In each of the 10 training iterations of ShedP, different λ values (ranging from 0 to 30) were ranked on the basis of their MCC values and received a score. Each λ value consequently had 10 scores. The figure shows λ values versus their accumulated scores.

30, while the bottom-ranked λ received a score of 0. We had 10 balanced training sets, so the final ranking of the λ values was determined by their scores accumulated across all 10 sets. These accumulated scores are depicted in Figure 3. The best λ value was 2 (score 252), and the score tended to decrease as λ increased. This suggests that the physicochemical relationships between nearby amino acids contribute more to shedding event prediction than those between remote amino acids. For the rest of our evaluation, λ was set to 2.

Performance Evaluation of ShedP
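The rank-based voting used above to select λ can be sketched as follows. Within one training set, the λ values are sorted by MCC, so the best of the 31 values scores 30 and the worst scores 0. The scores are then summed over the 10 training sets. The tie-breaking rule (input order) and the synthetic MCC values in the example are assumptions, not details from this study.

```python
def rank_scores(mcc_by_lambda):
    """Score each lambda within one training set: worst MCC -> 0, best -> len - 1."""
    ascending = sorted(mcc_by_lambda, key=mcc_by_lambda.get)  # ties keep input order (assumption)
    return {lam: score for score, lam in enumerate(ascending)}

def accumulate_scores(runs):
    """Sum each lambda's per-run scores over all training sets."""
    totals = {}
    for mcc_by_lambda in runs:
        for lam, score in rank_scores(mcc_by_lambda).items():
            totals[lam] = totals.get(lam, 0) + score
    return totals

lambdas = range(31)
# Synthetic MCCs that peak at lambda = 2 in every run (illustration only).
runs = [{lam: 0.45 - 0.005 * abs(lam - 2) - 0.0001 * lam for lam in lambdas} for _ in range(10)]
totals = accumulate_scores(runs)
best = max(totals, key=totals.get)
print(best, totals[best])  # 2 300
```

Under this scheme the maximum attainable accumulated score is 10 × 30 = 300. The reported score of 252 for λ = 2 therefore means that λ = 2 was not ranked first in every training set.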

Discriminative Threshold of ShedP. ShedP requires a cutoff threshold to decide whether or not a query protein is predicted to be shed. This threshold was determined by ROC analysis (Figure 4a). We performed 10 iterations of 5-fold cross-validation, and each validation tested 44 positive and 44 negative samples. This gave a total of 4400 predictions (10 × 5 × 88). The best cutoff threshold was chosen by maximizing the sum of sensitivity and specificity, a point that usually lies near the upper left corner of the ROC curve. On these 4400 results, the best cutoff was 0.5, with a sensitivity plus specificity of 1.42. A query protein with a predicted probability greater than or equal to 0.5 was therefore regarded as positive (shed); otherwise, it was predicted as negative (nonshed). Performance Measures of ShedP. We prepared 10 balanced data sets for 10 iterations of model training. Each data set was used to build predictive models with 5-fold cross-validation. The constructed models were first subjected to ROC analysis, and their accuracy, sensitivity, specificity, and MCC at the cutoff threshold of 0.5 are reported. The 10 iterations of 5-fold cross-validation therefore gave 10 sets of performance measures (Table 1), which we summarized by averaging. The average ROC curve of all models is depicted in Figure 4b and has an AUC of 0.78 ± 0.03. The mean and standard error of the accuracy, sensitivity, specificity, and MCC at the cutoff threshold of 0.5 were 0.71 ± 0.03, 0.75 ± 0.03, 0.67 ± 0.04, and 0.43 ± 0.05, respectively. The results in Table 1 show that the constructed models are effective, and we accordingly applied the same procedure to construct our ShedP. We

Figure 4. ROC analysis of ShedP. (a) The ROC curve of 4400 prediction results from 10 iterations using 5-fold cross-validation; each validation contained 44 positive and 44 negative tests. The best cutoff probability was 0.5 with sensitivity plus specificity of 1.42. (b) The average AUC of the 10 iterations using 5-fold cross-validation was 0.78 ± 0.03.
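The threshold rule above and the probabilistic reading of AUC can be sketched in plain Python. The rule maximizes sensitivity plus specificity, which is equivalent to maximizing Youden's J = sensitivity + specificity − 1. This is an illustration on made-up scores, not the analysis code used in this study. A score greater than or equal to the threshold counts as a positive call, matching the ≥ 0.5 rule for ShedP. When computing AUC, tied positive/negative pairs count as half.

```python
def best_threshold(scores, labels):
    """Return (threshold, sensitivity + specificity), maximized over the observed scores."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        total = tp / n_pos + tn / n_neg
        if best is None or total > best[1]:  # keep the lowest threshold on ties
            best = (t, total)
    return best

def auc(scores, labels):
    """AUC as P(random positive scores higher than random negative), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.6, 0.55, 0.4, 0.3]   # made-up predicted probabilities
labels = [1, 1, 0, 1, 0, 0]                 # 1 = shed, 0 = nonshed
t, j = best_threshold(scores, labels)
print(t, round(j, 3))                       # 0.55 1.667
print(round(auc(scores, labels), 3))        # 0.889
```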


Table 1. Performance of ShedP

iteration   AUC           accuracy      sensitivity   specificity   MCC
1           0.79 ± 0.04   0.72 ± 0.04   0.73 ± 0.10   0.70 ± 0.09   0.44 ± 0.08
2           0.74 ± 0.08   0.67 ± 0.07   0.73 ± 0.09   0.60 ± 0.09   0.37 ± 0.14
3           0.74 ± 0.08   0.67 ± 0.08   0.73 ± 0.09   0.62 ± 0.10   0.35 ± 0.16
4           0.75 ± 0.06   0.70 ± 0.05   0.74 ± 0.11   0.65 ± 0.09   0.40 ± 0.10
5           0.79 ± 0.06   0.71 ± 0.06   0.73 ± 0.10   0.70 ± 0.05   0.43 ± 0.13
6           0.80 ± 0.06   0.72 ± 0.06   0.75 ± 0.08   0.69 ± 0.09   0.45 ± 0.12
7           0.78 ± 0.08   0.70 ± 0.05   0.74 ± 0.10   0.67 ± 0.12   0.42 ± 0.10
8           0.81 ± 0.07   0.75 ± 0.05   0.80 ± 0.10   0.70 ± 0.08   0.50 ± 0.11
9           0.76 ± 0.07   0.70 ± 0.06   0.71 ± 0.10   0.68 ± 0.06   0.39 ± 0.12
10          0.82 ± 0.07   0.75 ± 0.07   0.80 ± 0.09   0.70 ± 0.12   0.51 ± 0.13
average     0.78 ± 0.03   0.71 ± 0.03   0.75 ± 0.03   0.67 ± 0.04   0.43 ± 0.05

Abbreviations: AUC, the area under the receiver operating characteristic (ROC) curve; MCC, Matthews correlation coefficient.

SignalP, and WoLF PSORT.9 SecretomeP applies an artificial neural network to identify secreted proteins on the basis of sequence-derived features. It was specifically designed for proteins involved in nonclassical secretory pathways. Predictions of signal peptides and of subcellular localization also provide critical information for identifying secreted proteins. Signal peptides are present in most secreted proteins and usually direct them into the classical secretory pathway. Subcellular localization tells us whether or not a protein is located in the extracellular region, and an extracellular protein is usually regarded as secreted. Accordingly, we also compared SecretePipe with two leading predictors of these properties, SignalP and WoLF PSORT. SignalP combines an artificial neural network and a hidden Markov model to identify proteins with signal peptides. WoLF PSORT applies a k-nearest neighbor classifier to predict the subcellular localization of a protein from its amino acid composition, N-terminal sorting signals, and several functional motifs. In this evaluation, proteins predicted to have signal peptides or to be located in extracellular compartments were reported as secreted. The predictions of SecretePipe, SecretomeP, SignalP, and WoLF PSORT are given in Table 2.

On all 1346 test samples (Table 2a), SecretePipe and SignalP had comparable sensitivities (0.82 vs 0.83). SecretePipe made only one more wrong prediction than SignalP among the 149 positive samples. Both outperformed SecretomeP (0.67) and WoLF PSORT (0.75) on the positive samples. SecretePipe and SignalP also had comparable specificities, with SecretePipe slightly better (0.96 vs 0.95). Both outperformed SecretomeP (0.58) and WoLF PSORT (0.93) on the negative samples.
The overall accuracy and MCC values indicated that SecretePipe (accuracy, 0.95; MCC, 0.75) and SignalP (accuracy, 0.94; MCC, 0.72) were better secreted protein predictors than SecretomeP (accuracy, 0.59; MCC, 0.16) and WoLF PSORT (accuracy, 0.91; MCC, 0.61). SecretePipe also had the lowest FDR (0.27). The other predictors had FDRs of 0.31 (SignalP), 0.42 (WoLF PSORT), and 0.84 (SecretomeP). SecretePipe therefore selected secreted proteins more reliably. Overall, SecretePipe and SignalP were the two strongest secreted protein predictors. Table S3 (Supporting Information) gives the accession number, description, number of peptides, and prediction results for each of the 1346 test proteins from the A549 NSCLC secretome data.

prepared a distinct balanced data set of 220 positive and 220 negative samples by random sampling. All 440 samples were used as training data to build an SVM, which became our final ShedP. On the independent test set, ShedP correctly predicted 14 of the 19 positive samples, a sensitivity of 0.74.

Performance Evaluation of SecretePipe

Performance Measures of SecretePipe. A test set derived from human A549 NSCLC cells was used to evaluate the predictive performance of SecretePipe. The data set contained 1346 proteins, 149 of which were secreted and 1197 were nonsecreted. SecretePipe produced an accuracy of 0.95, a sensitivity of 0.82, a specificity of 0.96, an MCC value of 0.75, and an FDR value of 0.27 using this unbalanced test set (Table 2).

Performance Comparison between SecretePipe and Other Secreted Protein Predictors. We compared SecretePipe with three state-of-the-art predictors: SecretomeP,8

Table 2. Performance Comparisons among SecretePipe, SecretomeP, SignalP, and WoLF PSORT

                    SecretePipe  SecretomeP  SignalP  WoLF PSORT
(a) 1346 Proteins (149 Secreted, 1197 Nonsecreted)
true positive               122         100      123         112
false negative               27          49       26          37
true negative              1153         689     1143        1116
false positive               44         508       54          81
accuracy                   0.95        0.59     0.94        0.91
sensitivity                0.82        0.67     0.83        0.75
specificity                0.96        0.58     0.95        0.93
MCC                        0.75        0.16     0.72        0.61
FDR                        0.27        0.84     0.31        0.42
(b) 176 Membrane Proteins (40 Released, 136 Nonreleased)
true positive                27          18       28          20
false negative               13          22       12          20
true negative               120          70      110         111
false positive               16          66       26          25
accuracy                   0.84        0.50     0.78        0.74
sensitivity                0.68        0.45     0.70        0.50
specificity                0.88        0.51     0.81        0.82
MCC                        0.54       −0.03     0.46        0.30
FDR                        0.37        0.76     0.48        0.56

Abbreviations: MCC, Matthews correlation coefficient; FDR, false discovery rate.
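The indices in Table 2 can be recomputed from the confusion counts. The sketch below assumes the standard definitions: accuracy, sensitivity = TP/(TP + FN), specificity = TN/(TN + FP), FDR = FP/(TP + FP), and the Matthews correlation coefficient. `metrics` is an illustrative helper and is not code from SecretePipe. With SecretePipe's counts as input, it reproduces the reported values for both the full test set and the membrane subset.

```python
import math


def metrics(tp, fn, tn, fp):
    """Standard confusion-matrix indices, as named in Table 2."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)  # recall on the positive (secreted/released) class
    specificity = tn / (tn + fp)  # recall on the negative class
    fdr = fp / (tp + fp)          # share of predicted positives that are wrong
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "MCC": mcc, "FDR": fdr}


# SecretePipe counts from Table 2: (a) all 1346 proteins, (b) the 176 membrane proteins.
table_2a = metrics(tp=122, fn=27, tn=1153, fp=44)
table_2b = metrics(tp=27, fn=13, tn=120, fp=16)
for label, m in [("(a)", table_2a), ("(b)", table_2b)]:
    print(label, {k: round(v, 2) for k, v in m.items()})
```

Unlike accuracy, MCC uses all four cells of the confusion matrix. This makes it more informative on an unbalanced set such as (a), where 1197 of 1346 proteins are negatives.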

dx.doi.org/10.1021/pr3009012 | J. Proteome Res. 2013, 12, 1235−1244

Journal of Proteome Research

Article

been identified that are relevant to the study of cancer and other diseases.2,3 Furthermore, most of the protein cancer markers currently approved by the FDA are membrane proteins.5 Information on membrane protein release is therefore useful for biomarker discovery. Membrane proteins are released mainly through ectodomain shedding. Over the past decade, computational methods for predicting protein secretion and subcellular localization have been proposed as complements to time-consuming biological experiments. Nevertheless, to our knowledge no existing prediction tool is dedicated to assessing the shedding events of membrane proteins and thereby capturing membrane protein release. A reliable strategy is therefore needed to evaluate the ectodomain shedding of membrane proteins and so determine whether a membrane protein will be released into the extracellular milieu. In this study, we developed an in-house tool, ShedP, to identify membrane proteins that undergo ectodomain shedding; to our knowledge, ShedP is the first computational method designed for this purpose. We then constructed an integrated screening pipeline, SecretePipe, that identifies secreted proteins by combining TMHMM, Hum-mPLoc, SignalP, and ShedP. SecretePipe was designed to identify both nonmembrane proteins with signal peptides and membrane proteins that are released into the extracellular compartment by shedding. We compared SecretePipe with three state-of-the-art predictors, SecretomeP, SignalP, and WoLF PSORT. SecretePipe outperformed the other predictors in accuracy, MCC, and FDR on both the full test set and the membrane protein subset (Table 2). These results suggest that the presence of signal peptides and the shedding of cell membrane proteins are the most critical factors with respect to protein secretion.
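The way the component predictors divide the work can be summarized as a two-branch decision. This sketch is our own reading of the text and not the published SecretePipe logic. It assumes the following:

- each protein arrives with precomputed yes/no calls;
- the TMHMM and Hum-mPLoc step yields a single membrane/nonmembrane call;
- membrane proteins are routed to ShedP;
- nonmembrane proteins are judged by SignalP.

`ComponentCalls` and `is_secreted` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentCalls:
    """Hypothetical, precomputed yes/no outputs of the component predictors."""
    is_membrane: bool         # assumed: membrane call from the TMHMM / Hum-mPLoc step
    has_signal_peptide: bool  # assumed: SignalP call
    predicted_shed: bool      # assumed: ShedP call


def is_secreted(calls: ComponentCalls) -> bool:
    """Return True if the protein is predicted to reach the extracellular milieu."""
    if calls.is_membrane:
        # Membrane proteins count as released only via predicted ectodomain shedding.
        return calls.predicted_shed
    # Nonmembrane proteins count as secreted when a signal peptide is detected.
    return calls.has_signal_peptide


examples = {
    "soluble, signal peptide": ComponentCalls(False, True, False),
    "soluble, no signal peptide": ComponentCalls(False, False, False),
    "membrane, shed": ComponentCalls(True, True, True),
    "membrane, not shed": ComponentCalls(True, True, False),
}
for name, calls in examples.items():
    print(f"{name}: {is_secreted(calls)}")
```

Under these assumptions, the pipeline rejects a membrane protein that carries a signal peptide but is not predicted to be shed. This illustrates how the shedding step could remove candidates that a SignalP-only screen would accept.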
SecretomeP, which was designed specifically to identify proteins secreted through non-signal-peptide-triggered pathways, produced the worst results. WoLF PSORT predicts protein subcellular localization from protein sequences, and its predictions probably rely in part on signal peptides, which are believed to direct protein transport. Even though WoLF PSORT may take signal peptide content into account, it is not designed specifically for signal peptide identification as SignalP is; accordingly, its performance was similar to but worse than that of SignalP. Because SecretePipe uses SignalP to identify signal peptides, the performance difference between SecretePipe and SignalP comes primarily from the ectodomain shedding predictions provided by ShedP, which take the shedding events of cell membrane proteins into account. On the full test set, the two predictors differ only slightly, with SecretePipe's clearest advantage in FDR (0.27 vs 0.31). The ability to identify true negatives, however, is very important in biomarker discovery. The main task of SecretePipe is to predict shed membrane proteins. These predictions can then be used to screen numerous protein marker candidates for markers accessible in body fluids. Because the subsequent validation experiments for each candidate protein are usually time- and resource-consuming, better performance on true negatives can help researchers exclude false positives, narrow down candidate lists, and identify useful markers both accurately and efficiently. We further examined 27 proteins that were predicted to be membrane-bound and correctly classified as released (Table S6,

Nevertheless, when we turned to the 176 membrane proteins in the test set (Table 2b), a valuable source of potential biomarkers, SecretePipe performed much better than the other predictors on the negative samples while retaining discriminative power comparable to that of SignalP on the positive samples. As on the full test set, SecretePipe made one more wrong prediction than SignalP on the 40 released membrane proteins. Of these 40 released membrane proteins, 11 received different predictions from SecretePipe and SignalP: SignalP produced five false negatives and SecretePipe six. The five false negatives of SignalP were caused by unidentified signal peptides. Of the six false negatives of SecretePipe, four were caused by incorrect predictions of Hum-mPLoc and two by incorrect predictions of ShedP (Table S4, Supporting Information). On the 136 nonreleased membrane proteins, however, SecretePipe performed best. SecretePipe ranked first in most measures, including accuracy (SecretePipe, 0.84; SignalP, 0.78), specificity (SecretePipe, 0.88; WoLF PSORT, 0.82), MCC (SecretePipe, 0.54; SignalP, 0.46), and FDR (SecretePipe, 0.37; SignalP, 0.48). These results show that SecretePipe distinguishes released from nonreleased membrane proteins better than the other predictors. We further evaluated the four predictors on three membrane protein data sets from the cancer secretome profiles of 23 human cancer cell lines,2 6 human cancer cell lines,38 and thyroid hormone-regulated human hepatoma cells.39 SignalP again had the best sensitivity on all three data sets (0.53, 0.68, 0.60), while SecretePipe had a sensitivity comparable to that of SignalP (0.52, 0.65, 0.56). Both SignalP and SecretePipe outperformed SecretomeP (sensitivity 0.47, 0.40, 0.52) and WoLF PSORT (sensitivity 0.32, 0.31, 0.32).
These results indicate that SecretePipe is a robust and competitive predictor of released membrane proteins. The protein accession number, protein description, and prediction result for each test protein are listed in Table S5 (Supporting Information), and the performance of the four predictors on the three data sets is summarized in Table 3.
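SecretePipe and SignalP recover almost the same number of the 40 released membrane proteins (27 vs 28 true positives in Table 2b), yet they disagree on 11 of them. The set arithmetic below makes this concrete. The protein identifiers are synthetic placeholders, and only the group sizes come from the text. Two of those sizes are derived rather than reported: the 22 proteins found by both tools (28 - 6 = 27 - 5 = 22) and the 7 missed by both (12 - 5 = 13 - 6 = 7).

```python
# Synthetic identifiers for the 40 released membrane proteins, grouped by outcome.
both_found = {f"p{i}" for i in range(22)}
only_secretepipe = {f"p{i}" for i in range(22, 27)}  # SignalP false negatives
only_signalp = {f"p{i}" for i in range(27, 33)}      # SecretePipe false negatives
both_missed = {f"p{i}" for i in range(33, 40)}
released = both_found | only_secretepipe | only_signalp | both_missed

# Proteins each predictor calls "released".
secretepipe_hits = both_found | only_secretepipe
signalp_hits = both_found | only_signalp

discordant = secretepipe_hits ^ signalp_hits  # called released by exactly one tool
secretepipe_fn = released - secretepipe_hits
signalp_fn = released - signalp_hits

print(len(secretepipe_hits), len(signalp_hits), len(discordant))
print(len(discordant & signalp_fn), len(discordant & secretepipe_fn))
```

Aggregate sensitivity therefore hides how differently the two tools behave on individual proteins. Per-protein agreement has to be checked directly, as the authors did for the 11 discordant cases.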



DISCUSSION

For the purpose of noninvasive diagnosis, secretome profiling analysis has become an emerging field in the area of biomarker discovery; this is because numerous secreted biomarkers have

Table 3. Performance Comparisons among SecretePipe, SecretomeP, SignalP, and WoLF PSORT on the Membrane Proteins Identified in Conditioned Media-Derived Data Sets of Three Secretome Studies

Membrane protein data sets from cancer secretome profiles:
(1) 616 membrane proteins from 23 human cancer cell lines (Wu et al.2)
(2) 80 membrane proteins from 6 human cancer cell lines (Lawlor et al.38)
(3) 180 membrane proteins from thyroid hormone-regulated human hepatoma cells (Chen et al.39)

Sensitivity
predictor         (1)    (2)    (3)
SecretePipe      0.52   0.65   0.56
SecretomeP       0.47   0.40   0.52
SignalP          0.53   0.68   0.60
WoLF PSORT       0.32   0.31   0.32


similarity to EGF-like growth factor. As a number of previous studies have noted, ectodomain shedding of EGF-like factors is essential for the release of their mature forms, and this type of shedding event is vital for regulating cell fate during proliferation and differentiation.50 Other experimental evidence suggests that shedding events are also involved in the activity of the following membrane proteins: carboxypeptidase D (CPD), type I transmembrane receptor precursor (SEZ6L2), and major prion protein (PRNP).51,52 Finally, beta-1,4-galactosyltransferase 1 (B4GALT1) has been speculated to be released from the cell membrane via proteolytic shedding.53,54 More membrane-bound proteins are being identified as released via shedding, and some of them may have important physiological functions in molecular signaling. SecretePipe therefore provides a tool for predicting potentially secreted proteins that may serve as clinically useful biomarkers.

Supporting Information), some of which are supported as released by previous studies (Table 4). The

Table 4. List of Membrane Proteins Predicted to Be Shed with Evidential Support by Previous Studies

IPI ID(a)     UniProt ID   protein description
IPI00395488   Q6EMK4       vasorin
IPI00031821   Q9Y287       integral membrane protein 2B
IPI00002320   Q9NZU0       leucine-rich repeat transmembrane protein FLRT3
IPI00015881   P09603       macrophage colony-stimulating factor 1
IPI00027310   Q7Z7M0       multiple epidermal growth factor-like domains protein 8
IPI00027078   O75976       carboxypeptidase D
IPI00018276   Q6UXD5       type I transmembrane receptor precursor
IPI00022284   P04156       major prion protein (PrP)
IPI00215767   P15291       UDP-Gal:betaGlcNAc beta-1,4-galactosyltransferase 1, membrane-bound form

(a) Abbreviation: IPI, international protein index.



ASSOCIATED CONTENT

Supporting Information

Table S1, giving training and test samples of ShedP, Table S2, giving test samples of SecretePipe, Table S3, giving protein information and predictive results of the 1346 test proteins from A549 NSCLC secretome data, Table S4, giving a performance comparison of SecretePipe and SignalP on the 40 released membrane proteins from the A549 NSCLC data set, Table S5, giving predictive results of SecretePipe, SignalP, SecretomeP, and WoLF PSORT on the membrane proteins from three cancer secretome studies, and Table S6, giving a list of released membrane proteins correctly identified by SecretePipe on the A549 NSCLC data set. This material is available free of charge via the Internet at http://pubs.acs.org. SecretePipe and all its test data sets are publicly available at http://bal.ym.edu.tw/SecretePipe/.

membrane protein vasorin (VASN) has been found to be cleaved by the metalloprotease ADAM17 (a disintegrin and metalloprotease domain 17).40 Vasorin has been reported to be a TGF-β binding protein involved in the TGF-β-mediated signaling pathway.41 Release of its extracellular domain following the proteolytic cleavage of vasorin has been reported to modulate the TGF-β-triggered epithelial-to-mesenchymal transition (EMT). As EMT plays a critical role in tumor progression,42 the ectodomain shedding of vasorin may be an event indicative of cancer progression.40 A proteolytic event affecting integral membrane protein 2B (BRI2) has also been reported, whereby the extracellular BRICHOS domain is shed from the cell surface by the metalloprotease ADAM10 (a disintegrin and metalloprotease domain 10).43 BRI2 is a transmembrane protein that has been reported to interact with amyloid precursor protein (APP) and to regulate APP processing, one of the major pathogenic events in Alzheimer’s disease (AD).44 Although the exact molecular function of the BRICHOS domain remains unclear, it has been suggested that the release of this extracellular domain may have a physiological function.43 The shedding of membrane proteins has also been reported to mediate biological signaling during cell differentiation and development. Fibronectin leucine-rich transmembrane protein 3 (FLRT3) has been reported to be shed from neurons, with its released ectodomain serving as a chemorepellent that modulates neuron migration.
The shedding of the FLRT3 extracellular domain may play an essential role in the development of the nervous system.45 Macrophage colony-stimulating factor 1 (mCSF-1) has been reported to be shed from cell surfaces in both in vitro and animal model studies.46,47 mCSF-1 is a cytokine that regulates the differentiation and proliferation of hematopoietic precursor cells and is essential for the survival and proliferation of mononuclear phagocytes.46,48 The shed form of mCSF-1 has been found to be biologically active in bone and to be involved in the formation of osteoclasts.47,49 There is no direct evidence of a shedding event involving multiple epidermal growth factor-like domains protein 8 (MEGF8). Even so, MEGF8 has been conjectured to undergo ectodomain shedding and to have a physiological function because of its



AUTHOR INFORMATION

Corresponding Author

*Tel: +886-2-28267273. Fax: +886-2-28202508. E-mail: [email protected].

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

We thank Dr. Hsuan-Cheng Huang at the Institute of Biomedical Informatics, National Yang Ming University, for support and inspiring discussions. This work was financially supported by the National Science Council, Taiwan (NSC992320-B-010-022-MY2).



REFERENCES

(1) Robeson, R. H.; Siegel, A. M.; Dunckley, T. Genomic and Proteomic Biomarker Discovery in Neurological Disease. Biomarker Insights 2008, 3, 73−86.
(2) Wu, C. C.; Hsu, C. W.; Chen, C. D.; Yu, C. J.; Chang, K. P.; Tai, D. I.; Liu, H. P.; Su, W. H.; Chang, Y. S.; Yu, J. S. Candidate serological biomarkers for cancer identified from the secretomes of 23 cancer cell lines and the human protein atlas. Mol. Cell. Proteomics 2010, 9 (6), 1100−1117.
(3) Hudler, P.; Gorsic, M.; Komel, R. Proteomic strategies and challenges in tumor metastasis research. Clin. Exp. Metastasis 2010, 27 (6), 441−451.
(4) Makridakis, M.; Vlahou, A. Secretome proteomics for discovery of cancer biomarkers. J. Proteomics 2010, 73 (12), 2291−2305.


(5) Ludwig, J. A.; Weinstein, J. N. Biomarkers in cancer staging, prognosis and treatment selection. Nat. Rev. Cancer 2005, 5 (11), 845−856.
(6) Montes de Oca, B. P. Ectodomain shedding and regulated intracellular proteolysis in the central nervous system. Cent. Nerv. Syst. Agents Med. Chem. 2010, 10 (4), 337−359.
(7) Arribas, J.; Massague, J. Transforming growth factor-alpha and beta-amyloid precursor protein share a secretory mechanism. J. Cell Biol. 1995, 128 (3), 433−441.
(8) Bendtsen, J. D.; Jensen, L. J.; Blom, N.; Von Heijne, G.; Brunak, S. Feature-based prediction of non-classical and leaderless protein secretion. Protein Eng. Des. Sel. 2004, 17 (4), 349−356.
(9) Horton, P.; Park, K. J.; Obayashi, T.; Fujita, N.; Harada, H.; Adams-Collier, C. J.; Nakai, K. WoLF PSORT: protein localization predictor. Nucleic Acids Res. 2007, 35 (Web Server issue), W585−W587.
(10) Petersen, T. N.; Brunak, S.; von Heijne, G.; Nielsen, H. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat. Methods 2011, 8 (10), 785−786.
(11) Nielsen, H.; Engelbrecht, J.; Brunak, S.; von Heijne, G. Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng. 1997, 10 (1), 1−6.
(12) Luo, X.; Liu, Y.; Wang, R.; Hu, H.; Zeng, R.; Chen, H. A high-quality secretome of A549 cells aided the discovery of C4b-binding protein as a novel serum biomarker for non-small cell lung cancer. J. Proteomics 2011, 74 (4), 528−538.
(13) Krogh, A.; Larsson, B.; von Heijne, G.; Sonnhammer, E. L. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 2001, 305 (3), 567−580.
(14) Sonnhammer, E. L.; von Heijne, G.; Krogh, A. A hidden Markov model for predicting transmembrane helices in protein sequences. Proc. Int. Conf. Intell. Syst. Mol. Biol. 1998, 6, 175−182.
(15) Shen, H. B.; Chou, K. C. A top-down approach to enhance the power of predicting human protein subcellular localization: Hum-mPLoc 2.0. Anal. Biochem. 2009, 394 (2), 269−274.
(16) Chou, K. C.; Shen, H. B. Cell-PLoc: a package of Web servers for predicting subcellular localization of proteins in various organisms. Nat. Protoc. 2008, 3 (2), 153−162.
(17) Shen, H. B.; Chou, K. C. Hum-mPLoc: an ensemble classifier for large-scale human protein subcellular location prediction by incorporating samples with multiple sites. Biochem. Biophys. Res. Commun. 2007, 355 (4), 1006−1011.
(18) Chou, K. C. Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes. Bioinformatics 2005, 21 (1), 10−19.
(19) Shen, H. B.; Chou, K. C. Ensemble classifier for protein fold pattern recognition. Bioinformatics 2006, 22 (14), 1717−1722.
(20) The UniProt Consortium. Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res. 2012, 40 (Database issue), D71−D75.
(21) Rawlings, N. D.; Barrett, A. J.; Bateman, A. MEROPS: the database of proteolytic enzymes, their substrates and inhibitors. Nucleic Acids Res. 2012, 40 (Database issue), D343−D350.
(22) Igarashi, Y.; Heureux, E.; Doctor, K. S.; Talwar, P.; Gramatikova, S.; Gramatikoff, K.; Zhang, Y.; Blinov, M.; Ibragimova, S. S.; Boyd, S.; Ratnikov, B.; Cieplak, P.; Godzik, A.; Smith, J. W.; Osterman, A. L.; Eroshkin, A. M. PMAP: databases for analyzing proteolytic events and pathways. Nucleic Acids Res. 2009, 37 (Database issue), D611−D618.
(23) Keshava Prasad, T. S.; Goel, R.; Kandasamy, K.; Keerthikumar, S.; Kumar, S.; Mathivanan, S.; Telikicherla, D.; Raju, R.; Shafreen, B.; Venugopal, A.; Balakrishnan, L.; Marimuthu, A.; Banerjee, S.; Somanathan, D. S.; Sebastian, A.; Rani, S.; Ray, S.; Harrys Kishore, C. J.; Kanth, S.; Ahmed, M.; Kashyap, M. K.; Mohmood, R.; Ramachandra, Y. L.; Krishna, V.; Rahiman, B. A.; Mohan, S.; Ranganathan, P.; Ramabadran, S.; Chaerkady, R.; Pandey, A. Human Protein Reference Database–2009 update. Nucleic Acids Res. 2009, 37 (Database issue), D767−D772.

(24) Huang, Y.; Niu, B.; Gao, Y.; Fu, L.; Li, W. CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics 2010, 26 (5), 680−682.
(25) Morrison, C. J.; Butler, G. S.; Rodriguez, D.; Overall, C. M. Matrix metalloproteinase proteomics: substrates, targets, and therapy. Curr. Opin. Cell Biol. 2009, 21 (5), 645−653.
(26) Reiss, K.; Saftig, P. The “a disintegrin and metalloprotease” (ADAM) family of sheddases: physiological and cellular functions. Semin. Cell Dev. Biol. 2009, 20 (2), 126−137.
(27) Pruessmeyer, J.; Ludwig, A. The good, the bad and the ugly substrates for ADAM10 and ADAM17 in brain pathology, inflammation and cancer. Semin. Cell Dev. Biol. 2009, 20 (2), 164−174.
(28) Edwards, D. R.; Handsley, M. M.; Pennington, C. J. The ADAM metalloproteinases. Mol. Aspects Med. 2008, 29 (5), 258−289.
(29) Murphy, G. The ADAMs: signalling scissors in the tumour microenvironment. Nat. Rev. Cancer 2008, 8 (12), 929−941.
(30) Shiomi, T.; Lemaitre, V.; D’Armiento, J.; Okada, Y. Matrix metalloproteinases, a disintegrin and metalloproteinases, and a disintegrin and metalloproteinases with thrombospondin motifs in non-neoplastic diseases. Pathol. Int. 2010, 60 (7), 477−496.
(31) Chou, K. C. Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins 2001, 43 (3), 246−255.
(32) Baldi, P.; Brunak, S.; Chauvin, Y.; Andersen, C. A.; Nielsen, H. Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics 2000, 16 (5), 412−424.
(33) Cortes, C.; Vapnik, V. Support Vector Network. Mach. Learn. 1995, 20, 273−297.
(34) Chang, C. C.; Lin, C. J. LIBSVM: a library for support vector machines. ACM Transact. Int. Sys. Technol. 2011, 2 (3), 1−27.
(35) Ashburner, M.; Ball, C. A.; Blake, J. A.; Botstein, D.; Butler, H.; Cherry, J. M.; Davis, A. P.; Dolinski, K.; Dwight, S. S.; Eppig, J. T.; Harris, M. A.; Hill, D. P.; Issel-Tarver, L.; Kasarskis, A.; Lewis, S.; Matese, J. C.; Richardson, J. E.; Ringwald, M.; Rubin, G. M.; Sherlock, G. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 2000, 25 (1), 25−29.
(36) Emanuelsson, O.; Brunak, S.; von Heijne, G.; Nielsen, H. Locating proteins in the cell using TargetP, SignalP and related tools. Nat. Protoc. 2007, 2 (4), 953−971.
(37) Kall, L.; Krogh, A.; Sonnhammer, E. L. A combined transmembrane topology and signal peptide prediction method. J. Mol. Biol. 2004, 338 (5), 1027−1036.
(38) Lawlor, K.; Nazarian, A.; Lacomis, L.; Tempst, P.; Villanueva, J. Pathway-based biomarker search by high-throughput proteomics profiling of secretomes. J. Proteome Res. 2009, 8 (3), 1489−1503.
(39) Chen, C. Y.; Chi, L. M.; Chi, H. C.; Tsai, M. M.; Tsai, C. Y.; Tseng, Y. H.; Lin, Y. H.; Chen, W. J.; Huang, Y. H.; Lin, K. H. Stable isotope labeling with amino acids in cell culture (SILAC)-based quantitative proteomics study of a thyroid hormone-regulated secretome in human hepatoma cells. Mol. Cell. Proteomics 2012, 11 (4), M111.011270.
(40) Malapeira, J.; Esselens, C.; Bech-Serra, J. J.; Canals, F.; Arribas, J. ADAM17 (TACE) regulates TGFbeta signaling through the cleavage of vasorin. Oncogene 2011, 30 (16), 1912−1922.
(41) Ikeda, Y.; Imai, Y.; Kumagai, H.; Nosaka, T.; Morikawa, Y.; Hisaoka, T.; Manabe, I.; Maemura, K.; Nakaoka, T.; Imamura, T.; Miyazono, K.; Komuro, I.; Nagai, R.; Kitamura, T. Vasorin, a transforming growth factor beta-binding protein expressed in vascular smooth muscle cells, modulates the arterial response to injury in vivo. Proc. Natl. Acad. Sci. U.S.A. 2004, 101 (29), 10732−10737.
(42) Thiery, J. P.; Sleeman, J. P. Complex networks orchestrate epithelial-mesenchymal transitions. Nat. Rev. Mol. Cell Biol. 2006, 7 (2), 131−142.
(43) Martin, L.; Fluhrer, R.; Reiss, K.; Kremmer, E.; Saftig, P.; Haass, C. Regulated intramembrane proteolysis of Bri2 (Itm2b) by ADAM10 and SPPL2a/SPPL2b. J. Biol. Chem. 2008, 283 (3), 1644−1652.
(44) Fotinopoulou, A.; Tsachaki, M.; Vlavaki, M.; Poulopoulos, A.; Rostagno, A.; Frangione, B.; Ghiso, J.; Efthimiopoulos, S. BRI2 interacts with amyloid precursor protein (APP) and regulates amyloid beta (Abeta) production. J. Biol. Chem. 2005, 280 (35), 30768−30772.


(45) Yamagishi, S.; Hampel, F.; Hata, K.; Del Toro, D.; Schwark, M.; Kvachnina, E.; Bastmeyer, M.; Yamashita, T.; Tarabykin, V.; Klein, R.; Egea, J. FLRT2 and FLRT3 act as repulsive guidance cues for Unc5-positive neurons. EMBO J. 2011, 30 (14), 2920−2933.
(46) Tuck, D. P.; Cerretti, D. P.; Hand, A.; Guha, A.; Sorba, S.; Dainiak, N. Human macrophage colony-stimulating factor is expressed at and shed from the cell surface. Blood 1994, 84 (7), 2182−2188.
(47) Yao, G. Q.; Wu, J. J.; Sun, B. H.; Troiano, N.; Mitnick, M. A.; Insogna, K. The cell surface form of colony-stimulating factor-1 is biologically active in bone in vivo. Endocrinology 2003, 144 (8), 3677−3682.
(48) Wang, Y.; Mo, X.; Piper, M. G.; Wang, H.; Parinandi, N. L.; Guttridge, D.; Marsh, C. B. M-CSF induces monocyte survival by activating NF-kappaB p65 phosphorylation at Ser276 via protein kinase C. PLoS One 2011, 6 (12), e28081.
(49) Yao, G. Q.; Sun, B.; Hammond, E. E.; Spencer, E. N.; Horowitz, M. C.; Insogna, K. L.; Weir, E. C. The cell-surface form of colony-stimulating factor-1 is regulated by osteotropic agents and supports formation of multinucleated osteoclast-like cells. J. Biol. Chem. 1998, 273 (7), 4119−4128.
(50) Sanderson, M. P.; Dempsey, P. J.; Dunbar, A. J. Control of ErbB signaling through metalloprotease mediated ectodomain shedding of EGF-like factors. Growth Factors 2006, 24 (2), 121−136.
(51) Hemming, M. L.; Elias, J. E.; Gygi, S. P.; Selkoe, D. J. Identification of beta-secretase (BACE1) substrates using quantitative proteomics. PLoS One 2009, 4 (12), e8477.
(52) Parkin, E. T.; Watt, N. T.; Turner, A. J.; Hooper, N. M. Dual mechanisms for shedding of the cellular prion protein. J. Biol. Chem. 2004, 279 (12), 11170−11178.
(53) Strous, G. J. Golgi and secreted galactosyltransferase. CRC Crit. Rev. Biochem. 1986, 21 (2), 119−151.
(54) Schaub, B. E.; Berger, B.; Berger, E. G.; Rohrer, J. Transition of galactosyltransferase 1 from trans-Golgi cisterna to the trans-Golgi network is signal mediated. Mol. Biol. Cell 2006, 17 (12), 5153−5162.
