EpiC: An Open Resource for Exploring Epitopes To Aid Antibody-Based Experiments

Niall J. Haslam* and Toby J. Gibson

European Molecular Biology Laboratory, Structural and Computational Biology Unit, Heidelberg, Germany

Received January 18, 2010

Abstract: Antibodies are a primary research tool for a diverse range of experiments in biology, from development to pathology. Their utility is derived from their ability to identify proteins specifically and with high sensitivity. This diversity of experimental requirements stretches the capabilities of these key research reagents. However, antibodies seem well placed to answer the challenges of the forthcoming proteome-scale biology. Their use across such a wide variety of experimental settings affects the choice of epitope used to raise the antibody. Understanding the constraints imposed by the experimental configuration is crucial to developing well-characterized affinity reagents. Their application to a wide range of biological fields and their relatively low cost of manufacture have ensured that the demand for a resource of well-characterized antibodies will remain high and that they will be an important biological resource for the foreseeable future. This demand will only increase as the number of therapeutic targets continues to grow. Current tools to aid in the production of affinity reagents are disparate and not freely available. We present a freely available Web resource (http://epic.embl.de) for the proteomics community: the Epitope Choice Resource (EpiC), for the selection of epitopes and characterization of the target protein. It provides the community with a single Web-based portal for the exploration of epitopes on a target protein and connects over the Internet to a wide range of bioinformatic tools, ensuring that the data presented are up to date.

Keywords: Antigenicity prediction • epitope selection • affinity proteomics • antigenicity software

* To whom correspondence should be addressed. E-mail: niall@sgenomics.org. DOI: 10.1021/pr100029f. © 2010 American Chemical Society.

Introduction

Antibodies are invaluable research tools, deployed in a wide range of experimental technologies. However, the utility of an antibody in one experimental context does not guarantee its utility in another. The state of the protein, the functional module under investigation, and the requirements of the experimental setup can all alter the optimal epitope in an antibody-based experiment.1 Many tools are available for the prediction of antigenicity and protein modules in proteins. These are distributed over a number of different resources, each requiring different pieces of information about the protein of interest. The Epitope Choice Resource (EpiC, http://epic.embl.de) collates bioinformatic analyses and annotated information to inform epitope selection. EpiC uses information on both the protein target and the planned experiment to aid in the selection of epitopes.

A major current objective in biology is to develop technologies and reagents to characterize the human proteome fully.2 The human genome is currently estimated to comprise around 24 000 protein-encoding genes,3 but the proteome is manyfold larger, owing to splice variation and post-translational modification (PTM). Tools are needed to identify the constituent proteins, quantify their amounts, and define their intracellular localization. Additionally, research scientists need tools to differentiate between the functional states of proteins, which can be modified, for example, through PTMs as well as structural conformational changes. This is particularly important given that for just under half of the genome-encoded proteins no function has been experimentally determined.3 Antibody-based methodologies, such as Western blotting, immunoprecipitation, immunohistochemistry, and others, are being increasingly applied to a host of problems in biology. Consequently, antibodies are used in a wide range of biological disciplines, from cell biology to development and from single-protein assays to high-throughput proteomics arrays. Such characterization of protein function is central to an understanding of human disease. Therefore, the lack of suitable antibody reagents for the majority of proteins is a major hurdle in disease exploration.
More tools and online resources, such as Antibodypedia (a user-editable database cataloguing the antibodies developed within the human proteome resource and elsewhere), that help to generate and characterize antibodies in a high-throughput manner will therefore accelerate discovery in biology.4 The Epitope Choice Resource discussed in this paper will help in the identification of epitopes and subsequent production of antibodies for the research community.

The Human Proteome Organisation (HUPO) has recognized the importance of affinity reagents such as antibodies in a wide range of biological problems.5 For example, they are a key resource in the quest to implement a protein atlas, a compendium of protein locations on the cellular and subcellular level.6 There is currently an initiative to annotate experimentally the remainder of the human proteome, which will lead to a new era of proteome-scale biology. The development of resources of antibodies for all proteins is likely to continue for some time into the future, and therefore, the development of improved, cheaper, and more accessible computational and Web-based tools will aid in the development of such physical resources.

The advances in next-generation sequencing technologies are likely to result in the identification of new disease mutations in previously uncharacterized proteins.7 Therefore, there is a need to develop biochemical reagents to determine their properties, interactions, localizations, and so forth. Where the function of the protein is known, EpiC can help in identifying regions of the protein that a binder reagent may be useful in disrupting, for example, short linear motifs.8 For those proteins that are well-characterized experimentally, it is important to display this information so that it can be built upon and used to drive further experimental investigation. An example of this is the display of high-throughput phosphorylation site identification,9 where the use of affinity reagents in corroborating these data is particularly important. Furthermore, once validated, the reagents could be used to interfere by competitive binding with the phosphosite and therefore be of use in assays (especially Western blots) designed to interrogate signaling pathways. Using multiple antibodies, one to detect the protein and one to detect phosphosite occupancy, it is possible to investigate phosphorylation events within the cell. This method is being employed successfully in the study of AKT and MAP kinase activity.10,11 EpiC brings together for the first time all this information freely, from a single entry point (the protein sequence or UniProt identifier), and displays it in one screen. In the case of protein fragments or proteins that are less well characterized, EpiC provides predictions based on the physicochemical properties of the protein sequence.

Journal of Proteome Research 2010, 9, 3759–3763. Published on Web 05/24/2010.

Methods

Ontological Description of the Experiment. The wide range of uses to which antibodies can be applied affects the type of bioinformatic analysis that is required. Therefore, it is crucial that EpiC captures this information and incorporates it into the resource in a structured manner to determine which analyses to run. EpiC makes use of the Proteomics Standards Initiative Protein Affinity Reagent (PSI-PAR) Ontology to determine the experimental requirements of the user. This provides the link between the experimental context and the bioinformatic analysis required. The PSI-PAR Ontology and Exchange Format are described elsewhere.12 The experimental context captured in the input to EpiC drives the module assignment engine shown in Figure 1. This determines which information about the protein should be used in the display of the results. Central to this is an understanding of the physical state of the protein required by the different types of experimental methods, from Western blots and ELISAs to pull-downs. Depending on the experimental context, different analysis modules are assigned; there is no universal approach. The state of the protein in the assay is one of the most carefully considered aspects of EpiC. Certain assays permit the use of native proteins and, therefore, require an awareness of their functional modules, including domains, motifs, and unstructured regions. For this, EpiC makes use of a number of prediction tools (Table 1). Since there is no one definitive answer that can be provided to a user, it is important to display all pertinent information, without overloading the user, to enable him/her to make an informed decision on the likely location of an epitope in the protein of interest. This requires the display of known functional and structural information about the protein, when available and where appropriate. Given the rapidly evolving state of protein annotation, it is incumbent upon EpiC to deliver current information.

Figure 1. Pipeline of the Epitope Choice Resource. The experimental information is used to drive the analyses presented to the user who enters the protein sequence or identifier. This mapping is central to driving the context-dependent analyses which are used to create the results displayed to the user.

Distributed Analysis. EpiC uses a network of distributed biocomputing resources to identify antigenic regions of a protein for investigation in antibody-based experiments. This removes the necessity for the EpiC resource to maintain a copy of the commonly used databases upon which it relies, and the curation and maintenance costs that would entail. The use of distributed resources also ensures that the data presented in EpiC are up-to-date. This has informed the technical implementation of the resource, which is discussed in more detail in our article on data integration issues for EpiC.13 The bioinformatic analysis programs are run on remote servers using Web services technologies. This reduces the requirement for EpiC to be run on expensive computational hardware. Instead, dedicated resources at institutes such as the European Bioinformatics Institute (EBI) are used. This is facilitated by resources such as the EMBRACE registry and the DAS registry at the Sanger Institute, Hinxton.14,15 The use of precomputed analyses and data from DAS sources removes the need to recompute common analyses and allows faster return of results. From a user point of view, this results in an increased sophistication of analysis possible through a single portal, while for the developer, it decreases the complexity of developing such applications.

Web-Interface to Multiple Remote Servers. Rather than produce a script to be run from the command line or a program to be downloaded, we have developed a Web-interface to the resource. This reuse of the pre-existing resources adds value to the databases and servers already in production, from large providers like UniProt through to smaller laboratories that only produce single, simple, modular services designed to be consumed in this automated way.
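The distributed pattern described above, in which one sequence is sent to several independent remote analyses and the answers are gathered in one place, can be sketched roughly as follows. This is only an illustration under stated assumptions: the three service functions are hypothetical local stand-ins (a real client would issue Web-service or DAS requests), and `run_distributed` is an invented name, not part of EpiC.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Tuple

# A feature is (start, end, label) in 1-based residue coordinates.
Feature = Tuple[int, int, str]
Service = Callable[[str], List[Feature]]


# Hypothetical stand-ins for remote services (assumption: real code would
# perform a network request here instead of computing anything locally).
def fake_transmembrane_service(sequence: str) -> List[Feature]:
    return [(5, 25, "transmembrane")] if len(sequence) >= 25 else []


def fake_phosphosite_service(sequence: str) -> List[Feature]:
    return [(i, i, "phosphosite")
            for i, aa in enumerate(sequence, start=1) if aa in "STY"]


def failing_service(sequence: str) -> List[Feature]:
    raise TimeoutError("remote server did not answer")


def run_distributed(
    sequence: str, services: Dict[str, Service], max_workers: int = 4
) -> Tuple[Dict[str, List[Feature]], Dict[str, str]]:
    """Call every service concurrently; collect results and failures separately."""
    results: Dict[str, List[Feature]] = {}
    errors: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn, sequence): name for name, fn in services.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # one unavailable server must not sink the rest
                errors[name] = str(exc)
    return results, errors
```

Keeping failures in a separate dictionary means that a portal built this way could still show whatever the reachable servers returned, while reporting which sources were unavailable.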
The PTM information is fetched from Netglyc, NetPhos, NetAcet, phospho.ELM, and phosida.9,16-19 Splice variants are provided by UniProt.3 Potential nuclear export signals are provided by NetNes.20 Cleavage sites and proteasomal digest sites are predicted by EMBOSS, which also provides hydrophobicity prediction.21 Further hydrophobicity predictions are retrieved from the Sonnhammer lab DAS resource.22 Transmembrane prediction is provided by Phobius and TMHMM.23,24 Domain structure prediction is provided by SMART, pFam, and Globplot.25-27 The IEDB analysis resource provides the tools for the prediction of physicochemical properties including antigenicity.28 The resources used are summarized in Table 1.

Table 1. Summary of the Bioinformatic Resources Used by EpiC

abbreviation | name | function | reference
PICR | Protein Identifier Cross-Reference Service | Maps sequence to identifier in UniProt | 30
OLS | Ontology Lookup Service | Driving the module assignment engine | 32
GlobPlot | Exploring protein sequences for globularity and disorder | Globularity and disorder prediction | 27
Protein Atlas | Human Protein Atlas | Link out to explore the Human Protein Atlas entry for that sequence | 6
phospho.ELM | Phosphorylation related ELMs | Phosphorylation data | 19
IEDB AR | Immune Epitope Database Analysis Resource | Antigenicity and physicochemical properties determination | 28
NetNes | Neural Network based prediction of NES | Nuclear export signal prediction | 20
EMBOSS | European Molecular Biology Open Software Suite | Hydrophobic moment, antigenicity and protease cleavage site prediction | 21
UniProt | Universal Protein Resource | Functional annotation | 34
UniProtJapi | Remote UniProt Service | Protein name lookup | 35
SMART | Simple Modular Architecture Research Tool | Domain prediction | 25
pFam | Protein Family Resource | Protein domain prediction | 34
disoDB | Disorder Database | Disorder region prediction | 35
netPhos | Neural Network based | Phosphorylation prediction | 17
phosida | High-throughput phosphorylation database | Phosphorylation annotation | 9
netOGlyc | Neural Network based | Glycosylation prediction | 16
Phobius | Combined transmembrane topology and signal peptide predictor | Transmembrane prediction | 23
netNGlyc | Neural Network based | Glycosylation prediction | 16
TmHmm | Markov Model Based TM prediction | Transmembrane prediction | 24
netAcet | Neural Network based Acetylation prediction | Acetylation prediction | 18
Pongo | Predictions of all-alpha transmembrane proteins | Transmembrane prediction | 36
signalP | Prediction of the presence and location of signal peptide | Signal peptide prediction | 37
DASher | Protein Sequence Analysis | Hydrophobicity properties | 22
BLAST | Basic Local Alignment and Search Tool | Potential cross-reactivity check | 31

EpiC builds upon the shared bioinformatic resource infrastructure created in projects such as EMBRACE and BioSapiens. These projects have created a networking infrastructure that now enables decentralized access to a wide range of bioinformatic tools and data. EMBRACE has generated a set of tools that are distributed throughout Europe and which enable researchers to interact more easily with software and computational resources. This enables small research teams to have access to large computational resources in a relatively straightforward manner. The BioSapiens project focused on the distribution of biological data and generated a set of standards for the dissemination of the information.29 This included an ontology to describe the information and facilitate reuse by tools downstream of the project. EpiC is one of the first beneficiaries of this new system of bioinformatic analyses. It enables researchers to run jobs from a wide range of tools and collect information from many diverse sources without having to know where these tools or sources are. The collection of data and sequence analysis is hidden from the user. Nevertheless, the user is presented with timely and up-to-date information without the need to visit a plethora of disparate Web sites with conflicting data entry formats.

Functional Modules Constrain Epitope Choice. The provision of functional information in addition to the synthesis of antigenicity prediction tools provides researchers with the information necessary to make a decision about the design of their experiment. For example, the display of transmembrane predictions is important when the experiment is to be carried out in vivo but irrelevant in Western blots. Similarly, the display of predicted and known extracellular regions is also important for experimental design. Phosphorylation site identification or prediction is important for designing phosphosite occupancy experiments. The exploration of the AKT pathways is a good example of the need for multiple antibodies, both within the pathway and for each individual protein.10 This again highlights the need for careful consideration, not just of the antigenicity of the protein, or the cross-reactivity of the antibody, but also of the functional characterization that the antibody is being used to determine. This information is also useful in designing experiments where the user may wish to avoid epitopes on phosphorylated residues. Overall, the inclusion of functional predictions and annotations is an important addition to the tools for antigenicity prediction.

Consensus Pipeline. The user enters a protein sequence or identifier into the user interface. If a sequence is entered, EpiC tries to find the identifier using the PICR service from the EBI.30 If the PICR service is unable to provide the UniProt identifier of the sequence, EpiC runs a suite of antigenicity tools and returns the results to the user. This is normally the result when only a fragment of the protein sequence is used. PICR is, however, able to identify a sequence correctly if the full-length protein is used. The input of an accession number or identifier from UniProt, the Swiss-Prot name, or the full-length sequence allows EpiC to fetch the sequence from a DAS source. The use of an identifier enables a more sophisticated analysis, due to the increased ability to identify the sequence through cross-references in other databases. This means EpiC is then able to retrieve a much wider range of information about the protein.
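The input handling just described (identifier given, sequence resolved to an identifier, or fallback to antigenicity tools alone) can be pictured with a short sketch. Everything named here is an assumption for illustration: `plan_analysis` and the mode strings are invented, the `lookup` callable merely stands in for a PICR-style sequence-to-identifier service, and `P12345` is used only as a placeholder accession. The regular expression follows the published UniProt accession format; Swiss-Prot entry names are deliberately not handled.

```python
import re
from typing import Callable, Optional, Tuple

# UniProt accession number pattern (6 or 10 characters).
UNIPROT_ACCESSION = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def plan_analysis(
    user_input: str, lookup: Callable[[str], Optional[str]]
) -> Tuple[str, Optional[str]]:
    """Decide which kind of analysis is possible for the given input.

    Returns (mode, accession). `lookup` is a hypothetical stand-in for a
    remote sequence-to-identifier service and returns None when no match
    is found, as would be expected for a sequence fragment.
    """
    text = "".join(user_input.split()).upper()  # drop whitespace and newlines
    if UNIPROT_ACCESSION.match(text):
        return ("full_annotation", text)
    accession = lookup(text)
    if accession is not None:
        return ("full_annotation", accession)
    # Unidentified sequence: only sequence-based predictions can be run.
    return ("antigenicity_only", None)


# Usage with a toy in-memory lookup table instead of a remote call.
known_sequences = {"MSTYAAAA": "P12345"}
print(plan_analysis("mstyaaaa", known_sequences.get))
print(plan_analysis("MSTY", known_sequences.get))
```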
Sequence identifiers and cross-references are used by a large number of databases to share information about proteins. Therefore, it is possible to extract information on PTMs, splice variants, signal peptides, and other annotated details, which would be impossible given only the raw sequence. Together with the information gathered from running the protein sequence through a range of analysis tools, it is possible to filter possible antigenic sites on the basis of the annotated information. Upon identifying antigenic sites using the antigenicity algorithms, EpiC then filters the results, removing antigenic peptides located in signal peptides or transmembrane regions. The peptides are also checked to ensure that they are not strongly hydrophobic, using the EMBOSS hmoment program.21 Moreover, cysteine-containing peptides are also removed, since they are known to be difficult to work with due to oxidative cross-linking. Finally, each visual feature in the user interface provides more information; for example, by clicking on a feature, such as a predicted epitope, the user is able to initiate a BLAST analysis of the peptide.31 This aids identification of potential cross-reactivity both within the originating species and in other species. This ability to identify potential cross-reactivity will increase the applicability of the antibody. Since only a few epitopes are selected per protein by EpiC, there is no need to precompute similarities for all subsequences of the target protein.
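The filtering steps above lend themselves to a compact sketch. Note the assumptions: the text states that hydrophobicity is checked with the EMBOSS hmoment program, whereas this example substitutes a much simpler mean Kyte-Doolittle hydropathy as a stand-in; the cutoff of 1.0, the `Peptide` class, and the function names are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Kyte-Doolittle hydropathy scale (stand-in for the hmoment check; assumption).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


@dataclass(frozen=True)
class Peptide:
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    sequence: str


def overlaps(start: int, end: int, regions: Sequence[Tuple[int, int]]) -> bool:
    """True if [start, end] shares at least one residue with any region."""
    return any(start <= r_end and r_start <= end for r_start, r_end in regions)


def mean_hydropathy(sequence: str) -> float:
    # Unknown residue codes (e.g. X) contribute 0.0 rather than raising.
    return sum(KD.get(aa, 0.0) for aa in sequence) / len(sequence)


def filter_candidates(
    candidates: Sequence[Peptide],
    excluded_regions: Sequence[Tuple[int, int]],
    max_hydropathy: float = 1.0,   # hypothetical cutoff
) -> List[Peptide]:
    """Drop peptides in signal/transmembrane regions, with Cys, or too hydrophobic."""
    kept = []
    for pep in candidates:
        if overlaps(pep.start, pep.end, excluded_regions):
            continue  # lies in a signal peptide or transmembrane region
        if "C" in pep.sequence:
            continue  # cysteine: prone to oxidative cross-linking
        if mean_hydropathy(pep.sequence) > max_hydropathy:
            continue  # strongly hydrophobic
        kept.append(pep)
    return kept
```

Passing signal-peptide and transmembrane intervals together as `excluded_regions` keeps the filter independent of which predictor produced them.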

Results and Discussion

We present here a tool that can be used by experimentalists without the need to trawl through a range of bioinformatic Web sites. The EpiC resource presents relevant annotation on the target protein, which is crucial for designing antibody-based experiments. The resource is entirely Web-based, therefore requiring no download of software, and is freely available for researchers. The Web-interface allows the collection of data about the planned experiment. This is sent to the resource and then a results screen is shown to the user, as shown in Figures 2 and 3. The results are clearly summarized and can be explored to verify the provenance and validity of the information being presented.

Figure 2. Subsection of the result output from EpiC showing the consensus report (see Consensus Pipeline section). This section of the result displays a graphical representation of the physicochemical properties analyses and antigenicity prediction. Favorable results are highlighted in blue in each of the graphs. These are interactive and can be adjusted dynamically by the user. Ranges can also be dynamically highlighted.

The first section of the results shows the protein sequence, its name, a legend, a link to the protein atlas entry for that sequence (if human), and a configuration panel for controlling the display of the graphical components of the results.6 This allows the user to highlight sections of the graphs interactively. Hovering over a data point on a graph will display the values associated with that data point. The IEDB analysis resource tools are used to generate the data for the graphs for the prediction of antigenicity and physicochemical properties of the protein sequence.28 Features such as predicted surface exposure, flexibility, hydrophilicity, propensity for disorder, and hydrophobicity are represented in the first section of the results in Figure 2, which also illustrates the presentation of results from the consensus pipeline described earlier. This consensus prediction is an attempt to synthesize the information that EpiC has received from the disparate servers. Bringing the information together in one place is useful, but still requires some analysis by the user. The consensus view is an attempt to distill the appropriate information and present clear guidance to the user. Of course, all the information used to arrive at this decision is presented to the user, so that he/she can decide between the choices presented. The user is free to view the provenance of the information to determine if it is experimental evidence or a prediction. This can all be achieved interactively inside the Web page.
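As a rough picture of how a per-residue property graph and its highlighted favorable ranges could be derived, the sketch below averages a residue scale over a sliding window and then reports the stretches at or above a threshold. It is an assumption-laden illustration, not the IEDB implementation: the function names, the default window of 7, and any scale passed in are hypothetical.

```python
from typing import Dict, List, Tuple


def window_profile(sequence: str, scale: Dict[str, float], window: int = 7) -> List[float]:
    """Mean scale value for every full window along the sequence.

    Element i of the result covers residues i .. i + window - 1 (0-based).
    Residues missing from the scale count as 0.0.
    """
    if window < 1 or window > len(sequence):
        return []
    values = [scale.get(aa, 0.0) for aa in sequence]
    running = sum(values[:window])
    profile = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]  # slide the window by one
        profile.append(running / window)
    return profile


def favorable_ranges(profile: List[float], threshold: float) -> List[Tuple[int, int]]:
    """0-based inclusive index ranges where the profile is >= threshold."""
    ranges: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value >= threshold and start is None:
            start = i                      # a favorable stretch begins
        elif value < threshold and start is not None:
            ranges.append((start, i - 1))  # the stretch ended at i - 1
            start = None
    if start is not None:                  # stretch runs to the end
        ranges.append((start, len(profile) - 1))
    return ranges
```

Changing the threshold and recomputing the ranges corresponds loosely to the interactive adjustment of highlighted regions described for the graphs.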


Figure 3. Subsection of the result output showing the protein features. These are fetched from DAS sources; the module assignment engine decides which features to display. Each feature of the protein is color-coded according to the legend in the results section. Upon hovering over each feature, the ontological description of the feature is displayed.
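The role of the module assignment engine mentioned in the caption above can be pictured as a lookup from experimental context to the feature tracks worth displaying. The mapping below is purely hypothetical: the real resource derives the context from PSI-PAR ontology terms, and the experiment names, protein states, and module names here are illustrative assumptions. Only the contrast it encodes (transmembrane predictions matter in vivo but not in Western blots) comes from the text.

```python
from typing import Dict, FrozenSet, List

# Hypothetical module sets per protein state (illustrative names only).
MODULES_BY_PROTEIN_STATE: Dict[str, FrozenSet[str]] = {
    "native": frozenset({
        "antigenicity", "domains", "disorder", "linear_motifs",
        "ptm", "signal_peptide", "transmembrane",
    }),
    "denatured": frozenset({"antigenicity", "ptm", "signal_peptide"}),
}

# Hypothetical mapping from experiment type to the protein state it implies.
PROTEIN_STATE_BY_EXPERIMENT: Dict[str, str] = {
    "western blot": "denatured",
    "pull-down": "native",
    "in vivo": "native",
}


def assign_modules(experiment: str) -> List[str]:
    """Return the sorted module names to display for an experiment type."""
    state = PROTEIN_STATE_BY_EXPERIMENT.get(experiment.strip().lower())
    if state is None:
        # Unknown context: show everything rather than hide information.
        modules = frozenset().union(*MODULES_BY_PROTEIN_STATE.values())
    else:
        modules = MODULES_BY_PROTEIN_STATE[state]
    return sorted(modules)
```

Falling back to every module for an unrecognized context is one possible design choice; it favors completeness over brevity when the experiment cannot be classified.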

Concluding Remarks

The adoption of a modular approach means that improvements in antigenicity prediction can easily be incorporated into the pipeline without the need for extensive redesign. The use of standardized protocols for data exchange (for example, DAS) and operation of networked analysis will facilitate further development of the resource. From the user’s point of view, the capabilities can improve over time, but there is no need to stay abreast of the developments in the bioinformatic community. Nevertheless, the display of the provenance of the data allows the user to make informed choices about the reliability of the data. This focus on delivering bioinformatic analyses in a coherent framework, through a freely available and easy to use Web interface, will improve user access to the latest data and progress in epitope selection. The cumulative nature of the work builds upon the resources and infrastructure of other projects, and therefore delivers on the idea of code and information reuse. This represents a departure from the monolithic development processes of the past and brings together information and cutting edge algorithms seamlessly. We invite users to mail us with other tools that may be incorporated into the resource in order to improve the overall experience.

Acknowledgment. This work was funded by the EU Framework 6 programme research infrastructure coordination action ProteomeBinders. The authors would like to thank the members of the project for their invaluable feedback and advice, in particular, Mike Taussig, Oda Stoevesandt, Alan Sawyer, David Sherman, Erik Björling, Sandrine Palcy, Henning Hermjakob, Julie Bourbeillon, and David Gloriam. We would also like to acknowledge the work of other projects, in particular BioSapiens and EMBRACE for the provision of a framework upon which to build. EpiC makes use of a number of open source toolkits and packages, in particular biojava and dasobert, and we would like to thank the developers of those, in particular Andreas Prlic. Furthermore, we would like to thank all the sysadmins and programmers of the webservices that we use for their support and time.

References

(1) Taussig, M. J.; et al. ProteomeBinders: planning a European resource of affinity reagents for analysis of the human proteome. Nat. Methods 2007, 4, 13–17.
(2) Stoevesandt, O.; Taussig, M. J. Affinity reagent resources for human proteome detection: Initiatives and perspectives. Proteomics 2007, 7, 2738–2750.
(3) Uniprot-Consortium. The Universal Protein Resource (UniProt) 2009. Nucleic Acids Res. 2009, 37 (Database issue), D169–174.
(4) Björling, E.; Uhlén, M. Antibodypedia, a portal for sharing antibody and antigen validation data. Mol. Cell. Proteomics 2008, 7, 2028–2037.
(5) Gavin, A.-C.; Aebersold, R.; Heck, A. J. Meeting Report on the 7th World Congress of the Human Proteome Organization (HUPO) in Amsterdam: Proteome Biology. Mol. Cell. Proteomics 2008, 7, 2288–2291.
(6) Persson, A.; Hober, S.; Uhlén, M. A human protein atlas based on antibody proteomics. Curr. Opin. Mol. Ther. 2006, 8, 185–190.
(7) Uhlen, M. A new era for proteomics research. Genome Biol. 2008, 9, 325+.
(8) Diella, F.; Haslam, N.; Chica, C.; Budd, A.; Michael, S.; Brown, N. P.; Trave, G.; Gibson, T. J. Understanding eukaryotic linear motifs and their role in cell signaling and regulation. Front. Biosci. 2008, 13, 6580–6603.
(9) Gnad, F.; Ren, S.; Cox, J.; Olsen, J.; Macek, B.; Oroshi, M.; Mann, M. PHOSIDA (phosphorylation site database): management, structural and evolutionary investigation, and prediction of phosphosites. Genome Biol. 2007, 8, R250.
(10) Johnson, S. A.; Hunter, T. Kinomics: methods for deciphering the kinome. Nat. Methods 2005, 2, 17–25.
(11) Cetin, A.; Ozturk, O. H.; Tokay, A.; Akçit, F.; Cağlar, S.; Yeşilkaya, A. Angiotensin II-induced MAPK phosphorylation mediated by Ras and/or phospholipase C-dependent phosphorylations but not by protein kinase C phosphorylation in cultured rat vascular smooth muscle cells. Pharmacology 2007, 79, 27–33.
(12) Gloriam, D. E.; et al. Report: A community standard format for the representation of protein affinity reagents. Mol. Cell. Proteomics 2010, 9 (1), 1–10.
(13) Haslam, N.; Gibson, T. EpiC: a resource for integrating information and analyses to enable selection of epitopes for antibody based experiments. In Lecture Notes in Bioinformatics; Springer-Verlag: Berlin, 2009; pp 173–181.
(14) Prlić, A.; Down, T. A.; Kulesha, E.; Finn, R. D.; Kähäri, A.; Hubbard, T. J. Integrating sequence and structural biology with DAS. BMC Bioinf. 2007, 8, 333.
(15) Pettifer, S.; Thorne, D.; McDermott, P.; Attwood, T.; Baran, J.; Bryne, J. C.; Hupponen, T.; Mowbray, D.; Vriend, G. An active registry for bioinformatics web services. Bioinformatics 2009, 25 (16), 2090–2091.
(16) Julenius, K.; Mølgaard, A.; Gupta, R.; Brunak, S. Prediction, conservation analysis, and structural characterization of mammalian mucin-type O-glycosylation sites. Glycobiology 2005, 15, 153–164.
(17) Miller, M. L.; Blom, N. Kinase-specific prediction of protein phosphorylation sites. Methods Mol. Biol. 2009, 527, 299–310.
(18) Lars, K.; Dyrlov, B. J.; Nikolaj, B. NetAcet: prediction of N-terminal acetylation sites. Bioinformatics 2005, 21, 1269–1270.
(19) Diella, F.; Gould, C. M.; Chica, C.; Via, A.; Gibson, T. J. Phospho.ELM: a database of phosphorylation sites update 2008. Nucleic Acids Res. 2007, 36, D240–244.
(20) Cour, T. L.; Kiemer, L.; Mølgaard, A.; Gupta, R.; Skriver, K.; Brunak, S. Analysis and prediction of leucine-rich nuclear export signals. Protein Eng. 2004, 17, 527–536.
(21) Rice, P.; Longden, I.; Bleasby, A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000, 16, 276–277.
(22) Messina, D. N.; Sonnhammer, E. L. L. DASher: a stand-alone protein sequence client for DAS, the Distributed Annotation System. Bioinformatics 2009, 25, 1333–1334.
(23) Käll, L.; Krogh, A.; Sonnhammer, E. L. Advantages of combined transmembrane topology and signal peptide prediction-the Phobius web server. Nucleic Acids Res. 2007, 35, W429–432.
(24) Krogh, A.; Larsson, B.; von Heijne, G.; Sonnhammer, E. L. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 2001, 305, 567–580.
(25) Letunic, I.; Doerks, T.; Bork, P. SMART 6: recent updates and new developments. Nucleic Acids Res. 2009, 37, D229–232.
(26) Yeang, C.-H.; Haussler, D. Detecting coevolution in and among protein domains. PLoS Comp. Biol. 2007, 3, e211+.
(27) Linding, R.; Russell, R. B.; Neduva, V.; Gibson, T. J. GlobPlot: Exploring protein sequences for globularity and disorder. Nucleic Acids Res. 2003, 31, 3701–3708.
(28) Zhang, Q. Immune epitope database analysis resource (IEDB-AR). Nucleic Acids Res. 2008, 36, W513–518.
(29) Thornton, J.; Network, T. B. Annotations for all by all - the BioSapiens network. Genome Biol. 2009, 10 (2), 401.
(30) Cote, R.; Jones, P.; Martens, L.; Kerrien, S.; Reisinger, F.; Lin, Q.; Leinonen, R.; Apweiler, R.; Hermjakob, H. The Protein Identifier Cross-Reference (PICR) service: reconciling protein identifiers across multiple source databases. BMC Bioinf. 2007, 8, 401.
(31) Labarga, A.; Valentin, F.; Anderson, M.; Lopez, R. Web services at the European bioinformatics institute. Nucleic Acids Res. 2007, 35, W6–11.
(32) Côté, R. G. G.; Jones, P.; Martens, L.; Apweiler, R.; Hermjakob, H. The Ontology Lookup Service: more data and better tools for controlled vocabulary queries. Nucleic Acids Res. 2008, 36, W372–376.
(33) Patient, S.; Wieser, D.; Kleen, M.; Kretschmann, E.; Jesus Martin, M.; Apweiler, R. UniProtJAPI: a remote API for accessing UniProt data. Bioinformatics 2008, 24, 1321–1322.
(34) Finn, R. D.; Tate, J.; Mistry, J.; Coggill, P. C.; Sammut, S. J.; Hotz, H. R.; Ceric, G.; Forslund, K.; Eddy, S. R.; Sonnhammer, E. L.; Bateman, A. The Pfam protein families database. Nucleic Acids Res. 2008, 36, D281–288.
(35) Pentony, M. M.; Jones, D. T. Modularity of intrinsic disorder in the human proteome. Proteins: Struct., Funct., Bioinf. 2010, 78 (1), 212–221.
(36) Amico, M.; Finelli, M.; Rossi, I.; Zauli, A.; Elofsson, A.; Viklund, H.; von Heijne, G.; Jones, D.; Krogh, A.; Fariselli, P.; Luigi Martelli, P.; Casadio, R. PONGO: a web server for multiple predictions of all-alpha transmembrane proteins. Nucleic Acids Res. 2006, 34, W169–172.
(37) Bendtsen, J. D.; Nielsen, H.; von Heijne, G.; Brunak, S. Improved prediction of signal peptides: SignalP 3.0. J. Mol. Biol. 2004, 340, 783–795.
