Inference of Protein Function from Protein Structure
2005, Structure
https://doi.org/10.1016/J.STR.2004.10.015
Abstract
Debnath Pal and David Eisenberg*
UCLA-DOE Institute for Genomics and Proteomics, Howard Hughes Medical Institute, Box 951570, Los Angeles, California 90095

Summary

Structural genomics has brought us three-dimensional structures of proteins with unknown functions. To shed light on such structures, we have developed ProKnow (http://www.doe-mbi.ucla.edu/Services/ProKnow/), which annotates proteins with Gene Ontology functional terms. The method extracts features from the protein such as 3D fold, sequence, motif, and functional linkages and relates them to function via the ProKnow knowledgebase of features, which links features to annotated functions via annotation profiles. Bayes' theorem is used to compute weights of the functions assigned, using likelihoods based on the extracted features. The description level of the assigned function is quantified by the ontology depth (from 1 = general to 9 = specific). Jackknife tests show 89% correct assignments at ontology depth 1 and 40% at depth 9, with 93% coverage of 1507 distinct folded proteins. Overall, about 70% of the assignments were inferred correctly. This level of performance suggests that ProKnow is a useful resource in functional assessments of novel proteins.

... interactions between these functions form the basis for sustainable homeostasis. These multiple levels of function are reflected in our procedure, described below, of linking protein features to annotations at various levels.

The repertoire of methods for in silico annotation of function has grown enormously over the past two decades. A protein with a high degree of sequence similarity to a family of well-characterized proteins can be detected by BLAST (Altschul et al., 1990). With lower sequence similarity, more subtle methods such as "profiles" (where patterns obvious from multiple sequence alignment are evident) (Altschul et al., 1997; Bork and Gibson, 1996; Gribskov et al., 1987) or hidden Markov models (HMM) (Eddy et al., 1995) are required. These methods are based on the assumption that similar sequences have descended from a common ancestor and share similar function. The assumption is, however, limited in validity, as demonstrated by numerous studies (Devos and Valencia, 2000; Gerlt and Babbitt, 2000; Karp, 1998; Rost, 2002; Rost et al., 2003; Rost and Valencia, 1996; Tian and Skolnick, 2003; Whisstock and Lesk, 2003). To enhance accuracy of functional assignment, functional annotations can be inferred from information on fold (Bowie et al., 1991; Holm and Sander, 1998; Jones et al., 1992), motif (Attwood et al., 2003; Henikoff et al., 2000; Hulo et al., 2004), domain (Bateman et al., 2004), and orthology (Tatusov et al., 1997). Another class of annotation algorithms infers protein function based on identification of functionally significant residues. This class includes biodictionary "seqlets" mapping sequence patterns to their properties (Rigoutsos et al., 2002), evolutionary tracing (Land
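The Bayesian weighting mentioned in the Summary can be illustrated with a small sketch. It is not the ProKnow implementation: the Summary states only that Bayes' theorem is applied to likelihoods derived from the extracted features. The function name `posterior_weights`, the feature labels, the probability values, and the assumption that features are conditionally independent given the function are all hypothetical choices made here for illustration.

```python
def posterior_weights(priors, likelihoods, observed_features, floor=1e-6):
    """Weight candidate functions by Bayes' theorem.

    priors:            dict mapping a function term to P(term)
    likelihoods:       dict mapping a term to {feature: P(feature | term)}
    observed_features: features extracted from the query protein
    floor:             small probability used when a feature was never
                       seen with a term (an assumption of this sketch)

    Assumes features are conditionally independent given the term, so
    P(term | features) is proportional to P(term) * prod P(feature | term).
    """
    unnormalized = {}
    for term, prior in priors.items():
        score = prior
        for feature in observed_features:
            score *= likelihoods.get(term, {}).get(feature, floor)
        unnormalized[term] = score

    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("all candidate functions have zero weight")
    # Normalize so the weights over the candidate terms sum to 1.
    return {term: score / total for term, score in unnormalized.items()}


# Toy numbers, invented for illustration only.
priors = {"GO:0003824": 0.5, "GO:0005488": 0.5}  # catalytic activity, binding
likelihoods = {
    "GO:0003824": {"fold:example_barrel": 0.6, "motif:example_loop": 0.2},
    "GO:0005488": {"fold:example_barrel": 0.2, "motif:example_loop": 0.3},
}
weights = posterior_weights(
    priors, likelihoods, ["fold:example_barrel", "motif:example_loop"]
)
for term, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {weight:.3f}")
```

With these toy inputs the term with the larger product of prior and likelihoods receives the larger weight. In a real knowledgebase the priors and likelihoods would presumably be estimated from the annotation profiles rather than set by hand.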
References (38)
- Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990). Basic local alignment search tool. J. Mol. Biol. 215, 403-410.
- Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389-3402.
- Attwood, T.K., Bradley, P., Flower, D.R., Gaulton, A., Maudling, N., Mitchell, A.L., Moulton, G., Nordle, A., Paine, K., Taylor, P., et al. (2003). PRINTS and its automatic supplement, prePRINTS. Nucleic Acids Res. 31, 400-402.
- Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E.L., et al. (2004). The Pfam protein families database. Nucleic Acids
- Bork, P., and Gibson, T.J. (1996). Applying motif and profile searches. Methods Enzymol. 266, 162-184.
- Bowie, J.U., Luethy, R., and Eisenberg, D. (1991). A method to identify protein sequences that fold into a known three-dimensional structure. Science 253, 164-170.
- Cai, C.Z., Wang, W.L., Sun, L.A., and Chen, Y.Z. (2003). Protein function classification via support vector machine approach. Math. Biosci. 185, 111-122.
- Devos, D., and Valencia, A. (2000). Practical limits of function prediction. Proteins 41, 98-107.
- Eddy, S.R., Mitchison, G., and Durbin, R. (1995). Maximum discrimination hidden Markov models of sequence consensus. J. Comput. Biol. 2, 9-23.
- Eisenberg, D., Marcotte, E.M., Xenarios, I., and Yeates, T.O. (2000). Protein function in the post-genomic era. Nature 405, 823-826.
- Gene Ontology Consortium (2001). Creating the gene ontology resource: design and implementation. Genome Res. 11, 1425-1433.
- Gerlt, J.A., and Babbitt, P.C. (2000). Can sequence determine function? Genome Biol. 1, 1-10.
- Gribskov, M., McLachlan, M., and Eisenberg, D. (1987). Profile analysis: detection of distantly related proteins. Proc. Natl. Acad. Sci. USA 84, 4355-4358.
- Guo, J.T., Xu, D., Kim, D., and Xu, Y. (2003). Improving the performance of DomainParser for structural domain partition using neural network. Nucleic Acids Res. 31, 944-952.
- Henikoff, J.G., Greene, E.A., Pietrokovski, S., and Henikoff, S. (2000). Increased coverage of protein families with the blocks database servers. Nucleic Acids Res. 28, 228-230.
- Holm, L., and Sander, C. (1998). Touring the fold space with DALI/FSSP. Nucleic Acids Res. 26, 316-319.
- Hulo, N., Sigrist, C.J., Le Saux, V., Langendijk-Genevaux, P.S., Bordoli, L., Gattiker, A., De Castro, E., Bucher, P., and Bairoch, A. (2004). Recent improvements to the PROSITE database. Nucleic Acids Res. 32, D134-D137.
- Jeffery, C.J. (1999). Moonlighting proteins. Trends Biochem. Sci. 24, 8-11.
- Jensen, L.J., Gupta, R., Staerfeldt, H.-H., and Brunak, S. (2003). Prediction of human protein function according to Gene Ontology Categories. Bioinformatics 19, 635-642.
- Jones, D.T., Taylor, W.R., and Thornton, J.M. (1992). A new approach to protein fold recognition. Nature 358, 86-89.
- Karp, P.D. (1998). What do we know about sequence analysis and sequence databases. Bioinformatics 14, 753-754.
- Kleywegt, G.J. (1999). Recognition of spatial motifs in protein structures. J. Mol. Biol. 285, 1887-1897.
- Pitman, J. (1997). Probability (New York: Springer).
- Prlic, A., Domingues, F.S., Lackner, P., and Sippl, M.J. (2004). Wilma-automated annotation of protein sequences. Bioinformatics 20, 127-128.
- Rigoutsos, I., Huynh, T., Floratos, A., Parida, L., and Platt, D. (2002). Dictionary-driven protein annotation. Nucleic Acids Res. 30, 3901-
- Rost, B. (2002). Enzyme function less conserved than anticipated. J. Mol. Biol. 318, 595-608.
- Rost, B., and Valencia, A. (1996). Pitfalls of protein sequence analysis. Curr. Opin. Biotechnol. 7, 457-461.
- Rost, B., Liu, J., Nair, R., Wrzeszczynski, K.O., and Ofran, Y. (2003). Automatic prediction of protein function. Cell. Mol. Life Sci. 60, 2637-2650.
- Schmitt, S., Kuhn, D., and Klebe, G. (2002). A new method to detect related function among proteins independent of sequence and fold homology. J. Mol. Biol. 323, 387-406.
- Strong, M., Graeber, T.G., Beeby, M., Pellegrini, M., Thompson, M.J., Yeates, T.O., and Eisenberg, D. (2003). Visualization and interpretation of protein networks in Mycobacterium tuberculosis based on hierarchical clustering of genome-wide functional linkage maps. Nucleic Acids Res. 31, 7099-7109.
- Tatusov, R.L., Koonin, E.V., and Lipman, D.J. (1997). A genomic perspective on protein families. Science 278, 631-637.
- Tian, W., and Skolnick, J. (2003). How well is enzyme function conserved as a function of pairwise sequence identity? J. Mol. Biol. 333, 863-882.
- Todd, A.E., Orengo, C.A., and Thornton, J.M. (2002). Sequence and structural differences between enzyme and nonenzyme homologs. Structure 10, 1435-1451.
- Wallace, A.C., Laskowski, R.A., and Thornton, J.M. (1996). Derivation of 3D coordinate templates for searching structural databases: application to Ser-His-Asp catalytic triads in the serine proteinases and lipases. Protein Sci. 5, 1001-1013.
- Wangikar, P.P., Tendulkar, A.V., Ramya, S., Mali, D.N., and Sarawagi, S. (2003). Functional sites in protein families uncovered via an objective and automated graph theoretic approach. J. Mol. Biol. 326, 955-978.
- Whisstock, J.C., and Lesk, A.M. (2003). Prediction of protein function from protein sequence and structure. Q. Rev. Biophys. 36, 307-340.
- Wise, E., Yew, W.S., Babbitt, P.C., Gerlt, J.A., and Rayment, I. (2002). Homologous (β/α)8-barrel enzymes that catalyze unrelated reactions: orotidine 5′-monophosphate decarboxylase and 3-keto-L-gulonate 6-phosphate decarboxylase. Biochemistry 41, 3861-3869.
- Xenarios, I., Salwinski, L., Duan, X.J., Higney, P., Kim, S.M., and Eisenberg, D. (2002). DIP, database of interacting proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res. 30, 303-305.
- Yao, H., Kristensen, D.M., Mihalek, I., Sowa, M.E., Shaw, C., Kimmel, M., Kavraki, L., and Lichtarge, O. (2003). An accurate, sensitive, and scalable method to identify functional sites in protein structures. J. Mol. Biol. 326, 255-261.
- Landgraf, R., Xenarios, I., and Eisenberg, D. (2001). Three-dimen-