Inference of Protein Function from Protein Structure

Debnath Pal; David Eisenberg

doi:10.1016/J.STR.2004.10.015


2005, Structure


Abstract

Debnath Pal and David Eisenberg*
UCLA-DOE Institute for Genomics and Proteomics, Howard Hughes Medical Institute, Box 951570, Los Angeles, California 90095

Summary

Structural genomics has brought us three-dimensional structures of proteins with unknown functions. To shed light on such structures, we have developed ProKnow (http://www.doe-mbi.ucla.edu/Services/ProKnow/), which annotates proteins with Gene Ontology functional terms. The method extracts features from the protein such as 3D fold, sequence, motif, and functional linkages and relates them to function via the ProKnow knowledgebase of features, which links features to annotated functions via annotation profiles. Bayes' theorem is used to compute weights of the functions assigned, using likelihoods based on the extracted features. The description level of the assigned function is quantified by the ontology depth (from 1 = general to 9 = specific). Jackknife tests show 89% correct assignments at ontology depth 1 and 40% at depth 9, with 93% coverage of 1507 distinct folded proteins. Overall, about 70% of the assignments were inferred correctly. This level of performance suggests that ProKnow is a useful resource in functional assessments of novel proteins.

…interactions between these functions form the basis for sustainable homeostasis. These multiple levels of function are reflected in our procedure, described below, of linking protein features to annotations at various levels.

The repertoire of methods for in silico annotation of function has grown enormously over the past two decades. A protein with a high degree of sequence similarity to a family of well-characterized proteins can be detected by BLAST (Altschul et al., 1990). With lower sequence similarity, more subtle methods such as "profiles" (where patterns obvious from multiple sequence alignment are evident) (Altschul et al., 1997; Bork and Gibson, 1996; Gribskov et al., 1987) or hidden Markov models (HMM) (Eddy et al., 1995) are required. These methods are based on the assumption that similar sequences have descended from a common ancestor and share similar function. The assumption is, however, limited in validity, as demonstrated by numerous studies (Devos and Valencia, 2000; Gerlt and Babbitt, 2000; Karp, 1998; Rost, 2002; Rost et al., 2003; Rost and Valencia, 1996; Tian and Skolnick, 2003; Whisstock and Lesk, 2003). To enhance accuracy of functional assignment, functional annotations can be inferred from information on fold (Bowie et al., 1991; Holm and Sander, 1998; Jones et al., 1992), motif (Attwood et al., 2003; Henikoff et al., 2000; Hulo et al., 2004), domain (Bateman et al., 2004), and orthology (Tatusov et al., 1997). Another class of annotation algorithms infers protein function based on identification of functionally significant residues. This class includes biodictionary "seqlets" mapping sequence patterns to their properties (Rigoutsos et al., 2002), evolutionary tracing (Land
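The Bayesian weighting mentioned in the Summary can be illustrated with a minimal sketch. It is not ProKnow's implementation: the text here does not say how likelihoods are estimated from annotation profiles, so the function name `bayes_weights`, the feature labels, the prior and likelihood values, and the conditional-independence (naive Bayes) assumption are all hypothetical choices made for illustration.

```python
def bayes_weights(priors, likelihoods, features, floor=1e-6):
    """Posterior weight of each candidate function term given observed features.

    priors:      {term: P(term)}
    likelihoods: {term: {feature: P(feature | term)}}
    features:    observed feature labels for the query protein
    floor:       small likelihood used when a feature was never seen with a term

    Assumption for this sketch: features are conditionally independent
    given the term, so their likelihoods multiply.
    """
    unnormalized = {}
    for term, prior in priors.items():
        p = prior
        for feature in features:
            p *= likelihoods.get(term, {}).get(feature, floor)
        unnormalized[term] = p

    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("all candidate terms have zero probability")
    # Bayes' theorem: posterior = prior * likelihood / evidence
    return {term: p / total for term, p in unnormalized.items()}


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
priors = {"catalytic activity": 0.5, "binding": 0.5}
likelihoods = {
    "catalytic activity": {"fold:tim_barrel": 0.8, "motif:p_loop": 0.2},
    "binding": {"fold:tim_barrel": 0.2, "motif:p_loop": 0.6},
}
weights = bayes_weights(priors, likelihoods, ["fold:tim_barrel"])
```

With equal priors, the weight of each term is proportional to the likelihood of the observed fold under that term, so the term that more often co-occurs with the fold receives the larger weight. Adding further features multiplies in their likelihoods in the same way.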

References (38)

Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990). Basic local alignment search tool. J. Mol. Biol. 215, 403-410.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389-3402.
Attwood, T.K., Bradley, P., Flower, D.R., Gaulton, A., Maudling, N., Mitchell, A.L., Moulton, G., Nordle, A., Paine, K., Taylor, P., et al. (2003). PRINTS and its automatic supplement, prePRINTS. Nucleic Acids Res. 31, 400-402.
Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E.L., et al. (2004). The Pfam protein families database. Nucleic Acids Res.
Bork, P., and Gibson, T.J. (1996). Applying motif and profile searches. Methods Enzymol. 266, 162-184.
Bowie, J.U., Luethy, R., and Eisenberg, D. (1991). A method to identify protein sequences that fold into a known three-dimensional structure. Science 253, 164-170.
Cai, C.Z., Wang, W.L., Sun, L.A., and Chen, Y.Z. (2003). Protein function classification via support vector machine approach. Math. Biosci. 185, 111-122.
Devos, D., and Valencia, A. (2000). Practical limits of function prediction. Proteins 41, 98-107.
Eddy, S.R., Mitchison, G., and Durbin, R. (1995). Maximum discrimination hidden Markov models of sequence consensus. J. Comput. Biol. 2, 9-23.
Eisenberg, D., Marcotte, E.M., Xenarios, I., and Yeates, T.O. (2000). Protein function in the post-genomic era. Nature 405, 823-826.
Gene Ontology Consortium (2001). Creating the gene ontology resource: design and implementation. Genome Res. 11, 1425-1433.
Gerlt, J.A., and Babbitt, P.C. (2000). Can sequence determine function? Genome Biol. 1, 1-10.
Gribskov, M., McLachlan, M., and Eisenberg, D. (1987). Profile analysis: detection of distantly related proteins. Proc. Natl. Acad. Sci. USA 84, 4355-4358.
Guo, J.T., Xu, D., Kim, D., and Xu, Y. (2003). Improving the performance of DomainParser for structural domain partition using neural network. Nucleic Acids Res. 31, 944-952.
Henikoff, J.G., Greene, E.A., Pietrokovski, S., and Henikoff, S. (2000). Increased coverage of protein families with the blocks database servers. Nucleic Acids Res. 28, 228-230.
Holm, L., and Sander, C. (1998). Touring the fold space with DALI/FSSP. Nucleic Acids Res. 26, 316-319.
Hulo, N., Sigrist, C.J., Le Saux, V., Langendijk-Genevaux, P.S., Bordoli, L., Gattiker, A., De Castro, E., Bucher, P., and Bairoch, A. (2004). Recent improvements to the PROSITE database. Nucleic Acids Res. 32, D134-D137.
Jeffery, C.J. (1999). Moonlighting proteins. Trends Biochem. Sci. 24, 8-11.
Jensen, L.J., Gupta, R., Staerfeldt, H.-H., and Brunak, S. (2003). Prediction of human protein function according to Gene Ontology Categories. Bioinformatics 19, 635-642.
Jones, D.T., Taylor, W.R., and Thornton, J.M. (1992). A new approach to protein fold recognition. Nature 358, 86-89.
Karp, P.D. (1998). What do we know about sequence analysis and sequence databases. Bioinformatics 14, 753-754.
Kleywegt, G.J. (1999). Recognition of spatial motifs in protein structures. J. Mol. Biol. 285, 1887-1897.
Landgraf, R., Xenarios, I., and Eisenberg, D. (2001). Three-dimen-
Pitman, J. (1997). Probability (New York: Springer).
Prlic, A., Domingues, F.S., Lackner, P., and Sippl, M.J. (2004). Wilma-automated annotation of protein sequences. Bioinformatics 20, 127-128.
Rigoutsos, I., Huynh, T., Floratos, A., Parida, L., and Platt, D. (2002). Dictionary-driven protein annotation. Nucleic Acids Res. 30, 3901-
Rost, B. (2002). Enzyme function less conserved than anticipated. J. Mol. Biol. 318, 595-608.
Rost, B., and Valencia, A. (1996). Pitfalls of protein sequence analysis. Curr. Opin. Biotechnol. 7, 457-461.
Rost, B., Liu, J., Nair, R., Wrzeszczynski, K.O., and Ofran, Y. (2003). Automatic prediction of protein function. Cell. Mol. Life Sci. 60, 2637-2650.
Schmitt, S., Kuhn, D., and Klebe, G. (2002). A new method to detect related function among proteins independent of sequence and fold homology. J. Mol. Biol. 323, 387-406.
Strong, M., Graeber, T.G., Beeby, M., Pellegrini, M., Thompson, M.J., Yeates, T.O., and Eisenberg, D. (2003). Visualization and interpretation of protein networks in Mycobacterium tuberculosis based on hierarchical clustering of genome-wide functional linkage maps. Nucleic Acids Res. 31, 7099-7109.
Tatusov, R.L., Koonin, E.V., and Lipman, D.J. (1997). The genomics perspective on protein families. Science 278, 631-637.
Tian, W., and Skolnick, J. (2003). How well is enzyme function conserved as a function of pairwise sequence identity? J. Mol. Biol. 333, 863-882.
Todd, A.E., Orengo, C.A., and Thornton, J.M. (2002). Sequence and structural differences between enzyme and nonenzyme homologs. Structure 10, 1435-1451.
Wallace, A.C., Laskowski, R.A., and Thornton, J.M. (1996). Derivation of 3D coordinate templates for searching structural databases: application to Ser-His-Asp catalytic triads in the serine proteinases and lipases. Protein Sci. 5, 1001-1013.
Wangikar, P.P., Tendulkar, A.V., Ramya, S., Mali, D.N., and Sarawagi, S. (2003). Functional sites in protein families uncovered via an objective and automated graph theoretic approach. J. Mol. Biol. 326, 955-978.
Whisstock, J.C., and Lesk, A.M. (2003). Prediction of protein function from protein sequence and structure. Q. Rev. Biophys. 36, 307-340.
Wise, E., Yew, W.S., Babbitt, P.C., Gerlt, J.A., and Rayment, I. (2002). Homologous (β/α)8-barrel enzymes that catalyze unrelated reactions: orotidine 5′-monophosphate decarboxylase and 3-keto-L-gulonate 6-phosphate decarboxylase. Biochemistry 41, 3861-3869.
Xenarios, I., Salwinski, L., Duan, X.J., Higney, P., Kim, S.M., and Eisenberg, D. (2002). DIP, database of interacting proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res. 30, 303-305.
Yao, H., Kristensen, D.M., Mihalek, I., Sowa, M.E., Shaw, C., Kimmel, M., Kavraki, L., and Lichtarge, O. (2003). An accurate, sensitive, and scalable method to identify functional sites in protein structures. J. Mol. Biol. 326, 255-261.

Background: Most methods for predicting functional sites in protein 3D structures rely on information on related proteins and cannot be applied to proteins with no known relatives. Another limitation of these methods is the lack of a well-annotated set of functional sites to use as a benchmark for validating their predictions. Experimental findings and theoretical considerations suggest that residues involved in function often contribute unfavorably to the native state stability. We examine the possibility of systematically exploiting this intrinsic property to identify functional sites using an original procedure that detects destabilizing regions in protein structures. In addition, to relate destabilizing regions to known functional sites, a novel benchmark consisting of a diverse set of hand-curated protein functional sites is derived.

Results: A procedure for detecting clusters of destabilizing residues in protein structures is presented. Individual residue contributions to protein stability are evaluated using detailed atomic models and a force field successfully applied in computational protein design. The most destabilizing residues, and some of their closest neighbours, are clustered into destabilizing regions following a rigorous protocol. Our procedure is applied to high-quality apo-structures of 63 unrelated proteins. The biologically relevant binding sites of these proteins were annotated using all available information, including structural data and literature curation, resulting in the largest hand-curated dataset of binding sites in proteins available to date. Comparing the destabilizing regions with the annotated binding sites in these proteins, we find that the overlap is on average limited, but significantly better than random. Results depend on the type of bound ligand. Significant overlap is obtained for most polysaccharide- and small-ligand-binding sites, whereas no overlap is observed for most nucleic acid binding sites. These differences are rationalised in terms of the geometry and energetics of the binding site.

Conclusion: We find that although destabilizing regions as detected here cannot in general be used to predict binding sites in protein structures, they can provide useful information, particularly on the location of functional sites that bind polysaccharides and small ligands. This information can be exploited in methods for predicting function in protein structures with no known relatives. Our publicly available benchmark of hand-curated functional sites in proteins should help other workers derive and validate new prediction methods.
