Use of designed sequences in protein structure recognition

2018, Biology Direct

https://doi.org/10.1186/S13062-018-0209-6

Abstract

Background: Knowledge of protein structure is a prerequisite for an improved understanding of molecular function. The gap between known sequences and known structures has widened in the post-genomic era. Grouping related protein sequences into families can help narrow this gap. In the Pfam database, a structural description is available for part or all of the proteins in 7726 families. For the remaining 52% of families, no 3-D structural information is yet available. We use computationally designed sequences that are intermediately related to two protein domain families already known to share the same fold. These strategically designed sequences enable the detection of distant relationships, and here we employ them for structure recognition of protein families of as yet unknown structure.

Results: We first measured the success rate of our approach on a dataset of protein families of known fold and achieved a success rate of 88%. Next, for 1392 families of as yet unknown structure, we made structural assignments for part or the full length of the proteins. Fold associations for 423 domains of unknown function (DUFs) are provided as a step towards functional annotation. The results indicate that knowledge-based filling of gaps in protein sequence space is a fruitful approach for structure recognition. Such sequences assist in traversal through protein sequence space and effectively function as 'linkers' where natural linkers between distant proteins are unavailable.

Reviewers: This article was reviewed by Oliviero Carugo, Christine Orengo and Srikrishna Subramanian.
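The 'linker' idea in the abstract can be illustrated with a small sketch. It assumes that a query family has already been searched against a library of designed sequences, and that each designed sequence records the two same-fold families it was designed between. The abstract does not specify the search tool, the E-value cutoff, or the decision rule, so the names `Linker` and `assign_fold`, the `1e-3` threshold, the majority vote and all identifiers in the example are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Linker:
    """A designed sequence intermediate between two families sharing a fold."""
    linker_id: str
    family_a: str
    family_b: str
    fold: str


def assign_fold(hits, linkers, max_evalue=1e-3):
    """Suggest a fold for one query family from its hits to designed linkers.

    hits:     iterable of (linker_id, evalue) pairs for the query family.
    linkers:  dict mapping linker_id -> Linker.
    Returns the fold backed by the most significant hits, or None when
    there is no significant hit or the top two folds are tied.
    The cutoff and the voting rule are illustrative assumptions only.
    """
    votes = Counter()
    for linker_id, evalue in hits:
        linker = linkers.get(linker_id)
        if linker is not None and evalue <= max_evalue:
            votes[linker.fold] += 1
    if not votes:
        return None
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # ambiguous: do not force an assignment
    return ranked[0][0]


# Made-up identifiers, for illustration only.
example_linkers = {
    "L1": Linker("L1", "FamA", "FamB", "fold_X"),
    "L2": Linker("L2", "FamA", "FamC", "fold_X"),
    "L3": Linker("L3", "FamD", "FamE", "fold_Y"),
}
example_hits = [("L1", 1e-8), ("L2", 5e-5), ("L3", 0.5)]
print(assign_fold(example_hits, example_linkers))
```

Returning `None` on a tie or when no hit passes the cutoff reflects a conservative choice: a family is left unassigned rather than given a weakly supported fold.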
