PSS-SQL: Protein Secondary Structure - Structured Query Language

2010

https://doi.org/10.1109/IEMBS.2010.5627303

Abstract

The secondary structure representation of proteins provides important information about their general construction and shape. This representation is often used in protein similarity searching. Since existing commercial database management systems do not offer integrated methods for exploring biological data, e.g., at the level of the SQL language, structural similarity searching is usually performed by external tools. In this paper, we present PSS-SQL, our newly developed language, which allows users to search a database for proteins whose secondary structure is similar to the structure specified in a PSS-SQL query. In this way, we provide a simple, declarative language for protein structure similarity searching.
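To make the idea of searching by secondary structure more concrete, here is a minimal Python sketch under simplifying assumptions. It does not reproduce PSS-SQL's syntax or its matching algorithm. It assumes that each protein is stored as a per-residue string over H (helix), E (strand) and C (coil/loop), and that a query is a hypothetical list of (type, minimum length, maximum length) segments. A hit requires consecutive whole runs of the same type to satisfy the segments exactly, with no gaps or approximate alignment. The names `runs`, `find_matches` and `search` are illustrative only.

```python
from itertools import groupby
from typing import Dict, List, Tuple

# Hypothetical query segment (not PSS-SQL syntax): (SSE type, min length, max length),
# e.g. ("H", 4, 8) means a helix of 4 to 8 residues.
Segment = Tuple[str, int, int]


def runs(sse: str) -> List[Tuple[str, int, int]]:
    """Collapse a per-residue SSE string into maximal (type, length, start) runs."""
    result = []
    pos = 0
    for sse_type, group in groupby(sse):
        length = len(list(group))
        result.append((sse_type, length, pos))
        pos += length
    return result


def find_matches(sse: str, pattern: List[Segment]) -> List[Tuple[int, int]]:
    """Return (start, end) residue ranges (end exclusive) where consecutive
    whole runs satisfy the pattern segment by segment."""
    if not pattern:
        raise ValueError("pattern must contain at least one segment")
    r = runs(sse)
    k = len(pattern)
    hits = []
    # Slide a window of k consecutive runs over the protein's run list.
    for i in range(len(r) - k + 1):
        window = r[i:i + k]
        if all(t == pt and lo <= n <= hi
               for (t, n, _), (pt, lo, hi) in zip(window, pattern)):
            start = window[0][2]
            end = window[-1][2] + window[-1][1]
            hits.append((start, end))
    return hits


def search(db: Dict[str, str], pattern: List[Segment]) -> List[Tuple[str, int, int]]:
    """Scan every protein in a toy in-memory 'database' (id -> SSE string)."""
    return [(pid, s, e)
            for pid, sse in db.items()
            for s, e in find_matches(sse, pattern)]


if __name__ == "__main__":
    db = {
        "P1": "CCHHHHHHEEEECC",
        "P2": "HHHHHHHHHHHHCCEEE",
        "P3": "CCEEEEHHHHCC",
    }
    # Query: a helix of 4-8 residues immediately followed by a strand of 3-5 residues.
    print(search(db, [("H", 4, 8), ("E", 3, 5)]))  # [('P1', 2, 12)]
```

Only P1 matches: its 6-residue helix is directly followed by a 4-residue strand. P2's helix is too long and is not followed by a strand, and in P3 the strand precedes the helix. Requiring whole runs keeps the logic simple; a practical similarity search would more likely tolerate slightly longer or shorter segments, or score candidate alignments instead of testing them exactly.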
