Curated databases

Wang-chiew Tan

2008, Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems - PODS '08

Abstract

Curated databases are databases that are populated and updated with a great deal of human effort. Most reference works that one traditionally found on the reference shelves of libraries (dictionaries, encyclopedias, gazetteers, etc.) are now curated databases. Since it is now easy to publish databases on the web, there has been an explosion in the number of new curated databases used in scientific research. The value of curated databases lies in the organization and the quality of the data they contain. Like the paper reference works they have replaced, they usually represent the efforts of a dedicated group of people to produce a definitive description of some subject area.

