Abstract
The Open Provenance Model (OPM) is a community data model for provenance, designed to facilitate the meaningful interchange of provenance information between systems. Underpinning OPM is the notion of a directed graph, used to represent the data products and processes involved in past computations and the dependencies between them; it is complemented by inference rules that allow new dependencies to be derived. OPM was designed from requirements captured in two "Provenance Challenges" and tested during a third; these challenges were international, multi-disciplinary activities that aimed to exchange provenance information between multiple systems and to query it. The design of OPM was driven mostly by practical and pragmatic considerations. The purpose of this paper is to formalize the theory underpinning this data model. Specifically, the paper proposes a temporal semantics for OPM graphs, defined in terms of a set of ordering constraints between time-points associated with OPM constructs. OPM inferences are characterized with respect to this temporal semantics, and a novel set of patterns is introduced to establish soundness and completeness properties. Building on this foundation, the paper proposes new definitions for graph algebraic operations, graph refinement, and the notion of account, by which multiple descriptions of the same execution are allowed to co-exist in the same graph. Overall, this paper provides a strong theoretical underpinning for a data model that is being adopted by a community of users, helping to disambiguate the model and to promote interoperability.