Abstract
We propose a logical framework, based on Datalog, to study the foundations of querying JSON data. The main feature of our approach, which we call J-Logic, is its emphasis on paths. Paths are sequences of keys and are used to access the tree structure of nested JSON objects. J-Logic also features "packing" as a means to generate a new key from a path or subpath. J-Logic with recursion is computationally complete, but many queries, such as deep equality, can be expressed without recursion. We give a necessary condition for queries to be expressible without recursion. Most of our results focus on the deterministic nature of JSON objects as partial functions from keys to values. Predicates defined by J-Logic programs may not properly describe objects, however. Nevertheless, we show that every object-to-object transformation in J-Logic can be defined using only objects in intermediate results. Moreover, we show that it is decidable whether a positive, nonrecursive J-Logic program always returns an object when given objects as inputs. Regarding packing, we show that packing is unnecessary if the output does not require new keys. Finally, we show the decidability of query containment for positive, nonrecursive J-Logic programs.
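As a rough illustration of the path-based view, the Python sketch below flattens a nested JSON object into a set of (path, value) pairs and checks whether an arbitrary set of such pairs still describes an object. This is not J-Logic syntax or its formal semantics. The function names (flatten, describes_object, deep_equal) are hypothetical, and the sketch assumes objects whose leaves are atomic values, with no arrays and no empty nested objects.

```python
from typing import Any, Dict, Set, Tuple

Path = Tuple[str, ...]          # a path is a sequence of keys
Pairs = Set[Tuple[Path, Any]]   # a relation of (path, atomic value) pairs


def flatten(obj: Dict[str, Any], prefix: Path = ()) -> Pairs:
    """Flatten a nested dict into (path, atomic value) pairs.

    Assumption: leaves are atomic (hashable) values; empty nested
    objects are not represented and arrays are not handled.
    """
    pairs: Pairs = set()
    for key, value in obj.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            pairs |= flatten(value, path)
        else:
            pairs.add((path, value))
    return pairs


def describes_object(pairs: Pairs) -> bool:
    """Check that a set of (path, value) pairs is deterministic.

    It fails if one path carries two different values, or if a path
    that holds an atomic value is also a proper prefix of another path.
    """
    paths = [path for path, _ in pairs]
    path_set = set(paths)
    if len(path_set) != len(paths):
        return False  # same path mapped to two different values
    for path in paths:
        for i in range(1, len(path)):
            if path[:i] in path_set:
                return False  # a leaf path is extended by a longer path
    return True


def deep_equal(a: Dict[str, Any], b: Dict[str, Any]) -> bool:
    """Deep equality as equality of the flattened path relations."""
    return flatten(a) == flatten(b)
```

For example, flatten({"name": {"first": "Ada"}, "born": 1815}) yields the pairs (("name", "first"), "Ada") and (("born",), 1815), while the relation {(("a",), 1), (("a", "b"), 2)} does not describe an object because the path ("a",) would be both a leaf and an inner node.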