Building an XML document warehouse
2013, Journal of Decision Systems
https://doi.org/10.1080/12460125.2013.780322
Abstract
Data warehouses and OLAP (On-Line Analytical Processing) technologies are dedicated to analyzing structured data originating from organizations' OLTP (On-Line Transaction Processing) systems. To enhance their decision support systems, however, these organizations also need to explore XML (eXtensible Markup Language) documents as an additional and important source of unstructured data. In this context, this paper addresses the warehousing of document-centric XML documents. More specifically, we propose a two-method approach to building Document Warehouse conceptual schemas. The first method unifies XML document structures; it aims to produce a global and generic view of a set of XML documents belonging to the same domain. The second method designs multidimensional galaxy schemas for Document Warehouses.
4.3. Hierarchy constraints
- Ch1: Hierarchical root: All hierarchies of a dimension D start from the identifier of D (Ben-Abdallah et al., 2009).
- Ch2: Exclusive hierarchies: Any dimension having the minimal hierarchy must not have other hierarchies (Ben-Abdallah et al., 2009).
- Ch3: Non-isolated attribute: In a dimension D, each attribute must belong to at least one hierarchy of D (Ben-Abdallah et al., 2009).
- Ch4: Non-empty hierarchy: Within a dimension D, a hierarchy must contain at least two parameters: the identifier of D and the All parameter (Ben-Abdallah et al., 2009).
- Ch5: Roll-up: Every parameter of a hierarchy, except the All parameter, has at least one parent (Hurtado et al., 2002).
- Ch6: Acyclicity: No parameter, except All, can be both parent and child of the same parameter by transitivity (Ghozzi et al., 2003; Hurtado et al., 2002).
References
- Aouabed, H., Ben Messaoud, I., Feki, J., and Zurfluh, G. (2012). USD: Un outil d'unification des structures des documents XML. In N. Benblidia & S. Oukidkhouas (Eds.), Sixième Atelier sur les Systèmes Décisionnels (ASD'12) (pp. 83-94). Blida, Algeria.
- Ben-Abdallah, H., Feki, J., and Ben Abdallah, M. (2009). A multidimensional pattern based approach for the design of data marts. In D. Taniar (Ed.), Progressive methods in data warehousing and business intelligence: Concepts and competitive analytics, Volume 3 of the Advances in Data Warehousing and Mining Series (pp. 172-192). Australia: IGI Global.
- Ben Messaoud, I., Feki, J., Khrouf, K., and Zurfluh, G. (2011a). Unification of XML document structures for Document Warehouse (DocW). In 13th International Conference on Enterprise Information Systems (ICEIS'11) (pp. 85-94). Beijing, China.
- Ben Messaoud, I., Feki, J., and Zurfluh, G. (2011b). Modélisation multidimensionnelle des documents XML. Revue des Nouvelles Technologies de l'Information (RNTI) B-7, 55-70.
- Ben Messaoud, I., Feki, J., and Zurfluh, G. (2012). A first step for building a document warehouse: Unification of XML documents. In Sixth International Conference on Research Challenges in Information Science (RCIS'12) (pp. 59-64). Valencia, Spain.
- Carpani, F., and Ruggia, R. (2001). An integrity constraints language for a conceptual multidimensional data model. In 13th International Conference on Software Engineering & Knowledge Engineering (SEKE'01) (pp. 220-227). Argentina.
- De-Meo, P., Quattrone, G., Terracina, G., and Ursino, D. (2003). 'Almost automatic' and semantic integration of XML Schemas at various 'severity' levels. In Proceedings of the International Conference on Cooperative Information Systems (CoopIS) (pp. 4-21).
- Feki, J. (2004). Vers une conception automatisée des entrepôts de données: Modélisation des besoins OLAP et génération de schémas multidimensionnels. In 8th Maghrebian Conference on Software Engineering and Artificial Intelligence (MCSEAI'04) (pp. 473-485). Sousse, Tunisia: CPU (Centre de Publication Universitaire).
- Ghozzi, F., Ravat, F., Teste, O., and Zurfluh, G. (2003). Constraints and multidimensional databases. In 5th International Conference on Enterprise Information Systems (ICEIS'03) (pp. 104-111). Angers, France.
- Hachaichi, Y., Feki, J., & Ben-Abdallah, H. (2010). Modélisation multidimensionnelle de documents XML centrés-données. Journal of Decision Systems, 19, 313-345.
- Hurtado, C. A., and Mendelzon, A. O. (2002). OLAP dimension constraints. In 21st ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS'02) (pp. 169-179). Madison, USA.
- Inmon, W. H. (2002). Building the data warehouse. New York, NY: John Wiley & Sons.
- Jaro, M. A. (1989). Advances in record linking methodology as applied to the 1985 census of Tampa, Florida. Journal of the American Statistical Association, 84, 414-420.
- Júnior, C. A. S., and Mello, R. S. (2008). An ontology-driven process for unification of XML instances. In Brazilian Symposium on Multimedia and the Web (pp. 242-249). Vila Velha, Brazil.
- Kimball, R. (1997). The data warehouse toolkit. New York, NY: John Wiley and Sons.
- Lee, M. L., Yang, L. H., Hsu, W., and Yang, X. (2002). XClust: Clustering XML schemas for effective integration. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM'02) (pp. 292-299). Virginia, USA.
- McCabe, M. C., Lee, J., Chowdhury, A., Grossman, D., and Frieder, O. (2000). On the design and evaluation of a multi-dimensional approach to information retrieval. In Nicholas J. Belkin, Peter Ingwersen, & Mun-Kew Leong (Eds.), Proceedings of the 23rd Annual International ACM SIGIR Conference (pp. 363-365). Athens, Greece.
- Mello, R. D. S., Castano, S., & Heuser, C. A. (2002). A method for the unification of XML schemata. Information and Software Technology, 44, 241-249.
- Pujolle, G., Ravat, F., Teste, O., and Tournier, R. (2011). Multidimensional database design from document-centric XML documents. In Alfredo Cuzzocrea & Umeshwar Dayal (Eds.), International Conference on Data Warehousing and Knowledge Discovery (DaWaK'11) (pp. 51-65). Toulouse, France.
- Ravat, F., and Teste, O. (2000). A temporal object-oriented data warehouse model. In DEXA 2000 (pp. 583-592), London: Springer-Verlag.
- Sullivan, D. (2001). Document warehousing and text mining: Techniques for improving business operations, marketing and sales. New York, NY: John Wiley & Sons.
- Tseng, F. S. C., & Chou, A. Y. H. (2006). The concept of document warehousing for multi-dimensional modeling of textual-based business intelligence. Decision Support Systems (DSS), 42, 727-744.
- Yoo, C. S., Woo, S. M., and Kim, Y. S. (2005). Unification of XML DTD for XML documents with similar structure. In Computational Science and its Applications - ICCSA, Part III (pp. 954-963).
- Zhang, Y. F., and Liu, W. Y. (2002). Semantic integration of XML schema. In First International Conference on Machine Learning and Cybernetics (pp. 1085-1061). Beijing, China.
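
Illustrative sketch of the hierarchy constraints

The hierarchy constraints Ch1 to Ch6 of Section 4.3 can be read as checks on the structure of a dimension. The following is a minimal Python sketch of such checks under stated assumptions; it is not the authors' implementation. The `Dimension` class, the `check_dimension` function, and the encoding of a hierarchy as a list ordered from the identifier up to `All` are all hypothetical choices made for illustration. In particular, the "minimal hierarchy" of Ch2 is assumed here to be the two-parameter hierarchy [identifier, All], since its definition is not part of this excerpt.

```python
from dataclasses import dataclass

ALL = "All"  # name of the top-level parameter used in the constraints


@dataclass
class Dimension:
    identifier: str               # finest-grained parameter of the dimension
    attributes: set[str]          # every attribute declared in the dimension
    hierarchies: list[list[str]]  # each ordered from finest to coarsest level


def _ancestors(param, parents):
    """Return every parameter reachable from `param` through parent links."""
    seen, stack = set(), list(parents.get(param, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def check_dimension(dim):
    """Return the sorted codes (Ch1..Ch6) of the constraints that `dim` violates."""
    violated = set()
    parents = {}     # child parameter -> set of its direct parents
    covered = set()  # parameters that appear in at least one hierarchy
    for h in dim.hierarchies:
        covered.update(h)
        # Ch1: every hierarchy starts from the dimension identifier.
        if not h or h[0] != dim.identifier:
            violated.add("Ch1")
        # Ch4: a hierarchy holds at least the identifier and the All parameter.
        if len(h) < 2 or dim.identifier not in h or ALL not in h:
            violated.add("Ch4")
        for child, parent in zip(h, h[1:]):
            parents.setdefault(child, set()).add(parent)
    # Ch2 (assumed reading): the minimal hierarchy [identifier, All] excludes others.
    if [dim.identifier, ALL] in dim.hierarchies and len(dim.hierarchies) > 1:
        violated.add("Ch2")
    # Ch3: no attribute may be left outside every hierarchy.
    if dim.attributes - covered:
        violated.add("Ch3")
    for param in covered - {ALL}:
        # Ch5: every parameter except All has at least one parent.
        if not parents.get(param):
            violated.add("Ch5")
        # Ch6: a parameter must not be its own ancestor by transitivity.
        if param in _ancestors(param, parents):
            violated.add("Ch6")
    return sorted(violated)


# Hypothetical dimensions, used only to exercise the checks.
valid = Dimension("day", {"day", "month", "year"},
                  [["day", "month", "year", ALL]])
mixed = Dimension("day", {"day", "month", "week"},
                  [["day", ALL], ["day", "month", ALL]])
cyclic = Dimension("id", {"id", "city", "region"},
                   [["id", "city", "region", ALL],
                    ["id", "region", "city", ALL]])

print(check_dimension(valid))   # []
print(check_dimension(mixed))   # ['Ch2', 'Ch3']
print(check_dimension(cyclic))  # ['Ch6']
```

Returning constraint codes rather than raising on the first failure lets a schema designer see every violated constraint of a dimension in one pass; a real tool would probably also report which hierarchy or parameter triggered each code.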