A Survey on the Evolution of Models of Data Integration
2020, International Journal of Knowledge Based Computer Systems
Abstract
Different models of data integration have been proposed over time to manage and analyze data. With the emergence of big data, the database community has proposed newer solutions for managing such large and disparate data. Changes in data storage models and the growth of massive data repositories on the web have further driven the need for novel data integration models. In this article, we survey trends in integrating data through different models. We give a brief overview of Federated Database Systems, Data Warehouses, Mediators, and the recently proposed Polystore Systems, tracing the evolution of the architecture, query processing, distribution, automation, and data models supported within these data integration models. We present the similarities and differences among these models and discuss the novelty of Polystore Systems through various examples. The article also highlights the importance of such systems for integrating large-scale heterogeneous data.
FAQs
What explains the limitations of traditional data integration models with big data?
The study finds that traditional models such as Federated Database Systems (FDBS) fail to manage heterogeneous data because they take a 'one size fits all' approach, which is inadequate for the complexity introduced by big data. The emergence of new data types and the need for more advanced query optimization have made this inadequacy increasingly apparent.
How do Polystore Systems enhance data integration across multiple models?
Polystore Systems provide a single, cohesive query language over disparate data models and stores, enabling integration of various data types. For example, systems such as BigDAWG and PolyBase allow interaction with both traditional RDBMS and modern NoSQL systems.
What are the primary differences between mediators and Federated Database Systems?
The paper highlights that, while both are virtual integration models, Federated Database Systems focus primarily on managing relational data, whereas mediators support mapping and merging of queries across multiple heterogeneous sources. This distinction strongly influences their respective query processing methodologies.
When did new data virtualization techniques become crucial in database management?
Data virtualization techniques became necessary with the explosive growth of big data, particularly over the last decade. Because traditional systems could not handle the volume and variety of data, new approaches such as the Polystore architecture gained prominence.
How does autonomy impact the design of Multi-Database Systems?
The research delineates four facets of autonomy (design, communication, implementation, and associations) that underpin the functionality of Multi-Database Systems (MDBS). Balancing this autonomy with the integration of heterogeneous data sources is essential for achieving optimal performance in an MDBS.
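The mediator model discussed above, in which a query is mapped onto multiple heterogeneous sources and the answers are merged, can be illustrated with a minimal sketch. This is not the paper's implementation: the `Wrapper` and `Mediator` classes, the field mappings, and the two toy sources are all hypothetical names chosen for illustration, and a real mediator would also rewrite and optimize queries rather than filter in memory.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]


class Wrapper:
    """Hypothetical adapter that exposes one source under a shared global schema."""

    def __init__(self, name: str,
                 fetch: Callable[[], Iterable[Record]],
                 mapping: Dict[str, str]) -> None:
        self.name = name
        self.fetch = fetch        # returns records in the source's native shape
        self.mapping = mapping    # global field name -> native field name

    def query(self, predicate: Callable[[Record], bool]) -> Iterator[Record]:
        for native in self.fetch():
            # Map the native record onto the global schema before filtering.
            record = {g: native[n] for g, n in self.mapping.items()}
            if predicate(record):
                yield record


class Mediator:
    """Hypothetical mediator: sends one query to every wrapper and merges results."""

    def __init__(self, wrappers: Iterable[Wrapper]) -> None:
        self.wrappers = list(wrappers)

    def query(self, predicate: Callable[[Record], bool]) -> List[Record]:
        merged: List[Record] = []
        for wrapper in self.wrappers:
            for record in wrapper.query(predicate):
                # Tag each record with its origin so the merge stays traceable.
                merged.append({**record, "source": wrapper.name})
        return merged


# Two toy sources with different native shapes (assumed data, for illustration).
relational_rows = [(1, "Ada", 36), (2, "Grace", 45)]
document_store = [{"_id": 3, "fullName": "Edsger", "years": 72}]

sql_wrapper = Wrapper(
    "relational",
    lambda: ({"id": r[0], "name": r[1], "age": r[2]} for r in relational_rows),
    {"id": "id", "name": "name", "age": "age"},
)
doc_wrapper = Wrapper(
    "documents",
    lambda: iter(document_store),
    {"id": "_id", "name": "fullName", "age": "years"},
)

mediator = Mediator([sql_wrapper, doc_wrapper])
results = mediator.query(lambda r: r["age"] > 40)
```

Here the caller writes a single predicate against the global schema (`id`, `name`, `age`), and each wrapper hides how its source actually names and stores those fields. The data stays in the sources and is integrated only at query time, which is what makes the model a virtual one.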