Verifying big data topologies by-design: a semi-automated approach

Journal of Big Data

https://doi.org/10.1186/S40537-019-0199-Y

Abstract

Big data or data-intensive applications (DIAs) process large amounts of data to gain key business intelligence through complex analytics using machine-learning techniques [20, 35]. These applications have received increased attention in recent years given their ability to yield competitive advantage through direct investigation of user needs and trends hidden in the enormous quantities of data produced daily by the average Internet user. According to Gartner [1], business intelligence and analytics applications will remain a top focus for Chief Information Officers (CIOs) of most Fortune 500 companies until at least 2019-2021. However, the cost of ownership of the systems that process big data analytics is high due to infrastructure costs, steep learning curves for the different frameworks (such as Apache Storm [21], Apache Spark [2] or Apache Hadoop [3]) typically involved in the design and development of big data applications, and complexities in large-scale architectures. A key complexity of this design and development activity lies in quickly and continuously refining the configuration parameters of the middleware and service platforms on top of which the DIA runs [12]. The process is especially complex as the number of middleware platforms involved in DIA design increases; the more middleware is involved, the more parameters need co-evaluation (e.g., latency or beaconing times, caching policies, queue retention and more). Fine-tuning these "knobs" on so many

References (35)

  1. https://github.com/maelstromdat/OSTIA. Accessed 1 Dec 2018.
  2. https://github.com/socialsensor. Accessed 1 Dec 2018.
  3. https://github.com/sensorstorm/StormCV. Accessed 1 Dec 2018.
  4. Balalaie A, Heydarnoori A, Jamshidi P. Microservices architecture enables devops: an experience report on migration to a cloud-native architecture. 2016.
  5. Bersani MM, Distefano S, Ferrucci L, Mazzara M. A timed semantics of workflows. In: ICSOFT (Selected Papers), communications in computer and information science, vol. 555. Berlin: Springer; 2014. p. 365-83.
  6. Bersani MM, Marconi F, Tamburri DA, Jamshidi P, Nodari A. Continuous architecting of stream-based systems. In: Muccini H, Harper EK, editors. Proceedings of the 25th IFIP/IEEE working conference on software architectures. Washington, DC: IEEE Computer Society; 2016. p. 131-42.
  7. Bersani MM, Rossi M, San Pietro P. A tool for deciding the satisfiability of continuous-time metric temporal logic. Acta Informatica. 2015:1-36. https://doi.org/10.1007/s00236-015-0229-y.
  8. Brunnert A, van Hoorn A, Willnecker F, Danciu A, Hasselbring W, Heger C, Herbst N, Jamshidi P, Jung R, von Kistowski J, et al. Performance-oriented devops: a research agenda. 2015. arXiv preprint arXiv:1508.04752.
  9. Camilli M. Formal verification problems in a big data world: towards a mighty synergy. In: Companion proceedings of the 36th international conference on software engineering, ICSE companion. New York: ACM; 2014. p. 638-41. https://doi.org/10.1145/2591062.2591088.
  10. Chandrasekaran K, Santurkar S, Arora A. Stormgen - a domain specific language to create ad-hoc storm topologies. In: FedCSIS. 2014. p. 1621-8.
  11. Clements P, Kazman R, Klein M. Evaluating software architectures: methods and case studies. Boston: Addison-Wesley; 2001.
  12. Demri S, D'Souza D. An automata-theoretic approach to constraint LTL. Inf Comput. 2007;205(3):380-415.
  13. Di Nitto E, Jamshidi P, Guerriero M, Spais I, Tamburri DA. A software architecture framework for quality-aware devops. In: Proceedings of the 2nd international workshop on quality-aware DevOps, QUDOS@ISSTA 2016, Saarbrücken, Germany, July 21, 2016. 2016. p. 12-7. https://doi.org/10.1145/2945408.2945411.
  14. Emani CK, Cullot N, Nicolle C. Understandable big data: a survey. Comput Sci Rev. 2015;17:70-81.
  15. Evans R. Apache storm, a hands on tutorial. In: IC2E. New York: IEEE; 2015. p. 2.
  16. Frankel D. Model driven architecture: applying MDA to enterprise computing. New York: Wiley; 2002.
  17. Furia CA, Mandrioli D, Morzenti A, Rossi M. Modeling time in computing: a taxonomy and a comparative survey. ACM Comput Surv. 2010;42(2):6:1-59.
  18. Hirzel M, Andrade H, Gedik B, Jacques-Silva G, Khandekar R, Kumar V, Mendell MP, Nasgaard H, Schneider S, Soulé R, Wu KL. IBM streams processing language: analyzing big data in motion. IBM J Res Dev. 2013;57(3/4):7.
  19. Kalantari A, Kamsin A, Kamaruddin H, Ale Ebrahim N, Gani A, Ebrahimi A, Shamshirband S. A bibliometric approach to tracking big data research trends. J Big Data. 2017;4(1):30. https://doi.org/10.1186/s40537-017-0088-1.
  20. Krippendorff K. Content analysis: an introduction to its methodology. 2nd ed. Thousand Oaks: Sage Publications; 2004.
  21. Marconi F, Bersani MM, Erascu M, Rossi M. Towards the formal verification of DIA through MTL models. In: Lecture notes in computer science.
  22. Morgan DL. Focus groups as qualitative research. Thousand Oaks: Sage Publications; 1997.
  23. Olshannikova E, Ometov A, Koucheryavy Y, Olsson T. Visualizing big data with augmented and virtual reality: challenges and research agenda. J Big Data. 2015;2(1):22. https://doi.org/10.1186/s40537-015-0031-2.
  24. Peng S, Gu J, Wang XS, Rao W, Yang M, Cao Y. Cost-based optimization of logical partitions for a query workload in a hadoop data warehouse. In: Chen L, Jia Y, Sellis TK, Liu G, editors. APWeb, Lecture notes in computer science, vol. 8709. Berlin: Springer; 2014. p. 559-67.
  25. Pnueli A. The temporal logic of programs. In: Proceedings of the 18th annual symposium on foundations of computer science, SFCS '77. Washington, DC: IEEE Computer Society; 1977. p. 46-57. https://doi.org/10.1109/SFCS.1977.32.
  26. Pradella M, Morzenti A, Pietro PS. Bounded satisfiability checking of metric temporal logic specifications. ACM Trans Softw Eng Methodol. 2013;22(3):20:1-54. https://doi.org/10.1145/2491509.2491514.
  27. Quartulli M, Lozano J, Olaizola IG. Beyond the lambda architecture: effective scheduling for large scale eo information mining and interactive thematic mapping. In: IGARSS. 2015. p. 1492-5.
  28. Alur R, Dill DL. A theory of timed automata. Theor Comput Sci. 1994;126:183-235.
  29. Ratner B. Statistical and machine-learning data mining: techniques for better predictive modeling and analysis of big data. Boca Raton: CRC Press Inc; 2012.
  30. Snášel V, Nowaková J, Xhafa F, Barolli L. Geometrical and topological approaches to big data. Futur Gener Comput Syst. 2017;67:286-96. https://doi.org/10.1016/j.future.2016.06.005.
  31. Tamura Y, Yamada S. Reliability analysis based on a jump diffusion model with two wiener processes for cloud computing with big data. Entropy. 2015;17(7):4533-46.
  32. Tommaso Di Noia MM, Sciascio ED. A computational model for mapreduce job flow. 2014.
  33. Toshniwal A, Taneja S, Shukla A, Ramasamy K, Patel JM, Kulkarni S, Jackson J, Gade K, Fu M, Donham J, et al. Storm@twitter. In: Proceedings of the 2014 ACM SIGMOD international conference on management of data. New York: ACM; 2014. p. 147-56.
  34. Wang D, Liu J. Optimizing big data processing performance in the public cloud: opportunities and approaches. IEEE Netw. 2015;29(5):31-5.
  35. Yang F, Su W, Zhu H, Li Q. Formalizing mapreduce with csp. In: Proceedings of ECBS. Washington, DC: IEEE Computer Society; 2010. p. 358-67. https://doi.org/10.1109/ECBS.2010.50.