Service-generated Big Data and Big Data-as-a-Service: An Overview

https://doi.org/10.1109/BIGDATA.CONGRESS.2013.60

Abstract

With the prevalence of service computing and cloud computing, more and more services are emerging on the Internet, generating huge volumes of data, such as trace logs, QoS information, and service relationships. The overwhelming service-generated data become too large and complex to be effectively processed by traditional approaches. How to store, manage, and create value from the service-oriented big data has become an important research problem. On the other hand, with the increasingly large amount of data, a single infrastructure that provides common functionality for managing and analyzing different types of service-generated big data is urgently required. To address this challenge, this paper provides an overview of service-generated big data and Big Data-as-a-Service. First, three types of service-generated big data are exploited to enhance system performance. Then, Big Data-as-a-Service, including Big Data Infrastructure-as-a-Service, Big Data Platform-as-a-Service, and Big Data Analytics Software-as-a-Service, is employed to provide common big data related services (e.g., accessing service-generated big data and data analytics results) to users to enhance efficiency and reduce cost.

Key takeaways

  1. Service-generated big data, including trace logs and QoS data, poses significant processing challenges.
  2. Big Data-as-a-Service (BDaaS) offers essential infrastructure, platform, and analytics services to manage big data effectively.
  3. Volume, velocity, variety, and veracity define the complexities of service-generated big data management.
  4. The global Big Data-as-a-Service market is projected to grow from $2.25 billion in 2015 to $30 billion by 2021.
  5. The paper reviews existing frameworks and proposes future research directions for enhancing service-generated big data analytics.
