Data Management for the RedisDG Scientific Workflow Engine

2016, 2016 IEEE International Conference on Computer and Information Technology (CIT)

https://doi.org/10.1109/CIT.2016.55

Abstract

In this paper we investigate the general problem of controlling a scientific workflow service in terms of data management, focusing on the RedisDG scientific workflow engine. RedisDG relies on the Publish/Subscribe paradigm for the interaction between the components of the system, which raises new scheduling issues. In particular, the use of Publish/Subscribe introduces several challenging problems, among them the design of effective solutions for managing data on the fly when tasks are published. Our contributions are twofold. First, we add new functionalities to the RedisDG workflow engine: scheduling decisions that allocate data-intensive jobs to compute units while managing data efficiently. Second, we present a large set of experiments to validate our approaches. We analyze the results and sketch perspectives and insights. Experiments are conducted on the Grid'5000 testbed, and the paper is a step toward implementing a 'Workflow engine as a Service' (WaaS).
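To make the setting described in the abstract more concrete, the following is a minimal sketch of a data-aware allocation decision taken when a task is published over a Publish/Subscribe channel. It is not RedisDG's actual protocol or scheduling policy, which the abstract does not detail. The `Broker`, `Worker`, and `Coordinator` classes, the `tasks` and `offers` channel names, and the "fewest missing input files wins" rule are all assumptions made for illustration, and an in-memory broker stands in for a real Publish/Subscribe server so that the example runs without external services.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set


class Broker:
    """In-memory stand-in for a publish/subscribe server (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[channel].append(handler)

    def publish(self, channel: str, message: dict) -> int:
        # Deliver synchronously to every subscriber; return how many received it.
        handlers = list(self._subscribers[channel])
        for handler in handlers:
            handler(message)
        return len(handlers)


class Worker:
    """Compute unit that answers each published task with a data-locality offer."""

    def __init__(self, name: str, broker: Broker, local_files: Set[str]) -> None:
        self.name = name
        self.broker = broker
        self.local_files = set(local_files)
        broker.subscribe("tasks", self.on_task)

    def on_task(self, task: dict) -> None:
        # Report which input files this worker would still have to fetch.
        missing = sorted(set(task["inputs"]) - self.local_files)
        self.broker.publish(
            "offers", {"task": task["id"], "worker": self.name, "missing": missing}
        )


class Coordinator:
    """Publishes tasks and picks the worker that needs the least data movement."""

    def __init__(self, broker: Broker) -> None:
        self.broker = broker
        self.offers: Dict[str, List[dict]] = defaultdict(list)
        broker.subscribe("offers", self.on_offer)

    def on_offer(self, offer: dict) -> None:
        self.offers[offer["task"]].append(offer)

    def submit(self, task: dict) -> str:
        self.broker.publish("tasks", task)
        candidates = self.offers[task["id"]]
        if not candidates:
            raise RuntimeError("no worker answered the published task")
        # Assumed policy: fewest missing inputs first, worker name as tie-breaker.
        best = min(candidates, key=lambda o: (len(o["missing"]), o["worker"]))
        return best["worker"]


if __name__ == "__main__":
    broker = Broker()
    coordinator = Coordinator(broker)
    Worker("w1", broker, {"a.dat"})
    Worker("w2", broker, {"a.dat", "b.dat"})
    # w2 already holds both inputs, so it is selected under the assumed policy.
    print(coordinator.submit({"id": "t1", "inputs": ["a.dat", "b.dat"]}))
```

Because delivery in this sketch is synchronous, all offers are available as soon as `publish` returns. With a real networked broker the coordinator would instead need a timeout or a quorum rule before deciding, which is one place where the scheduling issues mentioned in the abstract would arise.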
