Decentralized Learning Made Easy with DecentralizePy

2023

https://doi.org/10.1145/3578356.3592587

Abstract

Decentralized learning (DL) has gained prominence for its potential benefits in terms of scalability, privacy, and fault tolerance. It consists of many nodes that coordinate without a central server and exchange millions of parameters in the inherently iterative process of machine learning (ML) training. In addition, these nodes are connected in complex and potentially dynamic topologies. Assessing the intricate dynamics of such networks is clearly not an easy task. In the literature, researchers often resort to simulated environments that do not scale and fail to capture practical and crucial behaviors, including those associated with parallelism, data transfer, network delays, and wall-clock time. In this paper, we propose decentralizepy, a distributed framework for decentralized ML, which allows for the emulation of large-scale learning networks in arbitrary topologies. We demonstrate the capabilities of decentralizepy by deploying techniques such as sparsification and secure aggregation on top of several topologies, including dynamic networks with more than one thousand nodes.

CCS Concepts: • Networks → Programming interfaces; • Computing methodologies → Distributed algorithms; Machine learning algorithms; • Computer systems organization → Peer-to-peer architectures.
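The core loop the abstract describes, local training steps interleaved with parameter exchange over a topology, can be illustrated with a minimal sketch. The `Node` class, ring topology, and toy model below are illustrative assumptions rather than the decentralizepy API; they only show how neighbour averaging (gossip) over a static topology combines local SGD updates.

```python
# Illustrative sketch (not the decentralizepy API): one decentralized-learning
# round alternates a local SGD step on private data with parameter averaging
# between ring neighbours. The Node class, topology, and toy data are assumptions.
import torch


class Node:
    def __init__(self, node_id: int, n_features: int = 10, lr: float = 0.01):
        self.node_id = node_id
        self.model = torch.nn.Linear(n_features, 1)
        self.opt = torch.optim.SGD(self.model.parameters(), lr=lr)
        # Toy private dataset; in a real deployment each node holds its own shard.
        self.x = torch.randn(32, n_features)
        self.y = torch.randn(32, 1)

    def local_step(self) -> None:
        """One local SGD step on the node's private data."""
        self.opt.zero_grad()
        loss = torch.nn.functional.mse_loss(self.model(self.x), self.y)
        loss.backward()
        self.opt.step()

    def get_vector(self) -> torch.Tensor:
        """Flatten the model parameters into a single vector for exchange."""
        return torch.nn.utils.parameters_to_vector(self.model.parameters()).detach().clone()

    def set_vector(self, vec: torch.Tensor) -> None:
        """Load a flat parameter vector back into the model."""
        torch.nn.utils.vector_to_parameters(vec, self.model.parameters())


def gossip_average(nodes: list, topology: dict) -> None:
    """Each node averages its parameters with its neighbours (uniform weights)."""
    snapshots = [node.get_vector() for node in nodes]
    for i, node in enumerate(nodes):
        group = [i] + list(topology[i])
        node.set_vector(torch.stack([snapshots[j] for j in group]).mean(dim=0))


if __name__ == "__main__":
    n = 8
    nodes = [Node(i) for i in range(n)]
    # Static ring: each node exchanges with its two neighbours. The paper also
    # covers richer and dynamic topologies, which would change this dict per round.
    ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
    for _ in range(5):
        for node in nodes:
            node.local_step()            # local computation
        gossip_average(nodes, ring)      # parameter exchange over the topology
```

Techniques such as sparsification or secure aggregation, which the paper deploys on top of such topologies, would replace the plain parameter exchange in `gossip_average` with compressed or protected payloads; the sketch only captures the control flow of one round.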
