Abstract
Decentralized learning (DL) has gained prominence for its potential benefits in terms of scalability, privacy, and fault tolerance. It consists of many nodes that coordinate without a central server and exchange millions of parameters in the inherently iterative process of machine learning (ML) training. In addition, these nodes are connected in complex and potentially dynamic topologies. Assessing the intricate dynamics of such networks is challenging. In the literature, researchers often resort to simulated environments that do not scale and fail to capture practical and crucial behaviors, including those associated with parallelism, data transfer, network delays, and wall-clock time. In this paper, we propose decentralizepy, a distributed framework for decentralized ML that enables the emulation of large-scale learning networks in arbitrary topologies. We demonstrate the capabilities of decentralizepy by deploying techniques such as sparsification and secure aggregation on top of several topologies, including dynamic networks with more than one thousand nodes.

CCS Concepts: • Networks → Programming interfaces; • Computing methodologies → Distributed algorithms; Machine learning algorithms; • Computer systems organization → Peer-to-peer architectures.
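Since the abstract refers to deploying sparsification over node topologies, the following is a minimal, self-contained sketch in plain PyTorch of the kind of per-node step involved: compressing a model update via top-k sparsification and averaging it with neighbors on a static ring. The helper names, the ring layout, and the unweighted averaging are illustrative assumptions, not the decentralizepy API.

```python
# Illustrative sketch (not the decentralizepy API): top-k sparsification of a
# model update and plain averaging with neighbors on a static ring topology.
import torch

def topk_sparsify(tensor, fraction=0.01):
    """Keep only the largest-magnitude entries of a tensor; zero out the rest."""
    flat = tensor.flatten()
    k = max(1, int(fraction * flat.numel()))
    _, indices = torch.topk(flat.abs(), k)
    sparse = torch.zeros_like(flat)
    sparse[indices] = flat[indices]
    return sparse.view_as(tensor)

def ring_neighbors(rank, world_size):
    """Static ring: each node exchanges parameters with its two adjacent nodes."""
    return [(rank - 1) % world_size, (rank + 1) % world_size]

def average_with_neighbors(my_params, received_params):
    """Unweighted averaging of the local model with models received from neighbors."""
    stacked = torch.stack([my_params] + received_params)
    return stacked.mean(dim=0)

# Example: node 3 of 8 sparsifies its update before sharing it with its ring neighbors.
world_size, rank = 8, 3
params = torch.randn(1000)                     # stand-in for a flattened model
to_send = topk_sparsify(params, fraction=0.1)  # compress before communication
neighbors = ring_neighbors(rank, world_size)   # -> [2, 4]
```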