Papers by Alexandre Santana

Software: Practice and Experience, 2021
Global schedulers are components in parallel runtime libraries that distribute the application's workload across physical resources. More often than not, applications showcase dynamic load imbalance and require customized scheduling solutions to avoid wasting resources. Some libraries lack support for user-defined schedulers, so developers resort to unofficial extensions that are harder to reuse and maintain. We propose a global scheduler software design, entitled the ARTful model, to create user-defined solutions with minimal alterations to the runtime library. Our model uses a component-based design to separate the runtime library from the scheduling policy implementation. The ARTful model describes the interface of a portable scheduler library, allowing policies to operate on different runtime libraries. We study the overhead induced by our design through our ARTful library implementation, a metaprogramming-oriented global scheduling library using workload-aware schedu...
2018 Symposium on High Performance Computing Systems (WSCAD), 2018
Global schedulers are components used in parallel solutions, especially in dynamic applications, to optimize resource usage. Nonetheless, their development is a cumbersome process due to the adaptations necessary to cope with the programming interfaces and abstractions of runtime systems. This paper proposes a model to dissociate schedulers from runtime systems to lower software complexity. Our model is based on the breakdown of the scheduler into modular and reusable concepts that better express the scheduler requirements. Through the use of meta-programming and design patterns, we were able to achieve fully reusable workload-aware scheduling strategies with up to 63% fewer lines of code and negligible run time overhead.
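The dissociation described in this abstract can be illustrated with a minimal sketch: a scheduling policy written against a runtime-agnostic interface, with a thin adapter binding it to concrete data. All names here (WorkloadInput, greedy_policy, ListInput) are hypothetical illustrations, not the paper's actual API, and the sketch uses Python rather than the C++ metaprogramming the work targets.

```python
from abc import ABC, abstractmethod

# Hypothetical runtime-agnostic view of the scheduling input.
# A real runtime system would provide a concrete adapter.
class WorkloadInput(ABC):
    @abstractmethod
    def task_loads(self) -> list: ...
    @abstractmethod
    def num_pus(self) -> int: ...

def greedy_policy(inp: WorkloadInput) -> list:
    """Workload-aware policy: assign heaviest tasks first to the
    currently least-loaded processing unit (classic greedy heuristic).
    Returns a mapping: task index -> PU index."""
    pu_loads = [0.0] * inp.num_pus()
    mapping = [0] * len(inp.task_loads())
    for task, load in sorted(enumerate(inp.task_loads()),
                             key=lambda t: -t[1]):
        pu = min(range(len(pu_loads)), key=pu_loads.__getitem__)
        mapping[task] = pu
        pu_loads[pu] += load
    return mapping

# Trivial adapter standing in for a concrete runtime binding.
class ListInput(WorkloadInput):
    def __init__(self, loads, pus):
        self._loads, self._pus = loads, pus
    def task_loads(self): return self._loads
    def num_pus(self): return self._pus

print(greedy_policy(ListInput([4.0, 3.0, 2.0, 1.0], 2)))  # [0, 1, 1, 0]
```

Because the policy only sees `WorkloadInput`, the same strategy can be reused across runtime systems by writing a new adapter, which is the reuse the abstract reports.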

Periodical load balancing heuristics are employed in parallel iterative applications to assure the effective use of high performance computing platforms. Work stealing is one of the most widely used load balancing techniques, but it is not the friendliest fit for iterative applications. Optimal mapping of tasks to machines, while minimizing overall makespan, is regarded as an NP-hard problem, so suboptimal heuristics are used to schedule these tasks in feasible time. Among the existing approaches, distributed load balancers are the most scalable for iterative applications and have much to profit from work stealing. In this work, we propose the discretization of application workload for load balancing, as well as two distributed load balancers: PackDrop, which is based on constrained work diffusion; and PackSteal, which is based on work stealing. Our algorithms group tasks in batches before migration, creating packs of homogeneous load to make scheduling decisions in an informed and ti...
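The workload discretization this abstract describes — grouping tasks into packs of roughly homogeneous load before migration — can be sketched as follows. This is a minimal illustration under assumed semantics (greedy packing toward a target pack load), not the actual PackDrop/PackSteal implementation.

```python
def make_packs(task_loads, target_pack_load):
    """Greedily group task indices into packs whose total load
    approximates target_pack_load, so that load balancers migrate
    whole packs instead of negotiating individual tasks."""
    packs, current, current_load = [], [], 0.0
    for task, load in enumerate(task_loads):
        current.append(task)
        current_load += load
        if current_load >= target_pack_load:
            packs.append(current)
            current, current_load = [], 0.0
    if current:  # leftover tasks form a final, lighter pack
        packs.append(current)
    return packs

print(make_packs([1.0] * 6, 2.0))  # [[0, 1], [2, 3], [4, 5]]
```

Packing reduces the number of migration messages and decisions: a balancer reasons about a few packs of known load rather than thousands of tasks.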

Journal of Parallel and Distributed Computing, 2021
The scalability of high-performance, parallel iterative applications is directly affected by how well they use the available computing resources. These applications are subject to load imbalance due to the nature and dynamics of their computations. It is common that high performance systems employ periodic load balancing to tackle this issue. Dynamic load balancing algorithms redistribute the application's workload using heuristics to circumvent the NP-hard complexity of the problem. However, scheduling heuristics must be fast to avoid hindering application performance when distributing the workload on large and distributed environments. In this work, we present a technique for low overhead, high quality scheduling decisions for parallel iterative applications. The technique pairs aggregated application workload information with distributed scheduling algorithms. An initial distributed step among scheduling agents groups application tasks into packs of similar load to minimize messages among them. This information is used by our scheduling algorithm, PackStealLB, in its distributed-memory work stealing heuristic. Experimental results showed that PackStealLB is able to improve the performance of a molecular dynamics benchmark by up to 41%, outperforming other scheduling algorithms in most scenarios over almost one thousand cores.
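The pack-based work stealing idea can be illustrated with one stealing round over pre-built packs: underloaded processing units pull a pack from the most loaded one. This is a hedged, centralized toy model — the names (steal_round, pu_packs) and the victim-selection rule are assumptions for illustration; the paper's heuristic is distributed and message-driven.

```python
def steal_round(pu_packs, pack_load, threshold):
    """One toy stealing round. pu_packs[i] lists the pack ids held by
    PU i; pack_load[p] is the load of pack p. Each PU whose total load
    is below `threshold` steals one pack from the most loaded PU."""
    def total(pu):
        return sum(pack_load[p] for p in pu_packs[pu])
    for thief in range(len(pu_packs)):
        if total(thief) < threshold:
            victim = max(range(len(pu_packs)), key=total)
            if victim != thief and pu_packs[victim]:
                pu_packs[thief].append(pu_packs[victim].pop())
    return pu_packs

# PU 0 holds three packs of load 2 each; PU 1 is idle.
print(steal_round([[0, 1, 2], []], [2.0, 2.0, 2.0], 2.0))
# [[0, 1], [2]]
```

Stealing whole packs of similar load keeps each decision informed (the thief knows roughly how much work it receives) while bounding the number of steal messages.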