Key research themes
1. How can recovery-oriented computing methodologies optimize system failure recovery to improve availability and reduce total cost of ownership?
This theme explores methods of designing computing systems that can recover quickly and efficiently from failures by rethinking recovery as a first-class design goal rather than a secondary concern, thereby enhancing system availability, reducing downtime costs, and lowering the total cost of ownership (TCO). The focus is on recovery-oriented computing (ROC) principles that target networked services with metrics such as availability, rapid scale, and change, analyzing failure causes and developing techniques for automatic and effective failure recovery.
2. What are the formal models and programming paradigms that enable systematic recovery and self-healing in software systems after failures?
This theme investigates formal approaches and frameworks for implementing recovery and self-healing capabilities in software systems. It includes transactional compensation models enabling undoing committed transactions without cascading aborts, recovery-oriented programming paradigms embedding monitoring and recovery actions for safety and liveness properties, and systems exhibiting self-healing inspired by biological analogies to autonomously detect, diagnose, and repair faults. The goal is to provide theoretical and practical bases for building software resilient to transient and permanent faults.
3. How can failure recovery be optimized in storage and network systems through algorithmic and architectural techniques to ensure minimum performance degradation during faults?
This theme considers optimizing failure recovery in storage and network infrastructures, focusing on minimizing recovery overhead, ensuring consistency without rollback cascades, and maintaining service continuity under component failures. It covers topics such as I/O optimal recovery schemes for erasure-coded storage minimizing read/write operations needed for reconstruction, failure recovery architectures in cluster computing free from domino effect, and fault-tolerance frameworks in software-defined networking (SDN) and optical transport networks.