Key research themes
1. How can policy gradient and natural gradient actor-critic methods be adapted for improved sample efficiency and stability in reinforcement learning with function approximation?
This research area investigates actor-critic algorithms built on gradient-based policy optimization, focusing on sample efficiency, convergence stability, and compatibility with function approximation, especially neural networks. It addresses challenges such as high-variance gradient estimates and off-policy evaluation, and it employs natural gradients that respect the geometry of the policy parameterization, enabling robust learning in large or continuous state and action spaces (a minimal actor-critic update of this kind is sketched after the list of themes).
2. How can temporally extended actions (options) and hierarchical policies be autonomously learned and optimized using policy gradient and natural gradient methods?
This research theme focuses on learning hierarchical policies through options, i.e. temporally extended actions, to enable scalable and efficient reinforcement learning. It advances methods that autonomously discover, optimize, and terminate options within a unifying framework, applying policy gradient theory and natural gradient approaches to learn intra-option policies, termination conditions, and policies over options without predefined subgoals or additional reward signals (an option-level update of this kind is sketched after the list of themes).
3. What are effective algorithmic adaptations and architectures for actor-critic reinforcement learning tailored to control and autonomous systems under safety, sample-efficiency, and application-specific constraints?
Research under this theme develops specialized actor-critic methods for real-world control applications, addressing challenges such as constrained computation and memory (e.g., in IoT devices), safety-critical environments, sample inefficiency, and model bias. Techniques include adaptive learning rates, biologically plausible training methods, multi-step evaluation, integration of robust control techniques, and human-inspired experience inference, all aimed at improving the reactivity, convergence speed, stability, and robustness of reinforcement learning controllers (an n-step critic target, one such ingredient, is sketched below).
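To make the first theme concrete, here is a minimal sketch of a one-step advantage actor-critic update: a tabular critic supplies a value baseline, and the TD error weights the log-likelihood gradient of a softmax policy. The toy chain environment, tabular representation, step sizes, and episode count are illustrative assumptions rather than details from the surveyed work; a natural-gradient variant would additionally precondition the actor step with the inverse Fisher information of the policy.

```python
# Minimal one-step advantage actor-critic on a toy chain MDP (illustrative sketch).
import numpy as np

N_STATES, N_ACTIONS = 5, 2                      # chain states; actions: 0 = left, 1 = right
GAMMA, ALPHA_ACTOR, ALPHA_CRITIC = 0.99, 0.05, 0.1

rng = np.random.default_rng(0)
theta = np.zeros((N_STATES, N_ACTIONS))         # softmax policy parameters
w = np.zeros(N_STATES)                          # state-value estimates (tabular critic)

def policy(s):
    prefs = theta[s] - theta[s].max()           # stabilised softmax
    p = np.exp(prefs)
    return p / p.sum()

def step(s, a):
    """Move along the chain; reaching the right end yields +1 and terminates."""
    s_next = min(max(s + (1 if a == 1 else -1), 0), N_STATES - 1)
    done = s_next == N_STATES - 1
    return s_next, (1.0 if done else 0.0), done

for episode in range(500):
    s, done = 0, False
    while not done:
        probs = policy(s)
        a = rng.choice(N_ACTIONS, p=probs)
        s_next, r, done = step(s, a)

        # Critic: TD(0) error, which also serves as the advantage estimate
        # because the state value acts as the baseline.
        delta = r + (0.0 if done else GAMMA * w[s_next]) - w[s]
        w[s] += ALPHA_CRITIC * delta

        # Actor: policy-gradient step with grad log pi(a|s) of a softmax policy.
        grad_log_pi = -probs
        grad_log_pi[a] += 1.0
        theta[s] += ALPHA_ACTOR * delta * grad_log_pi

        s = s_next

print("P(right | s):", np.round([policy(s)[1] for s in range(N_STATES)], 2))
```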
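For the second theme, the sketch below follows the general shape of option-critic-style updates: a tabular critic over state-option-action values, softmax intra-option policies, and sigmoid termination functions, each adjusted from a single transition. The sizes, learning rates, and the greedy bootstrap over options are illustrative assumptions, not a faithful reproduction of any specific algorithm covered by this theme.

```python
# Sketch of a single option-level update (tabular critic, softmax intra-option
# policies, sigmoid terminations). Sizes and step sizes are illustrative assumptions.
import numpy as np

N_STATES, N_OPTIONS, N_ACTIONS = 6, 2, 3
GAMMA, LR_CRITIC, LR_INTRA, LR_TERM = 0.99, 0.5, 0.25, 0.25

q_u = np.zeros((N_STATES, N_OPTIONS, N_ACTIONS))     # Q_U(s, option, a)
theta = np.zeros((N_STATES, N_OPTIONS, N_ACTIONS))   # intra-option policy params
vartheta = np.zeros((N_STATES, N_OPTIONS))           # termination params

def intra_option_policy(s, o):
    prefs = theta[s, o] - theta[s, o].max()
    p = np.exp(prefs)
    return p / p.sum()

def termination_prob(s, o):
    return 1.0 / (1.0 + np.exp(-vartheta[s, o]))     # beta(s, option)

def q_option(s, o):
    """Q(s, option) as the intra-option policy's expectation of Q_U."""
    return intra_option_policy(s, o) @ q_u[s, o]

def update(s, o, a, r, s_next, done):
    beta_next = termination_prob(s_next, o)
    q_next = np.array([q_option(s_next, o2) for o2 in range(N_OPTIONS)])

    # Critic: continue the current option with prob. (1 - beta), otherwise
    # bootstrap from the greedy option at the next state.
    bootstrap = (1.0 - beta_next) * q_next[o] + beta_next * q_next.max()
    target = r + (0.0 if done else GAMMA * bootstrap)
    q_u[s, o, a] += LR_CRITIC * (target - q_u[s, o, a])

    # Intra-option policy gradient: grad log pi(a | s, option) weighted by Q_U.
    probs = intra_option_policy(s, o)
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    theta[s, o] += LR_INTRA * grad_log_pi * q_u[s, o, a]

    # Termination gradient: increase beta where the option's advantage over
    # the best alternative at s_next is negative (switching looks better).
    advantage = q_next[o] - q_next.max()
    grad_beta = beta_next * (1.0 - beta_next)        # derivative of the sigmoid
    vartheta[s_next, o] -= LR_TERM * grad_beta * advantage

# Example: one update for a dummy transition taken under option 1.
update(s=0, o=1, a=2, r=1.0, s_next=3, done=False)
```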
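The third theme is largely about adapting these pieces to concrete control settings, but one listed ingredient, multi-step evaluation, is simple to illustrate: the helper below computes n-step bootstrapped critic targets for a short rollout segment. The function name, trajectory layout, and n = 3 horizon are assumptions made for the example.

```python
# Sketch of n-step bootstrapped critic targets for an actor-critic controller.
import numpy as np

def n_step_targets(rewards, values, dones, gamma=0.99, n=3):
    """Compute targets G_t = r_t + ... + gamma^(n-1) r_{t+n-1} + gamma^n V(s_{t+n}).

    rewards, dones: length-T arrays for one rollout segment.
    values: length T+1 array of critic estimates V(s_0) ... V(s_T).
    """
    T = len(rewards)
    targets = np.zeros(T)
    for t in range(T):
        g, discount = 0.0, 1.0
        for k in range(t, min(t + n, T)):
            g += discount * rewards[k]
            discount *= gamma
            if dones[k]:            # stop bootstrapping at episode boundaries
                break
        else:
            g += discount * values[min(t + n, T)]
        targets[t] = g
    return targets

# Example: a 5-step rollout segment that ends with a terminal transition.
rewards = np.array([0.0, 0.0, 1.0, 0.0, 2.0])
dones   = np.array([False, False, False, False, True])
values  = np.array([0.5, 0.6, 0.7, 0.4, 0.3, 0.0])
print(n_step_targets(rewards, values, dones))
```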