Academia.edu

Adaptive Critic

18 papers
9 followers
About this topic
An adaptive critic is a computational model used in reinforcement learning that employs two components: a critic, which evaluates the actions taken by an agent, and an actor, which selects actions based on the critic's feedback. This framework facilitates the optimization of decision-making processes in dynamic environments.
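The actor-critic loop described above can be sketched in a few lines. The two-state MDP, step sizes, and episode counts below are invented purely for illustration and are tied to no particular paper on this page; the critic learns state values by temporal-difference (TD) updates, and the actor shifts its softmax preferences in the direction the TD error suggests.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-state, 2-action MDP (an assumption of this sketch):
# taking action 1 in state 0 moves to state 1 and pays reward 1;
# every other transition returns to state 0 and pays 0.
n_states, n_actions, gamma = 2, 2, 0.9
V = np.zeros(n_states)                    # critic: state-value estimates
theta = np.zeros((n_states, n_actions))   # actor: softmax action preferences

def step(s, a):
    if s == 0 and a == 1:
        return 1, 1.0
    return 0, 0.0

for _ in range(2000):
    s = 0
    for _ in range(10):
        probs = np.exp(theta[s]) / np.exp(theta[s]).sum()
        a = rng.choice(n_actions, p=probs)
        s2, r = step(s, a)
        delta = r + gamma * V[s2] - V[s]   # TD error: the critic's evaluation
        V[s] += 0.1 * delta                # critic update
        grad = -probs                      # gradient of log softmax ...
        grad[a] += 1.0                     # ... for the chosen action
        theta[s] += 0.1 * delta * grad     # actor follows the critic's signal
        s = s2
```

After training, the actor strongly prefers the rewarded action in state 0, which is the two-component interaction (evaluation by the critic, selection by the actor) the definition above describes.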

Key research themes

1. How can policy gradient and natural gradient actor-critic methods be adapted for improved sample efficiency and stability in reinforcement learning with function approximation?

This research area investigates actor-critic algorithms leveraging gradient-based policy optimization, focusing on enhancing sample efficiency, convergence stability, and compatibility with function approximation, especially neural networks. It addresses challenges including high variance gradient estimates, off-policy evaluation, and the use of natural gradients to respect the parameterization geometry, enabling robust learning in large or continuous state and action spaces.

Key finding: This work introduces four actor-critic algorithms combining natural gradient methods and function approximation, proving their convergence and demonstrating that natural gradients reduce sensitivity to parameterization and...
Key finding: Proposes an off-policy natural actor-critic algorithm utilizing state-action distribution correction with compatible features that enables integration with arbitrary neural networks approximating policy and value functions,...
Key finding: Presents a class of actor-critic methods where the critic employs temporal difference learning with a linearly parameterized value function approximation tailored to the actor's parameterization. This approach guarantees...
Key finding: Introduces the Randomized Policy Optimizer (RPO), an actor-critic algorithm with a modular design using parameterized action distributions and neural network approximators for policy and value functions. The method optimizes...
Key finding: Proposes the TD with Regularized Corrections (TDRC) algorithm balancing the simplicity and performance of TD with the assured convergence of Gradient TD methods. It achieves practical stability and improved sample efficiency...
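The linearly parameterized critics this theme refers to can be illustrated with semi-gradient TD(0) policy evaluation on the classic five-state random walk; the chain, the one-hot features, and the step size are all assumptions of this sketch, not details of any listed paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Evaluate a fixed random-walk policy on a 5-state chain with a linear
# value function v(s) = w . phi(s). Terminating on the right pays 1,
# terminating on the left pays 0; true values are (s+1)/6.
n, gamma, alpha = 5, 1.0, 0.05
w = np.zeros(n)

def phi(s):                      # one-hot features; tabular as a special case
    x = np.zeros(n)
    x[s] = 1.0
    return x

for _ in range(5000):
    s = 2                        # episodes start in the middle state
    while True:
        s2 = s + rng.choice([-1, 1])
        if s2 < 0:
            r, done = 0.0, True  # left terminal
        elif s2 >= n:
            r, done = 1.0, True  # right terminal
        else:
            r, done = 0.0, False
        target = r if done else r + gamma * (w @ phi(s2))
        w += alpha * (target - w @ phi(s)) * phi(s)   # semi-gradient TD(0)
        if done:
            break
        s = s2
```

Swapping `phi` for any other feature map keeps the update identical, which is the sense in which the critic's parameterization can be "tailored to the actor's" in the compatible-features results above.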

2. How can temporally extended actions (options) and hierarchical reinforcement learning be autonomously learned and optimized using policy gradient and natural gradient methods?

This research theme focuses on learning hierarchical policies through options, which are temporally extended actions, enabling scalable and efficient reinforcement learning. It advances methods to autonomously discover, optimize, and terminate such options within a unifying framework, employing policy gradient theory and natural gradient approaches to learn intra-option policies, termination conditions, and policies over options without predefined subgoals or extrinsic rewards.

Key finding: Develops the option-critic framework deriving policy gradient theorems for simultaneous learning of intra-option policies, termination functions, and policy over options without additional reward signals or subgoals...
Key finding: Extends the option-critic architecture to incorporate natural gradients by deriving Fisher information matrices for option policies and termination functions, enabling efficient natural gradient updates. Employs compatible...
Key finding: Reviews adaptive critic designs as neural network-based approximations of dynamic programming, emphasizing their roots in reinforcement learning for continuous control tasks requiring temporally extended action sequences. The...
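The "call-and-return" execution of a temporally extended option (an intra-option policy plus a stochastic termination condition) can be sketched as below. The corridor environment and the two hand-coded options are invented for this sketch; in the option-critic framework both the intra-option policies and the termination functions would be learned by policy gradient rather than fixed:

```python
import numpy as np

rng = np.random.default_rng(2)

# A 1-D corridor of 7 cells; the agent starts at cell 3 and wants cell 6.
n, goal = 7, 6

def make_option(direction):
    policy = lambda s: direction                        # intra-option policy
    beta = lambda s: 1.0 if s in (0, goal) else 0.1     # termination prob.
    return policy, beta

options = [make_option(-1), make_option(+1)]            # go-left, go-right

s, trace = 3, [3]
for _ in range(50):
    pi, beta = options[1]               # policy over options: pick go-right
    while True:                         # execute the option until it ends
        s = min(max(s + pi(s), 0), n - 1)
        trace.append(s)
        if rng.random() < beta(s):
            break                       # option terminates, control returns
    if s == goal:
        break
```

The outer loop chooses among options and the inner loop runs primitive actions, which is exactly the two-timescale structure the option-critic gradient theorems differentiate through.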

3. What are effective algorithmic adaptations and architectures for actor-critic reinforcement learning tailored to control and autonomous systems with safety, sample efficiency, and application-specific constraints?

Research under this theme develops specialized actor-critic methods for real-world control applications, addressing challenges such as constrained computation and memory (e.g., in IoT devices), safety-critical environments, sample inefficiency, and model biases. Techniques include adaptive learning rates, biologically plausible training methods, multi-step evaluations, integration of robust control techniques, and human-inspired experience inference to improve reactivity, convergence speed, stability, and robustness of reinforcement learning controllers.

Key finding: Proposes the LAC-AB algorithm combining linear actor-critic with an adaptive Adam-based learning rate optimized for fast reactivity to environmental changes in power-constrained IoT nodes. Demonstrates via real solar...
Key finding: Develops a continuous-time adaptive critic controller using two neural networks (actor and critic) and integrates the Robust Integral of the Sign of the Error (RISE) feedback technique to guarantee semiglobal asymptotic...
Key finding: Presents the first reinforcement learning architecture applying Equilibrium Propagation (EP) to train the actor network within an actor-critic framework, improving biological plausibility over backpropagation while...
Key finding: Introduces a novel multi-step heuristic dynamic programming (MsHDP) algorithm initialized from zero cost function, proving convergence to the solution of the Hamilton-Jacobi-Bellman equation with stability guarantees. The...
Key finding: Proposes a human-behavior inspired experience inference learning approach combining hippocampus-like model-based reference system and neocortex-like adaptive dynamic programming (reinforcement learning) with striatum-like...
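The multi-step evaluation idea behind algorithms such as MsHDP reduces, in its simplest form, to bootstrapping a value target from several observed rewards plus a tail value estimate. The helper below is a generic n-step target, an assumption of this sketch rather than the MsHDP update itself, and the numbers in the example are invented:

```python
def n_step_target(rewards, tail_value, gamma):
    """Return r_0 + gamma*r_1 + ... + gamma^(n-1)*r_{n-1} + gamma^n * V(s_n).

    Computed backwards: fold each reward onto the discounted tail.
    """
    g = tail_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Three rewards of 1 with gamma = 0.5 and a tail estimate of 2:
# 1 + 0.5*1 + 0.25*1 + 0.125*2 = 2.0
target = n_step_target([1.0, 1.0, 1.0], 2.0, 0.5)
```

Using more reward terms trades bias (from an imperfect tail estimate) for variance, which is the lever multi-step critics in this theme adjust to improve sample efficiency.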

All papers in Adaptive Critic

Adaptive critic (AC) based controllers are typically discrete and/or yield a uniformly ultimately bounded stability result because of the presence of disturbances and unknown approximation errors. A continuous-time AC controller is...
The control of dissipative distributed parameter systems with strong convective phenomena is considered, employing model order reduction. The accuracy of the derived reduced order model (ROM) and the associated observer may decrease as...
This paper is concerned with a novel integrated multi-step heuristic dynamic programming (MsHDP) algorithm for solving optimal control problems. It is shown that, initialized by the zero cost function, MsHDP can converge to the optimal...
The purpose of this paper is to describe the design, development and simulation of a real time controller for an intelligent, vision guided robot. The use of a creative controller that can select its own tasks is demonstrated. This...
The mechanistic model of the phytoplankton photosynthesis-light intensity relationship and nitrogen transformation cycle are investigated. Assuming that phytoplankton regulates its photosynthetic production rate with certain strategy...
Even though dynamic programming offers an optimal control solution in a state feedback form, the method is overwhelmed by computational and storage requirements. Approximate dynamic programming implemented with an Adaptive Critic (AC)...
Intelligent mobile robots must often operate in an unstructured environment cluttered with obstacles and with many possible action paths to accomplish a variety of tasks. Such machines have many potential useful applications in medicine,...
Aeroelastic study of flight vehicles has been a subject of great interest and research in the last several years. Aileron reversal and flutter related problems are due in part to the elasticity of a typical airplane. Structural dynamics...
This paper addresses a single network adaptive critic (SNAC) based continuous time near-optimal control strategy for robotic manipulator with partially known dynamics. The optimal control of the robot manipulator is generalized to the...
Abstract: Beavers are often found to be in conflict with human interests by creating nuisances like building dams on flowing water (leading to flooding), blocking irrigation canals, cutting down timbers, etc. At the same time they...
Abstract: A neural network based optimal control synthesis approach is presented for systems modeled by partial differential equations. The problem is formulated via discrete dynamic programming and the necessary conditions of optimality...