Key research themes
1. How can policy gradient and actor-critic methods improve sample efficiency and stability in reinforcement learning?
This research area focuses on developing policy optimization algorithms based on policy gradients and actor-critic architectures that achieve better sample efficiency, convergence guarantees, and stability, especially in continuous control domains. The motivation is that traditional policy gradient methods suffer from high variance and sample inefficiency, while trust region approaches, though effective, can be computationally expensive or incompatible with certain architectures. Combining natural gradients, compatible function approximation, and novel surrogate objectives is core to advancing these methods.
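As a concrete illustration of the surrogate-objective idea, here is a minimal NumPy sketch of a PPO-style clipped surrogate loss. The function name, the clipping parameter `epsilon`, and the synthetic batch are illustrative assumptions, not taken from any particular implementation.

```python
import numpy as np

def clipped_surrogate_loss(new_log_probs, old_log_probs, advantages, epsilon=0.2):
    """PPO-style clipped surrogate objective (to be maximized).

    Clipping the probability ratio keeps each update close to the
    behavior policy, serving as a cheap stand-in for an explicit
    trust-region constraint.
    """
    ratio = np.exp(new_log_probs - old_log_probs)              # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantages
    return np.mean(np.minimum(unclipped, clipped))             # pessimistic lower bound

# Hypothetical batch of log-probabilities and advantage estimates.
rng = np.random.default_rng(0)
old_lp = rng.normal(-1.0, 0.1, size=64)
new_lp = old_lp + rng.normal(0.0, 0.05, size=64)
advantages = rng.normal(size=64)
print(clipped_surrogate_loss(new_lp, old_lp, advantages))
```

The pointwise minimum over the clipped and unclipped terms is what discourages large policy updates without requiring the second-order machinery of full trust region methods.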
2. How can policy optimization leverage temporal abstraction and goal-conditioned policies to improve decision making?
This line of research investigates augmenting policy optimization algorithms with temporal abstraction mechanisms such as options and goal-conditioned policies. The aim is to learn policies that operate over multiple time scales or are conditioned on specific goals, so as to capture environment dynamics more effectively. This facilitates transfer, hierarchical learning, and improved exploration by structuring the policy space at a more functional level than primitive actions. Methods include deriving option-critic architectures with policy gradient theorems for options and learning actionable latent representations from goal-conditioned policies.
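To make temporal abstraction concrete, the sketch below shows only the execution structure of an options agent: intra-option policies with stochastic termination, and a policy over options that re-selects when an option terminates. The `ToyChainEnv`, the hand-crafted options, and all hyperparameters are hypothetical, and the option-critic learning updates themselves are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

class ToyChainEnv:
    """Hypothetical 1-D chain: move left/right, reward only at the right end."""
    def __init__(self, length=10):
        self.length = length
    def reset(self):
        self.pos = 0
        return self.pos
    def step(self, action):                      # action: 0 = left, 1 = right
        self.pos = int(np.clip(self.pos + (1 if action == 1 else -1), 0, self.length - 1))
        done = self.pos == self.length - 1
        return self.pos, float(done), done

# Two hand-crafted options: (intra-option policy, termination function) pairs.
options = [
    (lambda s: 1, lambda s: 0.1),                # "go right", rarely terminates
    (lambda s: 0, lambda s: 0.5),                # "go left", terminates often
]

def policy_over_options(state):
    """High-level policy: mostly prefer the 'go right' option."""
    return 0 if rng.random() < 0.9 else 1

def run_episode(env, options, policy_over_options, gamma=0.99, max_steps=200):
    """Execution loop for temporally abstract actions: commit to one
    option's intra-option policy until its termination function fires,
    then let the policy over options choose again."""
    state, ret, discount = env.reset(), 0.0, 1.0
    omega = policy_over_options(state)
    for _ in range(max_steps):
        intra_policy, beta = options[omega]
        state, reward, done = env.step(intra_policy(state))
        ret += discount * reward
        discount *= gamma
        if done:
            break
        if rng.random() < beta(state):           # option terminates stochastically
            omega = policy_over_options(state)
    return ret

print(run_episode(ToyChainEnv(), options, policy_over_options))
```

In an option-critic setup, the intra-option policies, termination functions, and policy over options would all be learned jointly via the corresponding policy gradient theorems rather than hand-specified as here.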
3. How can constrained policy learning from observational or batch data achieve minimax optimal regret under practical constraints?
This thematic area addresses policy optimization when learning from observational or batch data under constraints such as budget, fairness, or functional form. The core challenge is to learn treatment assignment or decision policies that satisfy these constraints while optimizing expected outcomes. Researchers develop algorithms whose regret guarantees scale favorably with the complexity of the policy class and derive lower bounds on minimax regret, providing sharp theoretical characterizations and practical algorithms applicable beyond randomized trials, including settings with endogenous treatments.
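A minimal sketch of the batch setting follows, assuming known propensity scores, a toy class of one-dimensional threshold policies, and a budget constraint on the treated fraction. The function names, the inverse-propensity-weighted value estimator, and the synthetic data are illustrative and do not reproduce any specific paper's estimator or guarantees.

```python
import numpy as np

def ipw_policy_value(policy, X, T, Y, propensity):
    """Inverse-propensity-weighted estimate of the value of a deterministic
    binary treatment policy, computed from observational/batch data.

    X: covariates, T: observed treatments (0/1), Y: observed outcomes,
    propensity: P(T=1 | X) for each unit (known or separately estimated).
    """
    pi = policy(X)                                             # policy's 0/1 decisions
    w = np.where(T == 1, 1.0 / propensity, 1.0 / (1.0 - propensity))
    return np.mean((pi == T) * w * Y)                          # keep units where policy and data agree

def best_threshold_policy(X, T, Y, propensity, budget=0.3):
    """Grid search over a tiny policy class (treat if x >= threshold),
    discarding candidates that exceed the budget on the treated fraction."""
    best_value, best_threshold = -np.inf, None
    for threshold in np.quantile(X, np.linspace(0.0, 1.0, 21)):
        policy = lambda x, t=threshold: (x >= t).astype(int)
        if policy(X).mean() > budget:                          # budget constraint
            continue
        value = ipw_policy_value(policy, X, T, Y, propensity)
        if value > best_value:
            best_value, best_threshold = value, threshold
    return best_threshold, best_value

# Synthetic observational data with a known logistic propensity score.
rng = np.random.default_rng(0)
X = rng.normal(size=2000)
propensity = 1.0 / (1.0 + np.exp(-X))                          # treatment more likely for large x
T = rng.binomial(1, propensity)
Y = 0.5 * X + T * (X > 0.5) + rng.normal(scale=0.1, size=2000)
print(best_threshold_policy(X, T, Y, propensity))
```

The regret analyses in this literature characterize how the gap between the learned policy's value and the best value in the constrained class scales with sample size and policy-class complexity; the grid search above is only a stand-in for the optimization step over that class.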