Markov Decision Processes

506 papers
165 followers
About this topic
Markov Decision Processes (MDPs) are mathematical frameworks used for modeling decision-making in situations where outcomes are partly random and partly under the control of a decision maker. MDPs consist of states, actions, transition probabilities, and rewards, enabling the analysis of optimal policies to maximize cumulative rewards over time.
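To make these ingredients concrete, here is a minimal value iteration sketch in Python for a toy two-state MDP; the transition and reward numbers are invented for illustration, and the backup shown is the standard Bellman optimality update.

```python
import numpy as np

# Hypothetical toy MDP (all numbers invented for illustration):
# P[a, s, s'] = transition probability, R[s, a] = immediate reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # action 0
              [[0.5, 0.5], [0.0, 1.0]]])  # action 1
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.95  # discount factor

# Value iteration: repeatedly apply the Bellman optimality backup
# v(s) <- max_a [ R(s, a) + gamma * sum_s' P(s' | s, a) v(s') ]
v = np.zeros(2)
for _ in range(1000):
    q = R + gamma * np.einsum('ast,t->sa', P, v)  # q[s, a]
    v_new = q.max(axis=1)
    if np.max(np.abs(v_new - v)) < 1e-8:
        break
    v = v_new

policy = q.argmax(axis=1)  # greedy policy w.r.t. the converged values
print(v, policy)
```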

Key research themes

1. What are the computational complexity challenges and algorithmic solutions for solving Markov Decision Processes (MDPs)?

This research focuses on understanding the computational hardness of MDPs, the class of algorithms designed for exact and approximate solutions, and the exploitation of problem-specific structure to improve efficiency. It matters because MDPs underpin many applications from AI planning to operations research, but theoretical polynomial-time solvability contrasts with practical inefficiency, especially for large-scale problems.
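For reference, the polynomial-time solvability mentioned here rests on the standard textbook linear program (not tied to any single paper below): the optimal value function of a discounted MDP is the unique optimal solution of

```latex
\min_{v} \sum_{s} v(s)
\quad \text{s.t.} \quad
v(s) \;\ge\; R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, v(s')
\quad \forall s, a,
```

with |S| variables and |S||A| constraints, so the LP is polynomial-size; the practical gap comes from the large polynomial overhead of generic LP solvers relative to dedicated dynamic-programming methods.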

Key finding: The paper establishes that while any MDP can be formulated as a linear program solvable in polynomial time, the polynomial order is large, rendering these algorithms impractical. The MDP-specific algorithms that are known do not run in... Read more
Key finding: This work proposes a unified framework that integrates information-theoretic constraints with model uncertainty in MDP planning. It formulates a generalized variational principle incorporating both control cost (via... Read more
Key finding: The paper develops an algorithm to learn dynamic Bayesian network (DBN) representations of factored MDPs through trajectory-based exploration, incorporating a novel action selection scheme that maximizes data gathering... Read more
Key finding: Introduces the class of Markov decision processes with incomplete information (MDPII) featuring semi-uniform Feller transition probabilities, and demonstrates equivalence to belief-state MDPs with the same property. The paper... Read more
Key finding: Analyzes Markov decision chains on countable state spaces under risk-sensitive average cost criteria with constant risk-seeking behavior. It proves equality and constancy of optimal inferior and superior limit average value... Read more

2. How can uncertainty, ambiguity, and incomplete/inaccurate information be represented and incorporated in Markov Decision Processes?

This research area investigates extensions of classical MDPs to settings where rewards, state observability, or model parameters are uncertain or imprecise. Capturing such uncertainty more faithfully leads to richer models like fuzzy reward MDPs, partially observable MDPs (POMDPs), and robust planners that consider adversarial or misspecified transitions. Accounting for this uncertainty is essential for realistic decision-making and policy robustness.
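As background for the POMDP entries in this theme, the belief-state construction follows the standard recursion: after taking action a and observing o, a belief b over hidden states is updated by

```latex
b'(s') \;=\; \frac{O(o \mid s', a) \sum_{s} P(s' \mid s, a)\, b(s)}
{\sum_{\sigma} O(o \mid \sigma, a) \sum_{s} P(\sigma \mid s, a)\, b(s)},
```

which recasts the partially observable problem as a fully observable MDP over the (continuous) belief simplex.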

Key finding: Introduces fuzzy set representations for vector-valued rewards in infinite horizon discounted MDPs. Defines infinite horizon fuzzy expected discounted reward (FEDR) characterized as a unique fixed point of a contractive... Read more
Key finding: (Also relevant here) Develops conditions under which MDPs with incomplete state observations (MDPIIs) have well-defined belief-state MDP representations with good continuity properties that guarantee existence of optimal... Read more
Key finding: The framework explicitly includes model uncertainty by biasing beliefs toward worst-case or best-case models via information processing (KL divergence) constraints. This incorporation of model misspecification uncertainty... Read more
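One standard way KL-divergence constraints of this kind enter a Bellman backup (a textbook duality, not necessarily the exact formulation of the paper above) is through exponential tilting: the worst-case expectation over all models within a KL ball of radius ε around a nominal model P has the closed form

```latex
\inf_{Q :\, D_{\mathrm{KL}}(Q \,\|\, P) \le \epsilon} \mathbb{E}_{Q}[v]
\;=\; \sup_{\theta > 0} \left\{ -\tfrac{1}{\theta} \log \mathbb{E}_{P}\!\left[ e^{-\theta v} \right] - \tfrac{\epsilon}{\theta} \right\},
```

so biasing beliefs toward worst-case (or, with signs flipped, best-case) models reduces to a risk-sensitive log-sum-exp backup under the nominal model.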

3. How can Markov Decision Processes be applied and extended in specific domains such as autonomous driving and healthcare modeling?

This theme covers the utilization of advanced MDP models—often augmented by probabilistic logic or fuzzy representations—for behavior selection, planning, and economic evaluation in applied contexts. The focus is on how tailored MDP frameworks can effectively model complex temporal decision problems in domains like self-driving car behavior control and healthcare resource allocation, incorporating domain-specific constraints and uncertainties.

Key finding: Proposes probabilistic logic factored MDPs (PL-fMDPs) combining probabilistic logic programming with factored MDPs to generate interpretable, rule-based behavior selection policies for self-driving cars. Evaluation in a... Read more
Key finding: Develops an MDP formalism encoded in probabilistic logic (MDP-ProbLog) that expresses driving scenarios with logical rules and probabilistic facts to select optimal driving actions (e.g., overtaking, keeping distance).... Read more
Key finding: Reviews the application of Markov chain models in clinical decision making and economic evaluation of chronic diseases, emphasizing the ability of Markov models to represent disease progression via discrete health states over... Read more
Key finding: Develops and analyzes a stochastic dynamic programming approach integrating sponsored search advertising budget allocation with dynamic pricing for perishable inventory over a finite horizon. Proves structural properties of... Read more

All papers in Markov Decision Processes

This paper proposes a novel and practical model-based learning approach with iterative refinement for solving continuous (and hybrid) Markov decision processes. Initially, an approximate model is learned using conventional sampling... more
We study the problem of long-run average cost control of Markov chains conditioned on a rare event. In a related recent work, a simulation based algorithm for estimating performance measures associated with a Markov chain conditioned on a... more
This is a study of simple random walks, birth and death processes, and M/M/s queues that have transition probabilities and rates that are sequentially controlled at jump times of the processes. Each control action yields a one-step reward... more
The paper studies the problem of allocating bandwidth resources of a Service Overlay Network, to optimize revenue. Clients bid for network capacity in periodically held auctions, under the condition that resources allocated in an auction... more
From the reviews: "The authors of this book, part of the Medical Radiology - Diagnostic Imaging and Radiation Oncology series, present a sound approach to diagnosis of lung disease based on the CT appearance of the pulmonary... more
The problem of solving large Markov decision processes accurately and quickly is challenging. Since the computational effort incurred is considerable, current research focuses on finding superior acceleration techniques. For instance, the... more
In reinforcement learning an agent uses online feedback from the environment and prior knowledge in order to adaptively select an effective policy. Model free approaches address this task by directly mapping external and internal states... more
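As a concrete illustration of the model-free idea sketched in this entry (a minimal tabular Q-learning example, not the method of this particular paper), the agent updates a state-action value table directly from sampled transitions and never estimates transition probabilities; the Gym-like `env` interface below is a hypothetical stand-in.

```python
import random

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning; `env` is assumed to expose reset() -> state and
    step(action) -> (next_state, reward, done), a hypothetical interface
    used here only for illustration."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection: explore with probability epsilon
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda a_: Q[s][a_])
            s2, r, done = env.step(a)
            # temporal-difference update toward the bootstrapped target
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q
```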
Possibilistic decision theory was proposed twenty years ago and has had several extensions since then. Even though appealing for its ability to handle qualitative decision problems, possibilistic decision theory suffers... more
In this paper we focus on spatialized decision problems, which we propose to model in the framework of (highly) multidimensional Markov Decision Processes (MDPs) that exhibit only local dependencies between variables. We propose to... more
In this letter, it is shown that the randomized shortest-path framework (RSP, [15]) provides a theoretical interpretation of a class of ant colony optimization (ACO) algorithms, enjoying some nice properties. According to RSP, ants are... more
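In the usual randomized shortest-path formulation (stated here from the general RSP literature, so details may differ from [15]), path probabilities minimize expected cost plus a relative-entropy penalty toward a reference random walk, yielding a Boltzmann distribution over paths:

```latex
P(\wp) \;\propto\; \pi_{\mathrm{ref}}(\wp)\, e^{-\theta\, C(\wp)},
```

with θ interpolating between the pure random walk (θ → 0) and deterministic shortest paths (θ → ∞).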
Ye showed recently that the simplex method with Dantzig pivoting rule, as well as Howard's policy iteration algorithm, solve discounted Markov decision processes (MDPs), with a constant discount factor, in strongly polynomial time. More... more
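Howard's policy iteration, referenced in this entry, alternates exact policy evaluation with greedy improvement; the dense-matrix sketch below uses toy array conventions (P[a, s, s'] and R[s, a]) chosen here for illustration.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """P[a, s, s'] transition probabilities, R[s, a] rewards."""
    n_actions, n_states, _ = P.shape
    pi = np.zeros(n_states, dtype=int)  # start from an arbitrary policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly
        P_pi = P[pi, np.arange(n_states)]   # row s is P(. | s, pi(s))
        r_pi = R[np.arange(n_states), pi]
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Policy improvement: act greedily w.r.t. the evaluated values
        q = R + gamma * np.einsum('ast,t->sa', P, v)
        pi_new = q.argmax(axis=1)
        if np.array_equal(pi_new, pi):
            return pi, v
        pi = pi_new
```

Ye's result cited above shows this loop terminates in strongly polynomial time when the discount factor is a fixed constant.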
We study discrete-time discounted constrained Markov decision processes (CMDPs) with Borel state and action spaces. These CMDPs satisfy either weak (W) continuity conditions, that is, the transition probability is weakly continuous and... more
In this paper, we study algorithms for special cases of energy games, a class of turn-based games on graphs that show up in the quantitative analysis of reactive systems. In an energy game, the vertices of a weighted directed graph belong... more
Turn-based stochastic games and its important subclass Markov decision processes (MDPs) provide models for systems with both probabilistic and nondeterministic behaviors. We consider turn-based stochastic games with two classical... more
Recent work on approximate linear programming (ALP) techniques for first-order Markov Decision Processes (FOMDPs) represents the value function linearly w.r.t. a set of first-order basis functions and uses linear programming techniques to... more
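Setting aside the first-order lifting specific to this line of work, approximate linear programming in general restricts the value function to the span of basis functions φ_i and optimizes the LP over the weights w:

```latex
v_{\mathbf{w}}(s) = \sum_i w_i\, \phi_i(s), \qquad
\min_{\mathbf{w}} \sum_{s} v_{\mathbf{w}}(s)
\;\;\text{s.t.}\;\;
v_{\mathbf{w}}(s) \ge R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, v_{\mathbf{w}}(s')
\;\;\forall s, a,
```

which shrinks the number of LP variables from |S| to the number of basis functions, at the price of approximation error and (in general) exponentially many constraints.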
I am most grateful to my supervisors Prof. Elizabeth Jewkes and Prof. Qi-Ming He for their guidance, support and inspiration throughout my Ph.D. studies. Thank you Prof. Jewkes for your encouragement and support during my Masters degree... more
We consider the optimal production and inventory control of an assemble-to-order system with m components, one end-product, and n customer classes. A control policy specifies when to produce each component and, whenever an order is... more
We consider a production-inventory system with two customer classes, one patient and one impatient. Orders from the patient class can be backordered if needed while orders from the impatient class must be rejected if they cannot be... more
Calls of two classes arrive at a call center according to two independent Poisson processes. The center has two dedicated stations, one for each class, and one shared station. All three stations consist of parallel servers and no waiting... more
Reinforcement Learning (RL) is being increasingly applied to optimize complex functions that may have a stochastic component. RL is extended to multi-agent systems to find policies to optimize systems that require agents to coordinate or... more
Probabilistic model checking is a formal method for verification of the quantitative and qualitative properties of computer systems with stochastic behaviors. Markov Decision Processes (MDPs) are well-known formalisms for modeling this... more
Predictive linguistics is a growing field at the interface of language sciences, cognitive sciences, and artificial intelligence, focusing on how humans and machines use predictive processes to process and produce language. This... more
This paper presents a novel algorithm for learning in a class of stochastic Markov decision processes (MDPs) with continuous state and action spaces that trades speed for accuracy. A transform of the stochastic MDP into a deterministic... more
Non-stationary domains, where unforeseen changes happen, present a challenge for agents to find an optimal policy for a sequential decision making problem. This work investigates a solution to this problem that combines Markov Decision... more
On the basis of a database of more than 80 thousand records on total retail sales and production costs of the pharmaceutical industry worldwide, we consider four classes of drugs. We evaluate the expected profits of an investment in a new drug... more
Climatic changes will affect the occurrence probability of extreme windstorms. Consequently, management of uneven-aged forests can only be optimized correctly if changes in climatic conditions are considered. This article determines the... more
Confronted by significant impacts to ecosystems world‐wide, decision makers face the challenge of maintaining both biodiversity and the provision of ecosystem services (ES). However, the objectives of managing biodiversity and supplying... more
Classical stochastic Markov Decision Processes (MDPs) and possibilistic MDPs (π-MDPs) aim at solving the same kind of problems, involving sequential decision making under uncertainty. The underlying uncertainty model (probabilistic /... more
In some applications, the output of the system is a sequence of actions. In such cases there is no single best action in any intermediate state; an action is good if it is part of a good policy. A single action is not important; the... more
In large-scale persistent missions, the vehicle capabilities and health often degrade over time. This paper presents a Health Aware Planning (HAP) Framework for long-duration complex UAV missions by establishing close feedback between the... more
This paper deals with approximate value iteration (AVI) algorithms applied to discounted dynamic programming (DP) problems. For a fixed control policy, the span semi-norm of the so-called Bellman residual is shown to be convex in the... more
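For readers unfamiliar with the terms used here: the Bellman residual of a value estimate v under an operator T is Tv - v, and the span semi-norm is

```latex
\mathrm{sp}(v) \;=\; \max_{s} v(s) \;-\; \min_{s} v(s),
```

which ignores constant shifts of v and therefore yields tighter stopping rules than the sup-norm in discounted dynamic programming.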
This note provides upper bounds on the number of operations required to compute by value iterations a nearly optimal policy for an infinite-horizon discounted Markov decision process with a finite number of states and actions. For a given... more
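The flavor of such bounds follows from the γ-contraction of the Bellman operator (a standard estimate with generic constants, not the note's sharper ones): since each iteration shrinks the sup-norm error by a factor of γ, an ε-accurate value function needs about

```latex
n \;\ge\; \frac{\log\!\big(\|v_0 - v^*\|_\infty / \epsilon\big)}{\log(1/\gamma)}
\;=\; O\!\left(\frac{1}{1-\gamma}\,\log\frac{\|v_0 - v^*\|_\infty}{\epsilon}\right)
```

iterations, each costing O(|S|² |A|) arithmetic operations for dense transition matrices.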
This paper establishes new links between stochastic and discrete optimization. We consider the following three problems for discrete time Markov Decision Processes with finite states and action sets: (i) find an optimal deterministic... more
This paper deals with constrained optimization of Markov Decision Processes. Both the objective function and the constraints are sums of standard discounted rewards, but each with a different discount factor. Such models arise, e.g., in production... more
We investigate logics and equivalence relations that capture the qualitative behavior of Markov Decision Processes (MDPs). We present Qualitative Randomized CTL (QRCTL): formulas of this logic can express the fact that certain temporal... more
This paper proposes a new approach to modelling and controlling Internet end-to-end loss behaviours. Rather than select the model structure from the loss observations as being done previously, we construct a new loss model based on the... more
The use of multipath routing in overlay networks is a promising solution to improve performance and availability of Internet applications, without the replacement of the existing TCP/IP infrastructure. In this paper, we propose an... more
We consider the problem of finding good finite-horizon policies for POMDPs under the expected reward metric. The policies considered are free finite-memory policies with limited memory; a policy is a mapping from the space of... more
Solving partially observable Markov decision processes (POMDPs) is highly intractable in general, at least in part because the optimal policy may be infinitely large. In this paper, we explore the problem of finding the optimal policy... more
We investigate the use of temporally abstract actions, or macro-actions, in the solution of Markov decision processes. Unlike current models that combine both primitive actions and macro-actions and leave the state space unchanged, we... more
The well-known Kullback-Leibler divergence of a random field from its factorization quantifies spatial interdependences of the corresponding stochastic elements. We introduce a generalized measure called 'stochastic interaction' that... more
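Concretely, the divergence from the factorization mentioned in this entry is the classical multi-information, which the paper's 'stochastic interaction' generalizes:

```latex
I(X_1, \dots, X_n)
\;=\; D_{\mathrm{KL}}\!\left( P(X_1,\dots,X_n) \,\middle\|\, \prod_{i=1}^n P(X_i) \right)
\;=\; \sum_{i=1}^n H(X_i) \;-\; H(X_1,\dots,X_n),
```

which vanishes exactly when the components are independent.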
Given a 0-1 infinite matrix A and its countable Markov shift Σ_A, one of the authors and M. Laca have introduced a kind of generalized countable Markov shift X_A = Σ_A ∪ Y_A, where Y_A is a special set of finite admissible words. For some of... more