Markov Decision Processes

506 papers
165 followers
About this topic
Markov Decision Processes (MDPs) are mathematical frameworks used for modeling decision-making in situations where outcomes are partly random and partly under the control of a decision maker. MDPs consist of states, actions, transition probabilities, and rewards, enabling the analysis of optimal policies to maximize cumulative rewards over time.
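To make these ingredients concrete, here is a minimal value iteration sketch in Python for a toy two-state MDP; the transition and reward numbers are invented for illustration, and the backup shown is the standard Bellman optimality update.

```python
import numpy as np

# Hypothetical toy MDP (all numbers invented for illustration):
# P[a, s, s'] = transition probability, R[s, a] = immediate reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # action 0
              [[0.5, 0.5], [0.0, 1.0]]])  # action 1
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.95  # discount factor

# Value iteration: repeatedly apply the Bellman optimality backup
# v(s) <- max_a [ R(s, a) + gamma * sum_s' P(s' | s, a) v(s') ]
v = np.zeros(2)
for _ in range(1000):
    q = R + gamma * np.einsum('ast,t->sa', P, v)  # q[s, a]
    v_new = q.max(axis=1)
    if np.max(np.abs(v_new - v)) < 1e-8:
        break
    v = v_new

policy = q.argmax(axis=1)  # greedy policy w.r.t. the converged values
print(v, policy)
```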

Key research themes

1. What are the computational complexity challenges and algorithmic solutions for solving Markov Decision Processes (MDPs)?

This research focuses on understanding the computational hardness of MDPs, the class of algorithms designed for exact and approximate solutions, and the exploitation of problem-specific structure to improve efficiency. It matters because MDPs underpin many applications from AI planning to operations research, but theoretical polynomial-time solvability contrasts with practical inefficiency, especially for large-scale problems.
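For reference, the polynomial-time solvability mentioned here rests on the standard textbook linear program (not tied to any single paper below): the optimal value function of a discounted MDP is the unique optimal solution of

```latex
\min_{v} \sum_{s} v(s)
\quad \text{s.t.} \quad
v(s) \;\ge\; R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, v(s')
\quad \forall s, a,
```

with |S| variables and |S||A| constraints, so the LP is polynomial-size; the practical gap comes from the large polynomial overhead of generic LP solvers relative to dedicated dynamic-programming methods.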

Key finding: The paper establishes that while any MDP can be formulated as a linear program solvable in polynomial time, the polynomial order is large, rendering these algorithms impractical. The MDP-specific algorithms that are known do not run in... Read more
Key finding: This work proposes a unified framework that integrates information-theoretic constraints with model uncertainty in MDP planning. It formulates a generalized variational principle incorporating both control cost (via... Read more
Key finding: The paper develops an algorithm to learn dynamic Bayesian network (DBN) representations of factored MDPs through trajectory-based exploration, incorporating a novel action selection scheme that maximizes data gathering... Read more
Key finding: Introduces the class of Markov decision processes with incomplete information (MDPII) featuring semi-uniform Feller transition probabilities, and demonstrates equivalence to belief-state MDPs with the same property. The paper... Read more
Key finding: Analyzes Markov decision chains on countable state spaces under risk-sensitive average cost criteria with constant risk-seeking behavior. It proves equality and constancy of optimal inferior and superior limit average value... Read more

2. How can uncertainty, ambiguity, and incomplete/inaccurate information be represented and incorporated in Markov Decision Processes?

This research area investigates extensions of classical MDPs to settings where rewards, state observability, or model parameters are uncertain or imprecise. Capturing such uncertainty more faithfully leads to richer models like fuzzy reward MDPs, partially observable MDPs (POMDPs), and robust planners that consider adversarial or misspecified transitions. Accounting for this uncertainty is essential for realistic decision-making and policy robustness.
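As background for the POMDP entries in this theme, the belief-state construction follows the standard recursion: after taking action a and observing o, a belief b over hidden states is updated by

```latex
b'(s') \;=\; \frac{O(o \mid s', a) \sum_{s} P(s' \mid s, a)\, b(s)}
{\sum_{\sigma} O(o \mid \sigma, a) \sum_{s} P(\sigma \mid s, a)\, b(s)},
```

which recasts the partially observable problem as a fully observable MDP over the (continuous) belief simplex.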

Key finding: Introduces fuzzy set representations for vector-valued rewards in infinite horizon discounted MDPs. Defines infinite horizon fuzzy expected discounted reward (FEDR) characterized as a unique fixed point of a contractive... Read more
Key finding: (Also relevant here) Develops conditions under which MDPs with incomplete state observations (MDPIIs) have well-defined belief-state MDP representations with good continuity properties that guarantee existence of optimal... Read more
Key finding: The framework explicitly includes model uncertainty by biasing beliefs toward worst-case or best-case models via information processing (KL divergence) constraints. This incorporation of model misspecification uncertainty... Read more
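One standard way KL-divergence constraints of this kind enter a Bellman backup (a textbook duality, not necessarily the exact formulation of the paper above) is through exponential tilting: the worst-case expectation over all models within a KL ball of radius ε around a nominal model P has the closed form

```latex
\inf_{Q :\, D_{\mathrm{KL}}(Q \,\|\, P) \le \epsilon} \mathbb{E}_{Q}[v]
\;=\; \sup_{\theta > 0} \left\{ -\tfrac{1}{\theta} \log \mathbb{E}_{P}\!\left[ e^{-\theta v} \right] - \tfrac{\epsilon}{\theta} \right\},
```

so biasing beliefs toward worst-case (or, with signs flipped, best-case) models reduces to a risk-sensitive log-sum-exp backup under the nominal model.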

3. How can Markov Decision Processes be applied and extended in specific domains such as autonomous driving and healthcare modeling?

This theme covers the utilization of advanced MDP models—often augmented by probabilistic logic or fuzzy representations—for behavior selection, planning, and economic evaluation in applied contexts. The focus is on how tailored MDP frameworks can effectively model complex temporal decision problems in domains like self-driving car behavior control and healthcare resource allocation, incorporating domain-specific constraints and uncertainties.

Key finding: Proposes probabilistic logic factored MDPs (PL-fMDPs) combining probabilistic logic programming with factored MDPs to generate interpretable, rule-based behavior selection policies for self-driving cars. Evaluation in a... Read more
Key finding: Develops an MDP formalism encoded in probabilistic logic (MDP-ProbLog) that expresses driving scenarios with logical rules and probabilistic facts to select optimal driving actions (e.g., overtaking, keeping distance).... Read more
Key finding: Reviews the application of Markov chain models in clinical decision making and economic evaluation of chronic diseases, emphasizing the ability of Markov models to represent disease progression via discrete health states over... Read more
Key finding: Develops and analyzes a stochastic dynamic programming approach integrating sponsored search advertising budget allocation with dynamic pricing for perishable inventory over a finite horizon. Proves structural properties of... Read more

All papers in Markov Decision Processes

This paper proposes a novel and practical model-based learning approach with iterative refinement for solving continuous (and hybrid) Markov decision processes. Initially, an approximate model is learned using conventional sampling... more
We study the problem of long-run average cost control of Markov chains conditioned on a rare event. In a related recent work, a simulation based algorithm for estimating performance measures associated with a Markov chain conditioned on a... more
This is a study of simple random walks, birth and death processes, and M/M/s queues that have transition probabilities and rates that are sequentially controlled at jump times of the processes. Each control action yields a one-step reward... more
The paper studies the problem of allocating bandwidth resources of a Service Overlay Network, to optimize revenue. Clients bid for network capacity in periodically held auctions, under the condition that resources allocated in an auction... more
From the reviews: "The authors of this book, part of the Medical Radiology - Diagnostic Imaging and Radiation Oncology series, present a sound approach to diagnosis of lung disease based on the CT appearance of the pulmonary... more
The problem of solving large Markov decision processes accurately and quickly is challenging. Since the computational effort incurred is considerable, current research focuses on finding superior acceleration techniques. For instance, the... more
In reinforcement learning an agent uses online feedback from the environment and prior knowledge in order to adaptively select an effective policy. Model free approaches address this task by directly mapping external and internal states... more
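As a concrete illustration of the model-free idea sketched in this entry (a minimal tabular Q-learning example, not the method of this particular paper), the agent updates a state-action value table directly from sampled transitions and never estimates transition probabilities; the Gym-like `env` interface below is a hypothetical stand-in.

```python
import random

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning; `env` is assumed to expose reset() -> state and
    step(action) -> (next_state, reward, done), a hypothetical interface
    used here only for illustration."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection: explore with probability epsilon
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda a_: Q[s][a_])
            s2, r, done = env.step(a)
            # temporal-difference update toward the bootstrapped target
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q
```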
Possibilistic decision theory was proposed twenty years ago and has had several extensions since then. Even though appealing for its ability to handle qualitative decision problems, possibilistic decision theory suffers... more
In this paper we focus on spatialized decision problems, which we propose to model in the framework of (highly) multidimensional Markov Decision Processes (MDPs) that exhibit only local dependencies between variables. We propose to... more
In this letter, it is shown that the randomized shortest-path framework (RSP, [15]) provides a theoretical interpretation of a class of ant colony optimization (ACO) algorithms, enjoying some nice properties. According to RSP, ants are... more
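In the usual randomized shortest-path formulation (stated here from the general RSP literature, so details may differ from [15]), path probabilities minimize expected cost plus a relative-entropy penalty toward a reference random walk, yielding a Boltzmann distribution over paths:

```latex
P(\wp) \;\propto\; \pi_{\mathrm{ref}}(\wp)\, e^{-\theta\, C(\wp)},
```

with θ interpolating between the pure random walk (θ → 0) and deterministic shortest paths (θ → ∞).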
Ye showed recently that the simplex method with Dantzig pivoting rule, as well as Howard's policy iteration algorithm, solve discounted Markov decision processes (MDPs), with a constant discount factor, in strongly polynomial time. More... more
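Howard's policy iteration, referenced in this entry, alternates exact policy evaluation with greedy improvement; the dense-matrix sketch below uses toy array conventions (P[a, s, s'] and R[s, a]) chosen here for illustration.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """P[a, s, s'] transition probabilities, R[s, a] rewards."""
    n_actions, n_states, _ = P.shape
    pi = np.zeros(n_states, dtype=int)  # start from an arbitrary policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly
        P_pi = P[pi, np.arange(n_states)]   # row s is P(. | s, pi(s))
        r_pi = R[np.arange(n_states), pi]
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Policy improvement: act greedily w.r.t. the evaluated values
        q = R + gamma * np.einsum('ast,t->sa', P, v)
        pi_new = q.argmax(axis=1)
        if np.array_equal(pi_new, pi):
            return pi, v
        pi = pi_new
```

Ye's result cited above shows this loop terminates in strongly polynomial time when the discount factor is a fixed constant.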
We study discrete-time discounted constrained Markov decision processes (CMDPs) with Borel state and action spaces. These CMDPs satisfy either weak (W) continuity conditions, that is, the transition probability is weakly continuous and... more
In this paper, we study algorithms for special cases of energy games, a class of turn-based games on graphs that show up in the quantitative analysis of reactive systems. In an energy game, the vertices of a weighted directed graph belong... more
Turn-based stochastic games and its important subclass Markov decision processes (MDPs) provide models for systems with both probabilistic and nondeterministic behaviors. We consider turn-based stochastic games with two classical... more
Recent work on approximate linear programming (ALP) techniques for first-order Markov Decision Processes (FOMDPs) represents the value function linearly w.r.t. a set of first-order basis functions and uses linear programming techniques to... more
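Setting aside the first-order lifting specific to this line of work, approximate linear programming in general restricts the value function to the span of basis functions φ_i and optimizes the LP over the weights w:

```latex
v_{\mathbf{w}}(s) = \sum_i w_i\, \phi_i(s), \qquad
\min_{\mathbf{w}} \sum_{s} v_{\mathbf{w}}(s)
\;\;\text{s.t.}\;\;
v_{\mathbf{w}}(s) \ge R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, v_{\mathbf{w}}(s')
\;\;\forall s, a,
```

which shrinks the number of LP variables from |S| to the number of basis functions, at the price of approximation error and (in general) exponentially many constraints.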
I am most grateful to my supervisors Prof. Elizabeth Jewkes and Prof. Qi-Ming He for their guidance, support and inspiration throughout my Ph.D. studies. Thank you Prof. Jewkes for your encouragement and support during my Masters degree... more
We consider the optimal production and inventory control of an assemble-to-order system with m components, one end-product, and n customer classes. A control policy specifies when to produce each component and, whenever an order is... more
We consider a production-inventory system with two customer classes, one patient and one impatient. Orders from the patient class can be backordered if needed while orders from the impatient class must be rejected if they cannot be... more
Calls of two classes arrive at a call center according to two independent Poisson processes. The center has two dedicated stations, one for each class, and one shared station. All three stations consist of parallel servers and no waiting... more
Reinforcement Learning (RL) is being increasingly applied to optimize complex functions that may have a stochastic component. RL is extended to multi-agent systems to find policies to optimize systems that require agents to coordinate or... more
Probabilistic model checking is a formal method for verification of the quantitative and qualitative properties of computer systems with stochastic behaviors. Markov Decision Processes (MDPs) are well-known formalisms for modeling this... more
Predictive linguistics is a growing field at the interface of language sciences, cognitive sciences, and artificial intelligence, focusing on how humans and machines use predictive processes to process and produce language. This... more
This paper presents a novel algorithm for learning in a class of stochastic Markov decision processes (MDPs) with continuous state and action spaces that trades speed for accuracy. A transform of the stochastic MDP into a deterministic... more
Non-stationary domains, where unforeseen changes happen, present a challenge for agents to find an optimal policy for a sequential decision making problem. This work investigates a solution to this problem that combines Markov Decision... more
On the basis of a database of more than 80 thousand records on total retail sales and production costs of the pharmaceutical industry worldwide, we consider four classes of drugs. We evaluate the expected profits of an investment in a new drug... more
Climatic changes will affect the occurrence probability of extreme windstorms. Consequently, management of uneven-aged forests can only be optimized correctly if changes in climatic conditions are considered. This article determines the... more
Confronted by significant impacts to ecosystems world‐wide, decision makers face the challenge of maintaining both biodiversity and the provision of ecosystem services (ES). However, the objectives of managing biodiversity and supplying... more
Classical stochastic Markov Decision Processes (MDPs) and possibilistic MDPs (π-MDPs) aim at solving the same kind of problems, involving sequential decision making under uncertainty. The underlying uncertainty model (probabilistic /... more
In some applications, the output of the system is a sequence of actions. In such cases there is no single best action in any intermediate state; an action is good if it is part of a good policy. A single action is not important; the... more
In large-scale persistent missions, the vehicle capabilities and health often degrade over time. This paper presents a Health Aware Planning (HAP) Framework for long-duration complex UAV missions by establishing close feedback between the... more
This paper deals with approximate value iteration (AVI) algorithms applied to discounted dynamic programming (DP) problems. For a fixed control policy, the span semi-norm of the so-called Bellman residual is shown to be convex in the... more
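For readers unfamiliar with the terms used here: the Bellman residual of a value estimate v under an operator T is Tv - v, and the span semi-norm is

```latex
\mathrm{sp}(v) \;=\; \max_{s} v(s) \;-\; \min_{s} v(s),
```

which ignores constant shifts of v and therefore yields tighter stopping rules than the sup-norm in discounted dynamic programming.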
This note provides upper bounds on the number of operations required to compute by value iterations a nearly optimal policy for an infinite-horizon discounted Markov decision process with a finite number of states and actions. For a given... more
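The flavor of such bounds follows from the γ-contraction of the Bellman operator (a standard estimate with generic constants, not the note's sharper ones): since each iteration shrinks the sup-norm error by a factor of γ, an ε-accurate value function needs about

```latex
n \;\ge\; \frac{\log\!\big(\|v_0 - v^*\|_\infty / \epsilon\big)}{\log(1/\gamma)}
\;=\; O\!\left(\frac{1}{1-\gamma}\,\log\frac{\|v_0 - v^*\|_\infty}{\epsilon}\right)
```

iterations, each costing O(|S|² |A|) arithmetic operations for dense transition matrices.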
This paper establishes new links between stochastic and discrete optimization. We consider the following three problems for discrete time Markov Decision Processes with finite states and action sets: (i) find an optimal deterministic... more
This paper deals with constrained optimization of Markov Decision Processes. Both the objective function and the constraints are sums of standard discounted rewards, but each with a different discount factor. Such models arise, e.g., in production... more
We investigate logics and equivalence relations that capture the qualitative behavior of Markov Decision Processes (MDPs). We present Qualitative Randomized CTL (QRCTL): formulas of this logic can express the fact that certain temporal... more
This paper proposes a new approach to modelling and controlling Internet end-to-end loss behaviours. Rather than select the model structure from the loss observations as being done previously, we construct a new loss model based on the... more
The use of multipath routing in overlay networks is a promising solution to improve performance and availability of Internet applications, without the replacement of the existing TCP/IP infrastructure. In this paper, we propose an... more
We consider the problem of finding good finite-horizon policies for POMDPs under the expected reward metric. The policies considered are free finite-memory policies with limited memory; a policy is a mapping from the space of... more
Solving partially observable Markov decision processes (POMDPs) is highly intractable in general, at least in part because the optimal policy may be infinitely large. In this paper, we explore the problem of finding the optimal policy... more
We investigate the use of temporally abstract actions, or macro-actions, in the solution of Markov decision processes. Unlike current models that combine both primitive actions and macro-actions and leave the state space unchanged, we... more
The well-known Kullback-Leibler divergence of a random field from its factorization quantifies spatial interdependences of the corresponding stochastic elements. We introduce a generalized measure called 'stochastic interaction' that... more
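Concretely, the divergence from the factorization mentioned in this entry is the classical multi-information, which the paper's 'stochastic interaction' generalizes:

```latex
I(X_1, \dots, X_n)
\;=\; D_{\mathrm{KL}}\!\left( P(X_1,\dots,X_n) \,\middle\|\, \prod_{i=1}^n P(X_i) \right)
\;=\; \sum_{i=1}^n H(X_i) \;-\; H(X_1,\dots,X_n),
```

which vanishes exactly when the components are independent.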
Given a 0-1 infinite matrix A and its countable Markov shift Σ_A, one of the authors and M. Laca have introduced a kind of generalized countable Markov shift X_A = Σ_A ∪ Y_A, where Y_A is a special set of finite admissible words. For some of... more