Real-Time Decision Making for Large POMDPs
2005, Lecture Notes in Computer Science
https://doi.org/10.1007/11424918_49…
5 pages
Abstract
In this paper, we introduce an approach called RTBSS (Real-Time Belief Space Search) for real-time decision making in large POMDPs. The approach is based on a look-ahead search that is applied online each time the agent has to make a decision. RTBSS is particularly interesting for large real-time environments where offline solutions are not applicable because of their complexity.
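To make the approach concrete, here is a minimal sketch of an online, depth-limited look-ahead in belief space, in the spirit of what the abstract describes. The model interface (model.actions, model.observations, model.T, model.O, model.R, model.leaf_bound), the discount factor, and the search depth are illustrative assumptions, not the paper's actual RTBSS implementation.

```python
GAMMA = 0.95  # discount factor (illustrative value)

def belief_update(model, belief, a, o):
    """Bayes filter: b'(s') is proportional to O(o | s', a) * sum_s T(s' | s, a) * b(s)."""
    b2 = {}
    for s, p in belief.items():
        for s2, pt in model.T(s, a).items():      # assumed: T(s, a) -> {s': prob}
            b2[s2] = b2.get(s2, 0.0) + p * pt * model.O(a, s2, o)
    norm = sum(b2.values())
    return {s2: p / norm for s2, p in b2.items()} if norm > 0 else None

def action_value(model, belief, a, depth):
    """Expected immediate reward plus the discounted value of each observation branch."""
    value = sum(p * model.R(s, a) for s, p in belief.items())
    for o in model.observations:
        # P(o | b, a) is the normalizing constant of the belief update.
        p_o = sum(p * pt * model.O(a, s2, o)
                  for s, p in belief.items()
                  for s2, pt in model.T(s, a).items())
        if p_o > 0.0:
            b2 = belief_update(model, belief, a, o)
            value += GAMMA * p_o * lookahead_value(model, b2, depth - 1)
    return value

def lookahead_value(model, belief, depth):
    """Depth-limited expectimax over reachable beliefs."""
    if depth == 0:
        return model.leaf_bound(belief)           # heuristic estimate at the search frontier
    return max(action_value(model, belief, a, depth) for a in model.actions)

def choose_action(model, belief, depth=3):
    """Run the search once per decision step and act greedily on its values."""
    return max(model.actions, key=lambda a: action_value(model, belief, a, depth))
```

Because the search is rerun from the current belief at every decision step, no policy has to be computed offline over the whole belief space, which is what makes this style of planner attractive for large real-time domains.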
Related papers
International Joint Conference on Artificial Intelligence, 2007
Recent scaling up of POMDP solvers towards realistic applications is largely due to point-based methods which quickly converge to an approximate solution for medium-sized problems. Of this family, HSVI, which uses trial-based asynchronous value iteration, can handle the largest domains. In this paper we suggest a new algorithm, FSVI, that uses the underlying MDP to traverse the belief space
2017
The POMDP is a powerful framework for reasoning under outcome and information uncertainty, but constructing an accurate POMDP model is difficult. Bayes-Adaptive Partially Observable Markov Decision Processes (BA-POMDPs) extend POMDPs to allow the model to be learned during execution. BA-POMDPs are a Bayesian RL approach that, in principle, allows for an optimal trade-off between exploitation and exploration. Unfortunately, BA-POMDPs are currently impractical to solve for any non-trivial domain. In this paper, we extend the Monte-Carlo Tree Search method POMCP to BA-POMDPs and show that the resulting method, which we call BA-POMCP, is able to tackle problems that previous solution methods have been unable to solve. Additionally, we introduce several techniques that exploit the BA-POMDP structure to improve the efficiency of BA-POMCP along with proof of their convergence.
2006
Distributed Partially Observable Markov Decision Problems (Distributed POMDPs) are evolving as a popular approach for modeling multiagent systems, and many different algorithms have been proposed to obtain locally or globally optimal policies. Unfortunately, most of these algorithms have either been explicitly designed or experimentally evaluated assuming knowledge of a starting belief point, an assumption that often does not hold in complex, uncertain domains. Instead, in such domains, it is important for agents to explicitly plan over continuous belief spaces. This paper provides a novel algorithm to explicitly compute finite horizon policies over continuous belief spaces, without restricting the space of policies. By marrying an efficient single-agent POMDP solver with a heuristic distributed POMDP policy-generation algorithm, locally optimal joint policies are obtained, each of which dominates within a different part of the belief region. We provide heuristics that significantly improve the efficiency of the resulting algorithm and provide detailed experimental results. To the best of our knowledge, these are the first run-time results for analytically generating policies over continuous belief spaces in distributed POMDPs.
Lecture Notes in Computer Science, 2006
Current point-based planning algorithms for solving partially observable Markov decision processes (POMDPs) have demonstrated that a good approximation of the value function can be derived by interpolation from the values of a specially selected set of points. The performance of these algorithms can be improved by eliminating unnecessary backups or concentrating on more important points in the belief simplex. We study three methods designed to improve point-based value iteration algorithms. The first two methods are based on reachability analysis on the POMDP belief space. This approach relies on prioritizing the beliefs based on how they are reached from the given initial belief state. The third approach is motivated by the observation that beliefs which are the most overestimated or underestimated have a greater influence on the precision of the value function than other beliefs. We present an empirical evaluation illustrating how the performance of point-based value iteration varies with these approaches.
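For reference, the point-based backup that such methods prioritize builds a single α-vector per belief point b. In the usual notation (assumed here rather than quoted from the paper), with α_i the vectors of the current value function and r_a the immediate-reward vector:

```latex
g^{i}_{a,o}(s) = \sum_{s'} O(o \mid s', a)\, T(s' \mid s, a)\, \alpha_i(s')
\qquad
g^{b}_{a} = r_a + \gamma \sum_{o} \operatorname*{arg\,max}_{\{g^{i}_{a,o}\}_i} \; b \cdot g^{i}_{a,o}
\qquad
\operatorname{backup}(b) = \operatorname*{arg\,max}_{\{g^{b}_{a}\}_{a \in A}} \; b \cdot g^{b}_{a}
```

Choosing which belief points receive this backup, and in what order, is exactly the lever the three methods above exploit.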
Planning under uncertainty for multiagent systems can be formalized as a decentralized partially observable Markov decision process. We advance the state of the art for optimal solution of this model, building on the Multiagent A* heuristic search method. A key insight is that we can avoid the full expansion of a search node that generates a number of children that is doubly exponential in the node’s depth. Instead, we incrementally expand the children only when a next child might have the highest heuristic value. We target a subsequent bottleneck by introducing a more memory-efficient representation for our heuristic functions. Proof is given that the resulting algorithm is correct and experiments demonstrate a significant speedup over the state of the art, allowing for optimal solutions over longer horizons for many benchmark problems.
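The incremental-expansion idea can be illustrated independently of Dec-POMDPs. Below is a hedged Python sketch of a best-first search that generates one child of a node at a time, assuming children can be enumerated in non-increasing heuristic order; all function names are placeholders, and this is not the paper's actual algorithm.

```python
import heapq

def incremental_best_first(root, heuristic, children_in_order, is_goal):
    """Best-first search that expands one child of a node at a time.

    children_in_order(node) must yield children in non-increasing heuristic
    order, so the child just generated bounds the value of every remaining
    sibling. All names are illustrative placeholders.
    """
    open_list = [(-heuristic(root), 0, root, children_in_order(root))]
    tie = 1  # strictly increasing tie-breaker so the heap never compares nodes
    while open_list:
        _, _, node, kids = heapq.heappop(open_list)
        if is_goal(node):
            return node
        child = next(kids, None)
        if child is None:
            continue  # every child of this node has already been generated
        h_child = heuristic(child)
        # Re-insert the parent with the child's value as an upper bound on its
        # remaining siblings, and insert only this single child instead of
        # the full expansion.
        heapq.heappush(open_list, (-h_child, tie, node, kids))
        tie += 1
        heapq.heappush(open_list, (-h_child, tie, child, children_in_order(child)))
        tie += 1
    return None
```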
Proceedings of the National …, 2004
The search for finite-state controllers for partially observable Markov decision processes (POMDPs) is often based on approaches like gradient ascent, attractive because of their relatively low computational cost. In this paper, we illustrate a basic problem with gradient-based methods applied to POMDPs, where the sequential nature of the decision problem is at issue, and propose a new stochastic local search method as an alternative. The heuristics used in our procedure mimic the sequential reasoning inherent in optimal dynamic programming (DP) approaches. We show that our algorithm consistently finds higher quality controllers than gradient ascent, and is competitive with (and, for some problems, superior to) other state-of-the-art controller and DP-based algorithms on large-scale POMDPs.
2006
Decision making under uncertainty is among the most challenging tasks in artificial intelligence. Although solution methods to this class of problems are intractable in general, some promising approximation methods have been proposed recently. In particular, point-based planning algorithms for solving partially observable Markov decision processes (POMDPs) have demonstrated that a good approximation of the value function can be obtained by interpolating between the values of a selected set of points. The agent must make a choice as to how to sample these points. Ideally, we need to sample in order to build an accurate approximation in less time. In this paper, we relate this problem to the exploration-exploitation tradeoff in the space of POMDP reachable beliefs. Furthermore, we show that there exists an influential control parameter for this tradeoff. As a result, we provide a controllable tighter bound for the point-based value iteration (PBVI) approximation [4] based on knowledge about the domain. We study two criteria designed to improve point-based value iteration algorithms when selecting candidate points. The first is based on reachability analysis from the given initial belief state. The second criterion is based on the degree of stochasticity of the problem domain and the topological structure of possible beliefs experienced by the agent. We present an empirical evaluation illustrating the effect of these criteria on the performance of point-based value iteration.
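As an illustration of the reachability criterion, one widely used expansion heuristic (in the spirit of PBVI's belief-set expansion) simulates a single step forward from each current point and keeps the successor belief farthest from the set. The model interface and the exact `update(b, a, o)` belief-update routine are assumptions for this sketch, not taken from the paper.

```python
import random

def sample_from(dist):
    """Draw an outcome from a {outcome: probability} dictionary."""
    outcomes, probs = zip(*dist.items())
    return random.choices(outcomes, weights=probs)[0]

def expand_belief_set(model, beliefs, update):
    """Reachability-based expansion of a belief point set (illustrative sketch).

    For each belief in the set, simulate one step forward for every action and
    keep the successor belief that is farthest (L1 distance) from the set,
    which encourages coverage of the reachable part of the belief simplex.
    """
    def l1_dist_to_set(b2, point_set):
        return min(sum(abs(b2.get(s, 0.0) - b.get(s, 0.0)) for s in set(b2) | set(b))
                   for b in point_set)

    new_points = []
    for b in beliefs:
        candidates = []
        for a in model.actions:
            s = sample_from(b)                   # sample a state from the belief
            s2 = sample_from(model.T(s, a))      # assumed: T(s, a) -> {s': prob}
            o = sample_from(model.Z(a, s2))      # assumed: Z(a, s') -> {o: prob}
            candidates.append(update(b, a, o))   # exact Bayes update (assumed given)
        new_points.append(max(candidates,
                              key=lambda b2: l1_dist_to_set(b2, beliefs + new_points)))
    return beliefs + new_points
```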
2006
When an agent evolves in a partially observable environment, it has to deal with uncertainty when choosing its actions. An effective model for such environments is the partially observable Markov decision process (POMDP). Many algorithms have been developed for POMDPs. Some use an offline approach, learning a complete policy before execution; others use an online approach, constructing the policy at run time for the current belief state.
Lecture Notes in Computer Science, 2001
Finding optimal policies for general partially observable Markov decision processes (POMDPs) is computationally difficult, primarily due to the need to perform dynamic-programming (DP) updates over the entire belief space. In this paper, we first study a somewhat restrictive class of special POMDPs called almost-discernible POMDPs and propose an anytime algorithm called space-progressive value iteration (SPVI). SPVI does not perform DP updates over the entire belief space. Rather, it restricts DP updates to a belief subspace that grows over time. It is argued that given sufficient time SPVI can find near-optimal policies for almost-discernible POMDPs. We then show how SPVI can be applied to a more general class of POMDPs. Empirical results are presented to show the effectiveness of SPVI.
2007
Decentralized decision making under uncertainty has been shown to be intractable when each agent has different partial information about the domain. Thus, improving the applicability and scalability of planning algorithms is an important challenge. We present the first memory-bounded dynamic programming algorithm for finite-horizon decentralized POMDPs. A set of heuristics is used to identify relevant points of the infinitely large belief space. Using these belief points, the algorithm successively selects the best joint policies for each horizon. The algorithm is extremely efficient, having linear time and space complexity with respect to the horizon length. Experimental results show that it can handle horizons that are multiple orders of magnitude larger than what was previously possible, while achieving the same or better solution quality. These results significantly increase the applicability of decentralized decision-making techniques.
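A deliberately simplified, single-agent-style sketch of the bounded-memory idea follows; the helpers `one_step_policies`, `backup`, `sample_beliefs`, and `evaluate` are hypothetical placeholders, and the actual algorithm operates on joint policy trees for several agents.

```python
def memory_bounded_dp(model, horizon, max_trees, sample_beliefs, evaluate):
    """Backward induction that keeps at most `max_trees` policy trees per step.

    At every step, the candidate trees produced by the exhaustive backup are
    scored at a small set of heuristically sampled belief points, and only the
    best candidate per point is retained, so memory stays bounded regardless
    of the horizon length.
    """
    policies = model.one_step_policies()                  # depth-1 policy trees
    for steps_to_go in range(2, horizon + 1):
        candidates = model.backup(policies)               # extend every kept tree by one step
        beliefs = sample_beliefs(model, horizon - steps_to_go, k=max_trees)
        # keep the best candidate tree at each sampled belief point
        policies = [max(candidates, key=lambda p: evaluate(p, b)) for b in beliefs]
    return policies
```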

References (6)
- Pineau, J., Gordon, G., Thrun, S.: Point-based value iteration: An anytime algorithm for POMDPs. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-03), Acapulco, Mexico (2003) 1025-1032
- Smith, T., Simmons, R.: Heuristic search value iteration for POMDPs. In: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (UAI-04), Banff, Canada (2004)
- Braziunas, D., Boutilier, C.: Stochastic local search for POMDP controllers. In: The Nineteenth National Conference on Artificial Intelligence (AAAI-04) (2004)
- Poupart, P.: Exploiting Structure to Efficiently Solve Large Scale Partially Observable Markov Decision Processes. PhD thesis, University of Toronto (2005) (to appear)
- Spaan, M.T.J., Vlassis, N.: A point-based POMDP algorithm for robot planning. In: Proceedings of the IEEE International Conference on Robotics and Automation, New Orleans, Louisiana (2004) 2399-2404
- Geffner, H., Bonet, B.: Solving large POMDPs using real time dynamic programming (1998)