Real-Time Decision Making for Large POMDPs
2005, Lecture Notes in Computer Science
https://doi.org/10.1007/11424918_49…
5 pages
Abstract
In this paper, we introduce an approach called RTBSS (Real-Time Belief Space Search) for real-time decision making in large POMDPs. The approach is based on a look-ahead search that is applied online each time the agent has to make a decision. RTBSS is particularly interesting for large real-time environments where offline solutions are not applicable because of their complexity.
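To make the approach concrete, here is a minimal sketch of an online, depth-limited look-ahead in belief space, in the spirit of what the abstract describes. The model interface (model.actions, model.observations, model.T, model.O, model.R, model.leaf_bound), the discount factor, and the search depth are illustrative assumptions, not the paper's actual RTBSS implementation.

```python
GAMMA = 0.95  # discount factor (illustrative value)

def belief_update(model, belief, a, o):
    """Bayes filter: b'(s') is proportional to O(o | s', a) * sum_s T(s' | s, a) * b(s)."""
    b2 = {}
    for s, p in belief.items():
        for s2, pt in model.T(s, a).items():      # assumed: T(s, a) -> {s': prob}
            b2[s2] = b2.get(s2, 0.0) + p * pt * model.O(a, s2, o)
    norm = sum(b2.values())
    return {s2: p / norm for s2, p in b2.items()} if norm > 0 else None

def action_value(model, belief, a, depth):
    """Expected immediate reward plus the discounted value of each observation branch."""
    value = sum(p * model.R(s, a) for s, p in belief.items())
    for o in model.observations:
        # P(o | b, a) is the normalizing constant of the belief update.
        p_o = sum(p * pt * model.O(a, s2, o)
                  for s, p in belief.items()
                  for s2, pt in model.T(s, a).items())
        if p_o > 0.0:
            b2 = belief_update(model, belief, a, o)
            value += GAMMA * p_o * lookahead_value(model, b2, depth - 1)
    return value

def lookahead_value(model, belief, depth):
    """Depth-limited expectimax over reachable beliefs."""
    if depth == 0:
        return model.leaf_bound(belief)           # heuristic estimate at the search frontier
    return max(action_value(model, belief, a, depth) for a in model.actions)

def choose_action(model, belief, depth=3):
    """Run the search once per decision step and act greedily on its values."""
    return max(model.actions, key=lambda a: action_value(model, belief, a, depth))
```

Because the search is rerun from the current belief at every decision step, no policy has to be computed offline over the whole belief space, which is what makes this style of planner attractive for large real-time domains.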
Related papers
International Joint Conference on Artificial Intelligence, 2007
Recent scaling up of POMDP solvers towards realistic applications is largely due to point-based methods which quickly converge to an approximate solution for medium-sized problems. Of this family, HSVI, which uses trial-based asynchronous value iteration, can handle the largest domains. In this paper we suggest a new algorithm, FSVI, that uses the underlying MDP to traverse the belief space
2017
The POMDP is a powerful framework for reasoning under outcome and information uncertainty, but constructing an accurate POMDP model is difficult. Bayes-Adaptive Partially Observable Markov Decision Processes (BA-POMDPs) extend POMDPs to allow the model to be learned during execution. BA-POMDPs are a Bayesian RL approach that, in principle, allows for an optimal trade-off between exploitation and exploration. Unfortunately, BA-POMDPs are currently impractical to solve for any non-trivial domain. In this paper, we extend the Monte-Carlo Tree Search method POMCP to BA-POMDPs and show that the resulting method, which we call BA-POMCP, is able to tackle problems that previous solution methods have been unable to solve. Additionally, we introduce several techniques that exploit the BA-POMDP structure to improve the efficiency of BA-POMCP along with proof of their convergence.
2006
Distributed Partially Observable Markov Decision Problems (Distributed POMDPs) are evolving as a popular approach for modeling multiagent systems, and many different algorithms have been proposed to obtain locally or globally optimal policies. Unfortunately, most of these algorithms have either been explicitly designed or experimentally evaluated assuming knowledge of a starting belief point, an assumption that often does not hold in complex, uncertain domains. Instead, in such domains, it is important for agents to explicitly plan over continuous belief spaces. This paper provides a novel algorithm to explicitly compute finite horizon policies over continuous belief spaces, without restricting the space of policies. By marrying an efficient single-agent POMDP solver with a heuristic distributed POMDP policy-generation algorithm, locally optimal joint policies are obtained, each of which dominates within a different part of the belief region. We provide heuristics that significantly improve the efficiency of the resulting algorithm and provide detailed experimental results. To the best of our knowledge, these are the first run-time results for analytically generating policies over continuous belief spaces in distributed POMDPs.
Lecture Notes in Computer Science, 2006
Current point-based planning algorithms for solving partially observable Markov decision processes (POMDPs) have demonstrated that a good approximation of the value function can be derived by interpolation from the values of a specially selected set of points. The performance of these algorithms can be improved by eliminating unnecessary backups or concentrating on more important points in the belief simplex. We study three methods designed to improve point-based value iteration algorithms. The first two methods are based on reachability analysis on the POMDP belief space. This approach relies on prioritizing the beliefs based on how they are reached from the given initial belief state. The third approach is motivated by the observation that beliefs which are the most overestimated or underestimated have a greater influence on the precision of the value function than other beliefs. We present an empirical evaluation illustrating how the performance of point-based value iteration varies with these approaches.
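For reference, the point-based backup that such methods prioritize builds a single α-vector per belief point b. In the usual notation (assumed here rather than quoted from the paper), with α_i the vectors of the current value function and r_a the immediate-reward vector:

```latex
g^{i}_{a,o}(s) = \sum_{s'} O(o \mid s', a)\, T(s' \mid s, a)\, \alpha_i(s')
\qquad
g^{b}_{a} = r_a + \gamma \sum_{o} \operatorname*{arg\,max}_{\{g^{i}_{a,o}\}_i} \; b \cdot g^{i}_{a,o}
\qquad
\operatorname{backup}(b) = \operatorname*{arg\,max}_{\{g^{b}_{a}\}_{a \in A}} \; b \cdot g^{b}_{a}
```

Choosing which belief points receive this backup, and in what order, is exactly the lever the three methods above exploit.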
Planning under uncertainty for multiagent systems can be formalized as a decentralized partially observable Markov decision process. We advance the state of the art for optimal solution of this model, building on the Multiagent A* heuristic search method. A key insight is that we can avoid the full expansion of a search node that generates a number of children that is doubly exponential in the node’s depth. Instead, we incrementally expand the children only when a next child might have the highest heuristic value. We target a subsequent bottleneck by introducing a more memory-efficient representation for our heuristic functions. Proof is given that the resulting algorithm is correct and experiments demonstrate a significant speedup over the state of the art, allowing for optimal solutions over longer horizons for many benchmark problems.
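The incremental-expansion idea can be illustrated independently of Dec-POMDPs. Below is a hedged Python sketch of a best-first search that generates one child of a node at a time, assuming children can be enumerated in non-increasing heuristic order; all function names are placeholders, and this is not the paper's actual algorithm.

```python
import heapq

def incremental_best_first(root, heuristic, children_in_order, is_goal):
    """Best-first search that expands one child of a node at a time.

    children_in_order(node) must yield children in non-increasing heuristic
    order, so the child just generated bounds the value of every remaining
    sibling. All names are illustrative placeholders.
    """
    open_list = [(-heuristic(root), 0, root, children_in_order(root))]
    tie = 1  # strictly increasing tie-breaker so the heap never compares nodes
    while open_list:
        _, _, node, kids = heapq.heappop(open_list)
        if is_goal(node):
            return node
        child = next(kids, None)
        if child is None:
            continue  # every child of this node has already been generated
        h_child = heuristic(child)
        # Re-insert the parent with the child's value as an upper bound on its
        # remaining siblings, and insert only this single child instead of
        # the full expansion.
        heapq.heappush(open_list, (-h_child, tie, node, kids))
        tie += 1
        heapq.heappush(open_list, (-h_child, tie, child, children_in_order(child)))
        tie += 1
    return None
```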
Proceedings of the National …, 2004
The search for finite-state controllers for partially observable Markov decision processes (POMDPs) is often based on approaches like gradient ascent, attractive because of their relatively low computational cost. In this paper, we illustrate a basic problem with gradient-based methods applied to POMDPs, where the sequential nature of the decision problem is at issue, and propose a new stochastic local search method as an alternative. The heuristics used in our procedure mimic the sequential reasoning inherent in optimal dynamic programming (DP) approaches. We show that our algorithm consistently finds higher quality controllers than gradient ascent, and is competitive with (and, for some problems, superior to) other state-of-the-art controller and DP-based algorithms on large-scale POMDPs.
2006
Decision making under uncertainty is among the most challenging tasks in artificial intelligence. Although solution methods to this class of problems are intractable in general, some promising approximation methods have been proposed recently. In particular, point-based planning algorithms for solving partially observable Markov decision processes (POMDPs) have demonstrated that a good approximation of the value function can be obtained by interpolating between the values of a selected set of points. The agent must make a choice as to how to sample these points. Ideally, we need to sample in order to build an accurate approximation in less time. In this paper, we relate this problem to the exploration-exploitation tradeoff in the space of POMDP reachable beliefs. Furthermore, we show that there exists an influential control parameter for this tradeoff. As a result, we provide a controllable tighter bound for the point-based value iteration (PBVI) approximation [4] based on knowledge about the domain. We study two criteria designed to improve point-based value iteration algorithms when selecting candidate points. The first is based on reachability analysis from the given initial belief state. The second criterion is based on the degree of stochasticity of the problem domain and the topological structure of possible beliefs experienced by the agent. We present an empirical evaluation illustrating the effect of these criteria on the performance of point-based value iteration.
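As an illustration of the reachability criterion, one widely used expansion heuristic (in the spirit of PBVI's belief-set expansion) simulates a single step forward from each current point and keeps the successor belief farthest from the set. The model interface and the exact `update(b, a, o)` belief-update routine are assumptions for this sketch, not taken from the paper.

```python
import random

def sample_from(dist):
    """Draw an outcome from a {outcome: probability} dictionary."""
    outcomes, probs = zip(*dist.items())
    return random.choices(outcomes, weights=probs)[0]

def expand_belief_set(model, beliefs, update):
    """Reachability-based expansion of a belief point set (illustrative sketch).

    For each belief in the set, simulate one step forward for every action and
    keep the successor belief that is farthest (L1 distance) from the set,
    which encourages coverage of the reachable part of the belief simplex.
    """
    def l1_dist_to_set(b2, point_set):
        return min(sum(abs(b2.get(s, 0.0) - b.get(s, 0.0)) for s in set(b2) | set(b))
                   for b in point_set)

    new_points = []
    for b in beliefs:
        candidates = []
        for a in model.actions:
            s = sample_from(b)                   # sample a state from the belief
            s2 = sample_from(model.T(s, a))      # assumed: T(s, a) -> {s': prob}
            o = sample_from(model.Z(a, s2))      # assumed: Z(a, s') -> {o: prob}
            candidates.append(update(b, a, o))   # exact Bayes update (assumed given)
        new_points.append(max(candidates,
                              key=lambda b2: l1_dist_to_set(b2, beliefs + new_points)))
    return beliefs + new_points
```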
2006
When an agent evolves in a partially observable environment, it has to deal with uncertainty when choosing its actions. An effective model for such environments is the partially observable Markov decision process (POMDP). Many algorithms have been developed for POMDPs. Some use an offline approach, learning a complete policy before execution; others use an online approach, constructing the policy at run time for the current belief state.
Lecture Notes in Computer Science, 2001
Finding optimal policies for general partially observable Markov decision processes (POMDPs) is computationally difficult, primarily due to the need to perform dynamic-programming (DP) updates over the entire belief space. In this paper, we first study a somewhat restrictive class of special POMDPs called almost-discernible POMDPs and propose an anytime algorithm called space-progressive value iteration (SPVI). SPVI does not perform DP updates over the entire belief space. Rather, it restricts DP updates to a belief subspace that grows over time. It is argued that given sufficient time SPVI can find near-optimal policies for almost-discernible POMDPs. We then show how SPVI can be applied to a more general class of POMDPs. Empirical results are presented to show the effectiveness of SPVI.
2007
Decentralized decision making under uncertainty has been shown to be intractable when each agent has different partial information about the domain. Thus, improving the applicability and scalability of planning algorithms is an important challenge. We present the first memory-bounded dynamic programming algorithm for finite-horizon decentralized POMDPs. A set of heuristics is used to identify relevant points of the infinitely large belief space. Using these belief points, the algorithm successively selects the best joint policies for each horizon. The algorithm is extremely efficient, having linear time and space complexity with respect to the horizon length. Experimental results show that it can handle horizons that are multiple orders of magnitude larger than what was previously possible, while achieving the same or better solution quality. These results significantly increase the applicability of decentralized decision-making techniques.
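A deliberately simplified, single-agent-style sketch of the bounded-memory idea follows; the helpers `one_step_policies`, `backup`, `sample_beliefs`, and `evaluate` are hypothetical placeholders, and the actual algorithm operates on joint policy trees for several agents.

```python
def memory_bounded_dp(model, horizon, max_trees, sample_beliefs, evaluate):
    """Backward induction that keeps at most `max_trees` policy trees per step.

    At every step, the candidate trees produced by the exhaustive backup are
    scored at a small set of heuristically sampled belief points, and only the
    best candidate per point is retained, so memory stays bounded regardless
    of the horizon length.
    """
    policies = model.one_step_policies()                  # depth-1 policy trees
    for steps_to_go in range(2, horizon + 1):
        candidates = model.backup(policies)               # extend every kept tree by one step
        beliefs = sample_beliefs(model, horizon - steps_to_go, k=max_trees)
        # keep the best candidate tree at each sampled belief point
        policies = [max(candidates, key=lambda p: evaluate(p, b)) for b in beliefs]
    return policies
```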

References (6)
- Pineau, J., Gordon, G., Thrun, S.: Point-based value iteration: An anytime algorithm for POMDPs. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-03), Acapulco, Mexico (2003) 1025-1032
- Smith, T., Simmons, R.: Heuristic search value iteration for POMDPs. In: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (UAI-04), Banff, Canada (2004)
- Braziunas, D., Boutilier, C.: Stochastic local search for POMDP controllers. In: The Nineteenth National Conference on Artificial Intelligence (AAAI-04) (2004)
- Poupart, P.: Exploiting Structure to Efficiently Solve Large Scale Partially Observable Markov Decision Processes. PhD thesis, University of Toronto (2005) (to appear)
- Spaan, M.T.J., Vlassis, N.: A point-based POMDP algorithm for robot planning. In: Proceedings of the IEEE International Conference on Robotics and Automation, New Orleans, Louisiana (2004) 2399-2404
- Geffner, H., Bonet, B.: Solving large POMDPs using real time dynamic programming (1998)