Point-based policy generation for decentralized POMDPs

Shlomo Zilberstein

2010

Abstract

Memory-bounded techniques have shown great promise in solving complex multi-agent planning problems modeled as DEC-POMDPs. Much of the performance gain can be attributed to pruning techniques that alleviate the complexity of the exhaustive backup step of the original MBDP algorithm. Despite these improvements, state-of-the-art algorithms can still handle only a relatively small pool of candidate policies, which limits solution quality on some benchmark problems. We present a new algorithm, Point-Based Policy Generation, which avoids searching the entire joint policy space altogether. The key observation is that the best joint policy for each reachable belief state can be constructed directly, instead of first producing a large set of candidates. We also provide an efficient approximate implementation of this operation. Experimental results show that our solution technique significantly improves performance in terms of both runtime and solution quality.
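
To give a rough sense of why the exhaustive backup step limits the pool of candidate policies, the sketch below counts how many policy trees such a backup produces. It assumes the usual policy-tree formulation, in which each agent picks a root action and then one of its retained subtrees for every observation. The function names and the parameter max_trees are illustrative assumptions, not identifiers from the paper, and the code does not implement Point-Based Policy Generation itself.

```python
def exhaustive_backup_size(num_actions: int, num_observations: int, max_trees: int) -> int:
    """Count the policy trees one agent obtains from a single exhaustive backup.

    Assumption: the agent keeps max_trees subtrees from the previous step.
    Each new tree picks one root action and one retained subtree per observation.
    """
    return num_actions * max_trees ** num_observations


def joint_candidates(per_agent: int, num_agents: int) -> int:
    """Count joint policies when every agent has per_agent candidate trees."""
    return per_agent ** num_agents


if __name__ == "__main__":
    # Hypothetical problem sizes, chosen only for illustration.
    for num_observations in (2, 5):
        per_agent = exhaustive_backup_size(
            num_actions=3, num_observations=num_observations, max_trees=3
        )
        total = joint_candidates(per_agent, num_agents=2)
        print(num_observations, per_agent, total)
```

Under these assumptions, two agents with 3 actions and 3 retained trees each face 729 joint candidates with 2 observations and 531441 with 5 observations. Growth of this kind is what motivates constructing the best joint policy for each belief state directly rather than enumerating candidates first.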

