Planning with Macro-Actions in Decentralized POMDPs

International Conference on Autonomous Agents and Multiagent Systems (AAMAS)

https://doi.org/10.5555/2615731.2617451

Abstract

Decentralized partially observable Markov decision processes (Dec-POMDPs) are general models for decentralized decision making under uncertainty. However, they typically model a problem at a low level of granularity, where each agent's actions are primitive operations lasting exactly one time step. We address the case where each agent has macro-actions: temporally extended actions which may require different amounts of time to execute. We model macro-actions as options in a factored Dec-POMDP model, focusing on options which depend only on information available to an individual agent while executing. This enables us to model systems where coordination decisions only occur at the level of deciding which macro-actions to execute, and the macro-actions themselves can then be executed to completion. The core technical difficulty when using options in a Dec-POMDP is that the options chosen by the agents no longer terminate at the same time. We present extensions of two leading Dec-POMDP algorithms for generating a policy with options and discuss the resulting form of optimality. Our results show that these algorithms retain agent coordination while allowing near-optimal solutions to be generated for significantly longer horizons and larger state-spaces than previous Dec-POMDP methods.
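
To make the option model concrete, here is a minimal sketch, assuming a simple string-based representation of local observations and primitive actions, of a macro-action whose initiation condition, low-level policy, and termination condition all depend only on the executing agent's local information. The names and types are illustrative, not the paper's notation.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Illustrative local types: an agent's local history is the sequence of
# observations it has received since its current option began.
LocalObservation = str
LocalHistory = Tuple[LocalObservation, ...]
PrimitiveAction = str


@dataclass(frozen=True)
class Option:
    name: str
    # Where the option may be started.
    can_initiate: Callable[[LocalHistory], bool]
    # Low-level policy: maps the agent's local history to a primitive action.
    policy: Callable[[LocalHistory], PrimitiveAction]
    # Termination condition: probability the option stops given the local history.
    terminates: Callable[[LocalHistory], float]


# Example: a hypothetical "go to the meeting point" macro-action that
# terminates when the agent observes a landmark, a purely local condition.
go_to_meeting_point = Option(
    name="go-to-meeting-point",
    can_initiate=lambda h: True,
    policy=lambda h: "move-north",   # placeholder low-level controller
    terminates=lambda h: 1.0 if h and h[-1] == "see-landmark" else 0.0,
)
```

Because everything the option consults is local, an agent can run it to completion without knowing what its teammates are doing; coordination only matters when choosing which option to start next.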

FAQs

What are the benefits of using macro-actions in Dec-POMDPs?

The paper demonstrates that using macro-actions can significantly improve scalability and efficiency, allowing solutions for larger state-spaces and longer horizons. For example, the O-MBDP algorithm effectively solved a 50x50 grid problem, which traditional methods struggled to address.

How does the proposed method handle multiagent coordination?

The research introduces a Dec-POMDP formulation that incorporates shared option policies, enabling agents to coordinate by reasoning about each other's macro-actions. This allows for efficient decision-making without requiring synchronization at every timestep.
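
As an illustration of what jointly planned option policies might look like, the sketch below (all names hypothetical, not the paper's policy representation) gives each agent a table mapping its local macro-level history to the name of its next macro-action. Because the tables are fixed together offline, the agents' macro-actions mesh without per-step communication.

```python
from typing import Callable, Dict, Tuple

# Illustrative types only: a per-agent macro policy maps the agent's local
# macro-history (which options it has completed, and what it observed when
# each one ended) to the name of the next option to execute.
MacroHistory = Tuple[Tuple[str, str], ...]   # ((option_name, observation_at_termination), ...)
MacroPolicy = Callable[[MacroHistory], str]  # local macro-history -> next option name


def make_table_policy(rules: Dict[MacroHistory, str], default: str) -> MacroPolicy:
    """Table-driven macro policy with a fallback option for unplanned histories."""
    return lambda history: rules.get(history, default)


# Two agents whose policies were chosen jointly offline: each navigates to the
# meeting point and then waits there for the other, so coordination is decided
# at the macro-action level with no per-step synchronization.
agent_policies = [
    make_table_policy(
        {(): "go-to-meeting-point",
         (("go-to-meeting-point", "see-landmark"),): "wait"},
        default="wait",
    ),
    make_table_policy(
        {(): "go-to-meeting-point",
         (("go-to-meeting-point", "see-landmark"),): "wait"},
        default="wait",
    ),
]
```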

What improvements were observed in benchmark tests?

Experimental results indicated that the O-MBDP algorithm outperformed traditional approaches, solving large problems with fewer computational resources. Specifically, it ran faster than MBDP-IPG and TBDP on benchmarks such as the meeting-in-a-grid problem.

What challenges exist when extending single-agent options to multiagent settings?

A major challenge is that agents' options no longer terminate at the same time, which complicates policy generation and evaluation. The paper extends existing Dec-POMDP algorithms to handle this unsynchronized execution, preserving coordination among agents while letting each macro-action run to completion.
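
Building on the illustrative Option and macro-policy sketches above, and assuming a hypothetical env.step(actions) interface that returns per-agent observations and a joint reward, the following shows one way such unsynchronized execution could proceed: every agent acts on every primitive step, but only agents whose macro-action has just terminated select a new one.

```python
import random
from typing import Dict, List


def run_episode(env, policies: List, option_library: Dict[str, "Option"], horizon: int) -> float:
    """Decentralized execution with asynchronously terminating macro-actions.

    `policies` holds one macro policy per agent (local macro-history -> option
    name), `option_library` maps option names to Option objects as sketched
    earlier, and `env.step(actions)` is a hypothetical interface returning
    per-agent observations and a joint reward.
    """
    n = len(policies)
    macro_histories = [() for _ in range(n)]   # completed options + terminal observations, per agent
    local_obs = [() for _ in range(n)]         # observations since each agent's current option began
    current = [option_library[policies[i](macro_histories[i])] for i in range(n)]
    total_reward = 0.0

    for _ in range(horizon):
        # Every agent acts on every primitive step; nobody waits for teammates.
        actions = [current[i].policy(local_obs[i]) for i in range(n)]
        observations, reward = env.step(actions)
        total_reward += reward

        for i in range(n):
            local_obs[i] += (observations[i],)
            # Termination is a purely local (possibly stochastic) event.
            if random.random() < current[i].terminates(local_obs[i]):
                macro_histories[i] += ((current[i].name, observations[i]),)
                current[i] = option_library[policies[i](macro_histories[i])]
                local_obs[i] = ()

    return total_reward
```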

How did the authors validate their proposed algorithms?

The authors validated their algorithms on existing benchmarks and custom scenarios, comparing solution quality and runtime against prior Dec-POMDP methods. For instance, they tested on standard Dec-POMDP benchmarks such as meeting-in-a-grid and obtained high-quality solutions across various agent configurations.
