Exploiting structure in policy construction
1995, International Joint Conference on Artificial Intelligence (IJCAI)
Abstract
Markov decision processes (MDPs) have recently been applied to the problem of modeling decision-theoretic planning. While traditional methods for solving MDPs are often practical for small state spaces, their effectiveness for large AI planning problems is questionable. We present an algorithm, called structured policy iteration (SPI), that constructs optimal policies without explicit enumeration of the state space. The algorithm retains the fundamental computational steps of the commonly used modified policy iteration algorithm, but exploits the variable and propositional independencies reflected in a temporal Bayesian network representation of MDPs. The principles behind SPI can be applied to any structured representation of stochastic actions, and the algorithm itself can be used in conjunction with recent approximation methods.
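For context, the sketch below shows the flat-state modified policy iteration procedure whose computational steps SPI retains: a greedy policy-improvement backup alternated with a few successive-approximation sweeps of partial policy evaluation. This is a minimal illustration only, assuming a small enumerated MDP with a transition tensor `P` and reward vector `R`; the function name, toy numbers, and parameters (`gamma`, `eval_sweeps`, `tol`) are illustrative assumptions, not the paper's. SPI itself performs analogous backups over a structured (temporal Bayesian network) representation rather than enumerated arrays.

```python
import numpy as np

def modified_policy_iteration(P, R, gamma=0.95, eval_sweeps=5, tol=1e-6):
    """Flat-state sketch (not SPI). P: (A, S, S) transition probabilities; R: (S,) rewards."""
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Policy improvement: one greedy Bellman backup over all actions.
        Q = R[None, :] + gamma * (P @ V)        # Q[a, s] = R(s) + gamma * sum_s' P(s'|s,a) V(s')
        policy = Q.argmax(axis=0)
        V_greedy = Q.max(axis=0)
        if np.max(np.abs(V_greedy - V)) < tol:  # stop once the backup no longer changes V (within tol)
            return policy, V_greedy
        # Partial policy evaluation: a fixed number of successive-approximation
        # sweeps under the improved policy (full evaluation would solve a linear system).
        V = V_greedy
        P_pi = P[policy, np.arange(n_states), :]  # (S, S) transition matrix induced by the policy
        for _ in range(eval_sweeps):
            V = R + gamma * P_pi @ V

# Toy two-state, two-action MDP; the numbers are made up for illustration.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # action 0
              [[0.5, 0.5], [0.4, 0.6]]])   # action 1
R = np.array([1.0, 0.0])
policy, V = modified_policy_iteration(P, R)
print(policy, V)
```

Every step above touches each state explicitly; SPI's contribution is to carry out the improvement and evaluation steps on aggregated, structured value and policy descriptions so that the state space never has to be enumerated.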
References (18)
- Barto, A., Bradtke, S., and Singh, S. 1995. Learning to act using real-time dynamic programming. Artif. Intel., 72:81-138.
- Boutilier, C. and Dearden, R. 1994. Using abstractions for decision-theoretic planning with time constraints. AAAI-94, pp.1016-1022, Seattle.
- Boutilier, C., Dearden, R., and Goldszmidt, M. 1994. Exploiting structure in optimal policy construction. Technical Report 94-23, University of British Columbia, Vancouver.
- Chapman, D. and Kaelbling, L. P. 1991. Input generalization in delayed reinforcement learning: An algorithm and performance comparisons. IJCAI-91, pp.726-731, Sydney.
- Darwiche, A. and Goldszmidt, M. 1994. Action networks: A framework for reasoning about actions and change under uncertainty. UAI-94, pp.136-144, Seattle.
- Dean, T., Kaelbling, L. P., Kirman, J., and Nicholson, A. 1993. Planning with deadlines in stochastic domains. AAAI-93, pp.574-579, Washington, D.C.
- Dean, T. and Kanazawa, K. 1989. A model for reasoning about persistence and causation. Comp. Intel., 5(3):142-150.
- Dean, T. and Wellman, M. 1991. Planning and Control. Morgan Kaufmann, San Mateo.
- Dearden, R. and Boutilier, C. 1994. Integrating planning and execution in stochastic domains. UAI-94, pp.162-169, Seattle.
- Howard, R. A. 1971. Dynamic Probabilistic Systems. Wiley.
- Kushmerick, N., Hanks, S., and Weld, D. 1994. An algorithm for probabilistic least-commitment planning. AAAI-94, pp.1073-1078, Seattle.
- Poole, D. 1993. Probabilistic Horn abduction and Bayesian networks. Artif. Intel., 64(1):81-129.
- Puterman, M. L. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York.
- Puterman, M. and Shin, M. 1978. Modified policy iteration algorithms for discounted Markov decision problems. Mgmt. Sci., 24:1127-1137.
- Rivest, R. 1987. Learning decision lists. Mach. Learn., 2:229-246.
- Smith, J., Holtzman, S., and Matheson, J. 1993. Structuring conditional relationships in influence diagrams. Op. Res., 41(2):280-297.
- Tash, J. and Russell, S. 1994. Control strategies for a stochastic planner. AAAI-94, pp.1079-1085, Seattle.
- Tatman, J. and Shachter, R. 1990. Dynamic programming and influence diagrams. IEEE Trans. Sys., Man, Cyber., 20:365-379.