

MARKOV DECISION PROBLEMS AND STATE-ACTION FREQUENCIES

1991, SIAM Journal on Control and Optimization

https://doi.org/10.1137/0329043

Abstract

Consider a controlled Markov chain with countable state and action spaces. Basic quantities that determine the values of average cost functionals are identified. Under some regularity conditions, these turn out to be a collection of numbers, one for each state-action pair, describing for each state the relative number of uses of each action. These "conditional frequencies," which are defined pathwise, are shown to determine the "state-action frequencies" that, in the finite case, are known to determine the costs. This is extended to the countable case, allowing for unbounded costs. The space of frequencies is shown to be compact and convex, and the extreme points are identified with stationary deterministic policies.
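To make the two central objects concrete, the sketch below simulates a small controlled Markov chain under a stationary randomized policy and tallies, along a single path, the empirical state-action frequencies (fraction of time each state-action pair is used) and the conditional frequencies (relative number of uses of each action at each state). This is only an illustration of the standard notions referred to in the abstract; the two-state chain, the transition kernel `P`, the policy, and all variable names are assumptions chosen for the example, not the paper's notation or construction.

```python
import numpy as np

# Hypothetical 2-state, 2-action controlled Markov chain.
# P[a][x, y] = probability of moving from state x to state y when action a is used.
P = {
    0: np.array([[0.9, 0.1],
                 [0.5, 0.5]]),
    1: np.array([[0.2, 0.8],
                 [0.1, 0.9]]),
}

# Stationary randomized policy: policy[x, a] = probability of choosing action a in state x.
policy = np.array([[0.7, 0.3],
                   [0.4, 0.6]])

rng = np.random.default_rng(0)
T = 200_000                         # horizon over which frequencies are averaged
x = 0                               # initial state
counts = np.zeros((2, 2))           # counts[x, a]: visits to the state-action pair (x, a)

for _ in range(T):
    a = rng.choice(2, p=policy[x])  # sample an action from the policy at the current state
    counts[x, a] += 1
    x = rng.choice(2, p=P[a][x])    # move to the next state under the chosen action

# Empirical state-action frequencies: fraction of the T steps spent at each pair (x, a).
state_action_freq = counts / T

# Conditional frequencies: for each state, the relative number of uses of each action.
cond_freq = counts / counts.sum(axis=1, keepdims=True)

print("state-action frequencies:\n", state_action_freq)
print("conditional frequencies:\n", cond_freq)
```

For an ergodic chain such as this one, the empirical state-action frequencies converge to the stationary state-action distribution induced by the policy, which is the finite-case object whose role the paper extends to countable spaces and unbounded costs.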
