The (optimal) design of many engineering systems can be adequately recast as a Markov decision process, where requirements on system performance are captured in the form of constraints. In this paper, various optimality results for constrained Markov decision processes are briefly reviewed; the corresponding implementation issues are discussed and shown to lead to several problems of parameter estimation. Simple situations where such constrained problems naturally arise are presented in the context of queueing systems, in order to illustrate various points of the theory. In each case, the structure of the optimal policy is exhibited.
Consider a controlled Markov chain with countable state and action spaces. Basic quantities that determine the values of average cost functionals are identified. Under some regularity conditions, these turn out to be a collection of numbers, one for each state-action pair, describing for each state the relative number of uses of each action. These "conditional frequencies," which are defined pathwise, are shown to determine the "state-action frequencies" that, in the finite case, are known to determine the costs. This is extended to the countable case, allowing for unbounded costs. The space of frequencies is shown to be compact and convex, and the extreme points are identified with stationary deterministic policies.
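The finite-state version of this relationship can be sketched numerically: under a stationary randomized policy, the state-action frequencies are the stationary distribution of the induced chain weighted by the action probabilities, and the average cost is the frequency-weighted sum of instantaneous costs. A minimal sketch (the chain, costs, and policy below are hypothetical numbers, not from the paper):

```python
import numpy as np

# Hypothetical 2-state, 2-action chain: P[a, s, s'] transition probabilities,
# c[s, a] instantaneous costs, pi[s, a] a stationary randomized policy.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # action 0
              [[0.5, 0.5], [0.6, 0.4]]])  # action 1
c = np.array([[1.0, 3.0], [2.0, 0.5]])
pi = np.array([[0.7, 0.3], [0.4, 0.6]])

# Transition matrix of the chain induced by pi, and its stationary
# distribution mu, solving mu P_pi = mu, sum(mu) = 1.
P_pi = np.einsum('sa,ast->st', pi, P)
n = P_pi.shape[0]
A = np.vstack([(P_pi.T - np.eye(n))[:-1], np.ones(n)])
b = np.zeros(n); b[-1] = 1.0
mu = np.linalg.solve(A, b)

# State-action frequencies and the average cost they determine.
f = mu[:, None] * pi
avg_cost = float((f * c).sum())
```

The frequencies f sum to one and carry all the information the average-cost functional needs; this is the finite-case fact that the paper extends pathwise to countable spaces and unbounded costs.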
We consider the optimization of finite-state, finite-action Markov decision processes under constraints. Costs and constraints are of the discounted or average type, and possibly finite-horizon. We investigate the sensitivity of the optimal cost and optimal policy to changes in various parameters. We relate several optimization problems to a generic linear program, through which we investigate sensitivity issues. We establish conditions for the continuity of the optimal value in the discount factor. In particular, the optimal value and optimal policy for the expected average cost are obtained as limits of the discounted case, as the discount factor goes to one. This generalizes a well-known result for the unconstrained case. We also establish the continuity in the discount factor for certain non-stationary policies. We then discuss the sensitivity of optimal policies and optimal values to small changes in the transition matrix and in the instantaneous cost functions. The importance of the last two results is related to the performance of adaptive policies for constrained MDP under various cost criteria [3,5]. Finally, we establish the convergence of the optimal value for the discounted constrained finite horizon problem to the optimal value of the corresponding infinite horizon problem.
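In the discounted case, the generic linear program in question is the standard occupation-measure LP: the variables x[s, a] are discounted state-action occupation measures, the dynamics enter as equality constraints, and each additional cost criterion becomes a linear inequality. A sketch with scipy (the transition law, costs, constraint bound, and discount factor are illustrative assumptions, not data from the paper):

```python
import numpy as np
from scipy.optimize import linprog

beta = 0.9                      # discount factor
alpha = np.array([1.0, 0.0])    # initial distribution (hypothetical)
# P[a, s, s'] transitions, c[s, a] main cost, d[s, a] constraint cost.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
c = np.array([[1.0, 3.0], [2.0, 0.5]])
d = np.array([[0.0, 1.0], [1.0, 0.0]])
D = 2.0                         # bound on the discounted constraint cost

nS, nA = 2, 2
idx = lambda s, a: s * nA + a   # flatten x[s, a] row-major
# Flow-balance equalities: sum_a x[s',a] - beta sum_{s,a} P[a,s,s'] x[s,a] = alpha[s'].
A_eq = np.zeros((nS, nS * nA))
for sp in range(nS):
    for s in range(nS):
        for a in range(nA):
            A_eq[sp, idx(s, a)] += float(s == sp) - beta * P[a, s, sp]
res = linprog(c.reshape(-1),
              A_ub=d.reshape(1, -1), b_ub=[D],   # discounted constraint cost <= D
              A_eq=A_eq, b_eq=alpha,
              bounds=[(0, None)] * (nS * nA))
x = res.x.reshape(nS, nA)       # optimal discounted occupation measure
# An optimal stationary policy randomizes in state s according to x[s, :] / x[s, :].sum().
```

The total mass of any feasible x is 1/(1 - beta), and sensitivity of the optimal value in P, c, d, or beta can be read off as perturbations of this LP's data, which is the viewpoint the abstract describes.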
This paper deals with constrained optimization of Markov Decision Processes with a countable state space, compact action sets, continuous transition probabilities, and upper semi-continuous reward functions. The objective is to maximize the expected total discounted reward for one reward function, under several inequality constraints on similar criteria with other reward functions.
Handbook of Markov Decision Processes: Methods and Applications
From the reviews: "The authors of this book, part of the Medical Radiology - Diagnostic Imaging and Radiation Oncology series, present a sound approach to diagnosis of lung disease based on the CT appearance of the pulmonary parenchyma.... The book is appropriate for the practicing radiologist who wishes to embark on evaluation of the lungs and their diseases according to anatomic parameters, especially HRCT.... Clinical physicians with a particular interest in pulmonary disease will find the book useful as well." ...
Optimal priority assignment: a time sharing approach
IEEE Transactions on Automatic Control, 1989
Nonstationary time-sharing policies are introduced to obtain optimal controls for new constrained optimization problems. The criteria are expected time averages of the sizes of the queues. These policies and their costs are computed through linear programs. The achievable region of the vector of queue lengths is characterized. Other applications of time-sharing policies are discussed.
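The achievability half of such a characterization rests on a convexity fact: time-sharing between two deterministic policies over long cycles realizes any convex combination of their average-cost vectors, so the achievable region contains the convex hull of the pure-policy cost points. A toy illustration (the cost vectors and mixing fraction are made-up numbers, not from the paper):

```python
# Average (mean queue length 1, mean queue length 2) under two hypothetical
# deterministic priority policies.
cost_pi1 = (2.0, 5.0)   # policy 1 favors queue 1
cost_pi2 = (6.0, 1.0)   # policy 2 favors queue 2
theta = 0.25            # long-run fraction of time spent under policy 1

# Time-sharing with fraction theta achieves the convex combination.
achieved = tuple(theta * x + (1.0 - theta) * y
                 for x, y in zip(cost_pi1, cost_pi2))
```

Sweeping theta over [0, 1] traces the segment between the two cost points; with more policies, the same argument yields the full convex hull, which the linear programs in the paper then optimize over.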
We consider the optimization of finite-state, finite-action Markov decision processes under constraints. Costs and constraints are of the discounted or average type, and possibly finite-horizon. We investigate the sensitivity of the optimal cost and optimal policy to changes in various parameters.
We consider the constrained optimization of a finite-state, finite-action Markov chain. In the adaptive problem, the transition probabilities are assumed to be unknown, and no prior distribution on their values is given. We consider constrained optimization problems in terms of several cost criteria which are asymptotic in nature. For these criteria we show that it is possible to achieve the same optimal cost as in the non-adaptive case. We first formulate a constrained optimization problem under each of the cost criteria and establish the existence of optimal stationary policies. Since the adaptive problem is inherently non-stationary, we suggest a class of Asymptotically Stationary (AS) policies, and show that, under each of the cost criteria, the costs of an AS policy depend only on its limiting behavior. This property implies that there exist optimal AS policies. A method for generating adaptive policies is then suggested, which leads to strongly consistent estimators for the unknown transition probabilities. A way to guarantee that these policies are also optimal is to couple them with the adaptive algorithm of [3]. This leads to optimal policies for each of the adaptive constrained optimization problems under discussion.
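A strongly consistent estimator of the unknown transition probabilities can be realized by empirical transition counts, provided every state-action pair is visited infinitely often. A self-contained simulation sketch (the "true" transition law and the uniform exploration rule are illustrative assumptions, not the paper's algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
# True transition law P_true[a, s, s'], unknown to the controller.
P_true = np.array([[[0.9, 0.1], [0.2, 0.8]],
                   [[0.5, 0.5], [0.6, 0.4]]])
nA, nS, _ = P_true.shape
counts = np.zeros((nA, nS, nS))

s = 0
for t in range(20000):
    a = int(rng.integers(nA))            # uniform exploration: every action
    sp = rng.choice(nS, p=P_true[a, s])  # is used infinitely often
    counts[a, s, sp] += 1
    s = sp

# Empirical estimator: relative transition counts per (action, state) pair;
# fall back to the uniform distribution for pairs never visited.
visits = counts.sum(axis=2, keepdims=True)
P_hat = np.where(visits > 0, counts / np.maximum(visits, 1), 1.0 / nS)
```

By the strong law of large numbers, P_hat converges almost surely to P_true as every pair's visit count grows; coupling such estimates with a certainty-equivalence update of the policy is the general shape of the adaptive schemes the abstract refers to.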
Moreover, rare events tend to be the critical quantities in the performance of many systems. For example, information is typically transmitted along a communications line, and stored temporarily along the way in buffers. The channel is typically designed so that the average traffic rate is ...
Papers by Adam Shwartz