Multi-Value-Functions: Efficient Automatic Action Hierarchies for Multiple Goal MDPs (1999)
Tags
Markov Decision Processes, Optimization, Reinforcement Learning
Abstract
If you have planned to achieve one particular goal in a stochastic delayed-rewards problem and then someone asks about a different goal, what should you do? What if you need to be ready to quickly supply an answer for any possible goal? This paper shows that, by using a new kind of automatically generated abstract action hierarchy, preparing for all N possible goals in an N-state problem can be much cheaper than N times the work of preparing for one goal. In goal-based Markov Decision Problems, it is usual to generate a policy π(x), mapping states to actions, and a value function J(x), mapping each state x to an estimate of the minimum expected cost-to-goal starting at x. In this paper we use the terminology that a multi-policy π*(x, y) (defined for all state pairs (x, y)) maps a state x to the first action it should take in order to reach y with minimum expected cost, and a multi-value-function J*(x, y) gives this minimum cost. Building these objects quickly and with little memory is the main purpose of this paper, but a secondary result is a natural, automatic way to create a set of parsimonious yet powerful abstract actions for MDPs. The paper concludes with a set of empirical results on increasingly large MDPs.
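For concreteness, the objects defined in the abstract satisfy a standard shortest-path Bellman equation over goal pairs: J*(y, y) = 0 and, for x ≠ y, J*(x, y) = min_a [c(x, a) + Σ_x' P(x' | x, a) J*(x', y)], with π*(x, y) the minimizing action. The sketch below is only the naive baseline the abstract contrasts against: it runs ordinary single-goal value iteration once per goal, i.e. N times the work of preparing for one goal. It is not the paper's abstract-action-hierarchy method. The MDP encoding (`transitions`, `cost`), the helper names, and the assumptions of positive action costs and every goal being reachable from every state are illustrative choices, not taken from the paper.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding (not from the paper):
# transitions[x][a] = list of (next_state, probability); cost[x][a] = cost of action a in x.
Transitions = List[Dict[str, List[Tuple[int, float]]]]
Costs = List[Dict[str, float]]


def value_iteration_to_goal(transitions: Transitions, cost: Costs, goal: int,
                            tol: float = 1e-9, max_iters: int = 10_000):
    """Single-goal value iteration; assumes positive costs and a reachable goal."""
    n = len(transitions)
    J = [0.0] * n  # start from zero; with positive costs values rise toward J*
    for _ in range(max_iters):
        delta = 0.0
        for x in range(n):
            if x == goal:
                continue  # the goal is absorbing with zero cost-to-go
            best = min(
                cost[x][a] + sum(p * J[nxt] for nxt, p in outcomes)
                for a, outcomes in transitions[x].items()
            )
            delta = max(delta, abs(best - J[x]))
            J[x] = best  # in-place (Gauss-Seidel) update
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy: Dict[int, str] = {}
    for x in range(n):
        if x != goal:
            policy[x] = min(
                transitions[x],
                key=lambda a: cost[x][a] + sum(p * J[nxt] for nxt, p in transitions[x][a]),
            )
    return J, policy


def naive_multi_value_function(transitions: Transitions, cost: Costs):
    """Baseline: solve each of the N goals separately (N times single-goal work)."""
    n = len(transitions)
    J_star = [[0.0] * n for _ in range(n)]                        # J_star[x][y]
    pi_star: List[List[Optional[str]]] = [[None] * n for _ in range(n)]  # None when x == y
    for y in range(n):
        J, policy = value_iteration_to_goal(transitions, cost, y)
        for x in range(n):
            J_star[x][y] = J[x]
            pi_star[x][y] = policy.get(x)
    return J_star, pi_star


# Usage: a hypothetical deterministic 3-state chain with unit-cost left/right moves.
chain = [
    {"left": [(0, 1.0)], "right": [(1, 1.0)]},
    {"left": [(0, 1.0)], "right": [(2, 1.0)]},
    {"left": [(1, 1.0)], "right": [(2, 1.0)]},
]
unit_cost = [{"left": 1.0, "right": 1.0} for _ in chain]
J_star, pi_star = naive_multi_value_function(chain, unit_cost)
print(J_star[0][2], pi_star[0][2])
```

On this chain, J_star[0][2] is 2.0 and pi_star[0][2] is "right". Filling the whole N × N table this way costs one full value-iteration run per goal, which is exactly the expense the paper's automatically generated action hierarchy is meant to avoid.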
Approximate BibTeX Entry
@inproceedings{moore-multivalue,
Year = {1999},
Pages = {1316-1323},
Publisher = {Morgan Kaufmann},
Address = {340 Pine Street, 6th Fl., San Francisco, CA 94104},
Booktitle = {Proceedings of the International Joint Conference on Artificial Intelligence, Stockholm},
Author = {Andrew Moore and Leemon Baird},
Title = {Multi-Value-Functions: Efficient Automatic Action Hierarchies for Multiple Goal MDPs}
}