Generalization in Reinforcement Learning: Safely Approximating the Value Function (1995)

Justin Boyan, Andrew Moore


Markov Decision Processes, Reinforcement Learning


A straightforward approach to the curse of dimensionality in reinforcement learning and dynamic programming is to replace the lookup table with a generalizing function approximator such as a neural net. Although this has been successful in the domain of backgammon, there is no guarantee of convergence. In this paper, we show that the combination of dynamic programming and function approximation is not robust: even in very benign cases, it may produce an entirely wrong policy. We then introduce Grow-Support, a new algorithm that is safe from divergence yet can still reap the benefits of successful generalization.
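The combination the abstract refers to can be illustrated concretely. Below is a minimal sketch (not the paper's code) of fitted value iteration on a toy deterministic chain MDP: the value-iteration backup is computed at each state, but instead of storing the results in a lookup table, a least-squares linear model is refit to them, playing the role of the generalizing function approximator. All names and the particular MDP are illustrative assumptions; the paper's point is that this kind of iterated fit is not guaranteed to behave like its tabular counterpart.

```python
import numpy as np

def fitted_value_iteration(n_states=10, gamma=0.9, iters=50):
    """Value iteration where the value table is replaced by a linear fit.

    Toy chain MDP (an assumption for illustration): states 0..n-1, the
    only action moves one step right, the rightmost state is absorbing
    and yields reward 1; all other rewards are 0.
    """
    states = np.arange(n_states)
    reward = np.zeros(n_states)
    reward[-1] = 1.0

    w = np.zeros(2)  # linear approximator V(s) ~= w[0]*s + w[1]
    for _ in range(iters):
        v = w[0] * states + w[1]
        nxt = np.minimum(states + 1, n_states - 1)
        # One-step backup targets, as in tabular value iteration.
        target = reward + gamma * v[nxt]
        target[-1] = reward[-1]  # absorbing goal: no bootstrap
        # Instead of storing targets in a table, refit the approximator.
        w = np.polyfit(states, target, 1)
    return w[0] * states + w[1]

values = fitted_value_iteration()
```

On this particular chain the iteration happens to settle on an increasing value function, but the fitted line can only approximate the true (exponentially shaped) values, and on other MDPs the same refit loop can drift or diverge outright, which is the failure mode the paper demonstrates.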

Full text

Download (application/pdf, 727.3 kB)

Approximate BibTeX Entry

@inproceedings{boyan1995generalization,
    Year = {1995},
    Pages = {369-376},
    Publisher = {The MIT Press},
    Address = {Cambridge, MA},
    Booktitle = {Advances in Neural Information Processing Systems 7},
    Editor = {G. Tesauro and D. S. Touretzky and T. K. Leen},
    Author = {Justin Boyan and Andrew Moore},
    Title = {Generalization in Reinforcement Learning: Safely Approximating the Value Function}
}

Copyright 2010, Carnegie Mellon University, Auton Lab. All Rights Reserved.