autonlab.org
Direct Policy Search using Paired Statistical Tests (2001)

Malcolm Strens, Andrew Moore

Tags

Optimization, Reinforcement Learning

Abstract

Direct policy search is a practical way to solve reinforcement learning problems involving continuous state and action spaces. The goal becomes finding policy parameters that maximize a noisy objective function. The Pegasus method converts this stochastic optimization problem into a deterministic one, by using fixed start states and fixed random number sequences for comparing policies (Ng & Jordan, 1999). We evaluate Pegasus, and other paired comparison methods, using the mountain car problem, and a difficult pursuer-evader problem. We conclude that: (i) Paired tests can improve performance of deterministic and stochastic optimization procedures. (ii) Our proposed alternatives to Pegasus can generalize better, by using a different test statistic, or changing the scenarios during learning. (iii) Adapting the number of trials used for each policy comparison yields fast and robust learning.
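The paired-comparison idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy objective, the function names (`simulate_return`, `paired_differences`, `unpaired_differences`, `t_statistic`) and the parameter values are all assumptions made for illustration. Each seed stands in for a "scenario" (a fixed start state plus a fixed random number sequence), so two policies compared on the same seed face identical conditions, and a paired t statistic is computed on the per-scenario differences.

```python
import math
import random
import statistics


def simulate_return(theta, seed):
    """Hypothetical noisy return of a policy with scalar parameter `theta`.

    The seed fixes the scenario: the start state and the noise sequence.
    """
    rng = random.Random(seed)
    start = rng.uniform(-1.0, 1.0)   # scenario-specific start state
    noise = rng.gauss(0.0, 1.0)      # scenario-specific disturbance
    # Toy objective (an assumption): the noise-free optimum is theta == 0.5.
    return -(theta - 0.5) ** 2 + start + noise * (1.0 + 0.1 * theta)


def paired_differences(theta_a, theta_b, seeds):
    """Evaluate both policies on the same scenarios and return the differences."""
    return [simulate_return(theta_a, s) - simulate_return(theta_b, s) for s in seeds]


def unpaired_differences(theta_a, theta_b, seeds, offset=10_000):
    """Baseline: policy B is evaluated on different, independent scenarios."""
    return [
        simulate_return(theta_a, s) - simulate_return(theta_b, s + offset)
        for s in seeds
    ]


def t_statistic(diffs):
    """One-sample t statistic of the differences against a mean of zero."""
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0.0:
        # All differences identical: the sign of the mean decides outright.
        return math.copysign(math.inf, mean) if mean != 0.0 else 0.0
    return mean / (sd / math.sqrt(len(diffs)))


SEEDS = list(range(30))
paired = paired_differences(0.5, 0.2, SEEDS)
unpaired = unpaired_differences(0.5, 0.2, SEEDS)

print("paired   sd:", statistics.stdev(paired), "t:", t_statistic(paired))
print("unpaired sd:", statistics.stdev(unpaired), "t:", t_statistic(unpaired))
```

In this toy setting the start state and most of the noise cancel in the paired differences, so far fewer trials are needed to tell the two policies apart. Keeping `SEEDS` fixed for the whole search corresponds to the deterministic objective described for Pegasus; resampling the seeds between comparisons is one way to read the abstract's "changing the scenarios during learning", though the paper's exact procedure is not given here.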

Approximate BibTeX Entry

@inproceedings{strens-direct,
    Year = {2001},
    Pages = {545--552},
    Publisher = {Morgan Kaufmann},
    Address = {San Francisco, CA},
    Booktitle = {Proceedings of the 18th International Conference on Machine Learning},
    Author = {Malcolm Strens and Andrew Moore},
    Title = {Direct Policy Search using Paired Statistical Tests}
}

Copyright 2010, Carnegie Mellon University, Auton Lab. All Rights Reserved.