Direct Policy Search using Paired Statistical Tests (2001)
Tags
Optimization, Reinforcement Learning
Abstract
Direct policy search is a practical way to solve reinforcement learning problems involving continuous state and action spaces. The goal becomes finding policy parameters that maximize a noisy objective function. The Pegasus method converts this stochastic optimization problem into a deterministic one, by using fixed start states and fixed random number sequences for comparing policies (Ng & Jordan, 1999). We evaluate Pegasus, and other paired comparison methods, using the mountain car problem, and a difficult pursuer-evader problem. We conclude that: (i) Paired tests can improve performance of deterministic and stochastic optimization procedures. (ii) Our proposed alternatives to Pegasus can generalize better, by using a different test statistic, or changing the scenarios during learning. (iii) Adapting the number of trials used for each policy comparison yields fast and robust learning.
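The paired-comparison idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the objective `rollout_return`, its constants, and the use of a paired t statistic are assumptions chosen for illustration. The only point carried over from the abstract is that both policies are evaluated on the same scenarios (the same start state and random number sequence, here fixed by a seed), so scenario-to-scenario variation cancels in the difference.

```python
import math
import random
import statistics


def rollout_return(theta, seed):
    """Hypothetical noisy return for a 1-D policy parameter `theta`.

    The scenario (start state and disturbance) is fully determined by
    `seed`, so repeated calls with the same seed give the same return.
    """
    rng = random.Random(seed)
    start = rng.uniform(-1.0, 1.0)      # scenario-specific start state
    disturbance = rng.gauss(0.0, 1.0)   # scenario-specific noise
    return 2.0 * start - (theta - 0.5 - 0.3 * disturbance) ** 2


def paired_t_statistic(theta_a, theta_b, seeds):
    """Paired t statistic for return(theta_a) - return(theta_b).

    Both policies are run on the same seeds, so the start-state term
    cancels in each per-scenario difference. Needs at least two seeds.
    """
    diffs = [rollout_return(theta_a, s) - rollout_return(theta_b, s)
             for s in seeds]
    mean = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0.0:
        # All differences identical: no evidence if they are all zero,
        # otherwise an unboundedly strong statistic with the sign of the mean.
        return 0.0 if mean == 0.0 else math.copysign(math.inf, mean)
    return mean / (sd / math.sqrt(len(diffs)))
```

For example, `paired_t_statistic(0.5, 2.0, range(30))` compares two candidate parameters on 30 shared scenarios; a large positive value favours the first policy. Keeping the seed list fixed across all comparisons corresponds to the deterministic setting described above, while drawing new seeds during learning corresponds to changing the scenarios.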
Full text
Download (application/pdf, 481.3 kB)
Approximate BibTeX Entry
@inproceedings{strens-direct,
Year = {2001},
Pages = {545--552},
Publisher = {Morgan Kaufmann, San Francisco, CA},
Booktitle = {Proceedings of the 18th International Conference on Machine Learning},
Author = {Malcolm Strens and Andrew Moore},
Title = {Direct Policy Search using Paired Statistical Tests}
}