You need to be happy about Markov Decision Processes (the previous Andrew Tutorial) before venturing into Reinforcement Learning. It concerns the fascinating question of whether you can train a controller to perform optimally in a world where it may be necessary to suck up some short term punishment in order to achieve long term reward. We will discuss certainty-equivalent RL, the Temporal Difference (TD) learning, and finally Q-learning. The curse of dimensionality will be constantly learning over our shoulder, salivating and cackling.