EC 700 A3, Spring 2021: Introduction to Reinforcement Learning

Course Description: Reinforcement learning is a subfield of artificial intelligence that deals with learning from repeated interactions with an environment. It is the basis for state-of-the-art algorithms for playing strategy games such as Chess, Go, Backgammon, and StarCraft, as well as for a number of problems in robotics, operations research, and other fields of engineering. In this course, we will study the fundamental principles of reinforcement learning. Our focus will be on the main algorithms in the field: we will try to develop a mathematical understanding of why they work. While a moderate amount of coding will be required, the most important prerequisite is mathematical maturity and a facility with constructing proofs.

Instructor: Alex Olshevsky

Time: MW 2:30-4:15

Tentative Plan:

  1. Markov Decision Processes and Dynamic Programming; Value Iteration and Policy Iteration (a small illustrative sketch follows this list).
  2. Monte Carlo and Temporal Difference Methods. SARSA, DYNA, and Q-learning. Eligibility traces.
  3. Sample Complexity Issues.
  4. Using Deep Neural Networks for Approximation. Deep Q-learning. Approximation theorems, the benefits of depth, PAC-Bayes bounds for neural networks.
  5. Policy Gradient, Actor-Critic methods.
  6. Imitation learning, Inverse Reinforcement Learning, Distributed Reinforcement Learning (time permitting).
  7. Frontiers of RL research: using LSTM, Attention models, Transformers with reinforcement learning (time permitting).
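
To give a flavor of the first topic, here is a minimal sketch of tabular value iteration in Python. The two-state, two-action MDP at the bottom is a made-up toy example for illustration, not taken from the course materials.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Tabular value iteration for a finite MDP.

    P: array of shape (A, S, S), where P[a, s, s'] is the probability of
       moving to state s' after taking action a in state s.
    R: array of shape (A, S), expected immediate reward for action a in state s.
    Returns the (approximately) optimal value function and a greedy policy.
    """
    num_actions, num_states, _ = P.shape
    V = np.zeros(num_states)
    while True:
        # Bellman optimality update: Q[a, s] = R[a, s] + gamma * sum_s' P[a, s, s'] V[s']
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return V_new, Q.argmax(axis=0)

# Toy example (hypothetical numbers, chosen only to make the code runnable).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # transitions under action 0
              [[0.5, 0.5], [0.6, 0.4]]])  # transitions under action 1
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
V, policy = value_iteration(P, R)
print("Optimal values:", V, "Greedy policy:", policy)
```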

Prerequisites: Some basic proficiency with Python. Strong familiarity with Markov chains. The most important prerequisite is mathematical maturity and a facility with constructing proofs.

Textbook: There will not be a single textbook for the course; rather, we will rely on lecture notes that mix material from different sources, most of which will come from research papers. In terms of textbooks, we’ll use bits and pieces from:

Resources: 

Classic Papers: 

  1. A Markovian Decision Process by Bellman (1957).
  2. Dynamic Programming by Bellman (1966).
  3. Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems by Barto, Sutton, and Anderson (1983).
  4. Learning to Predict by the Method of Temporal Differences by Sutton (1988).
  5. Q-Learning by Watkins and Dayan (1992).
  6. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning by Williams (1992).
  7. Asynchronous Stochastic Approximation and Q-Learning by Tsitsiklis (1994).
  8. Temporal Difference Learning and TD-Gammon by Tesauro (1995).
  9. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding by Sutton (1996).
  10. An Analysis of Temporal-Difference Learning with Function Approximation by Tsitsiklis and Van Roy (1997).
  11. Policy Gradient Methods for Reinforcement Learning with Function Approximation by Sutton, McAllester, Singh, Mansour (2000).
  12. On Actor-Critic Algorithms by Konda and Tsitsiklis (2003).
  13. Playing Atari with Deep Reinforcement Learning by Mnih et al. (2013).
  14. Mastering the Game of Go Without Human Knowledge by Silver et al. (2017).
  15. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm by Silver et al. (2017).
Fall 2021: Flyer for EC 400, an undergraduate course in RL.