{"id":665,"date":"2020-10-13T15:19:25","date_gmt":"2020-10-13T19:19:25","guid":{"rendered":"https:\/\/sites.bu.edu\/aolshevsky\/?page_id=665"},"modified":"2021-05-08T07:18:49","modified_gmt":"2021-05-08T11:18:49","slug":"foundations-of-reinforcement-learning","status":"publish","type":"page","link":"https:\/\/sites.bu.edu\/aolshevsky\/foundations-of-reinforcement-learning\/","title":{"rendered":"EC 700 A3, Spring 2021: Introduction to Reinforcement Learning"},"content":{"rendered":"<p><strong>Course Description:<\/strong> Reinforcement learning is a subfield of artificial intelligence which deals with learning from repeated interactions with an environment.\u00a0 Reinforcement learning is the basis for state-of-the-art algorithms for playing strategy games such as Chess, Go, Backgammon, and Starcraft, as well as a number of problems throughout robotics, operations research, and other fields of engineering. In this course, we will study the fundamental principles of reinforcement learning.\u00a0 Our focus will be the main algorithms in the field: we&#8217;ll try to develop a mathematical understanding of why they work. While a moderate amount of coding will be required, the most important pre-requisite is mathematical maturity and a facility with constructing proofs.<\/p>\n<p><strong>Instructor:\u00a0<\/strong><a href=\"https:\/\/sites.bu.edu\/aolshevsky\">Alex Olshevsky<\/a><\/p>\n<p><strong>Time:\u00a0<\/strong>MW 2:30-4:15<\/p>\n<p><strong>Tentative Plan: <\/strong><\/p>\n<ol>\n<li>Markov Decision Process and Dynamic Programming, Value Iteration and Policy Iteration.<\/li>\n<li>Monte Carlo and Temporal Difference Methods. SARSA, DYNA, and Q-learning. Eligibility traces.<\/li>\n<li>Sample Complexity Issues.<\/li>\n<li>Using Deep Neural Networks for Approximation. Deep Q-learning. 
Approximation theorems, the benefits of depth, PAC-Bayes bounds for neural networks.<\/li>\n<li>Policy Gradient, Actor-Critic methods.<\/li>\n<li>Imitation learning, Inverse Reinforcement Learning, Distributed Reinforcement Learning (time permitting).<\/li>\n<li>Frontiers of RL research: using LSTM, Attention models, Transformers with reinforcement learning (time permitting).<\/li>\n<\/ol>\n<p><strong>Prerequisites:\u00a0<\/strong>Some basic proficiency with Python. Strong familiarity with Markov chains. The most important prerequisite is mathematical maturity and a facility with constructing proofs.<\/p>\n<p><strong>Textbook: <\/strong>There will not be a single textbook for the course; rather, we will rely on lecture notes that mix materials from different sources, most of which will come from research papers. In terms of textbooks, we&#8217;ll use bits and pieces from:<\/p>\n<ul>\n<li><a href=\"http:\/\/www.incompleteideas.net\/book\/the-book-2nd.html\">Reinforcement Learning<\/a> by Sutton and Barto. Available for free online.<\/li>\n<li><a href=\"https:\/\/www.amazon.com\/Deep-Reinforcement-Learning-Python-Hands\/dp\/0135172381\/ref=pd_sbs_14_2\/144-7209571-5934528\">Foundations of Deep Reinforcement Learning<\/a> by Graesser and Keng.<\/li>\n<li><a href=\"https:\/\/www.amazon.com\/Neuro-Dynamic-Programming-Optimization-Neural-Computation\/dp\/1886529108\">Neuro-Dynamic Programming<\/a> by Bertsekas and Tsitsiklis.<\/li>\n<li><a href=\"https:\/\/www.amazon.com\/Markov-Decision-Processes-Stochastic-Programming\/dp\/0471727822\">Markov Decision Processes<\/a> by Puterman.<\/li>\n<li><a href=\"https:\/\/www.amazon.com\/Dynamic-Programming-Optimal-Control-Vol\/dp\/1886529442\">Dynamic Programming and Optimal Control, Vol. 
II<\/a> by Bertsekas.<\/li>\n<\/ul>\n<p><strong>Resources:\u00a0<\/strong><\/p>\n<ul>\n<li>A great set of videos for <a href=\"http:\/\/nanjiang.cs.illinois.edu\/cs598\/\">CS 598<\/a> taught at UIUC.<\/li>\n<li>Lecture notes from <a href=\"https:\/\/ieor8100.github.io\/rl\/\">IEOR 8100<\/a> taught at Columbia.<\/li>\n<li>A set of lecture videos from <a href=\"https:\/\/www.youtube.com\/watch?v=FgzM3zpZ55o&amp;list=PLoROMvodv4rOSOPzutgyCTapiGlY2Nd8u\">CS 234<\/a> at Stanford.<\/li>\n<li>Lecture notes from <a href=\"https:\/\/web.stanford.edu\/class\/msande338\/\">MS&amp;E 338<\/a> at Stanford.<\/li>\n<li><a href=\"https:\/\/sites.google.com\/illinois.edu\/mdps-and-rl\/home?authuser=1\">ECE 586<\/a> at UIUC, including a set of lecture notes.<\/li>\n<li>Working draft of <a href=\"https:\/\/rltheorybook.github.io\/\">a textbook<\/a> by Agarwal, Jiang, Kakade, and Sun.<\/li>\n<li>The reinforcement learning <a href=\"https:\/\/www.reddit.com\/r\/reinforcementlearning\/\">subreddit<\/a>.<\/li>\n<li><a href=\"https:\/\/www.davidsilver.uk\/teaching\/\">Ten lecture videos<\/a> for a class on RL by David Silver.<\/li>\n<li>Videos of <a href=\"https:\/\/www.youtube.com\/playlist?list=PLkFD6_40KJIwhWJpGazJ9VSj9CFMkb79A\">CS 285<\/a> at Berkeley.<\/li>\n<li>Lecture videos for <a href=\"https:\/\/ocw.mit.edu\/courses\/electrical-engineering-and-computer-science\/6-832-underactuated-robotics-spring-2009\/video-lectures\/\">Underactuated Robotics<\/a> at MIT. The second half of the course focuses on RL methods.<\/li>\n<li>Lecture videos for <a href=\"https:\/\/people.eecs.berkeley.edu\/~pabbeel\/cs287-fa19\/\">Advanced Robotics<\/a> at Berkeley. 
A substantial part of the class is about RL.<\/li>\n<li>Deep RL <a href=\"https:\/\/sites.google.com\/view\/deep-rl-bootcamp\/lectures\">bootcamp<\/a>.<\/li>\n<li><a href=\"https:\/\/www.youtube.com\/playlist?list=PLyqSpQzTE6M_FwzHFAyf4LSkz_IjMyjD9\">Introduction to Reinforcement Learning<\/a> from IIT.<\/li>\n<\/ul>\n<p><strong>Classic Papers:\u00a0<\/strong><\/p>\n<ol>\n<li><a href=\"https:\/\/www.jstor.org\/stable\/24900506\">A Markovian Decision Process<\/a> by Bellman (1957).<\/li>\n<li><a href=\"https:\/\/science.sciencemag.org\/content\/153\/3731\/34.abstract\">Dynamic Programming<\/a> by Bellman (1966).<\/li>\n<li><a href=\"http:\/\/www.derongliu.org\/adp\/adp-cdrom\/Barto1983.pdf\">Neuron-Like Adaptive Elements that Can Solve Difficult Learning Control Problems<\/a> by Barto, Sutton, and Anderson (1983).<\/li>\n<li><a href=\"https:\/\/link.springer.com\/content\/pdf\/10.1007\/BF00115009.pdf\">Learning to Predict by the Method of Temporal Differences<\/a> by Sutton (1988).<\/li>\n<li><a href=\"https:\/\/link.springer.com\/content\/pdf\/10.1007\/BF00992698.pdf\">Q-Learning<\/a> by Watkins and Dayan (1992).<\/li>\n<li><a href=\"https:\/\/people.cs.umass.edu\/~barto\/courses\/cs687\/williams92simple.pdf\">Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning<\/a> by Williams (1992).<\/li>\n<li><a href=\"https:\/\/link.springer.com\/content\/pdf\/10.1007\/BF00993306.pdf\">Asynchronous Stochastic Approximation and Q-Learning<\/a> by Tsitsiklis (1994).<\/li>\n<li><a href=\"https:\/\/cling.csd.uwo.ca\/cs346a\/extra\/tdgammon.pdf\">Temporal Difference Learning and TD-Gammon<\/a> by Tesauro (1995).<\/li>\n<li><a href=\"https:\/\/proceedings.neurips.cc\/paper\/1995\/file\/8f1d43620bc6bb580df6e80b0dc05c48-Paper.pdf\">Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding<\/a> by Sutton (1996).<\/li>\n<li><a href=\"http:\/\/web.mit.edu\/jnt\/www\/Papers\/J063-97-bvr-td.pdf\">An Analysis of Temporal 
Difference Learning with Function Approximation<\/a> by Tsitsiklis and Van Roy (1997).<\/li>\n<li><a href=\"https:\/\/papers.nips.cc\/paper\/1999\/file\/464d828b85b0bed98e80ade0a5c43b0f-Paper.pdf\">Policy Gradient Methods for Reinforcement Learning with Function Approximation<\/a> by Sutton, McAllester, Singh, and Mansour (2000).<\/li>\n<li><a href=\"http:\/\/web.mit.edu\/~jnt\/www\/Papers\/J094-03-kon-actors.pdf\">On Actor-Critic Algorithms<\/a> by Konda and Tsitsiklis (2003).<\/li>\n<li><a href=\"https:\/\/arxiv.org\/abs\/1312.5602\">Playing Atari with Deep Reinforcement Learning<\/a> by Mnih et al. (2013).<\/li>\n<li><a href=\"https:\/\/www.nature.com\/articles\/nature24270.\">Mastering the Game of Go Without Human Knowledge<\/a> by Silver et al. (2017).<\/li>\n<li><a href=\"https:\/\/arxiv.org\/abs\/1712.01815\">Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm<\/a> by Silver et al. (2017).<\/li>\n<\/ol>\n<p>&nbsp;<\/p>\n<p><strong>Fall 2021:\u00a0<\/strong><a href=\"\/aolshevsky\/files\/2021\/04\/700_syllabus.pdf\">Flyer<\/a>\u00a0for EC 400, an undergraduate course in RL.<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Course Description: Reinforcement learning is a subfield of artificial intelligence which deals with learning from repeated interactions with an environment.\u00a0 Reinforcement learning is the basis for state-of-the-art algorithms for playing strategy games such as Chess, Go, Backgammon, and Starcraft, as well as a number of problems throughout robotics, operations research, and other fields of engineering. 
[&hellip;]<\/p>\n","protected":false},"author":12455,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":[],"_links":{"self":[{"href":"https:\/\/sites.bu.edu\/aolshevsky\/wp-json\/wp\/v2\/pages\/665"}],"collection":[{"href":"https:\/\/sites.bu.edu\/aolshevsky\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/sites.bu.edu\/aolshevsky\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/sites.bu.edu\/aolshevsky\/wp-json\/wp\/v2\/users\/12455"}],"replies":[{"embeddable":true,"href":"https:\/\/sites.bu.edu\/aolshevsky\/wp-json\/wp\/v2\/comments?post=665"}],"version-history":[{"count":50,"href":"https:\/\/sites.bu.edu\/aolshevsky\/wp-json\/wp\/v2\/pages\/665\/revisions"}],"predecessor-version":[{"id":753,"href":"https:\/\/sites.bu.edu\/aolshevsky\/wp-json\/wp\/v2\/pages\/665\/revisions\/753"}],"wp:attachment":[{"href":"https:\/\/sites.bu.edu\/aolshevsky\/wp-json\/wp\/v2\/media?parent=665"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}