dennybritz / reinforcement-learning
Implementation of Reinforcement Learning Algorithms. Python, OpenAI Gym, TensorFlow. Exercises and Solutions to accompany Sutton's Book and David Silver's course.
Overview
This repository provides code, exercises, and solutions for popular Reinforcement Learning algorithms. These are meant to serve as a learning tool to complement the theoretical materials from Sutton's book and David Silver's course.
Each folder corresponds to one or more chapters of the above textbook and/or course. In addition to exercises and solutions, each folder also contains a list of learning goals, a brief concept summary, and links to the relevant readings.
All code is written in Python 3 and uses RL environments from OpenAI Gym. Advanced techniques use TensorFlow for neural network implementations.
Table of Contents
- Introduction to RL problems & OpenAI Gym
- MDPs and Bellman Equations
- Dynamic Programming: Model-Based RL, Policy Iteration and Value Iteration
- Monte Carlo Model-Free Prediction & Control
- Temporal Difference Model-Free Prediction & Control
- Function Approximation
- Deep Q Learning (WIP)
- Policy Gradient Methods (WIP)
- Learning and Planning (WIP)
- Exploration and Exploitation (WIP)
List of Implemented Algorithms
- Dynamic Programming Policy Evaluation
- Dynamic Programming Policy Iteration
- Dynamic Programming Value Iteration
- Monte Carlo Prediction
- Monte Carlo Control with Epsilon-Greedy Policies
- Monte Carlo Off-Policy Control with Importance Sampling
- SARSA (On Policy TD Learning)
- Q-Learning (Off Policy TD Learning)
- Q-Learning with Linear Function Approximation
- Deep Q-Learning for Atari Games
- Double Deep-Q Learning for Atari Games
- Deep Q-Learning with Prioritized Experience Replay (WIP)
- Policy Gradient: REINFORCE with Baseline
- Policy Gradient: Actor Critic with Baseline
- Policy Gradient: Actor Critic with Baseline for Continuous Action Spaces
- Deterministic Policy Gradients for Continuous Action Spaces (WIP)
- Deep Deterministic Policy Gradients (DDPG) (WIP)
- Asynchronous Advantage Actor Critic (A3C)
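To give a feel for what one of the listed algorithms looks like in code, here is a minimal sketch of tabular Q-Learning (Off Policy TD Learning). It is not taken from this repository: the five-state corridor environment, the `step` function, and all hyperparameter values are hypothetical and chosen only to keep the example dependency-free. The repository's own solutions use OpenAI Gym environments instead.

```python
import random

# Hypothetical toy environment (not from the repository):
# a corridor of 5 states, start at state 0, goal at state 4.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
GOAL = N_STATES - 1


def step(state, action):
    """Deterministic toy dynamics. Returns (next_state, reward, done)."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(GOAL, state + 1)
    done = next_state == GOAL
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Behaviour policy: explore with probability epsilon,
            # otherwise act greedily (ties broken at random).
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Off-policy TD target: bootstrap from the greedy value of the
            # next state, regardless of which action is taken next.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Greedy action for every state under the learned Q-table."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES)]
```

Calling `greedy_policy(q_learning())` returns one action per state; the entry for the terminal state 4 carries no meaning because no update is ever made there. Replacing the `max(q[next_state])` term with the value of the action actually taken next would turn this sketch into SARSA, the on-policy counterpart from the list above.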
Resources
Textbooks:
Classes:
- David Silver's Reinforcement Learning Course (UCL, 2015)
- CS294 - Deep Reinforcement Learning (Berkeley, Fall 2015)
- CS 8803 - Reinforcement Learning (Georgia Tech)
- CS885 - Reinforcement Learning (UWaterloo), Spring 2018
- CS294-112 - Deep Reinforcement Learning (UC Berkeley)
Talks/Tutorials:
- Introduction to Reinforcement Learning (Joelle Pineau @ Deep Learning Summer School 2016)
- Deep Reinforcement Learning (Pieter Abbeel @ Deep Learning Summer School 2016)
- Deep Reinforcement Learning ICML 2016 Tutorial (David Silver)
- Tutorial: Introduction to Reinforcement Learning with Function Approximation
- John Schulman - Deep Reinforcement Learning (4 Lectures)
- Deep Reinforcement Learning Slides @ NIPS 2016
- OpenAI Spinning Up
- Advanced Deep Learning & Reinforcement Learning (UCL 2018, DeepMind)
- Deep RL Bootcamp
Other Projects:
Selected Papers:
- Human-Level Control through Deep Reinforcement Learning (2015-02)
- Deep Reinforcement Learning with Double Q-learning (2015-09)
- Continuous control with deep reinforcement learning (2015-09)
- Prioritized Experience Replay (2015-11)
- Dueling Network Architectures for Deep Reinforcement Learning (2015-11)
- Asynchronous Methods for Deep Reinforcement Learning (2016-02)
- Deep Reinforcement Learning from Self-Play in Imperfect-Information Games (2016-03)
- Mastering the game of Go with deep neural networks and tree search