DI-engine
Hands on RL
DQN
Overview
Quick Facts
Key Equations or Key Graphs
Pseudo-code
Extensions
Implementations
Reference
C51
Overview
Quick Facts
Pseudo-code
Key Equations or Key Graphs
Extensions
Implementation
QRDQN
Overview
Quick Facts
Key Equations or Key Graphs
Pseudo-code
Extensions
Implementation
IQN
Overview
Quick Facts
Key Equations
Key Graphs
Extensions
Implementation
References
Rainbow
Overview
Quick Facts
Double Q-learning
Prioritized Experience Replay (PER)
Dueling Network
Multi-step Learning
Noisy Net
Extensions
Implementation
Experiments on Rainbow Tricks
References
SQN
Overview
Quick Facts
Key Equations or Key Graphs
Pseudocode
Extensions
Implementation
Other Public Implementations
SQIL
Overview
Quick Facts
Key Equations or Key Graphs
Pseudo-code
Implementations
References
A2C
Overview
Quick Facts
Key Equations or Key Graphs
Pseudo-code
Extensions
Implementation
References
ACER
Overview
Quick Facts
Key Equations
Retrace Q-value estimation
Policy gradient
Pseudocode
Implementations
Reference
PPO
Overview
Quick Facts
Key Equations or Key Graphs
Pseudo-code
Extensions
Implementation
References
PPG
Overview
Quick Facts
Key Graphs
Key Equations
Pseudo-code
Extensions
Implementation
References
IMPALA
Overview
Quick Facts
Key Equations
Key Graphs
Implementations
Reference
DDPG
Overview
Quick Facts
Key Equations or Key Graphs
Pseudocode
Extensions
Implementations
Model
Train actor-critic model
Target Network
Other Public Implementations
References
TD3
Overview
Quick Facts
Key Equations or Key Graphs
Pseudocode
Extensions
Implementations
Model
Train actor-critic model
Target Network
Target Policy Smoothing Regularization
Other Public Implementations
References
SAC
Overview
Quick Facts
Key Equations or Key Graphs
Pseudocode
Extensions
Implementation
Other Public Implementations
QMIX
Overview
Quick Facts
Key Equations or Key Graphs
Extensions
Implementations
References
COMA
Overview
Quick Facts
Key Equations or Key Graphs
Extensions
Implementations
References
ATOC
Overview
Quick Facts
Key Equations or Key Graphs
Extensions
Implementations
References
CollaQ
Overview
Quick Facts
Key Equations or Key Graphs
Extensions
Implementations
References
RND
Overview
Quick Facts
Key Equations or Key Graphs
The implementation details that matter
Pseudo-Code
Code Implementation
RndNetwork
RndRewardModel
Train RndRewardModel
Calculate RND Reward
Benchmark Results
Author’s TensorFlow Implementation
Reference
GAIL
Overview
Quick Facts
Key Equations or Key Graphs
Pseudo-Code
Extensions
Reference
VPN
Overview
Quick Facts
Key Equations or Key Graphs
Pseudo-code
Extensions
Implementations
Reference
MCTS
Overview
Quick Facts
Key Equations or Key Graphs
Pseudo-code
Extensions
Implementations
References
AlphaGo
Overview
Quick Facts
Key Equations or Key Graphs
Pseudo-code
Extensions
Implementations
AlphaGoZero
Overview
Quick Facts
Key Equations or Key Graphs
Pseudo-code
Extensions
Implementations
References