@@ -85,12 +85,14 @@ It does so by clipping gradient flow if the updated policy
is not close to the policy used to sample the data.</p>
<p>You can find an experiment that uses it <a href="experiment.html">here</a>.
The experiment uses <a href="gae.html">Generalized Advantage Estimation</a>.</p>
<p><a href="https://colab.research.google.com/github/lab-ml/nn/blob/master/labml_nn/rl/ppo/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg"/></a>
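For readers who want the mechanics behind the "clipping" these docs describe, here is a minimal PyTorch sketch of PPO's clipped surrogate objective. The function name and signature are illustrative assumptions, not the library's API; see the annotated implementation linked above for the real one.

```python
import torch

def clipped_ppo_loss(log_pi: torch.Tensor,
                     log_pi_old: torch.Tensor,
                     advantage: torch.Tensor,
                     clip_eps: float = 0.1) -> torch.Tensor:
    # Probability ratio r_t = pi(a|s) / pi_old(a|s), computed in log space
    ratio = torch.exp(log_pi - log_pi_old)
    # Unclipped and clipped surrogate objectives
    surr1 = ratio * advantage
    surr2 = ratio.clamp(1.0 - clip_eps, 1.0 + clip_eps) * advantage
    # Taking the element-wise minimum cuts gradient flow whenever the
    # updated policy drifts too far from the policy that sampled the data
    return -torch.min(surr1, surr2).mean()
```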
@@ -85,6 +85,8 @@ It does so by clipping gradient flow if the updated policy
is not close to the policy used to sample the data.</p>
<p>You can find an experiment that uses it <a href="https://nn.labml.ai/rl/ppo/experiment.html">here</a>.
The experiment uses <a href="https://nn.labml.ai/rl/ppo/gae.html">Generalized Advantage Estimation</a>.</p>
<p><a href="https://colab.research.google.com/github/lab-ml/nn/blob/master/labml_nn/rl/ppo/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg"/></a>
@@ -21,6 +21,9 @@ is not close to the policy used to sample the data.
You can find an experiment that uses it [here](experiment.html).
The experiment uses [Generalized Advantage Estimation](gae.html).
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lab-ml/nn/blob/master/labml_nn/rl/ppo/experiment.ipynb)
@@ -8,6 +8,9 @@ summary: Annotated implementation to train a PPO agent on Atari Breakout game.
This experiment trains a Proximal Policy Optimization (PPO) agent on the Atari Breakout game using OpenAI Gym.
It runs the [game environments on multiple processes](../game.html) to sample efficiently.
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lab-ml/nn/blob/master/labml_nn/rl/ppo/experiment.ipynb)
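As a rough illustration of the multi-process sampling mentioned above, the sketch below runs one Gym environment per worker process and serves reset/step requests over pipes. The names and the environment id are assumptions for illustration; the project's actual implementation lives in the linked game module.

```python
import multiprocessing as mp
import gym

def worker(conn, env_name: str):
    # Each worker owns one environment and answers requests from the trainer
    env = gym.make(env_name)
    while True:
        cmd, data = conn.recv()
        if cmd == 'reset':
            conn.send(env.reset())
        elif cmd == 'step':
            conn.send(env.step(data))
        elif cmd == 'close':
            env.close()
            conn.close()
            break

if __name__ == '__main__':
    # Breakout needs the Atari extras installed; swap in e.g. 'CartPole-v1' to test
    parents, procs = [], []
    for _ in range(4):
        parent, child = mp.Pipe()
        p = mp.Process(target=worker, args=(child, 'BreakoutNoFrameskip-v4'))
        p.start()
        parents.append(parent)
        procs.append(p)
    # Reset all environments in parallel and gather the initial observations
    for conn in parents:
        conn.send(('reset', None))
    observations = [conn.recv() for conn in parents]
    # Shut the workers down cleanly
    for conn, p in zip(parents, procs):
        conn.send(('close', None))
        p.join()
```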
@@ -13,4 +13,7 @@ It does so by clipping gradient flow if the updated policy
is not close to the policy used to sample the data.
You can find an experiment that uses it [here](https://nn.labml.ai/rl/ppo/experiment.html).
The experiment uses [Generalized Advantage Estimation](https://nn.labml.ai/rl/ppo/gae.html).
\ No newline at end of file
The experiment uses [Generalized Advantage Estimation](https://nn.labml.ai/rl/ppo/gae.html).
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lab-ml/nn/blob/master/labml_nn/rl/ppo/experiment.ipynb)
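Since the docs link to Generalized Advantage Estimation without showing the recurrence, here is a minimal NumPy sketch for a single trajectory. The function name and arguments are illustrative, and it assumes one non-terminating rollout (episode-termination masking is omitted for brevity).

```python
import numpy as np

def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    # rewards[t] = r_t, values[t] = V(s_t); last_value = V(s_T) bootstraps the end
    advantages = np.zeros_like(rewards, dtype=np.float64)
    next_value = last_value
    next_advantage = 0.0
    for t in reversed(range(len(rewards))):
        # TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value - values[t]
        # GAE recurrence: A_t = delta_t + gamma * lambda * A_{t+1}
        next_advantage = delta + gamma * lam * next_advantage
        advantages[t] = next_advantage
        next_value = values[t]
    return advantages
```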