Commit ec9a58c6 authored by Varuna Jayasiri

dqn experiment

Parent 8d1be06a
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
......@@ -1211,17 +1211,7 @@ You can change this while the experiment is running.
<p>Initialize the trainer</p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">391</span> <span class="n">m</span> <span class="o">=</span> <span class="n">Trainer</span><span class="p">(</span>
<span class="lineno">392</span> <span class="n">updates</span><span class="o">=</span><span class="n">configs</span><span class="p">[</span><span class="s1">&#39;updates&#39;</span><span class="p">],</span>
<span class="lineno">393</span> <span class="n">epochs</span><span class="o">=</span><span class="n">configs</span><span class="p">[</span><span class="s1">&#39;epochs&#39;</span><span class="p">],</span>
<span class="lineno">394</span> <span class="n">n_workers</span><span class="o">=</span><span class="n">configs</span><span class="p">[</span><span class="s1">&#39;n_workers&#39;</span><span class="p">],</span>
<span class="lineno">395</span> <span class="n">worker_steps</span><span class="o">=</span><span class="n">configs</span><span class="p">[</span><span class="s1">&#39;worker_steps&#39;</span><span class="p">],</span>
<span class="lineno">396</span> <span class="n">batches</span><span class="o">=</span><span class="n">configs</span><span class="p">[</span><span class="s1">&#39;batches&#39;</span><span class="p">],</span>
<span class="lineno">397</span> <span class="n">value_loss_coef</span><span class="o">=</span><span class="n">configs</span><span class="p">[</span><span class="s1">&#39;value_loss_coef&#39;</span><span class="p">],</span>
<span class="lineno">398</span> <span class="n">entropy_bonus_coef</span><span class="o">=</span><span class="n">configs</span><span class="p">[</span><span class="s1">&#39;entropy_bonus_coef&#39;</span><span class="p">],</span>
<span class="lineno">399</span> <span class="n">clip_range</span><span class="o">=</span><span class="n">configs</span><span class="p">[</span><span class="s1">&#39;clip_range&#39;</span><span class="p">],</span>
<span class="lineno">400</span> <span class="n">learning_rate</span><span class="o">=</span><span class="n">configs</span><span class="p">[</span><span class="s1">&#39;learning_rate&#39;</span><span class="p">],</span>
<span class="lineno">401</span> <span class="p">)</span></pre></div>
<div class="highlight"><pre><span class="lineno">391</span> <span class="n">m</span> <span class="o">=</span> <span class="n">Trainer</span><span class="p">(</span><span class="o">**</span><span class="n">configs</span><span class="p">)</span></pre></div>
</div>
</div>
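The change above collapses the explicit keyword arguments into dictionary unpacking. A minimal sketch with a hypothetical, simplified `Trainer` (not the repository's class) shows why the two spellings are equivalent whenever the dictionary keys match the constructor's parameter names exactly:

class Trainer:
    # Hypothetical stand-in for the experiment's Trainer, reduced to three parameters.
    def __init__(self, updates: int, epochs: int, learning_rate: float):
        self.updates = updates
        self.epochs = epochs
        self.learning_rate = learning_rate


configs = {'updates': 10_000, 'epochs': 4, 'learning_rate': 2.5e-4}

# Equivalent to Trainer(updates=configs['updates'], epochs=..., learning_rate=...);
# a missing or unexpected key now fails loudly with a TypeError.
m = Trainer(**configs)

The call site no longer needs editing when a configuration key is added, at the cost of requiring the keys and parameter names to stay in sync.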
<div class='section' id='section-93'>
......@@ -1232,8 +1222,8 @@ You can change this while the experiment is running.
<p>Run and monitor the experiment</p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">404</span> <span class="k">with</span> <span class="n">experiment</span><span class="o">.</span><span class="n">start</span><span class="p">():</span>
<span class="lineno">405</span> <span class="n">m</span><span class="o">.</span><span class="n">run_training_loop</span><span class="p">()</span></pre></div>
<div class="highlight"><pre><span class="lineno">394</span> <span class="k">with</span> <span class="n">experiment</span><span class="o">.</span><span class="n">start</span><span class="p">():</span>
<span class="lineno">395</span> <span class="n">m</span><span class="o">.</span><span class="n">run_training_loop</span><span class="p">()</span></pre></div>
</div>
</div>
<div class='section' id='section-94'>
......@@ -1244,7 +1234,7 @@ You can change this while the experiment is running.
<p>Stop the workers</p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">407</span> <span class="n">m</span><span class="o">.</span><span class="n">destroy</span><span class="p">()</span></pre></div>
<div class="highlight"><pre><span class="lineno">397</span> <span class="n">m</span><span class="o">.</span><span class="n">destroy</span><span class="p">()</span></pre></div>
</div>
</div>
<div class='section' id='section-95'>
......@@ -1255,8 +1245,8 @@ You can change this while the experiment is running.
<h2>Run it</h2>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">411</span><span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">&quot;__main__&quot;</span><span class="p">:</span>
<span class="lineno">412</span> <span class="n">main</span><span class="p">()</span></pre></div>
<div class="highlight"><pre><span class="lineno">401</span><span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">&quot;__main__&quot;</span><span class="p">:</span>
<span class="lineno">402</span> <span class="n">main</span><span class="p">()</span></pre></div>
</div>
</div>
<div class='footer'>
......
......@@ -498,14 +498,14 @@
<url>
<loc>https://nn.labml.ai/transformers/primer_ez/experiment.html</loc>
- <lastmod>2021-09-23T16:30:00+00:00</lastmod>
+ <lastmod>2021-09-24T16:30:00+00:00</lastmod>
<priority>1.00</priority>
</url>
<url>
<loc>https://nn.labml.ai/transformers/primer_ez/variations.html</loc>
- <lastmod>2021-09-23T16:30:00+00:00</lastmod>
+ <lastmod>2021-09-24T16:30:00+00:00</lastmod>
<priority>1.00</priority>
</url>
......@@ -890,7 +890,7 @@
<url>
<loc>https://nn.labml.ai/rl/dqn/experiment.html</loc>
- <lastmod>2021-04-04T16:30:00+00:00</lastmod>
+ <lastmod>2021-10-02T16:30:00+00:00</lastmod>
<priority>1.00</priority>
</url>
......@@ -918,7 +918,7 @@
<url>
<loc>https://nn.labml.ai/rl/ppo/experiment.html</loc>
- <lastmod>2021-08-19T16:30:00+00:00</lastmod>
+ <lastmod>2021-10-02T16:30:00+00:00</lastmod>
<priority>1.00</priority>
</url>
......
......@@ -18,10 +18,8 @@ This is a [PyTorch](https://pytorch.org) implementation of paper
Here is the [experiment](experiment.html) and [model](model.html) implementation.
\(
\def\green#1{{\color{yellowgreen}{#1}}}
\)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/rl/dqn/experiment.ipynb)
[![View Run](https://img.shields.io/badge/labml-experiment-brightgreen)](https://app.labml.ai/run/a0da8048235511ecb9affd797fa27714)
"""
from typing import Tuple
......
......@@ -8,6 +8,9 @@ summary: Implementation of DQN experiment with Atari Breakout
This experiment trains a Deep Q Network (DQN) to play the Atari Breakout game on OpenAI Gym.
It runs the [game environments on multiple processes](../game.html) to sample efficiently.
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/rl/dqn/experiment.ipynb)
[![View Run](https://img.shields.io/badge/labml-experiment-brightgreen)](https://app.labml.ai/run/a0da8048235511ecb9affd797fa27714)
"""
import numpy as np
......@@ -44,8 +47,6 @@ class Trainer:
update_target_model: int,
learning_rate: FloatDynamicHyperParam,
):
# #### Configurations
# number of workers
self.n_workers = n_workers
# steps sampled on each update
......@@ -92,8 +93,12 @@ class Trainer:
# initialize tensors for observations
self.obs = np.zeros((self.n_workers, 4, 84, 84), dtype=np.uint8)
# reset the workers
for worker in self.workers:
worker.child.send(("reset", None))
# get the initial observations
for i, worker in enumerate(self.workers):
self.obs[i] = worker.child.recv()
......
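The docstring above notes that the game environments run on separate processes to sample efficiently, and the constructor resets each worker over a pipe (`worker.child.send(("reset", None))`) and reads back the initial observation. The sketch below is a minimal, hypothetical illustration of that request/response pattern; the repository's actual worker lives in [game](../game.html) and differs in detail:

import multiprocessing
import multiprocessing.connection

import numpy as np


def worker_process(remote: multiprocessing.connection.Connection):
    # Stand-in environment: the real worker wraps an Atari Gym environment
    # that returns stacked 4x84x84 uint8 frames.
    obs = np.zeros((4, 84, 84), dtype=np.uint8)
    while True:
        cmd, data = remote.recv()
        if cmd == "reset":
            remote.send(obs)
        elif cmd == "step":
            remote.send((obs, 0.0, False, {}))
        elif cmd == "close":
            remote.close()
            break


class Worker:
    # The trainer keeps `child`; the spawned process gets the other end of the pipe.
    def __init__(self):
        self.child, parent = multiprocessing.Pipe()
        self.process = multiprocessing.Process(target=worker_process, args=(parent,))
        self.process.start()


if __name__ == "__main__":
    workers = [Worker() for _ in range(4)]
    for w in workers:
        w.child.send(("reset", None))
    obs = np.stack([w.child.recv() for w in workers])  # shape (4, 4, 84, 84)
    for w in workers:
        w.child.send(("close", None))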
......@@ -5,6 +5,9 @@ summary: Implementation of neural network model for Deep Q Network (DQN).
---
# Deep Q Network (DQN) Model
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/rl/dqn/experiment.ipynb)
[![View Run](https://img.shields.io/badge/labml-experiment-brightgreen)](https://app.labml.ai/run/a0da8048235511ecb9affd797fa27714)
"""
import torch
......
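The model file itself is collapsed in this diff. For orientation only, here is a generic convolutional Q-network over the stacked 4x84x84 observations used by the trainer above; the annotated model in the repository may differ (for instance, it may use a dueling value/advantage head):

import torch
from torch import nn


class QNetwork(nn.Module):
    # Maps a stack of 4 grayscale 84x84 frames to one Q-value per action.
    def __init__(self, n_actions: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        # 84x84 -> 20x20 -> 9x9 -> 7x7 feature maps with 64 channels
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs is uint8 in [0, 255]; scale to [0, 1] before the convolutions
        return self.head(self.conv(obs.float() / 255.0))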
......@@ -8,6 +8,9 @@ summary: Annotated implementation of prioritized experience replay using a binar
This implements the paper [Prioritized experience replay](https://papers.labml.ai/paper/1511.05952),
using a binary segment tree.
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/rl/dqn/experiment.ipynb)
[![View Run](https://img.shields.io/badge/labml-experiment-brightgreen)](https://app.labml.ai/run/a0da8048235511ecb9affd797fa27714)
"""
import random
......
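The docstring states that prioritized replay is built on a binary segment tree. As a rough sketch of the underlying idea (not the annotated implementation itself, which also handles importance-sampling weights), a sum tree lets both priority updates and priority-proportional sampling run in O(log n):

import random


class SumTree:
    # Minimal binary segment tree over priorities (capacity must be a power of two).
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity)  # internal nodes store sums of their children

    def set(self, idx: int, priority: float):
        # Write the leaf, then refresh sums on the path up to the root: O(log n).
        i = idx + self.capacity
        self.tree[i] = priority
        i //= 2
        while i >= 1:
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def total(self) -> float:
        return self.tree[1]

    def find(self, prefix_sum: float) -> int:
        # Descend from the root towards the leaf whose cumulative priority covers prefix_sum.
        i = 1
        while i < self.capacity:
            if self.tree[2 * i] >= prefix_sum:
                i = 2 * i
            else:
                prefix_sum -= self.tree[2 * i]
                i = 2 * i + 1
        return i - self.capacity


# Sampling proportional to priority: draw a uniform point in [0, total) and
# locate the leaf that owns it.
tree = SumTree(capacity=8)
for idx, p in enumerate([0.5, 1.0, 0.1, 2.0]):
    tree.set(idx, p)
sampled = tree.find(random.uniform(0.0, tree.total()))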