DI-engine supports various useful tools in common RL training, as shown in the following sections.

Epsilon Greedy
~~~~~~~~~~~~~~~~

An easy way to deploy epsilon-greedy exploration when sampling data is shown as follows:
.. code-block:: python

    ...
    eps = epsilon_greedy(learner.train_iter)
    ...
Firstly, call ``get_epsilon_greedy_fn`` to acquire an eps-greedy function; then call the returned ``epsilon_greedy`` function at each step. The epsilon decay strategy is configurable, including the start value, end value, and decay type (linear or exponential), and you can control whether epsilon decays by env step or by train iteration.
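As a concrete illustration, here is a minimal sketch of this pattern (the argument names ``start``, ``end``, ``decay`` and ``type_`` follow common DI-engine usage, but check the ``get_epsilon_greedy_fn`` signature in your version):

.. code-block:: python

    # Minimal sketch: build an epsilon schedule and query it with a step counter.
    # Argument names follow common DI-engine usage and may differ across versions.
    from ding.rl_utils import get_epsilon_greedy_fn

    # Decay epsilon from 0.95 to 0.1 over 10000 counter increments, linearly.
    epsilon_greedy = get_epsilon_greedy_fn(start=0.95, end=0.1, decay=10000, type_='linear')

    # The returned function maps a monotonically increasing counter to the current
    # epsilon; pass learner.train_iter to decay by train iteration, or the
    # collector's env step count to decay by env step.
    for train_iter in range(0, 10001, 2500):
        print(train_iter, epsilon_greedy(train_iter))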
Visualization & Logging
~~~~~~~~~~~~~~~~~~~~~~~~~
Some environments have a rendering visualization. DI-engine doesn't use the render interface, but supports saving replay videos instead.
After training, users can add the code shown below to enable this function. If everything works well, you can find some videos with the ``.mp4`` suffix in the ``replay_path`` directory (it is normal if some GUI windows appear while recording).
.. code-block:: python

    ...
    ...
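For orientation, a hedged sketch of what this typically looks like in an entry script (``enable_save_replay`` is the DI-engine environment method also mentioned in the tip below; the surrounding variable names are illustrative):

.. code-block:: python

    # Illustrative sketch: turn on replay saving for the evaluation environment
    # before running evaluation. `evaluator_env` stands for the environment
    # (or env manager) created in a standard entry script.
    evaluator_env.enable_save_replay(replay_path='./video')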
.. note::

    If users want to visualize with a trained policy, please refer to ``dizoo/classic_control/cartpole/entry/cartpole_dqn_eval.py`` to construct a user-defined evaluation function, and indicate two fields ``env.replay_path`` and ``policy.learn.learner.hook.load_ckpt_before_run`` in the config. An example is shown as follows:
.. code-block:: python

    ...
    ...
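As a rough illustration of those two fields, a hedged config fragment might look like this (the paths are placeholders, and the real config in ``cartpole_dqn_eval.py`` contains many more fields):

.. code-block:: python

    # Illustrative config fragment only; paths are placeholders and the full
    # config in cartpole_dqn_eval.py contains many more fields.
    from easydict import EasyDict

    eval_config = EasyDict(dict(
        env=dict(
            replay_path='./video',  # directory where .mp4 replays are written
        ),
        policy=dict(
            learn=dict(
                learner=dict(
                    hook=dict(
                        # checkpoint to load before the evaluation run
                        load_ckpt_before_run='./ckpt/ckpt_best.pth.tar',
                    ),
                ),
            ),
        ),
    ))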
.. tip::

    All new RL environments can define their own ``enable_save_replay`` method to specify how to generate replay files. DI-engine utilizes the ``gym`` wrapper (coupled with ``ffmpeg``) to generate replays for some traditional environments. If you encounter errors when recording videos through the ``gym`` wrapper, you should install ``ffmpeg`` first.
Similar to other deep learning platforms, DI-engine uses TensorBoard to record key parameters and results during
training. In addition to the default logging variables, users can add their own logging variables as follows.

.. code-block:: python

    ...
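One common way to do this (a sketch; ``_monitor_vars_learn`` is the DI-engine policy hook that declares which learn-mode variables get logged, but verify the hook name in your version):

.. code-block:: python

    # Sketch: extend the set of variables the learner writes to TensorBoard by
    # overriding the policy's _monitor_vars_learn hook. The extra name
    # 'my_custom_loss' is illustrative and must also appear in the dict
    # returned by the policy's learn-mode forward for it to be recorded.
    from ding.policy import DQNPolicy

    class MyDQNPolicy(DQNPolicy):

        def _monitor_vars_learn(self):
            return super()._monitor_vars_learn() + ['my_custom_loss']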
Loading & Saving checkpoints
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is usually necessary to save and resume an experiment with model checkpoints.
DI-engine saves and loads checkpoints in the same way as PyTorch.
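In other words, a checkpoint round trip is just the standard PyTorch pattern; a minimal sketch (``policy`` stands for a constructed DI-engine policy, and the file path is a placeholder):

.. code-block:: python

    # Minimal sketch of the PyTorch-style checkpoint round trip; `policy`
    # stands for a constructed DI-engine policy and the path is a placeholder.
    import torch

    # Save the learn-mode state dict (model weights, optimizer state, ...).
    torch.save(policy.learn_mode.state_dict(), './ckpt/ckpt_best.pth.tar')

    # Restore it later, e.g. to resume training or before evaluation.
    state_dict = torch.load('./ckpt/ckpt_best.pth.tar', map_location='cpu')
    policy.learn_mode.load_state_dict(state_dict)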
To deploy this in a more elegant way, DI-engine is configured to use
:class:`LearnerHook <ding.worker.learner.learner_hook.LearnerHook>` to handle these cases. The saving hook is
automatically called after training iterations, and to load & save checkpoints at the beginning and
at the end, users can simply add one line of code before & after training, as follows.
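A hedged sketch of that pattern in a serial entry script (``call_hook`` and the hook names follow common DI-engine usage; the loop body is elided):

.. code-block:: python

    # Sketch: with load_ckpt_before_run / save_ckpt_after_run configured, these
    # single lines load and save checkpoints around the training loop.
    learner.call_hook('before_run')   # e.g. triggers the load-checkpoint hook

    for _ in range(max_iterations):
        ...  # collect data and update the policy

    learner.call_hook('after_run')    # e.g. triggers the save-checkpoint hook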
Tensorboard and Logging demo
==============================

In this page, the default TensorBoard and logging information is described in detail.
A ``CartPole DQN`` experiment is used as an example.

Tensorboard info
------------------

There are 4 main parts in TensorBoard: buffer, collector, learner and evaluator, corresponding to the 4 modules.