2020-08-14 18:21:38

b6158a0e · wizardforcel · 2d3fcef2 · b6158a0e

Showing with 19 additions and 19 deletions

docs/tf-1x-dl-cookbook/09.md docs/tf-1x-dl-cookbook/09.md +19 -19

--- a/docs/tf-1x-dl-cookbook/09.md
+++ b/docs/tf-1x-dl-cookbook/09.md
@@ -47,7 +47,7 @@ Adapted from Reinforcement Learning: an Introduction by Sutton and BartoEven our

 # Getting ready

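-The first thing to do is install OpenAI Gym; `pip install gym` performs a minimal installation. OpenAI Gym provides a variety of environments, such as Atari, board games, and 2D or 3D physics engines. The minimal installation runs on Windows and supports only the basic environments (algorithmic, `toy_text`, and `classic_control`); exploring other environments requires more dependencies. The full installation is supported on OS X and Ubuntu. Detailed instructions can be read at OpenAI Gym's GitHub link ( [https://github.com/openai/gym#installing-dependencies-for-specific-environments](https://github.com/openai/gym#installing-dependencies-for-specific-environments) ).
+The first thing to do is install OpenAI Gym; `pip install gym` performs a minimal installation. OpenAI Gym provides a variety of environments, such as Atari, board games, and 2D or 3D physics engines. The minimal installation runs on Windows and supports only the basic environments (algorithmic, `toy_text`, and `classic_control`); exploring other environments requires more dependencies. The full installation is supported on OS X and Ubuntu. Detailed instructions can be read at OpenAI Gym's [GitHub link](https://github.com/openai/gym#installing-dependencies-for-specific-environments).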

 # How to do it...

@@ -121,7 +121,7 @@ env.close()

 # There's more...

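-OpenAI Gym consists of many different environments, and its active contributor community keeps adding more. To get a list of all existing environments, you can run the following simple code (taken from [https://github.com/openai/gym](https://github.com/openai/gym) ):
+OpenAI Gym consists of many different environments, and its active contributor community keeps adding more. To get a list of all existing environments, you can run [the following simple code](https://github.com/openai/gym):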

 ```py
 from gym import envs
@@ -137,9 +137,9 @@ for env_id in sorted(env_ids):

 # See also

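-*   Details about the different environments can be obtained from [https://gym.openai.com/envs](https://gym.openai.com/envs).
-*   [Wiki pages are maintained for some environments at https://github.com/openai/gym/wiki](https://github.com/openai/gym/wiki)
-*   Details on installation instructions and dependencies can be obtained from [https://github.com/openai/gym](https://github.com/openai/gym).
+*   Details about the different environments can be obtained from [here](https://gym.openai.com/envs).
+*   Wiki pages are maintained for some environments at [this link](https://github.com/openai/gym/wiki)
+*   Details on installation instructions and dependencies can be obtained from [this link](https://github.com/openai/gym).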

 # Implementing a neural network agent to play Pac-Man

@@ -298,7 +298,7 @@ The simplest implementation of Q-learning involves maintaining and updating a state-action value lookup table; the table

 # Getting ready

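-We will train a linear neural network to solve the `'CartPole-v0'` environment ( [https://github.com/openai/gym/wiki/CartPole-v0](https://github.com/openai/gym/wiki/CartPole-v0) ). The goal is to balance the pole on the cart. The observation state consists of four continuous-valued parameters: cart position [-2.4, 2.4], cart velocity [-∞, ∞], pole angle [~-41.8º, ~41.8º], and pole velocity at tip [-∞] , ∞]. Balance is achieved by pushing the cart to the left or right, so the action space consists of two possible actions. You can see the `CartPole-v0` environment space:
+We will train a linear neural network to solve the [`'CartPole-v0'` environment](https://github.com/openai/gym/wiki/CartPole-v0). The goal is to balance the pole on the cart. The observation state consists of four continuous-valued parameters: cart position `[-2.4, 2.4]`, cart velocity `[-∞, ∞]`, pole angle `[~-41.8º, ~41.8º]`, and pole velocity at tip `[-∞, ∞]`. Balance is achieved by pushing the cart to the left or right, so the action space consists of two possible actions. You can see the `CartPole-v0` environment space: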

 ![](img/e726841a-c270-47f9-a19d-0c1d342e87cd.png)

@@ -490,14 +490,14 @@ if __name__ == '__main__':

 Although there are many web links on Q-learning, some useful ones are as follows:

-*   https://zh.wikipedia.org/wiki/Q 学习
-*   [http://mnemstudio.org/path-finding-q-learning-tutorial.htm](http://mnemstudio.org/path-finding-q-learning-tutorial.htm)
-*   [http://artint.info/html/ArtInt_265.html](http://artint.info/html/ArtInt_265.html)
-*   [https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0](https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0)
+*   https://zh.wikipedia.org/wiki/Q-Learning
+*   <http://mnemstudio.org/path-finding-q-learning-tutorial.htm>
+*   <http://artint.info/html/ArtInt_265.html>
+*   <https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0>

 # Playing Atari games using Deep Q Networks

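-A **deep Q network** (**DQN**) combines Q-learning with **convolutional neural networks** (**CNNs**). Others in 2013 ( [https://arxiv.org/pdf/1312.5602.pdf](https://arxiv.org/pdf/1312.5602.pdf) ). CNNs can extract spatial information and can therefore learn successful control policies from raw pixel data. We already worked with CNNs in Chapter 4, *Convolutional Neural Networks*, so we start directly here.
+A **deep Q network** (**DQN**) combines Q-learning with **convolutional neural networks** (**CNNs**). [It was proposed by Mnih et al. in 2013](https://arxiv.org/pdf/1312.5602.pdf). CNNs can extract spatial information and can therefore learn successful control policies from raw pixel data. We already worked with CNNs in Chapter 4, *Convolutional Neural Networks*, so we start directly here.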

 This recipe is based on the original DQN paper, Playing Atari with Deep Reinforcement Learning by DeepMind. In the paper, they used a concept called **experience replay**, which involved randomly sampling the previous game moves (state, action, reward, next state).

@@ -839,9 +839,9 @@ env = wrappers.Monitor(env, '/save-path')

 # See also

-*   Mnih, Volodymyr, et al., Playing Atari with deep reinforcement learning, arXiv preprint arXiv:1312.5602 (2013) ( [https://arxiv.org/pdf/1312.5602.pdf](https://arxiv.org/pdf/1312.5602.pdf) )
-*   Mnih, Volodymyr, et al. Human-level control through deep reinforcement learning, Nature 518.7540 (2015): 529-533
-*   A cool implementation of DQN to play Atari: [https://github.com/devsisters/DQN-tensorflow](https://github.com/devsisters/DQN-tensorflow)
+*   `Mnih, Volodymyr, and others, Playing Atari with deep reinforcement learning, arXiv preprint arXiv:1312.5602 (2013) (https://arxiv.org/pdf/1312.5602.pdf)`
+*   `Mnih, Volodymyr, et al. Human-level control through deep reinforcement learning, Nature 518.7540 (2015): 529-533`
+*   [A cool implementation of DQN to play Atari](https://github.com/devsisters/DQN-tensorflow)

 # Policy gradients to play the game of Pong

@@ -879,7 +879,7 @@ def save(self):

 # How to do it...

-1.  The code for this recipe is based on the Andrej Karpathy blog ( [http://karpathy.github.io/2016/05/31/rl/](http://karpathy.github.io/2016/05/31/rl/) ), and part of it has been adapted from code by Sam Greydanus ( [https://gist.github.com/karpathy/a4166c7fe253700972fcbc77e4ea32c5](https://gist.github.com/karpathy/a4166c7fe253700972fcbc77e4ea32c5) ).
+1.  The code for this recipe is based on the [Andrej Karpathy blog](http://karpathy.github.io/2016/05/31/rl/), and part of it has been adapted from code by [Sam Greydanus](https://gist.github.com/karpathy/a4166c7fe253700972fcbc77e4ea32c5).
 2.  We have the usual imports:

 ```py
@@ -1080,7 +1080,7 @@ AlphaGo Zero uses a deep neural network that takes the raw board representation (

 # See also

-*   [https://arxiv.org/pdf/1602.01783.pdf](https://arxiv.org/pdf/1602.01783.pdf)
-*   [http://ufal.mff.cuni.cz/~straka/courses/npfl114/2016/sutton-bookdraft2016sep.pdf](http://ufal.mff.cuni.cz/~straka/courses/npfl114/2016/sutton-bookdraft2016sep.pdf)
-*   [http://karpathy.github.io/2016/05/31/rl/](http://karpathy.github.io/2016/05/31/rl/)
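-*   Xavier Glorot and Yoshua Bengio, “Understanding the difficulty of training deep feedforward neural networks”, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010, [http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf](http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf)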
\ No newline at end of file
+*   <https://arxiv.org/pdf/1602.01783.pdf>
+*   <http://ufal.mff.cuni.cz/~straka/courses/npfl114/2016/sutton-bookdraft2016sep.pdf>
+*   <http://karpathy.github.io/2016/05/31/rl/>
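+*   Xavier Glorot and Yoshua Bengio, [“Understanding the difficulty of training deep feedforward neural networks”](http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf), Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010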
\ No newline at end of file
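
The context line in the DQN hunk describes experience replay as randomly sampling previous game moves stored as (state, action, reward, next state). As a rough illustration only, such a buffer might look like the sketch below. It is not the recipe's code: the names `Transition` and `ReplayBuffer`, the `capacity` parameter, and the integer stand-ins for states are all assumptions made for this example.

```py
import random
from collections import deque, namedtuple

# One stored game move. The field names are illustrative, not from the recipe.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])


class ReplayBuffer:
    """Hypothetical fixed-size store of past moves with uniform sampling."""

    def __init__(self, capacity):
        # A deque with maxlen discards the oldest entry once it is full.
        self.memory = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self.memory.append(Transition(state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement, so a batch mixes moves
        # from different points in the stored history.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


# Integers stand in for real observations (assumption for the demo).
buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.add(state=step, action=step % 2, reward=1.0, next_state=step + 1)

batch = buffer.sample(batch_size=2)
```

With a capacity of 3 and five moves added, only the three most recent moves remain available for sampling; a real agent would typically use a much larger capacity and store preprocessed frames instead of integers.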