Commit e8797bd0 (unverified)
Authored June 16, 2020 by rical730; committed via GitHub on June 16, 2020
Parent: 2deefa8f

update tutorials (#298)

* update tutorials
Showing 3 changed files with 26 additions and 20 deletions (+26 / -20):

  examples/tutorials/lesson3/dqn/train.py               +2   -2
  examples/tutorials/lesson4/policy_gradient/train.py   +16  -12
  examples/tutorials/lesson5/ddpg/train.py              +8   -6
examples/tutorials/lesson3/dqn/train.py

@@ -21,7 +21,7 @@ import parl
 from parl.utils import logger  # logging utility
 from model import Model
-from algorithm import DQN
+from algorithm import DQN  # from parl.algorithms import DQN  # parl >= 1.3.1
 from agent import Agent
 from replay_memory import ReplayMemory

@@ -117,7 +117,7 @@ def main():
         # test part
         eval_reward = evaluate(env, agent, render=True)  # render=True to watch the run
-        logger.info('episode:{}  e_greed:{}  test_reward:{}'.format(
+        logger.info('episode:{}  e_greed:{}  Test reward:{}'.format(
             episode, agent.e_greed, eval_reward))

     # training is finished; save the model
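The inline comment added here points at the packaged algorithm for newer releases. A minimal sketch of a version-tolerant import, assuming the tutorial's local algorithm.py and the packaged parl.algorithms module expose the same DQN class (the try/except glue is illustrative, not part of this commit):

    # Prefer the packaged DQN on parl >= 1.3.1; fall back to the tutorial's
    # local algorithm.py on older installs (assumed to define the same class).
    try:
        from parl.algorithms import DQN  # parl >= 1.3.1
    except ImportError:
        from algorithm import DQN  # local copy shipped with the tutorial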
examples/tutorials/lesson4/policy_gradient/train.py

@@ -28,6 +28,7 @@ from parl.utils import logger
 LEARNING_RATE = 1e-3

+# run one training episode
 def run_episode(env, agent):
     obs_list, action_list, reward_list = [], [], []
     obs = env.reset()

@@ -44,19 +45,22 @@ def run_episode(env, agent):
     return obs_list, action_list, reward_list

-# evaluate the agent by running 1 episode
+# evaluate the agent: run 5 episodes and average the total reward
 def evaluate(env, agent, render=False):
-    obs = env.reset()
-    episode_reward = 0
-    while True:
-        action = agent.predict(obs)
-        obs, reward, isOver, _ = env.step(action)
-        episode_reward += reward
-        if render:
-            env.render()
-        if isOver:
-            break
-    return episode_reward
+    eval_reward = []
+    for i in range(5):
+        obs = env.reset()
+        episode_reward = 0
+        while True:
+            action = agent.predict(obs)
+            obs, reward, isOver, _ = env.step(action)
+            episode_reward += reward
+            if render:
+                env.render()
+            if isOver:
+                break
+        eval_reward.append(episode_reward)
+    return np.mean(eval_reward)

 def calc_reward_to_go(reward_list, gamma=1.0):
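The evaluate rewrite replaces a single rollout with the mean over five. Since one stochastic episode return is noisy, averaging n independent returns shrinks the standard deviation of the estimate by a factor of sqrt(n). A tiny self-contained illustration with synthetic returns (the numbers below are made up, not from the tutorial):

    import numpy as np

    rng = np.random.default_rng(0)
    returns = rng.normal(loc=200.0, scale=50.0, size=(1000, 5))  # fake episode returns
    single = returns[:, 0].std()           # spread of a 1-episode estimate
    averaged = returns.mean(axis=1).std()  # spread of the 5-episode mean
    print(single, averaged)  # the averaged estimate is ~sqrt(5) = 2.24x tighter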
examples/tutorials/lesson5/ddpg/train.py

@@ -37,7 +37,8 @@ NOISE = 0.05  # variance of the action noise
 TRAIN_EPISODE = 6e3  # total number of training episodes

-def run_train_episode(agent, env, rpm):
+# run one training episode
+def run_episode(agent, env, rpm):
     obs = env.reset()
     total_reward = 0
     steps = 0

@@ -68,7 +69,8 @@ def run_train_episode(agent, env, rpm):
     return total_reward

-def run_evaluate_episode(env, agent, render=False):
+# evaluate the agent: run 5 episodes and average the total reward
+def evaluate(env, agent, render=False):
     eval_reward = []
     for i in range(5):
         obs = env.reset()

@@ -109,16 +111,16 @@ def main():
     rpm = ReplayMemory(MEMORY_SIZE)

     # pre-fill the replay memory with experience
     while len(rpm) < MEMORY_WARMUP_SIZE:
-        run_train_episode(agent, env, rpm)
+        run_episode(agent, env, rpm)

     episode = 0
     while episode < TRAIN_EPISODE:
         for i in range(50):
-            total_reward = run_train_episode(agent, env, rpm)
+            total_reward = run_episode(agent, env, rpm)
             episode += 1

-        eval_reward = run_evaluate_episode(env, agent, render=False)
-        logger.info('episode:{}  test_reward:{}'.format(
+        eval_reward = evaluate(env, agent, render=False)
+        logger.info('episode:{}  Test reward:{}'.format(
             episode, eval_reward))
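The warm-up loop only needs the buffer to report its length. A simplified stand-in for the tutorial's ReplayMemory showing the same warm-up pattern (the names MEMORY_SIZE and MEMORY_WARMUP_SIZE follow the tutorial; the deque-based buffer and placeholder transitions are assumptions, not PARL's implementation):

    import collections
    import random

    MEMORY_SIZE = 10000       # buffer capacity
    MEMORY_WARMUP_SIZE = 200  # experiences to collect before training starts

    class ReplayMemory:
        # Minimal stand-in: append transitions, sample mini-batches, report size.
        def __init__(self, max_size):
            self.buffer = collections.deque(maxlen=max_size)

        def append(self, exp):
            self.buffer.append(exp)

        def sample(self, batch_size):
            return random.sample(list(self.buffer), batch_size)

        def __len__(self):
            return len(self.buffer)

    rpm = ReplayMemory(MEMORY_SIZE)
    # Warm-up: fill the buffer with (obs, action, reward, next_obs, done) tuples
    # before learning begins, mirroring run_episode's role in the tutorial.
    while len(rpm) < MEMORY_WARMUP_SIZE:
        rpm.append((0.0, 0, 0.0, 0.0, False))  # placeholder transition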