PaddlePaddle / PARL
Unverified commit f46ad361
Authored by Hongsheng Zeng on Mar 25, 2020
Committed by GitHub on Mar 25, 2020
Parent: 8c9bf1fa

fix a2c cannot run in paddle 1.6.0 (#232)

* fix a2c cannot run in paddle 1.6.0
* fix impala compatibility
* yapf
Showing 6 changed files with 13 additions and 6 deletions:

- examples/A2C/atari_agent.py (+4, -1)
- examples/IMPALA/atari_agent.py (+4, -1)
- examples/IMPALA/train.py (+1, -1)
- examples/LiftSim_baseline/A2C/lift_agent.py (+4, -1)
- parl/algorithms/fluid/a3c.py (+0, -1)
- parl/algorithms/fluid/impala/impala.py (+0, -1)
examples/A2C/atari_agent.py

@@ -71,7 +71,10 @@ class AtariAgent(parl.Agent):
             lr = layers.data(
                 name='lr', shape=[1], dtype='float32', append_batch_size=False)
-            entropy_coeff = layers.data(
-                name='entropy_coeff', shape=[], dtype='float32')
+            entropy_coeff = layers.data(
+                name='entropy_coeff',
+                shape=[1],
+                dtype='float32',
+                append_batch_size=False)
             total_loss, pi_loss, vf_loss, entropy = self.alg.learn(
                 obs, actions, advantages, target_values, lr, entropy_coeff)
examples/IMPALA/atari_agent.py

@@ -58,7 +58,10 @@ class AtariAgent(parl.Agent):
             lr = layers.data(
                 name='lr', shape=[1], dtype='float32', append_batch_size=False)
-            entropy_coeff = layers.data(
-                name='entropy_coeff', shape=[], dtype='float32')
+            entropy_coeff = layers.data(
+                name='entropy_coeff',
+                shape=[1],
+                dtype='float32',
+                append_batch_size=False)
             self.learn_reader = fluid.layers.create_py_reader_by_data(
                 capacity=32,
examples/IMPALA/train.py

@@ -123,7 +123,7 @@ class Learner(object):
             obs_np, actions_np, behaviour_logits_np, rewards_np,
             dones_np,
             np.float32(self.lr),
-            np.float32(self.entropy_coeff)
+            np.array([self.entropy_coeff], dtype='float32')
         ]

     def run_learn(self):
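The feed change above swaps a NumPy scalar for a 1-element array. A minimal NumPy sketch of the distinction (an analogy only, not Paddle code): a scalar has shape (), while the new feed has shape (1,), matching a data layer now declared with shape=[1] and append_batch_size=False.

```python
import numpy as np

# entropy_coeff value is illustrative, not taken from the commit.
entropy_coeff = 0.01

old_feed = np.float32(entropy_coeff)                 # scalar, shape ()
new_feed = np.array([entropy_coeff], dtype='float32')  # vector, shape (1,)

print(np.shape(old_feed))  # ()
print(new_feed.shape)      # (1,)
```

This shape mismatch between what the program declared and what the learner fed is, presumably, why the example stopped running under paddle 1.6.0.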
examples/LiftSim_baseline/A2C/lift_agent.py

@@ -67,7 +67,10 @@ class LiftAgent(parl.Agent):
             lr = layers.data(
                 name='lr', shape=[1], dtype='float32', append_batch_size=False)
-            entropy_coeff = layers.data(
-                name='entropy_coeff', shape=[], dtype='float32')
+            entropy_coeff = layers.data(
+                name='entropy_coeff',
+                shape=[1],
+                dtype='float32',
+                append_batch_size=False)
             total_loss, pi_loss, vf_loss, entropy = self.alg.learn(
                 obs, actions, advantages, target_values, lr, entropy_coeff)
parl/algorithms/fluid/a3c.py

@@ -72,7 +72,6 @@ class A3C(Algorithm):
             policy_entropy = policy_distribution.entropy()
             entropy = layers.reduce_sum(policy_entropy)
-            entropy_coeff = layers.reshape(entropy_coeff, shape=[1])
             total_loss = (pi_loss + vf_loss * self.vf_loss_coeff +
                           entropy * entropy_coeff)
parl/algorithms/fluid/impala/impala.py

@@ -78,7 +78,6 @@ class VTraceLoss(object):
         self.entropy = layers.reduce_sum(policy_entropy)

         # The summed weighted loss
-        entropy_coeff = layers.reshape(entropy_coeff, shape=[1])
         self.total_loss = (self.pi_loss + self.vf_loss * vf_loss_coeff +
                            self.entropy * entropy_coeff)
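In the two algorithm files, the reshape of entropy_coeff to [1] is dropped. A NumPy sketch of why it appears redundant once the coefficient is already fed with shape (1,) (an analogy only; this assumes Paddle's elementwise multiply broadcasts like NumPy):

```python
import numpy as np

# Values are illustrative, not taken from the commit.
entropy = np.float32(-2.5)                        # reduce_sum output: a scalar
entropy_coeff = np.array([0.01], dtype='float32')  # now fed with shape (1,)

# Scalar * shape-(1,) array already broadcasts to shape (1,);
# no reshape of entropy_coeff is needed first.
total = entropy * entropy_coeff
print(total.shape)  # (1,)
```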