OpenDILab open-source decision intelligence platform / DI-engine
Commit 6e345566
Authored Oct 22, 2021 by PaParaZz1
Deploying to gh-pages from @ aa91fa603c10aa9d51eaa2ea55f2f3cee7831340
🚀
Parent 281ffe75
Showing 1 changed file with 7 additions and 4 deletions
_modules/ding/policy/ppo.html (+7 -4)
@@ -200,6 +200,8 @@
 <span class="n">recompute_adv</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
 <span class="n">continuous</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
 <span class="n">multi_agent</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
+<span class="c1"># (bool) Whether to need policy data in process transition</span>
+<span class="n">transition_with_policy_data</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
 <span class="n">learn</span><span class="o">=</span><span class="nb">dict</span><span class="p">(</span>
 <span class="c1"># (bool) Whether to use multi gpu</span>
 <span class="n">multi_gpu</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
@@ -509,7 +511,7 @@
 <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">)):</span>
 <span class="n">data</span><span class="p">[</span><span class="n">i</span><span class="p">][</span><span class="s1">'value'</span><span class="p">]</span> <span class="o">*=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_running_mean_std</span><span class="o">.</span><span class="n">std</span>
 <span class="n">data</span> <span class="o">=</span> <span class="n">get_gae</span><span class="p">(</span>
-<span class="n">data</span><span class="p">,</span> <span class="n">to_device</span><span class="p">(</span><span class="n">last_value</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_device</span><span class="p">),</span> <span class="n">gamma</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">_gamma</span><span class="p">,</span> <span class="n">gae_lambda</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">_gae_lambda</span><span class="p">,</span> <span class="n">cuda</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">_cuda</span>
+<span class="n">data</span><span class="p">,</span> <span class="n">to_device</span><span class="p">(</span><span class="n">last_value</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_device</span><span class="p">),</span> <span class="n">gamma</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">_gamma</span><span class="p">,</span> <span class="n">gae_lambda</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">_gae_lambda</span><span class="p">,</span> <span class="n">cuda</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
 <span class="p">)</span>
 <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">_value_norm</span><span class="p">:</span>
 <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">)):</span>
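The hunk above rescales each stored value by the running standard deviation and then calls `get_gae` with `cuda=False`. The body of `get_gae` is not part of this diff, so the following is only a minimal sketch of standard Generalized Advantage Estimation on plain Python lists. The names `gae_advantages` and `value_std` are hypothetical, and the sketch assumes a single trajectory with no terminal step and a `last_value` that is already on the un-normalized scale.

```python
from typing import List


def gae_advantages(
    rewards: List[float],
    values: List[float],
    last_value: float,
    gamma: float = 0.99,
    gae_lambda: float = 0.95,
    value_std: float = 1.0,
) -> List[float]:
    """Hypothetical helper, not ding's get_gae: GAE over one trajectory."""
    # Undo value normalization first, mirroring `data[i]['value'] *= std` in the hunk.
    values = [v * value_std for v in values]
    # Bootstrap the step after the trajectory with last_value (assumed un-normalized).
    next_values = values[1:] + [last_value]
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Accumulate discounted TD errors from the last step back to the first.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_values[t] - values[t]
        running = delta + gamma * gae_lambda * running
        advantages[t] = running
    return advantages


adv = gae_advantages([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], 0.0, gamma=0.5, gae_lambda=1.0)
```

The sketch runs entirely on the CPU with built-in types, which loosely parallels the commit's switch to `cuda=False` for this preprocessing step; the actual reason for that change is not stated in the commit.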
@@ -592,8 +594,7 @@
 <span class="nb">type</span><span class="o">=</span><span class="s1">'ppo'</span><span class="p">,</span>
 <span class="c1"># (bool) Whether to use cuda for network.</span>
 <span class="n">cuda</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
-<span class="c1"># (bool) Whether the RL algorithm is on-policy or off-policy. (Note: in practice PPO can be off-policy used)</span>
-<span class="n">on_policy</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
+<span class="n">on_policy</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
 <span class="c1"># (bool) Whether to use priority(priority sample, IS weight, update priority)</span>
 <span class="n">priority</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
 <span class="c1"># (bool) Whether use Importance Sampling Weight to correct biased update. If True, priority must be True.</span>
@@ -601,6 +602,8 @@
 <span class="c1"># (bool) Whether to use nstep_return for value loss</span>
 <span class="n">nstep_return</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
 <span class="n">nstep</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span>
+<span class="c1"># (bool) Whether to need policy data in process transition</span>
+<span class="n">transition_with_policy_data</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
 <span class="n">learn</span><span class="o">=</span><span class="nb">dict</span><span class="p">(</span>
 <span class="c1"># (bool) Whether to use multi gpu</span>
 <span class="n">multi_gpu</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
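The hunk above adds `transition_with_policy_data=True` to a nested default config. How DI-engine combines these defaults with user settings is not shown in this diff; the sketch below only illustrates the general idea of overlaying a user config on nested defaults, so that a newly added default key is inherited unless explicitly overridden. `DEFAULT_CONFIG`, `deep_merge`, and `user_config` are hypothetical names, and only keys visible in the hunk are reproduced.

```python
from copy import deepcopy

# Defaults limited to the keys visible in the hunk above (an assumption, not the full config).
DEFAULT_CONFIG = dict(
    nstep_return=False,
    nstep=3,
    transition_with_policy_data=True,
    learn=dict(multi_gpu=False),
)


def deep_merge(default: dict, override: dict) -> dict:
    """Hypothetical helper: recursively overlay `override` on a copy of `default`."""
    merged = deepcopy(default)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key instead of replacing them wholesale.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# The user overrides two settings; everything else falls back to the defaults.
user_config = dict(nstep=5, learn=dict(multi_gpu=True))
config = deep_merge(DEFAULT_CONFIG, user_config)
```

Because the merge copies the defaults before overlaying, `DEFAULT_CONFIG` itself stays unchanged and can be reused for other policies.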
@@ -844,7 +847,7 @@
 <span class="n">data</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">][</span><span class="s1">'done'</span><span class="p">],</span>
 <span class="n">gamma</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">_gamma</span><span class="p">,</span>
 <span class="n">gae_lambda</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">_gae_lambda</span><span class="p">,</span>
-<span class="n">cuda</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">_cuda</span><span class="p">,</span>
+<span class="n">cuda</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
 <span class="p">)</span>
 <span class="k">if</span> <span class="ow">not</span> <span class="bp">self</span><span class="o">.</span><span class="n">_nstep_return</span><span class="p">:</span>
 <span class="k">return</span> <span class="n">get_train_sample</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_unroll_len</span><span class="p">)</span>