OpenDILab open-source decision intelligence platform / DI-engine
Commit 47315983
Authored Aug 11, 2021 by niuyazhe

hotfix(nyz): fix lunarlander dqn config and get formatted config

Parent: fd908cdc
Showing 2 changed files with 56 additions and 12 deletions.
ding/config/utils.py  +52 −5
dizoo/box2d/lunarlander/config/lunarlander_dqn_config.py  +4 −7
ding/config/utils.py @ 47315983
...
@@ -349,12 +349,59 @@ def save_config_formatted(config_: dict, path: str = 'formatted_total_config.py'
             f.write("        replay_buffer=dict(\n")
             for k4, v4 in v3.items():
                 if (k4 != 'monitor' and k4 != 'thruput_controller'):
-                    if (isinstance(v4, str)):
-                        f.write("            {}='{}',\n".format(k4, v4))
-                    elif v4 == float('inf'):
-                        f.write("            {}=float('{}'),\n".format(k4, v4))
+                    if (isinstance(v4, dict)):
+                        f.write("            {}=dict(\n".format(k4))
+                        for k5, v5 in v4.items():
+                            if (isinstance(v5, str)):
+                                f.write("                {}='{}',\n".format(k5, v5))
+                            elif v5 == float('inf'):
+                                f.write("                {}=float('{}'),\n".format(k5, v5))
+                            elif (isinstance(v5, dict)):
+                                f.write("                {}=dict(\n".format(k5))
+                                for k6, v6 in v5.items():
+                                    if (isinstance(v6, str)):
+                                        f.write("                    {}='{}',\n".format(k6, v6))
+                                    elif v6 == float('inf'):
+                                        f.write("                    {}=float('{}'),\n".format(k6, v6))
+                                    elif (isinstance(v6, dict)):
+                                        f.write("                    {}=dict(\n".format(k6))
+                                        for k7, v7 in v6.items():
+                                            if (isinstance(v7, str)):
+                                                f.write("                        {}='{}',\n".format(k7, v7))
+                                            elif v7 == float('inf'):
+                                                f.write("                        {}=float('{}'),\n".format(k7, v7))
+                                            else:
+                                                f.write("                        {}={},\n".format(k7, v7))
+                                        f.write("                    ),\n")
+                                    else:
+                                        f.write("                    {}={},\n".format(k6, v6))
+                                f.write("                ),\n")
+                            else:
+                                f.write("                {}={},\n".format(k5, v5))
+                        f.write("            ),\n")
                     else:
-                        f.write("            {}={},\n".format(k4, v4))
+                        if (isinstance(v4, str)):
+                            f.write("            {}='{}',\n".format(k4, v4))
+                        elif v4 == float('inf'):
+                            f.write("            {}=float('{}'),\n".format(k4, v4))
+                        else:
+                            f.write("            {}={},\n".format(k4, v4))
                 else:
                     if (k4 == 'monitor'):
                         f.write("            monitor=dict(\n")
...
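The change above teaches save_config_formatted to serialize dict values nested inside replay_buffer (while keeping str and float('inf') values quoted correctly), but it does so by unrolling the recursion by hand, one loop per depth level (k4 through k7). The same output could be produced at arbitrary depth by a single recursive helper. The sketch below is an illustration only, not code from this commit; write_config_dict is a hypothetical name:

def write_config_dict(f, cfg: dict, indent: int = 12) -> None:
    # Write one dict level as Python-literal config text, mirroring the
    # str / float('inf') / dict / default branches in the diff above.
    pad = ' ' * indent
    for k, v in cfg.items():
        if isinstance(v, dict):
            f.write("{}{}=dict(\n".format(pad, k))
            write_config_dict(f, v, indent + 4)  # recurse instead of unrolling k5..k7
            f.write("{}),\n".format(pad))
        elif isinstance(v, str):
            f.write("{}{}='{}',\n".format(pad, k, v))
        elif v == float('inf'):
            f.write("{}{}=float('{}'),\n".format(pad, k, v))
        else:
            f.write("{}{}={},\n".format(pad, k, v))

One recursive call per nesting level would replace all four hand-written loops, at the cost of diverging from the explicit style used elsewhere in save_config_formatted.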
dizoo/box2d/lunarlander/config/lunarlander_dqn_config.py @ 47315983
...
@@ -17,11 +17,10 @@ lunarlander_dqn_default_config = dict(
         cuda=False,
         # Whether the RL algorithm is on-policy or off-policy.
         on_policy=False,
-        # Model config used for model creating. Remember to change this, especially "obs_dim" and "action_dim" according to specific env.
         model=dict(
-            obs_dim=8,
-            action_dim=4,
-            encoder_hidden_dim_list=[512, 64],
+            obs_shape=8,
+            action_shape=4,
+            encoder_hidden_size_list=[512, 64],
             # Whether to use dueling head.
             dueling=True,
         ),
...
@@ -31,8 +30,6 @@ lunarlander_dqn_default_config = dict(
         nstep=nstep,
         # learn_mode config
         learn=dict(
-            # How many steps to train after collector's one collection. Bigger "train_iteration" means bigger off-policy.
-            # collect data -> train fixed steps -> collect data -> ...
             update_per_collect=10,
             batch_size=64,
             learning_rate=0.001,
...
@@ -55,7 +52,7 @@ lunarlander_dqn_default_config = dict(
             type='exp',
             start=0.95,
             end=0.1,
-            decay=50_000,
+            decay=50000,
         ),
         replay_buffer=dict(replay_buffer_size=100000, )
     ),
...
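The renames bring this config in line with the keyword names used elsewhere in DI-engine's model configs (obs_shape, action_shape, encoder_hidden_size_list), and the values themselves match the environment. A quick sanity check, illustrative only and assuming gym is installed with the Box2D extra:

import gym

# LunarLander-v2 exposes an 8-dimensional observation vector and 4 discrete
# actions, which is exactly what obs_shape=8 and action_shape=4 encode.
env = gym.make('LunarLander-v2')
assert env.observation_space.shape == (8, )
assert env.action_space.n == 4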