Skip to content
体验新版
项目
组织
正在加载...
登录
切换导航
打开侧边栏
BaiXuePrincess
Paddle
提交
49cce3fd
P
Paddle
项目概览
BaiXuePrincess
/
Paddle
与 Fork 源项目一致
Fork自
PaddlePaddle / Paddle
通知
1
Star
1
Fork
0
代码
文件
提交
分支
Tags
贡献者
分支图
Diff
Issue
0
列表
看板
标记
里程碑
合并请求
0
Wiki
0
Wiki
分析
仓库
DevOps
项目成员
Pages
P
Paddle
项目概览
项目概览
详情
发布
仓库
仓库
文件
提交
分支
标签
贡献者
分支图
比较
Issue
0
Issue
0
列表
看板
标记
里程碑
合并请求
0
合并请求
0
Pages
分析
分析
仓库分析
DevOps
Wiki
0
Wiki
成员
成员
收起侧边栏
关闭侧边栏
动态
分支图
创建新Issue
提交
Issue看板
提交
49cce3fd
编写于
12月 28, 2018
作者:
Q
Qiao Longfei
浏览文件
操作
浏览文件
下载
电子邮件补丁
差异文件
fix dist sparse l2 decay
test=develop
上级
dc8eca82
变更
2
隐藏空白更改
内联
并排
Showing
2 changed file
with
13 addition
and
12 deletion
+13
-12
python/paddle/fluid/tests/unittests/dist_se_resnext.py
python/paddle/fluid/tests/unittests/dist_se_resnext.py
+0
-1
python/paddle/fluid/transpiler/distribute_transpiler.py
python/paddle/fluid/transpiler/distribute_transpiler.py
+13
-11
未找到文件。
python/paddle/fluid/tests/unittests/dist_se_resnext.py
浏览文件 @
49cce3fd
...
...
@@ -235,7 +235,6 @@ class DistSeResneXt2x2(TestDistRunnerBase):
bd
=
[
step
*
e
for
e
in
epochs
]
base_lr
=
0.1
lr
=
[]
lr
=
[
base_lr
*
(
0.1
**
i
)
for
i
in
range
(
len
(
bd
)
+
1
)]
optimizer
=
fluid
.
optimizer
.
Momentum
(
...
...
python/paddle/fluid/transpiler/distribute_transpiler.py
浏览文件 @
49cce3fd
...
...
@@ -744,12 +744,6 @@ class DistributeTranspiler(object):
elif
op
not
in
lr_ops
:
self
.
_append_pserver_non_opt_ops
(
block
,
op
)
def
__op_have_grad_input__
(
op
):
for
varname
in
op
.
input_arg_names
:
if
varname
.
find
(
"@GRAD"
)
>=
0
:
return
varname
return
""
def
__clone_lr_op_sub_block__
(
op
,
program
,
lr_block
):
if
not
op
.
has_attr
(
'sub_block'
):
return
...
...
@@ -800,7 +794,7 @@ class DistributeTranspiler(object):
merged_var
=
None
for
_
,
op
in
enumerate
(
self
.
optimize_ops
):
# find the origin grad var before clipping/L2Decay,
# merged_var should be the input var name of L2Decay
buil
# merged_var should be the input var name of L2Decay
grad_varname_for_block
=
op
.
attr
(
OP_ROLE_VAR_ATTR_NAME
)[
1
]
if
op
.
attr
(
OP_ROLE_VAR_ATTR_NAME
)[
0
]
==
optimize_target_param_name
:
...
...
@@ -1278,9 +1272,8 @@ class DistributeTranspiler(object):
# create table param and grad var in pserver program
# create table optimize block in pserver program
table_opt_op
=
[
op
for
op
in
self
.
optimize_ops
if
'Param'
in
op
.
input_names
and
op
.
input
(
"Param"
)[
0
]
==
self
.
table_name
op
for
op
in
self
.
optimize_ops
if
'Param'
in
op
.
input_names
and
op
.
input
(
"Param"
)[
0
]
==
self
.
table_name
][
0
]
origin_param_var
=
self
.
origin_program
.
global_block
().
vars
[
...
...
@@ -1676,7 +1669,16 @@ class DistributeTranspiler(object):
if
self
.
config
.
enable_dc_asgd
:
new_inputs
[
key
]
=
dc
else
:
new_inputs
[
key
]
=
merged_var
# Note!! This is for l2decay on sparse gradient, because it will create a new tensor for
# decayed gradient but not inplace modify the origin one
origin_grad_name
=
opt_op
.
input
(
key
)[
0
]
if
core
.
kNewGradSuffix
(
)
in
origin_grad_name
and
pserver_block
.
has_var
(
origin_grad_name
):
new_grad
=
pserver_block
.
var
(
origin_grad_name
)
new_inputs
[
key
]
=
new_grad
else
:
new_inputs
[
key
]
=
merged_var
elif
key
==
"Param"
:
param_block
=
_get_param_block
(
opt_op
)
if
not
param_block
:
...
...
编辑
预览
Markdown
is supported
0%
请重试
或
添加新附件
.
添加附件
取消
You are about to add
0
people
to the discussion. Proceed with caution.
先完成此消息的编辑!
取消
想要评论请
注册
或
登录