Crayon鑫 / Paddle (forked from PaddlePaddle / Paddle)
Commit 75644caf ("update")
Author: sandyhouse
Date: Mar 11, 2021
Parent: 5cd2bfec

Showing 4 changed files with 13 additions and 12 deletions (+13, -12)
python/paddle/distributed/fleet/meta_optimizers/sharding/fp16_helper.py (+1, -1)
python/paddle/distributed/fleet/meta_optimizers/sharding/utils.py (+1, -1)
python/paddle/distributed/fleet/meta_optimizers/sharding_optimizer.py (+8, -10)
python/paddle/fluid/optimizer.py (+3, -0)
python/paddle/distributed/fleet/meta_optimizers/sharding/fp16_helper.py

@@ -105,7 +105,7 @@ class FP16Utils(object):
         reversed_x = []
         reversed_x_paramname = []
         for input_name in op.desc.input('X'):
-            param_name = input_name.strip("@GRAD")
+            param_name = input_name.strip("@GRAD@MERGED")
             if param_name not in shard.global_params:
                 raise ValueError("Input 'X' of check_finite_and_unscale must"
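A note on the edited line (an observation about Python semantics, not part of the commit): str.strip(chars) removes any of the given characters from both ends of a string, not a literal suffix, so "@GRAD@MERGED" is treated as the character set {@, G, R, A, D, M, E}. A quick demonstration, with a hypothetical suffix-aware helper for contrast:

```python
# str.strip() strips a character *set* from both ends, not a suffix.
print("fc_0.w_0@GRAD".strip("@GRAD"))      # 'fc_0.w_0'   (happens to look right)
print("embedding_AD@GRAD".strip("@GRAD"))  # 'embedding_' (over-strips the 'AD')

# A suffix-aware alternative (hypothetical helper, not in the source):
def remove_suffix(name, suffix):
    return name[:-len(suffix)] if name.endswith(suffix) else name

print(remove_suffix("embedding_AD@GRAD", "@GRAD"))  # 'embedding_AD'
```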
python/paddle/distributed/fleet/meta_optimizers/sharding/utils.py

@@ -357,7 +357,7 @@ def get_grad_device(grad_name, shard):
     base_name = None
     # mind the traversal order
-    possible_suffixes = ['.cast_fp16@GRAD_0', '.cast_fp16@GRAD', '@GRAD_0', '@GRAD']
+    possible_suffixes = ['.cast_fp16@GRAD@MERGED', '.cast_fp16@GRAD', '@GRAD@MERGED', '@GRAD']
     for suffix in possible_suffixes:
         if suffix in grad_name:
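The "mind the traversal order" comment is doing real work here: matching is by substring, so the longer '@GRAD@MERGED' suffixes must be tried before the plain '@GRAD' ones. A minimal sketch of why (function and variable names are illustrative):

```python
possible_suffixes = ['.cast_fp16@GRAD@MERGED', '.cast_fp16@GRAD', '@GRAD@MERGED', '@GRAD']

def base_param_name(grad_name):
    # Longer, more specific suffixes come first: a name such as
    # 'fc_0.w_0.cast_fp16@GRAD@MERGED' also contains '@GRAD', so testing
    # '@GRAD' first would split off the wrong base name.
    for suffix in possible_suffixes:
        if suffix in grad_name:
            return grad_name.split(suffix)[0]
    return None

print(base_param_name('fc_0.w_0.cast_fp16@GRAD@MERGED'))  # fc_0.w_0
print(base_param_name('fc_0.w_0@GRAD'))                   # fc_0.w_0
```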
python/paddle/distributed/fleet/meta_optimizers/sharding_optimizer.py

@@ -103,8 +103,6 @@ class ShardingOptimizer(MetaOptimizerBase):
         self.pp_bz = self.user_defined_strategy.sharding_configs["pp_bz"]
         self.pp_allreduce_in_optimize = self.user_defined_strategy.sharding_configs[
             "pp_allreduce_in_optimize"]
-        self.optimize_offload = self.user_defined_strategy.sharding_configs[
-            "optimize_offload"]
         if self.inner_opt is None:
             raise ValueError(

@@ -947,8 +945,9 @@ class ShardingOptimizer(MetaOptimizerBase):
             ]
             self.pp_group_size = self.pipeline_nodes
             self.pp_group_endpoints = [
                 ep for idx, ep in enumerate(self.endpoints)
                 if (idx % self.sharding_group_size) == self.sharding_rank
             ]
         else:
             self.mp_group_id = 0

@@ -972,12 +971,11 @@ class ShardingOptimizer(MetaOptimizerBase):
                 self._inner_parallelism_size * self.sharding_group_size)
             self.megatron_rank = self.global_rank % self._inner_parallelism_size
             self.sharding_group_endpoints = [
                 ep for idx, ep in enumerate(self.endpoints)
                 if (idx // (self._inner_parallelism_size *
                     self.sharding_group_size)) == self.sharding_group_id
                 and idx % self._inner_parallelism_size == self.megatron_rank
             ]
             print("sharding_endpoint:", self.sharding_group_endpoints)
             print("sharding_rank:", self.sharding_rank)
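A worked example may make the endpoint-selection arithmetic in the last hunk concrete (all values are illustrative, not from the commit). With 8 trainers, inner (megatron) parallelism 2 and sharding group size 2, rank 5 falls in sharding group 5 // (2 * 2) = 1 with megatron rank 5 % 2 = 1, so its sharding group holds the endpoints at indices 5 and 7:

```python
# Standalone version of the comprehension above, with made-up values.
endpoints = ["10.0.0.%d:6170" % i for i in range(8)]
inner_parallelism_size = 2  # megatron (inner model-parallel) degree
sharding_group_size = 2

global_rank = 5
sharding_group_id = global_rank // (inner_parallelism_size * sharding_group_size)  # 1
megatron_rank = global_rank % inner_parallelism_size                               # 1

sharding_group_endpoints = [
    ep for idx, ep in enumerate(endpoints)
    if (idx // (inner_parallelism_size * sharding_group_size)) == sharding_group_id
    and idx % inner_parallelism_size == megatron_rank
]
print(sharding_group_endpoints)  # ['10.0.0.5:6170', '10.0.0.7:6170']
```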
python/paddle/fluid/optimizer.py

@@ -4898,6 +4898,7 @@ class PipelineOptimizer(object):
                     self._op_role_key: self._op_role.Backward,
                 })
             offset += 1
+            merged_gradient_names.append(merged_param_grad_name)
         else:
             # cast gradient to fp32 to accumulate to merged gradient
             cast_grad_var_name = param_grad_name + '@TMP'

@@ -4928,6 +4929,8 @@ class PipelineOptimizer(object):
                     self._op_role_var_key: op_role_var
                 })
             offset += 1
+            merged_gradient_names.append(merged_param_grad_name)
+        return merged_gradient_names

     def _add_sub_blocks(self, main_block, program_list):
         main_program = main_block.program
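For context, both hunks add bookkeeping so that the gradient-accumulation pass records and returns the names of the merged gradient variables it creates. A simplified sketch of the pattern (only the naming convention and the returned list mirror the diff; the function name and op-emitting details are assumed or elided):

```python
def accumulate_gradients(param_grad_names):
    """Sketch: collect the names of merged (accumulated) gradients."""
    merged_gradient_names = []
    for param_grad_name in param_grad_names:
        merged_param_grad_name = param_grad_name + '@MERGED'
        # ... here the real pass emits the sum/cast ops that accumulate
        # this gradient into its fp32 merged buffer ...
        merged_gradient_names.append(merged_param_grad_name)
    # Returning the names lets downstream passes (e.g. get_grad_device in
    # sharding/utils.py) resolve variables by the '@GRAD@MERGED' suffix.
    return merged_gradient_names

print(accumulate_gradients(['fc_0.w_0@GRAD']))  # ['fc_0.w_0@GRAD@MERGED']
```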