BaiXuePrincess / Paddle (forked from PaddlePaddle / Paddle, in sync with the fork source)
Commit 73eacf3e (unverified)
Authored 5 years ago by gongweibao
Committed 5 years ago by GitHub
Polish codes of old prs (#17981)
Parent: fe43b2ee
Contained in: release/1.5, v1.5.2, v1.5.1, v1.5.0
No related merge requests
Showing 2 changed files with 14 additions and 8 deletions (+14 -8)
paddle/fluid/framework/ir/alloc_continuous_space_for_grad_pass.cc  +6 -3
python/paddle/distributed/launch.py  +8 -5
paddle/fluid/framework/ir/alloc_continuous_space_for_grad_pass.cc
@@ -42,6 +42,9 @@ DEFINE_int32(
 namespace paddle {
 namespace framework {
 namespace ir {
+// unit of the FLAGS_fuse_parameter_memory_size.
+static constexpr double kMB = 1048576.0;
+
 // SetFuseParameterGroupsSize and SetFuseParameterMemorySize are used in unit
 // test, because it is invalid that seting 'FLAGS_fuse_parameter_memory_size'
 // and 'FLAGS_fuse_parameter_groups_size' in unit test.
@@ -228,8 +231,8 @@ class AllocContinuousSpaceForGradPass : public ir::Pass {
         }
         VLOG(10) << out.str()
                  << ", group size:" << group_grads_params->at(i).size()
-                 << ", group memory size:"
-                 << static_cast<double>(gps_size) / 1048576.0 << "(MB)";
+                 << ", group memory size:" << static_cast<double>(gps_size) / kMB
+                 << "(MB)";
       }
     }
@@ -270,7 +273,7 @@ class AllocContinuousSpaceForGradPass : public ir::Pass {
           break;
         }
-        if (static_cast<double>(local_group_memory_size) / 1048576.0 >=
+        if (static_cast<double>(local_group_memory_size) / kMB >=
             group_memory_size) {
           break;
         }
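
Reviewer note: the hunks above replace the literal 1048576.0 with the named constant kMB in a check that compares an accumulated byte count against a threshold expressed in MB. The diff does not show the surrounding loop, so the following is only a minimal Python sketch of one plausible reading. `BYTES_PER_MB`, `group_by_memory`, and the greedy grouping strategy are hypothetical illustrations, not Paddle's implementation.

```python
# Hypothetical sketch; names and grouping strategy are assumptions,
# not taken from the Paddle source.
BYTES_PER_MB = 1048576.0  # same value as kMB in the diff


def group_by_memory(sizes_in_bytes, group_memory_size_mb):
    """Greedily pack consecutive sizes into groups of at least the given MB."""
    groups = []
    current = []
    current_bytes = 0
    for size in sizes_in_bytes:
        current.append(size)
        current_bytes += size
        # Mirrors the check in the diff: bytes / kMB >= threshold in MB.
        if current_bytes / BYTES_PER_MB >= group_memory_size_mb:
            groups.append(current)
            current = []
            current_bytes = 0
    if current:
        # Keep the trailing group even if it is below the threshold.
        groups.append(current)
    return groups
```

Naming the divisor once keeps the logging code and the threshold check in agreement on what "MB" means.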
python/paddle/distributed/launch.py
@@ -164,6 +164,13 @@ def start_procs(args):
           ", node_ips:", node_ips, ", nranks:", nranks)
     current_env = copy.copy(default_env)
+    # paddle broadcast ncclUniqueId use socket, and
+    # proxy maybe make trainers unreachable, so delete them.
+    # if we set them to "", grpc will log error message "bad uri"
+    # so just delete them.
+    current_env.pop("http_proxy", None)
+    current_env.pop("https_proxy", None)
+
     procs = []
     cmds = []
     for i in range(0, selected_gpus_num):
@@ -173,11 +180,7 @@ def start_procs(args):
             "PADDLE_CURRENT_ENDPOINT":
             "%s:%d" % (current_node_ip, args.started_port + i),
             "PADDLE_TRAINERS_NUM": "%d" % nranks,
-            "PADDLE_TRAINER_ENDPOINTS": trainers_endpoints,
-            # paddle broadcast ncclUniqueId use socket, and
-            # proxy maybe make trainers unreachable, so set them to ""
-            "http_proxy": "",
-            "https_proxy": ""
+            "PADDLE_TRAINER_ENDPOINTS": trainers_endpoints
         })
         cmd = [
             sys.executable, "-u", args.training_script
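
Reviewer note: the change to launch.py stops setting the proxy variables to an empty string and removes them from the child environment instead. A minimal sketch of that pattern follows. The helper `build_trainer_env` and its parameters are hypothetical and do not appear in launch.py; only `copy.copy` and `dict.pop` with a default are taken from the diff.

```python
import copy


def build_trainer_env(default_env, extra):
    """Return a copy of default_env without proxy variables, plus extra."""
    # Shallow copy, so the caller's mapping is left untouched.
    env = copy.copy(default_env)
    # pop() with a default does nothing when the key is absent,
    # so this is safe whether or not a proxy was configured.
    env.pop("http_proxy", None)
    env.pop("https_proxy", None)
    env.update(extra)
    return env
```

The resulting dictionary could then be passed as the `env` argument of `subprocess.Popen`, so that each trainer process starts without the proxy variables defined at all.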