BaiXuePrincess / Paddle (forked from PaddlePaddle / Paddle)
Commit be1373dc
Authored Apr 01, 2018 by Xin Pan

Polish

Parent: 46f3a39e
Showing 3 changed files with 11 additions and 7 deletions
paddle/fluid/framework/details/nccl_all_reduce_op_handle.h (+1 -1)
paddle/fluid/framework/details/op_handle_base.h (+3 -1)
paddle/fluid/framework/details/threaded_ssa_graph_executor.cc (+7 -5)
paddle/fluid/framework/details/nccl_all_reduce_op_handle.h @ be1373dc
@@ -39,7 +39,7 @@ struct NCCLAllReduceOpHandle : public OpHandleBase {
   // Delay and buffer nccl_all_reduce together can significantly increase
   // performance. Disable this feature by returning false.
-  bool IsDelayedOp() override { return true; };
+  bool IsMultiDeviceTransfer() override { return true; };

  protected:
   void RunImpl() override;
paddle/fluid/framework/details/op_handle_base.h @ be1373dc
@@ -55,7 +55,9 @@ class OpHandleBase {
   void AddOutput(VarHandleBase *out);

-  virtual bool IsDelayedOp() { return false; }
+  // If the Op involves data transfer of multiple devices that
+  // will likely block other computations.
+  virtual bool IsMultiDeviceTransfer() { return false; }

  protected:
   virtual void RunImpl() = 0;
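The renamed hook follows a familiar pattern: the base class supplies a conservative default (`false`), and only handles that move data across devices opt in by overriding it. The sketch below illustrates that pattern in isolation. `OpHandle`, `AllReduceHandle`, `ScaleHandle`, and `CountMultiDeviceTransfers` are hypothetical stand-ins introduced for illustration, not the actual Paddle classes.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for OpHandleBase; only the query hook is modeled.
class OpHandle {
 public:
  virtual ~OpHandle() = default;
  // Default: the op does not transfer data across devices.
  virtual bool IsMultiDeviceTransfer() { return false; }
};

// Hypothetical collective op that opts in, in the same way the diff shows
// NCCLAllReduceOpHandle doing.
class AllReduceHandle : public OpHandle {
 public:
  bool IsMultiDeviceTransfer() override { return true; }
};

// Hypothetical single-device op that keeps the base-class default.
class ScaleHandle : public OpHandle {};

// Counts the ops that a scheduler would treat as multi-device transfers.
int CountMultiDeviceTransfers(
    const std::vector<std::unique_ptr<OpHandle>> &ops) {
  int count = 0;
  for (const auto &op : ops) {
    assert(op != nullptr);  // every slot must hold a handle
    if (op->IsMultiDeviceTransfer()) ++count;
  }
  return count;
}
```

Because the default is `false`, a new op type is scheduled normally unless its author explicitly marks it as a transfer.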
paddle/fluid/framework/details/threaded_ssa_graph_executor.cc @ be1373dc
@@ -50,7 +50,7 @@ FeedFetchList ThreadedSSAGraphExecutor::Run(
   // together since we currently cannot overlap computation and memcpy streams.
   // Should revisit it if overlapping is available.
   std::unordered_set<OpHandleBase *> delayed_ops;
-  std::unordered_set<OpHandleBase *> after_delayed_ops;
+  std::unordered_set<OpHandleBase *> blocked_by_delayed_ops;
   std::unordered_set<VarHandleBase *> delayed_vars;

   auto InsertPendingVar = [&pending_vars, &ready_vars](VarHandleBase &var) {
@@ -119,7 +119,7 @@ FeedFetchList ThreadedSSAGraphExecutor::Run(
   auto run_all_ready_ops = [&] {
     for (auto *op : ready_ops) {
-      if (op->IsDelayedOp()) {
+      if (op->IsMultiDeviceTransfer()) {
         delayed_ops.insert(op);
         delayed_vars.insert(op->outputs_.begin(), op->outputs_.end());
         ready_vars.Extend(op->outputs_);
@@ -162,20 +162,22 @@ FeedFetchList ThreadedSSAGraphExecutor::Run(
         --deps;
         if (deps == 0) {
           if (delayed_vars.find(ready_var) != delayed_vars.end()) {
-            after_delayed_ops.insert(op);
+            blocked_by_delayed_ops.insert(op);
           } else {
             ready_ops.insert(op);
           }
         }
       }
     }
+    // When there are no other ops to schedule, schedule buffered delayed
+    // ops and unblock other ops.
     if (ready_ops.empty() && !delayed_ops.empty() && running_ops_ == 0) {
       RunDelayedOps(delayed_ops);
       delayed_ops.clear();
-      for (auto *op : after_delayed_ops) {
+      for (auto *op : blocked_by_delayed_ops) {
         ready_ops.insert(op);
       }
-      after_delayed_ops.clear();
+      blocked_by_delayed_ops.clear();
     }
     // Keep loop until all vars are ready.
   }
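The scheduling idea in this file can be sketched as a small single-threaded simulation: transfer ops are buffered instead of being issued, ops that depend on a buffered transfer are parked in `blocked_by_delayed_ops`, and both sets are released once nothing else is ready. This is a simplified illustration under stated assumptions, not the executor's implementation. The `Op` struct, the `Schedule` function, the index-based dependency lists, and the `buffered` flags are all hypothetical, and the real code tracks variables and a thread pool rather than op indices.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified op: a name, a transfer flag, and the indices of
// the ops that must be issued before it.
struct Op {
  std::string name;
  bool multi_device_transfer;
  std::vector<int> deps;
};

// Returns the order in which ops are issued. Transfer ops are buffered and
// flushed together only when no other op is ready.
std::vector<std::string> Schedule(const std::vector<Op> &ops) {
  const int n = static_cast<int>(ops.size());
  std::vector<int> pending(n, 0);         // unresolved dependency counts
  std::vector<std::vector<int>> users(n); // reverse edges: op -> dependents
  std::vector<int> ready;
  for (int i = 0; i < n; ++i) {
    pending[i] = static_cast<int>(ops[i].deps.size());
    for (int d : ops[i].deps) {
      assert(d >= 0 && d < n);  // dependency indices must be valid
      users[d].push_back(i);
    }
    if (pending[i] == 0) ready.push_back(i);
  }

  std::vector<bool> buffered(n, false);  // true while an op sits in delayed_ops
  std::vector<int> delayed_ops, blocked_by_delayed_ops;
  std::vector<std::string> order;

  while (!ready.empty() || !delayed_ops.empty()) {
    if (ready.empty()) {
      // Nothing else to schedule: flush the buffered transfers together,
      // then unblock the ops that were waiting on them.
      for (int i : delayed_ops) {
        order.push_back(ops[i].name);
        buffered[i] = false;
      }
      delayed_ops.clear();
      ready.swap(blocked_by_delayed_ops);
      continue;
    }
    std::vector<int> next;
    for (int i : ready) {
      if (ops[i].multi_device_transfer) {
        buffered[i] = true;  // delay it; its dependents must wait for a flush
        delayed_ops.push_back(i);
      } else {
        order.push_back(ops[i].name);
      }
      for (int u : users[i]) {
        if (--pending[u] != 0) continue;
        bool blocked = false;
        for (int d : ops[u].deps) blocked = blocked || buffered[d];
        (blocked ? blocked_by_delayed_ops : next).push_back(u);
      }
    }
    ready.swap(next);
  }
  return order;
}
```

In this sketch, two all-reduce ops that become ready at different times are still issued back to back, after all independent computation, which is the batching effect the comments in the diff describe.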