Skip to content
体验新版
项目
组织
正在加载...
登录
切换导航
打开侧边栏
Crayon鑫
Paddle
提交
46f3a39e
P
Paddle
项目概览
Crayon鑫
/
Paddle
与 Fork 源项目一致
Fork自
PaddlePaddle / Paddle
通知
1
Star
1
Fork
0
代码
文件
提交
分支
Tags
贡献者
分支图
Diff
Issue
1
列表
看板
标记
里程碑
合并请求
0
Wiki
0
Wiki
分析
仓库
DevOps
项目成员
Pages
P
Paddle
项目概览
项目概览
详情
发布
仓库
仓库
文件
提交
分支
标签
贡献者
分支图
比较
Issue
1
Issue
1
列表
看板
标记
里程碑
合并请求
0
合并请求
0
Pages
分析
分析
仓库分析
DevOps
Wiki
0
Wiki
成员
成员
收起侧边栏
关闭侧边栏
动态
分支图
创建新Issue
提交
Issue看板
提交
46f3a39e
编写于
4月 01, 2018
作者:
X
Xin Pan
浏览文件
操作
浏览文件
下载
电子邮件补丁
差异文件
polish and add comments.
上级
d0ac9253
变更
3
隐藏空白更改
内联
并排
Showing
3 changed file
with
12 addition
and
3 deletion
+12
-3
paddle/fluid/framework/details/nccl_all_reduce_op_handle.h
paddle/fluid/framework/details/nccl_all_reduce_op_handle.h
+2
-0
paddle/fluid/framework/details/threaded_ssa_graph_executor.cc
...le/fluid/framework/details/threaded_ssa_graph_executor.cc
+4
-1
python/paddle/fluid/parallel_executor.py
python/paddle/fluid/parallel_executor.py
+6
-2
未找到文件。
paddle/fluid/framework/details/nccl_all_reduce_op_handle.h
浏览文件 @
46f3a39e
...
...
@@ -37,6 +37,8 @@ struct NCCLAllReduceOpHandle : public OpHandleBase {
std
::
string
Name
()
const
override
;
// Delay and buffer nccl_all_reduce together can significantly increase
// performance. Disable this feature by returning false.
bool
IsDelayedOp
()
override
{
return
true
;
};
protected:
...
...
paddle/fluid/framework/details/threaded_ssa_graph_executor.cc
浏览文件 @
46f3a39e
...
...
@@ -45,7 +45,10 @@ FeedFetchList ThreadedSSAGraphExecutor::Run(
std
::
unordered_set
<
VarHandleBase
*>
pending_vars
;
BlockingQueue
<
VarHandleBase
*>
ready_vars
;
std
::
unordered_set
<
OpHandleBase
*>
ready_ops
;
// For ops (e.g. nccl_all_reduce) that need to coordinate multiple
// streams from multiple GPUs, it's faster to buffer them and schedule
// together since we currently cannot overlap computation and memcpy streams.
// Should revisit it if overlapping is available.
std
::
unordered_set
<
OpHandleBase
*>
delayed_ops
;
std
::
unordered_set
<
OpHandleBase
*>
after_delayed_ops
;
std
::
unordered_set
<
VarHandleBase
*>
delayed_vars
;
...
...
python/paddle/fluid/parallel_executor.py
浏览文件 @
46f3a39e
...
...
@@ -16,7 +16,6 @@ import core
import
multiprocessing
import
framework
import
executor
import
sys
__all__
=
[
'ParallelExecutor'
]
...
...
@@ -36,7 +35,12 @@ class ParallelExecutor(object):
places
.
append
(
p
)
if
num_threads
is
None
:
num_threads
=
len
(
places
)
if
use_cuda
:
# Experiments on se-resnext shows that too many threads hurt
# performance. Worth tunning for other models in the future.
num_threads
=
len
(
places
)
else
:
min
(
len
(
places
)
*
2
,
multiprocessing
.
cpu_count
())
startup
=
framework
.
default_startup_program
()
main
=
framework
.
default_main_program
()
...
...
编辑
预览
Markdown
is supported
0%
请重试
或
添加新附件
.
添加附件
取消
You are about to add
0
people
to the discussion. Proceed with caution.
先完成此消息的编辑!
取消
想要评论请
注册
或
登录