BaiXuePrincess / Paddle (forked from PaddlePaddle / Paddle)
Commit 22bb262a
Authored Mar 15, 2018 by Yu Yang

Remove out of date design

Parent: ae88fdef
Showing 1 changed file with 0 additions and 74 deletions
doc/design/parallel_executor.md (deleted, 100644 → 0)

# ParallelExecutor Design Doc
## Introduction
We introduce `ParallelExecutor` to run multi-GPU training in PaddlePaddle Fluid. It supports

1. keeping a copy of the parameters on each GPU
1. allreduce on a separate stream allowing computation and communication overlap

An example of switching single GPU training to multiple GPUs:

```python
import paddle.fluid as fluid

# your_neural_network and iter_num are placeholders supplied by the user.
avg_cost = your_neural_network()
opt = fluid.optimizer.SGDOptimizer()
opt.minimize(avg_cost)

# change Executor -> ParallelExecutor
exe = fluid.ParallelExecutor(gpu_list=[0, 1])

for _ in range(iter_num):
    exe.run()
```
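
To make the two supported features concrete, here is a minimal, framework-free sketch of one data-parallel SGD step. It is not PaddlePaddle code: `allreduce_sum`, `sgd_step`, and the two-device setup are hypothetical names chosen for illustration, and it assumes the allreduce sums gradients across devices. It only shows why every device ends the step with identical parameter copies.

```python
def allreduce_sum(per_device_grads):
    """Sum gradients element-wise across devices; every device gets the result."""
    total = [sum(vals) for vals in zip(*per_device_grads)]
    return [list(total) for _ in per_device_grads]


def sgd_step(replicas, per_device_grads, lr):
    """Apply the same reduced gradient to each device's parameter copy."""
    reduced = allreduce_sum(per_device_grads)
    return [
        [p - lr * g for p, g in zip(params, grads)]
        for params, grads in zip(replicas, reduced)
    ]


# Two hypothetical devices start with identical parameter copies.
replicas = [[1.0, 2.0], [1.0, 2.0]]
# Each device computes gradients on its own shard of the mini-batch.
local_grads = [[0.5, 1.0], [1.5, 3.0]]
replicas = sgd_step(replicas, local_grads, lr=0.1)
```

Because every copy receives the same reduced gradient, the copies never drift apart, so no parameter broadcast is needed after each step.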
## Design
The constructor is given a list of parameters whose gradients need to be allreduced.

At runtime, `ParallelExecutor` starts `#gpu` threads, each running an `Executor`. Every operator run on a GPU automatically syncs with the other streams when necessary.

```c++
// if op's input is params' grad:
//     sync with allreduce stream
//     e.g. sgd should wait for allreduce to be finished
CallBack->BeforeOp(op);

op->Run(*local_scope, place_);

// if op's output is params' grad:
//     sync with computation stream
//     e.g. allreduce should wait for fc_grad to be finished.
CallBack->AfterOp(op);
```
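
The hook placement above, combined with the one-thread-per-GPU model, can be sketched in plain Python. This is an illustrative simulation only: `RecordingCallback`, `run_block`, and `run_on_devices` are hypothetical names, operators are represented by strings, and no real device work happens.

```python
import threading


class RecordingCallback:
    """Stand-in callback that records when each hook fires."""

    def __init__(self):
        self.events = []

    def before_op(self, op):
        self.events.append(("before", op))

    def after_op(self, op):
        self.events.append(("after", op))


def run_block(ops, callback, run_op):
    """Run ops in order, wrapping each one with the before/after hooks."""
    for op in ops:
        callback.before_op(op)
        run_op(op)
        callback.after_op(op)


def run_on_devices(device_ids, ops):
    """Start one worker thread per device; each runs the same op list."""
    logs = {}

    def worker(dev):
        cb = RecordingCallback()
        run_block(ops, cb, run_op=lambda op: cb.events.append(("run", op)))
        logs[dev] = cb.events

    threads = [threading.Thread(target=worker, args=(d,)) for d in device_ids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return logs


logs = run_on_devices([0, 1], ["fc", "fc_grad", "sgd"])
```

Each device keeps its own callback instance here, so the per-device bookkeeping needs no locking.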
The `Callback` object can be implemented as follows:

```c++
struct AllReduceCallBack {
  void BeforeOp(framework::OperatorBase* op);
  void AfterOp(framework::OperatorBase* op);

  std::unordered_set<std::string> reduced_param_grad_names;
  std::unordered_set<std::string> param_grad_names_;

  platform::DeviceContext* computation_dev_ctx;    // computation device context
  platform::DeviceContext* communication_dev_ctx;  // communication device context

  framework::Scope* scope;
  platform::NCCL::Communicator* nccl_com;
};

// Pseudocode: op->Input() / op->Output() stand for the name of the op's
// input / output variable.
void AllReduceCallBack::BeforeOp(framework::OperatorBase* op) {
  if (reduced_param_grad_names.count(op->Input())) {
    // The allreduce of this gradient must finish before the op reads it.
    communication_dev_ctx->Wait();
    reduced_param_grad_names.erase(op->Input());
  }
}

void AllReduceCallBack::AfterOp(framework::OperatorBase* op) {
  if (param_grad_names_.count(op->Output())) {
    // The gradient must be fully computed before it is allreduced.
    computation_dev_ctx->Wait();
    reduced_param_grad_names.insert(op->Output());
    ncclAllreduce(scope, op->Output(), communication_dev_ctx);
  }
}
```
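
The bookkeeping in `AllReduceCallBack` can be traced with a small Python model. This is a sketch under assumptions, not the author's implementation: ops are reduced to lists of input and output variable names, stream waits and the allreduce are only logged, and the variable names (`fc.w@GRAD` and so on) are made up for the walkthrough.

```python
class AllReduceCallbackSketch:
    """Models only the set bookkeeping; device work is replaced by log entries."""

    def __init__(self, param_grad_names):
        self.param_grad_names = set(param_grad_names)
        self.reduced_param_grad_names = set()  # allreduce issued, not yet awaited
        self.log = []

    def before_op(self, input_names):
        pending = self.reduced_param_grad_names.intersection(input_names)
        if pending:
            # The op reads a gradient whose allreduce may still be in flight.
            self.log.append("wait communication stream")
            self.reduced_param_grad_names -= pending

    def after_op(self, output_names):
        produced = self.param_grad_names.intersection(output_names)
        if produced:
            # The gradient must be complete before it is allreduced.
            self.log.append("wait computation stream")
            for name in sorted(produced):
                self.reduced_param_grad_names.add(name)
                self.log.append("allreduce " + name)


cb = AllReduceCallbackSketch(["fc.w@GRAD"])
# (inputs, outputs) for a forward op, its gradient op, and the SGD update.
program = [
    (["x", "fc.w"], ["fc.out"]),
    (["fc.out@GRAD"], ["fc.w@GRAD"]),
    (["fc.w", "fc.w@GRAD"], ["fc.w"]),
]
for inputs, outputs in program:
    cb.before_op(inputs)
    cb.after_op(outputs)
```

In this trace the forward op triggers no synchronization, the gradient op triggers a computation-stream wait followed by the allreduce, and the SGD op waits on the communication stream before reading the reduced gradient.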