Commit 7081f214
Authored Jan 23, 2018 by kavyasrinet; committed by GitHub on Jan 23, 2018
Update the parameter_server doc (#7805)
Parent: 7ed48bd0
Showing 1 changed file with 20 additions and 20 deletions
doc/design/dist_refactor/parameter_server.md
...
@@ -9,16 +9,16 @@ different purposes.
 ## Background

-The previous implementations of the parameter server does not run a
+The previous implementations of the parameter server do not run a
 fluid sub-program. Parameter initialization, optimizer computation, network
 communication and checkpointing are implemented twice on both the
-trainer and the parameter server.
+trainer as well as the parameter server.

-It would be great if we can write code once and use them on both the
-trainer and the parameter server: reduces code duplication and
-improves extensibility. Given that after the current refactor, we are
-representing everything as a computing graph on the
-trainer. Representing everything as a computing graph on the parameter
+It would be great if we can write code once and use them on both: the
+trainer and the parameter server, since this reduces code duplication and
+improves extensibility. Given that after the current refactoring, we are
+representing everything as a computation graph on the
+trainer. Representing everything as a computation graph on the parameter
 server becomes a natural extension.

 ## Design
...
@@ -30,9 +30,9 @@ into sub-programs to be scheduled on different nodes with the following
 steps:

 1. OP placement: the OPs will be placed on different nodes according
-   to heuristic that minimizes estimated total computation
+   to a heuristic that minimizes the estimated total computation
    time. Currently we will use a simple heuristic that puts parameter
-   varable on parameter server workers and everything else on trainer
+   variable on parameter server workers and everything else on trainer
    workers.
 1. Add communication OPs to enable the communication between nodes.
...
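The simple placement heuristic described in the hunk above (parameter variables on parameter server workers, everything else on trainer workers) could be sketched roughly as follows. This is an illustration only, not Paddle code: `Var`, `place_variables`, the node names, and the round-robin spread across parameter servers are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """Hypothetical stand-in for a program variable."""
    name: str
    is_parameter: bool


def place_variables(variables, num_pservers):
    """Map each variable name to a node name.

    Parameters go to parameter server workers (round-robin, which is an
    assumption of this sketch); everything else stays on the trainer.
    """
    placement = {}
    next_pserver = 0
    for var in variables:
        if var.is_parameter:
            placement[var.name] = f"pserver:{next_pserver}"
            next_pserver = (next_pserver + 1) % num_pservers
        else:
            placement[var.name] = "trainer"
    return placement


variables = [Var("W", True), Var("x", False), Var("b", True), Var("loss", False)]
placement = place_variables(variables, num_pservers=2)
```

A heuristic that minimizes estimated total computation time would replace the round-robin choice with a cost model; the sketch only covers the "currently" case the document mentions.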
@@ -47,22 +47,22 @@ After converting:

 <img src="src/dist-graph.png" width="700"/>

-1. The parameter variable W and it's optimizer program are placed on the parameter server.
+1. The parameter variable W and its optimizer program are placed on the parameter server.
 1. Operators are added to the program.
    - *Send* sends data to the connected *Recv* operator. The
      scheduler on the receive node will only schedule *Recv* operator
      to run when the *Send* operator has ran (the *Send* OP will mark
      the *Recv* OP runnable automatically).
-   - *Enueue* enqueues the input variable, it can block until space
+   - *Enqueue* enqueues the input variable, it can block until space
      become available in the queue.
    - *Dequeue* outputs configurable numbers of tensors from the
-     queue. It will block until the queue have the required number of
+     queue. It will block until the queue has the required number of
      tensors.

 ### Benefits

-- Model parallelism become easier to implement: it's an extension to
+- Model parallelism becomes easier to implement: it is an extension to
   the trainer - parameter server approach. We can have several "Transpilers"
   to achieve different goals.
 - User-defined optimizer is easier to add - user can now express it as
...
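The blocking behaviour of *Enqueue* and *Dequeue* described in the hunk above can be illustrated with an ordinary bounded queue. The sketch below is plain Python threading, not the Fluid operators themselves; the class name `TensorQueue`, the `capacity` argument, and the use of Python lists as stand-in "tensors" are assumptions for the example.

```python
import threading
from collections import deque


class TensorQueue:
    """Hypothetical bounded queue mimicking Enqueue/Dequeue semantics."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        self._cond = threading.Condition()

    def enqueue(self, tensor):
        # Blocks until space becomes available in the queue.
        with self._cond:
            while len(self._items) >= self._capacity:
                self._cond.wait()
            self._items.append(tensor)
            self._cond.notify_all()

    def dequeue(self, min_count):
        # Blocks until the queue holds the required number of tensors.
        with self._cond:
            while len(self._items) < min_count:
                self._cond.wait()
            batch = [self._items.popleft() for _ in range(min_count)]
            self._cond.notify_all()
            return batch


queue = TensorQueue(capacity=4)
results = []
consumer = threading.Thread(
    target=lambda: results.append(queue.dequeue(min_count=2))
)
consumer.start()          # waits: the queue is still empty
queue.enqueue([1.0, 2.0])
queue.enqueue([3.0, 4.0])  # second item unblocks the consumer
consumer.join()
```

In this sketch a `min_count` larger than `capacity` would block forever, so a real operator would need to validate that combination or size the queue accordingly.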
@@ -72,22 +72,22 @@ After converting:

 ### Challenges

-- It's important to balance the parameter shards of on multiple
-  parameter server. If a single parameter is very big (some
+- It is important to balance the parameter shards on multiple
+  parameter servers. If a single parameter is very big (for example: some
   word-embedding, fully connected, softmax layer), we need to
   automatically partition the single parameter onto different
   parameter servers when possible (only element-wise optimizer depends
   on the parameter variable).
-- In the "Aync SGD" figure, the "W" variable on the parameter server
-  could be read and wrote concurrently. See
+- In the "Async SGD" figure, the "W" variable on the parameter server
+  could be read and written concurrently. See
   [here](https://github.com/PaddlePaddle/Paddle/pull/6394) for more
-  details about concurrent program in fluid.
+  details about concurrent program in Fluid.

 ### Discussion

 - Can the Enqueue OP be implemented under our current tensor design
-  (puts the input tensor into the queue tensor)?
-- *Dequeue* OP will have variable numbers of output (depends on the
+  (put the input tensor into the queue tensor)?
+- *Dequeue* OP will have variable numbers of output (depending on the
   `min_count` attribute), does our current design support it? (similar
   question for the *Add* OP)
...
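The sharding challenge in the hunk above rests on one observation: when the optimizer is element-wise, a large parameter can be split across parameter servers and each shard updated independently. A minimal sketch of that idea, assuming an even row-range split and plain SGD (the helper names `shard_rows` and `sgd_update` are hypothetical, and the document does not prescribe this particular scheme):

```python
def shard_rows(num_rows, num_pservers):
    """Split [0, num_rows) into near-equal contiguous ranges, one per server."""
    base, extra = divmod(num_rows, num_pservers)
    ranges, start = [], 0
    for i in range(num_pservers):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def sgd_update(shard, grad_shard, lr):
    """Element-wise SGD step; it touches only the values in this shard."""
    return [w - lr * g for w, g in zip(shard, grad_shard)]


weights = [1.0, 2.0, 3.0, 4.0, 5.0]
grads = [1.0, 1.0, 1.0, 1.0, 1.0]

ranges = shard_rows(len(weights), num_pservers=2)

# Each range would live on a different parameter server; concatenating the
# per-shard results matches updating the whole parameter at once.
updated = []
for begin, end in ranges:
    updated.extend(sgd_update(weights[begin:end], grads[begin:end], lr=0.5))
```

An optimizer whose update for one element depends on other elements (for example, one that uses a global norm) would not decompose this way, which is why the text restricts partitioning to the element-wise case.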