机器未来 / Paddle · Commit bf0f4a21
Forked from PaddlePaddle / Paddle
Commit bf0f4a21
Authored Jan 24, 2018 by Travis CI
Deploy to GitHub Pages: 7081f214
Parent: ca3ebe34
Showing 6 changed files with 82 additions and 82 deletions (+82 / -82)
develop/doc/_sources/design/dist_refactor/parameter_server.md.txt
...doc/_sources/design/dist_refactor/parameter_server.md.txt
+20
-20
develop/doc/design/dist_refactor/parameter_server.html
develop/doc/design/dist_refactor/parameter_server.html
+20
-20
develop/doc/searchindex.js
develop/doc/searchindex.js
+1
-1
develop/doc_cn/_sources/design/dist_refactor/parameter_server.md.txt
..._cn/_sources/design/dist_refactor/parameter_server.md.txt
+20
-20
develop/doc_cn/design/dist_refactor/parameter_server.html
develop/doc_cn/design/dist_refactor/parameter_server.html
+20
-20
develop/doc_cn/searchindex.js
develop/doc_cn/searchindex.js
+1
-1
未找到文件。
develop/doc/_sources/design/dist_refactor/parameter_server.md.txt
@@ -9,16 +9,16 @@ different purposes.
 
 ## Background
 
-The previous implementations of the parameter server does not run a
+The previous implementations of the parameter server do not run a
 fluid sub-program. Parameter initialization, optimizer computation, network
 communication and checkpointing are implemented twice on both the
-trainer and the parameter server.
+trainer as well as the parameter server.
 
-It would be great if we can write code once and use them on both the
-trainer and the parameter server: reduces code duplication and
-improves extensibility. Given that after the current refactor, we are
-representing everything as a computing graph on the
-trainer. Representing everything as a computing graph on the parameter
+It would be great if we can write code once and use them on both: the
+trainer and the parameter server, since this reduces code duplication and
+improves extensibility. Given that after the current refactoring, we are
+representing everything as a computation graph on the
+trainer. Representing everything as a computation graph on the parameter
 server becomes a natural extension.
 
 ## Design
@@ -30,9 +30,9 @@ into sub-programs to be scheduled on different nodes with the following
 steps:
 
 1. OP placement: the OPs will be placed on different nodes according
-   to heuristic that minimizes estimated total computation
+   to a heuristic that minimizes the estimated total computation
    time. Currently we will use a simple heuristic that puts parameter
-   varable on parameter server workers and everything else on trainer
+   variable on parameter server workers and everything else on trainer
    workers.
 
 1. Add communication OPs to enable the communication between nodes.
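The simple placement heuristic in the hunk above can be pictured in a few lines of Python. This is an illustrative sketch only, not the actual Fluid transpiler API; `program.ops`, `program.is_parameter`, and the device strings are hypothetical names.

```python
# Illustrative sketch of the placement heuristic, not Fluid's API:
# OPs that update parameter variables go to parameter-server workers
# (round-robin), everything else stays on the trainer workers.
def place_ops(program, num_pservers):
    placement = {}
    for idx, op in enumerate(program.ops):
        updates_param = any(program.is_parameter(name)
                            for name in op.output_names)
        if updates_param:
            # Round-robin keeps the load across pservers roughly balanced.
            placement[op] = "pserver:%d" % (idx % num_pservers)
        else:
            placement[op] = "trainer"
    return placement
```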
@@ -47,22 +47,22 @@ After converting:
 
 <img src="src/dist-graph.png" width="700"/>
 
-1. The parameter variable W and it's optimizer program are placed on the parameter server.
+1. The parameter variable W and its optimizer program are placed on the parameter server.
 1. Operators are added to the program.
    - *Send* sends data to the connected *Recv* operator. The
      scheduler on the receive node will only schedule *Recv* operator
      to run when the *Send* operator has ran (the *Send* OP will mark
      the *Recv* OP runnable automatically).
-   - *Enueue* enqueues the input variable, it can block until space
+   - *Enqueue* enqueues the input variable, it can block until space
      become available in the queue.
    - *Dequeue* outputs configurable numbers of tensors from the
-     queue. It will block until the queue have the required number of
+     queue. It will block until the queue has the required number of
      tensors.
 
 ### Benefits
 
-- Model parallelism become easier to implement: it's an extension to
+- Model parallelism becomes easier to implement: it is an extension to
   the trainer - parameter server approach. We can have several "Transpilers"
   to achieve different goals.
 - User-defined optimizer is easier to add - user can now express it as
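The blocking semantics of the *Enqueue* and *Dequeue* OPs corrected in the hunk above can be modeled with a plain condition-variable queue. A minimal sketch, assuming in-memory tensors; this is not the Fluid OP implementation:

```python
import threading

class TensorQueue(object):
    """Toy model of the Enqueue/Dequeue OP semantics: enqueue blocks while
    the queue is full, dequeue blocks until `min_count` tensors are ready."""

    def __init__(self, capacity):
        self._items = []
        self._capacity = capacity
        self._cond = threading.Condition()

    def enqueue(self, tensor):
        with self._cond:
            while len(self._items) >= self._capacity:
                self._cond.wait()  # block until space becomes available
            self._items.append(tensor)
            self._cond.notify_all()

    def dequeue(self, min_count):
        with self._cond:
            while len(self._items) < min_count:
                self._cond.wait()  # block until enough tensors arrive
            out = self._items[:min_count]
            del self._items[:min_count]
            self._cond.notify_all()
            return out
```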
@@ -72,22 +72,22 @@ After converting:
 
 ### Challenges
 
-- It's important to balance the parameter shards of on multiple
-  parameter server. If a single parameter is very big (some
+- It is important to balance the parameter shards on multiple
+  parameter servers. If a single parameter is very big (for example: some
   word-embedding, fully connected, softmax layer), we need to
   automatically partition the single parameter onto different
   parameter servers when possible (only element-wise optimizer depends
   on the parameter variable).
-- In the "Aync SGD" figure, the "W" variable on the parameter server
-  could be read and wrote concurrently. See
+- In the "Async SGD" figure, the "W" variable on the parameter server
+  could be read and written concurrently. See
   [here](https://github.com/PaddlePaddle/Paddle/pull/6394) for more
-  details about concurrent program in fluid.
+  details about concurrent program in Fluid.
 
 ### Discussion
 
 - Can the Enqueue OP be implemented under our current tensor design
-  (puts the input tensor into the queue tensor)?
-- *Dequeue* OP will have variable numbers of output (depends on the
+  (put the input tensor into the queue tensor)?
+- *Dequeue* OP will have variable numbers of output (depending on the
   `min_count` attribute), does our current design support it? (similar
   question for the *Add* OP)
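The sharding challenge above hinges on the optimizer being element-wise: since each element of W is updated independently, a big parameter can be split across parameter servers. A hypothetical NumPy sketch of row-wise partitioning:

```python
import numpy as np

def partition_rows(param, num_pservers):
    """Split a 2-D parameter into roughly equal row shards, one per
    parameter server. Valid only for element-wise optimizers, where
    each element of the parameter is updated independently."""
    return np.array_split(param, num_pservers, axis=0)

W = np.random.rand(10000, 512)   # e.g. a large word-embedding table
shards = partition_rows(W, 4)    # one shard per parameter server
assert sum(s.shape[0] for s in shards) == W.shape[0]
```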
develop/doc/design/dist_refactor/parameter_server.html
@@ -220,15 +220,15 @@ different purposes.</p>
 </div>
 <div class="section" id="background">
 <span id="background"></span><h2>Background<a class="headerlink" href="#background" title="Permalink to this headline">¶</a></h2>
-<p>The previous implementations of the parameter server does not run a
+<p>The previous implementations of the parameter server do not run a
 fluid sub-program. Parameter initialization, optimizer computation, network
 communication and checkpointing are implemented twice on both the
-trainer and the parameter server.</p>
-<p>It would be great if we can write code once and use them on both the
-trainer and the parameter server: reduces code duplication and
-improves extensibility. Given that after the current refactor, we are
-representing everything as a computing graph on the
-trainer. Representing everything as a computing graph on the parameter
+trainer as well as the parameter server.</p>
+<p>It would be great if we can write code once and use them on both: the
+trainer and the parameter server, since this reduces code duplication and
+improves extensibility. Given that after the current refactoring, we are
+representing everything as a computation graph on the
+trainer. Representing everything as a computation graph on the parameter
 server becomes a natural extension.</p>
 </div>
 <div class="section" id="design">
@@ -240,9 +240,9 @@ into sub-programs to be scheduled on different nodes with the following
 steps:</p>
 <ol class="simple">
 <li>OP placement: the OPs will be placed on different nodes according
-to heuristic that minimizes estimated total computation
+to a heuristic that minimizes the estimated total computation
 time. Currently we will use a simple heuristic that puts parameter
-varable on parameter server workers and everything else on trainer
+variable on parameter server workers and everything else on trainer
 workers.</li>
 <li>Add communication OPs to enable the communication between nodes.</li>
 </ol>
@@ -253,16 +253,16 @@ subgraphs for the trainer and the parameter server:</p>
 <p>After converting:</p>
 <p><img src="src/dist-graph.png" width="700"/></p>
 <ol class="simple">
-<li>The parameter variable W and it’s optimizer program are placed on the parameter server.</li>
+<li>The parameter variable W and its optimizer program are placed on the parameter server.</li>
 <li>Operators are added to the program.<ul>
 <li><em>Send</em> sends data to the connected <em>Recv</em> operator. The
 scheduler on the receive node will only schedule <em>Recv</em> operator
 to run when the <em>Send</em> operator has ran (the <em>Send</em> OP will mark
 the <em>Recv</em> OP runnable automatically).</li>
-<li><em>Enueue</em> enqueues the input variable, it can block until space
+<li><em>Enqueue</em> enqueues the input variable, it can block until space
 become available in the queue.</li>
 <li><em>Dequeue</em> outputs configurable numbers of tensors from the
-queue. It will block until the queue have the required number of
+queue. It will block until the queue has the required number of
 tensors.</li>
 </ul>
 </li>
@@ -271,7 +271,7 @@ tensors.</li>
 <div class="section" id="benefits">
 <span id="benefits"></span><h3>Benefits<a class="headerlink" href="#benefits" title="Permalink to this headline">¶</a></h3>
 <ul class="simple">
-<li>Model parallelism become easier to implement: it’s an extension to
+<li>Model parallelism becomes easier to implement: it is an extension to
 the trainer - parameter server approach. We can have several “Transpilers”
 to achieve different goals.</li>
 <li>User-defined optimizer is easier to add - user can now express it as
@@ -283,24 +283,24 @@ server mentioned in the background section.</li>
 <div class="section" id="challenges">
 <span id="challenges"></span><h3>Challenges<a class="headerlink" href="#challenges" title="Permalink to this headline">¶</a></h3>
 <ul class="simple">
-<li>It’s important to balance the parameter shards of on multiple
-parameter server. If a single parameter is very big (some
+<li>It is important to balance the parameter shards on multiple
+parameter servers. If a single parameter is very big (for example: some
 word-embedding, fully connected, softmax layer), we need to
 automatically partition the single parameter onto different
 parameter servers when possible (only element-wise optimizer depends
 on the parameter variable).</li>
-<li>In the “Aync SGD” figure, the “W” variable on the parameter server
-could be read and wrote concurrently. See
+<li>In the “Async SGD” figure, the “W” variable on the parameter server
+could be read and written concurrently. See
 <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/pull/6394">here</a> for more
-details about concurrent program in fluid.</li>
+details about concurrent program in Fluid.</li>
 </ul>
 </div>
 <div class="section" id="discussion">
 <span id="discussion"></span><h3>Discussion<a class="headerlink" href="#discussion" title="Permalink to this headline">¶</a></h3>
 <ul class="simple">
 <li>Can the Enqueue OP be implemented under our current tensor design
-(puts the input tensor into the queue tensor)?</li>
-<li><em>Dequeue</em> OP will have variable numbers of output (depends on the
+(put the input tensor into the queue tensor)?</li>
+<li><em>Dequeue</em> OP will have variable numbers of output (depending on the
 <code class="docutils literal"><span class="pre">min_count</span></code> attribute), does our current design support it? (similar
 question for the <em>Add</em> OP)</li>
 </ul>
develop/doc/searchindex.js
Source diff not shown because it is too large. You can view the blob instead.
develop/doc_cn/_sources/design/dist_refactor/parameter_server.md.txt
@@ -9,16 +9,16 @@ different purposes.
 
 ## Background
 
-The previous implementations of the parameter server does not run a
+The previous implementations of the parameter server do not run a
 fluid sub-program. Parameter initialization, optimizer computation, network
 communication and checkpointing are implemented twice on both the
-trainer and the parameter server.
+trainer as well as the parameter server.
 
-It would be great if we can write code once and use them on both the
-trainer and the parameter server: reduces code duplication and
-improves extensibility. Given that after the current refactor, we are
-representing everything as a computing graph on the
-trainer. Representing everything as a computing graph on the parameter
+It would be great if we can write code once and use them on both: the
+trainer and the parameter server, since this reduces code duplication and
+improves extensibility. Given that after the current refactoring, we are
+representing everything as a computation graph on the
+trainer. Representing everything as a computation graph on the parameter
 server becomes a natural extension.
 
 ## Design
@@ -30,9 +30,9 @@ into sub-programs to be scheduled on different nodes with the following
 steps:
 
 1. OP placement: the OPs will be placed on different nodes according
-   to heuristic that minimizes estimated total computation
+   to a heuristic that minimizes the estimated total computation
    time. Currently we will use a simple heuristic that puts parameter
-   varable on parameter server workers and everything else on trainer
+   variable on parameter server workers and everything else on trainer
    workers.
 
 1. Add communication OPs to enable the communication between nodes.
@@ -47,22 +47,22 @@ After converting:
 
 <img src="src/dist-graph.png" width="700"/>
 
-1. The parameter variable W and it's optimizer program are placed on the parameter server.
+1. The parameter variable W and its optimizer program are placed on the parameter server.
 1. Operators are added to the program.
    - *Send* sends data to the connected *Recv* operator. The
      scheduler on the receive node will only schedule *Recv* operator
      to run when the *Send* operator has ran (the *Send* OP will mark
      the *Recv* OP runnable automatically).
-   - *Enueue* enqueues the input variable, it can block until space
+   - *Enqueue* enqueues the input variable, it can block until space
      become available in the queue.
    - *Dequeue* outputs configurable numbers of tensors from the
-     queue. It will block until the queue have the required number of
+     queue. It will block until the queue has the required number of
      tensors.
 
 ### Benefits
 
-- Model parallelism become easier to implement: it's an extension to
+- Model parallelism becomes easier to implement: it is an extension to
   the trainer - parameter server approach. We can have several "Transpilers"
   to achieve different goals.
 - User-defined optimizer is easier to add - user can now express it as
@@ -72,22 +72,22 @@ After converting:
 
 ### Challenges
 
-- It's important to balance the parameter shards of on multiple
-  parameter server. If a single parameter is very big (some
+- It is important to balance the parameter shards on multiple
+  parameter servers. If a single parameter is very big (for example: some
   word-embedding, fully connected, softmax layer), we need to
   automatically partition the single parameter onto different
   parameter servers when possible (only element-wise optimizer depends
   on the parameter variable).
-- In the "Aync SGD" figure, the "W" variable on the parameter server
-  could be read and wrote concurrently. See
+- In the "Async SGD" figure, the "W" variable on the parameter server
+  could be read and written concurrently. See
   [here](https://github.com/PaddlePaddle/Paddle/pull/6394) for more
-  details about concurrent program in fluid.
+  details about concurrent program in Fluid.
 
 ### Discussion
 
 - Can the Enqueue OP be implemented under our current tensor design
-  (puts the input tensor into the queue tensor)?
-- *Dequeue* OP will have variable numbers of output (depends on the
+  (put the input tensor into the queue tensor)?
+- *Dequeue* OP will have variable numbers of output (depending on the
   `min_count` attribute), does our current design support it? (similar
   question for the *Add* OP)
develop/doc_cn/design/dist_refactor/parameter_server.html
@@ -239,15 +239,15 @@ different purposes.</p>
 </div>
 <div class="section" id="background">
 <span id="background"></span><h2>Background<a class="headerlink" href="#background" title="永久链接至标题">¶</a></h2>
-<p>The previous implementations of the parameter server does not run a
+<p>The previous implementations of the parameter server do not run a
 fluid sub-program. Parameter initialization, optimizer computation, network
 communication and checkpointing are implemented twice on both the
-trainer and the parameter server.</p>
-<p>It would be great if we can write code once and use them on both the
-trainer and the parameter server: reduces code duplication and
-improves extensibility. Given that after the current refactor, we are
-representing everything as a computing graph on the
-trainer. Representing everything as a computing graph on the parameter
+trainer as well as the parameter server.</p>
+<p>It would be great if we can write code once and use them on both: the
+trainer and the parameter server, since this reduces code duplication and
+improves extensibility. Given that after the current refactoring, we are
+representing everything as a computation graph on the
+trainer. Representing everything as a computation graph on the parameter
 server becomes a natural extension.</p>
 </div>
 <div class="section" id="design">
@@ -259,9 +259,9 @@ into sub-programs to be scheduled on different nodes with the following
 steps:</p>
 <ol class="simple">
 <li>OP placement: the OPs will be placed on different nodes according
-to heuristic that minimizes estimated total computation
+to a heuristic that minimizes the estimated total computation
 time. Currently we will use a simple heuristic that puts parameter
-varable on parameter server workers and everything else on trainer
+variable on parameter server workers and everything else on trainer
 workers.</li>
 <li>Add communication OPs to enable the communication between nodes.</li>
 </ol>
@@ -272,16 +272,16 @@ subgraphs for the trainer and the parameter server:</p>
 <p>After converting:</p>
 <p><img src="src/dist-graph.png" width="700"/></p>
 <ol class="simple">
-<li>The parameter variable W and it’s optimizer program are placed on the parameter server.</li>
+<li>The parameter variable W and its optimizer program are placed on the parameter server.</li>
 <li>Operators are added to the program.<ul>
 <li><em>Send</em> sends data to the connected <em>Recv</em> operator. The
 scheduler on the receive node will only schedule <em>Recv</em> operator
 to run when the <em>Send</em> operator has ran (the <em>Send</em> OP will mark
 the <em>Recv</em> OP runnable automatically).</li>
-<li><em>Enueue</em> enqueues the input variable, it can block until space
+<li><em>Enqueue</em> enqueues the input variable, it can block until space
 become available in the queue.</li>
 <li><em>Dequeue</em> outputs configurable numbers of tensors from the
-queue. It will block until the queue have the required number of
+queue. It will block until the queue has the required number of
 tensors.</li>
 </ul>
 </li>
@@ -290,7 +290,7 @@ tensors.</li>
 <div class="section" id="benefits">
 <span id="benefits"></span><h3>Benefits<a class="headerlink" href="#benefits" title="永久链接至标题">¶</a></h3>
 <ul class="simple">
-<li>Model parallelism become easier to implement: it’s an extension to
+<li>Model parallelism becomes easier to implement: it is an extension to
 the trainer - parameter server approach. We can have several “Transpilers”
 to achieve different goals.</li>
 <li>User-defined optimizer is easier to add - user can now express it as
@@ -302,24 +302,24 @@ server mentioned in the background section.</li>
 <div class="section" id="challenges">
 <span id="challenges"></span><h3>Challenges<a class="headerlink" href="#challenges" title="永久链接至标题">¶</a></h3>
 <ul class="simple">
-<li>It’s important to balance the parameter shards of on multiple
-parameter server. If a single parameter is very big (some
+<li>It is important to balance the parameter shards on multiple
+parameter servers. If a single parameter is very big (for example: some
 word-embedding, fully connected, softmax layer), we need to
 automatically partition the single parameter onto different
 parameter servers when possible (only element-wise optimizer depends
 on the parameter variable).</li>
-<li>In the “Aync SGD” figure, the “W” variable on the parameter server
-could be read and wrote concurrently. See
+<li>In the “Async SGD” figure, the “W” variable on the parameter server
+could be read and written concurrently. See
 <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/pull/6394">here</a> for more
-details about concurrent program in fluid.</li>
+details about concurrent program in Fluid.</li>
 </ul>
 </div>
 <div class="section" id="discussion">
 <span id="discussion"></span><h3>Discussion<a class="headerlink" href="#discussion" title="永久链接至标题">¶</a></h3>
 <ul class="simple">
 <li>Can the Enqueue OP be implemented under our current tensor design
-(puts the input tensor into the queue tensor)?</li>
-<li><em>Dequeue</em> OP will have variable numbers of output (depends on the
+(put the input tensor into the queue tensor)?</li>
+<li><em>Dequeue</em> OP will have variable numbers of output (depending on the
 <code class="docutils literal"><span class="pre">min_count</span></code> attribute), does our current design support it? (similar
 question for the <em>Add</em> OP)</li>
 </ul>
develop/doc_cn/searchindex.js
Source diff not shown because it is too large. You can view the blob instead.