PaddlePaddle / PaddleDetection
Commit 253d3c49, authored Sep 01, 2017 by Helin Wang
Design Doc: Fully Static Graph
Parent: 7bcb1fc3
Showing 1 changed file with 122 additions and 0 deletions:
doc/design/fully_static_graph.md (new file, 0 → 100644)
# Design Doc: Fully Static Graph
## Abstract
We propose the *fully static graph* rule: training and inference must
be fully specified by the static graph. This means training and
inference should be able to run solely on the C++ core (no Python
involved), with everything implemented as an OP.

The user can still use Python to achieve the same result for
convenience when experimenting locally, but distributed training
will not support Python.
## Background
There are two paradigms for expressing the computation graph: dynamic
and static. The dynamic paradigm constructs the graph on the fly:
every time `eval` is called, a new graph is created. The static
paradigm constructs the graph first and then calls `eval`. No new
graph is created each time `eval` is called.
The dynamic graph has the advantage of being flexible but is highly
dependent on the host language (most commonly Python). The static
graph is not as flexible, but more optimization can be done since the
graph is known before computation happens. PaddlePaddle uses the
static graph approach because we focus on production deployment
and cluster training, where efficiency is key.
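
The difference between the two paradigms can be illustrated with a toy expression graph. This is only a sketch: `Node`, `placeholder`, `add`, `mul`, `evaluate`, and `build_graph` are hypothetical names made up for illustration and are not PaddlePaddle APIs. The node counter shows that the static style constructs the graph once, while the dynamic style reconstructs it on every evaluation.

```python
# Toy illustration only; none of these names are PaddlePaddle APIs.


class Node:
    """A node in a tiny expression graph."""

    created = 0  # counts how many graph nodes have been constructed

    def __init__(self, op, inputs=(), name=None):
        Node.created += 1
        self.op, self.inputs, self.name = op, inputs, name


def placeholder(name):
    return Node("placeholder", name=name)


def add(a, b):
    return Node("add", (a, b))


def mul(a, b):
    return Node("mul", (a, b))


def evaluate(node, feed):
    """Evaluate a graph node given values for its placeholders."""
    if node.op == "placeholder":
        return feed[node.name]
    left, right = (evaluate(n, feed) for n in node.inputs)
    return left + right if node.op == "add" else left * right


def build_graph():
    """Declare the graph for x * w + b."""
    x, w, b = placeholder("x"), placeholder("w"), placeholder("b")
    return add(mul(x, w), b)


# Static paradigm: build the graph once, then evaluate it many times.
Node.created = 0
graph = build_graph()
static_results = [evaluate(graph, {"x": x, "w": 2, "b": 1}) for x in range(3)]
static_nodes = Node.created

# Dynamic paradigm: a new graph is constructed on every evaluation.
Node.created = 0
dynamic_results = [
    evaluate(build_graph(), {"x": x, "w": 2, "b": 1}) for x in range(3)
]
dynamic_nodes = Node.created
```

Both styles compute the same values. Because the static graph exists in full before the first evaluation, an engine has the chance to analyze and optimize it up front.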
This design doc addresses an important question for the
static graph approach: should the training logic be fully specified by
the static graph?
For example, it's common to control the graph evaluation from Python:
```Python
for i in range(10000):
    paddle.eval(train_op)
```
In the above example, the training logic is not fully specified by the
graph: Python still controls the training loop.
## Fully Static Graph
The training logic should be fully specified by the graph (we
still support controlling the graph evaluation from Python), because
Python complicates distributed training:

- The distributed training engine needs to place the computation graph
  onto different nodes and add communication OPs for data that crosses
  node boundaries. Both are very hard to do if the training logic is not
  fully specified by the graph.
- For fault recovery, every piece of runtime state needs to be saved, but
  the state held in Python code (such as the training loop index and data
  reader position) cannot be saved.
- Allowing arbitrary Python code to execute on Paddle Cloud makes
  training data safety very hard, if not impossible, to control.
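
The fault recovery point can be sketched with a toy loop whose counter lives in explicit, serializable state rather than in a Python `for` statement. Everything here is an assumption made for illustration: `run_loop`, the `state` layout, and the use of JSON as a checkpoint format are hypothetical and do not describe how PaddlePaddle checkpoints.

```python
import json


def run_loop(state, limit, stop_after=None):
    """Run a toy "for loop OP" whose counter and accumulator live in `state`.

    `stop_after` simulates a crash after that many iterations.
    Returns True if the loop ran to completion, False if interrupted.
    """
    steps = 0
    while state["i"] < limit:
        if stop_after is not None and steps == stop_after:
            return False  # interrupted before finishing
        state["total"] += state["i"]  # stand-in for one training step
        state["i"] += 1
        steps += 1
    return True


# Uninterrupted run, used as the reference result.
reference = {"i": 0, "total": 0}
run_loop(reference, limit=10)

# Interrupted run: checkpoint the loop state, then resume from it.
state = {"i": 0, "total": 0}
finished = run_loop(state, limit=10, stop_after=4)
checkpoint = json.dumps(state)  # holds everything needed to resume

restored = json.loads(checkpoint)
run_loop(restored, limit=10)
```

Because the loop index is part of the saved state, the resumed run ends in the same state as the uninterrupted one. A loop index held only in a Python `for` statement would be lost with the process.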
### Benefits
- A clear separation between graph declaration (currently done in Python)
  and graph execution. This makes it easier for us to add a new language
  binding (or invent our own deep learning graph specification
  language).
- Local or distributed graph execution is easier to optimize.
- It is much easier to ensure training data safety on Paddle Cloud.
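
The separation between declaration and execution can be sketched as follows. The graph, including its loop, is declared purely as data, and a small interpreter executes it without running any user-supplied Python. The spec format, the op names (`for_loop`, `add`, `add_const`), and `execute` are all hypothetical, chosen only to illustrate the idea.

```python
import json

# Declaration: a graph described purely as data (hypothetical format).
graph_spec = {
    "vars": {"i": 0, "acc": 0},
    "ops": [
        {
            "type": "for_loop",
            "cond": {"var": "i", "less_than": 5},
            "body": [
                {"type": "add", "out": "acc", "lhs": "acc", "rhs": "i"},
                {"type": "add_const", "out": "i", "lhs": "i", "value": 1},
            ],
        }
    ],
}

# The serialized form could be produced by any language binding.
serialized = json.dumps(graph_spec)


def execute(spec_text):
    """Execution: interpret a serialized graph; no user Python code runs."""
    spec = json.loads(spec_text)
    env = dict(spec["vars"])

    def run(ops):
        for op in ops:
            if op["type"] == "for_loop":
                cond = op["cond"]
                while env[cond["var"]] < cond["less_than"]:
                    run(op["body"])
            elif op["type"] == "add":
                env[op["out"]] = env[op["lhs"]] + env[op["rhs"]]
            elif op["type"] == "add_const":
                env[op["out"]] = env[op["lhs"]] + op["value"]
            else:
                raise ValueError(f"unknown op type: {op['type']}")

    run(spec["ops"])
    return env


result = execute(serialized)
```

Since the executor only understands a fixed set of ops, it never needs to run arbitrary code submitted by the user, which is the property the data safety argument relies on.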
### Example
To give a concrete example, a for loop is essential for training:
with every iteration, a new mini-batch is fed into the training
system. Under the fully static graph rule, we **must** implement the for
loop as an OP:
```Python
# pseudo code, we need to discuss the for loop interface
i = pd.Variable(0)
optimizer = paddle.op.Adam()
# specify the input file as the argument, or
# leave blank and specify using config when running on Paddle Cloud
input = paddle.op.recordIO("/home/data/input.recordio")
q_x, q_y = input[0], input[1]

def cond(i):
    return i < 10000

with pd.for_loop(cond, [i]) as loop:
    # Dequeue a new example each iteration.
    x = q_x.dequeue()
    y = q_y.dequeue()
    # w and b are the model parameters; their declaration is omitted here.
    loss = pd.op.square(pd.op.sub(pd.op.add(pd.op.mul(x, w), b), y))
    optimizer.minimize(loss)
    pd.add(i, 1)

# or paddle.save_target(loop, "job.bin") and
# submit the saved file to Paddle Cloud.
paddle.eval(loop)
```
The above code can run both locally and on Paddle Cloud.

For convenience, users can instead use a Python for loop:
```Python
optimizer = paddle.op.Adam()
input = paddle.op.recordIO("/home/data/input.recordio")
q_x, q_y = input[0], input[1]
x = q_x.dequeue()
y = q_y.dequeue()
loss = pd.op.square(pd.op.sub(pd.op.add(pd.op.mul(x, w), b), y))
train_op = optimizer.minimize(loss)
for i in range(10000):
    paddle.eval(train_op)
```
The above code can only run locally.