# Design Doc: PaddlePaddle Fluid

## Why Fluid

When Baidu developed PaddlePaddle in 2013, the only well-known open source deep learning system at the time was Caffe.  However, when PaddlePaddle was open-sourced in 2016, many other choices were already available.  This raised a question: why open source yet another deep learning framework?

Fluid is the answer.  Fluid is similar to PyTorch and TensorFlow Eager Execution in that it describes the "process" of training or inference rather than a model.  In fact, in PyTorch, TensorFlow Eager Execution, and Fluid, there is no concept of a model at all. The details are covered in the sections below. Fluid currently takes this idea further than PyTorch and Eager Execution, and we are pushing Fluid towards becoming a compiler and a new programming language for deep learning.

## The Evolution of Deep Learning Systems

Deep learning infrastructure is one of the fastest-evolving technologies. Within just four years, three generations of technology have been invented.

<table>
<thead>
<tr>
<th>Existed since</th>
<th>model as sequence of layers</th>
<th>model as graph of operators</th>
<th>No model</th>
</tr>
</thead>
<tbody>
<tr>
<td>2013 </td>
<td>Caffe, Theano, Torch, PaddlePaddle </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td>2015 </td>
<td> </td>
<td>TensorFlow, MXNet, Caffe2, ONNX, n-graph </td>
<td> </td>
</tr>
<tr>
<td>2016 </td>
<td> </td>
<td>   </td>
<td> PyTorch, TensorFlow Eager Execution, PaddlePaddle Fluid</td>
</tr>
</tbody>
</table>

From the above table, we see that deep learning technology is evolving towards getting rid of the concept of a model.  To understand the reasons behind this direction, it helps to compare the *programming paradigms*, i.e., the ways these systems let us program deep learning applications. The following section goes over them.

## Deep Learning Programming Paradigms

With the first- or second-generation systems listed above, e.g., Caffe or TensorFlow, an AI application's training program looks like the following:

```python
x = layer.data("image")
l = layer.data("label")
f = layer.fc(x, W)
s = layer.softmax(f)
c = layer.mse(l, s)

for i in xrange(1000): # train for 1000 iterations
    m = read_minibatch()
    forward({"input": x, "data": m}, minimize=c)
    backward(...)

print W # print the trained model parameters.
```

The above program includes two parts:

1. The first part describes the model, and
2. The second part describes the training process (or inference process) for the model.

This paradigm has a well-known problem that limits programmer productivity. If the programmer makes a mistake in configuring the model, the error message does not show up until the second part executes and the `forward` and `backward` propagations run. This makes it hard to debug a mistake that may be many blocks away from the line where the error is reported.
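
For instance, in the same hypothetical `layer`/`forward` notation as the snippet above, a wrongly shaped weight tensor does not fail where it is declared; the error surfaces only inside the training loop:

```python
# Hypothetical mistake: W_wrong has a shape incompatible with layer.fc.
f = layer.fc(x, W_wrong)
s = layer.softmax(f)
c = layer.mse(l, s)

for i in xrange(1000):
    m = read_minibatch()
    # Only here does the framework evaluate the configuration and report
    # the shape mismatch -- far away from the line that caused it.
    forward({"input": x, "data": m}, minimize=c)
    backward(...)
```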

This difficulty of debugging and iterating quickly on a program is the primary reason programmers, in general, prefer PyTorch over the older systems.  Using PyTorch, we would write the above program as follows:

```python
W = tensor(...)

for i in xrange(1000): # train for 1000 iterations
    m = read_minibatch()
    x = m["image"]
    l = m["label"]
    f = layer.fc(x, W)
    s = layer.softmax(f)
    c = layer.mse(l, s)
    backward()

print W # print the trained model parameters.
```

We can see that the main difference is that the model configuration part (the first part) has moved into the training loop.  This change allows mistakes in the model configuration to be reported where they actually appear in the program.  It also represents the model, or rather its forward pass, more directly by keeping the configuration code inside the training loop.

## Describe Arbitrary Models for the Future

Describing the process instead of the model also gives Fluid the flexibility to define non-standard models that haven't been invented yet.

As we write out the program for the process, we can write an RNN as a loop, instead of as a layer or an operator.  A PyTorch example would look like the following:

```python
for i in xrange(1000):
    m = read_minibatch()
    x = m["sentence"]
    h = {}  # hidden states, one per time step
    for t in xrange(len(x)):
        h[t] = the_step(x[t])
```
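
Here `the_step` stands for one application of the recurrent cell. As a minimal sketch of what it could be (plain numpy with illustrative sizes and names, not tied to any framework), consider a vanilla RNN step:

```python
import numpy as np

W_x = np.random.randn(128, 256)  # input-to-hidden weights (illustrative sizes)
W_h = np.random.randn(128, 128)  # hidden-to-hidden weights
b = np.zeros(128)                # bias
h_prev = np.zeros(128)           # previous hidden state

def the_step(x_t):
    # One vanilla RNN step: mix the current input with the previous state.
    global h_prev
    h_prev = np.tanh(W_x.dot(x_t) + W_h.dot(h_prev) + b)
    return h_prev
```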

With Fluid, the training loop and the RNN in the above program are not really Python loops; they are "loop structures" provided by Fluid and implemented in C++, written in Python as the following:

```python
train_loop = layers.While(cond)
with train_loop.block():
  m = read_minibatch()
  x = m["sentence"]
  rnn = layers.While(...)
  with rnn.block():
    h[t] = the_step(x[t])
```

An actual Fluid example is described [here](https://github.com/PaddlePaddle/Paddle/blob/bde090a97564b9c61a6aaa38b72ccc4889d102d9/python/paddle/fluid/tests/unittests/test_while_op.py#L50-L58).

From the example, Fluid programs look very similar to their PyTorch equivalents, except that Fluid's loop structure, wrapped with Python's `with` statement, can run much faster than a plain Python loop.

We have more examples of the [`if-then-else`](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/fluid/design/execution/if_else_op.md) structure of Fluid.

## Turing Completeness

In computability theory, a system of data-manipulation rules, such as a programming language, is said to be Turing complete if it can be used to simulate any Turing machine.  For a programming language, providing if-then-else and loops is enough for Turing completeness.  From the above examples, Fluid seems to be Turing complete; however, it is worth noting a slight difference between Fluid's `if-then-else` and that of a typical programming language: the former runs both of its branches and splits the input mini-batch into two -- one part for the True condition and another for the False condition.  It has not been researched in depth whether this is equivalent to the `if-then-else` that makes programming languages Turing complete.  Based on a conversation with [Yuan Yu](https://research.google.com/pubs/104812.html), it seems to be the case, but this needs to be looked into in depth.
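
To make these semantics concrete, the sketch below shows Fluid's `IfElse` structure; it follows the Fluid Python API at the time of writing, but the exact names and shapes should be taken as illustrative. The condition is computed per example, the mini-batch is split accordingly, each branch processes only its share, and the results are merged back:

```python
import paddle.fluid.layers as layers

x = layers.data(name="x", shape=[1], dtype="float32")
limit = layers.fill_constant(shape=[1], dtype="float32", value=15.0)
cond = layers.less_than(x=x, y=limit)  # one boolean per example

ie = layers.IfElse(cond)
with ie.true_block():
    xt = ie.input(x)       # the sub-batch of examples where cond is True
    ie.output(xt + 1.0)
with ie.false_block():
    xf = ie.input(x)       # the sub-batch where cond is False
    ie.output(xf - 1.0)
out = ie()                 # outputs merged back into one mini-batch
```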

## The Execution of a Fluid Program

There are two ways to execute a Fluid program.  When a program is executed, it creates a protobuf message [`ProgramDesc`](https://github.com/PaddlePaddle/Paddle/blob/a91efdde6910ce92a78e3aa7157412c4c88d9ee8/paddle/framework/framework.proto#L145) that describes the process and is conceptually like an [abstract syntax tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree).
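
For example, one can build a tiny program and dump its `ProgramDesc` as text (a sketch assuming the Fluid Python API; layer names may differ across versions):

```python
import paddle.fluid as fluid

x = fluid.layers.data(name="x", shape=[13], dtype="float32")
y = fluid.layers.fc(input=x, size=1)

# The calls above computed nothing; they only appended operator and variable
# descriptions to the default ProgramDesc, which prints like an AST.
print(fluid.default_main_program())
```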

There is a C++ class [`Executor`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/framework/executor.h), which runs a `ProgramDesc`, similar to how an interpreter runs a Python program.
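
Continuing the sketch above, running the program with the `Executor` looks roughly like this (again assuming the Fluid Python API):

```python
import numpy

exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())  # initialize the parameters

data = numpy.random.random(size=(8, 13)).astype("float32")
out, = exe.run(fluid.default_main_program(),
               feed={"x": data},          # bind the minibatch to layers.data("x")
               fetch_list=[y])            # interpret the ProgramDesc and return y
```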

Fluid is also moving in the direction of a compiler, which is explained in [fluid_compiler.md](fluid_compiler.md).

## Backward Compatibility of Fluid

Despite all the advantages of removing the concept of a *model*, hardware manufacturers might still prefer that the concept exist, since it makes it easier for them to support multiple frameworks at once and to run a trained model during inference.  For example, Nervana, a startup company acquired by Intel, has been working on an XPU that reads models in a format known as [n-graph](https://github.com/NervanaSystems/ngraph).  Similarly, [Movidius](https://www.movidius.com/) is producing a mobile deep learning chip that reads and runs graphs of operators.  The well-known [ONNX](https://github.com/onnx/onnx) is also a file format for graphs of operators.

For Fluid, we can write a converter that extracts the relevant parts of the `ProgramDesc` protobuf message, converts them into a graph of operators, and exports the graph in the ONNX or n-graph format.
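
A minimal sketch of such a converter follows; the op-type mapping and the field names on `ProgramDesc` are illustrative, and a real converter would have to handle attributes, weights, and many more operator types:

```python
import onnx
from onnx import helper

# Hypothetical mapping from Fluid operator types to ONNX op types.
FLUID_TO_ONNX = {"mul": "MatMul", "elementwise_add": "Add", "relu": "Relu"}

def convert(program_desc):
    """Walk the blocks of a ProgramDesc and emit one ONNX node per operator."""
    nodes = []
    for block in program_desc.blocks:       # a ProgramDesc is a list of blocks
        for op in block.ops:                # each block is a sequence of operators
            nodes.append(helper.make_node(
                FLUID_TO_ONNX[op.type],
                inputs=list(op.input_arg_names),
                outputs=list(op.output_arg_names)))
    graph = helper.make_graph(nodes, "fluid_program", inputs=[], outputs=[])
    return helper.make_model(graph)
```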