提交 2dc5c69e 编写于 作者: Y Yi Wang 提交者: dzhwinter

Add Fluid Compiler design doc (#7178)

* Add fluid_compiler.md

* Paragraphing
上级 eee62648
......@@ -105,18 +105,10 @@ There are two ways to execute a Fluid program. When a program is executed, it c
There is a C++ class [`Executor`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/executor.h), which runs a `ProgramDesc`, similar to how an interpreter runs a Python program.
Fluid is moving towards the direction of a compiler, which is explain in more detail later in this article.
Fluid is moving towards the direction of a compiler, which is explain in [fluid_compiler.md](fluid_compiler.md).
## Backward Compatibility of Fluid
Given all the advantages from the removal of the concept of a *model*, hardware manufacturers might still prefer the existence of the concept of a model, so it would be easier for them to support multiple frameworks all at once and could run a trained model during inference. For example, Nervana, a startup company acquired by Intel, has been working on an XPU that reads the models in the format known as [n-graph](https://github.com/NervanaSystems/ngraph). Similarly, [Movidius](https://www.movidius.com/) is producing a mobile deep learning chip that reads and runs graphs of operators. The well-known [ONNX](https://github.com/onnx/onnx) is also a file format of graphs of operators.
For Fluid, we can write a converter that extracts the parts in the `ProgramDesc` protobuf message, converts them into a graph of operators, and exports the graph into the ONNX or n-graph format.
## Towards a Deep Learning Language and the Compiler
We can change the `if-then-else` and loop structure a little bit in the above Fluid example programs, to make it into a new programming language, different than Python.
Even if we do not invent a new language, as long as we get the `ProgramDesc` message filled in, we can write a transpiler, which translates each invocation to an operator, into a C++ call to a kernel function of that operator. For example, a transpiler that weaves the CUDA kernels outputs an NVIDIA-friendly C++ program, which can be built using `nvcc`. Another transpiler could generate MKL-friendly code that should be built using `icc` from Intel. More interestingly, we can translate a Fluid program into its distributed version of two `ProgramDesc` messages, one for running on the trainer process, and the other one for the parameter server. For more details of the last example, the [concurrent programming design](concurrent_programming.md) document would be a good pointer. The following figure explains the proposed two-stage process:
![](fluid-compiler.png)
# PaddlePaddle Fluid: Towards a Compiled Programming Language
As described in [fluid.md](fluid.md), when a Fluid application program
runs, it generates a `ProgramDesc` protobuf message as an intermediate
representation of itself. The C++ class `Executor` can run this
protobuf message as an interpreter. This article describes the Fluid
compiler.
![](fluid-compiler.png)
## ProgramDesc
Before we go deeper into the idea of compiled language, let us take a
look at a simple example Fluid application.
```python
import "fluid"
func paddlepaddle() {
X = fluid.read(...)
W = fluid.Tensor(...)
Y = fluid.mult(X, W)
}
```
This program consists of a [block](block.md) of three operators --
`read`, `assign`, and `mult`. Its `ProgramDesc` message looks like
the following
```protobuf
message ProgramDesc {
block[0] = Block {
vars = [X, W, Y],
ops = [
read(output = X)
assign(input = ..., output = W)
mult(input = {X, W}, output = Y)
],
}
}
```
## Transpilers
We can write a transpiler program that takes a `ProgramDesc`, e.g.,
the above one, and outputs another `ProgramDesc`. Let us take some
examples:
1. *Memory optimization transpiler*: We can write a transpiler that
inserts some `FreeMemoryOp`s in the above example `ProgramDesc` so
to free memory early, before the end of an iteration, so to keep a
small memory footprint.
1. *Distributed training transpiler*: We can write a transpiler that
converts a`ProgramDesc` into its distributed version of two
`ProgramDesc`s -- one for running by the trainer processes and the
other for the parameter server.
In the rest of this article, we talk about a special kind of
transpiler, *Native code generator*, which takes a `ProgramDesc` and
generates a `.cu` (or `.cc`) file, which could be built by C++
compilers (gcc, nvcc, icc) into binaries.
## Native Code Generator
For the above example, the native code generator transpiler, say, the
CUDA code generator, should generate a `main` function:
```c++
void main() {
auto X = fluid_cuda_read(...);
auto W = fluid_cuda_create_tensor(...);
auto Y = fluid_cuda_mult(X, W);
}
```
and the definitions of functions `fluid_cuda_read`,
`fluid_cuda_create_tensor`, and `fluid_cuda_mult`. Please be aware
that each function could just define a C++ instance of an operator and
run it. For example
```c++
paddle::Tensor fluid_cuda_read(...) {
paddle::Tensor t;
paddle::operator::Read r(&t, ...);
r.Run();
return t;
}
```
For computational operators that have multiple *kernels*, each for a
specific hardware platform, for example, the `mult` operator, the
generated code should call its CUDA kernel:
```c++
paddle::Tensor fluid_cuda_mult(const paddle::Tensor& a,
const paddle::Tensor& b) {
paddle::Tensor t;
paddle::operator::Mult m(a, b, ...);
Mult.Run(cuda_context);
}
```
where `cuda_context` could be a global variable of type
`paddle::CUDADeviceContext`.
## Multi-Block Code Generation
Most Fluid application programs may have more than one blocks. To
execute them, we need to trace [scopes](scope.md).
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册