# PaddlePaddle Fluid: Towards a Compiled Programming Language

As described in [fluid.md](fluid.md), when a Fluid application program
runs, it generates a `ProgramDesc` protobuf message as an intermediate
representation of itself.  The C++ class `Executor` can run this
protobuf message as an interpreter.  This article describes the Fluid
compiler.

![](fluid-compiler.png)

## ProgramDesc

Before we go deeper into the idea of a compiled language, let us take a
look at a simple example Fluid application.

```python
import "fluid"

func paddlepaddle() {
  X = fluid.read(...)
  W = fluid.Tensor(...)
  Y = fluid.mult(X, W)
}
```

This program consists of a [block](../concepts/block.md) of three operators --
`read`, `assign`, and `mult`.  Its `ProgramDesc` message looks like
the following:

```protobuf
message ProgramDesc {
  block[0] = Block {
    vars = [X, W, Y],
    ops = [
      read(output = X)
      assign(input = ..., output = W)
      mult(input = {X, W}, output = Y)
    ],
  }
}
```
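
For reference, the `Executor` mentioned above runs such a message by
walking the block's op list and dispatching each op at runtime.  The
following is a minimal sketch of that interpreter loop, with simplified
stand-ins for the real protobuf-generated classes -- the names and
fields below are illustrative, not Paddle's actual API:

```c++
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Simplified stand-ins for the protobuf-generated message classes.
struct OpDesc {
  std::string type;  // e.g., "read", "assign", or "mult"
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

struct BlockDesc {
  std::vector<OpDesc> ops;
};

using Kernel = std::function<void(const OpDesc&)>;

// The interpreter loop: look up and invoke a kernel per op, at runtime.
void Run(const BlockDesc& block,
         const std::map<std::string, Kernel>& kernels) {
  for (const OpDesc& op : block.ops) {
    kernels.at(op.type)(op);
  }
}

int main() {
  BlockDesc block{{{"read", {}, {"X"}},
                   {"assign", {}, {"W"}},
                   {"mult", {"X", "W"}, {"Y"}}}};
  std::map<std::string, Kernel> kernels;
  kernels["read"] = [](const OpDesc& op) { std::cout << "read -> " << op.outputs[0] << "\n"; };
  kernels["assign"] = [](const OpDesc& op) { std::cout << "assign -> " << op.outputs[0] << "\n"; };
  kernels["mult"] = [](const OpDesc& op) { std::cout << "mult -> " << op.outputs[0] << "\n"; };
  Run(block, kernels);
}
```

The per-op runtime dispatch in `Run` is exactly the overhead that
compiling the program ahead of time, as described below, eliminates.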

## Transpilers

We can write a transpiler program that takes a `ProgramDesc`, e.g.,
the one above, and outputs another `ProgramDesc`.  Here are some
examples:

1. *Memory optimization transpiler*: We can write a transpiler that
   inserts `FreeMemoryOp`s into the above example `ProgramDesc`, so
   that memory is freed early, before the end of an iteration, and the
   memory footprint stays small.  A sketch of this pass follows the
   list.

1. *Distributed training transpiler*: We can write a transpiler that
   converts a `ProgramDesc` into its distributed version of two
   `ProgramDesc`s -- one to be run by the trainer processes and the
   other by the parameter server.
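
The following is a minimal sketch of the memory optimization pass from
the first item, written against the simplified `OpDesc`/`BlockDesc`
stand-ins of the interpreter sketch above.  The `free_memory` op type
is hypothetical:

```c++
// Append a free op right after the last op that reads each variable,
// so its buffer is released before the block finishes.
BlockDesc OptimizeMemory(const BlockDesc& in) {
  std::map<std::string, size_t> last_use;  // variable -> last reader index
  for (size_t i = 0; i < in.ops.size(); ++i) {
    for (const std::string& var : in.ops[i].inputs) last_use[var] = i;
  }
  BlockDesc out;
  for (size_t i = 0; i < in.ops.size(); ++i) {
    out.ops.push_back(in.ops[i]);
    for (const auto& kv : last_use) {
      if (kv.second == i) out.ops.push_back({"free_memory", {kv.first}, {}});
    }
  }
  return out;
}
```

Applied to the example block, this pass would insert `free_memory(X)`
and `free_memory(W)` right after the `mult` op, since `mult` is the
last reader of both.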

In the rest of this article, we talk about a special kind of
transpiler, the *native code generator*, which takes a `ProgramDesc`
and generates a `.cu` (or `.cc`) file that C++ compilers (gcc, nvcc,
icc) can build into a binary.
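
Before looking at the generated code itself, here is a bird's-eye
sketch of such a generator's driver loop, again over the simplified
stand-ins from above.  The direct mapping from op type to a
`fluid_cuda_*` helper name is a simplification; the helper names
anticipate the examples in the next section:

```c++
// Emit one C++ statement per op; building the returned source with
// nvcc yields a binary that runs the program without an interpreter.
std::string GenerateMain(const BlockDesc& block) {
  std::string src = "int main() {\n";
  for (const OpDesc& op : block.ops) {
    src += "  auto " + op.outputs[0] + " = fluid_cuda_" + op.type + "(";
    for (size_t i = 0; i < op.inputs.size(); ++i) {
      if (i > 0) src += ", ";
      src += op.inputs[i];
    }
    src += ");\n";
  }
  src += "}\n";
  return src;
}
```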

## Native Code Generator

For the above example, the native code generator transpiler, say, the
CUDA code generator, should generate a `main` function:

```c++
int main() {
  auto X = fluid_cuda_read(...);
  auto W = fluid_cuda_create_tensor(...);
  auto Y = fluid_cuda_mult(X, W);
}
```

and the definitions of the functions `fluid_cuda_read`,
`fluid_cuda_create_tensor`, and `fluid_cuda_mult`.  Note that each
function could simply construct a C++ instance of the corresponding
operator and run it.  For example:

```c++
paddle::Tensor fluid_cuda_read(...) {
  paddle::Tensor t;
  paddle::operators::Read r(&t, ...);
  r.Run();
  return t;
}
```

Some computational operators have multiple *kernels*, one for each
supported hardware platform.  For such an operator, e.g., `mult`, the
CUDA code generator should emit a call to its CUDA kernel:

```c++
paddle::Tensor fluid_cuda_mult(const paddle::Tensor& a,
                               const paddle::Tensor& b) {
  paddle::Tensor t;
  paddle::operators::Mult m(a, b, &t, ...);
  m.Run(cuda_context);
  return t;
}
```

where `cuda_context` could be a global variable of type
`paddle::CUDADeviceContext`.
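
For instance, the generated file could define the context once at the
top.  The constructor argument below is an assumption; in the Paddle
codebase the class lives in the `paddle::platform` namespace and is
constructed from a place object:

```c++
// One process-wide CUDA device context, shared by all generated
// fluid_cuda_* functions.
paddle::CUDADeviceContext cuda_context(/*device_id=*/0);
```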

## Multi-Block Code Generation

Most Fluid application programs may have more than one block.  To
execute them, we need to trace [scopes](../concepts/scope.md).
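
A scope maps variable names to variables and keeps a pointer to the
scope of the enclosing block, so a lookup in an inner block falls back
to the outer blocks.  The following is a minimal sketch of such a scope
chain, simplified from (and not identical to) Paddle's actual `Scope`
class:

```c++
#include <map>
#include <memory>
#include <string>

struct Variable { /* holds a tensor, etc. */ };

class Scope {
 public:
  explicit Scope(const Scope* parent = nullptr) : parent_(parent) {}

  Variable* NewVar(const std::string& name) {
    return (vars_[name] = std::make_unique<Variable>()).get();
  }

  // Look up a name locally first, then in the enclosing scopes.
  Variable* FindVar(const std::string& name) const {
    auto it = vars_.find(name);
    if (it != vars_.end()) return it->second.get();
    return parent_ ? parent_->FindVar(name) : nullptr;
  }

 private:
  std::map<std::string, std::unique_ptr<Variable>> vars_;
  const Scope* parent_;
};
```

The code generated for a sub-block would then create a child `Scope`
on entry and destroy it on exit, releasing the block-local variables.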