Deploy to GitHub Pages: cd775a13

c5a3067a · Travis CI · f2cb48f1 · c5a3067a · c5a3067a · c5a3067a
12 changed files
--- a/develop/doc/_sources/design/fluid.md.txt
+++ b/develop/doc/_sources/design/fluid.md.txt
@@ -105,18 +105,10 @@ There are two ways to execute a Fluid program.  When a program is executed, it c
 There is a C++ class [`Executor`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/executor.h), which runs a `ProgramDesc`, similar to how an interpreter runs a Python program.
-Fluid is moving towards the direction of a compiler, which is explain in more detail later in this article.
+Fluid is moving in the direction of a compiler, which is explained in [fluid_compiler.md](fluid_compiler.md).
 ## Backward Compatibility of Fluid
 Given all the advantages from the removal of the concept of a *model*, hardware manufacturers might still prefer the existence of the concept of a model, so it would be easier for them to support multiple frameworks all at once and could run a trained model during inference.  For example, Nervana, a startup company acquired by Intel, has been working on an XPU that reads the models in the format known as [n-graph](https://github.com/NervanaSystems/ngraph).  Similarly, [Movidius](https://www.movidius.com/) is producing a mobile deep learning chip that reads and runs graphs of operators.  The well-known [ONNX](https://github.com/onnx/onnx) is also a file format of graphs of operators.
 For Fluid, we can write a converter that extracts the parts in the `ProgramDesc` protobuf message, converts them into a graph of operators, and exports the graph into the ONNX or n-graph format.
-## Towards a Deep Learning Language and the Compiler
-We can change the `if-then-else` and loop structure a little bit in the above Fluid example programs, to make it into a new programming language, different than Python.
-Even if we do not invent a new language, as long as we get the `ProgramDesc` message filled in, we can write a transpiler, which translates each invocation to an operator, into a C++ call to a kernel function of that operator. For example, a transpiler that weaves the CUDA kernels outputs an NVIDIA-friendly C++ program, which can be built using `nvcc`.  Another transpiler could generate MKL-friendly code that should be built using `icc` from Intel.  More interestingly, we can translate a Fluid program into its distributed version of two `ProgramDesc` messages, one for running on the trainer process, and the other one for the parameter server.  For more details of the last example, the [concurrent programming design](concurrent_programming.md) document would be a good pointer.  The following figure explains the proposed two-stage process:
-![](fluid-compiler.png)
--- a/develop/doc/_sources/design/fluid_compiler.md.txt
+++ b/develop/doc/_sources/design/fluid_compiler.md.txt
+# PaddlePaddle Fluid: Towards a Compiled Programming Language
+As described in [fluid.md](fluid.md), when a Fluid application program
+runs, it generates a `ProgramDesc` protobuf message as an intermediate
+representation of itself.  The C++ class `Executor` can run this
+protobuf message as an interpreter.  This article describes the Fluid
+compiler.
+![](fluid-compiler.png)
+## ProgramDesc
+Before we go deeper into the idea of a compiled language, let us take a
+look at a simple example Fluid application.
+```python
+import "fluid"
+func paddlepaddle() {
+  X = fluid.read(...)
+  W = fluid.Tensor(...)
+  Y = fluid.mult(X, W)
+}
+```
+This program consists of a [block](block.md) of three operators --
+`read`, `assign`, and `mult`.  Its `ProgramDesc` message looks like
+the following:
+```protobuf
+message ProgramDesc {
+  block[0] = Block {
+    vars = [X, W, Y],
+    ops = [
+      read(output = X)
+      assign(input = ..., output = W)
+      mult(input = {X, W}, output = Y)
+    ],
+  }
+}
+```
+## Transpilers
+We can write a transpiler program that takes a `ProgramDesc`, e.g.,
+the above one, and outputs another `ProgramDesc`.  Let us take some
+examples:
+1. *Memory optimization transpiler*: We can write a transpiler that
+   inserts some `FreeMemoryOp`s into the above example `ProgramDesc`
+   to free memory early, before the end of an iteration, and so keep
+   the memory footprint small.
+1. *Distributed training transpiler*: We can write a transpiler that
+   converts a `ProgramDesc` into its distributed version of two
+   `ProgramDesc`s -- one run by the trainer processes and the
+   other by the parameter server.
+In the rest of this article, we talk about a special kind of
+transpiler, the *native code generator*, which takes a `ProgramDesc` and
+generates a `.cu` (or `.cc`) file that C++ compilers (gcc, nvcc, icc)
+can build into binaries.
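
The memory optimization transpiler from the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Op` class, the `free_memory` op name, and the function `insert_free_memory_ops` are hypothetical stand-ins, not part of the Fluid API. A trailing `scale` op is added to the example program purely so that some variables die before the last operator.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """Hypothetical stand-in for one operator entry in a block."""
    type: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


def insert_free_memory_ops(ops, keep=()):
    """Return a new op list with a `free_memory` op after each variable's last use.

    Variables named in `keep` (for example, results the caller still needs)
    are never freed.
    """
    # Record the index of the last op that reads or writes each variable.
    last_use = {}
    for i, op in enumerate(ops):
        for name in op.inputs + op.outputs:
            last_use[name] = i

    result = []
    for i, op in enumerate(ops):
        result.append(op)
        # Free every variable whose last use is this op (sorted for determinism).
        dead = sorted(n for n, last in last_use.items() if last == i and n not in keep)
        for name in dead:
            result.append(Op("free_memory", inputs=[name]))
    return result


# The three operators from the example, plus a hypothetical `scale` op.
program = [
    Op("read", outputs=["X"]),
    Op("assign", outputs=["W"]),
    Op("mult", inputs=["X", "W"], outputs=["Y"]),
    Op("scale", inputs=["Y"], outputs=["Z"]),
]
optimized = insert_free_memory_ops(program, keep=("Z",))
print([op.type for op in optimized])
```

The input list is left unchanged and a new list is returned, which matches the idea that a transpiler takes one `ProgramDesc` and outputs another.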
+## Native Code Generator
+For the above example, the native code generator transpiler, say, the
+CUDA code generator, should generate a `main` function:
+```c++
+int main() {
+  auto X = fluid_cuda_read(...);
+  auto W = fluid_cuda_create_tensor(...);
+  auto Y = fluid_cuda_mult(X, W);
+}
+```
+and the definitions of functions `fluid_cuda_read`,
+`fluid_cuda_create_tensor`, and `fluid_cuda_mult`.  Please be aware
+that each function could just define a C++ instance of an operator and
+run it.  For example:
+```c++
+paddle::Tensor fluid_cuda_read(...) {
+  paddle::Tensor t;
+  paddle::operators::Read r(&t, ...);
+  r.Run();
+  return t;
+}
+```
+For computational operators that have multiple *kernels*, each for a
+specific hardware platform, for example, the `mult` operator, the
+generated code should call its CUDA kernel:
+```c++
+paddle::Tensor fluid_cuda_mult(const paddle::Tensor& a,
+                               const paddle::Tensor& b) {
+  paddle::Tensor t;
+  paddle::operators::Mult m(a, b, ...);
+  m.Run(cuda_context);
+  return t;
+}
+```
+where `cuda_context` could be a global variable of type
+`paddle::CUDADeviceContext`.
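
To make the generator side concrete, here is a minimal sketch of how a code generator might walk the operators of a block and emit the text of `main`. It is an assumption-laden illustration: the tuple representation of an op, the `CUDA_FUNCTIONS` table, and `generate_main` are hypothetical. Only the three `fluid_cuda_*` function names come from the text above.

```python
# Hypothetical mapping from operator type to the generated C++ function name.
CUDA_FUNCTIONS = {
    "read": "fluid_cuda_read",
    "assign": "fluid_cuda_create_tensor",
    "mult": "fluid_cuda_mult",
}


def generate_main(ops):
    """Emit the text of a C++ `main` with one call per operator.

    Each op is a (type, inputs, output) tuple. Ops without tensor inputs keep
    the `...` placeholder for their elided arguments.
    """
    lines = ["int main() {"]
    for op_type, inputs, output in ops:
        args = ", ".join(inputs) if inputs else "..."
        lines.append(f"  auto {output} = {CUDA_FUNCTIONS[op_type]}({args});")
    lines.append("  return 0;")
    lines.append("}")
    return "\n".join(lines)


program = [
    ("read", [], "X"),
    ("assign", [], "W"),
    ("mult", ["X", "W"], "Y"),
]
source = generate_main(program)
print(source)
```

A generator for another platform would only need a different function table, which is one reason to keep the operator-to-function mapping separate from the traversal.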
+## Multi-Block Code Generation
+Most Fluid application programs have more than one block.  To
+execute them, we need to trace [scopes](scope.md).
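
What tracing scopes could mean for generated code is not spelled out here, so the following is only a sketch under an assumption: that each block gets its own scope and that a nested block can read variables from enclosing scopes. The `Scope` class and its `set` and `find` methods are hypothetical names for illustration.

```python
class Scope:
    """Hypothetical scope: a name-to-value table with an optional parent."""

    def __init__(self, parent=None):
        self.parent = parent
        self.vars = {}

    def set(self, name, value):
        # New variables always go into the innermost scope.
        self.vars[name] = value

    def find(self, name):
        # Walk outward through enclosing scopes until the name is found.
        scope = self
        while scope is not None:
            if name in scope.vars:
                return scope.vars[name]
            scope = scope.parent
        raise KeyError(name)


outer = Scope()
outer.set("W", "weights")
inner = Scope(parent=outer)  # scope for a nested block
inner.set("tmp", "scratch")

print(inner.find("W"))    # resolved through the parent scope
print(inner.find("tmp"))  # resolved locally
```

Under this assumption, variables created inside a nested block stay invisible to the enclosing block, so they can be released when the nested block finishes.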
--- a/develop/doc/design/fluid.html
+++ b/develop/doc/design/fluid.html
@@ -299,19 +299,13 @@
 <span id="the-execution-of-a-fluid-program"></span><h2>The Execution of a Fluid Program<a class="headerlink" href="#the-execution-of-a-fluid-program" title="Permalink to this headline">¶</a></h2>
 <p>There are two ways to execute a Fluid program.  When a program is executed, it creates a protobuf message <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/a91efdde6910ce92a78e3aa7157412c4c88d9ee8/paddle/framework/framework.proto#L145"><code class="docutils literal"><span class="pre">ProgramDesc</span></code></a> that describes the process and is conceptually like an <a class="reference external" href="https://en.wikipedia.org/wiki/Abstract_syntax_tree">abstract syntax tree</a>.</p>
 <p>There is a C++ class <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/executor.h"><code class="docutils literal"><span class="pre">Executor</span></code></a>, which runs a <code class="docutils literal"><span class="pre">ProgramDesc</span></code>, similar to how an interpreter runs a Python program.</p>
-<p>Fluid is moving towards the direction of a compiler, which is explain in more detail later in this article.</p>
+<p>Fluid is moving in the direction of a compiler, which is explained in <a class="reference internal" href="fluid_compiler.html"><span class="doc">fluid</span></a>.</p>
 </div>
 <div class="section" id="backward-compatibility-of-fluid">
 <span id="backward-compatibility-of-fluid"></span><h2>Backward Compatibility of Fluid<a class="headerlink" href="#backward-compatibility-of-fluid" title="Permalink to this headline">¶</a></h2>
 <p>Given all the advantages from the removal of the concept of a <em>model</em>, hardware manufacturers might still prefer the existence of the concept of a model, so it would be easier for them to support multiple frameworks all at once and could run a trained model during inference.  For example, Nervana, a startup company acquired by Intel, has been working on an XPU that reads the models in the format known as <a class="reference external" href="https://github.com/NervanaSystems/ngraph">n-graph</a>.  Similarly, <a class="reference external" href="https://www.movidius.com/">Movidius</a> is producing a mobile deep learning chip that reads and runs graphs of operators.  The well-known <a class="reference external" href="https://github.com/onnx/onnx">ONNX</a> is also a file format of graphs of operators.</p>
 <p>For Fluid, we can write a converter that extracts the parts in the <code class="docutils literal"><span class="pre">ProgramDesc</span></code> protobuf message, converts them into a graph of operators, and exports the graph into the ONNX or n-graph format.</p>
 </div>
-<div class="section" id="towards-a-deep-learning-language-and-the-compiler">
-<span id="towards-a-deep-learning-language-and-the-compiler"></span><h2>Towards a Deep Learning Language and the Compiler<a class="headerlink" href="#towards-a-deep-learning-language-and-the-compiler" title="Permalink to this headline">¶</a></h2>
-<p>We can change the <code class="docutils literal"><span class="pre">if-then-else</span></code> and loop structure a little bit in the above Fluid example programs, to make it into a new programming language, different than Python.</p>
-<p>Even if we do not invent a new language, as long as we get the <code class="docutils literal"><span class="pre">ProgramDesc</span></code> message filled in, we can write a transpiler, which translates each invocation to an operator, into a C++ call to a kernel function of that operator. For example, a transpiler that weaves the CUDA kernels outputs an NVIDIA-friendly C++ program, which can be built using <code class="docutils literal"><span class="pre">nvcc</span></code>.  Another transpiler could generate MKL-friendly code that should be built using <code class="docutils literal"><span class="pre">icc</span></code> from Intel.  More interestingly, we can translate a Fluid program into its distributed version of two <code class="docutils literal"><span class="pre">ProgramDesc</span></code> messages, one for running on the trainer process, and the other one for the parameter server.  For more details of the last example, the <a class="reference internal" href="concurrent_programming.html"><span class="doc">concurrent programming design</span></a> document would be a good pointer.  The following figure explains the proposed two-stage process:</p>
-<p><img alt="" src="../_images/fluid-compiler.png" /></p>
-</div>
 </div>

--- a/develop/doc/design/fluid_compiler.html
+++ b/develop/doc/design/fluid_compiler.html
--- a/develop/doc/objects.inv
+++ b/develop/doc/objects.inv
--- a/develop/doc/searchindex.js
+++ b/develop/doc/searchindex.js
--- a/develop/doc_cn/_sources/design/fluid.md.txt
+++ b/develop/doc_cn/_sources/design/fluid.md.txt
@@ -105,18 +105,10 @@ There are two ways to execute a Fluid program.  When a program is executed, it c
 There is a C++ class [`Executor`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/executor.h), which runs a `ProgramDesc`, similar to how an interpreter runs a Python program.
-Fluid is moving towards the direction of a compiler, which is explain in more detail later in this article.
+Fluid is moving in the direction of a compiler, which is explained in [fluid_compiler.md](fluid_compiler.md).
 ## Backward Compatibility of Fluid
 Given all the advantages from the removal of the concept of a *model*, hardware manufacturers might still prefer the existence of the concept of a model, so it would be easier for them to support multiple frameworks all at once and could run a trained model during inference.  For example, Nervana, a startup company acquired by Intel, has been working on an XPU that reads the models in the format known as [n-graph](https://github.com/NervanaSystems/ngraph).  Similarly, [Movidius](https://www.movidius.com/) is producing a mobile deep learning chip that reads and runs graphs of operators.  The well-known [ONNX](https://github.com/onnx/onnx) is also a file format of graphs of operators.
 For Fluid, we can write a converter that extracts the parts in the `ProgramDesc` protobuf message, converts them into a graph of operators, and exports the graph into the ONNX or n-graph format.
-## Towards a Deep Learning Language and the Compiler
-We can change the `if-then-else` and loop structure a little bit in the above Fluid example programs, to make it into a new programming language, different than Python.
-Even if we do not invent a new language, as long as we get the `ProgramDesc` message filled in, we can write a transpiler, which translates each invocation to an operator, into a C++ call to a kernel function of that operator. For example, a transpiler that weaves the CUDA kernels outputs an NVIDIA-friendly C++ program, which can be built using `nvcc`.  Another transpiler could generate MKL-friendly code that should be built using `icc` from Intel.  More interestingly, we can translate a Fluid program into its distributed version of two `ProgramDesc` messages, one for running on the trainer process, and the other one for the parameter server.  For more details of the last example, the [concurrent programming design](concurrent_programming.md) document would be a good pointer.  The following figure explains the proposed two-stage process:
-![](fluid-compiler.png)
--- a/develop/doc_cn/_sources/design/fluid_compiler.md.txt
+++ b/develop/doc_cn/_sources/design/fluid_compiler.md.txt
+# PaddlePaddle Fluid: Towards a Compiled Programming Language
+As described in [fluid.md](fluid.md), when a Fluid application program
+runs, it generates a `ProgramDesc` protobuf message as an intermediate
+representation of itself.  The C++ class `Executor` can run this
+protobuf message as an interpreter.  This article describes the Fluid
+compiler.
+![](fluid-compiler.png)
+## ProgramDesc
+Before we go deeper into the idea of a compiled language, let us take a
+look at a simple example Fluid application.
+```python
+import "fluid"
+func paddlepaddle() {
+  X = fluid.read(...)
+  W = fluid.Tensor(...)
+  Y = fluid.mult(X, W)
+}
+```
+This program consists of a [block](block.md) of three operators --
+`read`, `assign`, and `mult`.  Its `ProgramDesc` message looks like
+the following:
+```protobuf
+message ProgramDesc {
+  block[0] = Block {
+    vars = [X, W, Y],
+    ops = [
+      read(output = X)
+      assign(input = ..., output = W)
+      mult(input = {X, W}, output = Y)
+    ],
+  }
+}
+```
+## Transpilers
+We can write a transpiler program that takes a `ProgramDesc`, e.g.,
+the above one, and outputs another `ProgramDesc`.  Let us take some
+examples:
+1. *Memory optimization transpiler*: We can write a transpiler that
+   inserts some `FreeMemoryOp`s into the above example `ProgramDesc`
+   to free memory early, before the end of an iteration, and so keep
+   the memory footprint small.
+1. *Distributed training transpiler*: We can write a transpiler that
+   converts a `ProgramDesc` into its distributed version of two
+   `ProgramDesc`s -- one run by the trainer processes and the
+   other by the parameter server.
+In the rest of this article, we talk about a special kind of
+transpiler, the *native code generator*, which takes a `ProgramDesc` and
+generates a `.cu` (or `.cc`) file that C++ compilers (gcc, nvcc, icc)
+can build into binaries.
+## Native Code Generator
+For the above example, the native code generator transpiler, say, the
+CUDA code generator, should generate a `main` function:
+```c++
+int main() {
+  auto X = fluid_cuda_read(...);
+  auto W = fluid_cuda_create_tensor(...);
+  auto Y = fluid_cuda_mult(X, W);
+}
+```
+and the definitions of functions `fluid_cuda_read`,
+`fluid_cuda_create_tensor`, and `fluid_cuda_mult`.  Please be aware
+that each function could just define a C++ instance of an operator and
+run it.  For example:
+```c++
+paddle::Tensor fluid_cuda_read(...) {
+  paddle::Tensor t;
+  paddle::operators::Read r(&t, ...);
+  r.Run();
+  return t;
+}
+```
+For computational operators that have multiple *kernels*, each for a
+specific hardware platform, for example, the `mult` operator, the
+generated code should call its CUDA kernel:
+```c++
+paddle::Tensor fluid_cuda_mult(const paddle::Tensor& a,
+                               const paddle::Tensor& b) {
+  paddle::Tensor t;
+  paddle::operators::Mult m(a, b, ...);
+  m.Run(cuda_context);
+  return t;
+}
+```
+where `cuda_context` could be a global variable of type
+`paddle::CUDADeviceContext`.
+## Multi-Block Code Generation
+Most Fluid application programs have more than one block.  To
+execute them, we need to trace [scopes](scope.md).
--- a/develop/doc_cn/design/fluid.html
+++ b/develop/doc_cn/design/fluid.html
@@ -318,19 +318,13 @@
 <span id="the-execution-of-a-fluid-program"></span><h2>The Execution of a Fluid Program<a class="headerlink" href="#the-execution-of-a-fluid-program" title="永久链接至标题">¶</a></h2>
 <p>There are two ways to execute a Fluid program.  When a program is executed, it creates a protobuf message <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/a91efdde6910ce92a78e3aa7157412c4c88d9ee8/paddle/framework/framework.proto#L145"><code class="docutils literal"><span class="pre">ProgramDesc</span></code></a> that describes the process and is conceptually like an <a class="reference external" href="https://en.wikipedia.org/wiki/Abstract_syntax_tree">abstract syntax tree</a>.</p>
 <p>There is a C++ class <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/executor.h"><code class="docutils literal"><span class="pre">Executor</span></code></a>, which runs a <code class="docutils literal"><span class="pre">ProgramDesc</span></code>, similar to how an interpreter runs a Python program.</p>
-<p>Fluid is moving towards the direction of a compiler, which is explain in more detail later in this article.</p>
+<p>Fluid is moving in the direction of a compiler, which is explained in <a class="reference internal" href="fluid_compiler.html"><span class="doc">fluid</span></a>.</p>
 </div>
 <div class="section" id="backward-compatibility-of-fluid">
 <span id="backward-compatibility-of-fluid"></span><h2>Backward Compatibility of Fluid<a class="headerlink" href="#backward-compatibility-of-fluid" title="永久链接至标题">¶</a></h2>
 <p>Given all the advantages from the removal of the concept of a <em>model</em>, hardware manufacturers might still prefer the existence of the concept of a model, so it would be easier for them to support multiple frameworks all at once and could run a trained model during inference.  For example, Nervana, a startup company acquired by Intel, has been working on an XPU that reads the models in the format known as <a class="reference external" href="https://github.com/NervanaSystems/ngraph">n-graph</a>.  Similarly, <a class="reference external" href="https://www.movidius.com/">Movidius</a> is producing a mobile deep learning chip that reads and runs graphs of operators.  The well-known <a class="reference external" href="https://github.com/onnx/onnx">ONNX</a> is also a file format of graphs of operators.</p>
 <p>For Fluid, we can write a converter that extracts the parts in the <code class="docutils literal"><span class="pre">ProgramDesc</span></code> protobuf message, converts them into a graph of operators, and exports the graph into the ONNX or n-graph format.</p>
 </div>
-<div class="section" id="towards-a-deep-learning-language-and-the-compiler">
-<span id="towards-a-deep-learning-language-and-the-compiler"></span><h2>Towards a Deep Learning Language and the Compiler<a class="headerlink" href="#towards-a-deep-learning-language-and-the-compiler" title="永久链接至标题">¶</a></h2>
-<p>We can change the <code class="docutils literal"><span class="pre">if-then-else</span></code> and loop structure a little bit in the above Fluid example programs, to make it into a new programming language, different than Python.</p>
-<p>Even if we do not invent a new language, as long as we get the <code class="docutils literal"><span class="pre">ProgramDesc</span></code> message filled in, we can write a transpiler, which translates each invocation to an operator, into a C++ call to a kernel function of that operator. For example, a transpiler that weaves the CUDA kernels outputs an NVIDIA-friendly C++ program, which can be built using <code class="docutils literal"><span class="pre">nvcc</span></code>.  Another transpiler could generate MKL-friendly code that should be built using <code class="docutils literal"><span class="pre">icc</span></code> from Intel.  More interestingly, we can translate a Fluid program into its distributed version of two <code class="docutils literal"><span class="pre">ProgramDesc</span></code> messages, one for running on the trainer process, and the other one for the parameter server.  For more details of the last example, the <a class="reference internal" href="concurrent_programming.html"><span class="doc">concurrent programming design</span></a> document would be a good pointer.  The following figure explains the proposed two-stage process:</p>
-<p><img alt="" src="../_images/fluid-compiler.png" /></p>
-</div>
 </div>

--- a/develop/doc_cn/design/fluid_compiler.html
+++ b/develop/doc_cn/design/fluid_compiler.html
--- a/develop/doc_cn/objects.inv
+++ b/develop/doc_cn/objects.inv
--- a/develop/doc_cn/searchindex.js
+++ b/develop/doc_cn/searchindex.js