## The Execution of a Fluid Program

There are two ways to execute a Fluid program. When a program is executed, it creates a protobuf message [`ProgramDesc`](https://github.com/PaddlePaddle/Paddle/blob/a91efdde6910ce92a78e3aa7157412c4c88d9ee8/paddle/framework/framework.proto#L145) that describes the process and is conceptually like an [abstract syntax tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree).
There is a C++ class [`Executor`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/executor.h), which runs a `ProgramDesc`, similar to how an interpreter runs a Python program.
Fluid is moving towards the direction of a compiler, which is explained in [fluid_compiler.md](fluid_compiler.md).
## Backward Compatibility of Fluid
Despite the advantages that come from removing the concept of a *model*, hardware manufacturers might still prefer that the concept exist, since it makes it easier for them to support multiple frameworks at once and to run a trained model during inference. For example, Nervana, a startup company acquired by Intel, has been working on an XPU that reads models in a format known as [n-graph](https://github.com/NervanaSystems/ngraph). Similarly, [Movidius](https://www.movidius.com/) is producing a mobile deep learning chip that reads and runs graphs of operators. The well-known [ONNX](https://github.com/onnx/onnx) is also a file format for graphs of operators.
For Fluid, we can write a converter that extracts the relevant parts of the `ProgramDesc` protobuf message, converts them into a graph of operators, and exports the graph into the ONNX or n-graph format.
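As a rough illustration of what such a converter could look like -- not Fluid's actual exporter -- the sketch below walks the root block of a `ProgramDesc` and emits one node per operator into hypothetical `OnnxNode`/`OnnxGraph` stand-in types; the generated-header path and the proto namespace are assumptions:

```c++
#include <string>
#include <vector>

#include "paddle/framework/framework.pb.h"  // assumed path to the generated classes

// Hypothetical stand-ins for a real ONNX graph builder.
struct OnnxNode { std::string op_type; };
struct OnnxGraph { std::vector<OnnxNode> nodes; };

// Walk the root block and emit one ONNX node per Fluid operator.
OnnxGraph ConvertToOnnx(const paddle::framework::ProgramDesc& prog) {
  OnnxGraph graph;
  for (const auto& op : prog.blocks(0).ops()) {
    // A real converter must also map each operator's name, inputs,
    // outputs, and attributes onto the ONNX operator set.
    graph.nodes.push_back(OnnxNode{op.type()});
  }
  return graph;
}
```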
## Towards a Deep Learning Language and the Compiler
We can change the `if-then-else` and loop structures a little bit in the above Fluid example programs, to turn Fluid into a new programming language, different from Python.
Even if we do not invent a new language, as long as we get the `ProgramDesc` message filled in, we can write a transpiler that translates each invocation of an operator into a C++ call to the kernel function of that operator. For example, a transpiler that weaves in the CUDA kernels outputs an NVIDIA-friendly C++ program, which can be built using `nvcc`. Another transpiler could generate MKL-friendly code that should be built using `icc` from Intel. More interestingly, we can translate a Fluid program into its distributed version of two `ProgramDesc` messages, one for running on the trainer process and the other for the parameter server. For more details on the last example, the [concurrent programming design](concurrent_programming.md) document is a good pointer. The following figure explains the proposed two-stage process:
<spanid="the-execution-of-a-fluid-program"></span><h2>The Execution of a Fluid Program<aclass="headerlink"href="#the-execution-of-a-fluid-program"title="Permalink to this headline">¶</a></h2>
<p>There are two ways to execute a Fluid program. When a program is executed, it creates a protobuf message <aclass="reference external"href="https://github.com/PaddlePaddle/Paddle/blob/a91efdde6910ce92a78e3aa7157412c4c88d9ee8/paddle/framework/framework.proto#L145"><codeclass="docutils literal"><spanclass="pre">ProgramDesc</span></code></a> that describes the process and is conceptually like an <aclass="reference external"href="https://en.wikipedia.org/wiki/Abstract_syntax_tree">abstract syntax tree</a>.</p>
<p>There is a C++ class <aclass="reference external"href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/executor.h"><codeclass="docutils literal"><spanclass="pre">Executor</span></code></a>, which runs a <codeclass="docutils literal"><spanclass="pre">ProgramDesc</span></code>, similar to how an interpreter runs a Python program.</p>
<p>Fluid is moving towards the direction of a compiler, which is explain in more detail later in this article.</p>
<p>Fluid is moving towards the direction of a compiler, which is explain in <aclass="reference internal"href="fluid_compiler.html"><spanclass="doc">fluid</span></a>.</p>
<spanid="backward-compatibility-of-fluid"></span><h2>Backward Compatibility of Fluid<aclass="headerlink"href="#backward-compatibility-of-fluid"title="Permalink to this headline">¶</a></h2>
<p>Given all the advantages from the removal of the concept of a <em>model</em>, hardware manufacturers might still prefer the existence of the concept of a model, so it would be easier for them to support multiple frameworks all at once and could run a trained model during inference. For example, Nervana, a startup company acquired by Intel, has been working on an XPU that reads the models in the format known as <aclass="reference external"href="https://github.com/NervanaSystems/ngraph">n-graph</a>. Similarly, <aclass="reference external"href="https://www.movidius.com/">Movidius</a> is producing a mobile deep learning chip that reads and runs graphs of operators. The well-known <aclass="reference external"href="https://github.com/onnx/onnx">ONNX</a> is also a file format of graphs of operators.</p>
<p>For Fluid, we can write a converter that extracts the parts in the <codeclass="docutils literal"><spanclass="pre">ProgramDesc</span></code> protobuf message, converts them into a graph of operators, and exports the graph into the ONNX or n-graph format.</p>
<spanid="towards-a-deep-learning-language-and-the-compiler"></span><h2>Towards a Deep Learning Language and the Compiler<aclass="headerlink"href="#towards-a-deep-learning-language-and-the-compiler"title="Permalink to this headline">¶</a></h2>
<p>We can change the <codeclass="docutils literal"><spanclass="pre">if-then-else</span></code> and loop structure a little bit in the above Fluid example programs, to make it into a new programming language, different than Python.</p>
<p>Even if we do not invent a new language, as long as we get the <codeclass="docutils literal"><spanclass="pre">ProgramDesc</span></code> message filled in, we can write a transpiler, which translates each invocation to an operator, into a C++ call to a kernel function of that operator. For example, a transpiler that weaves the CUDA kernels outputs an NVIDIA-friendly C++ program, which can be built using <codeclass="docutils literal"><spanclass="pre">nvcc</span></code>. Another transpiler could generate MKL-friendly code that should be built using <codeclass="docutils literal"><spanclass="pre">icc</span></code> from Intel. More interestingly, we can translate a Fluid program into its distributed version of two <codeclass="docutils literal"><spanclass="pre">ProgramDesc</span></code> messages, one for running on the trainer process, and the other one for the parameter server. For more details of the last example, the <aclass="reference internal"href="concurrent_programming.html"><spanclass="doc">concurrent programming design</span></a> document would be a good pointer. The following figure explains the proposed two-stage process:</p>
<spanid="paddlepaddle-fluid-towards-a-compiled-programming-language"></span><h1>PaddlePaddle Fluid: Towards a Compiled Programming Language<aclass="headerlink"href="#paddlepaddle-fluid-towards-a-compiled-programming-language"title="Permalink to this headline">¶</a></h1>
<p>As described in <aclass="reference internal"href="fluid.html"><spanclass="doc">fluid.md</span></a>, when a Fluid application program
runs, it generates a <codeclass="docutils literal"><spanclass="pre">ProgramDesc</span></code> protobuf message as an intermediate
representation of itself. The C++ class <codeclass="docutils literal"><spanclass="pre">Executor</span></code> can run this
protobuf message as an interpreter. This article describes the Fluid
Consider a simple example Fluid application that reads an input, assigns a weight, and multiplies the two. This program consists of a [block](block.md) of three operators -- `read`, `assign`, and `mult`. Its `ProgramDesc` message encodes this block and its three operators.
<spanid="transpilers"></span><h2>Transpilers<aclass="headerlink"href="#transpilers"title="Permalink to this headline">¶</a></h2>
We can write a transpiler program that takes a `ProgramDesc`, e.g., the above one, and outputs another `ProgramDesc`. Let us take some examples:
<olclass="simple">
<li><em>Memory optimization transpiler</em>: We can write a transpiler that
inserts some <codeclass="docutils literal"><spanclass="pre">FreeMemoryOp</span></code>s in the above example <codeclass="docutils literal"><spanclass="pre">ProgramDesc</span></code> so
to free memory early, before the end of an iteration, so to keep a
small memory footprint.</li>
<li><em>Distributed training transpiler</em>: We can write a transpiler that
converts a<codeclass="docutils literal"><spanclass="pre">ProgramDesc</span></code> into its distributed version of two
<codeclass="docutils literal"><spanclass="pre">ProgramDesc</span></code>s – one for running by the trainer processes and the
other for the parameter server.</li>
</ol>
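Conceptually, a transpiler is just a function from `ProgramDesc` to `ProgramDesc`. The following sketch shows the shape of the memory optimization pass from the list above; the liveness analysis is elided and the `free_memory` operator name is a hypothetical placeholder:

```c++
#include "paddle/framework/framework.pb.h"  // assumed path to the generated classes

namespace fw = paddle::framework;  // assumed proto namespace

// Sketch: copy the input program and append a (hypothetical)
// free_memory op for a variable whose last use we have identified.
fw::ProgramDesc MemoryOptimizeTranspile(const fw::ProgramDesc& input) {
  fw::ProgramDesc output = input;  // protobuf messages are copyable
  auto* block = output.mutable_blocks(0);
  // A real pass would run a liveness analysis here; we pretend it
  // told us some variable is dead after the existing ops.
  auto* op = block->add_ops();
  op->set_type("free_memory");  // hypothetical operator name
  return output;
}
```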
In the rest of this article, we talk about a special kind of transpiler, the *native code generator*, which takes a `ProgramDesc` and generates a `.cu` (or `.cc`) file, which could be built by C++ compilers (gcc, nvcc, icc) into binaries.
## Native Code Generator
For the above example, the native code generator transpiler, say, the CUDA code generator, should generate a `main` function:
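The generated code listing did not survive extraction; based on the function names mentioned just below, a reconstruction of the generated `main` might look like the following, with the argument lists left elided:

```c++
// Sketch of the generated entry point; argument lists are elided.
int main() {
  auto X = fluid_cuda_read(/* ... */);
  auto W = fluid_cuda_create_tensor(/* ... */);
  auto Y = fluid_cuda_mult(X, W);
  return 0;
}
```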
and the definitions of the functions `fluid_cuda_read`, `fluid_cuda_create_tensor`, and `fluid_cuda_mult`. Please be aware that each function could just define a C++ instance of an operator and run it.
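For example, a hypothetical definition of `fluid_cuda_mult` could construct the `mult` operator and run its CUDA kernel; `Tensor` and `MultOp` here are stand-ins for whatever the real framework provides:

```c++
// Hypothetical sketch: each generated function wraps one operator.
Tensor fluid_cuda_mult(const Tensor& x, const Tensor& w) {
  Tensor y;
  MultOp op(x, w, &y);  // define a C++ instance of the operator
  op.Run();             // launch its CUDA kernel
  return y;
}
```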
<spanid="multi-block-code-generation"></span><h2>Multi-Block Code Generation<aclass="headerlink"href="#multi-block-code-generation"title="Permalink to this headline">¶</a></h2>
Most Fluid application programs may have more than one block. To execute them, we need to trace [scopes](scope.md).
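One way to picture this: a scope maps variable names to variables and points to the scope of the enclosing block, so that generated code for an inner block resolves names through the chain. A minimal sketch with a stand-in `Variable` type:

```c++
#include <string>
#include <unordered_map>

struct Variable { /* tensor storage, etc. */ };

// Minimal sketch of a scope chain for multi-block execution.
class Scope {
 public:
  explicit Scope(Scope* parent = nullptr) : parent_(parent) {}

  // Create (or get) a variable in this scope.
  Variable* Var(const std::string& name) { return &vars_[name]; }

  // Resolve a name here first, then in enclosing scopes.
  Variable* Find(const std::string& name) {
    auto it = vars_.find(name);
    if (it != vars_.end()) return &it->second;
    return parent_ ? parent_->Find(name) : nullptr;
  }

 private:
  Scope* parent_;
  std::unordered_map<std::string, Variable> vars_;
};
```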