Deploy to GitHub Pages: 16a61004

1c192cb3 · Travis CI · 5f13bcd7 · 1c192cb3 · 1c192cb3 · 1c192cb3
10 changed file
--- a/develop/doc/_images/fluid-compiler.png
+++ b/develop/doc/_images/fluid-compiler.png
--- a/develop/doc/_sources/design/fluid.md.txt
+++ b/develop/doc/_sources/design/fluid.md.txt
+# Design Doc: PaddlePaddle Fluid
+## Why Fluid
+When Baidu developed PaddlePaddle in 2013, the only well-known open source deep learning system was Caffe.  However, by the time Baidu open-sourced PaddlePaddle in 2016, many other choices were available.  We faced a challenge -- why would we open source yet another one?
+Fluid is the answer.  Fluid is similar to PyTorch and TensorFlow Eager Execution in that it describes the "process" of training a model or running inference with it, but not the model itself.  Indeed, in PyTorch, Eager Execution, and Fluid, there is no such concept as the model at all.  As I will explain in this article, Fluid currently takes this idea further than PyTorch and Eager Execution do, and we are pushing Fluid towards a compiler and even a new programming language for deep learning.
+## The Evolution of Deep Learning Systems
+Deep learning infrastructure is one of the fastest evolving technologies.  Within only four years, three generations of technology have been invented.
+| Since around | model = sequence of layers | model = graph of operators | No model |
+|--|--|--|--|
+| 2013 | Caffe, Theano, Torch, PaddlePaddle | | |
+| 2015 | | TensorFlow, MxNet, Caffe2, ONNX, n-graph | |
+| 2016 | | | PyTorch, TensorFlow Eager Execution, PaddlePaddle Fluid |
+From the above table, we see that the technology is evolving towards the removal of the concept of the model.  To better understand the reason, let us compare the *programming paradigms*, or, the ways we program deep learning applications using these systems.
+## Deep Learning Programming Paradigms
+With any system listed as the first or second generation, e.g., Caffe or TensorFlow, an AI application training program looks like the following:
+```python
+x = layer.data("image")
+l = layer.data("label")
+f = layer.fc(x, W)
+s = layer.softmax(f)
+c = layer.mse(l, s)
+for i in range(1000):  # train for 1000 iterations
+    m = read_minibatch()
+    forward({"input": x, "data": m}, minimize=c)  # pseudocode: feed minibatch m into x
+    backward(...)
+print(W)  # print the trained model parameters
+```
+The above program includes two parts:
+1. the first part describes the model, and
+2. the second part describes the training process (or inference process).
+This paradigm has a well-known problem that limits programmers' productivity.  Suppose that we make a mistake when configuring the model in the first part of the program.  When we run the program, it does not report an error until execution enters the second part, where the call to `forward` or `backward` raises it.  It is difficult for the programmer to realize that the mistake lies many lines away from where the error appears, and to locate it.
+This difficulty in debugging is the primary reason that programmers prefer PyTorch over older systems.  Using PyTorch, we would write the above program as follows:
+```python
+W = tensor(...)
+for i in range(1000):  # train for 1000 iterations
+    m = read_minibatch()
+    x = m["image"]
+    l = m["label"]
+    f = layer.fc(x, W)
+    s = layer.softmax(f)
+    c = layer.mse(l, s)
+    backward()
+print(W)  # print the trained model parameters
+```
+We can see that the main difference is that the model configuration, the first part, moves into the training loop.  This change allows mistakes in the model configuration to be reported where they occur.  It also means that the model, or its forward pass, is represented by the process in the training loop.
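To make the debugging argument concrete, here is a minimal sketch in plain Python. It is not taken from any real framework: `fc`, `DeferredFC`, and `W_BAD` are hypothetical names chosen for illustration. The deferred style accepts a badly shaped weight matrix at configuration time and only fails inside `forward`, whereas the define-by-run style fails on the very line that uses the layer.

```python
# Toy illustration; every name here is hypothetical and not part of any real framework.

def fc(x, w):
    """Multiply a vector x by a weight matrix w (a list of rows), checking shapes."""
    if len(x) != len(w):
        raise ValueError(f"fc: input has {len(x)} values but W has {len(w)} rows")
    cols = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(cols)]


class DeferredFC:
    """Declare-then-run style: building the layer only records its configuration."""

    def __init__(self, w):
        self.w = w  # a wrong shape is accepted silently here

    def forward(self, x):
        return fc(x, self.w)  # the mistake only surfaces here


W_BAD = [[1.0, 2.0]]  # 1 row, but the input below has 2 values
x = [3.0, 4.0]

layer = DeferredFC(W_BAD)  # configuration "succeeds"
try:
    layer.forward(x)
    deferred_error = None
except ValueError as err:
    deferred_error = str(err)  # raised away from the line that built the layer

try:
    fc(x, W_BAD)  # define-by-run: configuring and running happen on one line
    eager_error = None
except ValueError as err:
    eager_error = str(err)

print(deferred_error)
print(eager_error)
```

Both styles report the same error message; what differs is how far the failing line is from the line that introduced the mistake.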
+## Describe Arbitrary Models for the Future
+Describing the process instead of the model also brings Fluid the flexibility to define models not yet invented.
+Because we can program the process, we can write an RNN as a loop instead of as an RNN layer or operator.  A PyTorch example could look like this:
+```python
+for i in range(1000):
+    m = read_minibatch()
+    x = m["sentence"]
+    h = [None] * len(x)  # one hidden state per time step
+    for t in range(len(x)):
+        h[t] = the_step(x[t])
+```
+With Fluid, the training loop and the RNN in the above program are not Python loops but "loop structures" provided by Fluid and implemented in C++:
+```python
+train_loop = layers.While(cond)
+with train_loop.block():
+  m = read_minibatch()
+  x = m["sentence"]
+  rnn = layers.While(...)
+  with rnn.block():
+    h[t] = the_step(x[t])
+```
+A real Fluid example is available in [`test_while_op.py`](https://github.com/PaddlePaddle/Paddle/blob/a91efdde6910ce92a78e3aa7157412c4c88d9ee8/python/paddle/v2/fluid/tests/test_while_op.py#L36-L44).
+From these examples, you can see that Fluid programs look similar to their PyTorch equivalents, except that Fluid's loop structure, wrapped with Python's `with` statement, could run much faster than a Python loop.
+More examples are available in the design doc of Fluid's [`if-then-else`](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/if_else_op.md) structure.
+## Turing Completeness
+In computability theory, a system of data-manipulation rules, such as a programming language, is said to be Turing complete if it can be used to simulate any Turing machine.  Roughly speaking, a programming language that provides if-then-else and loops, together with unbounded memory, is Turing complete.  From the above examples, Fluid seems to be Turing complete; however, I would like to point out a slight difference between the if-then-else of Fluid and that of a programming language: the former runs both of its branches.  It splits the input minibatch into two parts -- one for which the condition is true and one for which it is false.  I am not sure whether this is equivalent to the if-then-else that makes programming languages Turing complete.  I talked with [Yuang Yu](https://research.google.com/pubs/104812.html), but I need to figure out more.
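The "runs both branches" behaviour can be sketched in plain Python. This is only an illustration of the idea described above, not Fluid's implementation; `split_if_else` and its parameters are hypothetical names. The minibatch is split by the condition, each branch processes its own part, and the outputs are merged back in the original order.

```python
def split_if_else(batch, cond, true_fn, false_fn):
    """Run both branches, each on its own part of the minibatch, then merge in order."""
    mask = [cond(item) for item in batch]
    true_part = [item for item, m in zip(batch, mask) if m]
    false_part = [item for item, m in zip(batch, mask) if not m]
    true_out = iter(true_fn(true_part))    # both branch functions are always called,
    false_out = iter(false_fn(false_part)) # even when one part is empty
    return [next(true_out) if m else next(false_out) for m in mask]


batch = [1, -2, 3, -4]
result = split_if_else(
    batch,
    cond=lambda v: v > 0,
    true_fn=lambda part: [v * 10 for v in part],
    false_fn=lambda part: [-v for v in part],
)
print(result)  # [10, 2, 30, 4]
```

In an ordinary programming language, only one branch would run per evaluation of the condition; here the condition selects which items go to which branch.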
+## The Execution of a Fluid Program
+There are two ways to run a Fluid program: interpreting it and compiling it.  When we run an example program, it creates a protobuf message [`ProgramDesc`](https://github.com/PaddlePaddle/Paddle/blob/a91efdde6910ce92a78e3aa7157412c4c88d9ee8/paddle/framework/framework.proto#L145) that describes the process and is conceptually like an [abstract syntax tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree).
+We have a C++ class [`Executor`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/executor.h), which runs a `ProgramDesc` much as an interpreter runs a Python program.
+We are also moving towards a compiler, which we explain in more detail later in this article.
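The interpreter idea can be sketched with a toy program description. The sketch below does not use the real `ProgramDesc` schema or the real `Executor`; `OpDesc`, `run_block`, and the operator names are assumptions made for illustration. A program is a list of operator records, a `while` operator carries a nested block, and the interpreter walks the records against a dictionary of variables.

```python
# Toy sketch; OpDesc, run_block and the op names are hypothetical, not Fluid's real schema.
from dataclasses import dataclass, field


@dataclass
class OpDesc:
    type: str      # operator name, e.g. "add" or "while"
    inputs: list   # names of input variables
    output: str = ""
    block: list = field(default_factory=list)  # nested ops, used by "while"


def run_block(ops, scope):
    """Interpret a list of OpDesc records against a dict of variables."""
    for op in ops:
        if op.type == "add":
            scope[op.output] = scope[op.inputs[0]] + scope[op.inputs[1]]
        elif op.type == "less_than":
            scope[op.output] = scope[op.inputs[0]] < scope[op.inputs[1]]
        elif op.type == "while":
            while scope[op.inputs[0]]:
                run_block(op.block, scope)
        else:
            raise ValueError(f"unknown operator: {op.type}")
    return scope


# Sum the integers 0..4 with a loop that lives in the program description, not in Python.
program = [
    OpDesc("less_than", ["i", "n"], "cond"),
    OpDesc("while", ["cond"], block=[
        OpDesc("add", ["total", "i"], "total"),
        OpDesc("add", ["i", "one"], "i"),
        OpDesc("less_than", ["i", "n"], "cond"),
    ]),
]
scope = run_block(program, {"i": 0, "n": 5, "one": 1, "total": 0})
print(scope["total"])  # 0 + 1 + 2 + 3 + 4 = 10
```

Because the loop is data rather than Python control flow, the same description could be handed to a different backend, which is the property the compiler direction relies on.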
+## Backward Compatibility
+Despite all the advantages of removing the concept of a *model*, hardware manufacturers might still prefer that the concept exist, so that they can build hardware that reads and runs a trained model for inference.  For example, Nervana, a startup company acquired by Intel, has been working on an XPU that reads models in the format known as [n-graph](https://github.com/NervanaSystems/ngraph).  Similarly, [Movidius](https://www.movidius.com/) is producing a mobile deep learning chip that also reads and runs graphs of operators.  The well-known [ONNX](https://github.com/onnx/onnx) is also a file format for graphs of operators.
+For Fluid, we can write a converter that extracts parts of the `ProgramDesc` protobuf message, converts them into a graph of operators, and exports the graph to the ONNX or n-graph format.
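As a rough sketch of the first half of such a converter, the following plain-Python function derives a graph of operators from a flat list of operator records by linking each operator to the one that produced its inputs. The record layout `(type, inputs, output)` and the function `to_graph` are hypothetical, and the sketch stops short of writing any ONNX or n-graph file.

```python
def to_graph(ops):
    """Build producer -> consumer edges between operators from their variable names."""
    producer = {}  # variable name -> index of the op that wrote it most recently
    edges = []
    for idx, (op_type, inputs, output) in enumerate(ops):
        for name in inputs:
            if name in producer:  # inputs without a producer are external (data, weights)
                edges.append((producer[name], idx))
        producer[output] = idx
    return edges


# The forward pass from the earlier example: fc -> softmax -> mse.
program = [
    ("fc", ["x", "W"], "f"),
    ("softmax", ["f"], "s"),
    ("mse", ["l", "s"], "c"),
]
edges = to_graph(program)
print(edges)  # [(0, 1), (1, 2)]
```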
+## Towards a Deep Learning Language and the Compiler
+We can change the if-then-else and loop structures in the above Fluid example programs a little so as to turn them into a new programming language, different from Python.
+Even if we don't invent a new language, as long as we get the `ProgramDesc` message filled in, we can write a transpiler that translates each invocation of an operator into a C++ call to a kernel function of that operator.  For example, a transpiler that weaves in the CUDA kernels outputs an NVIDIA-friendly C++ program, which can be built using `nvcc`.  Another transpiler could generate MKL-friendly code that should be built using `icc` from Intel.  More interestingly, we can translate a Fluid program into a distributed version consisting of two `ProgramDesc` messages, one to run on the trainer process and the other on the parameter server.  For more details on the last example, see the [concurrent programming design](concurrent_programming.md).  The following figure explains this two-stage process:
+![](fluid-compiler.png)
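A transpiler of this kind can be sketched in a few lines of Python that emit C++ source text. Everything here is an assumption for illustration: the record layout `(type, inputs, output)`, the function `transpile`, and the kernel names `add_kernel` and `mul_kernel` are hypothetical, and the generated code is not meant to compile against any real kernel library.

```python
# Toy sketch; the op records and the kernel names are hypothetical.
KERNELS = {"add": "add_kernel", "mul": "mul_kernel"}


def transpile(ops):
    """Translate each operator invocation into one C++ call to its kernel function."""
    lines = []
    for op_type, inputs, output in ops:
        kernel = KERNELS[op_type]
        lines.append(f"  {kernel}({', '.join(inputs)}, &{output});")
    return "void run() {\n" + "\n".join(lines) + "\n}\n"


program = [("mul", ["x", "W"], "f"), ("add", ["f", "b"], "y")]
cpp_source = transpile(program)
print(cpp_source)
```

Swapping the `KERNELS` table for one that names CUDA or MKL kernels is the kind of backend choice the paragraph above describes; the resulting source would then be handed to the matching compiler.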
--- a/develop/doc/design/fluid.html
+++ b/develop/doc/design/fluid.html
+<!DOCTYPE html>
+<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
+<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
+<head>
+  <meta charset="utf-8">
+  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  <title>Design Doc: PaddlePaddle Fluid &mdash; PaddlePaddle  documentation</title>
+    <link rel="stylesheet" href="../_static/css/theme.css" type="text/css" />
+        <link rel="index" title="Index"
+              href="../genindex.html"/>
+        <link rel="search" title="Search" href="../search.html"/>
+    <link rel="top" title="PaddlePaddle  documentation" href="../index.html"/> 
+  <link rel="stylesheet" href="https://cdn.jsdelivr.net/perfect-scrollbar/0.6.14/css/perfect-scrollbar.min.css" type="text/css" />
+  <link rel="stylesheet" href="../_static/css/override.css" type="text/css" />
+  <script>
+  var _hmt = _hmt || [];
+  (function() {
+    var hm = document.createElement("script");
+    hm.src = "//hm.baidu.com/hm.js?b9a314ab40d04d805655aab1deee08ba";
+    var s = document.getElementsByTagName("script")[0]; 
+    s.parentNode.insertBefore(hm, s);
+  })();
+  </script>
+  <script src="../_static/js/modernizr.min.js"></script>
+</head>
+<body class="wy-body-for-nav" role="document">
+  <header class="site-header">
+    <div class="site-logo">
+      <a href="/"><img src="../_static/images/PP_w.png"></a>
+    </div>
+    <div class="site-nav-links">
+      <div class="site-menu">
+        <a class="fork-on-github" href="https://github.com/PaddlePaddle/Paddle" target="_blank"><i class="fa fa-github"></i>Fork me on Github</a>
+        <div class="language-switcher dropdown">
+          <a type="button" data-toggle="dropdown">
+            <span>English</span>
+            <i class="fa fa-angle-up"></i>
+            <i class="fa fa-angle-down"></i>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="/doc_cn">中文</a></li>
+            <li><a href="/doc">English</a></li>
+          </ul>
+        </div>
+        <ul class="site-page-links">
+          <li><a href="/">Home</a></li>
+        </ul>
+      </div>
+      <div class="doc-module">
+        <ul>
+<li class="toctree-l1"><a class="reference internal" href="../getstarted/index_en.html">GET STARTED</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../howto/index_en.html">HOW TO</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../api/index_en.html">API</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../mobile/index_en.html">MOBILE</a></li>
+</ul>
+<div role="search">
+  <form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
+    <input type="text" name="q" placeholder="Search docs" />
+    <input type="hidden" name="check_keywords" value="yes" />
+    <input type="hidden" name="area" value="default" />
+  </form>
+</div>        
+      </div>
+    </div>
+  </header>
+  <div class="main-content-wrap">
+    <nav class="doc-menu-vertical" role="navigation">
+          <ul>
+<li class="toctree-l1"><a class="reference internal" href="../getstarted/index_en.html">GET STARTED</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="../getstarted/build_and_install/index_en.html">Install and Build</a><ul>
+<li class="toctree-l3"><a class="reference internal" href="../getstarted/build_and_install/pip_install_en.html">Install Using pip</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../getstarted/build_and_install/docker_install_en.html">Run in Docker Containers</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../howto/dev/build_en.html">Build using Docker</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../getstarted/build_and_install/build_from_source_en.html">Build from Sources</a></li>
+</ul>
+</li>
+</ul>
+</li>
+<li class="toctree-l1"><a class="reference internal" href="../howto/index_en.html">HOW TO</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="../howto/usage/cmd_parameter/index_en.html">Set Command-line Parameters</a><ul>
+<li class="toctree-l3"><a class="reference internal" href="../howto/usage/cmd_parameter/use_case_en.html">Use Case</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../howto/usage/cmd_parameter/arguments_en.html">Argument Outline</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../howto/usage/cmd_parameter/detail_introduction_en.html">Detail Description</a></li>
+</ul>
+</li>
+<li class="toctree-l2"><a class="reference internal" href="../howto/usage/cluster/cluster_train_en.html">PaddlePaddle Distributed Training</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../howto/usage/k8s/k8s_en.html">Paddle On Kubernetes</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../howto/usage/k8s/k8s_aws_en.html">Distributed PaddlePaddle Training on AWS with Kubernetes</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../howto/dev/new_layer_en.html">Write New Layers</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../howto/dev/contribute_to_paddle_en.html">Contribute Code</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../howto/dev/write_docs_en.html">Contribute Documentation</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../howto/deep_model/rnn/index_en.html">RNN Models</a><ul>
+<li class="toctree-l3"><a class="reference internal" href="../howto/deep_model/rnn/rnn_config_en.html">RNN Configuration</a></li>
+</ul>
+</li>
+<li class="toctree-l2"><a class="reference internal" href="../howto/optimization/gpu_profiling_en.html">Tune GPU Performance</a></li>
+</ul>
+</li>
+<li class="toctree-l1"><a class="reference internal" href="../api/index_en.html">API</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="../api/v2/model_configs.html">Model Configuration</a><ul>
+<li class="toctree-l3"><a class="reference internal" href="../api/v2/config/activation.html">Activation</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../api/v2/config/layer.html">Layers</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../api/v2/config/evaluators.html">Evaluators</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../api/v2/config/optimizer.html">Optimizer</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../api/v2/config/pooling.html">Pooling</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../api/v2/config/networks.html">Networks</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../api/v2/config/attr.html">Parameter Attribute</a></li>
+</ul>
+</li>
+<li class="toctree-l2"><a class="reference internal" href="../api/v2/data.html">Data Reader Interface and DataSets</a><ul>
+<li class="toctree-l3"><a class="reference internal" href="../api/v2/data/data_reader.html">Data Reader Interface</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../api/v2/data/image.html">Image Interface</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../api/v2/data/dataset.html">Dataset</a></li>
+</ul>
+</li>
+<li class="toctree-l2"><a class="reference internal" href="../api/v2/run_logic.html">Training and Inference</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../api/v2/fluid.html">Fluid</a><ul>
+<li class="toctree-l3"><a class="reference internal" href="../api/v2/fluid/layers.html">Layers</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../api/v2/fluid/data_feeder.html">DataFeeder</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../api/v2/fluid/executor.html">Executor</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../api/v2/fluid/initializer.html">Initializer</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../api/v2/fluid/evaluator.html">Evaluator</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../api/v2/fluid/nets.html">Nets</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../api/v2/fluid/optimizer.html">Optimizer</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../api/v2/fluid/param_attr.html">ParamAttr</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../api/v2/fluid/profiler.html">Profiler</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../api/v2/fluid/regularizer.html">Regularizer</a></li>
+</ul>
+</li>
+</ul>
+</li>
+<li class="toctree-l1"><a class="reference internal" href="../mobile/index_en.html">MOBILE</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="../mobile/cross_compiling_for_android_en.html">Build PaddlePaddle for Android</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../mobile/cross_compiling_for_raspberry_en.html">Build PaddlePaddle for Raspberry Pi</a></li>
+</ul>
+</li>
+</ul>
+    </nav>
+    <section class="doc-content-wrap">
+<div role="navigation" aria-label="breadcrumbs navigation">
+  <ul class="wy-breadcrumbs">
+    <li>Design Doc: PaddlePaddle Fluid</li>
+  </ul>
+</div>
+      <div class="wy-nav-content" id="doc-content">
+        <div class="rst-content">
+          <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
+           <div itemprop="articleBody">
+  <div class="section" id="design-doc-paddlepaddle-fluid">
+<span id="design-doc-paddlepaddle-fluid"></span><h1>Design Doc: PaddlePaddle Fluid<a class="headerlink" href="#design-doc-paddlepaddle-fluid" title="Permalink to this headline">¶</a></h1>
+<div class="section" id="why-fluid">
+<span id="why-fluid"></span><h2>Why Fluid<a class="headerlink" href="#why-fluid" title="Permalink to this headline">¶</a></h2>
+<p>When Baidu developed PaddlePaddle in 2013, the only well-known open source deep learning system was Caffe.  However, by the time Baidu open-sourced PaddlePaddle in 2016, many other choices were available.  We faced a challenge &#8211; why would we open source yet another one?</p>
+<p>Fluid is the answer.  Fluid is similar to PyTorch and TensorFlow Eager Execution in that it describes the &#8220;process&#8221; of training a model or running inference with it, but not the model itself.  Indeed, in PyTorch, Eager Execution, and Fluid, there is no such concept as the model at all.  As I will explain in this article, Fluid currently takes this idea further than PyTorch and Eager Execution do, and we are pushing Fluid towards a compiler and even a new programming language for deep learning.</p>
+</div>
+<div class="section" id="the-evolution-of-deep-learning-systems">
+<span id="the-evolution-of-deep-learning-systems"></span><h2>The Evolution of Deep Learning Systems<a class="headerlink" href="#the-evolution-of-deep-learning-systems" title="Permalink to this headline">¶</a></h2>
+<p>Deep learning infrastructure is one of the fastest evolving technologies.  Within only four years, three generations of technology have been invented.</p>
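+<table border="1" class="docutils">
+<thead>
+<tr><th>Since around</th><th>model = sequence of layers</th><th>model = graph of operators</th><th>No model</th></tr>
+</thead>
+<tbody>
+<tr><td>2013</td><td>Caffe, Theano, Torch, PaddlePaddle</td><td></td><td></td></tr>
+<tr><td>2015</td><td></td><td>TensorFlow, MxNet, Caffe2, ONNX, n-graph</td><td></td></tr>
+<tr><td>2016</td><td></td><td></td><td>PyTorch, TensorFlow Eager Execution, PaddlePaddle Fluid</td></tr>
+</tbody>
+</table>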
+<p>From the above table, we see that the technology is evolving towards the removal of the concept of the model.  To better understand the reason, let us compare the <em>programming paradigms</em>, or, the ways we program deep learning applications using these systems.</p>
+</div>
+<div class="section" id="deep-learning-programming-paradigms">
+<span id="deep-learning-programming-paradigms"></span><h2>Deep Learning Programming Paradigms<a class="headerlink" href="#deep-learning-programming-paradigms" title="Permalink to this headline">¶</a></h2>
+<p>With any system listed as the first or second generation, e.g., Caffe or TensorFlow, an AI application training program looks like the following:</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">x</span> <span class="o">=</span> <span class="n">layer</span><span class="o">.</span><span class="n">data</span><span class="p">(</span><span class="s2">&quot;image&quot;</span><span class="p">)</span>
+<span class="n">l</span> <span class="o">=</span> <span class="n">layer</span><span class="o">.</span><span class="n">data</span><span class="p">(</span><span class="s2">&quot;label&quot;</span><span class="p">)</span>
+<span class="n">f</span> <span class="o">=</span> <span class="n">layer</span><span class="o">.</span><span class="n">fc</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">W</span><span class="p">)</span>
+<span class="n">s</span> <span class="o">=</span> <span class="n">layer</span><span class="o">.</span><span class="n">softmax</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
+<span class="n">c</span> <span class="o">=</span> <span class="n">layer</span><span class="o">.</span><span class="n">mse</span><span class="p">(</span><span class="n">l</span><span class="p">,</span> <span class="n">s</span><span class="p">)</span>
+<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="mi">1000</span><span class="p">):</span> <span class="c1"># train for 1000 iterations</span>
+    <span class="n">m</span> <span class="o">=</span> <span class="n">read_minibatch</span><span class="p">()</span>
+    <span class="n">forward</span><span class="p">({</span><span class="nb">input</span><span class="o">=</span><span class="n">x</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">m</span><span class="p">},</span> <span class="n">minimize</span><span class="o">=</span><span class="n">c</span><span class="p">)</span>
+    <span class="n">backward</span><span class="p">(</span><span class="o">...</span><span class="p">)</span>
+<span class="k">print</span> <span class="n">W</span> <span class="c1"># print the trained model parameters.</span>
+</pre></div>
+</div>
+<p>The above program includes two parts:</p>
+<ol class="simple">
+<li>the first part describes the model, and</li>
+<li>the second part describes the training process (or inference process).</li>
+</ol>
+<p>This paradigm has a well-known problem that limits programmers&#8217; productivity.  Suppose that we make a mistake when configuring the model in the first part of the program.  When we run the program, it does not report an error until execution enters the second part, where the call to <code class="docutils literal"><span class="pre">forward</span></code> or <code class="docutils literal"><span class="pre">backward</span></code> raises it.  It is difficult for the programmer to realize that the mistake lies many lines away from where the error appears, and to locate it.</p>
+<p>This difficulty in debugging is the primary reason that programmers prefer PyTorch over older systems.  Using PyTorch, we would write the above program as follows:</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">W</span> <span class="o">=</span> <span class="n">tensor</span><span class="p">(</span><span class="o">...</span><span class="p">)</span>
+<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="mi">1000</span><span class="p">):</span> <span class="c1"># train for 1000 iterations</span>
+    <span class="n">m</span> <span class="o">=</span> <span class="n">read_minibatch</span><span class="p">()</span>
+    <span class="n">x</span> <span class="o">=</span> <span class="n">m</span><span class="p">[</span><span class="s2">&quot;image&quot;</span><span class="p">]</span>
+    <span class="n">l</span> <span class="o">=</span> <span class="n">m</span><span class="p">[</span><span class="s2">&quot;label&quot;</span><span class="p">]</span>
+    <span class="n">f</span> <span class="o">=</span> <span class="n">layer</span><span class="o">.</span><span class="n">fc</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">W</span><span class="p">)</span>
+    <span class="n">s</span> <span class="o">=</span> <span class="n">layer</span><span class="o">.</span><span class="n">softmax</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
+    <span class="n">c</span> <span class="o">=</span> <span class="n">layer</span><span class="o">.</span><span class="n">mse</span><span class="p">(</span><span class="n">l</span><span class="p">,</span> <span class="n">s</span><span class="p">)</span>
+    <span class="n">backward</span><span class="p">()</span>
+<span class="k">print</span> <span class="n">W</span> <span class="c1"># print the trained model parameters.</span>
+</pre></div>
+</div>
+<p>We can see that the main difference is that the model configuration, the first part, moves into the training loop.  This change allows mistakes in the model configuration to be reported where they occur.  It also means that the model, or its forward pass, is represented by the process in the training loop.</p>
+</div>
+<div class="section" id="describe-arbitrary-models-for-the-future">
+<span id="describe-arbitrary-models-for-the-future"></span><h2>Describe Arbitrary Models for the Future<a class="headerlink" href="#describe-arbitrary-models-for-the-future" title="Permalink to this headline">¶</a></h2>
+<p>Describing the process instead of the model also brings Fluid the flexibility to define models not yet invented.</p>
+<p>Because we can program the process, we can write an RNN as a loop instead of as an RNN layer or operator.  A PyTorch example could look like this:</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="mi">1000</span><span class="p">):</span>
+    <span class="n">m</span> <span class="o">=</span> <span class="n">read_minibatch</span><span class="p">()</span>
+    <span class="n">x</span> <span class="o">=</span> <span class="n">m</span><span class="p">[</span><span class="s2">&quot;sentence&quot;</span><span class="p">]</span>
+    <span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="nb">xrange</span> <span class="n">x</span><span class="o">.</span><span class="n">len</span><span class="p">():</span>
+        <span class="n">h</span><span class="p">[</span><span class="n">t</span><span class="p">]</span> <span class="o">=</span> <span class="n">the_step</span><span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="n">t</span><span class="p">])</span>
+</pre></div>
+</div>
+<p>With Fluid, the training loop and the RNN in the above program are not Python loops but &#8220;loop structures&#8221; provided by Fluid and implemented in C++:</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">train_loop</span> <span class="o">=</span> <span class="n">layers</span><span class="o">.</span><span class="n">While</span><span class="p">(</span><span class="n">cond</span><span class="p">)</span>
+<span class="k">with</span> <span class="n">train_loop</span><span class="o">.</span><span class="n">block</span><span class="p">():</span>
+  <span class="n">m</span> <span class="o">=</span> <span class="n">read_minibatch</span><span class="p">()</span>
+  <span class="n">x</span> <span class="o">=</span> <span class="n">m</span><span class="p">[</span><span class="s2">&quot;sentence&quot;</span><span class="p">]</span>
+  <span class="n">rnn</span> <span class="o">=</span> <span class="n">layers</span><span class="o">.</span><span class="n">While</span><span class="p">(</span><span class="o">...</span><span class="p">)</span>
+  <span class="k">with</span> <span class="n">rnn</span><span class="o">.</span><span class="n">block</span><span class="p">():</span>
+    <span class="n">h</span><span class="p">[</span><span class="n">t</span><span class="p">]</span> <span class="o">=</span> <span class="n">the_step</span><span class="p">(</span><span class="nb">input</span><span class="p">[</span><span class="n">t</span><span class="p">])</span>
+</pre></div>
+</div>
+<p>A real Fluid example is available in <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/a91efdde6910ce92a78e3aa7157412c4c88d9ee8/python/paddle/v2/fluid/tests/test_while_op.py#L36-L44"><code class="docutils literal"><span class="pre">test_while_op.py</span></code></a>.</p>
+<p>From these examples, you can see that Fluid programs look similar to their PyTorch equivalents, except that Fluid&#8217;s loop structure, wrapped with Python&#8217;s <code class="docutils literal"><span class="pre">with</span></code> statement, could run much faster than a Python loop.</p>
+<p>More examples are available in the design doc of Fluid&#8217;s <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/if_else_op.md"><code class="docutils literal"><span class="pre">if-then-else</span></code></a> structure.</p>
+</div>
+<div class="section" id="turing-completeness">
+<span id="turing-completeness"></span><h2>Turing Completeness<a class="headerlink" href="#turing-completeness" title="Permalink to this headline">¶</a></h2>
+<p>In computability theory, a system of data-manipulation rules, such as a programming language, is said to be Turing complete if it can be used to simulate any Turing machine.  Roughly speaking, a programming language that provides if-then-else and loops, together with unbounded memory, is Turing complete.  From the above examples, Fluid seems to be Turing complete; however, I would like to point out a slight difference between the if-then-else of Fluid and that of a programming language: the former runs both of its branches.  It splits the input minibatch into two parts &#8211; one for which the condition is true and one for which it is false.  I am not sure whether this is equivalent to the if-then-else that makes programming languages Turing complete.  I talked with <a class="reference external" href="https://research.google.com/pubs/104812.html">Yuang Yu</a>, but I need to figure out more.</p>
+</div>
+<div class="section" id="the-execution-of-a-fluid-program">
+<span id="the-execution-of-a-fluid-program"></span><h2>The Execution of a Fluid Program<a class="headerlink" href="#the-execution-of-a-fluid-program" title="Permalink to this headline">¶</a></h2>
+<p>There are two ways to run a Fluid program: interpreting it and compiling it.  When we run an example program, it creates a protobuf message <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/a91efdde6910ce92a78e3aa7157412c4c88d9ee8/paddle/framework/framework.proto#L145"><code class="docutils literal"><span class="pre">ProgramDesc</span></code></a> that describes the process and is conceptually like an <a class="reference external" href="https://en.wikipedia.org/wiki/Abstract_syntax_tree">abstract syntax tree</a>.</p>
+<p>We have a C++ class <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/executor.h"><code class="docutils literal"><span class="pre">Executor</span></code></a>, which runs a <code class="docutils literal"><span class="pre">ProgramDesc</span></code> much as an interpreter runs a Python program.</p>
+<p>We are also moving towards a compiler, which we explain in more detail later in this article.</p>
+</div>
+<div class="section" id="backward-compatibility">
+<span id="backward-compatibility"></span><h2>Backward Compatibility<a class="headerlink" href="#backward-compatibility" title="Permalink to this headline">¶</a></h2>
+<p>Despite all the advantages of removing the concept of the <em>model</em>, hardware manufacturers might still prefer that the concept exist, so that they can build hardware that reads and runs a trained model for inference.  For example, Nervana, a startup company acquired by Intel, has been working on an XPU that reads models in the format known as <a class="reference external" href="https://github.com/NervanaSystems/ngraph">n-graph</a>.  Similarly, <a class="reference external" href="https://www.movidius.com/">Movidius</a> is producing a mobile deep learning chip that also reads and runs graphs of operators.  The well-known <a class="reference external" href="https://github.com/onnx/onnx">ONNX</a> is also a file format for graphs of operators.</p>
+<p>For Fluid, we can write a converter that extracts parts of the <code class="docutils literal"><span class="pre">ProgramDesc</span></code> protobuf message, converts them into a graph of operators, and exports the graph in the ONNX or n-graph format.</p>
+</div>
+<div class="section" id="towards-a-deep-learning-language-and-the-compiler">
+<span id="towards-a-deep-learning-language-and-the-compiler"></span><h2>Towards a Deep Learning Language and the Compiler<a class="headerlink" href="#towards-a-deep-learning-language-and-the-compiler" title="Permalink to this headline">¶</a></h2>
+<p>We can slightly change the if-then-else and loop structures in the above Fluid example programs to turn them into a new programming language, different from Python.</p>
+<p>Even if we don&#8217;t invent a new language, as long as we get the <code class="docutils literal"><span class="pre">ProgramDesc</span></code> message filled in, we can write a transpiler, which translates each invocation of an operator into a C++ call to a kernel function of that operator. For example, a transpiler that weaves in the CUDA kernels outputs an NVIDIA-friendly C++ program, which can be built using <code class="docutils literal"><span class="pre">nvcc</span></code>.  Another transpiler could generate MKL-friendly code that should be built using <code class="docutils literal"><span class="pre">icc</span></code> from Intel.  More interestingly, we can translate a Fluid program into a distributed version consisting of two <code class="docutils literal"><span class="pre">ProgramDesc</span></code> messages, one to run on the trainer process and the other on the parameter server.  For more details on the last example, see the <a class="reference external" href="design/concurrent_programming.md">concurrent programming design</a>.  The following figure explains this two-stage process:</p>
+<p><img alt="" src="../_images/fluid-compiler.png" /></p>
+</div>
+</div>
+           </div>
+          </div>
+          <footer>
+  <hr/>
+  <div role="contentinfo">
+    <p>
+        &copy; Copyright 2016, PaddlePaddle developers.
+    </p>
+  </div>
+  Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>. 
+</footer>
+        </div>
+      </div>
+    </section>
+  </div>
+    <script type="text/javascript">
+        var DOCUMENTATION_OPTIONS = {
+            URL_ROOT:'../',
+            VERSION:'',
+            COLLAPSE_INDEX:false,
+            FILE_SUFFIX:'.html',
+            HAS_SOURCE:  true,
+            SOURCELINK_SUFFIX: ".txt",
+        };
+    </script>
+      <script type="text/javascript" src="../_static/jquery.js"></script>
+      <script type="text/javascript" src="../_static/underscore.js"></script>
+      <script type="text/javascript" src="../_static/doctools.js"></script>
+      <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
+    <script type="text/javascript" src="../_static/js/theme.js"></script>
+  <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js" integrity="sha384-Tc5IQib027qvyjSMfHjOMaLkfuWVxZxUPnCJA7l2mCWNIpG9mGCD8wGNIcPD7Txa" crossorigin="anonymous"></script>
+  <script src="https://cdn.jsdelivr.net/perfect-scrollbar/0.6.14/js/perfect-scrollbar.jquery.min.js"></script>
+  <script src="../_static/js/paddle_doc_init.js"></script> 
+</body>
+</html>
\ No newline at end of file
--- a/develop/doc/objects.inv
+++ b/develop/doc/objects.inv
--- a/develop/doc/searchindex.js
+++ b/develop/doc/searchindex.js
--- a/develop/doc_cn/_images/fluid-compiler.png
+++ b/develop/doc_cn/_images/fluid-compiler.png
--- a/develop/doc_cn/_sources/design/fluid.md.txt
+++ b/develop/doc_cn/_sources/design/fluid.md.txt
+# Design Doc: PaddlePaddle Fluid
+## Why Fluid
+When Baidu developed PaddlePaddle in 2013, the only well-known open source deep learning system was Caffe.  However, when it open-sourced PaddlePaddle in 2016, there were already many other choices available.  We were facing a challenge -- why would we open source yet another one?
+Fluid is the answer.  Fluid is similar to PyTorch and TensorFlow Eager Execution, which describe the "process" of training a model or running inference with it, but not the model itself.  Indeed, in PyTorch, Eager Execution, and Fluid, there is no concept of a model at all.  As I will explain in this article, Fluid currently takes this idea further than PyTorch and Eager Execution, and we are pushing Fluid towards a compiler and even a new programming language for deep learning.
+## The Evolution of Deep Learning Systems
+Deep learning infrastructure is one of the fastest-evolving technologies.  Within only four years, three generations of technologies have been invented.
+| Since around | model = sequence of layers | model = graph of operators | No model |
+|--|--|--|--|
+| 2013 | Caffe, Theano, Torch, PaddlePaddle | | |
+| 2015 | | TensorFlow, MxNet, Caffe2, ONNX, n-graph | |
+| 2016 | | | PyTorch, TensorFlow Eager Execution, PaddlePaddle Fluid |
+From the above table, we see that the technology is evolving towards the removal of the concept of the model.  To better understand the reason, let us compare the *programming paradigms*, or, the ways we program deep learning applications using these systems.
+## Deep Learning Programming Paradigms
+With any system listed as the first or second generation, e.g., Caffe or TensorFlow, an AI application training program looks like the following:
+```python
+x = layer.data("image")
+l = layer.data("label")
+f = layer.fc(x, W)
+s = layer.softmax(f)
+c = layer.mse(l, s)
+for i in xrange(1000): # train for 1000 iterations
+    m = read_minibatch()
+    forward({"input": x, "data": m}, minimize=c)
+    backward(...)
+print W # print the trained model parameters.
+```
+The above program includes two parts:
+1. the first part describes the model, and
+2. the second part describes the training process (or inference process).
+This paradigm has a well-known problem that limits programmers' productivity.  Suppose that we made a mistake when configuring the model in the first part of the program.  When we run the program, it would not report an error until execution enters the second part, where the invocation of `forward` or `backward` raises an error.  It is then difficult for the programmer to realize that the actual mistake lies many lines away from where the error appears, and to locate it.
+This difficulty in debugging is the primary reason that programmers prefer PyTorch over older systems.  Using PyTorch, we would write the above program as follows:
+```python
+W = tensor(...)
+for i in xrange(1000): # train for 1000 iterations
+    m = read_minibatch()
+    x = m["image"]
+    l = m["label"]
+    f = layer.fc(x, W)
+    s = layer.softmax(f)
+    c = layer.mse(l, s)
+    backward()
+print W # print the trained model parameters.
+```
+We can see that the main difference is that the model configuration, the first part, has moved into the training loop.  This change allows mistakes in the model configuration to be reported where they appear.  It also means that the model, or its forward pass, is represented by the process in the training loop.
+## Describe Arbitrary Models for the Future
+Describing the process instead of the model also brings Fluid the flexibility to define models not yet invented.
+Because we can program the process, we can write an RNN as a loop, instead of as an RNN layer or operator.  A PyTorch example could look like this:
+```python
+for i in xrange(1000):
+    m = read_minibatch()
+    x = m["sentence"]
+    for t in xrange(len(x)):
+        h[t] = the_step(x[t])
+```
+With Fluid, the training loop and the RNN in the above program are not Python loops, but a "loop structure" provided by Fluid and implemented in C++:
+```python
+train_loop = layers.While(cond)
+with train_loop.block():
+  m = read_minibatch()
+  x = m["sentence"]
+  rnn = layers.While(...)
+  with rnn.block():
+    h[t] = the_step(input[t])
+```
+A real Fluid example is [here](https://github.com/PaddlePaddle/Paddle/blob/a91efdde6910ce92a78e3aa7157412c4c88d9ee8/python/paddle/v2/fluid/tests/test_while_op.py#L36-L44).
+From these examples, you can see that Fluid programs look similar to their PyTorch equivalents, except that Fluid's loop structure, wrapped with Python's `with` statement, could run much faster than a Python loop.
+We have more examples of the [`if-then-else`](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/if_else_op.md) structure of Fluid.
+## Turing Completeness
+In computability theory, a system of data-manipulation rules, such as a programming language, is said to be Turing complete if it can be used to simulate any Turing machine.  A programming language that provides if-then-else and loops is generally Turing complete.  From the above examples, Fluid seems to be Turing complete; however, there is a slight difference between the if-then-else of Fluid and that of a programming language: the former runs both of its branches.  It splits the input minibatch into two parts -- one for the true condition and one for the false.  I am not sure whether this is equivalent to the if-then-else that makes programming languages Turing complete.  I talked with [Yuang Yu](https://research.google.com/pubs/104812.html) about this, but I need to figure out more.
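To make the split-and-merge behavior described above concrete, here is a minimal sketch in plain Python.  It is not Fluid's implementation: the function name `split_if_else` and the use of a Python list as the minibatch are assumptions made purely for illustration.

```python
def split_if_else(batch, cond, true_fn, false_fn):
    """Hypothetical helper: run both branches, each on its own part of the batch."""
    # Partition the minibatch indices by the condition.
    true_idx = [i for i, x in enumerate(batch) if cond(x)]
    false_idx = [i for i, x in enumerate(batch) if not cond(x)]

    # Both branches run; each sees only its own sub-batch.
    true_out = true_fn([batch[i] for i in true_idx])
    false_out = false_fn([batch[i] for i in false_idx])

    # Merge the two results back into the original order.
    merged = [None] * len(batch)
    for i, y in zip(true_idx, true_out):
        merged[i] = y
    for i, y in zip(false_idx, false_out):
        merged[i] = y
    return merged


result = split_if_else(
    [1, -2, 3, -4],
    cond=lambda x: x > 0,
    true_fn=lambda xs: [x * 10 for x in xs],
    false_fn=lambda xs: [0 for _ in xs],
)
```

Unlike an ordinary if-then-else, neither branch is skipped; the condition only decides which examples each branch receives.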
+## The Execution of a Fluid Program
+There are two ways to run a Fluid program.  When we run an example program, it creates a protobuf message [`ProgramDesc`](https://github.com/PaddlePaddle/Paddle/blob/a91efdde6910ce92a78e3aa7157412c4c88d9ee8/paddle/framework/framework.proto#L145) that describes the process and is conceptually like an [abstract syntax tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree).
+We have a C++ class [`Executor`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/executor.h), which runs a `ProgramDesc` in the same way that an interpreter runs a Python program.
+We are moving towards a compiler, which we will explain in more detail later in this article.
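The interpreter-style execution can be pictured with a small sketch.  Everything here is a simplification assumed for illustration: the program description is a plain list of dictionaries rather than the real `ProgramDesc` protobuf, and `KERNELS`, `run_program`, and the `scope` dictionary are hypothetical names, not the `Executor` API.

```python
import operator

# Hypothetical kernel table: operator type -> Python function.
KERNELS = {"add": operator.add, "mul": operator.mul}


def run_program(program_desc, scope):
    """Interpret the operator descriptions one by one, in order."""
    for op in program_desc:
        kernel = KERNELS[op["type"]]
        args = [scope[name] for name in op["inputs"]]
        # Store the result under the output variable name.
        scope[op["output"]] = kernel(*args)
    return scope


# A stand-in for a program description: y = x * w + b
program_desc = [
    {"type": "mul", "inputs": ["x", "w"], "output": "xw"},
    {"type": "add", "inputs": ["xw", "b"], "output": "y"},
]
scope = run_program(program_desc, {"x": 3.0, "w": 2.0, "b": 1.0})
```

The point of the sketch is that the description is data: the Python code that built it has finished running before any operator executes.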
+## Backward Compatibility
+Despite all the advantages of removing the concept of the *model*, hardware manufacturers might still prefer that the concept exist, so that they can build hardware that reads and runs a trained model for inference.  For example, Nervana, a startup company acquired by Intel, has been working on an XPU that reads models in the format known as [n-graph](https://github.com/NervanaSystems/ngraph).  Similarly, [Movidius](https://www.movidius.com/) is producing a mobile deep learning chip that also reads and runs graphs of operators.  The well-known [ONNX](https://github.com/onnx/onnx) is also a file format for graphs of operators.
+For Fluid, we can write a converter that extracts parts of the `ProgramDesc` protobuf message, converts them into a graph of operators, and exports the graph in the ONNX or n-graph format.
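As a rough illustration of the conversion step, the sketch below derives a graph of operators from a linear list of operator descriptions by connecting each operator to the one that produced its inputs.  The list-of-dictionaries format and the function name `to_operator_graph` are assumptions for this example; writing out real ONNX or n-graph files is not shown.

```python
def to_operator_graph(program_desc):
    """Return (nodes, edges); an edge is (producer index, consumer index, variable)."""
    producer = {}  # variable name -> index of the operator that wrote it
    edges = []
    for idx, op in enumerate(program_desc):
        for name in op["inputs"]:
            # Only variables written by an earlier operator create an edge;
            # the rest are external inputs such as data or parameters.
            if name in producer:
                edges.append((producer[name], idx, name))
        producer[op["output"]] = idx
    nodes = [op["type"] for op in program_desc]
    return nodes, edges


program_desc = [
    {"type": "mul", "inputs": ["x", "w"], "output": "xw"},
    {"type": "add", "inputs": ["xw", "b"], "output": "y"},
]
nodes, edges = to_operator_graph(program_desc)
```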
+## Towards a Deep Learning Language and the Compiler
+We can slightly change the if-then-else and loop structures in the above Fluid example programs to turn them into a new programming language, different from Python.
+Even if we don't invent a new language, as long as we get the `ProgramDesc` message filled in, we can write a transpiler, which translates each invocation of an operator into a C++ call to a kernel function of that operator. For example, a transpiler that weaves in the CUDA kernels outputs an NVIDIA-friendly C++ program, which can be built using `nvcc`.  Another transpiler could generate MKL-friendly code that should be built using `icc` from Intel.  More interestingly, we can translate a Fluid program into a distributed version consisting of two `ProgramDesc` messages, one to run on the trainer process and the other on the parameter server.  For more details on the last example, see the [concurrent programming design](concurrent_programming.md).  The following figure explains this two-stage process:
+![](fluid-compiler.png)
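The first stage, turning each operator invocation into a C++ kernel call, might look roughly like the following sketch.  The generated names (`Scope`, `mul_kernel`, `add_kernel`) and the list-of-dictionaries program format are hypothetical; the sketch only emits source text and does not invoke `nvcc` or `icc`.

```python
def transpile_to_cpp(program_desc):
    """Emit C++ source text with one kernel call per operator description."""
    lines = [
        "// Generated sketch; kernel and type names are hypothetical.",
        "void run(Scope& scope) {",
    ]
    for op in program_desc:
        # Pass inputs first, then the output variable, all looked up in the scope.
        names = op["inputs"] + [op["output"]]
        args = ", ".join('scope["%s"]' % name for name in names)
        lines.append("  %s_kernel(%s);" % (op["type"], args))
    lines.append("}")
    return "\n".join(lines)


program_desc = [
    {"type": "mul", "inputs": ["x", "w"], "output": "xw"},
    {"type": "add", "inputs": ["xw", "b"], "output": "y"},
]
cpp_source = transpile_to_cpp(program_desc)
```

The second stage would hand `cpp_source` to a vendor compiler; a different transpiler could emit calls to a different kernel family from the same description.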
--- a/develop/doc_cn/design/fluid.html
+++ b/develop/doc_cn/design/fluid.html
+<!DOCTYPE html>
+<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
+<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
+<head>
+  <meta charset="utf-8">
+  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  <title>Design Doc: PaddlePaddle Fluid &mdash; PaddlePaddle  Documentation</title>
+    <link rel="stylesheet" href="../_static/css/theme.css" type="text/css" />
+        <link rel="index" title="Index"
+              href="../genindex.html"/>
+        <link rel="search" title="Search" href="../search.html"/>
+    <link rel="top" title="PaddlePaddle  Documentation" href="../index.html"/> 
+  <link rel="stylesheet" href="https://cdn.jsdelivr.net/perfect-scrollbar/0.6.14/css/perfect-scrollbar.min.css" type="text/css" />
+  <link rel="stylesheet" href="../_static/css/override.css" type="text/css" />
+  <script>
+  var _hmt = _hmt || [];
+  (function() {
+    var hm = document.createElement("script");
+    hm.src = "//hm.baidu.com/hm.js?b9a314ab40d04d805655aab1deee08ba";
+    var s = document.getElementsByTagName("script")[0]; 
+    s.parentNode.insertBefore(hm, s);
+  })();
+  </script>
+  <script src="../_static/js/modernizr.min.js"></script>
+</head>
+<body class="wy-body-for-nav" role="document">
+  <header class="site-header">
+    <div class="site-logo">
+      <a href="/"><img src="../_static/images/PP_w.png"></a>
+    </div>
+    <div class="site-nav-links">
+      <div class="site-menu">
+        <a class="fork-on-github" href="https://github.com/PaddlePaddle/Paddle" target="_blank"><i class="fa fa-github"></i>Fork me on Github</a>
+        <div class="language-switcher dropdown">
+          <a type="button" data-toggle="dropdown">
+            <span>English</span>
+            <i class="fa fa-angle-up"></i>
+            <i class="fa fa-angle-down"></i>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="/doc_cn">中文</a></li>
+            <li><a href="/doc">English</a></li>
+          </ul>
+        </div>
+        <ul class="site-page-links">
+          <li><a href="/">Home</a></li>
+        </ul>
+      </div>
+      <div class="doc-module">
+        <ul>
+<li class="toctree-l1"><a class="reference internal" href="../getstarted/index_cn.html">Getting Started</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../howto/index_cn.html">Advanced Guide</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../api/index_cn.html">API</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../faq/index_cn.html">FAQ</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../mobile/index_cn.html">MOBILE</a></li>
+</ul>
+<div role="search">
+  <form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
+    <input type="text" name="q" placeholder="Search docs" />
+    <input type="hidden" name="check_keywords" value="yes" />
+    <input type="hidden" name="area" value="default" />
+  </form>
+</div>        
+      </div>
+    </div>
+  </header>
+  <div class="main-content-wrap">
+    <nav class="doc-menu-vertical" role="navigation">
+          <ul>
+<li class="toctree-l1"><a class="reference internal" href="../getstarted/index_cn.html">Getting Started</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="../getstarted/build_and_install/index_cn.html">Install and Build</a><ul>
+<li class="toctree-l3"><a class="reference internal" href="../getstarted/build_and_install/pip_install_cn.html">Install Using pip</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../getstarted/build_and_install/docker_install_cn.html">Install and Run Using Docker</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../howto/dev/build_cn.html">Build and Test PaddlePaddle Using Docker</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../getstarted/build_and_install/build_from_source_cn.html">Build from Source</a></li>
+</ul>
+</li>
+<li class="toctree-l2"><a class="reference internal" href="../getstarted/concepts/use_concepts_cn.html">Basic Usage Concepts</a></li>
+</ul>
+</li>
+<li class="toctree-l1"><a class="reference internal" href="../howto/index_cn.html">Advanced Guide</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="../howto/usage/cmd_parameter/index_cn.html">Set Command-line Parameters</a><ul>
+<li class="toctree-l3"><a class="reference internal" href="../howto/usage/cmd_parameter/use_case_cn.html">Use Cases</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../howto/usage/cmd_parameter/arguments_cn.html">Argument Outline</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../howto/usage/cmd_parameter/detail_introduction_cn.html">Detailed Description</a></li>
+</ul>
+</li>
+<li class="toctree-l2"><a class="reference internal" href="../howto/usage/cluster/cluster_train_cn.html">PaddlePaddle Distributed Training</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../howto/usage/k8s/k8s_basis_cn.html">Introduction to Kubernetes</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../howto/usage/k8s/k8s_cn.html">Single-node Training on Kubernetes</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../howto/usage/k8s/k8s_distributed_cn.html">Distributed Training on Kubernetes</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../howto/dev/contribute_to_paddle_cn.html">How to Contribute Code</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../howto/dev/write_docs_cn.html">How to Contribute/Modify Documentation</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../howto/deep_model/rnn/index_cn.html">RNN Models</a><ul>
+<li class="toctree-l3"><a class="reference internal" href="../howto/deep_model/rnn/rnn_config_cn.html">RNN Configuration</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../howto/deep_model/rnn/recurrent_group_cn.html">Recurrent Group Tutorial</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../howto/deep_model/rnn/hierarchical_layer_cn.html">Layers That Support Nested Sequences as Input</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../howto/deep_model/rnn/hrnn_rnn_api_compare_cn.html">Comparison of Single-level and Nested RNN APIs</a></li>
+</ul>
+</li>
+<li class="toctree-l2"><a class="reference internal" href="../howto/optimization/gpu_profiling_cn.html">GPU Profiling and Tuning</a></li>
+</ul>
+</li>
+<li class="toctree-l1"><a class="reference internal" href="../api/index_cn.html">API</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="../api/v2/model_configs.html">Model Configuration</a><ul>
+<li class="toctree-l3"><a class="reference internal" href="../api/v2/config/activation.html">Activation</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../api/v2/config/layer.html">Layers</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../api/v2/config/evaluators.html">Evaluators</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../api/v2/config/optimizer.html">Optimizer</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../api/v2/config/pooling.html">Pooling</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../api/v2/config/networks.html">Networks</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../api/v2/config/attr.html">Parameter Attribute</a></li>
+</ul>
+</li>
+<li class="toctree-l2"><a class="reference internal" href="../api/v2/data.html">Data Access</a><ul>
+<li class="toctree-l3"><a class="reference internal" href="../api/v2/data/data_reader.html">Data Reader Interface</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../api/v2/data/image.html">Image Interface</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../api/v2/data/dataset.html">Dataset</a></li>
+</ul>
+</li>
+<li class="toctree-l2"><a class="reference internal" href="../api/v2/run_logic.html">Training and Inference</a></li>
+</ul>
+</li>
+<li class="toctree-l1"><a class="reference internal" href="../faq/index_cn.html">FAQ</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="../faq/build_and_install/index_cn.html">Build, Install, and Unit Tests</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../faq/model/index_cn.html">Model Configuration</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../faq/parameter/index_cn.html">Parameter Settings</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../faq/local/index_cn.html">Local Training and Prediction</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../faq/cluster/index_cn.html">Cluster Training and Prediction</a></li>
+</ul>
+</li>
+<li class="toctree-l1"><a class="reference internal" href="../mobile/index_cn.html">MOBILE</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="../mobile/cross_compiling_for_android_cn.html">Build Guide for Android</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../mobile/cross_compiling_for_ios_cn.html">Build Guide for iOS</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../mobile/cross_compiling_for_raspberry_cn.html">Build Guide for Raspberry Pi</a></li>
+</ul>
+</li>
+</ul>
+    </nav>
+    <section class="doc-content-wrap">
+<div role="navigation" aria-label="breadcrumbs navigation">
+  <ul class="wy-breadcrumbs">
+    <li>Design Doc: PaddlePaddle Fluid</li>
+  </ul>
+</div>
+      <div class="wy-nav-content" id="doc-content">
+        <div class="rst-content">
+          <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
+           <div itemprop="articleBody">
+  <div class="section" id="design-doc-paddlepaddle-fluid">
+<span id="design-doc-paddlepaddle-fluid"></span><h1>Design Doc: PaddlePaddle Fluid<a class="headerlink" href="#design-doc-paddlepaddle-fluid" title="Permalink to this headline">¶</a></h1>
+<div class="section" id="why-fluid">
+<span id="why-fluid"></span><h2>Why Fluid<a class="headerlink" href="#why-fluid" title="Permalink to this headline">¶</a></h2>
+<p>When Baidu developed PaddlePaddle in 2013, the only well-known open source deep learning system was Caffe.  However, when it open-sourced PaddlePaddle in 2016, there were already many other choices available.  We were facing a challenge &#8211; why would we open source yet another one?</p>
+<p>Fluid is the answer.  Fluid is similar to PyTorch and TensorFlow Eager Execution, which describe the &#8220;process&#8221; of training a model or running inference with it, but not the model itself.  Indeed, in PyTorch, Eager Execution, and Fluid, there is no concept of a model at all.  As I will explain in this article, Fluid currently takes this idea further than PyTorch and Eager Execution, and we are pushing Fluid towards a compiler and even a new programming language for deep learning.</p>
+</div>
+<div class="section" id="the-evolution-of-deep-learning-systems">
+<span id="the-evolution-of-deep-learning-systems"></span><h2>The Evolution of Deep Learning Systems<a class="headerlink" href="#the-evolution-of-deep-learning-systems" title="Permalink to this headline">¶</a></h2>
+<p>Deep learning infrastructure is one of the fastest-evolving technologies.  Within only four years, three generations of technologies have been invented.</p>
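+<table border="1" class="docutils">
+<thead>
+<tr><th>Since around</th><th>model = sequence of layers</th><th>model = graph of operators</th><th>No model</th></tr>
+</thead>
+<tbody>
+<tr><td>2013</td><td>Caffe, Theano, Torch, PaddlePaddle</td><td></td><td></td></tr>
+<tr><td>2015</td><td></td><td>TensorFlow, MxNet, Caffe2, ONNX, n-graph</td><td></td></tr>
+<tr><td>2016</td><td></td><td></td><td>PyTorch, TensorFlow Eager Execution, PaddlePaddle Fluid</td></tr>
+</tbody>
+</table>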
+<p>From the above table, we see that the technology is evolving towards the removal of the concept of the model.  To better understand the reason, let us compare the <em>programming paradigms</em>, or, the ways we program deep learning applications using these systems.</p>
+</div>
+<div class="section" id="deep-learning-programming-paradigms">
+<span id="deep-learning-programming-paradigms"></span><h2>Deep Learning Programming Paradigms<a class="headerlink" href="#deep-learning-programming-paradigms" title="Permalink to this headline">¶</a></h2>
+<p>With any system listed as the first or second generation, e.g., Caffe or TensorFlow, an AI application training program looks like the following:</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">x</span> <span class="o">=</span> <span class="n">layer</span><span class="o">.</span><span class="n">data</span><span class="p">(</span><span class="s2">&quot;image&quot;</span><span class="p">)</span>
+<span class="n">l</span> <span class="o">=</span> <span class="n">layer</span><span class="o">.</span><span class="n">data</span><span class="p">(</span><span class="s2">&quot;label&quot;</span><span class="p">)</span>
+<span class="n">f</span> <span class="o">=</span> <span class="n">layer</span><span class="o">.</span><span class="n">fc</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">W</span><span class="p">)</span>
+<span class="n">s</span> <span class="o">=</span> <span class="n">layer</span><span class="o">.</span><span class="n">softmax</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
+<span class="n">c</span> <span class="o">=</span> <span class="n">layer</span><span class="o">.</span><span class="n">mse</span><span class="p">(</span><span class="n">l</span><span class="p">,</span> <span class="n">s</span><span class="p">)</span>
+<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="mi">1000</span><span class="p">):</span> <span class="c1"># train for 1000 iterations</span>
+    <span class="n">m</span> <span class="o">=</span> <span class="n">read_minibatch</span><span class="p">()</span>
+    <span class="n">forward</span><span class="p">({</span><span class="s2">&quot;input&quot;</span><span class="p">:</span> <span class="n">x</span><span class="p">,</span> <span class="s2">&quot;data&quot;</span><span class="p">:</span> <span class="n">m</span><span class="p">},</span> <span class="n">minimize</span><span class="o">=</span><span class="n">c</span><span class="p">)</span>
+    <span class="n">backward</span><span class="p">(</span><span class="o">...</span><span class="p">)</span>
+<span class="k">print</span> <span class="n">W</span> <span class="c1"># print the trained model parameters.</span>
+</pre></div>
+</div>
+<p>The above program includes two parts:</p>
+<ol class="simple">
+<li>the first part describes the model, and</li>
+<li>the second part describes the training process (or inference process).</li>
+</ol>
+<p>This paradigm has a well-known problem that limits programmers&#8217; productivity.  Suppose that we made a mistake when configuring the model in the first part of the program.  When we run the program, it would not report an error until execution enters the second part, where the invocation of <code class="docutils literal"><span class="pre">forward</span></code> or <code class="docutils literal"><span class="pre">backward</span></code> raises an error.  It is then difficult for the programmer to realize that the actual mistake lies many lines away from where the error appears, and to locate it.</p>
+<p>This difficulty in debugging is the primary reason that programmers prefer PyTorch over older systems.  Using PyTorch, we would write the above program as follows:</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">W</span> <span class="o">=</span> <span class="n">tensor</span><span class="p">(</span><span class="o">...</span><span class="p">)</span>
+<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="mi">1000</span><span class="p">):</span> <span class="c1"># train for 1000 iterations</span>
+    <span class="n">m</span> <span class="o">=</span> <span class="n">read_minibatch</span><span class="p">()</span>
+    <span class="n">x</span> <span class="o">=</span> <span class="n">m</span><span class="p">[</span><span class="s2">&quot;image&quot;</span><span class="p">]</span>
+    <span class="n">l</span> <span class="o">=</span> <span class="n">m</span><span class="p">[</span><span class="s2">&quot;label&quot;</span><span class="p">]</span>
+    <span class="n">f</span> <span class="o">=</span> <span class="n">layer</span><span class="o">.</span><span class="n">fc</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">W</span><span class="p">)</span>
+    <span class="n">s</span> <span class="o">=</span> <span class="n">layer</span><span class="o">.</span><span class="n">softmax</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
+    <span class="n">c</span> <span class="o">=</span> <span class="n">layer</span><span class="o">.</span><span class="n">mse</span><span class="p">(</span><span class="n">l</span><span class="p">,</span> <span class="n">s</span><span class="p">)</span>
+    <span class="n">backward</span><span class="p">()</span>
+<span class="k">print</span> <span class="n">W</span> <span class="c1"># print the trained model parameters.</span>
+</pre></div>
+</div>
+<p>We can see that the main difference is that the model configuration, the first part, has moved into the training loop.  This change allows mistakes in the model configuration to be reported where they appear.  It also means that the model, or its forward pass, is represented by the process in the training loop.</p>
+</div>
+<div class="section" id="describe-arbitrary-models-for-the-future">
+<span id="describe-arbitrary-models-for-the-future"></span><h2>Describe Arbitrary Models for the Future<a class="headerlink" href="#describe-arbitrary-models-for-the-future" title="Permalink to this headline">¶</a></h2>
+<p>Describing the process instead of the model also brings Fluid the flexibility to define models not yet invented.</p>
+<p>Because we can program the process, we can write an RNN as a loop, instead of as an RNN layer or operator.  A PyTorch example could look like this:</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="mi">1000</span><span class="p">):</span>
+    <span class="n">m</span> <span class="o">=</span> <span class="n">read_minibatch</span><span class="p">()</span>
+    <span class="n">x</span> <span class="o">=</span> <span class="n">m</span><span class="p">[</span><span class="s2">&quot;sentence&quot;</span><span class="p">]</span>
+    <span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">x</span><span class="p">)):</span>
+        <span class="n">h</span><span class="p">[</span><span class="n">t</span><span class="p">]</span> <span class="o">=</span> <span class="n">the_step</span><span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="n">t</span><span class="p">])</span>
+</pre></div>
+</div>
+<p>With Fluid, the training loop and the RNN in the above program are not Python loops, but a &#8220;loop structure&#8221; provided by Fluid and implemented in C++:</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">train_loop</span> <span class="o">=</span> <span class="n">layers</span><span class="o">.</span><span class="n">While</span><span class="p">(</span><span class="n">cond</span><span class="p">)</span>
+<span class="k">with</span> <span class="n">train_loop</span><span class="o">.</span><span class="n">block</span><span class="p">():</span>
+  <span class="n">m</span> <span class="o">=</span> <span class="n">read_minibatch</span><span class="p">()</span>
+  <span class="n">x</span> <span class="o">=</span> <span class="n">m</span><span class="p">[</span><span class="s2">&quot;sentence&quot;</span><span class="p">]</span>
+  <span class="n">rnn</span> <span class="o">=</span> <span class="n">layers</span><span class="o">.</span><span class="n">While</span><span class="p">(</span><span class="o">...</span><span class="p">)</span>
+  <span class="k">with</span> <span class="n">rnn</span><span class="o">.</span><span class="n">block</span><span class="p">():</span>
+    <span class="n">h</span><span class="p">[</span><span class="n">t</span><span class="p">]</span> <span class="o">=</span> <span class="n">the_step</span><span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="n">t</span><span class="p">])</span>
+</pre></div>
+</div>
+<p>A real Fluid example is <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/a91efdde6910ce92a78e3aa7157412c4c88d9ee8/python/paddle/v2/fluid/tests/test_while_op.py#L36-L44">here</a>.</p>
+<p>From these examples, you can see that Fluid programs look similar to their PyTorch equivalents, except that Fluid&#8217;s loop structure, wrapped with Python&#8217;s <code class="docutils literal"><span class="pre">with</span></code> statement, can run much faster than a Python loop.</p>
+<p>We have more examples of the <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/if_else_op.md"><code class="docutils literal"><span class="pre">if-then-else</span></code></a> structure of Fluid.</p>
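+<p>To illustrate why a <code class="docutils literal"><span class="pre">with</span></code> block can describe a loop without running it in Python, here is a minimal sketch written for Python 3.  The <code class="docutils literal"><span class="pre">Program</span></code> class, its methods, and the operator names are hypothetical and are not Fluid&#8217;s API; the sketch only assumes that the body of each <code class="docutils literal"><span class="pre">with</span></code> block is executed once to record operators into a nested description.</p>

```python
import contextlib


class Program:
    """Hypothetical recorder: collects operator descriptions into nested blocks."""

    def __init__(self):
        self.root = []              # the top-level block
        self._stack = [self.root]   # the innermost block receives new operators

    def append(self, op_type, **attrs):
        op = {"type": op_type, "attrs": attrs}
        self._stack[-1].append(op)
        return op

    @contextlib.contextmanager
    def while_block(self, cond):
        # Record a "while" operator, then route operators created inside
        # the `with` body into its sub-block.
        op = self.append("while", cond=cond)
        op["block"] = []
        self._stack.append(op["block"])
        try:
            yield
        finally:
            self._stack.pop()


prog = Program()
with prog.while_block("has_more_data"):          # body runs once, to record
    prog.append("read_minibatch")
    with prog.while_block("t_less_than_len"):
        prog.append("rnn_step")
```

+<p>After this code runs, <code class="docutils literal"><span class="pre">prog.root</span></code> holds one outer loop operator whose sub-block contains the read operator and the inner loop.  No iteration has happened in Python; a separate runtime would be responsible for actually looping.</p>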
+</div>
+<div class="section" id="turing-completeness">
+<span id="turing-completeness"></span><h2>Turing Completeness<a class="headerlink" href="#turing-completeness" title="Permalink to this headline">¶</a></h2>
+<p>In computability theory, a system of data-manipulation rules, such as a programming language, is said to be Turing complete if it can be used to simulate any Turing machine.  Roughly speaking, a programming language that provides if-then-else and loops, together with unbounded storage, is Turing complete.  From the above examples, Fluid seems to be Turing complete; however, there is a slight difference between the if-then-else of Fluid and that of a programming language: the former runs both of its branches.  It splits the input minibatch into two parts, one for the true condition and one for the false.  I am not sure whether this is equivalent to the if-then-else that makes programming languages Turing complete.  I talked with <a class="reference external" href="https://research.google.com/pubs/104812.html">Yuang Yu</a>, but I need to figure out more.</p>
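+<p>The minibatch-splitting behavior described above can be sketched in plain Python 3.  This is an illustration under stated assumptions, not Fluid&#8217;s implementation: the function <code class="docutils literal"><span class="pre">batched_if_else</span></code> and its parameters are hypothetical, and a minibatch is modeled as a plain list.</p>

```python
def batched_if_else(batch, cond, true_fn, false_fn):
    """Run both branches, each on its own part of the minibatch.

    Hypothetical helper: `cond` maps one example to a bool, while
    `true_fn` and `false_fn` each map a list of examples to a list of
    results of the same length.
    """
    mask = [cond(x) for x in batch]
    true_part = [x for x, m in zip(batch, mask) if m]
    false_part = [x for x, m in zip(batch, mask) if not m]

    # Both branches are always invoked, even if one part is empty.
    true_out = iter(true_fn(true_part))
    false_out = iter(false_fn(false_part))

    # Merge the results back into the original example order.
    return [next(true_out) if m else next(false_out) for m in mask]


result = batched_if_else(
    [1, -2, 3, -4],
    cond=lambda x: x > 0,
    true_fn=lambda xs: [x * 10 for x in xs],
    false_fn=lambda xs: [0 for _ in xs],
)
```

+<p>Unlike an ordinary <code class="docutils literal"><span class="pre">if</span></code> statement, which picks one branch per execution, this form evaluates both branch functions on every call and selects per example.</p>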
+</div>
+<div class="section" id="the-execution-of-a-fluid-program">
+<span id="the-execution-of-a-fluid-program"></span><h2>The Execution of a Fluid Program<a class="headerlink" href="#the-execution-of-a-fluid-program" title="Permalink to this headline">¶</a></h2>
+<p>There are two ways to run a Fluid program.  When we run an example program, it creates a protobuf message <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/a91efdde6910ce92a78e3aa7157412c4c88d9ee8/paddle/framework/framework.proto#L145"><code class="docutils literal"><span class="pre">ProgramDesc</span></code></a> that describes the process and is conceptually like an <a class="reference external" href="https://en.wikipedia.org/wiki/Abstract_syntax_tree">abstract syntax tree</a>.</p>
+<p>We have a C++ class <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/executor.h"><code class="docutils literal"><span class="pre">Executor</span></code></a>, which runs a <code class="docutils literal"><span class="pre">ProgramDesc</span></code> much as an interpreter runs a Python program.</p>
+<p>We are also moving towards a compiler, which we will explain in more detail later in this article.</p>
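+<p>The interpreter analogy can be made concrete with a toy example in Python 3.  It is only a sketch: plain dictionaries stand in for the protobuf message, the operator names and the <code class="docutils literal"><span class="pre">run_block</span></code> function are hypothetical, and the real <code class="docutils literal"><span class="pre">Executor</span></code> is a C++ class whose details are not shown here.</p>

```python
def run_block(block, scope):
    """Interpret a list of operator descriptions against a variable scope."""
    for op in block:
        kind = op["type"]
        if kind == "assign":
            scope[op["out"]] = op["value"]
        elif kind == "add":
            scope[op["out"]] = scope[op["x"]] + scope[op["y"]]
        elif kind == "less_than":
            scope[op["out"]] = scope[op["x"]] < scope[op["y"]]
        elif kind == "while":
            # Re-run the sub-block until the condition variable becomes false.
            while scope[op["cond"]]:
                run_block(op["block"], scope)
        else:
            raise ValueError("unknown op type: %s" % kind)
    return scope


# A description of "count i from 0 up to n", written as data rather than code.
program = [
    {"type": "assign", "out": "i", "value": 0},
    {"type": "assign", "out": "one", "value": 1},
    {"type": "assign", "out": "n", "value": 5},
    {"type": "less_than", "x": "i", "y": "n", "out": "cond"},
    {"type": "while", "cond": "cond", "block": [
        {"type": "add", "x": "i", "y": "one", "out": "i"},
        {"type": "less_than", "x": "i", "y": "n", "out": "cond"},
    ]},
]

scope = run_block(program, {})
```

+<p>The program is pure data, so the loop runs inside the interpreter rather than in the language that built the description.</p>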
+</div>
+<div class="section" id="backward-compatibility">
+<span id="backward-compatibility"></span><h2>Backward Compatibility<a class="headerlink" href="#backward-compatibility" title="Permalink to this headline">¶</a></h2>
+<p>Despite the advantages of removing the concept of a <em>model</em>, hardware manufacturers might still prefer that the concept exist, so that they can build hardware that reads and runs a trained model for inference.  For example, Nervana, a startup company acquired by Intel, has been working on an XPU that reads models in the format known as <a class="reference external" href="https://github.com/NervanaSystems/ngraph">n-graph</a>.  Similarly, <a class="reference external" href="https://www.movidius.com/">Movidius</a> is producing a mobile deep learning chip that also reads and runs graphs of operators.  The well-known <a class="reference external" href="https://github.com/onnx/onnx">ONNX</a> is also a file format for graphs of operators.</p>
+<p>For Fluid, we can write a converter that extracts parts of the <code class="docutils literal"><span class="pre">ProgramDesc</span></code> protobuf message, converts them into a graph of operators, and exports the graph in the ONNX or n-graph format.</p>
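+<p>As a rough sketch of the conversion step only, the following Python 3 code turns a flat list of operator descriptions into graph edges by matching each input variable with the operator that produced it.  The dictionary layout and the function <code class="docutils literal"><span class="pre">ops_to_graph</span></code> are assumptions made for illustration; the sketch does not read a real <code class="docutils literal"><span class="pre">ProgramDesc</span></code> and does not write ONNX or n-graph files.</p>

```python
def ops_to_graph(ops):
    """Return edges (producer_index, consumer_index) between operators."""
    producer = {}   # variable name -> index of the operator that wrote it last
    edges = []
    for index, op in enumerate(ops):
        for name in op["inputs"]:
            if name in producer:
                edges.append((producer[name], index))
        for name in op["outputs"]:
            producer[name] = index
    return edges


# Hypothetical straight-line forward pass: p = softmax(x * W + b).
forward_ops = [
    {"type": "mul", "inputs": ["x", "W"], "outputs": ["xw"]},
    {"type": "add", "inputs": ["xw", "b"], "outputs": ["y"]},
    {"type": "softmax", "inputs": ["y"], "outputs": ["p"]},
]

edges = ops_to_graph(forward_ops)
```

+<p>Variables that no operator produces, such as the input <code class="docutils literal"><span class="pre">x</span></code> and the parameters, yield no edges and would become the inputs of the exported graph.</p>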
+</div>
+<div class="section" id="towards-a-deep-learning-language-and-the-compiler">
+<span id="towards-a-deep-learning-language-and-the-compiler"></span><h2>Towards a Deep Learning Language and the Compiler<a class="headerlink" href="#towards-a-deep-learning-language-and-the-compiler" title="Permalink to this headline">¶</a></h2>
+<p>We can change the if-then-else and loop structures in the above Fluid example programs a little to turn them into a new programming language, different from Python.</p>
+<p>Even if we don&#8217;t invent a new language, as long as we get the <code class="docutils literal"><span class="pre">ProgramDesc</span></code> message filled in, we can write a transpiler, which translates each invocation of an operator into a C++ call to a kernel function of that operator. For example, a transpiler that weaves in the CUDA kernels outputs an NVIDIA-friendly C++ program, which can be built using <code class="docutils literal"><span class="pre">nvcc</span></code>.  Another transpiler could generate MKL-friendly code that should be built using <code class="docutils literal"><span class="pre">icc</span></code> from Intel.  More interestingly, we can translate a Fluid program into its distributed version of two <code class="docutils literal"><span class="pre">ProgramDesc</span></code> messages, one for running on the trainer process, and the other one for the parameter server.  For more details on the last example, see the <a class="reference external" href="design/concurrent_programming.md">concurrent programming design</a>.  The following figure explains this two-stage process:</p>
+<p><img alt="" src="../_images/fluid-compiler.png" /></p>
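+<p>The idea of translating each operator invocation into a C++ kernel call can be sketched as a small source-to-source generator in Python 3.  Everything here is hypothetical: the function <code class="docutils literal"><span class="pre">transpile</span></code>, the dictionary layout, and the generated kernel names such as <code class="docutils literal"><span class="pre">mul_cuda</span></code> are placeholders, not actual Fluid kernels.</p>

```python
def transpile(ops, kernel_suffix):
    """Emit C++ source text with one kernel call per operator description."""
    lines = ["void run() {"]
    for op in ops:
        # Pass inputs first, then outputs, as arguments of the kernel call.
        args = ", ".join(op["inputs"] + op["outputs"])
        lines.append("  %s_%s(%s);" % (op["type"], kernel_suffix, args))
    lines.append("}")
    return "\n".join(lines)


ops = [
    {"type": "mul", "inputs": ["x", "W"], "outputs": ["xw"]},
    {"type": "add", "inputs": ["xw", "b"], "outputs": ["y"]},
]

cuda_source = transpile(ops, "cuda")   # would be handed to a CUDA toolchain
mkl_source = transpile(ops, "mkl")     # same program, different kernels
```

+<p>The same operator list yields different C++ text depending on the target, which mirrors how one description could feed several backend-specific transpilers.</p>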
+</div>
+</div>
+           </div>
+          </div>
+          <footer>
+  <hr/>
+  <div role="contentinfo">
+    <p>
+        &copy; Copyright 2016, PaddlePaddle developers.
+    </p>
+  </div>
+  Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>. 
+</footer>
+        </div>
+      </div>
+    </section>
+  </div>
+    <script type="text/javascript">
+        var DOCUMENTATION_OPTIONS = {
+            URL_ROOT:'../',
+            VERSION:'',
+            COLLAPSE_INDEX:false,
+            FILE_SUFFIX:'.html',
+            HAS_SOURCE:  true,
+            SOURCELINK_SUFFIX: ".txt",
+        };
+    </script>
+      <script type="text/javascript" src="../_static/jquery.js"></script>
+      <script type="text/javascript" src="../_static/underscore.js"></script>
+      <script type="text/javascript" src="../_static/doctools.js"></script>
+      <script type="text/javascript" src="../_static/translations.js"></script>
+      <script type="text/javascript" src="https://cdn.bootcss.com/mathjax/2.7.0/MathJax.js"></script>
+    <script type="text/javascript" src="../_static/js/theme.js"></script>
+  <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js" integrity="sha384-Tc5IQib027qvyjSMfHjOMaLkfuWVxZxUPnCJA7l2mCWNIpG9mGCD8wGNIcPD7Txa" crossorigin="anonymous"></script>
+  <script src="https://cdn.jsdelivr.net/perfect-scrollbar/0.6.14/js/perfect-scrollbar.jquery.min.js"></script>
+  <script src="../_static/js/paddle_doc_init.js"></script> 
+</body>
+</html>
\ No newline at end of file
--- a/develop/doc_cn/objects.inv
+++ b/develop/doc_cn/objects.inv
--- a/develop/doc_cn/searchindex.js
+++ b/develop/doc_cn/searchindex.js