Merge pull request #9633 from weixing02/img

Upload fluid image sources to github

232b6fc6 · Tao Luo · GitHub · bc8f4360 · d988b9a9 · 232b6fc6
80 changed files
--- a/doc/fluid/design/algorithm/parameter_average.md
+++ b/doc/fluid/design/algorithm/parameter_average.md
@@ -7,7 +7,9 @@ Polyak and Juditsky (1992) showed that the test performance of simple average of
 Hence, to speed up Stochastic Gradient Descent, Averaged Stochastic Gradient Descent (ASGD) was proposed in Polyak and Juditsky (1992). For ASGD, the running average of the parameters obtained by SGD is used as the estimator for <img src="./images/theta_star.gif"/><br/>. The averaging is done as follows:
-![](./images/asgd.gif)
+<p align="center">
+<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/asgd.gif"><br />
+</p>
 We propose averaging for any optimizer similar to how ASGD performs it, as mentioned above.
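
Reviewer note: the averaging formula itself lives in `asgd.gif` and is not reproduced in this diff. The sketch below is therefore only an illustration, assuming a uniform running mean of the SGD iterates. The function name `sgd_with_averaging` and its parameters are hypothetical and are not part of PaddlePaddle.

```python
def sgd_with_averaging(grad_fn, theta0, lr=0.1, steps=100):
    """Run plain SGD and keep a running average of the iterates.

    Assumption: the average is the uniform mean of theta_1..theta_t,
    which may differ from the exact formula shown in asgd.gif.
    """
    theta = list(theta0)
    theta_avg = [0.0] * len(theta)
    for t in range(1, steps + 1):
        grad = grad_fn(theta)
        # Ordinary SGD step on the raw parameters.
        theta = [p - lr * g for p, g in zip(theta, grad)]
        # Incremental mean: avg_t = avg_{t-1} + (theta_t - avg_{t-1}) / t
        theta_avg = [a + (p - a) / t for a, p in zip(theta_avg, theta)]
    return theta, theta_avg


# Example: minimize 0.5 * theta^2, whose gradient is theta itself.
last, averaged = sgd_with_averaging(lambda th: list(th), [1.0], lr=0.5, steps=3)
```

The averaged value would be used as the estimator at test time, while the optimizer keeps updating the raw parameters.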

--- a/doc/fluid/design/concurrent/channel.md
+++ b/doc/fluid/design/concurrent/channel.md
@@ -114,13 +114,13 @@ current thread under two conditions:
 #### Channel Send
 <p align="center">
-<img src="./images/channel_send.png"/><br/>
+<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/channel_send.png"/><br/>
 </p>
 #### Channel Receive
 <p align="center">
-<img src="./images/channel_recv.png"/><br/>
+<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/channel_recv.png"/><br/>
 </p>
 ## Limitations and Considerations

--- a/doc/fluid/design/concurrent/select_op.md
+++ b/doc/fluid/design/concurrent/select_op.md
@@ -254,7 +254,7 @@ only one case will be executed.
 ### select_op flow
 <p align="center">
-<img src="./images/select_op_workflow.png"/><br/>
+<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/select_op_workflow.png"/><br/>
 </p>
 The select algorithm is inspired by golang's select routine.  Please refer to

--- a/doc/fluid/design/dist_train/distributed_architecture.md
+++ b/doc/fluid/design/dist_train/distributed_architecture.md
@@ -40,11 +40,11 @@ computation is only specified in Python code which sits outside of PaddlePaddle,
 Similar to how a compiler uses an intermediate representation (IR) so that the programmer does not need to manually optimize their code for most of the cases, we can have an intermediate representation in PaddlePaddle as well. The compiler optimizes the IR as follows:
-<img src="src/compiler.png"/>
+<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/compiler.png"/>
 PaddlePaddle can support model parallelism by converting the IR so that the user no longer needs to manually perform the computation and operations in the Python component:
-<img src="src/paddle-compile.png"/>
+<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/paddle-compile.png"/>
 The IR for PaddlePaddle after refactoring is called a `Block`; it specifies the computation dependency graph and the variables used in the computation.
@@ -60,7 +60,7 @@ For a detailed explanation, refer to this document -
 The revamped distributed training architecture can address the limitations discussed above. The illustration below shows how it does so:
-<img src="src/distributed_architecture.png"/>
+<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/distributed_architecture.png"/>
 The major components are: *Python API*, *Distribute Transpiler* and *Remote Executor*.
@@ -152,7 +152,7 @@ for data in train_reader():
 The `JobDesc` object describes the resource specification of the distributed job to run in
 a cluster environment.
-<img src="src/remote_executor.png" width="500" align="center" />
+<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/remote_executor.png" width="500" align="center" />
 `RemoteExecutor.run` sends the `ProgramDesc` and
 [TrainingJob](https://github.com/PaddlePaddle/cloud/blob/unreleased-tpr/doc/autoscale/README.md#training-job-resource)
@@ -171,7 +171,7 @@ In the future, a more general placement algorithm should be implemented, which m
 The local training architecture will be the same as the distributed training architecture; the difference is that everything runs locally, and there is just one PaddlePaddle runtime:
-<img src="src/local_architecture.png"/>
+<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/local_architecture.png"/>
 ### Training Data

--- a/doc/fluid/design/dist_train/multi_cpu.md
+++ b/doc/fluid/design/dist_train/multi_cpu.md
@@ -8,11 +8,11 @@ Op graph to a multi-CPU Op graph, and run `ParallelDo` Op to run the graph.
 ## Transpiler
-<img src="src/multi-threads/single-thread@3x.png" width="300">
+<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/single-thread@3x.png" width="300">
 After conversion:
-<img src="src/multi-threads/multi-threads@3x.png" width="1000">
+<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/multi-threads@3x.png" width="1000">
 ## Implement

--- a/doc/fluid/design/dist_train/parameter_server.md
+++ b/doc/fluid/design/dist_train/parameter_server.md
@@ -41,11 +41,11 @@ We will need these OPs: *Send*, *Recv*, *Enqueue*, *Dequeue*.
 Below is an example of converting the user defined graph to the
 subgraphs for the trainer and the parameter server:
-<img src="src/local-graph.png" width="300"/>
+<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/local-graph.png" width="300"/>
 After converting:
-<img src="src/dist-graph.png" width="700"/>
+<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/dist-graph.png" width="700"/>
 1. The parameter variable W and its optimizer program are placed on the parameter server.
 1. Operators are added to the program.
@@ -69,7 +69,7 @@ In Fluid, we introduce [SelectedRows](../selected_rows.md) to represent a list o
 non-zero gradient data. So when we do parameter optimization both locally and remotely,
 we only need to send those non-zero rows to the optimizer operators:
-<img src="src/sparse_update.png" width="700" />
+<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/sparse_update.png" width="700" />
 ### Benefits

--- a/doc/fluid/design/dynamic_rnn/rnn.md
+++ b/doc/fluid/design/dynamic_rnn/rnn.md
@@ -5,7 +5,7 @@ This document describes the RNN (Recurrent Neural Network) operator and how it i
 ## RNN Algorithm Implementation
 <p align="center">
-<img src="./rnn.jpg"/>
+<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/rnn.jpg"/>
 </p>
 The above diagram shows an RNN unrolled into a full network.
@@ -22,7 +22,7 @@ There are several important concepts here:
 There could be local variables defined in each step-net.  PaddlePaddle runtime realizes these variables in *step-scopes* which are created for each step.
 <p align="center">
-<img src="./rnn.png"/><br/>
+<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/rnn.png"/><br/>
 Figure 2 illustrates the RNN's data flow
 </p>
@@ -93,7 +93,7 @@ For example, we could have a 2-level RNN, where the top level corresponds to par
 The following figure illustrates feeding text into the lower level, one sentence per step, and feeding the step outputs into the top level. The final top-level output describes the whole text.
 <p align="center">
-<img src="./2_level_rnn.png"/>
+<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/2_level_rnn.png"/>
 </p>
 ```python
@@ -149,5 +149,5 @@ If the `output_all_steps` is set to False, it will only output the final time st
 <p align="center">
-<img src="./rnn_2level_data.png"/>
+<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/rnn_2level_data.png"/>
 </p>
--- a/doc/fluid/design/modules/batch_norm_op.md
+++ b/doc/fluid/design/modules/batch_norm_op.md
@@ -66,7 +66,7 @@ As most C++ operators do, `batch_norm_op` is defined by inputs, outputs, attribu
 The following graph shows the training computational process of `batch_norm_op`:
-<img src="../images/batch_norm_op_kernel.png" width="800"/>
+<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/batch_norm_op_kernel.png" width="800"/>
 cuDNN provides APIs that perform this whole series of computations, so we can use them in our GPU kernel.
@@ -124,7 +124,7 @@ for pass_id in range(PASS_NUM):
 `is_infer` is an attribute. Once an operator is created, its attributes cannot be changed. This suggests that we should maintain two `batch_norm_op` instances in the model: one whose `is_infer` is `True` (we call it `infer_batch_norm_op`) and one whose `is_infer` is `False` (we call it `train_batch_norm_op`). They share all parameters and variables but are placed in two different branches. That is to say, if a network contains a `batch_norm_op`, it forks into two branches: one goes through `train_batch_norm_op` and the other goes through `infer_batch_norm_op`:
 <div align=center>
-<img src="../images/batch_norm_fork.png" width="500"/>
+<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/batch_norm_fork.png" width="500"/>
 </div>
 As the graph above shows, the net forks before `batch_norm_op` and never merges again. All operators after `batch_norm_op` are duplicated.

--- a/doc/fluid/design/modules/regularization.md
+++ b/doc/fluid/design/modules/regularization.md
@@ -6,17 +6,17 @@ A central problem in machine learning is how to design an algorithm that will pe
 ### Parameter Norm Penalties
 Most common regularization approaches in deep learning are based on limiting the capacity of the models by adding a parameter norm penalty to the objective function `J`. This is given as follows:
-<img src="./images/loss_equation.png" align="center"/><br/>
+<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/loss_equation.png" align="center"/><br/>
 The parameter `alpha` is a hyperparameter that weights the contribution of the norm penalty term, `omega`, relative to the standard objective function `J`.
 The most commonly used norm penalties are the L2 norm penalty and the L1 norm penalty. These are given as follows:
 ##### L2 Regularization:
-<img src="./images/l2_regularization.png" align="center"/><br/>
+<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/l2_regularization.png" align="center"/><br/>
 ##### L1 Regularization
-<img src="./images/l1_regularization.png" align="center"/><br/>
+<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/l1_regularization.png" align="center"/><br/>
 A much more detailed mathematical background of regularization can be found [here](http://www.deeplearningbook.org/contents/regularization.html).
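
Reviewer note: the penalty equations are only available as images in this diff. The sketch below assumes the common textbook forms, with the L2 penalty taken as half the sum of squared weights and the L1 penalty as the sum of absolute weights; the images may use a different scaling. The function names are hypothetical and only illustrate how `alpha` weights `omega` against `J`.

```python
def l2_penalty(weights):
    """Assumed L2 form: 0.5 * sum of squared weights."""
    return 0.5 * sum(w * w for w in weights)


def l1_penalty(weights):
    """Assumed L1 form: sum of absolute weights."""
    return sum(abs(w) for w in weights)


def regularized_loss(loss, weights, alpha, penalty):
    """Regularized objective: J + alpha * omega(weights)."""
    return loss + alpha * penalty(weights)


weights = [3.0, -4.0]
total_l2 = regularized_loss(1.0, weights, 0.1, l2_penalty)  # 1.0 + 0.1 * 12.5
total_l1 = regularized_loss(1.0, weights, 0.1, l1_penalty)  # 1.0 + 0.1 * 7.0
```

Setting `alpha` to zero recovers the unregularized objective, which is a quick sanity check for any implementation of these operators.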
@@ -40,11 +40,11 @@ The idea of building ops for regularization is in sync with the refactored Paddl
 Below is an example of a really simple feed forward neural network.
-<img src="./images/feed_forward.png" align="center"/><br/>
+<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/feed_forward.png" align="center"/><br/>
 The Python API will modify this computation graph to add regularization operators. The modified computation graph will look as follows:
-<img src="./images/feed_forward_regularized.png" align="center"/><br/>
+<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/feed_forward_regularized.png" align="center"/><br/>
 ### Python API implementation for Regularization
@@ -64,9 +64,3 @@ Since we want to create the regularization ops in a lazy manner, the regularizat
 #### High-level API
 In the PaddlePaddle Python API, users will primarily rely on [layer functions](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/python_api.md#layer-function) to create neural network layers. Hence, we also need to provide regularization functionality in layer functions. The design of these APIs can be postponed for now. A good reference for these APIs can be found in [Keras](https://keras.io/regularizers/) and also in TensorFlow's [`tf.contrib.layers`](https://www.tensorflow.org/api_guides/python/contrib.layers).
--- a/doc/fluid/design/network/deep_speech_2.md
+++ b/doc/fluid/design/network/deep_speech_2.md
@@ -116,7 +116,7 @@ The classical DS2 network contains 15 layers (from bottom to top):
 - **One** CTC-loss layer
 <div align="center">
-<img src="images/ds2_network.png" width=350><br/>
+<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/ds2_network.png" width=350><br/>
 Figure 1. Architecture of Deep Speech 2 Network.
 </div>
@@ -208,7 +208,7 @@ TODO by Assignees
 ### Beam Search with CTC and LM
 <div align="center">
-<img src="images/beam_search.png" width=600><br/>
+<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/beam_search.png" width=600><br/>
 Figure 2. Algorithm for CTC Beam Search Decoder.
 </div>

--- a/doc/fluid/design/network/sequence_decoder.md
+++ b/doc/fluid/design/network/sequence_decoder.md
@@ -199,7 +199,7 @@ Packing the `selected_generation_scores` will get a `LoDTensor`, and each tail i
 ## LoD and shape changes during decoding
 <p align="center">
-  <img src="./images/LOD-and-shape-changes-during-decoding.jpg"/>
+  <img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/LOD-and-shape-changes-during-decoding.jpg"/>
 </p>
 According to the image above, the only phase that changes the LoD is beam search.

--- a/doc/fluid/design/others/gan_api.md
+++ b/doc/fluid/design/others/gan_api.md
@@ -7,14 +7,14 @@ It applies several important concepts in machine learning system design, includi
 In our GAN design, we wrap it as a user-friendly, easily customized Python API for designing different models. We take the conditional DC-GAN (Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks [https://arxiv.org/abs/1511.06434]) as an example because of its good performance on image generation.
 <p align="center">
-<img src="./test.dot.png" width = "35%" align="center"/><br/>
+<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/test.dot.png" width = "35%" align="center"/><br/>
 Figure 1. The overall running logic of GAN. The black solid arrows indicate the forward pass; the green dashed arrows indicate the backward pass of generator training; the red dashed arrows indicate the backward pass of the discriminator training. The BP pass of the green (red) arrow should only update the parameters in the green (red) boxes. The diamonds indicate the data providers. d\_loss and g\_loss marked in red and green are the two targets we would like to run.
 </p>
 The operators, layers and functions required/optional to build a GAN demo are summarized in https://github.com/PaddlePaddle/Paddle/issues/4563.
 <p align="center">
-<img src="./dcgan.png" width = "90%" align="center"/><br/>
+<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/dcgan.png" width = "90%" align="center"/><br/>
 Figure 2. Photo borrowed from the original DC-GAN paper.
 </p>

--- a/doc/fluid/dev/releasing_process.md
+++ b/doc/fluid/dev/releasing_process.md
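@@ -37,7 +37,7 @@ Each new PaddlePaddle release follows this process:
 The three generated binaries, corresponding to the CAPI, `cp27m`, and `cp27mu` versions, can be found in the "Artifacts" drop-down on this page. Then
 upload them with the `twine` tool as described above.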
-<img src="ci_build_whl.png">
+<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/ci_build_whl.png">
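 * Note: The CI environment uses the Docker images from https://github.com/PaddlePaddle/buildtools as the build environment in order to support more Linux
   distributions. If you need to build manually, you can also use these images. They can also be downloaded from https://hub.docker.com/r/paddlepaddle/paddle_manylinux_devel/tags/.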

--- a/doc/fluid/howto/performance/profiler.md
+++ b/doc/fluid/howto/performance/profiler.md
@@ -23,7 +23,7 @@ But how to record the time for the mixed C++ and CUDA program?  There many C++ A
 The overall flow is shown in the following figure.
-<img src="./images/profiler.png" align="center"/><br/>
+<img src="https://github.com/PaddlePaddle/Paddle/tree/develop/doc/fluid/images/profiler.png" align="center"/><br/>
 ### Event

--- a/doc/fluid/images/2_level_rnn.dot
+++ b/doc/fluid/images/2_level_rnn.dot
+digraph G {
+  rnn [label="1st level RNN" shape=box]
+  subgraph cluster0 {
+    label = "time step 0"
+    sent0 [label="sentence"]
+    sent1 [label="sentence"]
+    rnn1 [label="2nd level RNN" shape=box]
+    sent0 -> rnn1
+    sent1 -> rnn1
+  }
+  subgraph cluster1 {
+    label = "time step 1"
+    sent2 [label="sentence"]
+    sent3 [label="sentence"]
+    rnn2 [label="2nd level RNN" shape=box]
+    sent2 -> rnn2
+    sent3 -> rnn2
+  }
+  subgraph cluster2 {
+    label = "time step 2"
+    sent4 [label="sentence"]
+    sent5 [label="sentence"]
+    rnn3 [label="2nd level RNN" shape=box]
+    sent4 -> rnn3
+    sent5 -> rnn3
+  }
+  para0 [label="paragraph info 0"]
+  para1 [label="paragraph info 1"]
+  para2 [label="paragraph info 2"]
+  rnn1 -> para0
+  rnn2 -> para1
+  rnn3 -> para2
+  para0 -> rnn
+  para1 -> rnn
+  para2 -> rnn
+  chapter [label="chapter info"]
+  rnn -> chapter
+}
--- a/doc/fluid/images/2_level_rnn.png
+++ b/doc/fluid/images/2_level_rnn.png
--- a/doc/fluid/images/LOD-and-shape-changes-during-decoding.jpg
+++ b/doc/fluid/images/LOD-and-shape-changes-during-decoding.jpg
--- a/doc/fluid/images/asgd.gif
+++ b/doc/fluid/images/asgd.gif
--- a/doc/fluid/images/batch_norm_fork.dot
+++ b/doc/fluid/images/batch_norm_fork.dot
+digraph ImageBatchNormForkGragh {
+  subgraph cluster_before {
+    Prev [label="...", shape=plaintext];
+    Rnn [label="rnn_op", shape=box];
+    BatchNorm [label="batch_norm_op", shape=box];
+    Fc [label="fc_op", shape=box];
+    After [label="...", shape=plaintext];
+    Prev -> Rnn -> BatchNorm -> Fc -> After;
+    label="original";
+  }
+  subgraph cluster_after {
+    Prev2 [label="...", shape=plaintext];
+    Rnn2 [label="rnn_op", shape=box];
+    BatchNorm2_1 [label="train_batch_norm_op", shape=box];
+    BatchNorm2_2 [label="infer_batch_norm_op", shape=box];
+    Fc2_1 [label="fc_op", shape=box];
+    Fc2_2 [label="fc_op", shape=box];
+    After2_1 [label="...", shape=plaintext];
+    After2_2 [label="...", shape=plaintext];
+    Prev2 -> Rnn2 -> BatchNorm2_1 -> Fc2_1 -> After2_1;
+    Rnn2 -> BatchNorm2_2 -> Fc2_2 -> After2_2;
+    label="forked";
+  }
+}
--- a/doc/fluid/images/batch_norm_fork.png
+++ b/doc/fluid/images/batch_norm_fork.png
--- a/doc/fluid/images/batch_norm_op_kernel.png
+++ b/doc/fluid/images/batch_norm_op_kernel.png
--- a/doc/fluid/images/beam_search.png
+++ b/doc/fluid/images/beam_search.png
--- a/doc/fluid/images/ci_build_whl.png
+++ b/doc/fluid/images/ci_build_whl.png
--- a/doc/fluid/images/compiler.graffle
+++ b/doc/fluid/images/compiler.graffle
--- a/doc/fluid/images/compiler.png
+++ b/doc/fluid/images/compiler.png
--- a/doc/fluid/images/control_flow_graph.png
+++ b/doc/fluid/images/control_flow_graph.png
--- a/doc/fluid/images/dataflow_equations.png
+++ b/doc/fluid/images/dataflow_equations.png
--- a/doc/fluid/images/dcgan.png
+++ b/doc/fluid/images/dcgan.png
--- a/doc/fluid/images/deep_learning.png
+++ b/doc/fluid/images/deep_learning.png
--- a/doc/fluid/images/dist-graph.graffle
+++ b/doc/fluid/images/dist-graph.graffle
--- a/doc/fluid/images/dist-graph.png
+++ b/doc/fluid/images/dist-graph.png
--- a/doc/fluid/images/distributed_architecture.graffle
+++ b/doc/fluid/images/distributed_architecture.graffle
--- a/doc/fluid/images/distributed_architecture.png
+++ b/doc/fluid/images/distributed_architecture.png
--- a/doc/fluid/images/ds2_network.png
+++ b/doc/fluid/images/ds2_network.png
--- a/doc/fluid/images/feed_forward.png
+++ b/doc/fluid/images/feed_forward.png
--- a/doc/fluid/images/feed_forward_regularized.png
+++ b/doc/fluid/images/feed_forward_regularized.png
--- a/doc/fluid/images/fluid-compiler.graffle
+++ b/doc/fluid/images/fluid-compiler.graffle
--- a/doc/fluid/images/fluid-compiler.png
+++ b/doc/fluid/images/fluid-compiler.png
--- a/doc/fluid/images/graph_construction_example.bash
+++ b/doc/fluid/images/graph_construction_example.bash
+cat ./graph_construction_example.dot | \
+    sed 's/color=red/color=red, style=invis/g' | \
+    sed 's/color=green/color=green, style=invis/g' | \
+    dot -Tpng > graph_construction_example_forward_only.png
+cat ./graph_construction_example.dot | \
+    sed 's/color=green/color=green, style=invis/g' | \
+    dot -Tpng > graph_construction_example_forward_backward.png
+cat ./graph_construction_example.dot | \
+    dot -Tpng > graph_construction_example_all.png
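
Reviewer note: the script above produces three renderings by appending `style=invis` to every attribute list that carries a given color before piping the result to `dot`. A minimal Python sketch of the same text substitution is shown below, for anyone who wants to reproduce the filtering without `sed`. The helper name `hide_color` is hypothetical, and the sketch only covers the string rewriting, not the call to `dot`.

```python
def hide_color(dot_source, colors):
    """Mark every node or edge with one of the given colors as invisible.

    Mirrors `sed 's/color=X/color=X, style=invis/g'` from the bash script:
    each occurrence of `color=X` gains a `style=invis` attribute.
    """
    for color in colors:
        dot_source = dot_source.replace(
            f"color={color}", f"color={color}, style=invis"
        )
    return dot_source


sample = "a -> b [color=red];\nc -> d [color=green];\ne -> f [color=blue];"
forward_backward = hide_color(sample, ["green"])     # hides the optimizer part
forward_only = hide_color(sample, ["red", "green"])  # hides backward and optimizer parts
```

Because the substitution matches any `color=X` attribute, nodes of that color are hidden together with the edges, which matches the behavior of the `sed` commands.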
--- a/doc/fluid/images/graph_construction_example.dot
+++ b/doc/fluid/images/graph_construction_example.dot
+digraph ImageClassificationGraph {
+        ///////// The forward part /////////
+        FeedX [label="Feed", color=blue, shape=box];
+        FeedY [label="Feed", color=blue, shape=box];
+        InitW [label="Init", color=blue, shape=diamond];
+        Initb [label="Init", color=blue, shape=diamond];
+        FC [label="FC", color=blue, shape=box];
+        MSE [label="MSE", color=blue, shape=box];
+        x [label="x", color=blue, shape=oval];
+        l [label="l", color=blue, shape=oval];
+        y [label="y", color=blue, shape=oval];
+        W [label="W", color=blue, shape=doublecircle];
+        b [label="b", color=blue, shape=doublecircle];
+        cost [label="cost", color=blue, shape=oval];
+        FeedX -> x -> FC -> y -> MSE -> cost [color=blue];
+        FeedY -> l [color=blue];
+        InitW -> W [color=blue];
+        Initb -> b [color=blue];
+        W -> FC [color=blue];
+        b -> FC [color=blue];
+        l -> MSE [color=blue];
+        ////////// The backward part /////////
+        MSE_Grad [label="MSE_grad", color=red, shape=box];
+        FC_Grad [label="FC_grad", color=red, shape=box];
+        d_cost [label="d cost", color=red, shape=oval];
+        d_y [label="d y", color=red, shape=oval];
+        d_b [label="d b", color=red, shape=oval];
+        d_W [label="d W", color=red, shape=oval];
+        cost -> MSE_Grad [color=red];
+        d_cost -> MSE_Grad [color=red];
+        l -> MSE_Grad [color=red];
+        y -> MSE_Grad -> d_y [color=red];
+        x -> FC_Grad [color=red];
+        y -> FC_Grad [color=red];
+        d_y -> FC_Grad [color=red];
+        W -> FC_Grad -> d_W [color=red];
+        b -> FC_Grad -> d_b [color=red];
+        ////////// The optimization part //////////
+        OPT_W [label="SGD", color=green, shape=box];
+        OPT_b [label="SGD", color=green, shape=box];
+        W -> OPT_W [color=green];
+        b -> OPT_b [color=green];
+        d_W -> OPT_W -> W [color=green];
+        d_b -> OPT_b -> b [color=green];
+        ////////// Groupings //////////
+        subgraph clusterMSE {
+                style=invis;
+                MSE;
+                MSE_Grad;
+        }
+        subgraph clusterFC {
+                style=invis;
+                FC;
+                FC_Grad;
+        }
+}
--- a/doc/fluid/images/graph_construction_example_all.png
+++ b/doc/fluid/images/graph_construction_example_all.png
--- a/doc/fluid/images/graph_construction_example_forward_backward.png
+++ b/doc/fluid/images/graph_construction_example_forward_backward.png
--- a/doc/fluid/images/graph_construction_example_forward_only.png
+++ b/doc/fluid/images/graph_construction_example_forward_only.png
--- a/doc/fluid/images/l1_regularization.png
+++ b/doc/fluid/images/l1_regularization.png
--- a/doc/fluid/images/l2_regularization.png
+++ b/doc/fluid/images/l2_regularization.png
--- a/doc/fluid/images/local-graph.graffle
+++ b/doc/fluid/images/local-graph.graffle
--- a/doc/fluid/images/local-graph.png
+++ b/doc/fluid/images/local-graph.png
--- a/doc/fluid/images/local_architecture.graffle
+++ b/doc/fluid/images/local_architecture.graffle
--- a/doc/fluid/images/local_architecture.png
+++ b/doc/fluid/images/local_architecture.png
--- a/doc/fluid/images/lookup_table.png
+++ b/doc/fluid/images/lookup_table.png
--- a/doc/fluid/images/lookup_table_training.png
+++ b/doc/fluid/images/lookup_table_training.png
--- a/doc/fluid/images/loss_equation.png
+++ b/doc/fluid/images/loss_equation.png
--- a/doc/fluid/images/multi-threads.graffle
+++ b/doc/fluid/images/multi-threads.graffle
--- a/doc/fluid/images/multi-threads@3x.png
+++ b/doc/fluid/images/multi-threads@3x.png
--- a/doc/fluid/images/multigpu_allreduce.graffle
+++ b/doc/fluid/images/multigpu_allreduce.graffle
--- a/doc/fluid/images/multigpu_allreduce.png
+++ b/doc/fluid/images/multigpu_allreduce.png
--- a/doc/fluid/images/multigpu_before_convert.graffle
+++ b/doc/fluid/images/multigpu_before_convert.graffle
--- a/doc/fluid/images/multigpu_before_convert.png
+++ b/doc/fluid/images/multigpu_before_convert.png
--- a/doc/fluid/images/multiple_reader.png
+++ b/doc/fluid/images/multiple_reader.png
--- a/doc/fluid/images/paddle-compile.graffle
+++ b/doc/fluid/images/paddle-compile.graffle
--- a/doc/fluid/images/paddle-compile.png
+++ b/doc/fluid/images/paddle-compile.png
--- a/doc/fluid/images/pprof_1.png
+++ b/doc/fluid/images/pprof_1.png
--- a/doc/fluid/images/pprof_2.png
+++ b/doc/fluid/images/pprof_2.png
--- a/doc/fluid/images/profiler.png
+++ b/doc/fluid/images/profiler.png
--- a/doc/fluid/images/readers.png
+++ b/doc/fluid/images/readers.png
--- a/doc/fluid/images/remote_executor.graffle
+++ b/doc/fluid/images/remote_executor.graffle
--- a/doc/fluid/images/remote_executor.png
+++ b/doc/fluid/images/remote_executor.png
--- a/doc/fluid/images/rnn.dot
+++ b/doc/fluid/images/rnn.dot
+digraph G {
+  label = "simple RNN implementation" 
+  ranksep=2;
+  //graph [nodesep=1, ranksep=1];
+  node[nodesep=1]
+  subgraph cluster0 {
+    label = "global scope"
+    rankdir = TB
+    W
+    boot_memory
+    input
+    output
+  }
+  subgraph cluster1 {
+    label = "step-scope 0"
+    rankdir = TB
+    memory0[label="memory"]
+    prememory0[label="pre-memory"]
+    step_input0[label="step input"]
+    step_output0[label="step output"]
+  }
+  subgraph cluster2 {
+    label = "step-scope 1"
+    rankdir = TB
+    memory1[label="memory"]
+    prememory1[label="pre-memory"]
+    step_input1[label="step input"]
+    step_output1[label="step output"]
+  }
+  subgraph cluster3 {
+    label = "step-scope 2"
+    rankdir = TB
+    memory2[label="memory"]
+    prememory2[label="pre-memory"]
+    step_input2[label="step input"]
+    step_output2[label="step output"]
+  }
+  stepnet [shape=box]
+  stepnet0 [shape=box, style=dashed]
+  stepnet1 [shape=box, style=dashed]
+  stepnet2 [shape=box, style=dashed]
+  edge[color=blue]
+  boot_memory -> prememory0 [label="init" color="blue"]
+  memory0 -> prememory1  [label="copy/reference" color="blue"]
+  memory1 -> prememory2 [label="copy/reference" color="blue"]
+  edge[color=black]
+  W -> stepnet0[constraint=false, style=dashed]
+  W -> stepnet1[constraint=false, style=dashed]
+  W -> stepnet2[constraint=false, style=dashed]
+  memory0 -> stepnet0[style=dashed]
+  prememory0 -> stepnet0 -> step_output0[style=dashed]
+  memory1 -> stepnet1[style=dashed]
+  prememory1 -> stepnet1 -> step_output1[style=dashed]
+  memory2 -> stepnet2[style=dashed]
+  prememory2 -> stepnet2 -> step_output2[style=dashed]
+  input -> step_input0
+  input -> step_input1
+  input -> step_input2
+  step_input0 -> stepnet0 [style=dashed]
+  step_input1 -> stepnet1[style=dashed]
+  step_input2 -> stepnet2[style=dashed]
+  step_output0 -> output
+  step_output1 -> output
+  step_output2 -> output
+  stepnet0 -> stepnet[style=dashed]
+  stepnet1 -> stepnet[style=dashed]
+  stepnet2 -> stepnet[style=dashed]
+}
--- a/doc/fluid/images/rnn.jpg
+++ b/doc/fluid/images/rnn.jpg
--- a/doc/fluid/images/rnn.png
+++ b/doc/fluid/images/rnn.png
--- a/doc/fluid/images/rnn_2level_data.dot
+++ b/doc/fluid/images/rnn_2level_data.dot
+digraph G {
+  chapter [label="chapter"]
+  subgraph cluster0 {
+    label = "paragraph 0"
+    top_rnn0[label="top rnn step 0" shape=box]
+    p0 [label="paragraph 0"]
+    p1 [label="paragraph 1"]
+  }
+  subgraph cluster1{
+    label = "paragraph 1"
+    top_rnn1[label="top rnn step 1" shape=box]
+    p2 [label="paragraph 0"]
+    p3 [label="paragraph 1"]
+  }
+  subgraph cluster_p0 {
+    label = "sentence 0"
+    low_rnn0 [label="low rnn step 0" shape=box]
+    s00 [label="sentence 0"]
+    s01 [label="sentence 1"]
+    low_rnn0 -> s00
+    low_rnn0 -> s01
+  }
+  subgraph cluster_p1 {
+    label = "sentence 1"
+    low_rnn1 [label="low rnn step 1" shape=box]
+    s10 [label="sentence 0"]
+    s11 [label="sentence 1"]
+    low_rnn1 -> s10
+    low_rnn1 -> s11
+  }
+  subgraph cluster_p2 {
+    label = "sentence 1"
+    low_rnn2 [label="low rnn step 0" shape=box]
+    s20 [label="sentence 0"]
+    s21 [label="sentence 1"]
+    low_rnn2 -> s20
+    low_rnn2 -> s21
+  }
+  subgraph cluster_p3 {
+    label = "sentence 1"
+    low_rnn3 [label="low rnn step 1" shape=box]
+    s30 [label="sentence 0"]
+    s31 [label="sentence 1"]
+    low_rnn3 -> s30
+    low_rnn3 -> s31
+  }
+  chapter -> top_rnn0
+  chapter -> top_rnn1
+  top_rnn0 -> p0
+  top_rnn0 -> p1
+  top_rnn1 -> p2
+  top_rnn1 -> p3
+  p0 -> low_rnn0
+  p1 -> low_rnn1
+  p2 -> low_rnn2
+  p3 -> low_rnn3
+}
--- a/doc/fluid/images/rnn_2level_data.png
+++ b/doc/fluid/images/rnn_2level_data.png
--- a/doc/fluid/images/single-thread@3x.png
+++ b/doc/fluid/images/single-thread@3x.png
--- a/doc/fluid/images/sparse_update.graffle
+++ b/doc/fluid/images/sparse_update.graffle
--- a/doc/fluid/images/sparse_update.png
+++ b/doc/fluid/images/sparse_update.png
--- a/doc/fluid/images/test.dot
+++ b/doc/fluid/images/test.dot
+digraph Test {
+    z -> generator -> G_img;
+    G_img -> discriminator -> D_f -> d_loss_f;
+    label0 -> d_loss_f -> d_loss;
+    img -> discriminator -> D_t -> d_loss_t;
+    label1 -> d_loss_t -> d_loss;
+    d_loss -> d_loss_t[color=red, style=dashed];
+    d_loss -> d_loss_f[color=red, style=dashed];
+    d_loss_t -> D_t[color=red, style=dashed];
+    d_loss_f -> D_f[color=red, style=dashed];
+    D_t -> discriminator[color=red, style=dashed];
+    D_f -> discriminator[color=red, style=dashed];
+    D_f -> g_loss;
+    label2 -> g_loss;
+    g_loss -> D_f[color=green, style=dashed];
+    D_f -> discriminator[color=green, style=dashed];
+    discriminator -> G_img[color=green, style=dashed];
+    G_img -> generator[color=green, style=dashed];
+    discriminator [color=red, shape=box];
+    generator [color=green, shape=box];
+    z [shape=diamond];
+    img [shape=diamond];
+    label0 [shape=diamond];
+    label1 [shape=diamond];
+    label2 [shape=diamond];
+    d_loss [color=red];
+    g_loss [color=green];
+}
--- a/doc/fluid/images/test.dot.png
+++ b/doc/fluid/images/test.dot.png
--- a/doc/fluid/images/theta_star.gif
+++ b/doc/fluid/images/theta_star.gif
--- a/doc/fluid/images/timeline.jpeg
+++ b/doc/fluid/images/timeline.jpeg
--- a/doc/fluid/images/tracing.jpeg
+++ b/doc/fluid/images/tracing.jpeg