merge conflict

84c01a53 · JiabinYang · 7dba9ffb · 8e019748 · 84c01a53 · 84c01a53
79 changed files
--- a/.gitignore
+++ b/.gitignore
 .vscode/
+/doc/fluid/menu.zh.json
+/doc/fluid/menu.en.json
--- a/doc/fluid/api_guides/low_level/distributed/cpu_train_best_practice.rst
+++ b/doc/fluid/api_guides/low_level/distributed/cpu_train_best_practice.rst
 .. _api_guide_cpu_training_best_practice:

-##################
+####################
 分布式CPU训练最佳实践
-##################
+####################

 提高CPU分布式训练的训练速度，主要要从两个方面来考虑：
 1）提高训练速度，主要是提高CPU的使用率；2）提高通信速度，主要是减少通信传输的数据量。
@@ -46,7 +46,7 @@ API详细使用方法参考 :ref:`cn_api_fluid_ParallelExecutor` ，简单实例
 提高通信速度
 ==========

-要减少通信数据量，提高通信速度，主要是使用稀疏更新 ，目前支持 `稀疏更新 <../layers/sparse_update.html>`_  的主要是  :ref:`cn_api_fluid_layers_embedding` 。
+要减少通信数据量，提高通信速度，主要是使用稀疏更新 ，目前支持  :ref:`api_guide_sparse_update` 的主要是  :ref:`cn_api_fluid_layers_embedding` 。

 .. code-block:: python


--- a/doc/fluid/api_guides/low_level/distributed/cpu_train_best_practice_en.rst
+++ b/doc/fluid/api_guides/low_level/distributed/cpu_train_best_practice_en.rst
--- a/doc/fluid/advanced_usage/best_practice/dist_training_gpu.rst
+++ b/doc/fluid/advanced_usage/best_practice/dist_training_gpu.rst
 .. _best_practice_dist_training_gpu:

-性能优化最佳实践之：GPU分布式训练
-============================
+#####################
+分布式GPU训练最佳实践
+#####################

 开始优化您的GPU分布式训练任务
 -------------------------
@@ -170,7 +171,7 @@ PaddlePaddle Fluid使用“线程池” [#]_ 模型调度并执行Op，Op在启
 数据读取的优化在GPU训练中至关重要，尤其在不断增加batch_size提升吞吐时，计算对reader性能会有更高的要求，
 优化reader性能需要考虑的点包括：

-1. 使用 :code:`pyreader` 
+1. 使用 :code:`pyreader`
   参考 `这里 <../../user_guides/howto/prepare_data/use_py_reader.html>`_
   使用pyreader，并开启 :code:`use_double_buffer`
 2. reader返回uint8类型数据
@@ -229,7 +230,7 @@ PaddlePaddle Fluid使用“线程池” [#]_ 模型调度并执行Op，Op在启
              for batch_id in (iters_per_pass):
                  exe.run()
          pyreader.reset()
-   
+

 使用混合精度训练
 ++++++++++++++

--- a/doc/fluid/advanced_usage/best_practice/index_cn.rst
+++ b/doc/fluid/advanced_usage/best_practice/index_cn.rst
+#########
+最佳实践
+#########
+
+..  toctree::
+    :maxdepth: 1
+
+    cpu_train_best_practice.rst
+    dist_training_gpu.rst
--- a/doc/fluid/advanced_usage/best_practice/index_en.rst
+++ b/doc/fluid/advanced_usage/best_practice/index_en.rst
+###############
+Best Practice
+###############
+
+..  toctree::
+    :hidden:
+
+    cpu_train_best_practice_en.rst
--- a/doc/fluid/advanced_usage/deploy/inference/paddle_gpu_benchmark.md
+++ b/doc/fluid/advanced_usage/deploy/inference/paddle_gpu_benchmark.md
@@ -6,15 +6,15 @@
 - 测试模型 ResNet50，MobileNet，ResNet101, Inception V3.

 ## 测试对象
-**PaddlePaddle, Pytorch, Tensorflow**   
+**PaddlePaddle, Pytorch, Tensorflow**

- 在测试中，PaddlePaddle使用子图优化的方式集成了TensorRT, 模型[地址](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models)。
+- 在测试中，PaddlePaddle使用子图优化的方式集成了TensorRT, 模型[地址](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/image_classification/models)。
 - Pytorch使用了原生的实现, 模型[地址1](https://github.com/pytorch/vision/tree/master/torchvision/models)、[地址2](https://github.com/marvis/pytorch-mobilenet)。
 - 对TensorFlow测试包括了对TF的原生的测试，和对TF—TRT的测试，**对TF—TRT的测试并没有达到预期的效果，后期会对其进行补充**， 模型[地址](https://github.com/tensorflow/models)。


-### ResNet50 
- 
+### ResNet50
+
 |batch_size|PaddlePaddle(ms)|Pytorch(ms)|TensorFlow(ms)|
 |---|---|---|---|
 |1|4.64117 |16.3|10.878|

--- a/doc/fluid/advanced_usage/design_idea/fluid_design_idea.md
+++ b/doc/fluid/advanced_usage/design_idea/fluid_design_idea.md
@@ -21,28 +21,28 @@ Fluid使用一种编译器式的执行流程，分为编译时和运行时两个
 </p>

 1. 编译时，用户编写一段python程序，通过调用 Fluid 提供的算子，向一段 Program 中添加变量（Tensor）以及对变量的操作（Operators 或者 Layers）。用户只需要描述核心的前向计算，不需要关心反向计算、分布式下以及异构设备下如何计算。
- 
+
 2. 原始的 Program 在平台内部转换为中间描述语言： `ProgramDesc`。
- 
+
 3. 编译期最重要的一个功能模块是 `Transpiler`。`Transpiler` 接受一段 `ProgramDesc` ，输出一段变化后的 `ProgramDesc` ，作为后端 `Executor` 最终需要执行的 Fluid Program

 4. 后端 Executor 接受 Transpiler 输出的这段 Program ，依次执行其中的 Operator（可以类比为程序语言中的指令），在执行过程中会为 Operator 创建所需的输入输出并进行管理。
-	


- 
-## 2. Program设计思想 
+
+
+## 2. Program设计思想

 用户完成网络定义后，一段 Fluid 程序中通常存在 2 段 Program：

  1. fluid.default_startup_program：定义了创建模型参数，输入输出，以及模型中可学习参数的初始化等各种操作
-    
+
    default_startup_program 可以由框架自动生成，使用时无需显式地创建
-    
+
    如果调用修改了参数的默认初始化方式，框架会自动的将相关的修改加入default_startup_program
-  
+
  2. fluid.default_main_program ：定义了神经网络模型，前向反向计算，以及优化算法对网络中可学习参数的更新
-    
+
    使用Fluid的核心就是构建起 default_main_program


@@ -53,7 +53,7 @@ Fluid 的 Program 的基本结构是一些嵌套 blocks，形式上类似一段
 blocks中包含：

 -  本地变量的定义
-  一系列的operator 
+-  一系列的operator

 block的概念与通用程序一致，例如在下列这段C++代码中包含三个block：

@@ -95,7 +95,7 @@ prob = ie()
 ```
 ### BlockDesc and ProgramDesc

-用户描述的block与program信息在Fluid中以[protobuf](https://en.wikipedia.org/wiki/Protocol_Buffers) 格式保存，所有的`protobub`信息被定义在`framework.proto`中，在Fluid中被称为BlockDesc和ProgramDesc。ProgramDesc和BlockDesc的概念类似于一个[抽象语法树](https://en.wikipedia.org/wiki/Abstract_syntax_tree)。
+用户描述的block与program信息在Fluid中以[protobuf](https://en.wikipedia.org/wiki/Protocol_Buffers) 格式保存，所有的`protobuf`信息被定义在`framework.proto`中，在Fluid中被称为BlockDesc和ProgramDesc。ProgramDesc和BlockDesc的概念类似于一个[抽象语法树](https://en.wikipedia.org/wiki/Abstract_syntax_tree)。

 `BlockDesc`中包含本地变量的定义`vars`，和一系列的operator`ops`：

@@ -172,12 +172,12 @@ class Executor{
 				Scope* scope,
 				int block_id) {
 			auto& block = pdesc.Block(block_id);
-			
+
 			//创建所有变量
 			for (auto& var : block.AllVars()) {
 				scope->Var(var->Name());
 			}
-			
+
 			//创建OP并按顺序执行
 			for (auto& op_desc : block.AllOps()){
 				auto op = CreateOp(*op_desc);
@@ -300,7 +300,7 @@ BlockDesc中包含定义的 vars 和一系列的 ops，以输入x为例，python
 x = fluid.layers.data(name="x",shape=[1],dtype='float32')
 ```
 在BlockDesc中，变量x被描述为：
-``` 
+```
 vars {
    name: "x"
    type {
@@ -359,5 +359,5 @@ Fluid使用Executor.run来运行一段Program。
       [6.099215 ]], dtype=float32), array([1.6935859], dtype=float32)]
 ```

-至此您已经了解了Fluid 内部的执行流程的核心概念，更多框架使用细节请参考[使用指南](../../user_guides/index.html)相关内容，[模型库](../../user_guides/models/index_cn.html
+至此您已经了解了Fluid 内部的执行流程的核心概念，更多框架使用细节请参考[使用指南](../../user_guides/index_cn.html)相关内容，[模型库](../../user_guides/models/index_cn.html
 )中也为您提供了丰富的模型示例以供参考。
--- a/doc/fluid/advanced_usage/development/contribute_to_paddle/submit_pr_guide.md
+++ b/doc/fluid/advanced_usage/development/contribute_to_paddle/submit_pr_guide.md
-# Github提交PR指南
+# 提交PR注意事项

 ## 建立 Issue 并完成 Pull Request


--- a/doc/fluid/advanced_usage/development/new_op/index_cn.rst
+++ b/doc/fluid/advanced_usage/development/new_op/index_cn.rst
 #############
-新增Operator
+新增OP
 #############

 本部分将指导您如何新增Operator，也包括一些必要的注意事项

--- a/doc/fluid/advanced_usage/development/new_op/new_op.md
+++ b/doc/fluid/advanced_usage/development/new_op/new_op.md
-# 如何写新的op
+# 如何写新的OP

 ## 概念简介


--- a/doc/fluid/advanced_usage/development/new_op/op_notes.md
+++ b/doc/fluid/advanced_usage/development/new_op/op_notes.md
-# op相关注意事项
+# OP相关注意事项

 ## Fluid中Op的构建逻辑
 ### 1.Fluid中Op的构建逻辑

--- a/doc/fluid/advanced_usage/development/new_op/op_notes_en.md
+++ b/doc/fluid/advanced_usage/development/new_op/op_notes_en.md
@@ -11,7 +11,7 @@ The Fluid framework is designed to run on a variety of devices and third-party l
 Operator inheritance diagram:
 ![op_inheritance_relation_diagram](../../pics/op_inheritance_relation_diagram.png)

-For further information, please refer to: [multi_devices](https://github.com/PaddlePaddle/FluidDoc/tree/develop/doc/fluid/design/multi_devices) , [scope](https://github.com/PaddlePaddle/FluidDoc/Blob/develop/doc/fluid/design/concepts/scope.md) , [Developer's_Guide_to_Paddle_Fluid](https://github.com/PaddlePaddle/FluidDoc/blob/release/1.2/doc/fluid/getstarted/Developer's_Guide_to_Paddle_Fluid.md)
+For further information, please refer to: [multi_devices](https://github.com/PaddlePaddle/FluidDoc/tree/develop/doc/fluid/design/multi_devices) , [scope](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/concepts/scope.md) , [Developer's_Guide_to_Paddle_Fluid](https://github.com/PaddlePaddle/FluidDoc/blob/release/1.2/doc/fluid/getstarted/Developer's_Guide_to_Paddle_Fluid.md)

 ### 2.Op's registration logic
 The registration entries for each Operator include:

--- a/doc/fluid/advanced_usage/index.rst
+++ b/doc/fluid/advanced_usage/index.rst
@@ -2,17 +2,19 @@
 进阶使用
 ########

-..  todo::
-
 如果您非常熟悉 Fluid，期望获得更高效的模型或者定义自己的Operator，请阅读：

    - `Fluid 设计思想 <../advanced_usage/design_idea/fluid_design_idea.html>`_：介绍 Fluid 底层的设计思想，帮助您更好的理解框架运作过程

-	- `预测部署 <../advanced_usage/deploy/index_cn.html>`_ ：介绍如何应用训练好的模型进行预测
+    - `预测部署 <../advanced_usage/deploy/index_cn.html>`_ ：介绍如何应用训练好的模型进行预测
+
+    - `新增OP <../advanced_usage/development/new_op/index_cn.html>`_ ：介绍新增operator的方法及注意事项
+
+    - `性能调优 <../advanced_usage/development/profiling/index_cn.html>`_ ：介绍 Fluid 使用过程中的调优方法

-	- `新增operator <../advanced_usage/development/new_op/index_cn.html>`_ ：介绍新增operator的方法及注意事项
+    - `最佳实践 <../advanced_usage/best_practice/index_cn.html>`_

-	- `性能调优 <../advanced_usage/development/profiling/index_cn.html>`_ ：介绍 Fluid 使用过程中的调优方法
+    - `模型压缩工具库 <../advanced_usage/paddle_slim/paddle_slim.html>`_

 非常欢迎您为我们的开源社区做出贡献，关于如何贡献您的代码或文档，请阅读：

@@ -27,7 +29,7 @@
    deploy/index_cn.rst
    development/new_op/index_cn.rst
    development/profiling/index_cn.rst
+    best_practice/index_cn.rst
+    paddle_slim/paddle_slim.md
    development/contribute_to_paddle/index_cn.rst
    development/write_docs_cn.md
-    best_practice/dist_training_gpu.rst
-    paddle_slim/paddle_slim.md 
--- a/doc/fluid/advanced_usage/index_en.rst
+++ b/doc/fluid/advanced_usage/index_en.rst
@@ -29,3 +29,4 @@ We gladly encourage your contributions of codes and documentation to our communi
    development/profiling/index_en.rst
    development/contribute_to_paddle/index_en.rst
    development/write_docs_en.md
+    best_practice/index_en.rst
--- a/doc/fluid/api_cn/data/data_reader_cn.rst
+++ b/doc/fluid/api_cn/data/data_reader_cn.rst
@@ -26,7 +26,7 @@ DataFeeder将reader返回的数据转换为可以输入Executor和ParallelExecut
 	result = feeder.feed([([0] * 784, [9]), ([1] * 784, [1])])


-如果您想在使用多个GPU训练模型时预先将数据单独输入GPU端，可以使用decorate_reader函数。 
+如果您想在使用多个GPU训练模型时预先将数据单独输入GPU端，可以使用decorate_reader函数。


 **代码示例**
@@ -90,7 +90,7 @@ DataFeeder将reader返回的数据转换为可以输入Executor和ParallelExecut



-.. note:: 
+.. note::

 	设备数量和mini-batches数量必须一致。

@@ -121,7 +121,7 @@ Reader
 	- reader是一个读取数据（从文件、网络、随机数生成器等）并生成数据项的函数。
 	- reader creator是返回reader函数的函数。
 	- reader decorator是一个函数，它接受一个或多个reader，并返回一个reader。
-	- batch reader是一个函数，它读取数据（从reader、文件、网络、随机数生成器等）并生成一批数据项。 
+	- batch reader是一个函数，它读取数据（从reader、文件、网络、随机数生成器等）并生成一批数据项。


 Data Reader Interface
@@ -133,10 +133,10 @@ Data Reader Interface

 	iterable = data_reader()

-从iterable生成的元素应该是单个数据条目，而不是mini batch。数据输入可以是单个项目，也可以是项目的元组，但应为 `支持的类型 <http://www.paddlepaddle.org/doc/ui/data_provider/pydataprovider2.html?highlight=dense_vector#input-types>`_ （如, numpy 1d array of float32, int, list of int）
+从iterable生成的元素应该是单个数据条目，而不是mini batch。数据输入可以是单个项目，也可以是项目的元组，但应为 `支持的类型 <../../user_guides/howto/prepare_data/feeding_data.html#fluid>`_ （如, numpy 1d array of float32, int, list of int）


-单项目数据读取器创建者的示例实现： 
+单项目数据读取器创建者的示例实现：

 ..  code-block:: python

@@ -147,7 +147,7 @@ Data Reader Interface
 	return reader


-多项目数据读取器创建者的示例实现： 
+多项目数据读取器创建者的示例实现：

 ..  code-block:: python

@@ -194,11 +194,11 @@ Data Reader Interface

 参数：
    - **readers** - 将被组合的多个读取器。
-    - **check_alignment** (bool) - 如果为True，将检查输入reader是否正确对齐。如果为False，将不检查对齐，将丢弃跟踪输出。默认值True。 
+    - **check_alignment** (bool) - 如果为True，将检查输入reader是否正确对齐。如果为False，将不检查对齐，将丢弃跟踪输出。默认值True。

 返回：新的数据读取器

-抛出异常： 	``ComposeNotAligned`` – reader的输出不一致。 当check_alignment设置为False时，不会抛出该异常。 
+抛出异常： 	``ComposeNotAligned`` – reader的输出不一致。 当check_alignment设置为False时，不会抛出该异常。



@@ -220,7 +220,7 @@ Data Reader Interface

 创建数据读取器，该reader的数据输出将被无序排列。

-由原始reader创建的迭代器的输出将被缓冲到shuffle缓冲区，然后进行打乱。打乱缓冲区的大小由参数buf_size决定。 
+由原始reader创建的迭代器的输出将被缓冲到shuffle缓冲区，然后进行打乱。打乱缓冲区的大小由参数buf_size决定。

 参数：
    - **reader** (callable)  – 输出会被打乱的原始reader
@@ -257,7 +257,7 @@ Data Reader Interface
 PipeReader通过流从一个命令中读取数据，将它的stdout放到管道缓冲区中，并将其重定向到解析器进行解析，然后根据需要的格式生成数据。


-您可以使用标准Linux命令或调用其他Program来读取数据，例如通过HDFS、CEPH、URL、AWS S3中读取： 
+您可以使用标准Linux命令或调用其他Program来读取数据，例如通过HDFS、CEPH、URL、AWS S3中读取：

 **代码示例**

@@ -340,7 +340,7 @@ Creator包包含一些简单的reader creator，可以在用户Program中使用

 .. py:function:: paddle.reader.creator.np_array(x)

-如果是numpy向量，则创建一个生成x个元素的读取器。或者，如果它是一个numpy矩阵，创建一个生成x行元素的读取器。或由最高维度索引的任何子超平面。 
+如果是numpy向量，则创建一个生成x个元素的读取器。或者，如果它是一个numpy矩阵，创建一个生成x行元素的读取器。或由最高维度索引的任何子超平面。

 参数：
    - **x** – 用于创建reader的numpy数组。
@@ -359,7 +359,7 @@ Creator包包含一些简单的reader creator，可以在用户Program中使用

 .. py:function::  paddle.reader.creator.recordio(paths, buf_size=100)

-从给定的recordio文件路径创建数据reader，多个路径用“,”分隔，支持glob通配模式。 
+从给定的recordio文件路径创建数据reader，多个路径用“,”分隔，支持glob通配模式。

 路径：recordio文件的路径，可以是字符串或字符串列表。


--- a/doc/fluid/api_cn/fluid_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn.rst
--- a/doc/fluid/api_cn/index_cn.rst
+++ b/doc/fluid/api_cn/index_cn.rst
@@ -5,12 +5,13 @@ API
 ..  toctree::
    :maxdepth: 1

-    ../api_guides/index.rst
+    ../api_guides/index_cn.rst
    fluid_cn.rst
    average_cn.rst
    backward_cn.rst
    clip_cn.rst
    data_feeder_cn.rst
+    dataset_cn.rst
    executor_cn.rst
    initializer_cn.rst
    io_cn.rst
@@ -21,6 +22,5 @@ API
    profiler_cn.rst
    regularizer_cn.rst
    transpiler_cn.rst
-    dataset_cn.rst
    data/dataset_cn.rst
    data/data_reader_cn.rst
--- a/doc/fluid/api_cn/io_cn.rst
+++ b/doc/fluid/api_cn/io_cn.rst
@@ -289,8 +289,6 @@ PyReader
  - **reader** (generator)  – 返回LoDTensor类型的批处理数据的Python生成器
  - **places** (None|list(CUDAPlace)|list(CPUPlace)) –  位置列表。当PyReader可迭代时必须被提供

-
-
 .. _cn_api_fluid_io_save_inference_model:

 save_inference_model
@@ -313,7 +311,9 @@ save_inference_model
  - **params_filename** (str|None) – 保存所有相关参数的文件名称。如果设置为None，则参数将保存在单独的文件中。
  - **export_for_deployment** (bool) – 如果为真，Program将被修改为只支持直接预测部署的Program。否则，将存储更多的信息，方便优化和再训练。目前只支持True。

-返回: None
+返回: 获取的变量名列表
+
+返回类型：target_var_name_list(list)

 抛出异常：
 - ``ValueError`` – 如果 ``feed_var_names`` 不是字符串列表
@@ -406,8 +406,9 @@ save_persistables
    exe = fluid.Executor(fluid.CPUPlace())
    param_path = "./my_paddle_model"
    prog = fluid.default_main_program()
+    # `prog` 可以是由用户自定义的program
    fluid.io.save_persistables(executor=exe, dirname=param_path,
-                               main_program=None)
+                               main_program=prog)
    
    


--- a/doc/fluid/api_cn/layers_cn.rst
+++ b/doc/fluid/api_cn/layers_cn.rst
--- a/doc/fluid/api_cn/profiler_cn.rst
+++ b/doc/fluid/api_cn/profiler_cn.rst
@@ -12,7 +12,7 @@ cuda_profiler
 .. py:function:: paddle.fluid.profiler.cuda_profiler(output_file, output_mode=None, config=None)


-CUDA分析器。通过CUDA运行时应用程序编程接口对CUDA程序进行性能分析。分析结果将以键-值对格式或逗号分隔的格式写入output_file。用户可以通过output_mode参数设置输出模式，并通过配置参数设置计数器/选项。默认配置是[' gpustarttimestamp '， ' gpuendtimestamp '， ' gridsize3d '， ' threadblocksize '， ' streamid '， ' enableonstart 0 '， ' conckerneltrace ']。然后，用户可使用 `NVIDIA Visual Profiler <https://developer.nvidia.com/nvidia-visualprofiler>`_ 工具来加载这个输出文件以可视化结果。
+CUDA分析器。通过CUDA运行时应用程序编程接口对CUDA程序进行性能分析。分析结果将以键-值对格式或逗号分隔的格式写入output_file。用户可以通过output_mode参数设置输出模式，并通过配置参数设置计数器/选项。默认配置是[' gpustarttimestamp '， ' gpuendtimestamp '， ' gridsize3d '， ' threadblocksize '， ' streamid '， ' enableonstart 0 '， ' conckerneltrace ']。然后，用户可使用 `NVIDIA Visual Profiler <https://developer.nvidia.com/nvidia-visual-profiler>`_ 工具来加载这个输出文件以可视化结果。


 参数:
@@ -28,7 +28,7 @@ CUDA分析器。通过CUDA运行时应用程序编程接口对CUDA程序进行


 ..  code-block:: python
-  
+
    import paddle.fluid as fluid
    import paddle.fluid.profiler as profiler

@@ -46,7 +46,7 @@ CUDA分析器。通过CUDA运行时应用程序编程接口对CUDA程序进行
        for i in range(epoc):
            input = np.random.random(dshape).astype('float32')
            exe.run(fluid.default_main_program(), feed={'data': input})
-            
+
    # 之后可以使用 NVIDIA Visual Profile 可视化结果


@@ -67,20 +67,20 @@ profiler
 profile interface 。与cuda_profiler不同，此profiler可用于分析CPU和GPU程序。默认情况下，它记录CPU和GPU kernel，如果想分析其他程序，可以参考教程来在c++代码中添加更多代码。


-如果 state== ' All '，在profile_path 中写入文件 profile proto 。该文件记录执行期间的时间顺序信息。然后用户可以看到这个文件的时间轴，请参考 `https://github.com/PaddlePaddle/Paddle/blob/develop/doc/fluid/howto/optimization/timeline.md <https://github.com/PaddlePaddle/Paddle/blob/develop/doc/fluid/howto/optimization/timeline.md>`_ 
+如果 state== ' All '，在profile_path 中写入文件 profile proto 。该文件记录执行期间的时间顺序信息。然后用户可以看到这个文件的时间轴，请参考 `这里 <../advanced_usage/development/profiling/timeline_cn.html>`_

 参数:
  - **state** (string) –  profiling state, 取值为 'CPU' 或 'GPU',  profiler 使用 CPU timer 或GPU timer 进行 profiling. 虽然用户可能在开始时指定了执行位置(CPUPlace/CUDAPlace)，但是为了灵活性，profiler不会使用这个位置。
  - **sorted_key** (string) – 如果为None，profile的结果将按照事件的第一次结束时间顺序打印。否则，结果将按标志排序。标志取值为"calls"、"total"、"max"、"min"、"ave"之一：calls表示按调用次数排序，total表示按总执行时间排序，max表示按最大执行时间排序，min表示按最小执行时间排序，ave表示按平均执行时间排序。
  - **profile_path** (string) –  如果 state == 'All', 结果将写入文件 profile proto.
-  
+
 抛出异常：
  - ``ValueError`` – 如果 state 取值不在 ['CPU', 'GPU', 'All'] 中，或者 sorted_key 取值不在 ['calls', 'total', 'max', 'min', 'ave'] 中。
-  
+
 **代码示例**

 ..  code-block:: python
-    
+
    import paddle.fluid.profiler as profiler

    with profiler.profiler('All', 'total', '/tmp/profile') as prof:
@@ -110,7 +110,7 @@ reset_profiler
 **代码示例**

 ..  code-block:: python
-  
+
    import paddle.fluid.profiler as profiler
    with profiler.profiler(state, 'total', '/tmp/profile'):
    for iter in range(10):
@@ -133,10 +133,10 @@ start_profiler
 .. py:function:: paddle.fluid.profiler.start_profiler(state)

 激活使用 profiler， 用户可以使用 ``fluid.profiler.start_profiler`` 和 ``fluid.profiler.stop_profiler`` 插入代码
-不能使用 ``fluid.profiler.profiler`` 
+不能使用 ``fluid.profiler.profiler``


-如果 state== ' All '，在profile_path 中写入文件 profile proto 。该文件记录执行期间的时间顺序信息。然后用户可以看到这个文件的时间轴，请参考 `https://github.com/PaddlePaddle/Paddle/blob/develop/doc/fluid/howto/optimization/timeline.md <https://github.com/PaddlePaddle/Paddle/blob/develop/doc/fluid/howto/optimization/timeline.md>`_ 
+如果 state== ' All '，在profile_path 中写入文件 profile proto 。该文件记录执行期间的时间顺序信息。然后用户可以看到这个文件的时间轴，请参考 `这里 <../advanced_usage/development/profiling/timeline_cn.html>`_

 参数:
  - **state** (string) – profiling state, 取值为 'CPU' 或 'GPU' 或 'All', 'CPU' 代表只分析 cpu. 'GPU' 代表只分析 GPU . 'All' 会产生 timeline.
@@ -147,7 +147,7 @@ start_profiler
 **代码示例**

 ..  code-block:: python
-    
+
    import paddle.fluid.profiler as profiler

    profiler.start_profiler('GPU')
@@ -174,12 +174,12 @@ stop_profiler
 .. py:function:: paddle.fluid.profiler.stop_profiler(sorted_key=None, profile_path='/tmp/profile')

 停止 profiler， 用户可以使用 ``fluid.profiler.start_profiler`` 和 ``fluid.profiler.stop_profiler`` 插入代码
-不能使用 ``fluid.profiler.profiler`` 
+不能使用 ``fluid.profiler.profiler``

 参数:
  - **sorted_key** (string) – 如果为None，profile的结果将按照事件的第一次结束时间顺序打印。否则，结果将按标志排序。标志取值为"calls"、"total"、"max"、"min"、"ave"之一：calls表示按调用次数排序，total表示按总执行时间排序，max表示按最大执行时间排序，min表示按最小执行时间排序，ave表示按平均执行时间排序。
  - **profile_path** (string) - 如果 state == 'All', 结果将写入文件 profile proto.
-  
+

 抛出异常:
  - ``ValueError`` – 如果state 取值不在 ['CPU', 'GPU', 'All']中
@@ -187,7 +187,7 @@ stop_profiler
 **代码示例**

 ..  code-block:: python
-    
+
    import paddle.fluid.profiler as profiler

    profiler.start_profiler('GPU')

--- a/doc/fluid/api_guides/X2Paddle/Caffe-Fluid.rst
+++ b/doc/fluid/api_guides/X2Paddle/Caffe-Fluid.rst
+.. _Caffe-Fluid:
+
+########################
+Caffe-Fluid常用层对应表
+########################
+
+本文档梳理了Caffe常用Layer与PaddlePaddle API对应关系和差异分析。根据文档对应关系，有Caffe使用经验的用户，可根据对应关系，快速熟悉PaddlePaddle的接口使用。  
+
+
+..  csv-table:: 
+    :header: "序号", "Caffe Layer", "Fluid接口", "备注"
+    :widths: 1, 8, 8, 3
+
+    "1",  "`AbsVal <http://caffe.berkeleyvision.org/tutorial/layers/absval.html>`_", ":ref:`cn_api_fluid_layers_abs`",  "功能一致"
+    "2",  "`Accuracy <http://caffe.berkeleyvision.org/tutorial/layers/accuracy.html>`_", ":ref:`cn_api_fluid_layers_accuracy`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Accuracy.md>`_"
+    "3",  "`ArgMax <http://caffe.berkeleyvision.org/tutorial/layers/argmax.html>`_", ":ref:`cn_api_fluid_layers_argmax`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/ArgMax.md>`_"
+    "4",  "`BatchNorm <http://caffe.berkeleyvision.org/tutorial/layers/batchnorm.html>`_", ":ref:`cn_api_fluid_layers_batch_norm`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/BatchNorm.md>`_"
+    "5",  "`BNLL <http://caffe.berkeleyvision.org/tutorial/layers/bnll.html>`_", ":ref:`cn_api_fluid_layers_softplus`",  "功能一致"
+    "6",  "`Concat <http://caffe.berkeleyvision.org/tutorial/layers/concat.html>`_", ":ref:`cn_api_fluid_layers_concat`",  "功能一致"
+    "7",  "`Convolution <http://caffe.berkeleyvision.org/tutorial/layers/convolution.html>`_", ":ref:`cn_api_fluid_layers_conv2d`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Convolution.md>`_"
+    "8",  "`Crop <http://caffe.berkeleyvision.org/tutorial/layers/crop.html>`_", ":ref:`cn_api_fluid_layers_crop`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Crop.md>`_"
+    "9",  "`Deconvolution <http://caffe.berkeleyvision.org/tutorial/layers/deconvolution.html>`_", ":ref:`cn_api_fluid_layers_conv2d_transpose`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Deconvolution.md>`_"
+    "10",  "`Dropout <http://caffe.berkeleyvision.org/tutorial/layers/dropout.html>`_", ":ref:`cn_api_fluid_layers_dropout`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Dropout.md>`_"
+    "11",  "`Eltwise <http://caffe.berkeleyvision.org/tutorial/layers/eltwise.html>`_",  "无相应接口",  "`Fluid实现 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Eltwise.md>`_"
+    "12",  "`ELU <http://caffe.berkeleyvision.org/tutorial/layers/elu.html>`_", ":ref:`cn_api_fluid_layers_elu`",  "功能一致"
+    "13",  "`EuclideanLoss <http://caffe.berkeleyvision.org/tutorial/layers/euclideanloss.html>`_", ":ref:`cn_api_fluid_layers_square_error_cost`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/EuclideanLoss.md>`_"
+    "14",  "`Exp <http://caffe.berkeleyvision.org/tutorial/layers/exp.html>`_", ":ref:`cn_api_fluid_layers_exp`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Exp.md>`_"
+    "15",  "`Flatten <http://caffe.berkeleyvision.org/tutorial/layers/flatten.html>`_", ":ref:`cn_api_fluid_layers_reshape`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Flatten.md>`_"
+    "16",  "`InnerProduct <http://caffe.berkeleyvision.org/tutorial/layers/innerproduct.html>`_", ":ref:`cn_api_fluid_layers_fc`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/InnerProduct.md>`_"
+    "17",  "`Input <http://caffe.berkeleyvision.org/tutorial/layers/input.html>`_", ":ref:`cn_api_fluid_layers_data`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Input.md>`_"
+    "18",  "`Log <http://caffe.berkeleyvision.org/tutorial/layers/log.html>`_", ":ref:`cn_api_fluid_layers_log`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Log.md>`_"
+    "19",  "`LRN <http://caffe.berkeleyvision.org/tutorial/layers/lrn.html>`_", ":ref:`cn_api_fluid_layers_lrn`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/LRN.md>`_"
+    "20",  "`Pooling <http://caffe.berkeleyvision.org/tutorial/layers/pooling.html>`_", ":ref:`cn_api_fluid_layers_pool2d`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Pooling.md>`_"
+    "21",  "`Power <http://caffe.berkeleyvision.org/tutorial/layers/power.html>`_", ":ref:`cn_api_fluid_layers_pow`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Power.md>`_"
+    "22",  "`PReLU <http://caffe.berkeleyvision.org/tutorial/layers/prelu.html>`_", ":ref:`cn_api_fluid_layers_prelu`",  "功能一致"
+    "23",  "`Reduction <http://caffe.berkeleyvision.org/tutorial/layers/reduction.html>`_",  "无相应接口",  "`Fluid实现 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Reduction.md>`_"
+    "24",  "`ReLU <http://caffe.berkeleyvision.org/tutorial/layers/relu.html>`_", ":ref:`cn_api_fluid_layers_leaky_relu`",  "功能一致"
+    "25",  "`Reshape <http://caffe.berkeleyvision.org/tutorial/layers/reshape.html>`_", ":ref:`cn_api_fluid_layers_reshape`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Reshape.md>`_"
+    "26",  "`SigmoidCrossEntropyLoss <http://caffe.berkeleyvision.org/tutorial/layers/sigmoidcrossentropyloss.html>`_", ":ref:`cn_api_fluid_layers_sigmoid_cross_entropy_with_logits`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/SigmoidCrossEntropyLoss.md>`_"
+    "27",  "`Sigmoid <http://caffe.berkeleyvision.org/tutorial/layers/sigmoid.html>`_", ":ref:`cn_api_fluid_layers_sigmoid`",  "功能一致"
+    "28",  "`Slice <http://caffe.berkeleyvision.org/tutorial/layers/slice.html>`_", ":ref:`cn_api_fluid_layers_slice`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Slice.md>`_"
+    "29",  "`SoftmaxWithLoss <http://caffe.berkeleyvision.org/tutorial/layers/softmaxwithloss.html>`_", ":ref:`cn_api_fluid_layers_softmax_with_cross_entropy`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/SofmaxWithLoss.md>`_"
+    "30",  "`Softmax <http://caffe.berkeleyvision.org/tutorial/layers/softmax.html>`_", ":ref:`cn_api_fluid_layers_softmax`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Sofmax.md>`_"
+    "31",  "`TanH <http://caffe.berkeleyvision.org/tutorial/layers/tanh.html>`_", ":ref:`cn_api_fluid_layers_tanh`",  "功能一致"
+    "32",  "`Tile <http://caffe.berkeleyvision.org/tutorial/layers/tile.html>`_", ":ref:`cn_api_fluid_layers_expand`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Tile.md>`_"
--- a/doc/fluid/api_guides/X2Paddle/TensorFlow-Fluid.rst
+++ b/doc/fluid/api_guides/X2Paddle/TensorFlow-Fluid.rst
--- a/doc/fluid/api_guides/high_low_level_api.md
+++ b/doc/fluid/api_guides/high_low_level_api.md
-## High/Low-level API简介
-
-PaddlePaddle Fluid目前有2套API接口：
-
- Low-level（底层） API：
-	
-	- 灵活性强并且已经相对成熟，使用它训练的模型，能直接支持C++预测上线。
-	- 提供了大量的模型作为使用示例，包括[Book](https://github.com/PaddlePaddle/book)中的全部章节，以及[models](https://github.com/PaddlePaddle/models)中的所有章节。
-	- 适用人群：对深度学习有一定了解，需要自定义网络进行训练/预测/上线部署的用户。
-
- High-level（高层）API：
-	
-	- 使用简单
-	- 尚未成熟，接口暂时在[paddle.fluid.contrib](https://github.com/PaddlePaddle/Paddle/tree/develop/python/paddle/fluid/contrib)下面。
--- a/doc/fluid/api_guides/high_low_level_api_en.md
+++ b/doc/fluid/api_guides/high_low_level_api_en.md
-## Introduction to High/Low-level API
-
-Currently PaddlePaddle Fluid has 2 branches of API interfaces:
-
- Low-level API:
-
-	- It is highly flexible and relatively mature. The model trained by it can directly support C++ inference deployment and release.
-	- There are a large number of models as examples, including all chapters in [book](https://github.com/PaddlePaddle/book), and [models](https://github.com/PaddlePaddle/models).
-	- Recommended for users who have a certain understanding of deep learning and need to customize a network for training/inference/online deployment.
-
- High-level API:
-
-	- Simple to use
-    - Still under development. the interface is temporarily in [paddle.fluid.contrib](https://github.com/PaddlePaddle/Paddle/tree/develop/python/paddle/fluid/contrib).
\ No newline at end of file
--- a/doc/fluid/api_guides/index.rst
+++ b/doc/fluid/api_guides/index.rst
 ===========
-API使用指南
+API分类检索
 ===========

-API使用指南分功能向您介绍PaddlePaddle Fluid的API体系和用法，帮助您快速了解PaddlePaddle Fluid API的全貌，包括以下几个模块：
+本模块分功能向您介绍PaddlePaddle Fluid的API体系和用法，提高您的查找效率，帮助您快速了解PaddlePaddle Fluid API的全貌，包括以下几个模块：

 ..  toctree::
    :maxdepth: 1

-    high_low_level_api.md
+    low_level/program.rst
    low_level/layers/index.rst
-    low_level/executor.rst
+    low_level/nets.rst
    low_level/optimizer.rst
+    low_level/backward.rst
    low_level/metrics.rst
    low_level/model_save_reader.rst
    low_level/inference.rst
-    low_level/distributed/index.rst
    low_level/memory_optimize.rst
-    low_level/nets.rst
+    low_level/executor.rst
    low_level/parallel_executor.rst
-    low_level/backward.rst
+    low_level/compiled_program.rst
    low_level/parameter.rst
-    low_level/program.rst
+    low_level/distributed/index.rst
+    X2Paddle/TensorFlow-Fluid.rst
+    X2Paddle/Caffe-Fluid.rst
--- a/doc/fluid/api_guides/index_en.rst
+++ b/doc/fluid/api_guides/index_en.rst
-===========
-API Guides
-===========
+=================
+API Quick Search
+=================

 This section introduces the Fluid API structure and usage, to help you quickly get the full picture of the PaddlePaddle Fluid API. This section is divided into the following modules:

 ..  toctree::
    :maxdepth: 1

-    high_low_level_api_en.md
+    low_level/program_en.rst
    low_level/layers/index_en.rst
-    low_level/executor_en.rst
+    low_level/nets_en.rst
    low_level/optimizer_en.rst
+    low_level/backward_en.rst
    low_level/metrics_en.rst
    low_level/model_save_reader_en.rst
    low_level/inference_en.rst
-    low_level/distributed/index_en.rst
    low_level/memory_optimize_en.rst
-    low_level/nets_en.rst
+    low_level/executor_en.rst
    low_level/parallel_executor_en.rst
    low_level/compiled_program_en.rst
-    low_level/backward_en.rst
    low_level/parameter_en.rst
-    low_level/program_en.rst
+    low_level/distributed/index_en.rst
--- a/doc/fluid/api_guides/low_level/compiled_program_cn.rst
+++ b/doc/fluid/api_guides/low_level/compiled_program_cn.rst
--- a/doc/fluid/api_guides/low_level/distributed/async_training.rst
+++ b/doc/fluid/api_guides/low_level/distributed/async_training.rst
@@ -4,7 +4,7 @@
 分布式异步训练
 ############

-Fluid支持数据并行的分布式异步训练，API使用 :code:`DistributedTranspiler` 将单机网络配置转换成可以多机执行的
+Fluid支持数据并行的分布式异步训练，API使用 :code:`DistributeTranspiler` 将单机网络配置转换成可以多机执行的
 :code:`pserver` 端程序和 :code:`trainer` 端程序。用户在不同的节点执行相同的一段代码，根据环境变量或启动参数，
 可以执行对应的 :code:`pserver` 或 :code:`trainer` 角色。Fluid异步训练只支持pserver模式，异步训练和 `同步训练 <../distributed/sync_training.html>`_ 的主要差异在于：异步训练每个trainer的梯度是单独更新到参数上的，
 而同步训练是所有trainer的梯度合并之后统一更新到参数上，因此，同步训练和异步训练的超参数需要分别调节。
@@ -16,17 +16,17 @@ API详细使用方法参考 :ref:`cn_api_fluid_DistributeTranspiler` ，简单

 .. code-block:: python

-    config = fluid.DistributedTranspilerConfig()
+    config = fluid.DistributeTranspilerConfig()
    # 配置策略config
    config.slice_var_up = False
-    t = fluid.DistributedTranspiler(config=config)
-    t.transpile(trainer_id, 
+    t = fluid.DistributeTranspiler(config=config)
+    t.transpile(trainer_id,
                program=main_program,
                pservers="192.168.0.1:6174,192.168.0.2:6174",
                trainers=1,
                sync_mode=False)

-以上参数说明请参考 `同步训练 <../distributed/sync_training.html>`_ 
+以上参数说明请参考 `同步训练 <../distributed/sync_training.html>`_

 需要注意的是：进行异步训练时，请修改 :code:`sync_mode` 的值


--- a/doc/fluid/api_guides/low_level/distributed/async_training_en.rst
+++ b/doc/fluid/api_guides/low_level/distributed/async_training_en.rst
@@ -4,21 +4,21 @@
 Asynchronous Distributed Training
 ####################################

-Fluid supports data-parallel asynchronous distributed training. :code:`DistributedTranspiler` converts a single-node network configuration into a :code:`pserver` side program and a :code:`trainer` side program that can be executed on multiple machines. The user runs the same code on every node; environment variables or startup parameters determine whether a node takes the :code:`pserver` or :code:`trainer` role. 
+Fluid supports data-parallel asynchronous distributed training. :code:`DistributeTranspiler` converts a single-node network configuration into a :code:`pserver` side program and a :code:`trainer` side program that can be executed on multiple machines. The user runs the same code on every node; environment variables or startup parameters determine whether a node takes the :code:`pserver` or :code:`trainer` role.

 **Asynchronous distributed training in Fluid only supports the pserver mode** . The main difference between asynchronous training and `synchronous training <../distributed/sync_training_en.html>`_ is that in asynchronous training the gradients of each trainer are applied to the parameters individually, whereas in synchronous training the gradients of all trainers are merged first and then applied to the parameters in a single update. Therefore, the hyperparameters of synchronous and asynchronous training need to be tuned separately.

-Asynchronous distributed training in Pserver mode 
+Asynchronous distributed training in Pserver mode
 ==================================================

 For detailed API, please refer to :ref:`api_fluid_transpiler_DistributeTranspiler` . A simple example:

 .. code-block:: python

-	config = fluid.DistributedTranspilerConfig()
-	#Configuring config policy 
+	config = fluid.DistributeTranspilerConfig()
+	#Configuring config policy
 	config.slice_var_up = False
-	t = fluid.DistributedTranspiler(config=config)
+	t = fluid.DistributeTranspiler(config=config)
 	t.transpile(trainer_id,
 				program=main_program,
 				pservers="192.168.0.1:6174,192.168.0.2:6174",

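The sync/async distinction documented in async_training_en.rst (gradients applied per trainer versus merged first) can be made concrete with a toy sketch. This is plain Python on a single scalar parameter, not Fluid's pserver implementation; the function names are hypothetical, and treating "merge" as an average is an assumption.

```python
# Toy illustration only: plain SGD on one scalar parameter.
# NOT Fluid's pserver code; all names here are hypothetical.

def sync_update(param, trainer_grads, lr):
    """Synchronous: merge all trainer gradients (assumed: average), then update once."""
    merged = sum(trainer_grads) / len(trainer_grads)
    return param - lr * merged


def async_update(param, trainer_grads, lr):
    """Asynchronous: each trainer's gradient is applied on its own, as it arrives."""
    for grad in trainer_grads:
        param = param - lr * grad
    return param


grads = [1.0, 2.0, 3.0]  # one gradient per trainer
print("sync :", sync_update(10.0, grads, lr=0.1))   # one step with the merged gradient
print("async:", async_update(10.0, grads, lr=0.1))  # three separate steps
```

With the same learning rate the two schemes move the parameter by different amounts, which is one way to see why the documentation asks for the hyperparameters to be tuned separately.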
--- a/doc/fluid/api_guides/low_level/distributed/index.rst
+++ b/doc/fluid/api_guides/low_level/distributed/index.rst
@@ -7,8 +7,5 @@

    sync_training.rst
    async_training.rst
-    cpu_train_best_practice.rst
    large_scale_sparse_feature_training.rst
    cluster_train_data_cn.rst
-
-
--- a/doc/fluid/api_guides/low_level/distributed/index_en.rst
+++ b/doc/fluid/api_guides/low_level/distributed/index_en.rst
@@ -7,7 +7,6 @@ Distributed Training

    sync_training_en.rst
    async_training_en.rst
-    cpu_train_best_practice_en.rst
    large_scale_sparse_feature_training_en.rst
    cluster_train_data_en.rst


--- a/doc/fluid/api_guides/low_level/distributed/sync_training.rst
+++ b/doc/fluid/api_guides/low_level/distributed/sync_training.rst
@@ -4,7 +4,7 @@
 分布式同步训练
 ############

-Fluid支持数据并行的分布式同步训练，API使用 :code:`DistributedTranspiler` 将单机网络配置转换成可以多机执行的
+Fluid支持数据并行的分布式同步训练，API使用 :code:`DistributeTranspiler` 将单机网络配置转换成可以多机执行的
 :code:`pserver` 端程序和 :code:`trainer` 端程序。用户在不同的节点执行相同的一段代码，根据环境变量或启动参数，
 可以执行对应的 :code:`pserver` 或 :code:`trainer` 角色。Fluid分布式同步训练同时支持pserver模式和NCCL2模式，
 在API使用上有差别，需要注意。
@@ -16,11 +16,11 @@ API详细使用方法参考 :ref:`DistributeTranspiler` ，简单实例用法：

 .. code-block:: python

-    config = fluid.DistributedTranspilerConfig()
+    config = fluid.DistributeTranspilerConfig()
    # 配置策略config
    config.slice_var_up = False
-    t = fluid.DistributedTranspiler(config=config)
-    t.transpile(trainer_id, 
+    t = fluid.DistributeTranspiler(config=config)
+    t.transpile(trainer_id,
                program=main_program,
                pservers="192.168.0.1:6174,192.168.0.2:6174",
                trainers=1,
@@ -68,8 +68,8 @@ NCCL2模式分布式训练

    config = fluid.DistributeTranspilerConfig()
    config.mode = "nccl2"
-    t = fluid.DistributedTranspiler(config=config)
-    t.transpile(trainer_id, 
+    t = fluid.DistributeTranspiler(config=config)
+    t.transpile(trainer_id,
                program=main_program,
                startup_program=startup_program,
                trainers="192.168.0.1:6174,192.168.0.2:6174",

--- a/doc/fluid/api_guides/low_level/distributed/sync_training_en.rst
+++ b/doc/fluid/api_guides/low_level/distributed/sync_training_en.rst
@@ -4,19 +4,19 @@
 Synchronous Distributed Training
 ####################################

-Fluid supports parallelism distributed synchronous training, the API uses the :code:`DistributedTranspiler` to convert a single node network configuration into a :code:`pserver` side and :code:`trainer` side program that can be executed on multiple machines. The user executes the same piece of code on different nodes. Depending on the environment variables or startup parameters, you can execute the corresponding :code:`pserver` or :code:`trainer` role. Fluid distributed synchronous training supports both pserver mode and NCCL2 mode. There are differences in the use of the API, to which you need to pay attention.
+Fluid supports data-parallel distributed synchronous training. The API uses :code:`DistributeTranspiler` to convert a single-node network configuration into a :code:`pserver` side program and a :code:`trainer` side program that can be executed on multiple machines. The user runs the same code on every node, and environment variables or startup parameters determine whether a node takes the :code:`pserver` or the :code:`trainer` role. Fluid distributed synchronous training supports both pserver mode and NCCL2 mode; the two modes differ in API usage, so take care to follow the one you use.

-Distributed training in pserver mode 
+Distributed training in pserver mode
 ======================================

 For API Reference, please refer to :ref:`DistributeTranspiler`. A simple example :

 .. code-block:: python

-	config = fluid.DistributedTranspilerConfig()
+	config = fluid.DistributeTranspilerConfig()
 	#Configuring policy config
 	config.slice_var_up = False
-	t = fluid.DistributedTranspiler(config=config)
+	t = fluid.DistributeTranspiler(config=config)
 	t.transpile(trainer_id,
 				program=main_program,
 				pservers="192.168.0.1:6174,192.168.0.2:6174",
@@ -51,7 +51,7 @@ Configuration for general environment variables:
 - :code:`FLAGS_rpc_deadline` : int, the longest waiting time for RPC communication, in milliseconds, default 180000


-Distributed training in NCCL2 mode 
+Distributed training in NCCL2 mode
 ====================================

 The multi-node synchronous training mode based on NCCL2 (Collective Communication) is only supported in the GPU cluster.
@@ -65,7 +65,7 @@ Use the following code to convert the current :code:`Program` to a Fluid :code:`

 	config = fluid.DistributeTranspilerConfig()
 	config.mode = "nccl2"
-	t = fluid.DistributedTranspiler(config=config)
+	t = fluid.DistributeTranspiler(config=config)
 	t.transpile(trainer_id,
 				program=main_program,
 				startup_program=startup_program,

--- a/doc/fluid/api_guides/low_level/executor.rst
+++ b/doc/fluid/api_guides/low_level/executor.rst
@@ -4,11 +4,11 @@
 执行引擎
 ##########

-:code:`Executor` 实现了一个简易的执行器，所有的操作在其中顺序执行。你可以在Python脚本中运行:code:`Executor`。PaddlePaddle Fluid中有两种执行器。一种是:code:`Executor` 默认的单线程执行器，另一种是并行计算执行器，在:ref:`api_guide_parallel_executor`中进行了解释。`Executor`和:ref:`api_guide_parallel_executor`的配置不同，这可能会给部分用户带来困惑。为使执行器更加灵活，我们引入了:ref:`api_guide_compiled_program`，:ref:`api_guide_compiled_program`用于把一个程序转换为不同的优化组合，可以通过:code:`Executor`运行。
+:code:`Executor` 实现了一个简易的执行器，所有的操作在其中顺序执行。你可以在Python脚本中运行 :code:`Executor` 。PaddlePaddle Fluid中有两种执行器。一种是 :code:`Executor` 默认的单线程执行器，另一种是并行计算执行器，在 :ref:`api_guide_parallel_executor` 中进行了解释。``Executor`` 和 :ref:`api_guide_parallel_executor` 的配置不同，这可能会给部分用户带来困惑。为使执行器更加灵活，我们引入了 :ref:`api_guide_compiled_program` ， :ref:`api_guide_compiled_program` 用于把一个程序转换为不同的优化组合，可以通过 :code:`Executor` 运行。

-:code:`Executor`的逻辑非常简单。建议在调试阶段用:code:`Executor`在一台计算机上完整地运行模型，然后转向多设备或多台计算机计算。
+ :code:`Executor` 的逻辑非常简单。建议在调试阶段用 :code:`Executor` 在一台计算机上完整地运行模型，然后转向多设备或多台计算机计算。

-:code:`Executor`在构造时接受一个:code:`Place`，它既可能是:ref:`api_fluid_CPUPlace`也可能是:ref:`api_fluid_CUDAPlace`。
+ :code:`Executor` 在构造时接受一个 :code:`Place` ，它既可能是 :ref:`api_fluid_CPUPlace` 也可能是 :ref:`api_fluid_CUDAPlace` 。

 .. code-block:: python
    # 首先创建Executor。
@@ -16,12 +16,12 @@
    exe = fluid.Executor(place)
    # 运行启动程序仅一次。
    exe.run(fluid.default_startup_program())
-    
+
    # 直接运行主程序。
    loss, = exe.run(fluid.default_main_program(),
                    feed=feed_dict,
                    fetch_list=[loss.name])
-简单样例请参照 `quick_start_fit_a_line <http://paddlepaddle.org/documentation/docs/zh/1.1/beginners_guide/quick_start/fit_a_line/README.html>`_ 
+简单样例请参照 `basics_fit_a_line <../../beginners_guide/basics/fit_a_line/README.cn.html>`_

 - 相关API :
- - :ref:`cn_api_fluid_Executor` 
+ - :ref:`cn_api_fluid_Executor`
--- a/doc/fluid/api_guides/low_level/executor_en.rst
+++ b/doc/fluid/api_guides/low_level/executor_en.rst
@@ -8,7 +8,7 @@ Executor

 The logic of :code:`Executor` is very simple. During debugging, we suggest running the whole model with :code:`Executor` on a single machine first, and then switching to multiple devices or multiple machines.

-:code:`Executor` receives a :code:`Place` at construction, which can either be :ref:`api_fluid_CPUPlace` or :ref:`api_fluid_CUDAPlace`. 
+:code:`Executor` receives a :code:`Place` at construction, which can either be :ref:`api_fluid_CPUPlace` or :ref:`api_fluid_CUDAPlace`.

 .. code-block:: python

@@ -18,14 +18,14 @@ The logic of :code:`Executor` is very simple. It is suggested to thoroughly run

    # Run the startup program once and only once.
    exe.run(fluid.default_startup_program())
-    
+
    # Run the main program directly.
    loss, = exe.run(fluid.default_main_program(),
                    feed=feed_dict,
                    fetch_list=[loss.name])


-For simple example please refer to `quick_start_fit_a_line <http://paddlepaddle.org/documentation/docs/zh/1.1/beginners_guide/quick_start/fit_a_line/README.html>`_ 
+For a simple example, please refer to `basics_fit_a_line <../../beginners_guide/basics/fit_a_line/README.html>`_

 - Related API :
 - :ref:`api_fluid_Executor`

--- a/doc/fluid/api_guides/low_level/layers/conv.rst
+++ b/doc/fluid/api_guides/low_level/layers/conv.rst
@@ -14,30 +14,31 @@
 ---------------------

 卷积需要依据滑动步长(stride)、填充长度(padding)、卷积核窗口大小(filter size)、分组数(groups)、扩张系数(dilation rate)来决定如何计算。groups最早在 `AlexNet <https://www.nvidia.cn/content/tesla/pdf/machine-learning/imagenet-classification-with-deep-convolutional-nn.pdf>`_ 中引入, 可以理解为将原始的卷积分为独立若干组卷积计算。
-  
+
  **注意**: 同cuDNN的方式，Fluid目前只支持在特征图上下填充相同的长度，左右也是。

- 输入输出Layout: 
+- 输入输出Layout:

  2D卷积输入特征的Layout为[N, C, H, W]或[N, H, W, C], N即batch size，C是通道数，H、W是特征的高度和宽度，输出特征和输入特征的Layout一致。(相应的3D卷积输入特征的Layout为[N, C, D, H, W]或[N, D, H, W, C]，但**注意**，Fluid的卷积当前只支持[N, C, H, W]，[N, C, D, H, W]。)
-   
- 卷积核的Layout: 
-  
+
+- 卷积核的Layout:
+
  Fluid中2D卷积的卷积核(也称权重)的Layout为[C_o, C_in / groups, f_h, f_w]，C_o、C_in表示输出、输入通道数，f_h、f_w表示卷积核窗口的高度和宽度，按行序存储。(相应的3D卷积的卷积核Layout为[C_o, C_in / groups, f_d, f_h, f_w]，同样按行序存储。)
-  
- 深度可分离卷积(depthwise separable convolution): 
-   
+
+- 深度可分离卷积(depthwise separable convolution):
+
  在深度可分离卷积中包括depthwise convolution和pointwise convolution两组，这两个卷积的接口和上述普通卷积接口相同。前者可以通过给普通卷积设置groups来做，后者通过设置卷积核filters的大小为1x1，深度可分离卷积减少参数的同时减少了计算量。
-  
+
  对于depthwise convolution，可以设置groups等于输入通道数，此时，2D卷积的卷积核形状为[C_o, 1, f_h, f_w]。
  对于pointwise convolution，卷积核的形状为[C_o, C_in, 1, 1]。
-  
-  **注意**：Fluid针对depthwise convolution的GPU计算做了高度优化，您可以通过在 :code:`fluid.layers.conv2d`接口设置 :code:`use_cudnn=False`来使用Fluid自身优化的CUDA程序。
-   
+
+  **注意**：Fluid针对depthwise convolution的GPU计算做了高度优化，您可以通过在
+  :code:`fluid.layers.conv2d` 接口设置 :code:`use_cudnn=False` 来使用Fluid自身优化的CUDA程序。
+
 - 空洞卷积(dilated convolution):
-  
+
  空洞卷积相比普通卷积而言，卷积核在特征图上取值时不再连续，而是间隔的，这个间隔数称作dilation，等于1时，即为普通卷积，空洞卷积相比普通卷积的感受野更大。
-  
+
 - API汇总:
 - :ref:`cn_api_fluid_layers_conv2d`
 - :ref:`cn_api_fluid_layers_conv3d`
@@ -50,14 +51,14 @@

 Fluid可以表示变长的序列结构，这里的变长是指不同样本的时间步(step)数不一样，通常是一个2D的Tensor和一个能够区分的样本长度的辅助结构来表示。假定，2D的Tensor的形状是shape，shape[0]是所有样本的总时间步数，shape[1]是序列特征的大小。

-基于此数据结构的卷积在Fluid里称作序列卷积，也表示一维卷积。同图像卷积，序列卷积的输入参数有卷积核大小、填充大小、滑动步长，但与2D卷积不同的是，这些参数个数都为1。**注意**，目前仅支持stride为1的情况，输出序列的时间步数和输入序列相同。 
+基于此数据结构的卷积在Fluid里称作序列卷积，也表示一维卷积。同图像卷积，序列卷积的输入参数有卷积核大小、填充大小、滑动步长，但与2D卷积不同的是，这些参数个数都为1。**注意**，目前仅支持stride为1的情况，输出序列的时间步数和输入序列相同。

 假如：输入序列形状为(T, N)， T即该序列的时间步数，N是序列特征大小；卷积核的上下文步长为K，输出序列长度为M，则卷积核权重形状为(K * N, M），输出序列形状为(T, M)。
-  
+
 另外，参考DeepSpeech，Fluid实现了行卷积row convolution, 或称
 `look ahead convolution <http://www.cs.cmu.edu/~dyogatam/papers/wang+etal.iclrworkshop2016.pdf>`_ ，
 该卷积相比上述普通序列卷积可以减少参数。
- 
+

 - API汇总:
 - :ref:`cn_api_fluid_layers_sequence_conv`

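The shape rules stated in conv.rst (filter layout [C_o, C_in / groups, f_h, f_w], depthwise filters of shape [C_o, 1, f_h, f_w], pointwise filters of shape [C_o, C_in, 1, 1], and sequence-conv weights of shape (K * N, M)) can be checked with a little arithmetic. The sketch below is pure Python with hypothetical helper names; it calls no Fluid API and only reproduces the shape bookkeeping.

```python
# Shape arithmetic only; helper names are hypothetical, no framework calls.

def conv2d_filter_shape(c_out, c_in, f_h, f_w, groups=1):
    """Filter layout [C_o, C_in / groups, f_h, f_w]."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return [c_out, c_in // groups, f_h, f_w]


def num_params(shape):
    """Number of weights in a filter of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count


def sequence_conv_shapes(t, n, k, m):
    """Input (T, N), context length K, output size M -> (weight shape, output shape)."""
    return (k * n, m), (t, m)


c_in, c_out = 32, 64
standard = conv2d_filter_shape(c_out, c_in, 3, 3)
depthwise = conv2d_filter_shape(c_in, c_in, 3, 3, groups=c_in)  # groups == input channels
pointwise = conv2d_filter_shape(c_out, c_in, 1, 1)

print("standard :", standard, num_params(standard))
print("separable:", depthwise, pointwise,
      num_params(depthwise) + num_params(pointwise))
print("sequence :", sequence_conv_shapes(t=10, n=8, k=3, m=16))
```

For these example sizes the depthwise plus pointwise pair needs far fewer weights than the single 3x3 convolution, which is the parameter saving the text refers to.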
--- a/doc/fluid/api_guides/low_level/layers/learning_rate_scheduler.rst
+++ b/doc/fluid/api_guides/low_level/layers/learning_rate_scheduler.rst
@@ -38,3 +38,9 @@
 * :code:`append_LARS`: 通过Layer-wise Adaptive Rate Scaling算法获得学习率，相关算法请参考 `《Train Feedfoward Neural Network with Layer-wise Adaptive Rate via Approximating Back-matching Propagation》 <https://arxiv.org/abs/1802.09750>`_ 。
  相关API Reference请参考 :ref:`cn_api_fluid_layers_append_LARS`

+* :code:`cosine_decay`: 余弦衰减，即学习率随step数变化呈余弦函数。
+  相关API Reference请参考 :ref:`cn_api_fluid_layers_cosine_decay`
+
+* :code:`linear_lr_warmup`: 学习率随step数线性增加到指定学习率。
+  相关API Reference请参考 :ref:`cn_api_fluid_layers_linear_lr_warmup`
+
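The two schedules added to learning_rate_scheduler.rst are described only in words (a cosine-shaped decay over steps, and a linear rise to a target rate). A generic sketch of both shapes follows. The function and argument names are assumptions chosen for illustration; they are not the signatures of :code:`fluid.layers.cosine_decay` or :code:`fluid.layers.linear_lr_warmup`, whose exact formulas may differ.

```python
import math

# Generic schedule shapes; names and arguments are hypothetical, not the Fluid API.

def cosine_schedule(base_lr, step, total_steps):
    """Cosine decay: base_lr at step 0, falling smoothly to 0 at total_steps."""
    progress = min(step, total_steps) / total_steps
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


def warmup_schedule(target_lr, step, warmup_steps, start_lr=0.0):
    """Linear warmup: rise from start_lr to target_lr over warmup_steps, then hold."""
    if step >= warmup_steps:
        return target_lr
    return start_lr + (target_lr - start_lr) * step / warmup_steps


for step in (0, 250, 500, 750, 1000):
    print(step, cosine_schedule(0.1, step, 1000), warmup_schedule(0.1, step, 500))
```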
--- a/doc/fluid/api_guides/low_level/layers/sparse_update.rst
+++ b/doc/fluid/api_guides/low_level/layers/sparse_update.rst
@@ -37,9 +37,9 @@ API详细使用方法参考 :ref:`cn_api_fluid_layers_embedding` ，以下是一

 以上参数中：

- :code:`is_sparse` ： 反向计算的时候梯度是否为sparse tensor。如果不设置，梯度是一个 `LodTensor <https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/user_guides/howto/prepare_data/lod_tensor.md>`_  。默认为False。
+- :code:`is_sparse` ： 反向计算的时候梯度是否为sparse tensor。如果不设置，梯度是一个 `LodTensor <https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/user_guides/howto/basic_concept/lod_tensor.html>`_  。默认为False。

- :code:`is_distributed` ： 标志是否是用在分布式的场景下。一般大规模稀疏更新（embedding的第0维维度很大，比如几百万以上）才需要设置。具体可以参考大规模稀疏的API guide  :ref:`api_guide_async_training`  。默认为False。
+- :code:`is_distributed` ： 标志是否是用在分布式的场景下。一般大规模稀疏更新（embedding的第0维维度很大，比如几百万以上）才需要设置。具体可以参考大规模稀疏的API guide  :ref:`cn_api_guide_async_training`  。默认为False。

 - API汇总:
 - :ref:`cn_api_fluid_layers_embedding`
--- a/doc/fluid/api_guides/low_level/layers/sparse_update_en.rst
+++ b/doc/fluid/api_guides/low_level/layers/sparse_update_en.rst
@@ -37,9 +37,9 @@ API reference :ref:`api_fluid_layers_embedding` . Here is a simple example:

 The parameters:

- :code:`is_sparse` : Whether the gradient is a sparse tensor in the backward calculation. If not set, the gradient is a `LodTensor <https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/user_guides/howto/prepare_data/lod_tensor.md>`_ . The default is False.
+- :code:`is_sparse` : Whether the gradient is a sparse tensor in the backward calculation. If not set, the gradient is a `LodTensor <https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/user_guides/howto/basic_concept/lod_tensor_en.html>`_ . The default is False.

 - :code:`is_distributed` : Whether the current training is in a distributed scenario. Generally, this parameter only needs to be set for large-scale sparse updates (the 0th dimension of the embedding is very large, for example several million or more). For details, please refer to the large-scale sparse API guide :ref:`api_guide_async_training`. The default is False.

 - API :
-   - :ref:`api_fluid_layers_embedding`
\ No newline at end of file
+   - :ref:`api_fluid_layers_embedding`
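The effect of :code:`is_sparse` described in sparse_update_en.rst (a sparse gradient instead of a full tensor) can be pictured with a toy model. In the sketch below plain Python lists stand in for tensors and the helper names are hypothetical; it is not how Fluid represents gradients internally, only an illustration of why a sparse gradient carries far less data when few rows are looked up.

```python
# Toy model of an embedding gradient; lists stand in for tensors.
# Helper names are hypothetical and not part of the Fluid API.

def sparse_embedding_grad(ids, out_grads):
    """Keep gradients only for the rows that were looked up: {row_id: grad}."""
    rows = {}
    for row_id, grad in zip(ids, out_grads):
        acc = rows.setdefault(row_id, [0.0] * len(grad))
        for j, g in enumerate(grad):
            acc[j] += g  # repeated ids accumulate into the same row
    return rows


def dense_embedding_grad(vocab_size, ids, out_grads):
    """Materialize a full [vocab_size, dim] gradient, mostly zeros."""
    dim = len(out_grads[0])
    dense = [[0.0] * dim for _ in range(vocab_size)]
    for row_id, grad in sparse_embedding_grad(ids, out_grads).items():
        dense[row_id] = grad
    return dense


ids = [3, 7, 3]                                   # row 3 is looked up twice
out_grads = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]  # one gradient per lookup
sparse = sparse_embedding_grad(ids, out_grads)
dense = dense_embedding_grad(1000, ids, out_grads)
print(len(sparse), "rows in the sparse gradient vs", len(dense), "in the dense one")
```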
--- a/doc/fluid/api_guides/low_level/nets.rst
+++ b/doc/fluid/api_guides/low_level/nets.rst
@@ -33,8 +33,9 @@ API Reference 请参考 :ref:`cn_api_fluid_nets_img_conv_group`

 :code:`sequence_conv_pool` 是由 :ref:`cn_api_fluid_layers_sequence_conv` 与 :ref:`cn_api_fluid_layers_sequence_pool` 串联而成。
 该模块在 `自然语言处理 <https://zh.wikipedia.org/wiki/自然语言处理>`_ 以及 `语音识别 <https://zh.wikipedia.org/wiki/语音识别>`_ 等领域均有广泛应用，
-比如 `文本分类模型 <https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleNLP/text_classification/nets.py>`_ , 
-`TagSpace <https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleRec/tagspace/train.py>`_  以及 `Multi-view Simnet <https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleRec/multiview_simnet/nets.py>`_ 等模型。
+比如 `文本分类模型 <https://github.com/PaddlePaddle/models/blob/develop/PaddleNLP/text_classification/nets.py>`_ ,
+`TagSpace <https://github.com/PaddlePaddle/models/blob/develop/PaddleRec/tagspace/train.py>`_  以及
+`Multi-view Simnet <https://github.com/PaddlePaddle/models/blob/develop/PaddleRec/multiview_simnet/nets.py>`_ 等模型。

 API Reference 请参考 :ref:`cn_api_fluid_nets_sequence_conv_pool`

@@ -55,7 +56,7 @@ API Reference 请参考 :ref:`cn_api_fluid_nets_glu`
 .. math::
 Attention(Q, K, V)= softmax(QK^\mathrm{T})V

-该模块广泛使用在 `机器翻译 <https://zh.wikipedia.org/zh/机器翻译>`_ 的模型中，比如 `Transformer <https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleNLP/neural_machine_translation/transformer>`_ 。
+该模块广泛使用在 `机器翻译 <https://zh.wikipedia.org/zh/机器翻译>`_ 的模型中，比如 `Transformer <https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/neural_machine_translation/transformer>`_ 。

 API Reference 请参考 :ref:`cn_api_fluid_nets_scaled_dot_product_attention`

--- a/doc/fluid/api_guides/low_level/nets_en.rst
+++ b/doc/fluid/api_guides/low_level/nets_en.rst
@@ -32,8 +32,8 @@ For API Reference, please refer to :ref:`api_fluid_nets_img_conv_group`
 --------------------

 :code:`sequence_conv_pool` is built by chaining :ref:`api_fluid_layers_sequence_conv` with :ref:`api_fluid_layers_sequence_pool`.
-The module is widely used in the field of `natural language processing <https://en.wikipedia.org/wiki/Natural_language_processing>`_ and `speech recognition <https://en.wikipedia.org/wiki/Speech_recognition>`_ .  Models such as the `text classification model <https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleNLP/text_classification/nets.py>`_ ,
-`TagSpace <https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleRec/tagspace/train.py>`_ and `Multi view Simnet <https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleRec/multiview_simnet/nets.py>`_.
+The module is widely used in fields such as `natural language processing <https://en.wikipedia.org/wiki/Natural_language_processing>`_ and `speech recognition <https://en.wikipedia.org/wiki/Speech_recognition>`_ , for example in the `text classification model <https://github.com/PaddlePaddle/models/blob/develop/PaddleNLP/text_classification/nets.py>`_ ,
+`TagSpace <https://github.com/PaddlePaddle/models/blob/develop/PaddleRec/tagspace/train.py>`_ and `Multi-view Simnet <https://github.com/PaddlePaddle/models/blob/develop/PaddleRec/multiview_simnet/nets.py>`_.

 For API Reference, please refer to :ref:`api_fluid_nets_sequence_conv_pool`

@@ -54,6 +54,6 @@ For the input data :code:`Queries` , :code:`Key` and :code:`Values`, calculate t
 .. math::
 Attention(Q, K, V)= softmax(QK^\mathrm{T})V

-This module is widely used in the model of `machine translation <https://en.wikipedia.org/wiki/Machine_translation>`_, such as `Transformer <https://github.com/PaddlePaddle/models/tree/develop/Fluid/PaddleNLP/neural_machine_translation/transformer>`_ .
+This module is widely used in `machine translation <https://en.wikipedia.org/wiki/Machine_translation>`_ models, such as `Transformer <https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/neural_machine_translation/transformer>`_ .

 For API Reference, please refer to :ref:`api_fluid_nets_scaled_dot_product_attention`
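
The formula quoted in nets_en.rst, Attention(Q, K, V) = softmax(QK^T)V, can be traced by hand with a small pure-Python sketch. The usual definition of scaled dot-product attention also multiplies the scores by 1/sqrt(d_k); the formula as written in the document omits that factor, so the sketch exposes it as an optional :code:`scale` argument. The function names are hypothetical, and whether Fluid applies the factor internally is not stated in this diff.

```python
import math

# Pure-Python sketch of softmax(scale * Q K^T) V; names are hypothetical.

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, scale=1.0):
    """Each query attends over all keys; the output is a weighted sum of values."""
    out = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
d_k = len(K[0])
print(attention(Q, K, V))                              # formula as written in the doc
print(attention(Q, K, V, scale=1.0 / math.sqrt(d_k)))  # with the usual scaling factor
```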
--- a/doc/fluid/api_guides/low_level/parallel_executor.rst
+++ b/doc/fluid/api_guides/low_level/parallel_executor.rst
@@ -29,7 +29,7 @@

 **注意** ：如果在Reduce模式下使用 :code:`CPU` 多线程执行 :code:`Program` ， :code:`Program` 的参数在多个线程间是共享的，在某些模型上，Reduce模式可以大幅节省内存。

-鉴于模型的执行速率和模型结构及执行器的执行策略有关，:code:`ParallelExecutor` 允许你修改执行器的相关参数，例如线程池的规模( :code:`num_threads` )、为清除临时变量:code:`num_iteration_per_drop_scope`需要进行的循环次数。更多信息请参照:ref:`cn_api_fluid_ExecutionStrategy`。
+鉴于模型的执行速率和模型结构及执行器的执行策略有关，:code:`ParallelExecutor` 允许你修改执行器的相关参数，例如线程池的规模( :code:`num_threads` )、为清除临时变量 :code:`num_iteration_per_drop_scope` 需要进行的循环次数。更多信息请参照 :ref:`cn_api_fluid_ExecutionStrategy` 。


 .. code-block:: python
@@ -49,8 +49,8 @@
    exec_strategy.num_threads = dev_count * 4 # the size of thread pool.
    build_strategy = fluid.BuildStrategy()
    build_strategy.memory_optimize = True if memory_opt else False
-    train_exe = fluid.ParallelExecutor(use_cuda=use_cuda, 
-                                       main_program=train_program, 
+    train_exe = fluid.ParallelExecutor(use_cuda=use_cuda,
+                                       main_program=train_program,
                                       build_strategy=build_strategy,
                                       exec_strategy=exec_strategy,
                                       loss_name=loss.name)

--- a/doc/fluid/api_guides/low_level/program.rst
+++ b/doc/fluid/api_guides/low_level/program.rst
 .. _api_guide_Program:

-###############################
-Program/Block/Operator/Variable
-###############################
+#########
+基础概念
+#########

 ==================
 Program
@@ -13,13 +13,13 @@ Program

 总得来说：

-* 一个模型是一个 Fluid :code:`Program` ,一个模型可以含有多于一个 :code:`Program` ； 
+* 一个模型是一个 Fluid :code:`Program` ,一个模型可以含有多于一个 :code:`Program` ；

 * :code:`Program` 由嵌套的 :code:`Block` 构成，:code:`Block` 的概念可以类比到 C++ 或是 Java 中的一对大括号，或是 Python 语言中的一个缩进块；

 * :code:`Block` 中的计算由顺序执行、条件选择或者循环执行三种方式组合，构成复杂的计算逻辑；

-* :code:`Block` 中包含对计算和计算对象的描述。计算的描述称之为 Operator；计算作用的对象（或者说 Operator 的输入和输出）被统一为 Tensor，在Fluid中，Tensor 用层级为0的 `LoD-Tensor <http://paddlepaddle.org/documentation/docs/zh/1.2/user_guides/howto/prepare_data/lod_tensor.html#permalink-4-lod-tensor>`_ 表示。 
+* :code:`Block` 中包含对计算和计算对象的描述。计算的描述称之为 Operator；计算作用的对象（或者说 Operator 的输入和输出）被统一为 Tensor，在Fluid中，Tensor 用层级为0的 `LoD-Tensor <http://paddlepaddle.org/documentation/docs/zh/1.2/user_guides/howto/prepare_data/lod_tensor.html#permalink-4-lod-tensor>`_ 表示。



@@ -37,7 +37,7 @@ Block
 +----------------------+-------------------------+
 | if-else, switch      | IfElseOp, SwitchOp      |
 +----------------------+-------------------------+
-| 顺序执行              | 一系列 layers            | 
+| 顺序执行              | 一系列 layers            |
 +----------------------+-------------------------+

 如上文所说，Fluid 中的 :code:`Block` 描述了一组以顺序、选择或是循环执行的 Operator 以及 Operator 操作的对象：Tensor。
@@ -54,7 +54,7 @@ Operator
 这是因为一些常见的对 Tensor 的操作可能是由更多基础操作构成，为了提高使用的便利性，框架内部对基础 Operator 进行了一些封装，包括创建 Operator 依赖可学习参数，可学习参数的初始化细节等，减少用户重复开发的成本。


-更多内容可参考阅读 `Fluid设计思想 <../../advanced_usage/design_idea/fluid_design_idea.html>`_ 
+更多内容可参考阅读 `Fluid设计思想 <../../advanced_usage/design_idea/fluid_design_idea.html>`_


 =========
@@ -78,4 +78,4 @@ Fluid 中的 :code:`Variable` 可以包含任何类型的值———在大多

 * 用户还可以使用 :ref:`cn_api_fluid_program_guard` 配合 :code:`with` 语句，修改配置好的 :ref:`cn_api_fluid_default_startup_program` 和 :ref:`cn_api_fluid_default_main_program` 。

-* 在Fluid中，Block内部执行顺序由控制流决定，如 :ref:`cn_api_fluid_layers_IfElse` , :ref:`cn_api_fluid_layers_While`, :ref:`cn_api_fluid_layers_Switch` 等，更多内容可参考： :ref:`api_guide_control_flow` 
+* 在Fluid中，Block内部执行顺序由控制流决定，如 :ref:`cn_api_fluid_layers_IfElse` , :ref:`cn_api_fluid_layers_While`, :ref:`cn_api_fluid_layers_Switch` 等，更多内容可参考： :ref:`api_guide_control_flow`
--- a/doc/fluid/api_guides/low_level/program_en.rst
+++ b/doc/fluid/api_guides/low_level/program_en.rst
 .. _api_guide_Program_en:

-###############################
-Program/Block/Operator/Variable
-###############################
+###############
+Basic Concept
+###############

 ==================
 Program
@@ -36,7 +36,7 @@ Block
 +----------------------+-------------------------+
 | if-else, switch      | IfElseOp, SwitchOp      |
 +----------------------+-------------------------+
-| execute sequentially | a series of layers      | 
+| execute sequentially | a series of layers      |
 +----------------------+-------------------------+

 As mentioned above, :code:`Block` in Fluid describes a set of Operators that are executed sequentially, conditionally or in a loop, together with the objects the Operators act on: Tensors.
@@ -53,7 +53,7 @@ This is because some common operations on Tensor may consist of more basic opera



-More information can be read for reference. `Fluid Design Idea <../../advanced_usage/design_idea/fluid_design_idea.html>`_ 
+For more information, see `Fluid Design Idea <../../advanced_usage/design_idea/fluid_design_idea.html>`_


 =========
@@ -75,4 +75,4 @@ Related API
 * Users can also use :ref:`api_fluid_program_guard` with :code:`with` to modify the configured :ref:`api_fluid_default_startup_program` and :ref:`api_fluid_default_main_program` .


-* In Fluid，the execution order in a Block is determined by control flow，such as :ref:`api_fluid_layers_IfElse` , :ref:`api_fluid_layers_While` and :ref:`api_fluid_layers_Switch` . For more information, please refer to： :ref:`api_guide_control_flow_en` 
+* In Fluid, the execution order in a Block is determined by control flow, such as :ref:`api_fluid_layers_IfElse` , :ref:`api_fluid_layers_While` and :ref:`api_fluid_layers_Switch` . For more information, please refer to: :ref:`api_guide_control_flow_en`
--- a/doc/fluid/beginners_guide/quick_start/fit_a_line/README.cn.md
+++ b/doc/fluid/beginners_guide/quick_start/fit_a_line/README.cn.md
--- a/doc/fluid/beginners_guide/quick_start/fit_a_line/README.md
+++ b/doc/fluid/beginners_guide/quick_start/fit_a_line/README.md
--- a/doc/fluid/beginners_guide/quick_start/fit_a_line/image
+++ b/doc/fluid/beginners_guide/quick_start/fit_a_line/image
--- a/doc/fluid/beginners_guide/basics/index.rst
+++ b/doc/fluid/beginners_guide/basics/index.rst
 ################
-深度学习基础
+深度学习基础教程
 ################


-本章由7篇文档组成，它们按照简单到难的顺序排列，将指导您如何使用PaddlePaddle完成基础的深度学习任务
+本章由9篇文档组成，它们按照简单到难的顺序排列，将指导您如何使用PaddlePaddle完成基础的深度学习任务

 本章文档涉及了大量深度学习基础知识，也介绍了如何使用PaddlePaddle实现这些内容，请参阅以下说明了解如何使用：

@@ -15,6 +15,8 @@
 ..  toctree::
    :titlesonly:

+    fit_a_line/README.cn.md
+    recognize_digits/README.cn.md
    image_classification/index.md
    word2vec/index.md
    recommender_system/index.md

--- a/doc/fluid/beginners_guide/basics/index_en.rst
+++ b/doc/fluid/beginners_guide/basics/index_en.rst
-##########################
+############################
 Basic Deep Learning Models
-##########################
+############################

-This section collects six documents arranging from the simplest to the most challenging, which will guide you through the basic deep learning tasks in PaddlePaddle.
+This section collects 8 documents arranged from the simplest to the most challenging, which will guide you through the basic deep learning tasks in PaddlePaddle.

 The documentation in this chapter covers a lot of deep learning basics and how to implement them with PaddlePaddle. See the instructions below for how to use:

@@ -15,6 +15,8 @@ The book you are reading is an "interactive" e-book - each chapter can be run in
 ..  toctree::
    :titlesonly:

+    fit_a_line/README.md
+    recognize_digits/README.md
    image_classification/index_en.md
    word2vec/index_en.md
    recommender_system/index_en.md
@@ -45,7 +47,7 @@ Just run these in shell:

 	docker run -d -p 8888:8888 paddlepaddle/book

-It downloads the Docker image for running books from DockerHub.com. 
+It downloads the Docker image for running books from DockerHub.com.
 To read and edit this book on-line, please visit http://localhost:8888 in your browser.

 If the Internet connection to DockerHub.com is compromised, try our spare docker image named docker.paddlepaddlehub.com:

--- a/doc/fluid/beginners_guide/quick_start/recognize_digits/README.cn.md
+++ b/doc/fluid/beginners_guide/quick_start/recognize_digits/README.cn.md
--- a/doc/fluid/beginners_guide/quick_start/recognize_digits/README.md
+++ b/doc/fluid/beginners_guide/quick_start/recognize_digits/README.md
--- a/doc/fluid/beginners_guide/quick_start/recognize_digits/image
+++ b/doc/fluid/beginners_guide/quick_start/recognize_digits/image
--- a/doc/fluid/beginners_guide/index.rst
+++ b/doc/fluid/beginners_guide/index.rst
@@ -6,23 +6,24 @@ PaddlePaddle (PArallel Distributed Deep LEarning)是一个易用、高效、灵

 您可参考PaddlePaddle的 `Github <https://github.com/PaddlePaddle/Paddle>`_ 了解详情，也可阅读 `版本说明 <../release_note.html>`_ 了解新版本的特性

-当您第一次来到PaddlePaddle，请您首先阅读以下文档，了解安装方法：
+让我们从这里开始：

-    - `安装说明 <../beginners_guide/install/index_cn.html>`_：我们支持在Ubuntu/CentOS/Windows/MacOS环境上的安装
+    - `快速开始 <../beginners_guide/quick_start.html>`_

-如果您已经具备一定的深度学习基础，第一次使用PaddlePaddle时，可以跟随下列简单的模型案例供您快速上手：
+当您第一次来到PaddlePaddle，请您首先阅读以下文档，了解安装方法：

-    - `Fluid编程指南 <../beginners_guide/programming_guide/programming_guide.html>`_：介绍 Fluid 的基本概念和使用方法
+    - `安装说明 <../beginners_guide/install/index_cn.html>`_：我们支持在Ubuntu/CentOS/Windows/MacOS环境上的安装

-    - `快速入门 <../beginners_guide/quick_start/index.html>`_：提供线性回归和识别数字两个入门级模型，帮助您快速上手训练网络
+这里为您提供了更多学习资料:

    - `深度学习基础 <../beginners_guide/basics/index.html>`_：覆盖图像分类、个性化推荐、机器翻译等多个深度领域的基础知识，提供 Fluid 实现案例

+    - `Fluid编程指南 <../beginners_guide/programming_guide/programming_guide.html>`_：介绍 Fluid 的基本概念和使用方法

 ..  toctree::
    :hidden:

+    quick_start_cn.rst
    install/index_cn.rst
-    quick_start/index.rst
-    basics/index.rst
+    basics/index_cn.rst
    programming_guide/programming_guide.md
--- a/doc/fluid/beginners_guide/index_en.rst
+++ b/doc/fluid/beginners_guide/index_en.rst
@@ -15,8 +15,6 @@ If you have been armed with certain level of deep learning knowledge, and it hap

    - `Programming with Fluid <../beginners_guide/programming_guide/programming_guide_en.html>`_ ： Core concepts and basic usage of Fluid

-    - `Quick Start <../beginners_guide/quick_start/index_en.html>`_： Two easy-to-go models, linear regression model and digit recognition model, are in place to speed up your study of training neural networks
-
    - `Deep Learning  Basics <../beginners_guide/basics/index_en.html>`_： This section encompasses various fields of fundamental deep learning knowledge, such as image classification, customized recommendation, machine translation, and examples implemented by Fluid are provided.


@@ -24,6 +22,5 @@ If you have been armed with certain level of deep learning knowledge, and it hap
    :hidden:

    install/index_en.rst
-    quick_start/index_en.rst
    basics/index_en.rst
    programming_guide/programming_guide_en.md
--- a/doc/fluid/beginners_guide/install/Tables.md
+++ b/doc/fluid/beginners_guide/install/Tables.md
--- a/doc/fluid/beginners_guide/install/compile/compile_MacOS.md
+++ b/doc/fluid/beginners_guide/install/compile/compile_MacOS.md
@@ -186,6 +186,7 @@
 			For Python2: cmake .. -DWITH_FLUID_ONLY=ON -DWITH_GPU=OFF -DWITH_TESTING=OFF  -DCMAKE_BUILD_TYPE=Release
 			For Python3: cmake .. -DPY_VERSION=3.5 -DPYTHON_INCLUDE_DIR=${PYTHON_INCLUDE_DIRS} \
 			 -DPYTHON_LIBRARY=${PYTHON_LIBRARY} -DWITH_FLUID_ONLY=ON -DWITH_GPU=OFF -DWITH_TESTING=OFF  -DCMAKE_BUILD_TYPE=Release
+
 	>`-DPY_VERSION=3.5`请修改为安装环境的Python版本

 10. 使用以下命令来编译：

--- a/doc/fluid/beginners_guide/install/compile/compile_MacOS_en.md
+++ b/doc/fluid/beginners_guide/install/compile/compile_MacOS_en.md
--- a/doc/fluid/beginners_guide/install/install_Ubuntu_en.md
+++ b/doc/fluid/beginners_guide/install/install_Ubuntu_en.md
--- a/doc/fluid/beginners_guide/programming_guide/programming_guide.md
+++ b/doc/fluid/beginners_guide/programming_guide/programming_guide.md
--- a/doc/fluid/beginners_guide/programming_guide/programming_guide_en.md
+++ b/doc/fluid/beginners_guide/programming_guide/programming_guide_en.md
@@ -414,7 +414,7 @@ Firstly, define input data format, model structure,loss function and optimized a
    ```
    Now the predicted value is very close to the real value, and the loss has dropped from its original value of 9.05 to 0.01 after 100 iterations.

-    Congratulations! You have succeed to create a simple network. If you want to try advanced linear regression —— predict model of housing price, please read [linear regression](../../beginners_guide/quick_start/fit_a_line/README.en.html). More examples of model can be found in [models](../../user_guides/models/index_en.html).
+    Congratulations! You have succeeded in creating a simple network. If you want to try a more advanced linear regression example, a housing price prediction model, please read [linear regression](../../beginners_guide/basics/fit_a_line/README.en.html). More model examples can be found in [models](../../user_guides/models/index_en.html).

 <a name="what_next"></a>
 ## What's next
@@ -427,4 +427,4 @@ After the construction of network, you can start training your network in single

 In addition, there are three learning levels in documentation according to developer's background and experience: [Beginner's Guide](../../beginners_guide/index_en.html) , [User Guides](../../user_guides/index_en.html) and [Advanced User Guides](../../advanced_usage/index_en.html).

-If you want to read examples in more application scenarios, you can go to [quick start](../../beginners_guide/quick_start/index_en.html) and [basic knowledge of deep learning](../../beginners_guide/basics/index_en.html) .If you have learned basic knowledge of deep learning, you can read from [user guide](../../user_guides/index_en.html).
+If you want to read examples in more application scenarios, you can go to [basic knowledge of deep learning](../../beginners_guide/basics/index_en.html) .If you have learned basic knowledge of deep learning, you can read from [user guide](../../user_guides/index_en.html).
--- a/doc/fluid/beginners_guide/quick_start/index.rst
+++ b/doc/fluid/beginners_guide/quick_start/index.rst
--- a/doc/fluid/beginners_guide/quick_start/index_en.rst
+++ b/doc/fluid/beginners_guide/quick_start/index_en.rst
--- a/doc/fluid/beginners_guide/quick_start.rst
+++ b/doc/fluid/beginners_guide/quick_start.rst
--- a/doc/fluid/index_cn.rst
+++ b/doc/fluid/index_cn.rst
--- a/doc/fluid/release_note.rst
+++ b/doc/fluid/release_note.rst
--- a/doc/fluid/user_guides/howto/configure_simple_model/index.rst
+++ b/doc/fluid/user_guides/howto/configure_simple_model/index.rst
--- a/doc/fluid/user_guides/howto/evaluation_and_debugging/index.rst
+++ b/doc/fluid/user_guides/howto/evaluation_and_debugging/index.rst
--- a/doc/fluid/user_guides/howto/prepare_data/feeding_data.rst
+++ b/doc/fluid/user_guides/howto/prepare_data/feeding_data.rst
--- a/doc/fluid/user_guides/howto/prepare_data/feeding_data_en.rst
+++ b/doc/fluid/user_guides/howto/prepare_data/feeding_data_en.rst
--- a/doc/fluid/user_guides/howto/prepare_data/index.rst
+++ b/doc/fluid/user_guides/howto/prepare_data/index.rst
--- a/doc/fluid/user_guides/howto/training/deploy_ctr_on_baidu_cloud_cn.rst
+++ b/doc/fluid/user_guides/howto/training/deploy_ctr_on_baidu_cloud_cn.rst
--- a/doc/fluid/user_guides/howto/training/index.rst
+++ b/doc/fluid/user_guides/howto/training/index.rst
--- a/doc/fluid/user_guides/howto/training/multi_node.rst
+++ b/doc/fluid/user_guides/howto/training/multi_node.rst
--- a/doc/fluid/user_guides/howto/training/save_load_variables.rst
+++ b/doc/fluid/user_guides/howto/training/save_load_variables.rst
--- a/doc/fluid/user_guides/howto/training/single_node.rst
+++ b/doc/fluid/user_guides/howto/training/single_node.rst
--- a/doc/fluid/user_guides/howto/training/train_on_baidu_cloud_cn.rst
+++ b/doc/fluid/user_guides/howto/training/train_on_baidu_cloud_cn.rst
--- a/doc/fluid/user_guides/index.rst
+++ b/doc/fluid/user_guides/index.rst
--- a/doc/fluid/user_guides/models/index_en.rst
+++ b/doc/fluid/user_guides/models/index_en.rst