merge conflict

84c01a53 · JiabinYang · 7dba9ffb · 8e019748 · 84c01a53 · 84c01a53
79 changed files
--- a/.gitignore
+++ b/.gitignore
 .vscode/
+/doc/fluid/menu.zh.json
+/doc/fluid/menu.en.json
--- a/doc/fluid/api_guides/low_level/distributed/cpu_train_best_practice.rst
+++ b/doc/fluid/api_guides/low_level/distributed/cpu_train_best_practice.rst
 .. _api_guide_cpu_training_best_practice:
-##################
+#####################
 分布式CPU训练最佳实践
-##################
+#####################
 提高CPU分布式训练的训练速度，主要要从两个方面来考虑：
 1）提高训练速度，主要是提高CPU的使用率；2）提高通信速度，主要是减少通信传输的数据量。
@@ -46,7 +46,7 @@ API详细使用方法参考 :ref:`cn_api_fluid_ParallelExecutor` ，简单实例
 提高通信速度
 ==========
-要减少通信数据量，提高通信速度，主要是使用稀疏更新 ，目前支持 `稀疏更新 <../layers/sparse_update.html>`_  的主要是  :ref:`cn_api_fluid_layers_embedding` 。
+要减少通信数据量，提高通信速度，主要是使用稀疏更新 ，目前支持  :ref:`api_guide_sparse_update` 的主要是  :ref:`cn_api_fluid_layers_embedding` 。
 .. code-block:: python

--- a/doc/fluid/api_guides/low_level/distributed/cpu_train_best_practice_en.rst
+++ b/doc/fluid/api_guides/low_level/distributed/cpu_train_best_practice_en.rst
--- a/doc/fluid/advanced_usage/best_practice/dist_training_gpu.rst
+++ b/doc/fluid/advanced_usage/best_practice/dist_training_gpu.rst
 .. _best_practice_dist_training_gpu:
-性能优化最佳实践之：GPU分布式训练
+#####################
-============================
+分布式GPU训练最佳实践
+#####################
 开始优化您的GPU分布式训练任务
 -------------------------

--- a/doc/fluid/advanced_usage/best_practice/index_cn.rst
+++ b/doc/fluid/advanced_usage/best_practice/index_cn.rst
+#########
+最佳实践
+#########
+..  toctree::
+    :maxdepth: 1
+    cpu_train_best_practice.rst
+    dist_training_gpu.rst
--- a/doc/fluid/advanced_usage/best_practice/index_en.rst
+++ b/doc/fluid/advanced_usage/best_practice/index_en.rst
+###############
+Best Practice
+###############
+..  toctree::
+    :hidden:
+    cpu_train_best_practice_en.rst
--- a/doc/fluid/advanced_usage/deploy/inference/paddle_gpu_benchmark.md
+++ b/doc/fluid/advanced_usage/deploy/inference/paddle_gpu_benchmark.md
@@ -8,7 +8,7 @@
 ## 测试对象
 **PaddlePaddle, Pytorch, Tensorflow**
- 在测试中，PaddlePaddle使用子图优化的方式集成了TensorRT, 模型[地址](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models)。
+- 在测试中，PaddlePaddle使用子图优化的方式集成了TensorRT, 模型[地址](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/image_classification/models)。
 - Pytorch使用了原生的实现, 模型[地址1](https://github.com/pytorch/vision/tree/master/torchvision/models)、[地址2](https://github.com/marvis/pytorch-mobilenet)。
 - 对TensorFlow测试包括了对TF的原生的测试，和对TF—TRT的测试，**对TF—TRT的测试并没有达到预期的效果，后期会对其进行补充**， 模型[地址](https://github.com/tensorflow/models)。

--- a/doc/fluid/advanced_usage/design_idea/fluid_design_idea.md
+++ b/doc/fluid/advanced_usage/design_idea/fluid_design_idea.md
@@ -95,7 +95,7 @@ prob = ie()
 ```
 ### BlockDesc and ProgramDesc
-用户描述的block与program信息在Fluid中以[protobuf](https://en.wikipedia.org/wiki/Protocol_Buffers) 格式保存，所有的`protobub`信息被定义在`framework.proto`中，在Fluid中被称为BlockDesc和ProgramDesc。ProgramDesc和BlockDesc的概念类似于一个[抽象语法树](https://en.wikipedia.org/wiki/Abstract_syntax_tree)。
+用户描述的block与program信息在Fluid中以[protobuf](https://en.wikipedia.org/wiki/Protocol_Buffers) 格式保存，所有的`protobuf`信息被定义在`framework.proto`中，在Fluid中被称为BlockDesc和ProgramDesc。ProgramDesc和BlockDesc的概念类似于一个[抽象语法树](https://en.wikipedia.org/wiki/Abstract_syntax_tree)。
 `BlockDesc`中包含本地变量的定义`vars`，和一系列的operator`ops`：
@@ -359,5 +359,5 @@ Fluid使用Executor.run来运行一段Program。
       [6.099215 ]], dtype=float32), array([1.6935859], dtype=float32)]
 ```
-至此您已经了解了Fluid 内部的执行流程的核心概念，更多框架使用细节请参考[使用指南](../../user_guides/index.html)相关内容，[模型库](../../user_guides/models/index_cn.html
+至此您已经了解了Fluid 内部的执行流程的核心概念，更多框架使用细节请参考[使用指南](../../user_guides/index_cn.html)相关内容，[模型库](../../user_guides/models/index_cn.html
 )中也为您提供了丰富的模型示例以供参考。
--- a/doc/fluid/advanced_usage/development/contribute_to_paddle/submit_pr_guide.md
+++ b/doc/fluid/advanced_usage/development/contribute_to_paddle/submit_pr_guide.md
-# Github提交PR指南
+# 提交PR注意事项
 ## 建立 Issue 并完成 Pull Request

--- a/doc/fluid/advanced_usage/development/new_op/index_cn.rst
+++ b/doc/fluid/advanced_usage/development/new_op/index_cn.rst
 #############
-新增Operator
+新增OP
 #############
 本部分将指导您如何新增Operator，也包括一些必要的注意事项

--- a/doc/fluid/advanced_usage/development/new_op/new_op.md
+++ b/doc/fluid/advanced_usage/development/new_op/new_op.md
-# 如何写新的op
+# 如何写新的OP
 ## 概念简介

--- a/doc/fluid/advanced_usage/development/new_op/op_notes.md
+++ b/doc/fluid/advanced_usage/development/new_op/op_notes.md
-# op相关注意事项
+# OP相关注意事项
 ## Fluid中Op的构建逻辑
 ### 1.Fluid中Op的构建逻辑

--- a/doc/fluid/advanced_usage/development/new_op/op_notes_en.md
+++ b/doc/fluid/advanced_usage/development/new_op/op_notes_en.md
@@ -11,7 +11,7 @@ The Fluid framework is designed to run on a variety of devices and third-party l
 Operator inheritance diagram:
 ![op_inheritance_relation_diagram](../../pics/op_inheritance_relation_diagram.png)
-For further information, please refer to: [multi_devices](https://github.com/PaddlePaddle/FluidDoc/tree/develop/doc/fluid/design/multi_devices) , [scope](https://github.com/PaddlePaddle/FluidDoc/Blob/develop/doc/fluid/design/concepts/scope.md) , [Developer's_Guide_to_Paddle_Fluid](https://github.com/PaddlePaddle/FluidDoc/blob/release/1.2/doc/fluid/getstarted/Developer's_Guide_to_Paddle_Fluid.md)
+For further information, please refer to: [multi_devices](https://github.com/PaddlePaddle/FluidDoc/tree/develop/doc/fluid/design/multi_devices) , [scope](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/concepts/scope.md) , [Developer's_Guide_to_Paddle_Fluid](https://github.com/PaddlePaddle/FluidDoc/blob/release/1.2/doc/fluid/getstarted/Developer's_Guide_to_Paddle_Fluid.md)
 ### 2.Op's registration logic
 The registration entries for each Operator include:

--- a/doc/fluid/advanced_usage/index.rst
+++ b/doc/fluid/advanced_usage/index.rst
@@ -2,18 +2,20 @@
 进阶使用
 ########
-..  todo::
 如果您非常熟悉 Fluid，期望获得更高效的模型或者定义自己的Operator，请阅读：
    - `Fluid 设计思想 <../advanced_usage/design_idea/fluid_design_idea.html>`_：介绍 Fluid 底层的设计思想，帮助您更好的理解框架运作过程
    - `预测部署 <../advanced_usage/deploy/index_cn.html>`_ ：介绍如何应用训练好的模型进行预测
-	- `新增operator <../advanced_usage/development/new_op/index_cn.html>`_ ：介绍新增operator的方法及注意事项
+    - `新增OP <../advanced_usage/development/new_op/index_cn.html>`_ ：介绍新增operator的方法及注意事项
    - `性能调优 <../advanced_usage/development/profiling/index_cn.html>`_ ：介绍 Fluid 使用过程中的调优方法
+    - `最佳实践 <../advanced_usage/best_practice/index_cn.html>`_
+    - `模型压缩工具库 <../advanced_usage/paddle_slim/paddle_slim.html>`_
 非常欢迎您为我们的开源社区做出贡献，关于如何贡献您的代码或文档，请阅读：
 	- `如何贡献代码 <../advanced_usage/development/contribute_to_paddle/index_cn.html>`_：介绍如何向 PaddlePaddle 开源社区贡献代码
@@ -27,7 +29,7 @@
    deploy/index_cn.rst
    development/new_op/index_cn.rst
    development/profiling/index_cn.rst
+    best_practice/index_cn.rst
+    paddle_slim/paddle_slim.md
    development/contribute_to_paddle/index_cn.rst
    development/write_docs_cn.md
-    best_practice/dist_training_gpu.rst
-    paddle_slim/paddle_slim.md 
--- a/doc/fluid/advanced_usage/index_en.rst
+++ b/doc/fluid/advanced_usage/index_en.rst
@@ -29,3 +29,4 @@ We gladly encourage your contributions of codes and documentation to our communi
    development/profiling/index_en.rst
    development/contribute_to_paddle/index_en.rst
    development/write_docs_en.md
+    best_practice/index_en.rst
--- a/doc/fluid/api_cn/data/data_reader_cn.rst
+++ b/doc/fluid/api_cn/data/data_reader_cn.rst
@@ -133,7 +133,7 @@ Data Reader Interface
 	iterable = data_reader()
-从iterable生成的元素应该是单个数据条目，而不是mini batch。数据输入可以是单个项目，也可以是项目的元组，但应为 `支持的类型 <http://www.paddlepaddle.org/doc/ui/data_provider/pydataprovider2.html?highlight=dense_vector#input-types>`_ （如, numpy 1d array of float32, int, list of int）
+从iterable生成的元素应该是单个数据条目，而不是mini batch。数据输入可以是单个项目，也可以是项目的元组，但应为 `支持的类型 <../../user_guides/howto/prepare_data/feeding_data.html#fluid>`_ （如, numpy 1d array of float32, int, list of int）
 单项目数据读取器创建者的示例实现：
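The reader interface described above can be illustrated without Paddle, because a data reader is just a zero-argument Python callable that returns an iterable of single samples. The sketch below is an assumption-labeled example:

- `random_sample_reader_creator` is a hypothetical creator that yields `(features, label)` tuples.
- The feature length, label range and seed are illustrative only.

```python
import random


def random_sample_reader_creator(count, feature_dim=4, seed=0):
    """Hypothetical reader creator: returns a data reader (a zero-argument callable)."""

    def reader():
        # Re-seed on every call so each pass over the data yields the same samples.
        rng = random.Random(seed)
        for _ in range(count):
            features = [rng.random() for _ in range(feature_dim)]  # float features
            label = rng.randint(0, 9)                               # int label
            # Each yielded element is ONE sample (a tuple), not a mini-batch.
            yield features, label

    return reader


data_reader = random_sample_reader_creator(count=3)
iterable = data_reader()
for sample in iterable:
    print(sample)
```

Calling `data_reader()` builds a fresh generator each time, so the same reader can be iterated once per epoch. Grouping samples into mini-batches is a separate step.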

--- a/doc/fluid/api_cn/fluid_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn.rst
@@ -91,7 +91,7 @@ download_data是用于分布式训练的默认下载方法，用户可不使用
  - **local_path** （str） - 下载数据路径
  - **fs_default_name** （str） - 文件系统服务器地址
  - **ugi** （str） -  hadoop ugi
-  - **file_cn** （int） - 用户可以指定用于调试的文件号
+  - **file_cnt** （int） - 用户可以指定用于调试的文件号
  - **hadoop_home** （str） -  hadoop home path
  - **process_num** （int） - 下载进程号
@@ -188,8 +188,15 @@ str类型。在 ``ParallelExecutor`` 中，存在两种减少策略（reduce str
 .. py:attribute:: remove_unnecessary_lock
-BOOL类型。如果设置为True, GPU操作中的一些锁将被释放，ParallelExecutor将运行得更快，默认为 False。
+BOOL类型。如果设置为True, GPU操作中的一些锁将被释放，ParallelExecutor将运行得更快，默认为 True。
+.. py:attribute:: sync_batch_norm
+类型为bool，sync_batch_norm表示是否使用同步的批正则化，即在训练阶段通过多个设备同步均值和方差。
+当前的实现不支持FP16训练，也不支持CPU。同步批正则化仅在单台机器内进行，不适用于多台机器。
+默认为 False。
 .. _cn_api_fluid_CompiledProgram:
@@ -197,9 +204,9 @@ BOOL类型。如果设置为True, GPU操作中的一些锁将被释放，Paralle
 CompiledProgram
 -------------------------------
-.. py:class:: paddle.fluid.CompiledProgram(program)
+.. py:class:: paddle.fluid.CompiledProgram(program_or_graph)
-编译一个接着用来执行的Program。
+编译成一个用来执行的Graph。
 1. 首先使用layers(网络层)创建程序。
 2. （可选）可使用CompiledProgram来在运行之前优化程序。
@@ -226,9 +233,9 @@ CompiledProgram用于转换程序以进行各种优化。例如，
                                     fetch_list=[loss.name])
 参数：
-  - **program** : 一个Program对象，承载着用户定义的模型计算逻辑
+  - **program_or_graph** (Graph|Program): 如果它是Program，那么它将首先被转换（lower）为一个graph，以便进一步优化。如果它是一个graph（之前可能已被优化过），它将直接用于进一步的优化。注意：只有在使用 with_data_parallel 选项编译时才支持graph。
-.. py:method:: with_data_parallel(loss_name=None, build_strategy=None, exec_strategy=None, share_vars_from=None)
+.. py:method:: with_data_parallel(loss_name=None, build_strategy=None, exec_strategy=None, share_vars_from=None, places=None)
 配置Program使其以数据并行方式运行。
@@ -237,6 +244,7 @@ CompiledProgram用于转换程序以进行各种优化。例如，
  - **build_strategy** （BuildStrategy） -  build_strategy用于构建图，因此它可以在具有优化拓扑的多个设备/核上运行。 有关更多信息，请参阅  ``fluid.BuildStrategy`` 。 默认None。
  - **exec_strategy** （ExecutionStrategy） -  exec_strategy用于选择执行图的方式，例如使用多少线程，每次清理临时变量之前进行的迭代次数。 有关更多信息，请参阅 ``fluid.ExecutionStrategy`` 。 默认None。
  - **share_vars_from** （CompiledProgram） - 如果有，此CompiledProgram将共享来自share_vars_from的变量。 share_vars_from指定的Program必须由此CompiledProgram之前的Executor运行，以便vars准备就绪。
+  - **places** （list(CUDAPlace)|list(CPUPlace)|None） - 如果提供，则仅在给定位置编译程序。否则，编译时使用的位置由Executor确定，具体由环境变量控制：如果使用GPU，则由 FLAGS_selected_gpus 或 CUDA_VISIBLE_DEVICES 决定使用哪些设备；如果使用CPU，则由 CPU_NUM 决定使用的CPU数目。例如，如果要在GPU 0和GPU 1上运行，请设置places=[fluid.CUDAPlace(0), fluid.CUDAPlace(1)]。如果要在2个CPU核心上运行，请设置places=[fluid.CPUPlace()]*2。
 返回: self
@@ -394,6 +402,7 @@ cuda_places
 创建 ``fluid.CUDAPlace`` 对象列表。
 如果 ``device_ids`` 为None，则首先检查 ``FLAGS_selected_gpus`` 的环境变量。如果 ``FLAGS_selected_gpus=0,1,2`` ，则返回的列表将为[fluid.CUDAPlace(0), fluid.CUDAPlace(1), fluid.CUDAPlace(2)]。如果未设置标志 ``FLAGS_selected_gpus`` ，则将返回所有可见的GPU places。
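The device-selection rule for ``cuda_places`` described above can be restated as a small pure-Python function. This is only a sketch, with two assumptions:

- `resolve_gpu_ids` is a hypothetical helper, not part of Paddle.
- The number of visible GPUs is passed in as an argument instead of being queried from the CUDA runtime.

```python
import os


def resolve_gpu_ids(device_ids=None, visible_gpu_count=0, environ=None):
    """Hypothetical helper: GPU ids that would each be wrapped in fluid.CUDAPlace."""
    if device_ids is not None:
        # Explicit ids take precedence over the environment.
        return list(device_ids)
    env = os.environ if environ is None else environ
    flag = env.get("FLAGS_selected_gpus")
    if flag:
        # "0,1,2" -> [0, 1, 2]
        return [int(part) for part in flag.split(",") if part.strip()]
    # Flag not set: fall back to every visible GPU.
    return list(range(visible_gpu_count))


print(resolve_gpu_ids(environ={"FLAGS_selected_gpus": "0,1,2"}))  # [0, 1, 2]
print(resolve_gpu_ids(visible_gpu_count=2, environ={}))            # [0, 1]
print(resolve_gpu_ids(device_ids=[3], environ={}))                 # [3]
```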
@@ -437,6 +446,7 @@ CUDAPlace是一个设备描述符，它代表一个GPU，并且每个CUDAPlace
 .. _cn_api_fluid_DataFeedDesc:
 DataFeedDesc
@@ -669,7 +679,7 @@ reader通常返回一个minibatch条目列表。在列表中每一条目都是
 参数：
        - **reader** (fun) – 该参数是一个可以生成数据的函数
        - **multi_devices** (bool) – bool型，指明是否使用多个设备
-        - **num_places** (int) – 如果 ``multi_devices`` 为 ``True`` , 可以使用此参数来设置GPU数目。如果 ``num_places`` 为 ``None`` ，该函数默认使用当前训练机所有GPU设备。默认为None。
+        - **num_places** (int) – 如果 ``multi_devices`` 为 ``True`` , 可以使用此参数来设置GPU数目。如果 ``num_places`` 为 ``None`` ，该函数默认使用当前训练机所有GPU设备。默认为None。
        - **drop_last** (bool) – 如果最后一个batch的大小比 ``batch_size`` 要小，则可使用该参数来指明是否选择丢弃最后一个batch数据。 默认为 ``True`` 
 返回：转换结果
@@ -979,7 +989,7 @@ Executor
-执行引擎（Executor）使用python脚本驱动，仅支持在单GPU环境下运行。多卡环境下请参考 ``ParallelExecutor`` 。
+执行引擎（Executor）使用python脚本驱动，支持在单/多GPU、单/多CPU环境下运行。
 Python Executor可以接收传入的program,并根据feed map(输入映射表)和fetch_list(结果获取表)
 向program中添加feed operators(数据输入算子)和fetch operators（结果获取算子)。
 feed map为该program提供输入数据。fetch_list提供program训练结束后用户预期的变量（或识别类场景中的命名）。
@@ -990,7 +1000,6 @@ Executor将全局变量存储到全局作用域中，并为临时变量创建局
 当每一mini-batch上的前向/反向运算完成后，局部作用域的内容将被废弃，
 但全局作用域中的变量将在Executor的不同执行过程中一直存在。
-program中所有的算子会按顺序执行。
 **示例代码**
@@ -1023,13 +1032,12 @@ program中所有的算子会按顺序执行。
-提示：你可以用 ``Executor`` 来调试基于并行GPU实现的复杂网络，他们有完全一样的参数也会产生相同的结果。
 .. py:method:: close()
-关闭这个执行器(Executor)。调用这个方法后不可以再使用这个执行器。 对于分布式训练, 该函数会释放在PServers上涉及到目前训练器的资源。
+关闭这个执行器(Executor)。
+调用这个方法后不可以再使用这个执行器。 对于分布式训练, 该函数会释放在PServers上和目前Trainer有关联的资源。
 **示例代码**
@@ -1054,12 +1062,12 @@ feed map为该program提供输入数据。fetch_list提供program训练结束后
 参数：  
 	- **program** (Program|CompiledProgram) – 需要执行的program,如果没有给定那么默认使用default_main_program (未编译的)
 	- **feed** (dict) – 前向输入的变量，数据,词典dict类型, 例如 {“image”: ImageData, “label”: LabelData}
-	- **fetch_list** (list) – 用户想得到的变量或者命名的列表, run会根据这个列表给与结果
+	- **fetch_list** (list) – 用户想得到的变量或者命名的列表, 该方法会根据这个列表给出结果
 	- **feed_var_name** (str) – 前向算子(feed operator)变量的名称
 	- **fetch_var_name** (str) – 结果获取算子(fetch operator)的输出变量名称
 	- **scope** (Scope) – 执行这个program的域，用户可以指定不同的域。缺省为全局域
 	- **return_numpy** (bool) – 如果为True,则将结果张量（fetched tensor）转化为numpy
-	- **use_program_cache** (bool) – 当program较上次比没有改动则将其置为True
+	- **use_program_cache** (bool) – 是否在多个批次之间复用缓存的program设置。设置为True时，只有当（1）程序没有用数据并行编译，并且（2）program、feed变量名和fetch_list变量名与上一步相比都没有更改时，运行速度才会更快。
 返回:	根据fetch_list来获取结果

--- a/doc/fluid/api_cn/index_cn.rst
+++ b/doc/fluid/api_cn/index_cn.rst
@@ -5,12 +5,13 @@ API
 ..  toctree::
    :maxdepth: 1
-    ../api_guides/index.rst
+    ../api_guides/index_cn.rst
    fluid_cn.rst
    average_cn.rst
    backward_cn.rst
    clip_cn.rst
    data_feeder_cn.rst
+    dataset_cn.rst
    executor_cn.rst
    initializer_cn.rst
    io_cn.rst
@@ -21,6 +22,5 @@ API
    profiler_cn.rst
    regularizer_cn.rst
    transpiler_cn.rst
-    dataset_cn.rst
    data/dataset_cn.rst
    data/data_reader_cn.rst
--- a/doc/fluid/api_cn/io_cn.rst
+++ b/doc/fluid/api_cn/io_cn.rst
@@ -289,8 +289,6 @@ PyReader
  - **reader** (generator)  – 返回LoDTensor类型的批处理数据的Python生成器
  - **places** (None|list(CUDAPlace)|list(CPUPlace)) –  位置列表。当PyReader可迭代时必须被提供
 .. _cn_api_fluid_io_save_inference_model:
 save_inference_model
@@ -313,7 +311,9 @@ save_inference_model
  - **params_filename** (str|None) – 保存所有相关参数的文件名称。如果设置为None，则参数将保存在单独的文件中。
  - **export_for_deployment** (bool) – 如果为真，Program将被修改为只支持直接预测部署的Program。否则，将存储更多的信息，方便优化和再训练。目前只支持True。
-返回: None
+返回: 获取的变量名列表（target_var_name_list）
+返回类型：list
 抛出异常：
 - ``ValueError`` – 如果 ``feed_var_names`` 不是字符串列表
@@ -406,8 +406,9 @@ save_persistables
    exe = fluid.Executor(fluid.CPUPlace())
    param_path = "./my_paddle_model"
    prog = fluid.default_main_program()
+    # `prog` 可以是由用户自定义的program
    fluid.io.save_persistables(executor=exe, dirname=param_path,
-                               main_program=None)
+                               main_program=prog)

--- a/doc/fluid/api_cn/layers_cn.rst
+++ b/doc/fluid/api_cn/layers_cn.rst
@@ -205,13 +205,13 @@ memory用于缓存分段数据。memory的初始值可以是零，也可以是
 .. note::
    目前不支持在DynamicRNN中任何层上配置 is_sparse = True
-.. py:method:: step_input(x)
+.. py:method:: step_input(x, level=0)
    将序列标记为动态RNN输入。
 参数:
    	- **x** (Variable) - 输入序列
+      - **level** (int) - 用于拆分步骤的LoD层级，默认值为0
 返回:当前的输入序列中的timestep。
@@ -1087,8 +1087,7 @@ py_reader
 	    except fluid.core.EOFException:
 		reader.reset()
+    fluid.io.save_inference_model(dirname='./model', feeded_var_names=[img.name, label.name], target_vars=[loss], executor=fluid.Executor(fluid.CUDAPlace(0)))
 2.训练和测试应使用不同的名称创建两个不同的py_reader，例如：
@@ -1483,7 +1482,7 @@ add_position_encoding
 affine_channel
 -------------------------------
-.. py:function:: paddle.fluid.layers.affine_channel(x, scale=None, bias=None, data_layout='NCHW', name=None)
+.. py:function:: paddle.fluid.layers.affine_channel(x, scale=None, bias=None, data_layout='NCHW', name=None, act=None)
 对输入的每个 channel 应用单独的仿射变换。用于将空间批处理范数替换为其等价的固定变换。
@@ -1495,6 +1494,7 @@ affine_channel
 	- **bias** (Variable):形状为(C)的一维输入，第C个元素是输入的第C个通道的仿射变换的偏置。
 	- **data_layout** (string, default NCHW): NCHW 或 NHWC，如果输入是一个2D张量，可以忽略该参数
 	- **name** (str, default None): 此层的名称
+        - **act** (str, default None): 应用于该层输出的激活函数
 返回： out (Variable): 与x具有相同形状和数据布局的张量。
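For ``data_layout='NCHW'``, the per-channel affine transform amounts to ``out[n][c][h][w] = x[n][c][h][w] * scale[c] + bias[c]``, optionally followed by ``act``. The sketch below spells this out on nested Python lists. Two points are assumptions:

- `affine_channel_nchw` is a hypothetical name.
- ``act`` is modeled as a Python callable rather than the activation-name string the layer accepts.

```python
def affine_channel_nchw(x, scale, bias, act=None):
    """Pure-Python sketch of affine_channel for NCHW data.

    x: nested lists with shape [N][C][H][W]
    scale, bias: lists of length C (one affine transform per channel)
    act: optional callable applied element-wise
    """
    out = []
    for sample in x:
        out_sample = []
        for c, plane in enumerate(sample):
            # Channel c gets its own transform: y = x * scale[c] + bias[c]
            rows = [[v * scale[c] + bias[c] for v in row] for row in plane]
            if act is not None:
                rows = [[act(v) for v in row] for row in rows]
            out_sample.append(rows)
        out.append(out_sample)
    return out


x = [[[[1.0, 2.0]], [[3.0, 4.0]]]]  # shape (1, 2, 1, 2)
print(affine_channel_nchw(x, scale=[2.0, 0.5], bias=[1.0, -1.0]))
# [[[[3.0, 5.0]], [[0.5, 1.0]]]]
```

The output keeps the same shape and layout as ``x``, as the return description states.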
@@ -1680,11 +1680,11 @@ batch_norm
 参数：
-    - **input** (Variable) - 输入变量，为LoDTensor
+    - **input** (Variable) - 输入变量，其维度（rank）可以为 2、3、4 或 5
    - **act** （string，默认None）- 激活函数类型，linear|relu|prelu|...
-    - **is_test** （bool,默认False） - 标志位，是否用于测试或训练
+    - **is_test** （bool,默认False） - 指示它是否在测试阶段。
-    - **momentum** （float，默认0.9）- （暂无说明，待更新）
+    - **momentum** （float，默认0.9）- 此值用于计算 moving_mean 和 moving_var。更新公式为： :math:`moving\_mean = moving\_mean * momentum + new\_mean * (1. - momentum)` 和 :math:`moving\_var = moving\_var * momentum + new\_var * (1. - momentum)` 。默认值为0.9。
-    - **epsilon** （float，默认1e-05）- （暂无说明，待更新）
+    - **epsilon** （float，默认1e-05）- 加在分母上为了数值稳定的值。默认值为1e-5。
    - **param_attr** （ParamAttr|None） - batch_norm参数范围的属性，如果设为None或者是ParamAttr的一个属性，batch_norm创建ParamAttr为param_attr。如果没有设置param_attr的初始化函数，参数初始化为Xavier。默认：None
    - **bias_attr** （ParamAttr|None） - batch_norm bias参数的属性，如果设为None或者是ParamAttr的一个属性，batch_norm创建ParamAttr为bias_attr。如果没有设置bias_attr的初始化函数，参数初始化为0。默认：None
    - **data_layout** （string,默认NCHW) - NCHW|NHWC
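The ``momentum`` update rule above can be checked by hand. The sketch applies it to scalar statistics for two hypothetical batches whose mean and variance are 1.0 and 2.0. The real batch_norm keeps one such pair per channel.

```python
def update_moving_stats(moving_mean, moving_var, new_mean, new_var, momentum=0.9):
    """Apply the batch_norm moving-average update to one (mean, var) pair."""
    # moving_mean = moving_mean * momentum + new_mean * (1. - momentum)
    mean = moving_mean * momentum + new_mean * (1.0 - momentum)
    # moving_var = moving_var * momentum + new_var * (1. - momentum)
    var = moving_var * momentum + new_var * (1.0 - momentum)
    return mean, var


# Hypothetical starting statistics and per-batch statistics.
mean, var = 0.0, 1.0
for batch_mean, batch_var in [(1.0, 2.0), (1.0, 2.0)]:
    mean, var = update_moving_stats(mean, var, batch_mean, batch_var)

print(round(mean, 4), round(var, 4))  # 0.19 1.19
```

With ``momentum=0.9``, each new batch contributes only 10% to the running statistics, so they move slowly toward the batch values.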
@@ -1741,7 +1741,7 @@ beam_search
 参数:
  - **pre_ids** （Variable） -  LodTensor变量，它是上一步 ``beam_search`` 的输出。在第一步中。它应该是LodTensor，shape为 :math:`(batch\_size，1)` ， :math:`lod [[0,1，...，batch\_size]，[0,1，...，batch\_size]]`
  - **pre_scores** （Variable） -  LodTensor变量，它是上一步中beam_search的输出
-  - **ids** （Variable） - 包含候选ID的LodTensor变量。shpae为 :math:`（batch\_size×beam\_ize，K）` ，其中 ``K`` 应该是 ``beam_size``
+  - **ids** （Variable） - 包含候选ID的LodTensor变量。shape为 :math:`(batch\_size \times beam\_size, K)` ，其中 ``K`` 应该是 ``beam_size``
  - **scores** （Variable） - 与 ``ids`` 及其shape对应的累积分数的LodTensor变量, 与 ``ids`` 的shape相同。
  - **beam_size** （int） - 束搜索中的束宽度。
  - **end_id** （int） - 结束标记的id。
@@ -2763,7 +2763,7 @@ ctc_greedy_decoder
 data_norm
 -------------------------------
-.. py:function:: paddle.fluid.layers.data_norm(input, act=None, epsilon=1e-05, param_attr=None, data_layout='NCHW', in_place=False, use_mkldnn=False, name=None, moving_mean_name=None, moving_variance_name=None, do_model_average_for_mean_and_var=False)
+.. py:function:: paddle.fluid.layers.data_norm(input, act=None, epsilon=1e-05, param_attr=None, data_layout='NCHW', in_place=False, name=None, moving_mean_name=None, moving_variance_name=None, do_model_average_for_mean_and_var=False)
 **数据正则化层**
@@ -2790,7 +2790,6 @@ data_norm
  - **param_attr** （ParamAttr） - 参数比例的参数属性。
  - **data_layout** （string，默认NCHW） -  NCHW | NHWC
  - **in_place** （bool，默认值False） - 使data_norm的输入和输出复用同一块内存。
-  - **use_mkldnn** （bool，默认为false） -  是否使用mkldnn
  - **name** （string，默认None） - 此层的名称（可选）。 如果设置为None，则将自动命名该层。
  - **moving_mean_name** （string，Default None） - 存储全局Mean的moving_mean的名称。
  - **moving_variance_name** （string，默认None） - 存储全局Variance的moving_variance的名称。
@@ -2884,7 +2883,7 @@ dropout op可以从Program中删除，提高执行效率。
         - train: out = input * mask
-         - inference: out = input * dropout_prob 
+         - inference: out = input * (1.0 - dropout_prob) 
         (mask是一个张量，维度和输入维度相同，值为0或1，值为0的比例即为 ``dropout_prob`` )
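The inference formula ``out = input * (1.0 - dropout_prob)`` matches the expected value of the training-time output. Each mask entry is 1 with probability ``1 - dropout_prob``. The Monte Carlo sketch below checks this in pure Python. The seed, the sample count and ``dropout_prob = 0.3`` are arbitrary illustrative choices.

```python
import random


def dropout_train(x, dropout_prob, rng):
    # Training: out = input * mask, where each mask entry is 0 with probability dropout_prob.
    mask = [0.0 if rng.random() < dropout_prob else 1.0 for _ in x]
    return [v * m for v, m in zip(x, mask)]


def dropout_infer(x, dropout_prob):
    # Inference: out = input * (1.0 - dropout_prob), no randomness.
    return [v * (1.0 - dropout_prob) for v in x]


rng = random.Random(0)
x = [1.0] * 100000
p = 0.3
train_mean = sum(dropout_train(x, p, rng)) / len(x)
infer_mean = sum(dropout_infer(x, p)) / len(x)
print(train_mean, infer_mean)  # both close to 0.7
```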
@@ -3111,7 +3110,7 @@ W 代表了权重矩阵(weight matrix)，例如 :math:`W_{xi}` 是从输入门
 dynamic_lstmp
 -------------------------------
-.. py:function:: paddle.fluid.layers.dynamic_lstmp(input, size, proj_size, param_attr=None, bias_attr=None, use_peepholes=True, is_reverse=False, gate_activation='sigmoid', cell_activation='tanh', candidate_activation='tanh', proj_activation='tanh', dtype='float32', name=None)
+.. py:function:: paddle.fluid.layers.dynamic_lstmp(input, size, proj_size, param_attr=None, bias_attr=None, use_peepholes=True, is_reverse=False, gate_activation='sigmoid', cell_activation='tanh', candidate_activation='tanh', proj_activation='tanh', dtype='float32', name=None, h_0=None, c_0=None, cell_clip=None, proj_clip=None)
 动态LSTMP层(Dynamic LSTMP Layer)
@@ -3180,6 +3179,10 @@ LSTMP层(具有循环映射的LSTM)在LSTM层后有一个分离的映射层，
    - **proj_activation** (str) - 投影输出的激活函数。Choices = [“sigmoid”，“tanh”，“relu”，“identity”]，默认“tanh”。
    - **dtype** (str) - 数据类型。Choices = [“float32”，“float64”]，默认“float32”。
    - **name** (str|None) - 该层名称（可选）。若设为None，则自动为该层命名。
+    - **h_0** (Variable) - 初始隐藏状态，是可选输入，默认为0。这是一个形状为(N x D)的张量，其中N是批大小，D是投影大小。
+    - **c_0** (Variable) - 初始cell状态，是可选输入，默认为0。这是一个形状为(N x D)的张量，其中N是批大小。h_0和c_0可以为空，但只能同时为空。
+    - **cell_clip** (float) - 如果提供该参数，则在单元输出激活之前，单元状态将被此值剪裁。 
+    - **proj_clip** (float) - 如果 proj_size > 0 并且提供了 proj_clip，那么投影值将被逐元素裁剪到 [-proj_clip, proj_clip] 范围内。
 返回：含有两个输出变量的元组，隐藏状态（hidden state）的投影和LSTMP的cell状态。投影的shape为（T*P），cell state的shape为（T*D），两者的LoD和输入相同。
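The ``proj_clip`` behaviour described above is an element-wise clamp to ``[-proj_clip, proj_clip]``. The minimal sketch below uses a plain list and a hypothetical helper name. It does not model the LSTMP recurrence itself. It also leaves ``cell_clip`` out, because the parameter list above does not spell out its exact range.

```python
def clip_projection(proj, proj_clip=None):
    """Element-wise clip of a projection vector to [-proj_clip, proj_clip]."""
    if proj_clip is None:
        # proj_clip not provided: values pass through unchanged.
        return list(proj)
    return [max(-proj_clip, min(proj_clip, v)) for v in proj]


print(clip_projection([-3.0, 0.5, 2.0], proj_clip=1.0))  # [-1.0, 0.5, 1.0]
```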
@@ -3793,9 +3796,17 @@ fc
 **全连接层**
-该函数在神经网络中建立一个全连接层。 它可以同时将多个tensor（ ``input`` 可使用多个tensor组成的一个list，详见参数说明）作为自己的输入，并为每个输入的tensor创立一个变量，称为“权”（weights），等价于一个从每个输入单元到每个输出单元的全连接权矩阵。FC层用每个tensor和它对应的权相乘得到输出tensor。如果有多个输入tensor，那么多个乘法运算将会加在一起得出最终结果。如果 ``bias_attr`` 非空，则会新创建一个偏向变量（bias variable），并把它加入到输出结果的运算中。最后，如果 ``act`` 非空，它也会加入最终输出的计算中。
+该函数在神经网络中建立一个全连接层。它可以将一个或多个tensor（ ``input`` 可以是一个list或者Variable，详见参数说明）作为自己的输入，并为每个输入的tensor创建一个变量，称为“权”（weights），等价于一个从每个输入单元到每个输出单元的全连接权矩阵。FC层用每个tensor和它对应的权相乘，得到形状为[M, size]的输出tensor，其中M是批大小。如果有多个输入tensor，这些形状为[M, size]的输出tensor会被逐元素相加。如果 ``bias_attr`` 非空，则会新创建一个偏置变量（bias variable），并把它加到输出结果上。最后，如果 ``act`` 非空，则会对结果应用该激活函数，得到最终输出。
+当输入为单个张量：
+.. math::
+        Out = Act({XW + b})
-这个过程可以通过如下公式表现：
+当输入为多个张量：
 .. math::
@@ -3803,13 +3814,29 @@ fc
 上述等式中：
-  - :math:`N` ：输入tensor的数目
+  - :math:`N` ：输入的数目。如果输入是变量列表，N等于len(input)
-  - :math:`X_i` : 输入的tensor
+  - :math:`X_i` : 第i个输入的tensor
-  - :math:`W` ：该层创立的权
+  - :math:`W_i` ：对应第i个输入张量的第i个权重矩阵
  - :math:`b` ：该层创立的bias参数
  - :math:`Act` : activation function(激励函数)
  - :math:`Out` : 输出tensor
+::
+            Given:
+                data_1.data = [[[0.1, 0.2],
+                               [0.3, 0.4]]]
+                data_1.shape = (1, 2, 2) # 1 is batch_size
+                data_2.data = [[[0.1, 0.2, 0.3]]]
+                data_2.shape = (1, 1, 3)
+                out = fluid.layers.fc(input=[data_1, data_2], size=2)
+            Then:
+                out.data = [[0.18669507, 0.1893476]]
+                out.shape = (1, 2)
 参数:
  - **input** (Variable|list of Variable) – 该层的输入tensor(s)（张量），其维度至少是2
@@ -3832,9 +3859,15 @@ fc
 ..  code-block:: python
+      # 当输入为单个张量时
       data = fluid.layers.data(name="data", shape=[32, 32], dtype="float32")
       fc = fluid.layers.fc(input=data, size=1000, act="tanh")
+      # 当输入为多个张量时
+      data_1 = fluid.layers.data(name="data_1", shape=[32, 32], dtype="float32")
+      data_2 = fluid.layers.data(name="data_2", shape=[24, 36], dtype="float32")
+      fc = fluid.layers.fc(input=[data_1, data_2], size=1000, act="tanh")
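The two ``fc`` formulas can be reproduced with plain Python lists:

- Single input: ``Out = Act(XW + b)``.
- Multiple inputs: the ``X_i W_i`` terms are summed before ``b`` is added.

The sketch reuses the shapes from the ``data_1``/``data_2`` illustration. It flattens every dimension after the batch dimension; that flattening is an assumption here, since ``num_flatten_dims`` is not covered in this excerpt. The weights are hypothetical fixed values rather than learned parameters, so the result will not match the ``out.data`` numbers in the illustration.

```python
import math


def flatten_rows(x):
    """Flatten all dimensions after the batch dimension: [M, ...] -> [M, K]."""
    def flat(v):
        if isinstance(v, list):
            return [e for item in v for e in flat(item)]
        return [v]
    return [flat(sample) for sample in x]


def matmul(a, b):
    """[M, K] x [K, size] -> [M, size] on nested lists."""
    return [[sum(a_k * b[k][j] for k, a_k in enumerate(row)) for j in range(len(b[0]))]
            for row in a]


def fc_forward(inputs, weights, bias=None, act=None):
    """Out = Act(sum_i X_i W_i + b); each X_i W_i has shape [M, size]."""
    out = None
    for x, w in zip(inputs, weights):
        y = matmul(flatten_rows(x), w)
        # Multiple inputs: the [M, size] partial results are added element-wise.
        out = y if out is None else [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(out, y)]
    if bias is not None:
        out = [[v + bv for v, bv in zip(row, bias)] for row in out]
    if act is not None:
        out = [[act(v) for v in row] for row in out]
    return out


data_1 = [[[0.1, 0.2], [0.3, 0.4]]]   # shape (1, 2, 2) -> flattened to (1, 4)
data_2 = [[[0.1, 0.2, 0.3]]]          # shape (1, 1, 3) -> flattened to (1, 3)
w_1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]  # hypothetical [4, 2] weights
w_2 = [[1.0, 1.0], [0.0, 0.0], [0.0, 0.0]]              # hypothetical [3, 2] weights

out = fc_forward([data_1, data_2], [w_1, w_2], bias=[0.0, 0.0])
out_tanh = fc_forward([data_1, data_2], [w_1, w_2], bias=[0.0, 0.0], act=math.tanh)
print(out)  # shape [1, 2], approximately [[0.5, 0.7]]
```

Whatever the input shapes, every partial product has shape ``[M, size]``, which is why results from inputs of different widths can be summed.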
@@ -6273,7 +6306,6 @@ pad_constant_like
              [[41, 42, 43]]]]
        Y.shape = (1, 3, 1, 3)
 参数：
          - **x** （Variable）- 输入Tensor变量。
          - **y** （Variable）- 输出Tensor变量。
@@ -7222,7 +7254,7 @@ align_corners和align_mode是可选参数，插值的计算方法可以由它们
 参数:
-    - **input** (Variable) - 双线性插值的输入张量，是一个shpae为(N x C x h x w)的4d张量。
+    - **input** (Variable) - 双线性插值的输入张量，是一个shape为(N x C x h x w)的4d张量。
    - **out_shape** (Variable) - 一维张量，包含两个数。第一个数是高度，第二个数是宽度。
    - **scale** (float|None) - 用于输入高度或宽度的乘数因子。out_shape和scale至少要设置一个。out_shape的优先级高于scale。默认值:None。
    - **name** (str|None) - 输出变量名。
@@ -8566,7 +8598,6 @@ shape层。
 返回类型：    Variable
 **代码示例：**
 .. code-block:: python
@@ -9026,6 +9057,7 @@ softmax_with_cross_entropy
 参数:
  - **logits** (Variable) - 未标准化(unscaled)的log概率,一个形为 N X K 的二维张量。 N是batch大小，K是类别总数。
  - **label** (Variable) - 2-D 张量，代表了正确标注（ground truth）, 如果 ``soft_label`` 为  False，则该参数是一个形为 N X 1 的Tensor<int64> 。如果 ``soft_label`` 为 True，它是 Tensor<float/double> ，形为 N X K 。
  - **soft_label** (bool) - 是否将输入标签当作软标签。默认为False。
@@ -9356,6 +9388,7 @@ stack
        Out.dims = [1, 3, 2]
 参数:	
  - **x** (Variable|list(Variable)|tuple(Variable)) – 输入变量
  - **axis** (int|None) – 对输入进行stack运算所在的轴
@@ -9875,6 +9908,7 @@ abs
    out = |x|
 参数:
    - **x** - abs算子的输入 
    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn
@@ -9958,6 +9992,7 @@ ceil
 参数:
    - **x** - Ceil算子的输入 
    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn
@@ -9988,6 +10023,7 @@ Cosine余弦激活函数。
 参数:
    - **x** - cos算子的输入 
    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn
@@ -10048,6 +10084,7 @@ Exp激活函数(Exp指以自然常数e为底的指数运算)。
    out = e^x
 参数:
    - **x** - Exp算子的输入 
    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn
@@ -10078,6 +10115,7 @@ floor
 参数:
    - **x** - Floor算子的输入 
    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn
@@ -10150,6 +10188,7 @@ Logsigmoid激活函数。
 参数:
    - **x** - LogSigmoid算子的输入
    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn 
 返回：        LogSigmoid算子的输出
@@ -10177,6 +10216,7 @@ Reciprocal（取倒数）激活函数
    out = \frac{1}{x}
 参数:
    - **x** - reciprocal算子的输入 
    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn 
@@ -10191,9 +10231,6 @@ Reciprocal（取倒数）激活函数
 .. _cn_api_fluid_layers_round:
 round
@@ -10209,6 +10246,7 @@ Round取整激活函数。
 参数:
    - **x** - round算子的输入 
    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn 
@@ -10237,6 +10275,7 @@ sigmoid激活函数
 参数:
    - **x** - Sigmoid算子的输入 
    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn
@@ -10265,6 +10304,7 @@ sin
 参数:
    - **x** - sin算子的输入 
    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn
@@ -10382,6 +10422,7 @@ sqrt
    out = \sqrt{x}
 参数:
    - **x** - Sqrt算子的输入 
    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn
@@ -10440,6 +10481,7 @@ tanh 激活函数。
 参数:
    - **x** - Tanh算子的输入  
    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn
@@ -10468,6 +10510,7 @@ tanh_shrink激活函数。
    out = x - \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}
 参数:
    - **x** - TanhShrink算子的输入 
    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn
@@ -12385,7 +12428,7 @@ multi_box_head
 ..  code-block:: python
        mbox_locs, mbox_confs, box, var = fluid.layers.multi_box_head(
-          inputs=[conv1, conv2, conv3, conv4, conv5, conv5],
+          inputs=[conv1, conv2, conv3, conv4, conv5, conv6],
          image=images,
          num_classes=21,
          min_ratio=20,
@@ -12474,7 +12517,7 @@ PolygonBoxTransform 算子。
 参数：
    - **input** （Variable） - shape 为[batch_size，geometry_channels，height，width]的张量
-返回：与输入 shpae 相同
+返回：与输入 shape 相同
 返回类型：output（Variable）
@@ -12871,7 +12914,7 @@ yolo_box
 yolov3_loss
 -------------------------------
-.. py:function:: paddle.fluid.layers.yolov3_loss(x, gtbox, gtlabel, anchors, anchor_mask, class_num, ignore_thresh, downsample_ratio, gtscore=None, use_label_smooth=True, name=None)
+.. py:function:: paddle.fluid.layers.yolov3_loss(x, gt_box, gt_label, anchors, anchor_mask, class_num, ignore_thresh, downsample_ratio, gt_score=None, use_label_smooth=True, name=None)
 该运算通过给定的预测结果和真实框生成yolov3损失。
@@ -12922,15 +12965,15 @@ yolov3_loss
 参数：
    - **x**  (Variable) – YOLOv3损失运算的输入张量，这是一个形状为[N，C，H，W]的四维张量。H和W应该相同，第二维（C）存储框的位置信息，以及每个anchor box的置信度得分和one-hot分类
-    - **gtbox**  (Variable) – 真实框，应该是[N，B，4]的形状。第三维用来承载x、y、w、h，x、y、w、h应该是输入图像相对值。 N是batch size，B是图像中所含有的的最多的box数目
+    - **gt_box**  (Variable) – 真实框，应该是[N，B，4]的形状。第三维用来承载x、y、w、h，x、y、w、h应该是输入图像相对值。 N是batch size，B是图像中所含有的的最多的box数目
-    - **gtlabel**  (Variable) – 真实框的类id，应该形为[N，B]。
+    - **gt_label**  (Variable) – 真实框的类id，应该形为[N，B]。
    - **anchors**  (list|tuple) – 指定anchor框的宽度和高度，它们将逐对进行解析
    - **anchor_mask**  (list|tuple) – 当前YOLOv3损失计算中使用的anchor的mask索引
    - **class_num**  (int) – 要预测的类数
    - **ignore_thresh**  (float) – 一定条件下忽略某框置信度损失的忽略阈值
    - **downsample_ratio**  (int) – 从网络输入到YOLOv3 loss输入的下采样率，因此应为第一，第二和第三个YOLOv3损失运算设置32,16,8
    - **name** (string) – yolov3损失层的命名
-    - **gtscore** （Variable） - 真实框的混合得分，形为[N，B]。 默认None。
+    - **gt_score** （Variable） - 真实框的混合得分，形为[N，B]。 默认None。
    - **use_label_smooth** (bool） - 是否使用平滑标签。 默认为True
@@ -12953,13 +12996,13 @@ yolov3_loss
 .. code-block:: python
    x = fluid.layers.data(name='x', shape=[255, 13, 13], dtype='float32')
-    gtbox = fluid.layers.data(name='gtbox', shape=[6, 4], dtype='float32')
+    gt_box = fluid.layers.data(name='gtbox', shape=[6, 4], dtype='float32')
-    gtlabel = fluid.layers.data(name='gtlabel', shape=[6], dtype='int32')
+    gt_label = fluid.layers.data(name='gtlabel', shape=[6], dtype='int32')
-    gtscore = fluid.layers.data(name='gtscore', shape=[6], dtype='float32')
+    gt_score = fluid.layers.data(name='gtscore', shape=[6], dtype='float32')
    anchors = [10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326]
    anchor_mask = [0, 1, 2]
-    loss = fluid.layers.yolov3_loss(x=x, gtbox=gtbox, gtlabel=gtlabel,
+    loss = fluid.layers.yolov3_loss(x=x, gt_box=gt_box, gt_label=gt_label,
-                                    gtscore=gtscore, anchors=anchors,
+                                    gt_score=gt_score, anchors=anchors,
                                    anchor_mask=anchor_mask, class_num=80,
                                    ignore_thresh=0.7, downsample_ratio=32)
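``anchors`` is a flat list read two values at a time as (width, height) pairs, and ``anchor_mask`` selects which of those pairs this loss layer uses. The sketch applies that reading to the values in the example above. It also checks the channel count of ``x``. This assumes each selected anchor predicts 4 box coordinates, 1 confidence score and ``class_num`` class scores, as listed for the C dimension. Under that assumption, three anchors with ``class_num=80`` give 3 * 85 = 255, matching ``shape=[255, 13, 13]``. Both helper names are hypothetical.

```python
def select_anchors(anchors, anchor_mask):
    """Read a flat [w0, h0, w1, h1, ...] list as (w, h) pairs and pick the masked ones."""
    if len(anchors) % 2 != 0:
        raise ValueError("anchors must contain an even number of values")
    pairs = [(anchors[i], anchors[i + 1]) for i in range(0, len(anchors), 2)]
    return [pairs[m] for m in anchor_mask]


def expected_channels(anchor_mask, class_num):
    """Channels of x under the assumed layout: 4 box coords + 1 score + class_num per anchor."""
    return len(anchor_mask) * (4 + 1 + class_num)


anchors = [10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326]
anchor_mask = [0, 1, 2]
print(select_anchors(anchors, anchor_mask))  # [(10, 13), (16, 30), (33, 23)]
print(expected_channels(anchor_mask, 80))    # 255
```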

--- a/doc/fluid/api_cn/profiler_cn.rst
+++ b/doc/fluid/api_cn/profiler_cn.rst
@@ -12,7 +12,7 @@ cuda_profiler
 .. py:function:: paddle.fluid.profiler.cuda_profiler(output_file, output_mode=None, config=None)
-CUDA分析器。通过CUDA运行时应用程序编程接口对CUDA程序进行性能分析。分析结果将以键-值对格式或逗号分隔的格式写入output_file。用户可以通过output_mode参数设置输出模式，并通过配置参数设置计数器/选项。默认配置是[' gpustarttimestamp '， ' gpustarttimestamp '， ' gridsize3d '， ' threadblocksize '， ' streamid '， ' enableonstart 0 '， ' conckerneltrace ']。然后，用户可使用 `NVIDIA Visual Profiler <https://developer.nvidia.com/nvidia-visualprofiler>`_ 工具来加载这个输出文件以可视化结果。
+CUDA分析器。通过CUDA运行时应用程序编程接口对CUDA程序进行性能分析。分析结果将以键-值对格式或逗号分隔的格式写入output_file。用户可以通过output_mode参数设置输出模式，并通过config参数设置计数器/选项。默认配置是['gpustarttimestamp', 'gpuendtimestamp', 'gridsize3d', 'threadblocksize', 'streamid', 'enableonstart 0', 'conckerneltrace']。然后，用户可使用 `NVIDIA Visual Profiler <https://developer.nvidia.com/nvidia-visual-profiler>`_ 工具来加载这个输出文件以可视化结果。
 参数:
@@ -67,7 +67,7 @@ profiler
 profile interface 。与cuda_profiler不同，此profiler可用于分析CPU和GPU程序。默认情况下，它记录CPU和GPU kernel，如果想分析其他程序，可以参考教程来在c++代码中添加更多代码。
-如果 state== ' All '，在profile_path 中写入文件 profile proto 。该文件记录执行期间的时间顺序信息。然后用户可以看到这个文件的时间轴，请参考 `https://github.com/PaddlePaddle/Paddle/blob/develop/doc/fluid/howto/optimization/timeline.md <https://github.com/PaddlePaddle/Paddle/blob/develop/doc/fluid/howto/optimization/timeline.md>`_ 
+如果 state== ' All '，在profile_path 中写入文件 profile proto 。该文件记录执行期间的时间顺序信息。然后用户可以看到这个文件的时间轴，请参考 `这里 <../advanced_usage/development/profiling/timeline_cn.html>`_
 参数:
  - **state** (string) –  profiling state, 取值为 'CPU' 或 'GPU',  profiler 使用 CPU timer 或GPU timer 进行 profiling. 虽然用户可能在开始时指定了执行位置(CPUPlace/CUDAPlace)，但是为了灵活性，profiler不会使用这个位置。
@@ -136,7 +136,7 @@ start_profiler
 不能使用 ``fluid.profiler.profiler``
-如果 state== ' All '，在profile_path 中写入文件 profile proto 。该文件记录执行期间的时间顺序信息。然后用户可以看到这个文件的时间轴，请参考 `https://github.com/PaddlePaddle/Paddle/blob/develop/doc/fluid/howto/optimization/timeline.md <https://github.com/PaddlePaddle/Paddle/blob/develop/doc/fluid/howto/optimization/timeline.md>`_ 
+如果 state== ' All '，在profile_path 中写入文件 profile proto 。该文件记录执行期间的时间顺序信息。然后用户可以看到这个文件的时间轴，请参考 `这里 <../advanced_usage/development/profiling/timeline_cn.html>`_
 参数:
  - **state** (string) – profiling state, 取值为 'CPU' 或 'GPU' 或 'All', 'CPU' 代表只分析 cpu. 'GPU' 代表只分析 GPU . 'All' 会产生 timeline.

--- a/doc/fluid/api_guides/X2Paddle/Caffe-Fluid.rst
+++ b/doc/fluid/api_guides/X2Paddle/Caffe-Fluid.rst
+.. _Caffe-Fluid:
+########################
+Caffe-Fluid常用层对应表
+########################
+本文档梳理了Caffe常用Layer与PaddlePaddle API的对应关系和差异分析。有Caffe使用经验的用户可根据下表中的对应关系，快速熟悉PaddlePaddle的接口使用。
+..  csv-table:: 
+    :header: "序号", "Caffe Layer", "Fluid接口", "备注"
+    :widths: 1, 8, 8, 3
+    "1",  "`AbsVal <http://caffe.berkeleyvision.org/tutorial/layers/absval.html>`_", ":ref:`cn_api_fluid_layers_abs`",  "功能一致"
+    "2",  "`Accuracy <http://caffe.berkeleyvision.org/tutorial/layers/accuracy.html>`_", ":ref:`cn_api_fluid_layers_accuracy`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Accuracy.md>`_"
+    "3",  "`ArgMax <http://caffe.berkeleyvision.org/tutorial/layers/argmax.html>`_", ":ref:`cn_api_fluid_layers_argmax`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/ArgMax.md>`_"
+    "4",  "`BatchNorm <http://caffe.berkeleyvision.org/tutorial/layers/batchnorm.html>`_", ":ref:`cn_api_fluid_layers_batch_norm`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/BatchNorm.md>`_"
+    "5",  "`BNLL <http://caffe.berkeleyvision.org/tutorial/layers/bnll.html>`_", ":ref:`cn_api_fluid_layers_softplus`",  "功能一致"
+    "6",  "`Concat <http://caffe.berkeleyvision.org/tutorial/layers/concat.html>`_", ":ref:`cn_api_fluid_layers_concat`",  "功能一致"
+    "7",  "`Convolution <http://caffe.berkeleyvision.org/tutorial/layers/convolution.html>`_", ":ref:`cn_api_fluid_layers_conv2d`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Convolution.md>`_"
+    "8",  "`Crop <http://caffe.berkeleyvision.org/tutorial/layers/crop.html>`_", ":ref:`cn_api_fluid_layers_crop`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Crop.md>`_"
+    "9",  "`Deconvolution <http://caffe.berkeleyvision.org/tutorial/layers/deconvolution.html>`_", ":ref:`cn_api_fluid_layers_conv2d_transpose`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Deconvolution.md>`_"
+    "10",  "`Dropout <http://caffe.berkeleyvision.org/tutorial/layers/dropout.html>`_", ":ref:`cn_api_fluid_layers_dropout`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Dropout.md>`_"
+    "11",  "`Eltwise <http://caffe.berkeleyvision.org/tutorial/layers/eltwise.html>`_",  "无相应接口",  "`Fluid实现 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Eltwise.md>`_"
+    "12",  "`ELU <http://caffe.berkeleyvision.org/tutorial/layers/elu.html>`_", ":ref:`cn_api_fluid_layers_elu`",  "功能一致"
+    "13",  "`EuclideanLoss <http://caffe.berkeleyvision.org/tutorial/layers/euclideanloss.html>`_", ":ref:`cn_api_fluid_layers_square_error_cost`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/EuclideanLoss.md>`_"
+    "14",  "`Exp <http://caffe.berkeleyvision.org/tutorial/layers/exp.html>`_", ":ref:`cn_api_fluid_layers_exp`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Exp.md>`_"
+    "15",  "`Flatten <http://caffe.berkeleyvision.org/tutorial/layers/flatten.html>`_", ":ref:`cn_api_fluid_layers_reshape`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Flatten.md>`_"
+    "16",  "`InnerProduct <http://caffe.berkeleyvision.org/tutorial/layers/innerproduct.html>`_", ":ref:`cn_api_fluid_layers_fc`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/InnerProduct.md>`_"
+    "17",  "`Input <http://caffe.berkeleyvision.org/tutorial/layers/input.html>`_", ":ref:`cn_api_fluid_layers_data`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Input.md>`_"
+    "18",  "`Log <http://caffe.berkeleyvision.org/tutorial/layers/log.html>`_", ":ref:`cn_api_fluid_layers_log`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Log.md>`_"
+    "19",  "`LRN <http://caffe.berkeleyvision.org/tutorial/layers/lrn.html>`_", ":ref:`cn_api_fluid_layers_lrn`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/LRN.md>`_"
+    "20",  "`Pooling <http://caffe.berkeleyvision.org/tutorial/layers/pooling.html>`_", ":ref:`cn_api_fluid_layers_pool2d`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Pooling.md>`_"
+    "21",  "`Power <http://caffe.berkeleyvision.org/tutorial/layers/power.html>`_", ":ref:`cn_api_fluid_layers_pow`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Power.md>`_"
+    "22",  "`PReLU <http://caffe.berkeleyvision.org/tutorial/layers/prelu.html>`_", ":ref:`cn_api_fluid_layers_prelu`",  "功能一致"
+    "23",  "`Reduction <http://caffe.berkeleyvision.org/tutorial/layers/reduction.html>`_",  "无相应接口",  "`Fluid实现 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Reduction.md>`_"
+    "24",  "`ReLU <http://caffe.berkeleyvision.org/tutorial/layers/relu.html>`_", ":ref:`cn_api_fluid_layers_leaky_relu`",  "功能一致"
+    "25",  "`Reshape <http://caffe.berkeleyvision.org/tutorial/layers/reshape.html>`_", ":ref:`cn_api_fluid_layers_reshape`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Reshape.md>`_"
+    "26",  "`SigmoidCrossEntropyLoss <http://caffe.berkeleyvision.org/tutorial/layers/sigmoidcrossentropyloss.html>`_", ":ref:`cn_api_fluid_layers_sigmoid_cross_entropy_with_logits`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/SigmoidCrossEntropyLoss.md>`_"
+    "27",  "`Sigmoid <http://caffe.berkeleyvision.org/tutorial/layers/sigmoid.html>`_", ":ref:`cn_api_fluid_layers_sigmoid`",  "功能一致"
+    "28",  "`Slice <http://caffe.berkeleyvision.org/tutorial/layers/slice.html>`_", ":ref:`cn_api_fluid_layers_slice`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Slice.md>`_"
+    "29",  "`SoftmaxWithLoss <http://caffe.berkeleyvision.org/tutorial/layers/softmaxwithloss.html>`_", ":ref:`cn_api_fluid_layers_softmax_with_cross_entropy`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/SofmaxWithLoss.md>`_"
+    "30",  "`Softmax <http://caffe.berkeleyvision.org/tutorial/layers/softmax.html>`_", ":ref:`cn_api_fluid_layers_softmax`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Sofmax.md>`_"
+    "31",  "`TanH <http://caffe.berkeleyvision.org/tutorial/layers/tanh.html>`_", ":ref:`cn_api_fluid_layers_tanh`",  "功能一致"
+    "32",  "`Tile <http://caffe.berkeleyvision.org/tutorial/layers/tile.html>`_", ":ref:`cn_api_fluid_layers_expand`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Tile.md>`_"
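If you script a Caffe-to-Fluid port, part of the table above can be encoded as a lookup. The sketch below is hypothetical: the dictionary covers only a few rows, the `fluid.layers.*` names are spelled out from the `cn_api_fluid_layers_*` references in the table, and `None` marks layers with no single Fluid counterpart, which need the composite implementation linked in the notes (备注) column.

```python
# Hypothetical lookup built from a few rows of the Caffe-Fluid table.
# None means "no single Fluid API"; see the linked Fluid implementation instead.
CAFFE_TO_FLUID = {
    "AbsVal": "fluid.layers.abs",
    "Convolution": "fluid.layers.conv2d",
    "Deconvolution": "fluid.layers.conv2d_transpose",
    "Eltwise": None,
    "InnerProduct": "fluid.layers.fc",
    "Pooling": "fluid.layers.pool2d",
    "Reduction": None,
    "Tile": "fluid.layers.expand",
}


def fluid_api_for(caffe_layer):
    """Return the Fluid API name for a Caffe layer type.

    Raises KeyError for layer types this mapping does not list, and
    NotImplementedError for listed layers without a one-to-one API.
    """
    if caffe_layer not in CAFFE_TO_FLUID:
        raise KeyError("layer %r is not in the mapping table" % caffe_layer)
    api = CAFFE_TO_FLUID[caffe_layer]
    if api is None:
        raise NotImplementedError(
            "%s has no single Fluid API; compose it from several layers" % caffe_layer)
    return api


print(fluid_api_for("InnerProduct"))  # fluid.layers.fc
```

Rows marked "差异对比" still need the linked comparison consulted: the name lookup only tells you where to start, not which arguments differ.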
--- a/doc/fluid/api_guides/X2Paddle/TensorFlow-Fluid.rst
+++ b/doc/fluid/api_guides/X2Paddle/TensorFlow-Fluid.rst
--- a/doc/fluid/api_guides/high_low_level_api.md
+++ b/doc/fluid/api_guides/high_low_level_api.md
-## High/Low-level API简介
-PaddlePaddle Fluid目前有2套API接口：
- Low-level（底层） API：
-	- 灵活性强并且已经相对成熟，使用它训练的模型，能直接支持C++预测上线。
-	- 提供了大量的模型作为使用示例，包括[Book](https://github.com/PaddlePaddle/book)中的全部章节，以及[models](https://github.com/PaddlePaddle/models)中的所有章节。
-	- 适用人群：对深度学习有一定了解，需要自定义网络进行训练/预测/上线部署的用户。
- High-level（高层）API：
-	- 使用简单
-	- 尚未成熟，接口暂时在[paddle.fluid.contrib](https://github.com/PaddlePaddle/Paddle/tree/develop/python/paddle/fluid/contrib)下面。
--- a/doc/fluid/api_guides/high_low_level_api_en.md
+++ b/doc/fluid/api_guides/high_low_level_api_en.md
-## Introduction to High/Low-level API
-Currently PaddlePaddle Fluid has 2 branches of API interfaces:
- Low-level API:
-	- It is highly flexible and relatively mature. The model trained by it can directly support C++ inference deployment and release.
-	- There are a large number of models as examples, including all chapters in [book](https://github.com/PaddlePaddle/book), and [models](https://github.com/PaddlePaddle/models).
-	- Recommended for users who have a certain understanding of deep learning and need to customize a network for training/inference/online deployment.
- High-level API:
-	- Simple to use
-    - Still under development. the interface is temporarily in [paddle.fluid.contrib](https://github.com/PaddlePaddle/Paddle/tree/develop/python/paddle/fluid/contrib).
\ No newline at end of file
--- a/doc/fluid/api_guides/index.rst
+++ b/doc/fluid/api_guides/index.rst
 ===========
-API使用指南
+API分类检索
 ===========
-API使用指南分功能向您介绍PaddlePaddle Fluid的API体系和用法，帮助您快速了解PaddlePaddle Fluid API的全貌，包括以下几个模块：
+本模块分功能向您介绍PaddlePaddle Fluid的API体系和用法，提高您的查找效率，帮助您快速了解PaddlePaddle Fluid API的全貌，包括以下几个模块：
 ..  toctree::
    :maxdepth: 1
-    high_low_level_api.md
+    low_level/program.rst
    low_level/layers/index.rst
-    low_level/executor.rst
+    low_level/nets.rst
    low_level/optimizer.rst
+    low_level/backward.rst
    low_level/metrics.rst
    low_level/model_save_reader.rst
    low_level/inference.rst
-    low_level/distributed/index.rst
    low_level/memory_optimize.rst
-    low_level/nets.rst
+    low_level/executor.rst
    low_level/parallel_executor.rst
-    low_level/backward.rst
+    low_level/compiled_program.rst
    low_level/parameter.rst
-    low_level/program.rst
+    low_level/distributed/index.rst
+    X2Paddle/TensorFlow-Fluid.rst
+    X2Paddle/Caffe-Fluid.rst
--- a/doc/fluid/api_guides/index_en.rst
+++ b/doc/fluid/api_guides/index_en.rst
-===========
+=================
-API Guides
+API Quick Search
-===========
+=================
 This section introduces the Fluid API structure and usage, to help you quickly get the full picture of the PaddlePaddle Fluid API. This section is divided into the following modules:
 ..  toctree::
    :maxdepth: 1
-    high_low_level_api_en.md
+    low_level/program_en.rst
    low_level/layers/index_en.rst
-    low_level/executor_en.rst
+    low_level/nets_en.rst
    low_level/optimizer_en.rst
+    low_level/backward_en.rst
    low_level/metrics_en.rst
    low_level/model_save_reader_en.rst
    low_level/inference_en.rst
-    low_level/distributed/index_en.rst
    low_level/memory_optimize_en.rst
-    low_level/nets_en.rst
+    low_level/executor_en.rst
    low_level/parallel_executor_en.rst
    low_level/compiled_program_en.rst
-    low_level/backward_en.rst
    low_level/parameter_en.rst
-    low_level/program_en.rst
+    low_level/distributed/index_en.rst
--- a/doc/fluid/api_guides/low_level/compiled_program_cn.rst
+++ b/doc/fluid/api_guides/low_level/compiled_program_cn.rst
--- a/doc/fluid/api_guides/low_level/distributed/async_training.rst
+++ b/doc/fluid/api_guides/low_level/distributed/async_training.rst
@@ -4,7 +4,7 @@
 分布式异步训练
 ############
-Fluid支持数据并行的分布式异步训练，API使用 :code:`DistributedTranspiler` 将单机网络配置转换成可以多机执行的
+Fluid支持数据并行的分布式异步训练，API使用 :code:`DistributeTranspiler` 将单机网络配置转换成可以多机执行的
 :code:`pserver` 端程序和 :code:`trainer` 端程序。用户在不同的节点执行相同的一段代码，根据环境变量或启动参数，
 可以执行对应的 :code:`pserver` 或 :code:`trainer` 角色。Fluid异步训练只支持pserver模式，异步训练和 `同步训练 <../distributed/sync_training.html>`_ 的主要差异在于：异步训练每个trainer的梯度是单独更新到参数上的，
 而同步训练是所有trainer的梯度合并之后统一更新到参数上，因此，同步训练和异步训练的超参数需要分别调节。
@@ -16,10 +16,10 @@ API详细使用方法参考 :ref:`cn_api_fluid_DistributeTranspiler` ，简单
 .. code-block:: python
-    config = fluid.DistributedTranspilerConfig()
+    config = fluid.DistributeTranspilerConfig()
    # 配置策略config
    config.slice_var_up = False
-    t = fluid.DistributedTranspiler(config=config)
+    t = fluid.DistributeTranspiler(config=config)
    t.transpile(trainer_id,
                program=main_program,
                pservers="192.168.0.1:6174,192.168.0.2:6174",
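The difference described above (in asynchronous mode each trainer's gradient is applied to the parameters on its own, while in synchronous mode all trainers' gradients are merged first and applied once) can be illustrated with a toy plain-Python model of SGD on a parameter server. This is a conceptual sketch only: it assumes the synchronous merge is an average of the trainers' gradients and ignores gradient staleness, networking and parameter slicing.

```python
def sync_update(params, trainer_grads, lr):
    """Synchronous mode: merge (average) all trainers' gradients, then update once."""
    n = len(trainer_grads)
    merged = [sum(g[i] for g in trainer_grads) / n for i in range(len(params))]
    return [p - lr * g for p, g in zip(params, merged)]


def async_update(params, trainer_grads, lr):
    """Asynchronous mode: apply each trainer's gradient as soon as it arrives."""
    for grad in trainer_grads:
        params = [p - lr * g for p, g in zip(params, grad)]
    return params


params = [1.0]
grads = [[0.5], [1.5]]  # one gradient per trainer
print(sync_update(params, grads, lr=0.1))   # roughly [0.9]
print(async_update(params, grads, lr=0.1))  # roughly [0.8]
```

With two trainers, the asynchronous update moves the parameter about twice as far per round as the synchronous one (0.2 versus 0.1 here), which is one concrete reason the two modes need separately tuned hyperparameters such as the learning rate.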

--- a/doc/fluid/api_guides/low_level/distributed/async_training_en.rst
+++ b/doc/fluid/api_guides/low_level/distributed/async_training_en.rst
@@ -4,7 +4,7 @@
 Asynchronous Distributed Training
 ####################################
-Fluid supports parallelism asynchronous distributed training. :code:`DistributedTranspiler` converts a single node network configuration into a :code:`pserver` side program and the :code:`trainer` side program that can be executed on multiple machines. The user executes the same piece of code on different nodes. Depending on the environment variables or startup parameters, the corresponding :code:`pserver` or :code:`trainer` role can be executed. 
+Fluid supports data-parallel asynchronous distributed training. :code:`DistributeTranspiler` converts a single-node network configuration into a :code:`pserver` side program and a :code:`trainer` side program that can be executed on multiple machines. The user runs the same piece of code on every node and, depending on environment variables or startup parameters, each node takes the corresponding :code:`pserver` or :code:`trainer` role.
 **Asynchronous distributed training in Fluid only supports the pserver mode** . The main difference between asynchronous training and `synchronous training <../distributed/sync_training_en.html>`_ is that the gradients of each trainer are asynchronously applied on the parameters, but in synchronous training, the gradients of all trainers must be combined first and then they are used to update the parameters. Therefore, the hyperparameters of synchronous training and asynchronous training need to be adjusted separately.
@@ -15,10 +15,10 @@ For detailed API, please refer to :ref:`api_fluid_transpiler_DistributeTranspile
 .. code-block:: python
-	config = fluid.DistributedTranspilerConfig()
+	config = fluid.DistributeTranspilerConfig()
 	#Configuring config policy
 	config.slice_var_up = False
-	t = fluid.DistributedTranspiler(config=config)
+	t = fluid.DistributeTranspiler(config=config)
 	t.transpile(trainer_id,
 				program=main_program,
 				pservers="192.168.0.1:6174,192.168.0.2:6174",
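The text above says the same script runs on every node and picks the pserver or trainer role from environment variables or startup parameters. A minimal sketch of that dispatch follows. The variable names `TRAINING_ROLE`, `TRAINER_ID`, `PSERVER_ENDPOINTS` and `CURRENT_ENDPOINT` are hypothetical placeholders for whatever your launcher exports. After resolving the role, a pserver would typically build its program with `t.get_pserver_program(current_endpoint)` and a trainer with `t.get_trainer_program()`; check :ref:`api_fluid_transpiler_DistributeTranspiler` for the exact signatures.

```python
import os


def resolve_role(env=None):
    """Pick this process's role from environment variables.

    All variable names here are hypothetical; adapt them to your launcher.
    Returns (role, trainer_id, pserver_endpoints, current_endpoint).
    """
    env = os.environ if env is None else env
    role = env.get("TRAINING_ROLE", "TRAINER").upper()
    if role not in ("PSERVER", "TRAINER"):
        raise ValueError("unknown training role: %r" % role)
    trainer_id = int(env.get("TRAINER_ID", "0"))
    # Same comma-separated format as the `pservers` argument of transpile().
    pservers = env.get("PSERVER_ENDPOINTS", "")
    current = env.get("CURRENT_ENDPOINT", "")
    if role == "PSERVER" and current not in pservers.split(","):
        raise ValueError("a pserver must listen on one of the pserver endpoints")
    return role, trainer_id, pservers, current


role, trainer_id, pservers, current = resolve_role({
    "TRAINING_ROLE": "pserver",
    "PSERVER_ENDPOINTS": "192.168.0.1:6174,192.168.0.2:6174",
    "CURRENT_ENDPOINT": "192.168.0.2:6174",
})
print(role, trainer_id, current)  # PSERVER 0 192.168.0.2:6174
```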

--- a/doc/fluid/api_guides/low_level/distributed/index.rst
+++ b/doc/fluid/api_guides/low_level/distributed/index.rst
@@ -7,8 +7,5 @@
    sync_training.rst
    async_training.rst
-    cpu_train_best_practice.rst
    large_scale_sparse_feature_training.rst
    cluster_train_data_cn.rst
--- a/doc/fluid/api_guides/low_level/distributed/index_en.rst
+++ b/doc/fluid/api_guides/low_level/distributed/index_en.rst
@@ -7,7 +7,6 @@ Distributed Training
    sync_training_en.rst
    async_training_en.rst
-    cpu_train_best_practice_en.rst
    large_scale_sparse_feature_training_en.rst
    cluster_train_data_en.rst

--- a/doc/fluid/api_guides/low_level/distributed/sync_training.rst
+++ b/doc/fluid/api_guides/low_level/distributed/sync_training.rst
@@ -4,7 +4,7 @@
 分布式同步训练
 ############
-Fluid支持数据并行的分布式同步训练，API使用 :code:`DistributedTranspiler` 将单机网络配置转换成可以多机执行的
+Fluid支持数据并行的分布式同步训练，API使用 :code:`DistributeTranspiler` 将单机网络配置转换成可以多机执行的
 :code:`pserver` 端程序和 :code:`trainer` 端程序。用户在不同的节点执行相同的一段代码，根据环境变量或启动参数，
 可以执行对应的 :code:`pserver` 或 :code:`trainer` 角色。Fluid分布式同步训练同时支持pserver模式和NCCL2模式，
 在API使用上有差别，需要注意。
@@ -16,10 +16,10 @@ API详细使用方法参考 :ref:`DistributeTranspiler` ，简单实例用法：
 .. code-block:: python
-    config = fluid.DistributedTranspilerConfig()
+    config = fluid.DistributeTranspilerConfig()
    # 配置策略config
    config.slice_var_up = False
-    t = fluid.DistributedTranspiler(config=config)
+    t = fluid.DistributeTranspiler(config=config)
    t.transpile(trainer_id,
                program=main_program,
                pservers="192.168.0.1:6174,192.168.0.2:6174",
@@ -68,7 +68,7 @@ NCCL2模式分布式训练
    config = fluid.DistributeTranspilerConfig()
    config.mode = "nccl2"
-    t = fluid.DistributedTranspiler(config=config)
+    t = fluid.DistributeTranspiler(config=config)
    t.transpile(trainer_id,
                program=main_program,
                startup_program=startup_program,
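In pserver mode the transpiler has to decide which pserver hosts each parameter. With `config.slice_var_up = False`, as in the pserver-mode example above, parameters are not split into blocks, so each one is placed whole on a single pserver. The sketch below shows a round-robin placement in plain Python. It illustrates the idea under that assumption only; Fluid's own dispatchers (for example `fluid.transpiler.RoundRobin`) may differ in detail, and the parameter names are made up.

```python
from itertools import cycle


def round_robin_placement(param_names, pservers):
    """Assign each whole parameter to the next pserver endpoint in turn."""
    endpoints = cycle(pservers.split(","))
    return {name: next(endpoints) for name in param_names}


placement = round_robin_placement(
    ["fc_0.w_0", "fc_0.b_0", "fc_1.w_0"],  # hypothetical parameter names
    "192.168.0.1:6174,192.168.0.2:6174",
)
for name, endpoint in placement.items():
    print(name, "->", endpoint)
```

Placement by count ignores parameter size, so a single large weight can still leave one pserver much busier than the others; slicing parameters into blocks (`slice_var_up = True`) is intended to help with that.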

--- a/doc/fluid/api_guides/low_level/distributed/sync_training_en.rst
+++ b/doc/fluid/api_guides/low_level/distributed/sync_training_en.rst
@@ -4,7 +4,7 @@
 Synchronous Distributed Training
 ####################################
-Fluid supports parallelism distributed synchronous training, the API uses the :code:`DistributedTranspiler` to convert a single node network configuration into a :code:`pserver` side and :code:`trainer` side program that can be executed on multiple machines. The user executes the same piece of code on different nodes. Depending on the environment variables or startup parameters, you can execute the corresponding :code:`pserver` or :code:`trainer` role. Fluid distributed synchronous training supports both pserver mode and NCCL2 mode. There are differences in the use of the API, to which you need to pay attention.
+Fluid supports data-parallel synchronous distributed training. The API uses :code:`DistributeTranspiler` to convert a single-node network configuration into a :code:`pserver` side program and a :code:`trainer` side program that can be executed on multiple machines. The user runs the same piece of code on every node and, depending on environment variables or startup parameters, each node takes the corresponding :code:`pserver` or :code:`trainer` role. Fluid synchronous distributed training supports both pserver mode and NCCL2 mode; note that the two modes use the API differently.
 Distributed training in pserver mode
 ======================================
@@ -13,10 +13,10 @@ For API Reference, please refer to :ref:`DistributeTranspiler`. A simple example
 .. code-block:: python
-	config = fluid.DistributedTranspilerConfig()
+	config = fluid.DistributeTranspilerConfig()
 	#Configuring policy config
 	config.slice_var_up = False
-	t = fluid.DistributedTranspiler(config=config)
+	t = fluid.DistributeTranspiler(config=config)
 	t.transpile(trainer_id,
 				program=main_program,
 				pservers="192.168.0.1:6174,192.168.0.2:6174",
@@ -65,7 +65,7 @@ Use the following code to convert the current :code:`Program` to a Fluid :code:`
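-	Config = fluid.DistributeTranspilerConfig()
-	Config.mode = "nccl2"
-	t = fluid.DistributedTranspiler(config=config)
+	config = fluid.DistributeTranspilerConfig()
+	config.mode = "nccl2"
+	t = fluid.DistributeTranspiler(config=config)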
+	t = fluid.DistributeTranspiler(config=config)
 	t.transpile(trainer_id,
 				program=main_program,
 				startup_program=startup_program,
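In NCCL2 mode there are no pservers: every node is a trainer and needs the full list of trainer endpoints plus its own endpoint, and we assume its `trainer_id` should equal its position in that list. The hypothetical helper below derives all three from a list of node IPs and a shared port. The helper name and its inputs are assumptions; check :ref:`api_fluid_transpiler_DistributeTranspiler` for the exact `transpile()` parameters these values feed into.

```python
def nccl2_trainer_config(node_ips, port, current_ip):
    """Build the comma-separated trainer endpoint list, this node's endpoint
    and its trainer_id (its index in the list)."""
    endpoints = ["%s:%d" % (ip, port) for ip in node_ips]
    current = "%s:%d" % (current_ip, port)
    if current not in endpoints:
        raise ValueError("current node %s is not in the trainer list" % current)
    return ",".join(endpoints), current, endpoints.index(current)


trainers, current_endpoint, trainer_id = nccl2_trainer_config(
    ["192.168.0.1", "192.168.0.2"], 6174, "192.168.0.2")
print(trainers, current_endpoint, trainer_id)
# 192.168.0.1:6174,192.168.0.2:6174 192.168.0.2:6174 1
```

Every node must build the list in the same order, otherwise two nodes could end up claiming the same `trainer_id`.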

--- a/doc/fluid/api_guides/low_level/executor.rst
+++ b/doc/fluid/api_guides/low_level/executor.rst
@@ -4,11 +4,11 @@
 执行引擎
 ##########
-:code:`Executor` 实现了一个简易的执行器，所有的操作在其中顺序执行。你可以在Python脚本中运行:code:`Executor`。PaddlePaddle Fluid中有两种执行器。一种是:code:`Executor` 默认的单线程执行器，另一种是并行计算执行器，在:ref:`api_guide_parallel_executor`中进行了解释。`Executor`和:ref:`api_guide_parallel_executor`的配置不同，这可能会给部分用户带来困惑。为使执行器更加灵活，我们引入了:ref:`api_guide_compiled_program`，:ref:`api_guide_compiled_program`用于把一个程序转换为不同的优化组合，可以通过:code:`Executor`运行。
+:code:`Executor` 实现了一个简易的执行器，所有的操作在其中顺序执行。你可以在Python脚本中运行 :code:`Executor` 。PaddlePaddle Fluid中有两种执行器。一种是 :code:`Executor` 默认的单线程执行器，另一种是并行计算执行器，在 :ref:`api_guide_parallel_executor` 中进行了解释。``Executor`` 和 :ref:`api_guide_parallel_executor` 的配置不同，这可能会给部分用户带来困惑。为使执行器更加灵活，我们引入了 :ref:`api_guide_compiled_program` ， :ref:`api_guide_compiled_program` 用于把一个程序转换为不同的优化组合，可以通过 :code:`Executor` 运行。
-:code:`Executor`的逻辑非常简单。建议在调试阶段用:code:`Executor`在一台计算机上完整地运行模型，然后转向多设备或多台计算机计算。
+:code:`Executor` 的逻辑非常简单。建议在调试阶段用 :code:`Executor` 在一台计算机上完整地运行模型，然后转向多设备或多台计算机计算。
-:code:`Executor`在构造时接受一个:code:`Place`，它既可能是:ref:`api_fluid_CPUPlace`也可能是:ref:`api_fluid_CUDAPlace`。
+:code:`Executor` 在构造时接受一个 :code:`Place` ，它既可能是 :ref:`api_fluid_CPUPlace` 也可能是 :ref:`api_fluid_CUDAPlace` 。
 .. code-block:: python
    # 首先创建Executor。
@@ -21,7 +21,7 @@
    loss, = exe.run(fluid.default_main_program(),
                    feed=feed_dict,
                    fetch_list=[loss.name])
-简单样例请参照 `quick_start_fit_a_line <http://paddlepaddle.org/documentation/docs/zh/1.1/beginners_guide/quick_start/fit_a_line/README.html>`_ 
+简单样例请参照 `basics_fit_a_line <../../beginners_guide/basics/fit_a_line/README.cn.html>`_
 - 相关API :
 - :ref:`cn_api_fluid_Executor`
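To make "all operations run sequentially" concrete, here is a toy plain-Python model of what an executor does: walk a program's operator list in order, read each operator's inputs from a scope, write its output back, and return the fetched variables. It is not Fluid's `Executor`; the op names and the tiny linear-regression program are hypothetical, loosely echoing the fit_a_line example.

```python
# Hypothetical operator table: op type -> Python function.
OPS = {
    "mul": lambda a, b: a * b,
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "square": lambda a: a * a,
}


def run(program, feed, fetch_list):
    """Run ops strictly in program order over a scope dict, then fetch results."""
    scope = dict(feed)
    for op_type, input_names, output_name in program:
        args = [scope[name] for name in input_names]
        scope[output_name] = OPS[op_type](*args)
    return [scope[name] for name in fetch_list]


# y_pred = x * w + b; loss = (y_pred - y) ** 2
program = [
    ("mul", ["x", "w"], "xw"),
    ("add", ["xw", "b"], "y_pred"),
    ("sub", ["y_pred", "y"], "diff"),
    ("square", ["diff"], "loss"),
]
loss, = run(program, feed={"x": 2.0, "w": 3.0, "b": 1.0, "y": 5.0},
            fetch_list=["loss"])
print(loss)  # 4.0
```

For brevity the toy feeds the parameters `w` and `b` together with the data; in Fluid, parameters live in the scope and are initialized by running the startup program first.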
--- a/doc/fluid/api_guides/low_level/executor_en.rst
+++ b/doc/fluid/api_guides/low_level/executor_en.rst
@@ -25,7 +25,7 @@ The logic of :code:`Executor` is very simple. It is suggested to thoroughly run
                    fetch_list=[loss.name])
-For simple example please refer to `quick_start_fit_a_line <http://paddlepaddle.org/documentation/docs/zh/1.1/beginners_guide/quick_start/fit_a_line/README.html>`_ 
+For a simple example, please refer to `basics_fit_a_line <../../beginners_guide/basics/fit_a_line/README.html>`_
 - Related API :
 - :ref:`api_fluid_Executor`

--- a/doc/fluid/api_guides/low_level/layers/conv.rst
+++ b/doc/fluid/api_guides/low_level/layers/conv.rst
@@ -32,7 +32,8 @@
  对于depthwise convolution，可以设置groups等于输入通道数，此时，2D卷积的卷积核形状为[C_o, 1, f_h, f_w]。
  对于pointwise convolution，卷积核的形状为[C_o, C_in, 1, 1]。
-  **注意**：Fluid针对depthwise convolution的GPU计算做了高度优化，您可以通过在 :code:`fluid.layers.conv2d`接口设置 :code:`use_cudnn=False`来使用Fluid自身优化的CUDA程序。
+  **注意**：Fluid针对depthwise convolution的GPU计算做了高度优化，您可以通过在
+  :code:`fluid.layers.conv2d` 接口设置 :code:`use_cudnn=False` 来使用Fluid自身优化的CUDA程序。
 - 空洞卷积(dilated convolution):

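The kernel shapes quoted above follow from one rule: with `groups` groups, a 2D convolution kernel has shape [C_o, C_in / groups, f_h, f_w]. The plain-Python helper below (hypothetical, not a Fluid API) applies that rule and counts weights, biases ignored. It shows why a depthwise convolution followed by a pointwise one is much smaller than a standard convolution with the same input and output channels.

```python
def conv2d_kernel_shape(c_in, c_out, filter_size, groups=1):
    """Kernel shape [C_o, C_in / groups, f_h, f_w] for a grouped 2D convolution."""
    if c_in % groups or c_out % groups:
        raise ValueError("c_in and c_out must both be divisible by groups")
    f_h, f_w = filter_size
    return [c_out, c_in // groups, f_h, f_w]


def num_weights(shape):
    """Number of kernel weights (no bias) for a given kernel shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count


c_in, c_out = 32, 64
standard = conv2d_kernel_shape(c_in, c_out, (3, 3))               # [64, 32, 3, 3]
depthwise = conv2d_kernel_shape(c_in, c_in, (3, 3), groups=c_in)  # [32, 1, 3, 3]
pointwise = conv2d_kernel_shape(c_in, c_out, (1, 1))              # [64, 32, 1, 1]
print(num_weights(standard))                            # 18432
print(num_weights(depthwise) + num_weights(pointwise))  # 2336
```

The depthwise layer here keeps the channel count unchanged (C_o = C_in); the pointwise layer then mixes channels and changes their number.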
--- a/doc/fluid/api_guides/low_level/layers/learning_rate_scheduler.rst
+++ b/doc/fluid/api_guides/low_level/layers/learning_rate_scheduler.rst
@@ -38,3 +38,9 @@
 * :code:`append_LARS`: 通过Layer-wise Adaptive Rate Scaling算法获得学习率，相关算法请参考 `《Train Feedfoward Neural Network with Layer-wise Adaptive Rate via Approximating Back-matching Propagation》 <https://arxiv.org/abs/1802.09750>`_ 。
  相关API Reference请参考 :ref:`cn_api_fluid_layers_append_LARS`
+* :code:`cosine_decay`: 余弦衰减，即学习率随step数变化呈余弦函数。
+  相关API Reference请参考 :ref:`cn_api_fluid_layers_cosine_decay`
+* :code:`linear_lr_warmup`: 学习率随step数线性增加到指定学习率。
+  相关API Reference请参考 :ref:`cn_api_fluid_layers_linear_lr_warmup`
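Both new schedules can be written as functions of the global step. The sketch below restates them in plain Python as we understand them from the API reference: `cosine_decay` changes once per epoch, with `epoch = floor(step / step_each_epoch)`, and `linear_lr_warmup` ramps linearly from `start_lr` to `end_lr` over `warmup_steps` steps and then hands over to the base learning rate. Treat the formulas as an approximation and check :ref:`cn_api_fluid_layers_cosine_decay` and :ref:`cn_api_fluid_layers_linear_lr_warmup` for the authoritative definitions.

```python
import math


def cosine_decay(learning_rate, global_step, step_each_epoch, epochs):
    """lr * 0.5 * (cos(epoch * pi / epochs) + 1), evaluated per whole epoch."""
    epoch = global_step // step_each_epoch
    return learning_rate * 0.5 * (math.cos(epoch * math.pi / epochs) + 1)


def linear_lr_warmup(learning_rate, global_step, warmup_steps, start_lr, end_lr):
    """Linear ramp from start_lr to end_lr, then the regular learning_rate."""
    if global_step < warmup_steps:
        return start_lr + (end_lr - start_lr) * global_step / warmup_steps
    return learning_rate


for step in (0, 250, 500, 1000):
    print(step, cosine_decay(0.1, step, step_each_epoch=100, epochs=10))
print(linear_lr_warmup(0.1, 50, warmup_steps=100, start_lr=0.0, end_lr=0.1))  # 0.05
```

As far as we know, the `learning_rate` argument of Fluid's `linear_lr_warmup` may itself be another schedule, so warmup followed by cosine decay is a common combination.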
--- a/doc/fluid/api_guides/low_level/layers/sparse_update.rst
+++ b/doc/fluid/api_guides/low_level/layers/sparse_update.rst
@@ -37,9 +37,9 @@ API详细使用方法参考 :ref:`cn_api_fluid_layers_embedding` ，以下是一
 以上参数中：
- :code:`is_sparse` ： 反向计算的时候梯度是否为sparse tensor。如果不设置，梯度是一个 `LodTensor <https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/user_guides/howto/prepare_data/lod_tensor.md>`_  。默认为False。
+- :code:`is_sparse` ： 反向计算的时候梯度是否为sparse tensor。如果不设置，梯度是一个 `LodTensor <https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/user_guides/howto/basic_concept/lod_tensor.html>`_  。默认为False。
- :code:`is_distributed` ： 标志是否是用在分布式的场景下。一般大规模稀疏更新（embedding的第0维维度很大，比如几百万以上）才需要设置。具体可以参考大规模稀疏的API guide  :ref:`api_guide_async_training`  。默认为False。
+- :code:`is_distributed` ： 标志是否是用在分布式的场景下。一般大规模稀疏更新（embedding的第0维维度很大，比如几百万以上）才需要设置。具体可以参考大规模稀疏的API guide  :ref:`cn_api_guide_async_training`  。默认为False。
 - API汇总:
 - :ref:`cn_api_fluid_layers_embedding`
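Why `is_sparse=True` reduces communication: an embedding lookup only touches the rows whose ids appear in the batch, so only those rows receive a gradient. The plain-Python sketch below contrasts a dense gradient (a full `vocab_size x dim` table) with a sparse one that keeps only the touched rows. It is a conceptual model; for simplicity it merges duplicate ids, and Fluid's actual sparse gradient type (`SelectedRows`) may store rows differently.

```python
def embedding_grad(ids, out_grads, vocab_size, dim, is_sparse):
    """Gradient w.r.t. an embedding table, given per-lookup output gradients."""
    if is_sparse:
        # Only the looked-up rows are stored: {row_id: gradient_row}.
        rows = {}
        for row_id, grad in zip(ids, out_grads):
            acc = rows.setdefault(row_id, [0.0] * dim)
            for k in range(dim):
                acc[k] += grad[k]
        return rows
    # Dense: a full vocab_size x dim table, mostly zeros.
    table = [[0.0] * dim for _ in range(vocab_size)]
    for row_id, grad in zip(ids, out_grads):
        for k in range(dim):
            table[row_id][k] += grad[k]
    return table


ids = [3, 3, 7]
out_grads = [[1.0, 1.0], [2.0, 2.0], [0.5, 0.5]]
sparse = embedding_grad(ids, out_grads, vocab_size=1000000, dim=2, is_sparse=True)
print(sparse)  # {3: [3.0, 3.0], 7: [0.5, 0.5]}
```

With a vocabulary of a million rows, the sparse gradient above carries 2 rows, while the dense one would carry all 1,000,000.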
--- a/doc/fluid/api_guides/low_level/layers/sparse_update_en.rst
+++ b/doc/fluid/api_guides/low_level/layers/sparse_update_en.rst
@@ -37,7 +37,7 @@ API reference :ref:`api_fluid_layers_embedding` . Here is a simple example:
 The parameters:
- :code:`is_sparse` : Whether the gradient is a sparse tensor in the backward calculation. If not set, the gradient is a `LodTensor <https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/user_guides/howto/prepare_data/lod_tensor.md>`_ . The default is False.
+- :code:`is_sparse` : Whether the gradient is a sparse tensor in the backward calculation. If not set, the gradient is a `LodTensor <https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/user_guides/howto/basic_concept/lod_tensor_en.html>`_ . The default is False.
 - :code:`is_distributed` : Whether the current training is in a distributed scenario. Generally, this parameter can only be set in large-scale sparse updates (the 0th dimension of embedding is very large, such as several million or more). For details, please refer to the large-scale sparse API guide :ref:`api_guide_async_training`. The default is False.

--- a/doc/fluid/api_guides/low_level/nets.rst
+++ b/doc/fluid/api_guides/low_level/nets.rst
@@ -33,8 +33,9 @@ API Reference 请参考 :ref:`cn_api_fluid_nets_img_conv_group`
 :code:`sequence_conv_pool` 是由 :ref:`cn_api_fluid_layers_sequence_conv` 与 :ref:`cn_api_fluid_layers_sequence_pool` 串联而成。
 该模块在 `自然语言处理 <https://zh.wikipedia.org/wiki/自然语言处理>`_ 以及 `语音识别 <https://zh.wikipedia.org/wiki/语音识别>`_ 等领域均有广泛应用，
-比如 `文本分类模型 <https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleNLP/text_classification/nets.py>`_ , 
+比如 `文本分类模型 <https://github.com/PaddlePaddle/models/blob/develop/PaddleNLP/text_classification/nets.py>`_ ,
-`TagSpace <https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleRec/tagspace/train.py>`_  以及 `Multi-view Simnet <https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleRec/multiview_simnet/nets.py>`_ 等模型。
+`TagSpace <https://github.com/PaddlePaddle/models/blob/develop/PaddleRec/tagspace/train.py>`_  以及
+`Multi-view Simnet <https://github.com/PaddlePaddle/models/blob/develop/PaddleRec/multiview_simnet/nets.py>`_ 等模型。
 API Reference 请参考 :ref:`cn_api_fluid_nets_sequence_conv_pool`
@@ -55,7 +56,7 @@ API Reference 请参考 :ref:`cn_api_fluid_nets_glu`
 .. math::
 Attention(Q, K, V)= softmax(QK^\mathrm{T})V
-该模块广泛使用在 `机器翻译 <https://zh.wikipedia.org/zh/机器翻译>`_ 的模型中，比如 `Transformer <https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleNLP/neural_machine_translation/transformer>`_ 。
+该模块广泛使用在 `机器翻译 <https://zh.wikipedia.org/zh/机器翻译>`_ 的模型中，比如 `Transformer <https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/neural_machine_translation/transformer>`_ 。
 API Reference 请参考 :ref:`cn_api_fluid_nets_scaled_dot_product_attention`
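A plain-Python sketch of the attention formula above, for a single head and without batching. One assumption to flag: the formula in this guide omits the scaling term, while the standard scaled dot-product attention (and, to our understanding, Fluid's implementation) divides QK^T by sqrt(d_k), where d_k is the key dimension. The function therefore takes an optional `scale`; passing `scale=1.0` reproduces the formula exactly as written here.

```python
import math


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def scaled_dot_product_attention(q, k, v, scale=None):
    """softmax(scale * Q K^T) V; scale defaults to 1 / sqrt(d_k)."""
    if scale is None:
        scale = 1.0 / math.sqrt(len(k[0]))
    scores = matmul(q, transpose(k))
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, v)


q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[10.0, 0.0], [0.0, 10.0]]
print(scaled_dot_product_attention(q, k, v))
```

The query matches the first key better, so the output leans toward the first row of `v`; an all-zero query would weight both rows equally.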
--- a/doc/fluid/api_guides/low_level/nets_en.rst
+++ b/doc/fluid/api_guides/low_level/nets_en.rst
@@ -32,8 +32,8 @@ For API Reference, please refer to :ref:`api_fluid_nets_img_conv_group`
 --------------------
 :code:`sequence_conv_pool` is got by concatenating :ref:`api_fluid_layers_sequence_conv` with :ref:`api_fluid_layers_sequence_pool`.
-The module is widely used in the field of `natural language processing <https://en.wikipedia.org/wiki/Natural_language_processing>`_ and `speech recognition <https://en.wikipedia.org/wiki/Speech_recognition>`_ .  Models such as the `text classification model <https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleNLP/text_classification/nets.py>`_ ,
+The module is widely used in `natural language processing <https://en.wikipedia.org/wiki/Natural_language_processing>`_ and `speech recognition <https://en.wikipedia.org/wiki/Speech_recognition>`_, for example in the `text classification model <https://github.com/PaddlePaddle/models/blob/develop/PaddleNLP/text_classification/nets.py>`_ ,
-`TagSpace <https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleRec/tagspace/train.py>`_ and `Multi view Simnet <https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleRec/multiview_simnet/nets.py>`_.
+`TagSpace <https://github.com/PaddlePaddle/models/blob/develop/PaddleRec/tagspace/train.py>`_ and `Multi-view Simnet <https://github.com/PaddlePaddle/models/blob/develop/PaddleRec/multiview_simnet/nets.py>`_ models.
 For API Reference, please refer to :ref:`api_fluid_nets_sequence_conv_pool`
@@ -54,6 +54,6 @@ For the input data :code:`Queries` , :code:`Key` and :code:`Values`, calculate t
 .. math::
 Attention(Q, K, V)= softmax(QK^\mathrm{T})V
-This module is widely used in the model of `machine translation <https://en.wikipedia.org/wiki/Machine_translation>`_, such as `Transformer <https://github.com/PaddlePaddle/models/tree/develop/Fluid/PaddleNLP/neural_machine_translation/transformer>`_ .
+This module is widely used in `machine translation <https://en.wikipedia.org/wiki/Machine_translation>`_ models, such as `Transformer <https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/neural_machine_translation/transformer>`_ .
 For API Reference, please refer to :ref:`api_fluid_nets_scaled_dot_product_attention`
--- a/doc/fluid/api_guides/low_level/parallel_executor.rst
+++ b/doc/fluid/api_guides/low_level/parallel_executor.rst
@@ -29,7 +29,7 @@
 **注意** ：如果在Reduce模式下使用 :code:`CPU` 多线程执行 :code:`Program` ， :code:`Program` 的参数在多个线程间是共享的，在某些模型上，Reduce模式可以大幅节省内存。
-鉴于模型的执行速率和模型结构及执行器的执行策略有关，:code:`ParallelExecutor` 允许你修改执行器的相关参数，例如线程池的规模( :code:`num_threads` )、为清除临时变量:code:`num_iteration_per_drop_scope`需要进行的循环次数。更多信息请参照:ref:`cn_api_fluid_ExecutionStrategy`。
+鉴于模型的执行速率和模型结构及执行器的执行策略有关，:code:`ParallelExecutor` 允许你修改执行器的相关参数，例如线程池的规模( :code:`num_threads` )、为清除临时变量 :code:`num_iteration_per_drop_scope` 需要进行的循环次数。更多信息请参照 :ref:`cn_api_fluid_ExecutionStrategy` 。
 .. code-block:: python

--- a/doc/fluid/api_guides/low_level/program.rst
+++ b/doc/fluid/api_guides/low_level/program.rst
 .. _api_guide_Program:
-###############################
+#########
-Program/Block/Operator/Variable
+基础概念
-###############################
+#########
 ==================
 Program

--- a/doc/fluid/api_guides/low_level/program_en.rst
+++ b/doc/fluid/api_guides/low_level/program_en.rst
 .. _api_guide_Program_en:
-###############################
+###############
-Program/Block/Operator/Variable
+Basic Concept
-###############################
+###############
 ==================
 Program

--- a/doc/fluid/beginners_guide/quick_start/fit_a_line/README.cn.md
+++ b/doc/fluid/beginners_guide/quick_start/fit_a_line/README.cn.md
--- a/doc/fluid/beginners_guide/quick_start/fit_a_line/README.md
+++ b/doc/fluid/beginners_guide/quick_start/fit_a_line/README.md
--- a/doc/fluid/beginners_guide/quick_start/fit_a_line/image
+++ b/doc/fluid/beginners_guide/quick_start/fit_a_line/image
--- a/doc/fluid/beginners_guide/basics/index.rst
+++ b/doc/fluid/beginners_guide/basics/index.rst
 ################
-深度学习基础
+深度学习基础教程
 ################
-本章由7篇文档组成，它们按照简单到难的顺序排列，将指导您如何使用PaddlePaddle完成基础的深度学习任务
+本章由9篇文档组成，它们按照简单到难的顺序排列，将指导您如何使用PaddlePaddle完成基础的深度学习任务
 本章文档涉及大量了深度学习基础知识，也介绍了如何使用PaddlePaddle实现这些内容，请参阅以下说明了解如何使用：
@@ -15,6 +15,8 @@
 ..  toctree::
    :titlesonly:
+    fit_a_line/README.cn.md
+    recognize_digits/README.cn.md
    image_classification/index.md
    word2vec/index.md
    recommender_system/index.md

--- a/doc/fluid/beginners_guide/basics/index_en.rst
+++ b/doc/fluid/beginners_guide/basics/index_en.rst
-##########################
+############################
 Basic Deep Learning Models
-##########################
+############################
-This section collects six documents arranging from the simplest to the most challenging, which will guide you through the basic deep learning tasks in PaddlePaddle.
+This section collects 8 documents, arranged from the simplest to the most challenging, which guide you through basic deep learning tasks in PaddlePaddle.
 The documentation in this chapter covers a lot of deep learning basics and how to implement them with PaddlePaddle. See the instructions below for how to use:
@@ -15,6 +15,8 @@ The book you are reading is an "interactive" e-book - each chapter can be run in
 ..  toctree::
    :titlesonly:
+    fit_a_line/README.md
+    recognize_digits/README.md
    image_classification/index_en.md
    word2vec/index_en.md
    recommender_system/index_en.md

--- a/doc/fluid/beginners_guide/quick_start/recognize_digits/README.cn.md
+++ b/doc/fluid/beginners_guide/quick_start/recognize_digits/README.cn.md
--- a/doc/fluid/beginners_guide/quick_start/recognize_digits/README.md
+++ b/doc/fluid/beginners_guide/quick_start/recognize_digits/README.md
--- a/doc/fluid/beginners_guide/quick_start/recognize_digits/image
+++ b/doc/fluid/beginners_guide/quick_start/recognize_digits/image
--- a/doc/fluid/beginners_guide/index.rst
+++ b/doc/fluid/beginners_guide/index.rst
@@ -6,23 +6,24 @@ PaddlePaddle (PArallel Distributed Deep LEarning)是一个易用、高效、灵
 您可参考PaddlePaddle的 `Github <https://github.com/PaddlePaddle/Paddle>`_ 了解详情，也可阅读 `版本说明 <../release_note.html>`_ 了解新版本的特性
-当您第一次来到PaddlePaddle，请您首先阅读以下文档，了解安装方法：
+让我们从这里开始：
-    - `安装说明 <../beginners_guide/install/index_cn.html>`_：我们支持在Ubuntu/CentOS/Windows/MacOS环境上的安装
+    - `快速开始 <../beginners_guide/quick_start.html>`_
-如果您已经具备一定的深度学习基础，第一次使用PaddlePaddle时，可以跟随下列简单的模型案例供您快速上手：
+当您第一次来到PaddlePaddle，请您首先阅读以下文档，了解安装方法：
-    - `Fluid编程指南 <../beginners_guide/programming_guide/programming_guide.html>`_：介绍 Fluid 的基本概念和使用方法
+    - `安装说明 <../beginners_guide/install/index_cn.html>`_：我们支持在Ubuntu/CentOS/Windows/MacOS环境上的安装
-    - `快速入门 <../beginners_guide/quick_start/index.html>`_：提供线性回归和识别数字两个入门级模型，帮助您快速上手训练网络
+这里为您提供了更多学习资料:
    - `深度学习基础 <../beginners_guide/basics/index.html>`_：覆盖图像分类、个性化推荐、机器翻译等多个深度领域的基础知识，提供 Fluid 实现案例
+    - `Fluid编程指南 <../beginners_guide/programming_guide/programming_guide.html>`_：介绍 Fluid 的基本概念和使用方法
 ..  toctree::
    :hidden:
+    quick_start_cn.rst
    install/index_cn.rst
-    quick_start/index.rst
+    basics/index_cn.rst
-    basics/index.rst
    programming_guide/programming_guide.md
--- a/doc/fluid/beginners_guide/index_en.rst
+++ b/doc/fluid/beginners_guide/index_en.rst
@@ -15,8 +15,6 @@ If you have been armed with certain level of deep learning knowledge, and it hap
    - `Programming with Fluid <../beginners_guide/programming_guide/programming_guide_en.html>`_ ： Core concepts and basic usage of Fluid
-    - `Quick Start <../beginners_guide/quick_start/index_en.html>`_： Two easy-to-go models, linear regression model and digit recognition model, are in place to speed up your study of training neural networks
    - `Deep Learning  Basics <../beginners_guide/basics/index_en.html>`_： This section encompasses various fields of fundamental deep learning knowledge, such as image classification, customized recommendation, machine translation, and examples implemented by Fluid are provided.
@@ -24,6 +22,5 @@ If you have been armed with certain level of deep learning knowledge, and it hap
    :hidden:
    install/index_en.rst
-    quick_start/index_en.rst
    basics/index_en.rst
    programming_guide/programming_guide_en.md
--- a/doc/fluid/beginners_guide/install/Tables.md
+++ b/doc/fluid/beginners_guide/install/Tables.md
@@ -111,16 +111,6 @@
 		<td> 是否支持GPU </td>
 		<td> ON </td>
 	</tr>
-	<tr>
-		<td> WITH_C_API </td>
-		<td> 是否仅编译CAPI </td>
-		<td>  OFF </td>
-	</tr>
-		<tr>
-		<td> WITH_DOUBLE </td>
-		<td> 是否使用双精度浮点数 </td>
-		<td> OFF </td>
-	</tr>
 	<tr>
 		<td> WITH_DSO </td>
 		<td> 是否运行时动态加载CUDA动态库，而非静态加载CUDA动态库 </td>
@@ -136,30 +126,11 @@
 		<td> 是否内嵌PYTHON解释器 </td>
 		<td> ON </td>
 	</tr>
-	<tr>
-		<td> WITH_STYLE_CHECK </td>
-		<td> 是否编译时进行代码风格检查 </td>
-		<td> ON </td>
-	</tr>
 	<tr>
 		<td> WITH_TESTING </td>
 		<td> 是否开启单元测试 </td>
 		<td> OFF </td>
 	</tr>
-	<tr>
-		<td> WITH_DOC </td>
-		<td> 是否编译中英文文档 </td>
-		<td> OFF </td>
-	</tr>
-	<tr>
-		<td> WITH_SWIG_PY </td>
-		<td> 是否编译PYTHON的SWIG接口，该接口可用于预测和定制化训练 </td>
-		<td> Auto </td>
-	<tr>
-		<td> WITH_GOLANG </td>
-		<td> 是否编译go语言的可容错parameter server </td>
-		<td> OFF </td>
-	</tr>
 	<tr>
 		<td> WITH_MKL </td>
 		<td> 是否使用MKL数学库，如果为否则是用OpenBLAS </td>
@@ -175,11 +146,6 @@
 		<td> 是否编译带有分布式的版本 </td>
 		<td> OFF </td>
 	</tr>
-	<tr>
-		<td> WITH_RDMA </td>
-		<td> 是否编译支持RDMA的相关部分 </td>
-		<td> OFF </td>
-	</tr>
 	<tr>
 		<td> WITH_BRPC_RDMA </td>
 		<td> 是否使用BRPC RDMA作为RPC协议 </td>
@@ -190,11 +156,6 @@
 		<td> 是否打开预测优化 </td>
 		<td> OFF </td>
 	</tr>
-	<tr>
-		<td> DWITH_ANAKIN </td>
-		<td> 是否编译ANAKIN </td>
-		<td> OFF </td>
-	</tr>
   </tbody>
 </table>
 </p>

--- a/doc/fluid/beginners_guide/install/compile/compile_MacOS.md
+++ b/doc/fluid/beginners_guide/install/compile/compile_MacOS.md
@@ -186,6 +186,7 @@
 			For Python2: cmake .. -DWITH_FLUID_ONLY=ON -DWITH_GPU=OFF -DWITH_TESTING=OFF  -DCMAKE_BUILD_TYPE=Release
 			For Python3: cmake .. -DPY_VERSION=3.5 -DPYTHON_INCLUDE_DIR=${PYTHON_INCLUDE_DIRS} \
 			 -DPYTHON_LIBRARY=${PYTHON_LIBRARY} -DWITH_FLUID_ONLY=ON -DWITH_GPU=OFF -DWITH_TESTING=OFF  -DCMAKE_BUILD_TYPE=Release
 	>`-DPY_VERSION=3.5`请修改为安装环境的Python版本
 10. 使用以下命令来编译：

--- a/doc/fluid/beginners_guide/install/compile/compile_MacOS_en.md
+++ b/doc/fluid/beginners_guide/install/compile/compile_MacOS_en.md
@@ -68,7 +68,7 @@ Once you have **properly installed Docker**, you can start **compiling PaddlePad
 9. Execute cmake:
-	> For details on the compilation options, see the [compilation options table](../Tables.html/#Compile).
+	> For details on the compilation options, see the [compilation options table](../Tables_en.html/#Compile).
 	* For users who need to compile the **CPU version PaddlePaddle**:

--- a/doc/fluid/beginners_guide/install/install_Ubuntu_en.md
+++ b/doc/fluid/beginners_guide/install/install_Ubuntu_en.md
@@ -4,7 +4,7 @@
 This instruction describes how to install PaddlePaddle on a *64-bit desktop or laptop* and Ubuntu system. The Ubuntu systems we support must meet the following requirements:
-Please note: Attempts on other systems may cause the installation to fail. Please ensure that your environment meets the conditions. The installation we provide by default requires your computer processor to support the AVX instruction set. Otherwise, please select the version of `no_avx` in the [latest Release installation package list](./Tables.html/#ciwhls-release).
+Please note: Attempts on other systems may cause the installation to fail. Please ensure that your environment meets these conditions. By default, the installation packages we provide require your processor to support the AVX instruction set. If it does not, please select a `no_avx` version from the [latest Release installation package list](./Tables_en.html/#ciwhls-release).
 Under Ubuntu, you can use `cat /proc/cpuinfo | grep avx` to check if your processor supports the AVX instruction set.

--- a/doc/fluid/beginners_guide/programming_guide/programming_guide.md
+++ b/doc/fluid/beginners_guide/programming_guide/programming_guide.md
@@ -236,7 +236,7 @@ Fluid的设计思想类似于高级编程语言C++和JAVA等。程序的执行
 #定义Exector
 cpu = fluid.core.CPUPlace() #定义运算场所，这里选择在CPU下训练
 exe = fluid.Executor(cpu) #创建执行器
-exe.run(fluid.default_startup_program()) #初始化Program
+exe.run(fluid.default_startup_program()) #用来进行初始化的program
 #训练Program，开始计算
 #feed以字典的形式定义了数据传入网络的顺序
@@ -407,17 +407,17 @@ outs = exe.run(
    ```
    可以看到100次迭代后，预测值已经非常接近真实值了，损失值也从初始值9.05下降到了0.01。
-    恭喜您！已经成功完成了第一个简单网络的搭建，想尝试线性回归的进阶版——房价预测模型，请阅读：[线性回归](../../beginners_guide/quick_start/fit_a_line/README.cn.html)。更多丰富的模型实例可以在[模型库](../../user_guides/models/index_cn.html)中找到。
+    恭喜您！已经成功完成了第一个简单网络的搭建，想尝试线性回归的进阶版——房价预测模型，请阅读：[线性回归](../../beginners_guide/basics/fit_a_line/README.cn.html)。更多丰富的模型实例可以在[模型库](../../user_guides/models/index_cn.html)中找到。
 <a name="what_next"></a>
 ## What's next
 如果您已经掌握了基本操作，可以进行下一阶段的学习了：
-跟随这一教程将学习到如何对实际问题建模并使用fluid构建模型：[配置简单的网络](../../user_guides/howto/configure_simple_model/index.html)。
+跟随这一教程将学习到如何对实际问题建模并使用fluid构建模型：[配置简单的网络](../../user_guides/howto/configure_simple_model/index_cn.html)。
-完成网络搭建后，可以开始在单机或多机上训练您的网络了，详细步骤请参考[训练神经网络](../../user_guides/howto/training/index.html)。
+完成网络搭建后，可以开始在单机或多机上训练您的网络了，详细步骤请参考[训练神经网络](../../user_guides/howto/training/index_cn.html)。
-除此之外，使用文档模块根据开发者的不同背景划分了三个学习阶段：[新手入门](../../beginners_guide/index.html)、[使用指南](../../user_guides/index.html)和[进阶使用](../../advanced_usage/index.html)。
+除此之外，使用文档模块根据开发者的不同背景划分了三个学习阶段：[新手入门](../../beginners_guide/index_cn.html)、[使用指南](../../user_guides/index_cn.html)和[进阶使用](../../advanced_usage/index_cn.html)。
-如果您希望阅读更多场景下的应用案例，可以跟随导航栏进入[快速入门](../../beginners_guide/quick_start/index.html)和[深度学习基础知识](../../beginners_guide/basics/index.html)。已经具备深度学习基础知识的用户，可以从[使用指南](../../user_guides/index.html)开始阅读。
+如果您希望阅读更多场景下的应用案例，可以参考[深度学习基础教程](../../beginners_guide/basics/index_cn.html)。已经具备深度学习基础知识的用户，可以从[使用指南](../../user_guides/index_cn.html)开始阅读。
--- a/doc/fluid/beginners_guide/programming_guide/programming_guide_en.md
+++ b/doc/fluid/beginners_guide/programming_guide/programming_guide_en.md
@@ -414,7 +414,7 @@ Firstly, define input data format, model structure,loss function and optimized a
    ```
    Now we discover that predicted value is nearly close to real value and the loss value descends from original value 9.05 to 0.01 after iteration for 100 times.
-    Congratulations! You have succeed to create a simple network. If you want to try advanced linear regression —— predict model of housing price, please read [linear regression](../../beginners_guide/quick_start/fit_a_line/README.en.html). More examples of model can be found in [models](../../user_guides/models/index_en.html).
+    Congratulations! You have successfully built a simple network. If you want to try an advanced version of linear regression, the housing price prediction model, please read [linear regression](../../beginners_guide/basics/fit_a_line/README.en.html). More model examples can be found in [models](../../user_guides/models/index_en.html).
 <a name="what_next"></a>
 ## What's next
@@ -427,4 +427,4 @@ After the construction of network, you can start training your network in single
 In addition, there are three learning levels in documentation according to developer's background and experience: [Beginner's Guide](../../beginners_guide/index_en.html) , [User Guides](../../user_guides/index_en.html) and [Advanced User Guides](../../advanced_usage/index_en.html).
-If you want to read examples in more application scenarios, you can go to [quick start](../../beginners_guide/quick_start/index_en.html) and [basic knowledge of deep learning](../../beginners_guide/basics/index_en.html) .If you have learned basic knowledge of deep learning, you can read from [user guide](../../user_guides/index_en.html).
+If you want to read examples in more application scenarios, see [basic knowledge of deep learning](../../beginners_guide/basics/index_en.html). If you already have a basic knowledge of deep learning, you can start with the [user guide](../../user_guides/index_en.html).
--- a/doc/fluid/beginners_guide/quick_start/index.rst
+++ b/doc/fluid/beginners_guide/quick_start/index.rst
-########
-快速入门
-########
-欢迎来到快速入门部分，在这里，我们将向您介绍如何通过PaddlePaddle Fluid实现经典的线性回归和手写识别的模型，以下两篇文档将指导您使用真实数据集搭建起模型、进行训练和预测：
-..  toctree::
-    :titlesonly:
-    fit_a_line/README.cn.md
-    recognize_digits/README.cn.md
--- a/doc/fluid/beginners_guide/quick_start/index_en.rst
+++ b/doc/fluid/beginners_guide/quick_start/index_en.rst
-##############
-Quick Start
-##############
-Welcome to Quick Start! 
-This section will tutor you to invent your won models of classical *linear Regression* and *Handwritten Digits Recognition* tasks in PaddlePaddle Fluid. The following tutorials provide details on model definition, training, and inference in a friendly manner based on real-life datasets:
-..  toctree::
-    :titlesonly:
-    fit_a_line/README.md
-    recognize_digits/README.md
--- a/doc/fluid/beginners_guide/quick_start.rst
+++ b/doc/fluid/beginners_guide/quick_start.rst
--- a/doc/fluid/index_cn.rst
+++ b/doc/fluid/index_cn.rst
@@ -11,8 +11,8 @@
    :maxdepth: 1
-    beginners_guide/index.rst
+    beginners_guide/index_cn.rst
-    user_guides/index.rst
+    user_guides/index_cn.rst
-    advanced_usage/index.rst
+    advanced_usage/index_cn.rst
    api_cn/index_cn.rst
-    release_note.rst
+    release_note_cn.rst
--- a/doc/fluid/release_note.rst
+++ b/doc/fluid/release_note.rst
--- a/doc/fluid/user_guides/howto/configure_simple_model/index.rst
+++ b/doc/fluid/user_guides/howto/configure_simple_model/index.rst
--- a/doc/fluid/user_guides/howto/evaluation_and_debugging/index.rst
+++ b/doc/fluid/user_guides/howto/evaluation_and_debugging/index.rst
--- a/doc/fluid/user_guides/howto/prepare_data/feeding_data.rst
+++ b/doc/fluid/user_guides/howto/prepare_data/feeding_data.rst
@@ -53,6 +53,8 @@ PaddlePaddle Fluid支持使用 :code:`fluid.layers.data()` 配置数据层；
 .. code-block:: python
   exe = fluid.Executor(fluid.CPUPlace())
+   # init Program
+   exe.run(fluid.default_startup_program())
   exe.run(feed={
      "image": numpy.random.random(size=(32, 3, 224, 224)).astype('float32'),
      "label": numpy.random.random(size=(32, 1)).astype('int64')
@@ -84,7 +86,7 @@ PaddlePaddle Fluid支持使用 :code:`fluid.layers.data()` 配置数据层；
   exe.run(feed={
     "sentence": create_lod_tensor(
       data=numpy.array([1, 3, 4, 5, 3, 6, 8], dtype='int64').reshape(-1, 1),
-       lod=[[4, 1, 2]],
+       recursive_seq_lens=[[4, 1, 2]],
       place=fluid.CPUPlace()
     )
   })

--- a/doc/fluid/user_guides/howto/prepare_data/feeding_data_en.rst
+++ b/doc/fluid/user_guides/howto/prepare_data/feeding_data_en.rst
@@ -45,6 +45,8 @@ For example:
 .. code-block:: python
   exe = fluid.Executor(fluid.CPUPlace())
+   # init Program
+   exe.run(fluid.default_startup_program())
   exe.run(feed={
      "image": numpy.random.random(size=(32, 3, 224, 224)).astype('float32'),
      "label": numpy.random.random(size=(32, 1)).astype('int64')
@@ -81,7 +83,7 @@ For example:
   exe.run(feed={
     "sentence": create_lod_tensor(
       data=numpy.array([1, 3, 4, 5, 3, 6, 8], dtype='int64').reshape(-1, 1),
-       lod=[4, 1, 2],
+       recursive_seq_lens=[[4, 1, 2]],
       place=fluid.CPUPlace()
     )
   })

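The `feeding_data` fixes above replace `lod=...` with a nested `recursive_seq_lens=[[4, 1, 2]]`. Each inner list describes one LoD level as sequence lengths, and the lengths must sum to the number of rows in `data` (here 4 + 1 + 2 = 7). The plain-Python sketch below does not call Paddle; the helper names `lengths_to_offsets` and `split_by_lengths` are hypothetical. It assumes Fluid's convention that a LoD level stored internally (as reported by `LoDTensor.lod()`) is the offset form of these lengths, and shows how the 7 word ids are grouped into three sentences.

```python
from itertools import accumulate


def lengths_to_offsets(recursive_seq_lens):
    """Hypothetical helper: convert each length-based LoD level into
    offset-based form by prefixing 0 to the running sum of lengths."""
    return [[0] + list(accumulate(level)) for level in recursive_seq_lens]


def split_by_lengths(flat, lengths):
    """Hypothetical helper: split a flat list of rows into sequences
    of the given lengths (a single LoD level)."""
    if sum(lengths) != len(flat):
        raise ValueError("sum of lengths must equal the number of rows")
    offsets = [0] + list(accumulate(lengths))
    return [flat[offsets[i]:offsets[i + 1]] for i in range(len(lengths))]


# Same data as the create_lod_tensor example: 7 word ids, 3 sentences.
data = [1, 3, 4, 5, 3, 6, 8]
recursive_seq_lens = [[4, 1, 2]]

print(lengths_to_offsets(recursive_seq_lens))         # [[0, 4, 5, 7]]
print(split_by_lengths(data, recursive_seq_lens[0]))  # [[1, 3, 4, 5], [3], [6, 8]]
```

The flat `lod=[4, 1, 2]` form removed from the English page has no outer list, so it cannot express which level the lengths belong to; the nested form keeps one inner list per level.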
--- a/doc/fluid/user_guides/howto/prepare_data/index.rst
+++ b/doc/fluid/user_guides/howto/prepare_data/index.rst
--- a/doc/fluid/user_guides/howto/training/deploy_ctr_on_baidu_cloud_cn.rst
+++ b/doc/fluid/user_guides/howto/training/deploy_ctr_on_baidu_cloud_cn.rst
 ..  _deploy_ctr_on_baidu_cloud_cn:
-百度云分布式训练CTR
+在百度云分布式训练CTR
 =========================
 Fluid支持数据并行的分布式训练，也支持基于Kubernetes的分布式部署。本文以百度云为例，说明如何通过在云服务器上分布式训练Click-Through-Rate（以下简称ctr）任务。

--- a/doc/fluid/user_guides/howto/training/index.rst
+++ b/doc/fluid/user_guides/howto/training/index.rst
@@ -7,6 +7,6 @@ PaddlePaddle Fluid支持单机训练和多节点训练。每种训练模式下
 .. toctree::
   :maxdepth: 1
-   single_node
+   single_node.rst
-   multi_node
+   multi_node.rst
-   save_load_variables
+   save_load_variables.rst
--- a/doc/fluid/user_guides/howto/training/multi_node.rst
+++ b/doc/fluid/user_guides/howto/training/multi_node.rst
@@ -8,3 +8,4 @@
    cluster_quick_start.rst
    cluster_howto.rst
    train_on_baidu_cloud_cn.rst
+    deploy_ctr_on_baidu_cloud_cn.rst
--- a/doc/fluid/user_guides/howto/training/save_load_variables.rst
+++ b/doc/fluid/user_guides/howto/training/save_load_variables.rst
 .. _user_guide_save_load_vars:
-##################
+#############################
-模型/变量的保存、载入与增量训练
+模型/变量的保存/载入与增量训练
-##################
+#############################
 模型变量分类
 ############

--- a/doc/fluid/user_guides/howto/training/single_node.rst
+++ b/doc/fluid/user_guides/howto/training/single_node.rst
@@ -77,7 +77,7 @@
 多卡训练
 #######################
-在多卡训练中，你可以使用:code:`fluid.compiler.CompiledProgram`来编译:code:`fluid.Program`，然后调用:code:`with_data_parallel`。例如：
+在多卡训练中，你可以使用 :code:`fluid.compiler.CompiledProgram` 来编译 :code:`fluid.Program` ，然后调用 :code:`with_data_parallel` 。例如：
 .. code-block:: python
@@ -93,9 +93,9 @@
 注释：
-1. :ref:`cn_api_fluid_CompiledProgram`的构造函数需要经过:code:`fluid.Program`设置后运行，这在运行时内无法被修改。
+1. :ref:`cn_api_fluid_CompiledProgram` 的构造函数需要传入待运行的 :code:`fluid.Program` ，该设置在运行时无法修改。
-2. 如果:code:`exe`是用CUDAPlace来初始化的，模型会在GPU中运行。在显卡训练模式中，所有的显卡都将被占用。用户可以配置 `CUDA_VISIBLE_DEVICES <http://www.acceleware.com/blog/cudavisibledevices-masking-gpus>`_ 以更改被占用的显卡。
+2. 如果 :code:`exe` 是用CUDAPlace来初始化的，模型会在GPU中运行。在显卡训练模式中，所有的显卡都将被占用。用户可以配置 `CUDA_VISIBLE_DEVICES <http://www.acceleware.com/blog/cudavisibledevices-masking-gpus>`_ 以更改被占用的显卡。
-3. 如果:code:`exe`是用CPUPlace来初始化的，模型会在CPU中运行。在这种情况下，多线程用于运行模型，同时线程的数目和逻辑核的数目相等。用户可以配置`CPU_NUM`以更改使用中的线程数目。
+3. 如果 :code:`exe` 是用CPUPlace来初始化的，模型会在CPU中运行。在这种情况下，多线程用于运行模型，同时线程的数目和逻辑核的数目相等。用户可以配置 ``CPU_NUM`` 以更改使用中的线程数目。
 进阶使用
 ###############

--- a/doc/fluid/user_guides/howto/training/train_on_baidu_cloud_cn.rst
+++ b/doc/fluid/user_guides/howto/training/train_on_baidu_cloud_cn.rst
 .. _train_on_baidu_cloud_cn:
-在百度云启动Fluid分布式训练
+在百度云启动分布式训练
 =========================
 PaddlePaddle Fluid分布式训练，可以不依赖集群系统（比如MPI，Kubernetes）启动分布式训练。

--- a/doc/fluid/user_guides/index.rst
+++ b/doc/fluid/user_guides/index.rst
@@ -8,29 +8,28 @@
    - `基本概念 <../user_guides/howto/basic_concept/index_cn.html>`_ ：介绍了Fluid的基本使用概念
-    - `准备数据 <../user_guides/howto/prepare_data/index.html>`_ ：介绍使用 Fluid 训练网络时，数据的支持类型及传输方法
+    - `准备数据 <../user_guides/howto/prepare_data/index_cn.html>`_ ：介绍使用 Fluid 训练网络时，数据的支持类型及传输方法
-    - `配置简单的网络 <../user_guides/howto/configure_simple_model/index.html>`_： 介绍如何针对问题建模，并利用 Fluid 中相关算子搭建网络
+    - `配置简单的网络 <../user_guides/howto/configure_simple_model/index_cn.html>`_： 介绍如何针对问题建模，并利用 Fluid 中相关算子搭建网络
-    - `训练神经网络 <../user_guides/howto/training/index.html>`_：介绍如何使用 Fluid 进行单机训练、多机训练、以及保存和载入模型变量
+    - `训练神经网络 <../user_guides/howto/training/index_cn.html>`_：介绍如何使用 Fluid 进行单机训练、多机训练、以及保存和载入模型变量
-    - `模型评估与调试 <../user_guides/howto/evaluation_and_debugging/index.html>`_：介绍在 Fluid 下进行模型评估和调试的方法，包括：
 	- `DyGraph模式 <../user_guides/howto/dygraph/DyGraph.md>`_：介绍在 Fluid 下使用DyGraph       
+    - `模型评估与调试 <../user_guides/howto/evaluation_and_debugging/index_cn.html>`_：介绍在 Fluid 下进行模型评估和调试的方法，包括：
 基于 Fluid 复现的多领域经典模型：
    - `Fluid 模型库 <../user_guides/models/index_cn.html>`_
 ..  toctree::
    :hidden:
    howto/basic_concept/index_cn.rst
-    howto/prepare_data/index
+    howto/prepare_data/index_cn.rst
-    howto/configure_simple_model/index
+    howto/configure_simple_model/index_cn.rst
-    howto/training/index
+    howto/training/index_cn.rst
-    howto/evaluation_and_debugging/index
+    howto/evaluation_and_debugging/index_cn.rst
    howto/dygraph/DyGraph.md
    models/index_cn.rst
--- a/doc/fluid/user_guides/models/index_en.rst
+++ b/doc/fluid/user_guides/models/index_en.rst
@@ -87,7 +87,7 @@ Automatic Speech Recognition (ASR) is a technique for transcribing vocabulary co
 Different from the end-to-end direct prediction for word distribution of the deep learning model  `DeepSpeech <https://github.com/PaddlePaddle/DeepSpeech>`__ , this example is closer to the traditional language recognition process. With phoneme as the modeling unit, it focuses on the training of acoustic models in speech recognition, use `kaldi <http://www.kaldi-asr.org>`__ for feature extraction and label alignment of audio data, and integrate kaldi's decoder to complete decoding.
- `DeepASR <https://github.com/PaddlePaddle/models/blob/develop/DeepASR/README_cn.md>`__
+- `DeepASR <https://github.com/PaddlePaddle/models/blob/develop/PaddleSpeech/DeepASR/README.md>`__
 Machine Translation
 ---------------------