merge conflict

84c01a53 · JiabinYang · 7dba9ffb · 8e019748 · 84c01a53 · 84c01a53
79 changed files
--- a/.gitignore
+++ b/.gitignore
 .vscode/
+/doc/fluid/menu.zh.json
+/doc/fluid/menu.en.json
--- a/doc/fluid/api_guides/low_level/distributed/cpu_train_best_practice.rst
+++ b/doc/fluid/api_guides/low_level/distributed/cpu_train_best_practice.rst
 .. _api_guide_cpu_training_best_practice:

-##################
+####################
 分布式CPU训练最佳实践
-##################
+####################

 提高CPU分布式训练的训练速度，主要要从两个方面来考虑：
 1）提高训练速度，主要是提高CPU的使用率；2）提高通信速度，主要是减少通信传输的数据量。
@@ -46,7 +46,7 @@ API详细使用方法参考 :ref:`cn_api_fluid_ParallelExecutor` ，简单实例
 提高通信速度
 ==========

-要减少通信数据量，提高通信速度，主要是使用稀疏更新 ，目前支持 `稀疏更新 <../layers/sparse_update.html>`_  的主要是  :ref:`cn_api_fluid_layers_embedding` 。
+要减少通信数据量，提高通信速度，主要是使用稀疏更新 ，目前支持  :ref:`api_guide_sparse_update` 的主要是  :ref:`cn_api_fluid_layers_embedding` 。

 .. code-block:: python


--- a/doc/fluid/api_guides/low_level/distributed/cpu_train_best_practice_en.rst
+++ b/doc/fluid/api_guides/low_level/distributed/cpu_train_best_practice_en.rst
--- a/doc/fluid/advanced_usage/best_practice/dist_training_gpu.rst
+++ b/doc/fluid/advanced_usage/best_practice/dist_training_gpu.rst
 .. _best_practice_dist_training_gpu:

-性能优化最佳实践之：GPU分布式训练
-============================
+#####################
+分布式GPU训练最佳实践
+#####################

 开始优化您的GPU分布式训练任务
 -------------------------
@@ -170,7 +171,7 @@ PaddlePaddle Fluid使用“线程池” [#]_ 模型调度并执行Op，Op在启
 数据读取的优化在GPU训练中至关重要，尤其在不断增加batch_size提升吞吐时，计算对reader性能会有更高的要求，
 优化reader性能需要考虑的点包括：

-1. 使用 :code:`pyreader` 
+1. 使用 :code:`pyreader`
   参考 `这里 <../../user_guides/howto/prepare_data/use_py_reader.html>`_
   使用pyreader，并开启 :code:`use_double_buffer`
 2. reader返回uint8类型数据
@@ -229,7 +230,7 @@ PaddlePaddle Fluid使用“线程池” [#]_ 模型调度并执行Op，Op在启
              for batch_id in range(iters_per_pass):
                  exe.run()
          pyreader.reset()
-   
+

 使用混合精度训练
 ++++++++++++++

--- a/doc/fluid/advanced_usage/best_practice/index_cn.rst
+++ b/doc/fluid/advanced_usage/best_practice/index_cn.rst
+#########
+最佳实践
+#########
+
+..  toctree::
+    :maxdepth: 1
+
+    cpu_train_best_practice.rst
+    dist_training_gpu.rst
--- a/doc/fluid/advanced_usage/best_practice/index_en.rst
+++ b/doc/fluid/advanced_usage/best_practice/index_en.rst
+###############
+Best Practice
+###############
+
+..  toctree::
+    :hidden:
+
+    cpu_train_best_practice_en.rst
--- a/doc/fluid/advanced_usage/deploy/inference/paddle_gpu_benchmark.md
+++ b/doc/fluid/advanced_usage/deploy/inference/paddle_gpu_benchmark.md
@@ -6,15 +6,15 @@
 - 测试模型 ResNet50，MobileNet，ResNet101, Inception V3.

 ## 测试对象
-**PaddlePaddle, Pytorch, Tensorflow**   
+**PaddlePaddle, Pytorch, Tensorflow**

- 在测试中，PaddlePaddle使用子图优化的方式集成了TensorRT, 模型[地址](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models)。
+- 在测试中，PaddlePaddle使用子图优化的方式集成了TensorRT, 模型[地址](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/image_classification/models)。
 - Pytorch使用了原生的实现, 模型[地址1](https://github.com/pytorch/vision/tree/master/torchvision/models)、[地址2](https://github.com/marvis/pytorch-mobilenet)。
 - 对TensorFlow测试包括了对TF的原生的测试，和对TF—TRT的测试，**对TF—TRT的测试并没有达到预期的效果，后期会对其进行补充**， 模型[地址](https://github.com/tensorflow/models)。


-### ResNet50 
- 
+### ResNet50
+
 |batch_size|PaddlePaddle(ms)|Pytorch(ms)|TensorFlow(ms)|
 |---|---|---|---|
 |1|4.64117 |16.3|10.878|

--- a/doc/fluid/advanced_usage/design_idea/fluid_design_idea.md
+++ b/doc/fluid/advanced_usage/design_idea/fluid_design_idea.md
@@ -21,28 +21,28 @@ Fluid使用一种编译器式的执行流程，分为编译时和运行时两个
 </p>

 1. 编译时，用户编写一段python程序，通过调用 Fluid 提供的算子，向一段 Program 中添加变量（Tensor）以及对变量的操作（Operators 或者 Layers）。用户只需要描述核心的前向计算，不需要关心反向计算、分布式下以及异构设备下如何计算。
- 
+
 2. 原始的 Program 在平台内部转换为中间描述语言： `ProgramDesc`。
- 
+
 3. 编译期最重要的一个功能模块是 `Transpiler`。`Transpiler` 接受一段 `ProgramDesc` ，输出一段变化后的 `ProgramDesc` ，作为后端 `Executor` 最终需要执行的 Fluid Program

 4. 后端 Executor 接受 Transpiler 输出的这段 Program ，依次执行其中的 Operator（可以类比为程序语言中的指令），在执行过程中会为 Operator 创建所需的输入输出并进行管理。
-	


- 
-## 2. Program设计思想 
+
+
+## 2. Program设计思想

 用户完成网络定义后，一段 Fluid 程序中通常存在 2 段 Program：

  1. fluid.default_startup_program：定义了创建模型参数，输入输出，以及模型中可学习参数的初始化等各种操作
-    
+
    default_startup_program 可以由框架自动生成，使用时无需显式地创建
-    
+
    如果调用修改了参数的默认初始化方式，框架会自动地将相关的修改加入default_startup_program
-  
+
  2. fluid.default_main_program ：定义了神经网络模型，前向反向计算，以及优化算法对网络中可学习参数的更新
-    
+
    使用Fluid的核心就是构建起 default_main_program


@@ -53,7 +53,7 @@ Fluid 的 Program 的基本结构是一些嵌套 blocks，形式上类似一段
 blocks中包含：

 -  本地变量的定义
-  一系列的operator 
+-  一系列的operator

 block的概念与通用程序一致，例如在下列这段C++代码中包含三个block：

@@ -95,7 +95,7 @@ prob = ie()
 ```
 ### BlockDesc and ProgramDesc

-用户描述的block与program信息在Fluid中以[protobuf](https://en.wikipedia.org/wiki/Protocol_Buffers) 格式保存，所有的`protobub`信息被定义在`framework.proto`中，在Fluid中被称为BlockDesc和ProgramDesc。ProgramDesc和BlockDesc的概念类似于一个[抽象语法树](https://en.wikipedia.org/wiki/Abstract_syntax_tree)。
+用户描述的block与program信息在Fluid中以[protobuf](https://en.wikipedia.org/wiki/Protocol_Buffers) 格式保存，所有的`protobuf`信息被定义在`framework.proto`中，在Fluid中被称为BlockDesc和ProgramDesc。ProgramDesc和BlockDesc的概念类似于一个[抽象语法树](https://en.wikipedia.org/wiki/Abstract_syntax_tree)。

 `BlockDesc`中包含本地变量的定义`vars`，和一系列的operator`ops`：

@@ -172,12 +172,12 @@ class Executor{
 				Scope* scope,
 				int block_id) {
 			auto& block = pdesc.Block(block_id);
-			
+
 			//创建所有变量
 			for (auto& var : block.AllVars()) {
 				scope->Var(var->Name());
 			}
-			
+
 			//创建OP并按顺序执行
 			for (auto& op_desc : block.AllOps()){
 				auto op = CreateOp(*op_desc);
@@ -300,7 +300,7 @@ BlockDesc中包含定义的 vars 和一系列的 ops，以输入x为例，python
 x = fluid.layers.data(name="x",shape=[1],dtype='float32')
 ```
 在BlockDesc中，变量x被描述为：
-``` 
+```
 vars {
    name: "x"
    type {
@@ -359,5 +359,5 @@ Fluid使用Executor.run来运行一段Program。
       [6.099215 ]], dtype=float32), array([1.6935859], dtype=float32)]
 ```

-至此您已经了解了Fluid 内部的执行流程的核心概念，更多框架使用细节请参考[使用指南](../../user_guides/index.html)相关内容，[模型库](../../user_guides/models/index_cn.html
+至此您已经了解了Fluid 内部的执行流程的核心概念，更多框架使用细节请参考[使用指南](../../user_guides/index_cn.html)相关内容，[模型库](../../user_guides/models/index_cn.html
 )中也为您提供了丰富的模型示例以供参考。
--- a/doc/fluid/advanced_usage/development/contribute_to_paddle/submit_pr_guide.md
+++ b/doc/fluid/advanced_usage/development/contribute_to_paddle/submit_pr_guide.md
-# Github提交PR指南
+# 提交PR注意事项

 ## 建立 Issue 并完成 Pull Request


--- a/doc/fluid/advanced_usage/development/new_op/index_cn.rst
+++ b/doc/fluid/advanced_usage/development/new_op/index_cn.rst
 #############
-新增Operator
+新增OP
 #############

 本部分将指导您如何新增Operator，也包括一些必要的注意事项

--- a/doc/fluid/advanced_usage/development/new_op/new_op.md
+++ b/doc/fluid/advanced_usage/development/new_op/new_op.md
-# 如何写新的op
+# 如何写新的OP

 ## 概念简介


--- a/doc/fluid/advanced_usage/development/new_op/op_notes.md
+++ b/doc/fluid/advanced_usage/development/new_op/op_notes.md
-# op相关注意事项
+# OP相关注意事项

 ## Fluid中Op的构建逻辑
 ### 1.Fluid中Op的构建逻辑

--- a/doc/fluid/advanced_usage/development/new_op/op_notes_en.md
+++ b/doc/fluid/advanced_usage/development/new_op/op_notes_en.md
@@ -11,7 +11,7 @@ The Fluid framework is designed to run on a variety of devices and third-party l
 Operator inheritance diagram:
 ![op_inheritance_relation_diagram](../../pics/op_inheritance_relation_diagram.png)

-For further information, please refer to: [multi_devices](https://github.com/PaddlePaddle/FluidDoc/tree/develop/doc/fluid/design/multi_devices) , [scope](https://github.com/PaddlePaddle/FluidDoc/Blob/develop/doc/fluid/design/concepts/scope.md) , [Developer's_Guide_to_Paddle_Fluid](https://github.com/PaddlePaddle/FluidDoc/blob/release/1.2/doc/fluid/getstarted/Developer's_Guide_to_Paddle_Fluid.md)
+For further information, please refer to: [multi_devices](https://github.com/PaddlePaddle/FluidDoc/tree/develop/doc/fluid/design/multi_devices) , [scope](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/concepts/scope.md) , [Developer's_Guide_to_Paddle_Fluid](https://github.com/PaddlePaddle/FluidDoc/blob/release/1.2/doc/fluid/getstarted/Developer's_Guide_to_Paddle_Fluid.md)

 ### 2.Op's registration logic
 The registration entries for each Operator include:

--- a/doc/fluid/advanced_usage/index.rst
+++ b/doc/fluid/advanced_usage/index.rst
@@ -2,17 +2,19 @@
 进阶使用
 ########

-..  todo::
-
 如果您非常熟悉 Fluid，期望获得更高效的模型或者定义自己的Operator，请阅读：

    - `Fluid 设计思想 <../advanced_usage/design_idea/fluid_design_idea.html>`_：介绍 Fluid 底层的设计思想，帮助您更好的理解框架运作过程

-	- `预测部署 <../advanced_usage/deploy/index_cn.html>`_ ：介绍如何应用训练好的模型进行预测
+    - `预测部署 <../advanced_usage/deploy/index_cn.html>`_ ：介绍如何应用训练好的模型进行预测
+
+    - `新增OP <../advanced_usage/development/new_op/index_cn.html>`_ ：介绍新增operator的方法及注意事项
+
+    - `性能调优 <../advanced_usage/development/profiling/index_cn.html>`_ ：介绍 Fluid 使用过程中的调优方法

-	- `新增operator <../advanced_usage/development/new_op/index_cn.html>`_ ：介绍新增operator的方法及注意事项
+    - `最佳实践 <../advanced_usage/best_practice/index_cn.html>`_

-	- `性能调优 <../advanced_usage/development/profiling/index_cn.html>`_ ：介绍 Fluid 使用过程中的调优方法
+    - `模型压缩工具库 <../advanced_usage/paddle_slim/paddle_slim.html>`_

 非常欢迎您为我们的开源社区做出贡献，关于如何贡献您的代码或文档，请阅读：

@@ -27,7 +29,7 @@
    deploy/index_cn.rst
    development/new_op/index_cn.rst
    development/profiling/index_cn.rst
+    best_practice/index_cn.rst
+    paddle_slim/paddle_slim.md
    development/contribute_to_paddle/index_cn.rst
    development/write_docs_cn.md
-    best_practice/dist_training_gpu.rst
-    paddle_slim/paddle_slim.md 
--- a/doc/fluid/advanced_usage/index_en.rst
+++ b/doc/fluid/advanced_usage/index_en.rst
@@ -29,3 +29,4 @@ We gladly encourage your contributions of codes and documentation to our communi
    development/profiling/index_en.rst
    development/contribute_to_paddle/index_en.rst
    development/write_docs_en.md
+    best_practice/index_en.rst
--- a/doc/fluid/api_cn/data/data_reader_cn.rst
+++ b/doc/fluid/api_cn/data/data_reader_cn.rst
@@ -26,7 +26,7 @@ DataFeeder将reader返回的数据转换为可以输入Executor和ParallelExecut
 	result = feeder.feed([([0] * 784, [9]), ([1] * 784, [1])])


-如果您想在使用多个GPU训练模型时预先将数据单独输入GPU端，可以使用decorate_reader函数。 
+如果您想在使用多个GPU训练模型时预先将数据单独输入GPU端，可以使用decorate_reader函数。


 **代码示例**
@@ -90,7 +90,7 @@ DataFeeder将reader返回的数据转换为可以输入Executor和ParallelExecut



-.. note:: 
+.. note::

 	设备数量和mini-batches数量必须一致。

@@ -121,7 +121,7 @@ Reader
 	- reader是一个读取数据（从文件、网络、随机数生成器等）并生成数据项的函数。
 	- reader creator是返回reader函数的函数。
 	- reader decorator是一个函数，它接受一个或多个reader，并返回一个reader。
-	- batch reader是一个函数，它读取数据（从reader、文件、网络、随机数生成器等）并生成一批数据项。 
+	- batch reader是一个函数，它读取数据（从reader、文件、网络、随机数生成器等）并生成一批数据项。
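A minimal sketch of the four roles listed above, written in plain Python with no PaddlePaddle dependency. The names `counting_reader_creator`, `double_decorator` and `batch_decorator` are illustrative assumptions and are not part of `paddle.reader`.

```python
def counting_reader_creator(n):
    """Reader creator: returns a reader that yields the integers 0..n-1."""
    def reader():
        for i in range(n):
            yield i
    return reader


def double_decorator(reader):
    """Reader decorator: takes a reader and returns a new reader."""
    def decorated():
        for item in reader():
            yield item * 2
    return decorated


def batch_decorator(reader, batch_size):
    """Turns an item reader into a batch reader that yields lists of items."""
    def batch_reader():
        batch = []
        for item in reader():
            batch.append(item)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:  # keep the final, possibly shorter batch
            yield batch
    return batch_reader


reader = double_decorator(counting_reader_creator(5))
batches = list(batch_decorator(reader, 2)())
```

Each layer only wraps a callable that returns an iterator, which is why decorators can be stacked freely.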


 Data Reader Interface
@@ -133,10 +133,10 @@ Data Reader Interface

 	iterable = data_reader()

-从iterable生成的元素应该是单个数据条目，而不是mini batch。数据输入可以是单个项目，也可以是项目的元组，但应为 `支持的类型 <http://www.paddlepaddle.org/doc/ui/data_provider/pydataprovider2.html?highlight=dense_vector#input-types>`_ （如, numpy 1d array of float32, int, list of int）
+从iterable生成的元素应该是单个数据条目，而不是mini batch。数据输入可以是单个项目，也可以是项目的元组，但应为 `支持的类型 <../../user_guides/howto/prepare_data/feeding_data.html#fluid>`_ （如, numpy 1d array of float32, int, list of int）


-单项目数据读取器创建者的示例实现： 
+单项目数据读取器创建者的示例实现：

 ..  code-block:: python

@@ -147,7 +147,7 @@ Data Reader Interface
 	return reader


-多项目数据读取器创建者的示例实现： 
+多项目数据读取器创建者的示例实现：

 ..  code-block:: python

@@ -194,11 +194,11 @@ Data Reader Interface

 参数：
    - **readers** - 将被组合的多个读取器。
-    - **check_alignment** (bool) - 如果为True，将检查输入reader是否正确对齐。如果为False，将不检查对齐，多余的尾部输出将被丢弃。默认值True。 
+    - **check_alignment** (bool) - 如果为True，将检查输入reader是否正确对齐。如果为False，将不检查对齐，多余的尾部输出将被丢弃。默认值True。

 返回：新的数据读取器

-抛出异常： 	``ComposeNotAligned`` – reader的输出不一致。 当check_alignment设置为False时，不会抛出该异常。 
+抛出异常： 	``ComposeNotAligned`` – reader的输出不一致。 当check_alignment设置为False时，不会抛出该异常。



@@ -220,7 +220,7 @@ Data Reader Interface

 创建数据读取器，该reader的数据输出将被无序排列。

-由原始reader创建的迭代器的输出将被缓冲到shuffle缓冲区，然后进行打乱。打乱缓冲区的大小由参数buf_size决定。 
+由原始reader创建的迭代器的输出将被缓冲到shuffle缓冲区，然后进行打乱。打乱缓冲区的大小由参数buf_size决定。

 参数：
    - **reader** (callable)  – 输出会被打乱的原始reader
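The buffered shuffle described in this hunk can be sketched as follows. This is an illustration under stated assumptions, not the `paddle.reader.shuffle` source: the `seed` parameter is added here only to make the sketch reproducible, and `numbers` is a hypothetical reader.

```python
import random


def shuffle(reader, buf_size, seed=None):
    """Return a reader whose output is shuffled within windows of buf_size items."""
    rng = random.Random(seed)  # assumption: a private RNG, for reproducibility

    def shuffled_reader():
        buf = []
        for item in reader():
            buf.append(item)
            if len(buf) >= buf_size:
                rng.shuffle(buf)
                for buffered in buf:
                    yield buffered
                buf = []
        if buf:  # flush the final, partially filled buffer
            rng.shuffle(buf)
            for buffered in buf:
                yield buffered

    return shuffled_reader


def numbers():
    """Hypothetical source reader yielding 0..9 in order."""
    for i in range(10):
        yield i


shuffled = list(shuffle(numbers, buf_size=4, seed=0)())
```

Because items are only reordered inside one buffer, a larger `buf_size` gives a more thorough shuffle at the cost of more memory.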
@@ -257,7 +257,7 @@ Data Reader Interface
 PipeReader通过流从一个命令中读取数据，将它的stdout放到管道缓冲区中，并将其重定向到解析器进行解析，然后根据需要的格式生成数据。


-您可以使用标准Linux命令或调用其他Program来读取数据，例如通过HDFS、CEPH、URL、AWS S3中读取： 
+您可以使用标准Linux命令或调用其他Program来读取数据，例如通过HDFS、CEPH、URL、AWS S3中读取：

 **代码示例**

@@ -340,7 +340,7 @@ Creator包包含一些简单的reader creator，可以在用户Program中使用

 .. py:function:: paddle.reader.creator.np_array(x)

-如果是numpy向量，则创建一个生成x个元素的读取器。或者，如果它是一个numpy矩阵，创建一个生成x行元素的读取器。或由最高维度索引的任何子超平面。 
+如果是numpy向量，则创建一个生成x个元素的读取器。或者，如果它是一个numpy矩阵，创建一个生成x行元素的读取器。或由最高维度索引的任何子超平面。

 参数：
    - **x** – 用于创建reader的numpy数组。
@@ -359,7 +359,7 @@ Creator包包含一些简单的reader creator，可以在用户Program中使用

 .. py:function::  paddle.reader.creator.recordio(paths, buf_size=100)

-从给定的recordio文件路径创建数据reader，多个路径之间用“,”分隔，支持通配符（glob）模式。 
+从给定的recordio文件路径创建数据reader，多个路径之间用“,”分隔，支持通配符（glob）模式。

 路径：recordio文件的路径，可以是字符串或字符串列表。


--- a/doc/fluid/api_cn/fluid_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn.rst
-#################
- fluid
-#################
-
-
-
-.. _cn_api_fluid_AsyncExecutor:
-
-AsyncExecutor
-------------------------------
-
-.. py:class:: paddle.fluid.AsyncExecutor(place=None, run_mode='')
-
-**AsyncExecutor正在积极开发，API可能在短期内进行调整。**
-
-Python中的异步执行器。AsyncExecutor利用多核处理器和数据排队的强大功能，使数据读取和融合解耦，每个线程并行运行。
-
-AsyncExecutor不是在python端读取数据，而是接受一个训练文件列表，该列表将在c++中检索，然后训练输入将被读取、解析并在c++代码中提供给训练网络。
-
-
-参数：
-	- **place** (fluid.CPUPlace|None) - 指示 executor 将在哪个设备上运行。目前仅支持CPU
-
-**代码示例：**
-
-.. code-block:: python
-
-    data_feed = fluid.DataFeedDesc('data.proto')
-    startup_program = fluid.default_startup_program()
-    main_program = fluid.default_main_program()
-    filelist = ["train_data/part-%d" % i for i in range(100)]
-    thread_num = len(filelist) / 4
-    place = fluid.CPUPlace()
-    async_executor = fluid.AsyncExecutor(place)
-    async_executor.run_startup_program(startup_program)
-    epoch = 10
-    for i in range(epoch):
-        async_executor.run(main_program,
-                           data_feed,
-                           filelist,
-                           thread_num,
-                           [acc],
-                           debug=False)
-
-.. note::
-
-	对于并行gpu调试复杂网络，您可以在executor上测试。他们有完全相同的参数，并可以得到相同的结果。
-
-	目前仅支持CPU
-
-.. py:method:: run(program, data_feed, filelist, thread_num, fetch, mode='', debug=False)
-
-使用此 ``AsyncExecutor`` 来运行 ``program`` 。
-
-``filelist`` 中包含训练数据集。用户也可以通过在参数 ``fetch`` 中提出变量来检查特定的变量， 正如 ``fluid.Executor`` 。
-
-但不像 ``fluid.Executor`` ， ``AsyncExecutor`` 不返回获取到的变量，而是将每个获取到的变量作为标准输出展示给用户。
-
-数据集上的运算在多个线程上执行，每个线程中都会独立出一个线程本地作用域，并在此域中建立运算。
-所有运算同时更新参数值。
-
-参数:	
-  - **program**  (Program) – 需要执行的program。如果没有提供该参数，默认使用 ``default_main_program`` 
-  - **data_feed**  (DataFeedDesc) –  ``DataFeedDesc`` 对象
-  - **filelist**  (str) – 一个包含训练数据集文件的文件列表
-  - **thread_num**  (int) – 并发训练线程数。参照 *注解* 部分获取合适的设置方法
-  - **fetch**  (str|list) – 变量名，或者变量名列表。指明最后要进行观察的变量命名
-  - **mode**  (str) – 该接口的运行模式
-  - **debug**  (bool) – 如果为True, 在每一个minibatch处理后，fetch 中指明的变量将会通过标准输出打印出来
-
-.. note::
-    1.该执行器会运行program中的所有运算，不只是那些依赖于fetchlist的运算
-
-    2.该类执行器在多线程上运行，每个线程占用一个CPU核。为了实现效率最大化，建议将 ``thread_num`` 等于或稍微小于CPU核心数
-
-.. py:method:: download_data(afs_path, local_path, fs_default_name, ugi, file_cnt, hadoop_home='$HADOOP_HOME', process_num=12)
-
-download_data是用于分布式训练的默认下载方法，用户可不使用该方法下载数据。
-
-**示例**
-
-..  code-block:: python
-
-    exe = fluid.AsyncExecutor()
-    exe.download_data("/xxx/xxx/xx/",
-                      "./data", "afs://xxx.xxx.xxx.xxx:9901", "xxx,yyy")
-
-参数: 
-  - **afs_path** （str） - 用户定义的afs_path
-  - **local_path** （str） - 下载数据路径
-  - **fs_default_name** （str） - 文件系统服务器地址
-  - **ugi** （str） -  hadoop ugi
-  - **file_cnt** （int） - 用户可以指定用于调试的文件数量
-  - **hadoop_home** （str） -  hadoop home path
-  - **process_num** （int） - 下载进程数
-
-.. py:method:: get_instance()
-
-获取当前节点的实例，以便用户可以在分布式背景下中执行操作。
-
-.. py:method:: config_distributed_nodes()
-
-如果用户需要运行分布式AsyncExecutor，则需要进行全局配置，以便获取当前进程的信息。
-
-.. py:method:: stop()
-
-在流程结束时，用户应该停止服务器并阻止所有workers。
-
-.. py:method:: init_server(dist_desc)
-
-如果当前进程是server，则初始化当前节点的服务器。
-
-参数: 
-  - **dist_desc** （str）- 描述如何初始化worker和server的protobuf字符串
-
-.. py:method:: init_worker(dist_desc, startup_program)
-
-如果当前进程是worker，则初始化当前节点的worker 
-
-参数: 
-  - **dist_desc** （str）- 描述如何初始化worker和server的protobuf字符串
-  - **startup_program** （fluid.Program）- 当前进程的startup program
-
-.. py:method:: init_model()
-
-可以从其中一个worker中调用的init_model命令。随之，在server中初始化模型参数。
-
-.. py:method:: save_model(save_path)
-
-可以从其中一个worker调用的save_model命令。随之，模型参数会保存在server中并上传到文件系统的save_path指定的位置。
-
-参数: 
-  - **save_path** （str）- 文件系统的保存路径
-
-
-.. _cn_api_fluid_BuildStrategy:
-
-BuildStrategy
-------------------------------
-
-.. py:class::  paddle.fluid.BuildStrategy
-
-``BuildStrategy`` 使用户更精准地控制 ``ParallelExecutor`` 中SSA图的建造方法。可通过设置 ``ParallelExecutor`` 中的 ``BuildStrategy`` 成员来实现此功能。
-
-**代码示例**
-
-..  code-block:: python
-
-    build_strategy = fluid.BuildStrategy()
-    build_strategy.reduce_strategy = fluid.BuildStrategy.ReduceStrategy.Reduce
-
-    train_exe = fluid.ParallelExecutor(use_cuda=True,
-                                       loss_name=loss.name,
-                                       build_strategy=build_strategy)
-
-    train_loss, = train_exe.run([loss.name], feed=feed_dict)
-
-
-
-.. py:attribute:: debug_graphviz_path
-
-str类型。它表明了以graphviz格式向文件中写入SSA图的路径，有利于调试。 默认值为""。
-
-.. py:attribute:: enable_sequential_execution
-
-类型是BOOL。 如果设置为True，则ops的执行顺序将与program中的执行顺序相同。 默认为False。
-
-
-.. py:attribute:: fuse_elewise_add_act_ops
-
-bool类型。它表明了是否融合（fuse）elementwise_add_op和activation_op。这会使整体执行过程更快一些。默认为False。
-
-.. py:attribute:: fuse_relu_depthwise_conv
-
-BOOL类型，fuse_relu_depthwise_conv指示是否融合relu和depthwise_conv2d，它会节省GPU内存并可能加速执行过程。 此选项仅适用于GPU设备。 默认为False。
-
-
-.. py:attribute:: gradient_scale_strategy
-
-str类型。在 ``ParallelExecutor`` 中，存在三种定义 *loss@grad* 的方式，分别为 ``CoeffNumDevice``, ``One`` 与 ``Customized``。默认情况下， ``ParallelExecutor`` 根据设备数目来设置 *loss@grad* 。如果你想自定义 *loss@grad* ，你可以选择 ``Customized`` 方法。默认为 ``CoeffNumDevice`` 。
-
-
-
-.. py:attribute:: reduce_strategy
-
-str类型。在 ``ParallelExecutor`` 中，存在两种减少策略（reduce strategy），即 ``AllReduce`` 和 ``Reduce`` 。如果你需要在所有执行场所上独立地进行参数优化，可以使用 ``AllReduce`` 。反之，如果使用 ``Reduce`` 策略，所有参数的优化将均匀地分配给不同的执行场所，随之将优化后的参数广播给其他执行场所。在一些模型中， ``Reduce`` 策略执行速度更快一些。默认值为 ``AllReduce`` 。
-
-.. py:attribute:: remove_unnecessary_lock
-
-BOOL类型。如果设置为True, GPU操作中的一些锁将被释放，ParallelExecutor将运行得更快，默认为 False。
-
-
-
-.. _cn_api_fluid_CompiledProgram:
-
-CompiledProgram
-------------------------------
-
-.. py:class:: paddle.fluid.CompiledProgram(program)
-
-编译一个接着用来执行的Program。
-
-1. 首先使用layers(网络层)创建程序。
-2. （可选）可使用CompiledProgram来在运行之前优化程序。
-3. 定义的程序或CompiledProgram由Executor运行。
-
-CompiledProgram用于转换程序以进行各种优化。例如，
-
- 预先计算一些逻辑，以便每次运行更快。
- 转换Program，使其可以在多个设备中运行。
- 转换Program以进行优化预测或分布式训练。
-
-**代码示例**
-
-..  code-block:: python
-
-    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
-    exe = fluid.Executor(place)
-    exe.run(startup)
-    compiled_prog = compiler.CompiledProgram(main).with_data_parallel(
-        loss_name=loss.name)
-    for i in range(5):
-        test_loss, = exe.run(compiled_prog,
-                             feed=feed_dict,
-                             fetch_list=[loss.name])
-
-参数：
-  - **program** : 一个Program对象，承载着用户定义的模型计算逻辑
-
-.. py:method:: with_data_parallel(loss_name=None, build_strategy=None, exec_strategy=None, share_vars_from=None)
-
-配置Program使其以数据并行方式运行。
-
-参数：
-  - **loss_name** （str） - 损失函数名称必须在训练过程中设置。 默认None。
-  - **build_strategy** （BuildStrategy） -  build_strategy用于构建图，因此它可以在具有优化拓扑的多个设备/核上运行。 有关更多信息，请参阅  ``fluid.BuildStrategy`` 。 默认None。
-  - **exec_strategy** （ExecutionStrategy） -  exec_strategy用于选择执行图的方式，例如使用多少线程，每次清理临时变量之前进行的迭代次数。 有关更多信息，请参阅 ``fluid.ExecutionStrategy`` 。 默认None。
-  - **share_vars_from** （CompiledProgram） - 如果有，此CompiledProgram将共享来自share_vars_from的变量。 share_vars_from指定的Program必须由此CompiledProgram之前的Executor运行，以便vars准备就绪。
-
-返回: self
-
-.. py:method:: with_inference_optimize(config)
-
-添加预测优化。
-
-参数：
-  - **config** - 用于创建预测器的NativeConfig或AnalysisConfig的实例
-
-返回: self
-
-
-.. _cn_api_fluid_cpu_places:
-
-cpu_places
-------------------------------
-
-.. py:function:: paddle.fluid.cpu_places(device_count=None) 
-
-创建 ``fluid.CPUPlace`` 对象列表。
-
-如果 ``device_count`` 为None，则设备数目将由环境变量 ``CPU_NUM`` 确定。如果未设置 ``CPU_NUM`` ，则设备数目将由 ``multiprocessing.cpu_count()`` 确定。
-
-参数：
-  - **device_count** (None|int) - 设备数目
-
-返回: CPUPlace列表
-
-返回类型：out (list(fluid.CPUPlace))
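The fallback order described for `device_count` (explicit argument, then the `CPU_NUM` environment variable, then `multiprocessing.cpu_count()`) can be sketched in plain Python. The helper name `resolve_cpu_device_count` is hypothetical; it only returns the count and does not build `fluid.CPUPlace` objects.

```python
import multiprocessing
import os


def resolve_cpu_device_count(device_count=None):
    """Pick the number of CPU places using the fallback order in the text above."""
    if device_count is not None:
        return int(device_count)
    env_value = os.environ.get("CPU_NUM")
    if env_value is not None:
        return int(env_value)
    return multiprocessing.cpu_count()


explicit = resolve_cpu_device_count(2)  # an explicit argument wins over the environment
```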
-
-
-
-.. _cn_api_fluid_CPUPlace:
-
-CPUPlace
-------------------------------
-
-.. py:class:: paddle.fluid.CPUPlace
-
-
-CPUPlace是设备的描述符。它代表一个CPU，可以访问CPUPlace对应的内存。
-
-
-
-
-
-
-.. _cn_api_fluid_create_lod_tensor:
-
-
-create_lod_tensor
-------------------------------
-
-.. py:function:: paddle.fluid.create_lod_tensor(data, recursive_seq_lens, place) 
-
-
-该函数从一个numpy数组，列表或者已经存在的lod tensor中创建一个lod tensor。
-
-通过以下几步实现:
-
-1. 检查length-based level of detail (LoD,长度为基准的细节层次)，或称recursive_sequence_lengths(递归序列长度)的正确性
-
-2. 将recursive_sequence_lengths转化为offset-based LoD(偏移量为基准的LoD)
-
-3. 把提供的numpy数组，列表或者已经存在的lod tensor复制到CPU或GPU中(依据执行场所确定)
-
-4. 利用offset-based LoD来设置LoD
-
-例如：
-         假如我们想用LoD Tensor来承载一个词序列的数据，其中每个词由一个整数来表示。现在，我们意图创建一个LoD Tensor来代表两个句子，其中一个句子有两个词，另外一个句子有三个。
-         那么 ``data`` 可以是一个numpy数组，形状为（5,1）。同时， ``recursive_seq_lens`` 为 [[2, 3]]，表明各个句子的长度。这个以长度为基准的 ``recursive_seq_lens`` 将在函数中被转化为以偏移量为基准的 LoD [[0, 2, 5]]。
-
-参数:
-	- **data** (numpy.ndarray|list|LoDTensor) – 容纳着待复制数据的一个numpy数组、列表或LoD Tensor
-	- **recursive_seq_lens** (list) – 一组列表的列表， 表明了由用户指明的length-based level of detail信息
-	- **place** (Place) – CPU或GPU。 指明返回的新LoD Tensor存储地点
-
-返回: 一个fluid LoDTensor对象，包含数据和 ``recursive_seq_lens`` 信息
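Step 2 above, converting a length-based LoD into an offset-based one, is a running sum per level. The sketch below is plain Python and does not call PaddlePaddle; the function name `lengths_to_offsets` is an illustrative assumption.

```python
def lengths_to_offsets(recursive_seq_lens):
    """Convert a length-based LoD (list of lists) into an offset-based LoD."""
    offsets = []
    for level in recursive_seq_lens:
        level_offsets = [0]
        for length in level:
            # each offset is the previous offset plus the current sequence length
            level_offsets.append(level_offsets[-1] + length)
        offsets.append(level_offsets)
    return offsets


# Two sentences with 2 and 3 words, as in the example in the text.
lod = lengths_to_offsets([[2, 3]])  # [[0, 2, 5]]
```

The last offset of a level equals the total number of elements at that level, which is why `data` has 5 rows in the example.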
-
-
-
-
-
-
-
-
-
-
-
-.. _cn_api_fluid_create_random_int_lodtensor:
-
-
-create_random_int_lodtensor
-------------------------------
-
-.. py:function:: paddle.fluid.create_random_int_lodtensor(recursive_seq_lens, base_shape, place, low, high)
-
-
-
-该函数创建一个存储多个随机整数的LoD Tensor。
-
-该函数是经常在书中出现的案例，所以我们根据新的API： ``create_lod_tensor`` 更改它然后放在LoD Tensor板块里来简化代码。
-
-该函数实现以下功能：
-
-1. 根据用户输入的length-based ``recursive_seq_lens`` （基于长度的递归序列长）和在 ``base_shape`` 中的基本元素形状计算LoDTensor的整体形状
-2. 由此形状，建立numpy数组
-3. 使用API： ``create_lod_tensor`` 建立LoDTensor
-
-
-假如我们想用LoD Tensor来承载一词序列，其中每个词由一个整数来表示。现在，我们意图创建一个LoD Tensor来代表两个句子，其中一个句子有两个词，另外一个句子有三个。那么 ``base_shape`` 为[1], 输入的length-based ``recursive_seq_lens`` 是 [[2, 3]]。那么LoDTensor的整体形状应为[5, 1]，并且为两个句子存储5个词。
-
-参数:	
-    - **recursive_seq_lens** (list) – 一组列表的列表， 表明了由用户指明的length-based level of detail信息
-    - **base_shape** (list) – LoDTensor所容纳的基本元素的形状
-    - **place** (Place) –  CPU或GPU。 指明返回的新LoD Tensor存储地点
-    - **low** (int) – 随机数下限
-    - **high** (int) – 随机数上限
-
-返回:	一个fluid LoDTensor对象，包含数据和 ``recursive_seq_lens`` 信息
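The shape computation in step 1 can be sketched without PaddlePaddle or numpy. Assumptions in this sketch: the helper name `random_int_rows` is hypothetical, `low` and `high` are both treated as inclusive, a `seed` parameter is added for reproducibility, and each element is flattened into a single row of integers.

```python
import random


def random_int_rows(recursive_seq_lens, base_shape, low, high, seed=None):
    """Return (overall_shape, rows) for a LoD-style batch of random integers."""
    rng = random.Random(seed)
    # The finest LoD level gives the number of basic elements in the batch.
    total = sum(recursive_seq_lens[-1])
    overall_shape = [total] + list(base_shape)
    width = 1
    for dim in base_shape:
        width *= dim
    rows = [[rng.randint(low, high) for _ in range(width)] for _ in range(total)]
    return overall_shape, rows


# Two sentences with 2 and 3 words, each word a single integer.
shape, rows = random_int_rows([[2, 3]], base_shape=[1], low=0, high=9, seed=0)
```

With `recursive_seq_lens = [[2, 3]]` and `base_shape = [1]` the overall shape is `[5, 1]`, matching the example in the text.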
-
-
-
-.. _cn_api_fluid_cuda_pinned_places:
-
-cuda_pinned_places
-------------------------------
-
-
-.. py:function:: paddle.fluid.cuda_pinned_places(device_count=None)
-
-
-
-创建 ``fluid.CUDAPinnedPlace`` 对象列表。
-
-如果 ``device_count`` 为None，则设备数目将由环境变量 ``CPU_NUM`` 确定。如果未设置 ``CPU_NUM`` ，则设备数目将由 ``multiprocessing.cpu_count()`` 确定。
-
-参数：
-  - **device_count** (None|int) - 设备数目
-
-返回: CUDAPinnedPlace对象列表
-
-返回类型：out(list(fluid.CUDAPinnedPlace))
-
-
-
-.. _cn_api_fluid_cuda_places:
-
-cuda_places
-------------------------------
-
-.. py:function:: paddle.fluid.cuda_places(device_ids=None)
-
-创建 ``fluid.CUDAPlace`` 对象列表。
-
-
-如果 ``device_ids`` 为None，则首先检查 ``FLAGS_selected_gpus`` 的环境变量。如果 ``FLAGS_selected_gpus=0,1,2`` ，则返回的列表将为[fluid.CUDAPlace(0), fluid.CUDAPlace(1), fluid.CUDAPlace(2)]。如果未设置标志 ``FLAGS_selected_gpus`` ，则将返回所有可见的GPU places。
-
-
-如果 ``device_ids`` 不是None，它应该是GPU的设备ID。例如，如果 ``device_ids=[0,1,2]`` ，返回的列表将是[fluid.CUDAPlace(0), fluid.CUDAPlace(1), fluid.CUDAPlace(2)]。
-
-参数：
-  - **device_ids** (None|list(int)|tuple(int)) - GPU的设备ID列表
-
-返回: CUDAPlace列表
-
-返回类型：out (list(fluid.CUDAPlace))
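The selection rule for `device_ids` can be sketched as below. This is an illustration only: `resolve_gpu_ids` is a hypothetical helper, and `visible_gpu_count` stands in for a real query of the visible GPUs, which this sketch does not perform.

```python
import os


def resolve_gpu_ids(device_ids=None, visible_gpu_count=0):
    """Return the GPU ids that would back the CUDAPlace list."""
    if device_ids is not None:
        return list(device_ids)
    flag = os.environ.get("FLAGS_selected_gpus")
    if flag:
        # e.g. FLAGS_selected_gpus=0,1,2 -> [0, 1, 2]
        return [int(part) for part in flag.split(",")]
    # Assumption: with no flag set, every visible GPU is used.
    return list(range(visible_gpu_count))


explicit_ids = resolve_gpu_ids(device_ids=(0, 1, 2))
```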
-
-
-
-
-
-.. _cn_api_fluid_CUDAPinnedPlace:
-
-CUDAPinnedPlace
-------------------------------
-
-.. py:class:: paddle.fluid.CUDAPinnedPlace
-
-CUDAPinnedPlace是一个设备描述符，它所指代的存储空间可以被GPU和CPU访问。
-
-
-
-
-.. _cn_api_fluid_CUDAPlace:
-
-CUDAPlace
-------------------------------
-
-.. py:class:: paddle.fluid.CUDAPlace
-
-CUDAPlace是一个设备描述符，它代表一个GPU，并且每个CUDAPlace有一个dev_id（设备id）来表明当前CUDAPlace代表的卡数。dev_id不同的CUDAPlace所对应的内存不可相互访问。
-
-
-
-
-
-
-
-.. _cn_api_fluid_DataFeedDesc:
-
-DataFeedDesc
-------------------------------
-
-.. py:class:: paddle.fluid.DataFeedDesc(proto_file)
-
-数据描述符，描述输入训练数据格式。
-
-这个类目前只用于AsyncExecutor(有关类AsyncExecutor的简要介绍，请参阅注释)
-
-DataFeedDesc应由来自磁盘的有效protobuf消息初始化:
-
-.. code-block:: python
-
-	data_feed = fluid.DataFeedDesc('data.proto')
-
-可以参考 :code:`paddle/fluid/framework/data_feed.proto` 查看我们如何定义message
-
-一段典型的message可能是这样的：
-
-.. code-block:: text
-
-    name: "MultiSlotDataFeed"
-    batch_size: 2
-    multi_slot_desc {
-        slots {
-            name: "words"
-            type: "uint64"
-            is_dense: false
-            is_used: true
-        }
-        slots {
-            name: "label"
-            type: "uint64"
-            is_dense: false
-            is_used: true
-        }
-    }
-
-但是，用户通常不应该关心消息格式;相反，我们鼓励他们在将原始日志文件转换为AsyncExecutor可以接受的训练文件的过程中，使用 :code:`Data Generator` 生成有效数据描述。
-
-DataFeedDesc也可以在运行时更改。一旦你熟悉了每个字段的含义，您可以修改它以更好地满足您的需要。例如:
-
-.. code-block:: python
-
-    data_feed.set_batch_size(128)
-    data_feed.set_dense_slots('wd')  # The slot named 'wd' will be dense
-    data_feed.set_use_slots('wd')    # The slot named 'wd' will be used
-    
-    #Finally, the content can be dumped out for debugging purpose:
-    
-    print(data_feed.desc())
-
-
-参数：
-	- **proto_file** (string) - 包含数据feed中描述的磁盘文件
-
-
-.. py:method:: set_batch_size(batch_size)
-
-设置batch size，训练期间有效
-
-
-参数：
-	- batch_size：batch size
-
-**代码示例：**
-
-.. code-block:: python
-	
-	data_feed = fluid.DataFeedDesc('data.proto')
-	data_feed.set_batch_size(128)
-
-.. py:method:: set_dense_slots(dense_slots_name)
-
-指定slot经过设置后将变成密集的slot，仅在训练期间有效。
-
-密集slot的特征将被输入一个Tensor，而稀疏slot的特征将被输入一个lodTensor
-
-
-参数：
-	- **dense_slots_name** : slot名称的列表，这些slot将被设置为密集的
-
-**代码示例：**
-
-.. code-block:: python
-	
-	data_feed = fluid.DataFeedDesc('data.proto')
-	data_feed.set_dense_slots(['words'])
-
-.. note:: 
-
-	默认情况下，所有slot都是稀疏的
-
-.. py:method:: set_use_slots(use_slots_name)
-
-
-设置一个特定的slot是否用于训练。一个数据集包含了很多特征，通过这个函数可以选择哪些特征将用于指定的模型。
-
-参数：
-	- **use_slots_name** :将在训练中使用的slot名列表
-
-**代码示例：**
-
-.. code-block:: python
-
-	data_feed = fluid.DataFeedDesc('data.proto')
-	data_feed.set_use_slots(['words'])
-
-.. note::
-	
-	默认情况下，所有slot都不会被使用
-
-
-.. py:method:: desc()
-
-返回此DataFeedDesc的protobuf信息
-
-返回：一个message字符串
-
-**代码示例：**
-
-.. code-block:: python
-
-	data_feed = fluid.DataFeedDesc('data.proto')
-	print(data_feed.desc())
-
-
-
-
-
-
-.. _cn_api_fluid_DataFeeder:
-
-DataFeeder
-------------------------------
-
-.. py:class:: paddle.fluid.DataFeeder(feed_list, place, program=None)
-
-
-
-``DataFeeder`` 负责将reader(读取器)返回的数据转成一种特殊的数据结构，使它们可以输入到 ``Executor`` 和 ``ParallelExecutor`` 中。
-reader通常返回一个minibatch条目列表。在列表中每一条目都是一个样本（sample）,它是由具有一至多个特征的列表或元组组成的。
-
-
-以下是简单用法：
-
-..  code-block:: python
-	
-	place = fluid.CPUPlace()
-	img = fluid.layers.data(name='image', shape=[1, 28, 28])
-	label = fluid.layers.data(name='label', shape=[1], dtype='int64')
-	feeder = fluid.DataFeeder([img, label], fluid.CPUPlace())
-	result = feeder.feed([([0] * 784, [9]), ([1] * 784, [1])])
-	
-在多GPU模型训练时，如果需要提前分别向各GPU输入数据，可以使用 ``decorate_reader`` 函数。
-
-..  code-block:: python
-
-	place=fluid.CUDAPlace(0)
-	feeder = fluid.DataFeeder(place=place, feed_list=[data, label])
-	reader = feeder.decorate_reader(
-    		paddle.batch(flowers.train(), batch_size=16))
-
-
-
-参数：
-    - **feed_list** (list) – 向模型输入的变量表或者变量表名
-    - **place** (Place) – place表明是向GPU还是CPU中输入数据。如果想向GPU中输入数据, 请使用 ``fluid.CUDAPlace(i)`` (i 代表GPU的id)；如果向CPU中输入数据, 请使用  ``fluid.CPUPlace()``
-    - **program** (Program) – 需要向其中输入数据的Program。如果为None, 会默认使用 ``default_main_program()``。 缺省值为None
-
-
-抛出异常:
-  - ``ValueError``  – 如果一些变量不在此 Program 中
-
-
-**代码示例**
-
-..  code-block:: python
-
-	# ...
-	place = fluid.CPUPlace()
-	feed_list = [
-    		main_program.global_block().var(var_name) for var_name in feed_vars_name
-	] # feed_vars_name 是一个由变量名组成的列表
-	feeder = fluid.DataFeeder(feed_list, place)
-	for data in reader():
-    		outs = exe.run(program=main_program,
-               		       feed=feeder.feed(data))
-			       
-			       
-.. py:method:: feed(iterable)
-
-
-根据feed_list（数据输入表）和iterable（可遍历的数据）提供的信息，将输入数据转成一种特殊的数据结构，使它们可以输入到 ``Executor`` 和 ``ParallelExecutor`` 中。
-
-参数:	
-	- **iterable** (list|tuple) – 要输入的数据
-
-返回：  转换结果
-
-返回类型:	dict
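Conceptually, `feed` regroups a list of samples into one column per input variable. The sketch below shows only that regrouping with plain lists; the real method returns tensors, and the function name `feed_columns` is an illustrative assumption.

```python
def feed_columns(feed_names, samples):
    """Regroup row-wise samples into a dict keyed by input variable name."""
    columns = {name: [] for name in feed_names}
    for sample in samples:
        if len(sample) != len(feed_names):
            raise ValueError("each sample needs one value per feed variable")
        for name, value in zip(feed_names, sample):
            columns[name].append(value)
    return columns


# Same minibatch as the DataFeeder example: two samples of (image, label).
minibatch = [([0] * 784, [9]), ([1] * 784, [1])]
result = feed_columns(["image", "label"], minibatch)
```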
-
-
-.. py:method:: feed_parallel(iterable, num_places=None)
-
-
-该方法获取多个minibatch，并把每个minibatch提前输入进各个设备中。
-
-参数:	
-    - **iterable** (list|tuple) – 要输入的数据
-    - **num_places** (int) – 设备数目。默认为None。
-
-返回: 转换结果
-
-返回类型: dict
-
-.. note::
-     设备（CPU或GPU）的数目必须等于minibatch的数目
-
-
-
-.. py:method::  decorate_reader(reader, multi_devices, num_places=None, drop_last=True)
-
-
-  
-将reader返回的输入数据batch转换为多个mini-batch，之后每个mini-batch都会被输入进各个设备（CPU或GPU）中。
-    
-参数：
-        - **reader** (fun) – 该参数是一个可以生成数据的函数
-        - **multi_devices** (bool) – bool型，指明是否使用多个设备
-        - **num_places** (int) – 如果 ``multi_devices`` 为 ``True`` , 可以使用此参数来设置GPU数目。如果 ``num_places`` 为 ``None`` ，该函数默认使用当前训练机所有GPU设备。默认为None。
-        - **drop_last** (bool) – 如果最后一个batch的大小比 ``batch_size`` 要小，则可使用该参数来指明是否选择丢弃最后一个batch数据。 默认为 ``True`` 
-
-返回：转换结果
-
-返回类型: dict
-    
-抛出异常： ``ValueError`` – 如果 ``drop_last`` 值为False并且data batch与设备不匹配时，产生此异常
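One reading of the multi-device behaviour described above is that consecutive batches are grouped so that every device receives one mini-batch per step. The sketch below follows that reading in plain Python; `group_batches` and `five_batches` are hypothetical names, and the handling of leftovers is an assumption based on the `drop_last` description.

```python
def group_batches(batch_reader, num_places, drop_last=True):
    """Group consecutive batches so that each device gets one per step."""
    def grouped_reader():
        group = []
        for batch in batch_reader():
            group.append(batch)
            if len(group) == num_places:
                yield group
                group = []
        if group and not drop_last:
            # Leftover batches cannot be spread over all devices.
            raise ValueError("remaining batches do not match the device count")
    return grouped_reader


def five_batches():
    """Hypothetical batch reader yielding five small batches."""
    for i in range(5):
        yield [i]


steps = list(group_batches(five_batches, num_places=2)())
```

With five batches and two devices, the fifth batch is dropped when `drop_last` is `True`; with `drop_last=False` the sketch raises `ValueError` once the reader is exhausted.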
-
-
-        
-
-
-
-
-
-
-
-
-
-.. _cn_api_fluid_default_main_program:
-
-default_main_program
-------------------------------
-
-.. py:function:: paddle.fluid.default_main_program()
-
-
-
-
-
-此函数用于获取默认或全局main program(主程序)。该主程序用于训练和测试模型。
-
-``fluid.layers`` 中的所有layer函数可以向 ``default_main_program`` 中添加operators（算子）和variables（变量）。
-
-``default_main_program`` 是fluid的许多编程接口（API）的Program参数的缺省值。例如,当用户program没有传入的时候，
-``Executor.run()`` 会默认执行 ``default_main_program`` 。
-
-
-返回：	main program
-
-返回类型:	Program
-
-
-
-
-
-
-
-
-
-
-
-.. _cn_api_fluid_default_startup_program:
-
-
-
-
-default_startup_program
-------------------------------
-
-.. py:function:: paddle.fluid.default_startup_program()
-
-
-
-该函数可以获取默认/全局 startup program (启动程序)。
-
-``fluid.layers`` 中的layer函数会新建参数、readers(读取器)、NCCL句柄作为全局变量。 
-
-startup_program会使用内在的operators（算子）去初始化他们，并由layer函数将这些operators追加到startup program中。
-
-该函数将返回默认的或当前的startup_program。用户可以使用 ``fluid.program_guard`` 去切换program。
-
-返回:	startup program
-
-返回类型:	Program
-
-
-
-
-
-
-
-
-
-
-
-.. _cn_api_fluid_DistributeTranspiler:
-
-DistributeTranspiler
-------------------------------
-
-.. py:class:: paddle.fluid.DistributeTranspiler (config=None)
-
-
-该类可以把fluid program转变为分布式数据并行计算程序（distributed data-parallelism programs）,可以有Pserver和NCCL2两种模式。
-当program在Pserver（全称：parameter server）模式下， ``main_program`` (主程序)转为使用远程parameter server(即pserver,参数服务器)来进行参数优化，并且优化图会被输入到一个pserver program中。
-在NCCL2模式下，transpiler会在 ``startup_program`` 中附加一个 ``NCCL_ID`` 广播算子（broadcasting operators）来实现在该集群中所有工作结点共享 ``NCCL_ID`` 。
-调用 ``transpile_nccl2`` 后， 你 **必须** 将 ``trainer_id`` , ``num_trainers`` 参数提供给 ``ParallelExecutor`` 来启动NCCL2分布式模式。 
-
-
-
-
-**代码示例**
-
-..  code-block:: python
-
-    # pserver模式下
-    pserver_endpoints = "192.168.0.1:6174,192.168.0.2:6174"
-    trainer_endpoints = "192.168.0.1:6174,192.168.0.2:6174"
-    current_endpoint = "192.168.0.1:6174"
-    trainer_id = 0
-    trainers = 4
-    role = os.getenv("PADDLE_TRAINING_ROLE")
-
-    t = fluid.DistributeTranspiler()
-    t.transpile(
-        trainer_id, pservers=pserver_endpoints, trainers=trainers)
-    if role == "PSERVER":
-        pserver_program = t.get_pserver_program(current_endpoint)
-        pserver_startup_program = t.get_startup_program(current_endpoint,
-                                                        pserver_program)
-    elif role == "TRAINER":
-        trainer_program = t.get_trainer_program()
-
-    # nccl2模式下
-    config = fluid.DistributeTranspilerConfig()
-    config.mode = "nccl2"
-    t = fluid.DistributeTranspiler(config=config)
-    t.transpile(trainer_id, trainers=trainer_endpoints, current_endpoint=current_endpoint)
-    exe = fluid.ParallelExecutor(
-        use_cuda,
-        loss_name=loss_var.name,
-        num_trainers=len(trainer_endpoints.split(",")),
-        trainer_id=trainer_id
-    )
-
-
-
-.. py:method:: transpile(trainer_id, program=None, pservers='127.0.0.1:6174', trainers=1, sync_mode=True, startup_program=None, current_endpoint='127.0.0.1:6174')
-
-Runs the transpiler.
-
-Parameters:
-    - **trainer_id** (int) – ID of the current trainer worker. With n trainer workers, the ID ranges from 0 to n-1
-    - **program** (Program|None) – the program to transpile. Defaults to ``fluid.default_main_program()``
-    - **pservers** (str) – comma-separated list of pserver endpoints, each in the form *ip:port*
-    - **trainers** (int|str) – in Pserver mode, the number of trainers; in NCCL2 mode, a string containing the comma-separated trainer endpoints
-    - **sync_mode** (bool) – whether to train synchronously. Defaults to True
-    - **startup_program** (Program|None) – the startup program to transpile. Defaults to ``fluid.default_startup_program()``
-    - **current_endpoint** (str) – the current endpoint, required when transpiling to NCCL2 mode. Not used in Pserver mode
-
-.. py:method:: get_trainer_program(wait_port=True)
-
-
-Gets the trainer-side program.
-
-Returns: the trainer-side program
-
-Return type: Program
-
-
-
-.. py:method:: get_pserver_program(endpoint)
-
-
-Gets the pserver (parameter server) side program.
-
-Parameters:
-    - **endpoint** (str) – endpoint of the current pserver
-
-Returns: the program that the current pserver needs to run
-
-Return type: Program
-
-
-.. py:method:: get_pserver_programs(endpoint)
-
-
-Gets the pserver-side ``main_program`` and ``startup_program`` for distributed training.
-
-Parameters:
-    - **endpoint** (str) – endpoint of the current pserver
-
-Returns: (main_program, startup_program), a tuple of "Program" objects
-
-Return type: tuple
- 
- 
-.. py:method:: get_startup_program(endpoint, pserver_program=None, startup_program=None)
-
-
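-**This function is deprecated.**
-
-Gets the startup program of the current pserver. If variables have been split into blocks across several pservers, the input variables of the operators are modified accordingly.
-
-Parameters:
-    - **endpoint** (str) – endpoint of the current pserver
-    - **pserver_program** (Program) – deprecated; call ``get_pserver_program`` first
-    - **startup_program** (Program) – deprecated; pass the startup program at initialization instead
-
-Returns: the pserver-side startup program
-
-Return type: Program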
-
-
-
-
-
-
-
-
-
-.. _cn_api_fluid_DistributeTranspilerConfig:
-
-DistributeTranspilerConfig
-------------------------------
-
-.. py:class:: paddle.fluid.DistributeTranspilerConfig
-
-
-.. py:attribute:: slice_var_up (bool)
-
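-Whether to slice tensors into blocks for multiple pservers (parameter servers). Defaults to True.
-
-.. py:attribute:: split_method (PSDispatcher)
-
-Either RoundRobin or HashName.
-
-Note: choose the method that best balances the load across pservers.
-
-.. py:attribute:: min_block_size (int)
-
-Minimum number of elements in a split block.
-
-Note: according to `issuecomment-369912156 <https://github.com/PaddlePaddle/Paddle/issues/8638#issuecomment-369912156>`_ , bandwidth is used efficiently when the block size exceeds 2MB. If you want to change this value, read the ``slice_variable`` function carefully first.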
-
-
-
-
-
-
-
-.. _cn_api_fluid_ExecutionStrategy:
-
-ExecutionStrategy
-------------------------------
-
-.. py:class:: paddle.fluid.ExecutionStrategy
-
-``ExecutionStrategy`` gives users finer control over how a program runs in ``ParallelExecutor`` . Set it on ``ParallelExecutor`` through the ``exec_strategy`` argument.
-
-**Code example**
-
-..  code-block:: python
-
-  exec_strategy = fluid.ExecutionStrategy()
-  exec_strategy.num_threads = 4
-
-  train_exe = fluid.ParallelExecutor(use_cuda=True,
-                                     loss_name=loss.name,
-                                     exec_strategy=exec_strategy)
-
-  train_loss, = train_exe.run([loss.name], feed=feed_dict)
-
-
-
-.. py:attribute:: allow_op_delay
-   
-bool. Whether to delay the execution of communication operators, which can make overall execution somewhat faster. In some models, however, allow_op_delay may interrupt the program. Defaults to False.
-  
-
-
-.. py:attribute:: num_iteration_per_drop_scope
-  
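-int. The number of iterations between two cleanups of the temporary variables created during execution. Because the shapes of temporary variables may stay the same from one iteration to the next, dropping them less often makes overall execution faster. Defaults to 100.
-
-.. note::
-  1. If you fetch results when calling ``run`` , ``ParallelExecutor`` clears the temporary variables at the end of the current iteration
-
-  2. In some NLP models this setting can exhaust GPU memory. In that case, reduce ``num_iteration_per_drop_scope``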
-
-
-
-.. py:attribute:: num_threads
-
-int. The size of the thread pool. These threads execute the operators of the program in the current ``ParallelExecutor`` . If :math:`num\_threads=1` , all operators run one after another, although the execution order may differ between iterations. If this member is not set, ``ParallelExecutor`` chooses a value from the device type and device count: for GPU, :math:`num\_threads=device\_count*4` ; for CPU, :math:`num\_threads=CPU\_NUM*4` . :math:`CPU\_NUM` is explained in detail under ``ParallelExecutor`` . If :math:`CPU\_NUM` is not set, ``ParallelExecutor`` gets the CPU count by calling ``multiprocessing.cpu_count()`` . Defaults to 0.
-
-
-
-
-
-
-
-
-
-
-
-
-.. _cn_api_fluid_executor:
-
-Executor
-------------------------------
-
-
-.. py:class:: paddle.fluid.Executor (place)
-
-
-
-
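-The Executor is driven by a Python script and supports only single-GPU execution. For multi-card environments, see ``ParallelExecutor`` .
-The Python Executor takes a program and, based on the feed map and fetch_list,
-adds feed operators and fetch operators to it.
-The feed map supplies the program's input data. fetch_list names the variables (or names) the user wants returned after the program runs.
-
-Note that the executor runs all operators in the program, not only those that fetch_list depends on.
-
-The Executor stores global variables in the global scope and creates a local scope for temporary variables.
-After the forward/backward pass on each mini-batch completes, the contents of the local scope are discarded,
-while variables in the global scope persist across Executor runs.
-
-All operators in the program are executed in order.
-
-**Code example**
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    import paddle.fluid.compiler as compiler
-
-    # Create an Executor named exe.
-    # use_cuda, loss and feed_dict are assumed to be defined by the user.
-    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
-    exe = fluid.Executor(place)
-
-    # Run the startup program once and only once.
-    # There is no need to optimize/compile the startup program.
-    exe.run(fluid.default_startup_program())
-
-    # Run the main program directly, without compiling it.
-    loss_value, = exe.run(fluid.default_main_program(),
-                          feed=feed_dict,
-                          fetch_list=[loss.name])
-
-    # Alternatively, compile the main program and then run it. See CompiledProgram.
-    compiled_prog = compiler.CompiledProgram(
-        fluid.default_main_program()).with_data_parallel(
-            loss_name=loss.name)
-    loss_value, = exe.run(compiled_prog,
-                          feed=feed_dict,
-                          fetch_list=[loss.name])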
-
-
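-Parameters:
-    - **place** (core.CPUPlace|core.CUDAPlace(n)) – the device on which the ``Executor`` runs
-
-
-
-Tip: you can use ``Executor`` to debug a complex network built for parallel GPU execution. The two take exactly the same arguments and produce the same results.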
-
-
-.. py:method:: close()
-
-
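-Closes this executor. The executor must not be used after this method is called. In distributed training, this method releases the resources on the pservers that are associated with the current trainer.
-
-**Code example**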
-
-..  code-block:: python
-    
-    cpu = core.CPUPlace()
-    exe = Executor(cpu)
-    ...
-    exe.close()
-
-
-.. py:method:: run(program=None, feed=None, fetch_list=None, feed_var_name='feed', fetch_var_name='fetch', scope=None, return_numpy=True, use_program_cache=False)
-
-
-Runs the program with this executor. Input data is supplied through the feed map, and results are retrieved through fetch_list.
-The Python executor takes a program and, based on the feed map and fetch_list,
-adds feed operators and fetch operators to it.
-The feed map supplies the program's input data. fetch_list names the variables (or names) the user wants returned after the program runs.
-
-Note that the executor runs all operators in the program, not only those that fetch_list depends on.
-
-Parameters:
-    - **program** (Program|CompiledProgram) – the program to run. Defaults to the (uncompiled) default_main_program if not given
-    - **feed** (dict) – the feed variables as a dict, for example {"image": ImageData, "label": LabelData}
-    - **fetch_list** (list) – list of variables or variable names to fetch; ``run`` returns results according to this list
-    - **feed_var_name** (str) – name of the input variable of the feed operator
-    - **fetch_var_name** (str) – name of the output variable of the fetch operator
-    - **scope** (Scope) – the scope in which to run the program. Defaults to the global scope
-    - **return_numpy** (bool) – if True, fetched tensors are converted to numpy arrays
-    - **use_program_cache** (bool) – set to True if the program has not changed since the last run
-
-Returns: the results requested by fetch_list
-
-Return type: list(numpy.array)
-
-
-**Code example**
-
-..  code-block:: python
-
-    import numpy
-    import paddle.fluid as fluid
-
-    # Build the network.
-    data = fluid.layers.data(name='X', shape=[1], dtype='float32')
-    out = fluid.layers.create_tensor(dtype='float32')
-    hidden = fluid.layers.fc(input=data, size=10)
-    fluid.layers.assign(hidden, out)
-    loss = fluid.layers.mean(out)
-    adam = fluid.optimizer.Adam()
-    adam.minimize(loss)
-
-
-..  code-block:: python
-
-    # Create the executor and initialize the parameters.
-    cpu = fluid.CPUPlace()
-    exe = fluid.Executor(cpu)
-    exe.run(fluid.default_startup_program())
-
-..  code-block:: python
-
-    # Feed random data and fetch the loss.
-    x = numpy.random.random(size=(10, 1)).astype('float32')
-    outs = exe.run(
-        feed={'X': x},
-        fetch_list=[loss.name])
-	
-
-
-
-
-
-
-
-
-
-
-
-
-.. _cn_api_fluid_global_scope:
-
-global_scope
-------------------------------
-
-.. py:function:: paddle.fluid.global_scope()
-
-
-Gets the global/default scope instance. Many APIs, such as ``Executor.run`` , use ``global_scope`` by default.
-
-Returns: the global/default scope instance
-
-Return type: Scope
-
-
-
-
-
-.. _cn_api_fluid_in_dygraph_mode:
-
-in_dygraph_mode
-------------------------------
-
-.. py:function:: paddle.fluid.in_dygraph_mode()
-
-Returns: bool. True if the Program is running in dynamic graph (dygraph) mode.
-
-
-
-
-
-
-.. _cn_api_fluid_LoDTensor:
-
-LoDTensor
-------------------------------
-
-.. py:class:: paddle.fluid.LoDTensor
-
-
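-A LoDTensor is a Tensor that carries LoD information.
-
-``np.array(lod_tensor)`` converts a LoDTensor to a numpy array.
-
-``lod_tensor.lod()`` returns the LoD information.
-
-LoD stands for Level of Details and is typically used for sequences of varying length. If you do not need LoD information, you can skip the notes below.
-
-Example:
-
-X is a LoDTensor that contains two sequences. The first has length 2 and the second has length 3.
-
-From the LoD, the first dimension of X is 5, because 5 = 2 + 3, so X holds 5 elements in total. Each element has 2 columns, so the shape of X is [5, 2].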
-
-::
-
-    x.lod  = [[2, 3]]
-    x.data = [[1, 2], [3, 4],           // seq 1
-              [5, 6], [7, 8], [9, 10]]  // seq 2
-    x.shape = [5, 2]
-
-
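-A LoD can have multiple levels (for example, a paragraph can contain several sentences, and a sentence can contain several words). In the example below, Y is a LoDTensor with lod_level 2. It holds 2 sequences: the first has length 2 (2 sub-sequences) and the second has length 1. The two sub-sequences of the first sequence have lengths 2 and 2. The sub-sequence of the second sequence has length 3.
-
-
-::
-
-    y.lod = [[2 1], [2 2 3]]
-    y.shape = [2+2+3, ...]
-
-
-.. note::
-
-    In the description above, the LoD is length-based. Internally, paddle uses an offset-based LoD. So internally y.lod is represented as [[0, 2, 3], [0, 2, 4, 7]] (the length-based equivalent is [[2-0, 3-2], [2-0, 4-2, 7-4]]).
-
-    The LoD can be thought of as recursive_sequence_lengths, in which case it must be length-based. For historical reasons, when an API refers to the LoD as lod, it may be offset-based. Users should watch for this.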
-
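The length-based and offset-based forms described in the note above are related by a running sum over each level. A minimal sketch in plain Python, assuming a hypothetical helper named `lengths_to_offsets` that is not part of the fluid API:

```python
def lengths_to_offsets(recursive_sequence_lengths):
    """Convert a length-based LoD into the offset-based form.

    Hypothetical helper for illustration only; it is not part of paddle.fluid.
    """
    offsets = []
    for level in recursive_sequence_lengths:
        level_offsets = [0]
        for length in level:
            # Each offset is the previous offset plus the current length.
            level_offsets.append(level_offsets[-1] + length)
        offsets.append(level_offsets)
    return offsets


# The two-level example from the text: prints [[0, 2, 3], [0, 2, 4, 7]]
print(lengths_to_offsets([[2, 1], [2, 2, 3]]))
```

The same conversion is what the `set_recursive_sequence_lengths` description relies on when it maps `[[2, 3]]` to `[[0, 2, 5]]`.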
-
-
-
-.. py:method::	has_valid_recursive_sequence_lengths(self: paddle.fluid.core.LoDTensor) → bool
-
-Checks whether the LoD of the LoDTensor is valid.
-
-Returns: whether the LoD is valid
-
-Return type: out (bool)
-
-.. py:method::	lod(self: paddle.fluid.core.LoDTensor) → List[List[int]]
-
-Gets the LoD of the LoDTensor.
-
-Returns: the LoD of the LoDTensor.
-
-Return type: out (List[List[int]])
-
-
-.. py:method::	recursive_sequence_lengths(self: paddle.fluid.core.LoDTensor) → List[List[int]]
-
-Gets the sequence lengths of the LoDTensor that correspond to its LoD.
-
-Returns: the sequence lengths, one list per LoD level.
-
-Return type: out (List[List[int]])
-
-
-
-.. py:method::	set_lod(self: paddle.fluid.core.LoDTensor, lod: List[List[int]]) → None
-
-Sets the LoD of the LoDTensor.
-
-Parameters:
- **lod** (List[List[int]]) - the lod to set.
-
-.. py:method::	set_recursive_sequence_lengths(self: paddle.fluid.core.LoDTensor, recursive_sequence_lengths: List[List[int]]) → None
-
-Sets the LoD of the LoDTensor from the given recursive sequence lengths.
-
-::
-
-   For example, if recursive_sequence_lengths = [[2, 3]],
-   there are two sequences of lengths 2 and 3, and the corresponding lod is [[0, 2, 2 + 3]], i.e. [[0, 2, 5]].
-
-Parameters:
- **recursive_sequence_lengths** (List[List[int]]) - the sequence lengths.
-
-
-
-
-
-
-
-
-
-
-
-.. _cn_api_fluid_LoDTensorArray:
-
-LoDTensorArray
-------------------------------
-
-.. py:class:: paddle.fluid.LoDTensorArray
-
-.. py:method:: append(self: paddle.fluid.core.LoDTensorArray, tensor: paddle.fluid.core.LoDTensor) → None
-
-Appends a LoDTensor to the end of the LoDTensorArray.
-
-
-
-
-
-
-
-
-
-.. _cn_api_fluid_memory_optimize:
-
-memory_optimize
-------------------------------
-
-.. py:function:: paddle.fluid.memory_optimize(input_program, skip_opt_set=None, print_log=False, level=0, skip_grads=False)
-
-
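-Optimizes memory usage by reusing the memory of variables.
-
-.. note::
-    Nested sub-blocks inside a block are not supported.
-
-Parameters:
-    - **input_program** (Program) – the input Program.
-    - **skip_opt_set** (set) – variables in this set are excluded from memory optimization.
-    - **print_log** (bool) – whether to print debug logs.
-    - **level** (int) – if level=0, memory is reused only when the shapes are exactly equal.
-
-Returns: None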
-
-
-
-
-
-
-
-
-.. _cn_api_fluid_name_scope:
-
-name_scope
-------------------------------
-
-.. py:function:: paddle.fluid.name_scope(prefix=None)
-
-
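-Generates a hierarchical name prefix for operators.
-
-Note: this function is intended only for debugging and visualization. Do not use it for analysis such as graph/program transformations.
-
-Parameters:
-    - **prefix** (str) - the prefix
-
-**Code example**
-
-.. code-block:: python
-
-    with fluid.name_scope("encoder"):
-        ...
-    with fluid.name_scope("decoder"):
-        ...
-    with fluid.name_scope("attention"):
-        ...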
-
-
-
-
-
-
-
-.. _cn_api_fluid_ParallelExecutor:
-
-ParallelExecutor
-------------------------------
-
-.. py:class:: paddle.fluid.ParallelExecutor(use_cuda, loss_name=None, main_program=None, share_vars_from=None, exec_strategy=None, build_strategy=None, num_trainers=1, trainer_id=0, scope=None)
-
-
-
-
-``ParallelExecutor`` is designed for data parallelism: it distributes the data across nodes and processes it on those nodes in parallel. When running on GPU, a node is a GPU, and ``ParallelExecutor`` automatically picks up the GPUs available on the current machine. When running on CPU, a node is a CPU, and you can set the number of CPU devices with the ``CPU_NUM`` environment variable, for example ``CPU_NUM=4`` . If the variable is not set, the class calls ``multiprocessing.cpu_count`` to get the number of CPUs in the system.
-
-
-
-
-Parameters:
-    - **use_cuda** (bool) – whether to use CUDA
-    - **loss_name** (str) – the name of the loss; must be provided for training. Defaults to None
-    - **main_program** (Program) – the program to run. If not provided, ``default_main_program`` is used. Defaults to None
-    - **share_vars_from** (ParallelExecutor) – if provided, this ``ParallelExecutor`` shares variables with the given ``ParallelExecutor`` . Defaults to None
-    - **exec_strategy** (ExecutionStrategy) – controls how the program is executed in ``ParallelExecutor`` , for example the number of threads used to run the program and the number of iterations between cleanups of the temporary variables created during execution. See ``fluid.ExecutionStrategy`` for details. Defaults to None
-    - **build_strategy** (BuildStrategy) – controls how the SSA Graph is built in ``ParallelExecutor`` , for example ``reduce_strategy`` and ``gradient_scale_strategy`` . See ``fluid.BuildStrategy`` for details. Defaults to None
-    - **num_trainers** (int) – if greater than 1, NCCL is initialized with multiple ranks of nodes and distributed training is enabled. Every node should have the same number of GPUs. Defaults to 1
-    - **trainer_id** (int) – must be used together with ``num_trainers`` . ``trainer_id`` is the rank of the current node, counted from 0. Defaults to 0
-    - **scope** (Scope) – the scope in which to run the program. Defaults to ``fluid.global_scope()``
-
-Returns: the initialized ``ParallelExecutor`` object
-
-Return type: ParallelExecutor
-
-Raises: ``TypeError`` - if ``share_vars_from`` is provided but is not a ``ParallelExecutor``
-
-**Code example**
-
-..  code-block:: python
-
-  train_exe = fluid.ParallelExecutor(use_cuda=True, loss_name=loss.name)
-  test_exe = fluid.ParallelExecutor(use_cuda=True,
-                                    main_program=test_program,
-                                    share_vars_from=train_exe)
-
-  train_loss, = train_exe.run([loss.name], feed=feed_dict)
-  test_loss, = test_exe.run([loss.name], feed=feed_dict)
-
-
-
-.. py:method::  run(fetch_list, feed=None, feed_dict=None, return_numpy=True)
-
-Runs this ``ParallelExecutor`` with the given ``fetch_list`` .
-
-The ``feed`` argument can be a ``dict`` or a ``list`` . If it is a ``dict`` , the data in feed is split and distributed to the devices (CPU/GPU).
-If it is a ``list`` , each element of the list is copied directly to the corresponding device.
-
-For example, when ``feed`` is a ``dict`` :
-
-..  code-block:: python
-
-    exe = ParallelExecutor()
-    # The image batch is split across the devices. With two devices, each one processes a batch of shape (24, 1, 28, 28)
-    exe.run(feed={'image': numpy.random.random(size=(48, 1, 28, 28))})
-
-When ``feed`` is a ``list`` :
-
-..  code-block:: python
-
-    exe = ParallelExecutor()
-    # Each device processes one element of the list
-    # The first device processes a batch of shape (48, 1, 28, 28)
-    # The second device processes a batch of shape (32, 1, 28, 28)
-    #
-    # Use exe.device_count to get the number of devices
-    exe.run(feed=[{"image": numpy.random.random(size=(48, 1, 28, 28))},
-                  {"image": numpy.random.random(size=(32, 1, 28, 28))},
-                  ])
-
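-Parameters:
-    - **fetch_list** (list) – list of names of the variables to fetch
-    - **feed** (list|dict|None) – the feed variables. If it is a ``dict`` , the data in feed is split and distributed to the devices (CPU/GPU). If it is a ``list`` , each element of the list is copied directly to the corresponding device. Defaults to None
-    - **feed_dict** – deprecated. An alias of feed, kept for backward compatibility. Defaults to None
-    - **return_numpy** (bool) – whether to convert the fetched tensors to numpy. Defaults to True
-
-Returns: the list of fetched results
-
-Return type: List
-
-Raises:
-     - ``ValueError`` - if feed is a list whose length does not equal the number of available devices (places), or if the given feed is not a dict
-     - ``TypeError`` - if feed is a list but its elements are not dicts
-
-.. note::
-     1. If feed is a dict, the number of samples fed into ``ParallelExecutor`` *must* be larger than the number of available CPU cores or GPU cards; otherwise an exception is thrown from the C++ side. Take particular care to check that the last batch of the dataset is larger than the number of available CPU cores or GPU cards.
-     2. If more than one CPU core or GPU card is available, the fetched result for each variable is a list, and each element of that list is the variable on one CPU core or GPU card
-
-**Code example**
-
-..  code-block:: python
-
-        pe = fluid.ParallelExecutor(use_cuda=use_cuda,
-                                    loss_name=avg_cost.name,
-                                    main_program=fluid.default_main_program())
-        loss = pe.run(feed=feeder.feed(cur_batch),
-                      fetch_list=[avg_cost.name])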
-
-
-
-
-
-
-
-
-
-.. _cn_api_fluid_ParamAttr:
-
- 
-ParamAttr
-------------------------------
-
-
-.. py:class:: paddle.fluid.ParamAttr(name=None, initializer=None, learning_rate=1.0, regularizer=None, trainable=True, gradient_clip=None, do_model_average=False)
-
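-This class represents the attributes of a parameter. To make network training go more smoothly, users can tune these attributes as needed, for example the learning rate, regularization, trainability, model averaging (do_model_average), and the initialization method.
-
-Parameters:
-    - **name** (str) – the parameter name. Defaults to None
-    - **initializer** (Initializer) – the method used to initialize the parameter. Defaults to None
-    - **learning_rate** (float) – the parameter's learning rate. The effective rate is :math:`global\_lr*parameter\_lr*scheduler\_factor` . Defaults to 1.0
-    - **regularizer** (WeightDecayRegularizer) – the regularization factor. Defaults to None
-    - **trainable** (bool) – whether the parameter is trainable. Defaults to True
-    - **gradient_clip** (BaseGradientClipAttr) – the method used to clip the parameter's gradient. Defaults to None
-    - **do_model_average** (bool) – whether the parameter takes part in model averaging. Defaults to False
-
-**Code example**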
-
-..  code-block:: python
-
-   w_param_attrs = fluid.ParamAttr(name="fc_weight",
-                                   learning_rate=0.5,
-                                   regularizer=fluid.L2Decay(1.0),
-                                   trainable=True)
-   y_predict = fluid.layers.fc(input=x, size=10, param_attr=w_param_attrs)
-
-
-
-
-
-
-
-
-
-
-
-
-
-.. _cn_api_fluid_Program:
-
-Program
-------------------------------
-
-.. py:class::  paddle.fluid.Program
-
-
-Creates a Python program. Inside PaddleFluid it is converted to a ProgramDesc, which is used to create the C++ program. A Program is like a container: it is a self-contained programming language. A Program contains at least one Block. When a block contains a control-flow op (such as while_op), the Program also contains nested blocks. See framework.proto for details.
-
-Note: by default, PaddleFluid holds a ``default_startup_program`` and a ``default_main_program`` , which share parameters. ``default_startup_program`` runs only once to initialize the parameters; ``default_main_program`` runs on every mini-batch and adjusts the weights.
-
-Returns: an empty program
-
-**Code example**
-
-..  code-block:: python
-
-  main_program = fluid.Program()
-  startup_program = fluid.Program()
-  with fluid.program_guard(main_program=main_program, startup_program=startup_program):
-        x = fluid.layers.data(name="x", shape=[-1, 784], dtype='float32')
-        y = fluid.layers.data(name="y", shape=[-1, 1], dtype='int32')
-        z = fluid.layers.fc(input=x, size=10, act="relu", name="fc")
-
-
-
-.. py:attribute:: op_role
-
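-The role of the operator. The value must be one of the enum values {Forward, Backward, Optimize}.
-
-Note: this is a low-level API. It is used only by ``ParallelExecutor`` to duplicate or schedule operators onto devices.
-
-For example, a Forward operator should run on every device. A Backward operator runs on every device and merges the back-propagated parameter gradients (obtained through ``op_role_var`` ) onto one device. An Optimize operator runs on only one device and broadcasts the new parameters to the other devices.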
-
-
-
-.. py:attribute:: set_op_role
-
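-The role of the operator. The value must be one of the enum values {Forward, Backward, Optimize}.
-
-Note: this is a low-level API. It is used only by ``ParallelExecutor`` to duplicate or schedule operators onto devices.
-
-For example, a Forward operator should run on every device. A Backward operator should run on every device and merges the back-propagated parameter gradients (obtained through ``op_role_var`` ) onto one device. An Optimize operator runs on only one device and broadcasts the new parameters to the other devices.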
-
-
-
-.. py:attribute:: op_role_var
-
-The auxiliary variables of ``op_role`` .
-
-See the ``Program.op_role`` documentation.
-
-Note: this is a low-level API. Users should not use it directly.
-
-
-
-.. py:attribute:: set_op_role_var
-
-The auxiliary variables of ``op_role`` .
-
-See the ``Program.op_role`` documentation.
-
-Note: this is a low-level API. Users should not use it directly.
-
-
-
-.. py:method:: to_string(throw_on_error, with_details=False)
-
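-Returns a string representation of the program for debugging.
-
-Parameters:
-    - **throw_on_error** (bool): raise a ValueError when any required field is not set.
-    - **with_details** (bool): when True, print more information about variables and parameters, such as trainable and optimize_attr
-
-Returns: (str) the debug string
-
-Return type: str
-
-Raises:
- - ``ValueError`` - when ``throw_on_error == true`` but a required field is not set.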
-
-
-
-.. py:method:: clone(for_test=False)
-
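-Creates a new, identical Program.
-
-Some operators, such as batch_norm, behave differently in training and testing. They have an is_test attribute that controls this behavior. When for_test=True, this method sets their is_test attribute to True.
-
-- Set ``for_test`` to False when the cloned Program is used for training.
-- Set ``for_test`` to True when the cloned Program is used for testing.
-
-Note: this API does not remove any operators. Call clone(for_test=True) before backward and optimization.
-
-**Code example**
-
-..  code-block:: python
-
-  # loss is assumed to be defined by the user's model code.
-  test_program = fluid.default_main_program().clone(for_test=True)
-  optimizer = fluid.optimizer.Momentum(learning_rate=0.01, momentum=0.9)
-  optimizer.minimize(loss)
-
-Parameters:
-    - **for_test** (bool) – when True, clone sets the ``is_test`` attribute of the operators to True
-
-Returns: a new, identical Program
-
-Return type: Program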
-
-**Code example**
-
-1. Clone a Program:
-
-..  code-block:: python
-
-  train_program = fluid.Program()
-  startup_program = fluid.Program()
-  with fluid.program_guard(train_program, startup_program):
-        img = fluid.layers.data(name='image', shape=[784])
-        hidden = fluid.layers.fc(input=img, size=200, act='relu')
-        hidden = fluid.layers.dropout(hidden, dropout_prob=0.5)
-        loss = fluid.layers.cross_entropy(
-                     input=fluid.layers.fc(hidden, size=10, act='softmax'),
-                     label=fluid.layers.data(name='label', shape=[1], dtype='int64'))
-  test_program = train_program.clone(for_test=True)
-  sgd = fluid.optimizer.SGD(learning_rate=1e-3)
-  with fluid.program_guard(train_program, startup_program):
-        sgd.minimize(loss)    
-	
-2. If the train Program and the test Program are built separately, clone is not needed.
-
-..  code-block:: python
-
-    import paddle.fluid as fluid
-
-    def network(is_test):
-        img = fluid.layers.data(name='image', shape=[784])
-        hidden = fluid.layers.fc(input=img, size=200, act='relu')
-        hidden = fluid.layers.dropout(hidden, dropout_prob=0.5, is_test=is_test)
-        loss = fluid.layers.cross_entropy(
-            input=fluid.layers.fc(hidden, size=10, act='softmax'),
-            label=fluid.layers.data(name='label', shape=[1], dtype='int64'))
-        return loss
-
-    train_program = fluid.Program()
-    startup_program = fluid.Program()
-    test_program = fluid.Program()
-
-    with fluid.program_guard(train_program, startup_program):
-        with fluid.unique_name.guard():
-            loss = network(is_test=False)
-            sgd = fluid.optimizer.SGD(learning_rate=1e-3)
-            sgd.minimize(loss)
-
-    # The startup program of the test phase is not used
-    with fluid.program_guard(test_program, fluid.Program()):
-        with fluid.unique_name.guard():
-            loss = network(is_test=True)
-
-The two code snippets above generate the same Programs.
-
-.. py:staticmethod:: parse_from_string(binary_str)
-
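-Deserializes a protobuf binary string into a program.
-
-Note: all information about parameters is lost after serialization and deserialization.
-
-Parameters:
-    - **binary_str** (str) – the protobuf binary string
-
-Returns: the deserialized program desc
-
-Return type: Program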
-
-.. py:attribute:: num_blocks
-
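-The number of blocks in this program.
-
-.. py:attribute:: random_seed
-
-
-The default random seed for the random operators in the program. 0 means the seed is taken from a random device.
-
-Note: it must be set before the operators are added.
-
-.. py:method:: global_block()
-
-Gets the first block of this program.
-
-.. py:method:: block(index)
-
-Returns the block of this program specified by ``index`` . ``index`` is an int.
-
-Returns: the block at the given index
-
-Return type: Block
-
-.. py:method:: current_block()
-
-Gets the current block. The current block is the one to which operators are added.
-
-.. py:method:: list_vars()
-
-Gets all variables in the current program. The return value is an iterable object.
-
-Returns: a generator that yields every variable in the Program
-
-Return type: iterable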
-	
-
-
-
-
-
-
-
-.. _cn_api_fluid_program_guard:
-
-program_guard
-------------------------------
-
-.. py:function::    paddle.fluid.program_guard(main_program, startup_program=None)
-
-
-
-Use this function with the Python "with" statement to change the global main program and startup program.
-
-Layer functions inside the "with" block add operators and variables to the new main program.
-
-**Code example**
-
-..  code-block:: python
-
-    import paddle.fluid as fluid
-    main_program = fluid.Program()
-    startup_program = fluid.Program()
-    with fluid.program_guard(main_program, startup_program):
-        data = fluid.layers.data(...)
-        hidden = fluid.layers.fc(...)
-
-Note that if you do not need to build your own startup program or main program, you can pass in a temporary program.
-
-**Code example**
-
-..  code-block:: python
-
-    import paddle.fluid as fluid
-    main_program = fluid.Program()
-    # If you do not care about the startup program, pass in a temporary one
-    with fluid.program_guard(main_program, fluid.Program()):
-        data = ...
-
-
-Parameters:
-    - **main_program** (Program) – the new main program used inside the "with" statement.
-    - **startup_program** (Program) – the new startup program used inside the "with" statement. If ``None`` is passed, the current startup program is left unchanged.
-
-
-
-
-
-
-
-
-
-
-.. _cn_api_fluid_release_memory:
-
-release_memory
-------------------------------
-
-.. py:function:: paddle.fluid.release_memory(input_program, skip_opt_set=None) 
-
-
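-Modifies the input program by inserting ``delete_op`` operators so that variables that are no longer needed are freed early.
-The program is modified in place.
-
-**Note**: this API is experimental and will be removed in a later release. Users are advised not to use it.
-
-Parameters:
-    - **input_program** (Program) – the program into which ``delete_op`` is inserted
-    - **skip_opt_set** (set) – the set of variables to skip during memory optimization
-
-Returns: None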
-
-
-
-.. _cn_api_fluid_scope_guard:
-
-scope_guard
-------------------------------
-
-.. py:function:: paddle.fluid.scope_guard(scope)
-
-
-Changes the global/default scope. All variables at runtime are assigned to the new scope.
-
-Parameters:
-    - **scope** - the new global/default scope.
-
-**Code example**
-
-..  code-block:: python
-
-	import paddle.fluid as fluid
-	
-	new_scope = fluid.Scope()
-	with fluid.scope_guard(new_scope):
-		...
-
-
-
-
-
-
-
-
-.. _cn_api_fluid_Tensor:
-
-Tensor
-------------------------------
-
-.. py:function:: paddle.fluid.Tensor
-
-    Alias of ``LoDTensor``
-
-
-
-
-
-
-
-
-
-.. _cn_api_fluid_WeightNormParamAttr:
-
-WeightNormParamAttr
-------------------------------
-
-.. py:class:: paddle.fluid.WeightNormParamAttr(dim=None, name=None, initializer=None, learning_rate=1.0, regularizer=None, trainable=True, gradient_clip=None, do_model_average=False)
-
-
-Weight normalization. Weight normalization decouples the length of a weight vector from its direction. Its implementation is discussed in the paper `Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks <https://arxiv.org/pdf/1602.07868.pdf>`_ .
-
-Parameters:
-    - **dim** (list) - the dimension(s) along which the weight norm is computed. Defaults to None.
-    - **name** (str) - the parameter name. Defaults to None.
-    - **initializer** (Initializer) - the method used to initialize the parameter. Defaults to None.
-    - **learning_rate** (float) - the learning rate. The effective rate during optimization is :math:`global\_lr*parameter\_lr*scheduler\_factor` . Defaults to 1.0.
-    - **regularizer** (WeightDecayRegularizer) - the regularization factor. Defaults to None.
-    - **trainable** (bool) - whether the parameter is trainable. Defaults to True.
-    - **gradient_clip** (BaseGradientClipAttr) - the gradient clipping method. Defaults to None.
-    - **do_model_average** (bool) - whether the parameter takes part in model averaging. Defaults to False.
-
-**Code example**
-
-..  code-block:: python
-
-    data = fluid.layers.data(name="data", shape=[3, 32, 32], dtype="float32")
-    fc = fluid.layers.fc(input=data,
-                         size=1000,
-                         param_attr=fluid.WeightNormParamAttr(
-                             dim=None,
-                             name='weight_norm_param'))
-
-
-
-
-
-
-
-
+#################
+ fluid
+#################
+
+
+
+.. _cn_api_fluid_AsyncExecutor:
+
+AsyncExecutor
+-------------------------------
+
+.. py:class:: paddle.fluid.AsyncExecutor(place=None, run_mode='')
+
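+**AsyncExecutor is under active development and its API may change in the near future.**
+
+An asynchronous executor in Python. AsyncExecutor uses multi-core processors and data queueing to decouple data reading from data consumption, with each thread running in parallel.
+
+Instead of reading data in Python, AsyncExecutor accepts a list of training files, which is retrieved on the C++ side; the training inputs are then read, parsed, and fed to the training network in C++ code.
+
+
+Parameters:
+    - **place** (fluid.CPUPlace|None) - the device on which the executor runs. Only CPU is currently supported
+
+**Code example:**
+
+.. code-block:: python
+
+    # acc is assumed to be a variable defined by the user's network.
+    data_feed = fluid.DataFeedDesc('data.proto')
+    startup_program = fluid.default_startup_program()
+    main_program = fluid.default_main_program()
+    filelist = ["train_data/part-%d" % i for i in range(100)]
+    thread_num = len(filelist) // 4
+    place = fluid.CPUPlace()
+    async_executor = fluid.AsyncExecutor(place)
+    async_executor.run_startup_program(startup_program)
+    epoch = 10
+    for i in range(epoch):
+        async_executor.run(main_program,
+                           data_feed,
+                           filelist,
+                           thread_num,
+                           [acc],
+                           debug=False)
+
+.. note::
+
+    To debug a complex network built for parallel GPU execution, you can test it on the Executor. The two take exactly the same arguments and are expected to produce the same results.
+
+    Only CPU is currently supported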
+
+.. py:method:: run(program, data_feed, filelist, thread_num, fetch, mode='', debug=False)
+
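+Runs ``program`` with this ``AsyncExecutor`` .
+
+``filelist`` contains the training dataset files. As with ``fluid.Executor`` , users can inspect specific variables by listing them in the ``fetch`` argument.
+
+Unlike ``fluid.Executor`` , however, ``AsyncExecutor`` does not return the fetched variables; it prints each of them to standard output instead.
+
+The computation over the dataset runs in multiple threads. Each thread creates its own thread-local scope and builds its operators in that scope.
+All operators update the parameters concurrently.
+
+Parameters:
+  - **program**  (Program) – the program to run. Defaults to ``default_main_program`` if not given
+  - **data_feed**  (DataFeedDesc) – a ``DataFeedDesc`` object
+  - **filelist**  (str) – a file list containing the training dataset files
+  - **thread_num**  (int) – the number of concurrent training threads. See the *note* below for how to choose a suitable value
+  - **fetch**  (str|list) – a variable name or a list of variable names to inspect
+  - **mode**  (str) – the run mode of this interface
+  - **debug**  (bool) – if True, the variables listed in fetch are printed to standard output after each minibatch
+
+.. note::
+    1. The executor runs all operators in the program, not only those that the fetch list depends on
+
+    2. The executor runs on multiple threads, each occupying one CPU core. For best efficiency, set ``thread_num`` equal to or slightly less than the number of CPU cores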
+
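The second note above suggests keeping `thread_num` at or slightly below the CPU core count. A minimal sketch of one way to pick such a value in plain Python; the helper name `pick_thread_num` and its `reserve` parameter are assumptions for illustration, not part of the AsyncExecutor API:

```python
import multiprocessing


def pick_thread_num(cpu_cores=None, reserve=1):
    """Return a thread count equal to or slightly below the CPU core count.

    Hypothetical helper; `reserve` is the number of cores deliberately left free.
    """
    if cpu_cores is None:
        cpu_cores = multiprocessing.cpu_count()
    # Never go below one thread, even on a single-core machine.
    return max(1, cpu_cores - reserve)


thread_num = pick_thread_num()
```

The resulting integer could then be passed as the `thread_num` argument of `run`; whether leaving one core free is the right margin depends on what else runs on the machine.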
+.. py:method:: download_data(afs_path, local_path, fs_default_name, ugi, file_cnt, hadoop_home='$HADOOP_HOME', process_num=12)
+
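+download_data is the default download method for distributed training. Users are free to download their data some other way.
+
+**Example**
+
+..  code-block:: python
+
+    exe = fluid.AsyncExecutor()
+    exe.download_data("/xxx/xxx/xx/",
+                      "./data", "afs://xxx.xxx.xxx.xxx:9901", "xxx,yyy")
+
+Parameters:
+  - **afs_path** (str) - the user-defined AFS path
+  - **local_path** (str) - the local path to download the data to
+  - **fs_default_name** (str) - the address of the file system server
+  - **ugi** (str) - the hadoop ugi
+  - **file_cnt** (int) - the number of files to download; can be set for debugging
+  - **hadoop_home** (str) - the hadoop home path
+  - **process_num** (int) - the number of download processes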
+
+.. py:method:: get_instance()
+
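+Gets the instance of the current node so that the user can perform operations in a distributed setting.
+
+.. py:method:: config_distributed_nodes()
+
+To run a distributed AsyncExecutor, a global configuration step is needed so that information about the current process is available.
+
+.. py:method:: stop()
+
+At the end of the process, the user should stop the servers and barrier all workers.
+
+.. py:method:: init_server(dist_desc)
+
+Initializes the server of the current node if the current process is a server.
+
+Parameters:
+  - **dist_desc** (str) - a protobuf string that describes how to initialize the workers and servers
+
+.. py:method:: init_worker(dist_desc, startup_program)
+
+Initializes the worker of the current node if the current process is a worker.
+
+Parameters:
+  - **dist_desc** (str) - a protobuf string that describes how to initialize the workers and servers
+  - **startup_program** (fluid.Program) - the startup program of the current process
+
+.. py:method:: init_model()
+
+The init_model command can be invoked from one of the workers. The model parameters are then initialized on the servers.
+
+.. py:method:: save_model(save_path)
+
+The save_model command can be invoked from one of the workers. The model parameters are then saved on the servers and uploaded to the location in the file system given by save_path.
+
+Parameters:
+  - **save_path** (str) - the save path in the file system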
+
+
+.. _cn_api_fluid_BuildStrategy:
+
+BuildStrategy
+-------------------------------
+
+.. py:class::  paddle.fluid.BuildStrategy
+
+``BuildStrategy`` gives users finer control over how the SSA graph is built in ``ParallelExecutor`` . Set it on ``ParallelExecutor`` through the ``build_strategy`` argument.
+
+**Code example**
+
+..  code-block:: python
+
+    build_strategy = fluid.BuildStrategy()
+    build_strategy.reduce_strategy = fluid.BuildStrategy.ReduceStrategy.Reduce
+
+    train_exe = fluid.ParallelExecutor(use_cuda=True,
+                                       loss_name=loss.name,
+                                       build_strategy=build_strategy)
+
+    train_loss, = train_exe.run([loss.name], feed=feed_dict)
+
+
+
+.. py:attribute:: debug_graphviz_path
+
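+str. The path of a file to which the SSA graph is written in graphviz format, which is useful for debugging. Defaults to "".
+
+.. py:attribute:: enable_sequential_execution
+
+bool. If True, ops are executed in the same order as they appear in the program. Defaults to False.
+
+
+.. py:attribute:: fuse_elewise_add_act_ops
+
+bool. Whether to fuse elementwise_add_op with activation_op, which can make overall execution somewhat faster. Defaults to False.
+
+.. py:attribute:: fuse_relu_depthwise_conv
+
+bool. Whether to fuse relu with depthwise_conv2d, which saves GPU memory and may speed up execution. This option applies only to GPU devices. Defaults to False.
+
+
+.. py:attribute:: gradient_scale_strategy
+
+str. ``ParallelExecutor`` supports three ways of defining *loss@grad* : ``CoeffNumDevice`` , ``One`` and ``Customized`` . By default, ``ParallelExecutor`` sets *loss@grad* according to the number of devices. To define *loss@grad* yourself, choose ``Customized`` . Defaults to ``CoeffNumDevice`` .
+
+
+
+.. py:attribute:: reduce_strategy
+
+str. ``ParallelExecutor`` supports two reduce strategies, ``AllReduce`` and ``Reduce`` . With ``AllReduce`` , parameter optimization is done independently on every place. With ``Reduce`` , the optimization of the parameters is distributed evenly across the places, and the optimized parameters are then broadcast to the other places. ``Reduce`` runs faster on some models. Defaults to ``AllReduce`` .
+
+.. py:attribute:: remove_unnecessary_lock
+
+bool. If True, some locks in GPU operations are released and ParallelExecutor runs faster. Defaults to True.
+
+.. py:attribute:: sync_batch_norm
+
+bool. Whether to use synchronized batch normalization, that is, to synchronize the mean and variance across devices during training.
+
+The current implementation does not support FP16 training or CPU. Batch normalization is synchronized only within a single machine, not across machines.
+
+Defaults to False.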
+
+
+.. _cn_api_fluid_CompiledProgram:
+
+CompiledProgram
+-------------------------------
+
+.. py:class:: paddle.fluid.CompiledProgram(program_or_graph)
+
+编译成一个用来执行的Graph。
+
+1. 首先使用layers(网络层)创建程序。
+2. （可选）可使用CompiledProgram来在运行之前优化程序。
+3. 定义的程序或CompiledProgram由Executor运行。
+
+CompiledProgram用于转换程序以进行各种优化。例如，
+
+- 预先计算一些逻辑，以便每次运行更快。
+- 转换Program，使其可以在多个设备中运行。
+- 转换Program以进行优化预测或分布式训练。
+
+**代码示例**
+
+..  code-block:: python
+
+    import paddle.fluid as fluid
+    import paddle.fluid.compiler as compiler
+
+    # use_cuda, startup, main, loss and feed_dict are assumed to be defined earlier
+    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
+    exe = fluid.Executor(place)
+    exe.run(startup)
+    compiled_prog = compiler.CompiledProgram(main).with_data_parallel(
+        loss_name=loss.name)
+    for i in range(5):
+        test_loss, = exe.run(compiled_prog,
+                             feed=feed_dict,
+                             fetch_list=[loss.name])
+
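+Parameters:
+  - **program_or_graph** (Graph|Program) - If it is a Program, it is first lowered to a graph for further optimization. If it is a graph (possibly already optimized), it is used directly for further optimization. Note: a graph is supported only when compiling with the with_data_parallel option.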
+
+.. py:method:: with_data_parallel(loss_name=None, build_strategy=None, exec_strategy=None, share_vars_from=None, places=None)
+
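+Configures the Program to run in data-parallel mode.
+
+Parameters:
+  - **loss_name** (str) - The name of the loss variable. It must be set for training. Defaults to None.
+  - **build_strategy** (BuildStrategy) - Used to build the graph so that it can run on multiple devices/cores with an optimized topology. See ``fluid.BuildStrategy`` for details. Defaults to None.
+  - **exec_strategy** (ExecutionStrategy) - Used to choose how the graph is executed, for example how many threads to use and how many iterations to run between clean-ups of temporary variables. See ``fluid.ExecutionStrategy`` for details. Defaults to None.
+  - **share_vars_from** (CompiledProgram) - If provided, this CompiledProgram shares variables from share_vars_from. The program given by share_vars_from must have been run by the Executor before this CompiledProgram so that its variables are ready.
+  - **places** (list(CUDAPlace)|list(CPUPlace)|None) - If provided, the program is compiled only for the given places. Otherwise the places are determined by the Executor and controlled by environment variables: FLAGS_selected_gpus or CUDA_VISIBLE_DEVICES when using GPU, and CPU_NUM when using CPU. For example, to run on GPU 0 and GPU 1, set places=[fluid.CUDAPlace(0), fluid.CUDAPlace(1)]. To run on 2 CPU cores, set places=[fluid.CPUPlace()]*2.
+
+Returns: self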
+
+.. py:method:: with_inference_optimize(config)
+
+Adds inference optimization.
+
+Parameters:
+  - **config** - An instance of NativeConfig or AnalysisConfig, used to create the predictor.
+
+Returns: self
+
+
+.. _cn_api_fluid_cpu_places:
+
+cpu_places
+-------------------------------
+
+.. py:function:: paddle.fluid.cpu_places(device_count=None) 
+
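+Creates a list of ``fluid.CPUPlace`` objects.
+
+If ``device_count`` is None, the number of devices is determined by the environment variable ``CPU_NUM``. If ``CPU_NUM`` is not set, the number of devices is determined by ``multiprocessing.cpu_count()``.
+
+Parameters:
+  - **device_count** (None|int) - The number of devices.
+
+Returns: a list of CPUPlace objects
+
+Return type: out (list(fluid.CPUPlace))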
+
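+The lookup order for the device count can be sketched in plain Python. This is only an illustration and not Paddle's implementation: ``resolve_cpu_device_count`` and its ``environ`` parameter are hypothetical names, introduced so that the order can be tried without changing the real environment.
+

```python
import multiprocessing
import os


def resolve_cpu_device_count(device_count=None, environ=None):
    """Hypothetical helper: choose the CPU device count in the documented order."""
    environ = os.environ if environ is None else environ
    if device_count is not None:
        return int(device_count)          # 1. explicit argument
    if environ.get("CPU_NUM"):
        return int(environ["CPU_NUM"])    # 2. CPU_NUM environment variable
    return multiprocessing.cpu_count()    # 3. number of CPUs on this machine


print(resolve_cpu_device_count(2))
print(resolve_cpu_device_count(environ={"CPU_NUM": "4"}))
```

+
+With the real API, the resulting count would be the length of the list returned by ``fluid.cpu_places``.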
+
+
+.. _cn_api_fluid_CPUPlace:
+
+CPUPlace
+-------------------------------
+
+.. py:class:: paddle.fluid.CPUPlace
+
+
+CPUPlace is a device descriptor. It represents a CPU device, and the memory belonging to a CPUPlace can be accessed by the CPU.
+
+
+
+
+
+
+.. _cn_api_fluid_create_lod_tensor:
+
+
+create_lod_tensor
+-------------------------------
+
+.. py:function:: paddle.fluid.create_lod_tensor(data, recursive_seq_lens, place) 
+
+
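+Creates a LoD Tensor from a numpy array, a list, or an existing LoD Tensor.
+
+It proceeds in the following steps:
+
+1. Check that the length-based level of detail (LoD), also called recursive_sequence_lengths, is valid.
+
+2. Convert recursive_sequence_lengths to an offset-based LoD.
+
+3. Copy the given numpy array, list, or existing LoD Tensor to CPU or GPU memory, depending on ``place``.
+
+4. Set the LoD of the new tensor from the offset-based LoD.
+
+For example:
+    Suppose we want a LoD Tensor that holds sequences of words, where each word is represented by an integer. We want to create a LoD Tensor representing two sentences, one with two words and the other with three.
+    Then ``data`` can be a numpy array with shape (5, 1), and ``recursive_seq_lens`` is [[2, 3]], which gives the length of each sentence. Inside the function, this length-based ``recursive_seq_lens`` is converted to the offset-based LoD [[0, 2, 5]].
+
+Parameters:
+    - **data** (numpy.ndarray|list|LoDTensor) – A numpy array, a list, or a LoD Tensor holding the data to copy.
+    - **recursive_seq_lens** (list) – A list of lists giving the length-based level of detail specified by the user.
+    - **place** (Place) – CPU or GPU. Indicates where the data of the new LoD Tensor is stored.
+
+Returns: a fluid LoDTensor object holding the data and the ``recursive_seq_lens`` information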
+
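+Steps 1 and 2 can be illustrated with a short pure-Python sketch. The helper names ``has_valid_lengths`` and ``lengths_to_offsets`` are hypothetical, and the validity rule used here (each level must describe exactly the entries of the next level, and the last level must sum to the number of rows) is an assumption about what "valid" means, not a copy of Paddle's check.
+

```python
def has_valid_lengths(recursive_seq_lens, num_rows):
    """Assumed validity rule for a length-based LoD over `num_rows` rows of data."""
    if not recursive_seq_lens:
        return False
    for upper, lower in zip(recursive_seq_lens, recursive_seq_lens[1:]):
        # The sequences of one level must account for every entry of the next level.
        if sum(upper) != len(lower):
            return False
    return sum(recursive_seq_lens[-1]) == num_rows


def lengths_to_offsets(recursive_seq_lens):
    """Convert a length-based LoD to an offset-based LoD, level by level."""
    offset_lod = []
    for level in recursive_seq_lens:
        offsets = [0]
        for length in level:
            offsets.append(offsets[-1] + length)  # running sum of the lengths
        offset_lod.append(offsets)
    return offset_lod


print(has_valid_lengths([[2, 3]], num_rows=5))
print(lengths_to_offsets([[2, 3]]))
```

+
+For the two-sentence example, the lengths [[2, 3]] become the offsets [[0, 2, 5]].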
+
+
+
+
+
+
+
+
+
+
+.. _cn_api_fluid_create_random_int_lodtensor:
+
+
+create_random_int_lodtensor
+-------------------------------
+
+.. py:function:: paddle.fluid.create_random_int_lodtensor(recursive_seq_lens, base_shape, place, low, high)
+
+
+
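+Creates a LoD Tensor filled with random integers.
+
+This function appears frequently in the book examples, so it was rewritten on top of the new ``create_lod_tensor`` API and placed in the LoD Tensor module to simplify the code.
+
+It does the following:
+
+1. Compute the overall shape of the LoDTensor from the user-supplied length-based ``recursive_seq_lens`` and the shape of a basic element given in ``base_shape``.
+2. Create a numpy array of this shape.
+3. Create the LoDTensor with the ``create_lod_tensor`` API.
+
+
+Suppose we want a LoD Tensor that holds sequences of words, where each word is represented by an integer. We want to create a LoD Tensor representing two sentences, one with two words and the other with three. Then ``base_shape`` is [1] and the length-based ``recursive_seq_lens`` is [[2, 3]]. The overall shape of the LoDTensor is therefore [5, 1], storing 5 words for the two sentences.
+
+Parameters:
+    - **recursive_seq_lens** (list) – A list of lists giving the length-based level of detail specified by the user.
+    - **base_shape** (list) – The shape of a basic element held by the LoDTensor.
+    - **place** (Place) – CPU or GPU. Indicates where the data of the new LoD Tensor is stored.
+    - **low** (int) – The lower bound of the random integers.
+    - **high** (int) – The upper bound of the random integers.
+
+Returns: a fluid LoDTensor object holding the data and the ``recursive_seq_lens`` information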
+
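+Steps 1 and 2 can be sketched without Paddle or numpy. The helpers ``overall_shape`` and ``random_int_data`` are hypothetical, nested Python lists stand in for the numpy array, and treating both ``low`` and ``high`` as inclusive is an assumption of this sketch.
+

```python
import random


def overall_shape(recursive_seq_lens, base_shape):
    """Total number of basic elements (sum of the last LoD level) followed by base_shape."""
    return [sum(recursive_seq_lens[-1])] + list(base_shape)


def random_int_data(recursive_seq_lens, base_shape, low, high, seed=None):
    """Nested lists of random integers with the overall shape (a stand-in for a numpy array)."""
    rng = random.Random(seed)

    def build(dims):
        if not dims:
            return rng.randint(low, high)  # assumption: both bounds are inclusive
        return [build(dims[1:]) for _ in range(dims[0])]

    return build(overall_shape(recursive_seq_lens, base_shape))


shape = overall_shape([[2, 3]], [1])
data = random_int_data([[2, 3]], [1], low=0, high=9, seed=0)
print(shape, len(data))
```

+
+The resulting data and the original ``recursive_seq_lens`` would then be passed to ``create_lod_tensor``.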
+
+
+.. _cn_api_fluid_cuda_pinned_places:
+
+cuda_pinned_places
+-------------------------------
+
+
+.. py:function:: paddle.fluid.cuda_pinned_places(device_count=None)
+
+
+
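+Creates a list of ``fluid.CUDAPinnedPlace`` objects.
+
+If ``device_count`` is None, the number of devices is determined by the environment variable ``CPU_NUM``. If ``CPU_NUM`` is not set, the number of devices is determined by ``multiprocessing.cpu_count()``.
+
+Parameters:
+  - **device_count** (None|int) - The number of devices.
+
+Returns: a list of CUDAPinnedPlace objects
+
+Return type: out (list(fluid.CUDAPinnedPlace))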
+
+
+
+.. _cn_api_fluid_cuda_places:
+
+cuda_places
+-------------------------------
+
+.. py:function:: paddle.fluid.cuda_places(device_ids=None)
+
+Creates a list of ``fluid.CUDAPlace`` objects.
+
+
+
+If ``device_ids`` is None, the environment variable ``FLAGS_selected_gpus`` is checked first. If ``FLAGS_selected_gpus=0,1,2``, the returned list is [fluid.CUDAPlace(0), fluid.CUDAPlace(1), fluid.CUDAPlace(2)]. If ``FLAGS_selected_gpus`` is not set, places for all visible GPUs are returned.
+
+
+If ``device_ids`` is not None, it should be the device IDs of the GPUs. For example, if ``device_ids=[0,1,2]``, the returned list is [fluid.CUDAPlace(0), fluid.CUDAPlace(1), fluid.CUDAPlace(2)].
+
+Parameters:
+  - **device_ids** (None|list(int)|tuple(int)) - The list of GPU device IDs.
+
+Returns: a list of CUDAPlace objects
+
+Return type: out (list(fluid.CUDAPlace))
+
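+The selection rule can be written down as a small pure-Python sketch. ``resolve_gpu_ids`` is a hypothetical helper, and ``visible_gpu_count`` stands in for however the number of visible GPUs would actually be queried.
+

```python
import os


def resolve_gpu_ids(device_ids=None, environ=None, visible_gpu_count=0):
    """Hypothetical helper: choose GPU ids in the order described for cuda_places."""
    environ = os.environ if environ is None else environ
    if device_ids is not None:
        return list(device_ids)                          # 1. explicit ids
    flag = environ.get("FLAGS_selected_gpus")
    if flag:
        return [int(part) for part in flag.split(",")]   # 2. FLAGS_selected_gpus
    return list(range(visible_gpu_count))                # 3. every visible GPU


print(resolve_gpu_ids(environ={"FLAGS_selected_gpus": "0,1,2"}))
```

+
+Each id in the result corresponds to one ``fluid.CUDAPlace(id)`` in the list returned by the real function.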
+
+
+
+
+.. _cn_api_fluid_CUDAPinnedPlace:
+
+CUDAPinnedPlace
+-------------------------------
+
+.. py:class:: paddle.fluid.CUDAPinnedPlace
+
+CUDAPinnedPlace is a device descriptor. The memory it refers to can be accessed by both the GPU and the CPU.
+
+
+
+
+.. _cn_api_fluid_CUDAPlace:
+
+CUDAPlace
+-------------------------------
+
+.. py:class:: paddle.fluid.CUDAPlace
+
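+CUDAPlace is a device descriptor that represents a GPU. Each CUDAPlace has a dev_id (device id) indicating which card it represents. The memory of CUDAPlaces with different dev_id values cannot be accessed from one another.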
+
+
+
+
+
+
+
+
+.. _cn_api_fluid_DataFeedDesc:
+
+DataFeedDesc
+-------------------------------
+
+.. py:class:: paddle.fluid.DataFeedDesc(proto_file)
+
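+A data descriptor that describes the format of the input training data.
+
+This class is currently used only by AsyncExecutor (see the comments of class AsyncExecutor for a brief introduction).
+
+A DataFeedDesc should be initialized from a valid protobuf message stored on disk: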
+
+.. code-block:: python
+
+	data_feed = fluid.DataFeedDesc('data.proto')
+
+See :code:`paddle/fluid/framework/data_feed.proto` for how the message is defined.
+
+A typical message might look like this:
+
+.. code-block:: text
+
+    name: "MultiSlotDataFeed"
+    batch_size: 2
+    multi_slot_desc {
+        slots {
+            name: "words"
+            type: "uint64"
+            is_dense: false
+            is_used: true
+        }
+        slots {
+            name: "label"
+            type: "uint64"
+            is_dense: false
+            is_used: true
+        }
+    }
+
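+However, users usually should not need to care about the message format. Instead, they are encouraged to use :code:`Data Generator` to generate a valid data description while converting raw log files into training files that AsyncExecutor can accept.
+
+A DataFeedDesc can also be changed at run time. Once you are familiar with the meaning of each field, you can modify it to better suit your needs. For example: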
+
+.. code-block:: python
+
+    data_feed.set_batch_size(128)
+    data_feed.set_dense_slots(['wd'])  # The slot named 'wd' will be dense
+    data_feed.set_use_slots(['wd'])    # The slot named 'wd' will be used
+
+    # Finally, the content can be dumped out for debugging purposes:
+
+    print(data_feed.desc())
+
+
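+Parameters:
+    - **proto_file** (string) - A file on disk containing the data feed description.
+
+
+.. py:method:: set_batch_size(batch_size)
+
+Sets the batch size. It takes effect during training.
+
+
+Parameters:
+    - **batch_size** (int) - The batch size.
+
+**Example**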
+
+.. code-block:: python
+	
+	data_feed = fluid.DataFeedDesc('data.proto')
+	data_feed.set_batch_size(128)
+
+.. py:method:: set_dense_slots(dense_slots_name)
+
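+Sets the given slots to be dense. It takes effect only during training.
+
+Features of a dense slot are fed into a Tensor, while features of a sparse slot are fed into a LoDTensor.
+
+
+Parameters:
+    - **dense_slots_name** - A list of slot names. These slots will be set to dense.
+
+**Example**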
+
+.. code-block:: python
+	
+	data_feed = fluid.DataFeedDesc('data.proto')
+	data_feed.set_dense_slots(['words'])
+
+.. note::
+
+    By default, all slots are sparse.
+
+.. py:method:: set_use_slots(use_slots_name)
+
+
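+Selects which slots are used for training. A dataset contains many features, and this function chooses which of them are used by a given model.
+
+Parameters:
+    - **use_slots_name** - A list of the names of the slots to use in training.
+
+**Example**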
+
+.. code-block:: python
+
+	data_feed = fluid.DataFeedDesc('data.proto')
+	data_feed.set_use_slots(['words'])
+
+.. note::
+
+    By default, no slot is used.
+
+
+.. py:method:: desc()
+
+Returns the protobuf message of this DataFeedDesc.
+
+Returns: a message string
+
+**Example**
+
+.. code-block:: python
+
+	data_feed = fluid.DataFeedDesc('data.proto')
+	print(data_feed.desc())
+
+
+
+
+
+
+.. _cn_api_fluid_DataFeeder:
+
+DataFeeder
+-------------------------------
+
+.. py:class:: paddle.fluid.DataFeeder(feed_list, place, program=None)
+
+
+
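+``DataFeeder`` converts the data returned by a reader into a data structure that can be fed into ``Executor`` and ``ParallelExecutor``.
+The reader usually returns a list of mini-batch entries. Each entry in the list is a sample, which is a list or tuple with one or more features.
+
+
+A simple usage is shown below: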
+
+..  code-block:: python
+	
+	place = fluid.CPUPlace()
+	img = fluid.layers.data(name='image', shape=[1, 28, 28])
+	label = fluid.layers.data(name='label', shape=[1], dtype='int64')
+	feeder = fluid.DataFeeder([img, label], fluid.CPUPlace())
+	result = feeder.feed([([0] * 784, [9]), ([1] * 784, [1])])
+	
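+To feed data into each GPU ahead of time when training a model on multiple GPUs, use the ``decorate_reader`` function.
+
+..  code-block:: python
+
+    # data and label are assumed to be input variables; flowers is assumed to be paddle.dataset.flowers
+    place = fluid.CUDAPlace(0)
+    feeder = fluid.DataFeeder(place=place, feed_list=[data, label])
+    reader = feeder.decorate_reader(
+        paddle.batch(flowers.train(), batch_size=16), multi_devices=True)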
+
+
+
+Parameters:
+    - **feed_list** (list) – The list of variables, or of variable names, to feed into the model.
+    - **place** (Place) – Indicates whether the data is fed to GPU or CPU. To feed data to a GPU, use ``fluid.CUDAPlace(i)`` (where i is the GPU id). To feed data to the CPU, use ``fluid.CPUPlace()``.
+    - **program** (Program) – The Program into which the data is fed. If None, ``default_main_program()`` is used. Defaults to None.
+
+
+Raises:
+  - ``ValueError`` – If some variable is not in this Program.
+
+
+**Example**
+
+..  code-block:: python
+
+    # ...
+    place = fluid.CPUPlace()
+    # feed_vars_name is a list of variable names
+    feed_list = [
+        main_program.global_block().var(var_name) for var_name in feed_vars_name
+    ]
+    feeder = fluid.DataFeeder(feed_list, place)
+    for data in reader():
+        outs = exe.run(program=main_program,
+                       feed=feeder.feed(data))
+
+
+.. py:method:: feed(iterable)
+
+
+Converts the input data into a data structure that can be fed into ``Executor`` and ``ParallelExecutor``, based on feed_list and iterable.
+
+Parameters:
+    - **iterable** (list|tuple) – The input data.
+
+Returns: the conversion result
+
+Return type: dict
+
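+The core of this conversion, regrouping a list of samples into one column per feed variable, can be sketched as follows. ``feed_to_columns`` is a hypothetical helper. The real ``feed`` additionally converts each column into a tensor on the target place, which this sketch leaves out.
+

```python
def feed_to_columns(feed_names, minibatch):
    """Regroup samples (one tuple of features each) into a dict keyed by feed name."""
    columns = {name: [] for name in feed_names}
    for sample in minibatch:
        if len(sample) != len(feed_names):
            raise ValueError("each sample needs one feature per feed variable")
        for name, feature in zip(feed_names, sample):
            columns[name].append(feature)
    return columns


minibatch = [([0] * 4, [9]), ([1] * 4, [1])]
print(feed_to_columns(["image", "label"], minibatch))
```

+
+The keys of the resulting dict play the role of the variable names that ``Executor.run`` expects in its ``feed`` argument.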
+
+.. py:method:: feed_parallel(iterable, num_places=None)
+
+
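+Takes multiple mini-batches and feeds each of them to its own device ahead of time.
+
+Parameters:
+    - **iterable** (list|tuple) – The input data.
+    - **num_places** (int) – The number of devices. Defaults to None.
+
+Returns: the conversion result
+
+Return type: dict
+
+.. note::
+     The number of devices (CPU or GPU) must equal the number of mini-batches.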
+
+
+
+.. py:method::  decorate_reader(reader, multi_devices, num_places=None, drop_last=True)
+
+
+  
+Converts each batch of input data returned by the reader into multiple mini-batches, each of which is fed into one device (CPU or GPU).
+
+Parameters:
+    - **reader** (fun) – A function that generates data.
+    - **multi_devices** (bool) – Whether to use multiple devices.
+    - **num_places** (int) – If ``multi_devices`` is ``True``, this parameter sets the number of GPUs to use. If ``num_places`` is ``None``, the function uses all GPUs of the current machine. Defaults to None.
+    - **drop_last** (bool) – Whether to drop the last batch if its size is smaller than ``batch_size``. Defaults to ``True``.
+
+Returns: the conversion result
+
+Return type: dict
+
+Raises: ``ValueError`` – If ``drop_last`` is False and the data batch cannot fit the devices.
+
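+One way to picture the ``drop_last`` behaviour is a wrapper that collects one batch per device before yielding. This is a simplified sketch under that assumption, not the library's code: ``group_batches_for_devices`` and ``five_batches`` are hypothetical, and the conversion of each batch into tensors is omitted.
+

```python
def group_batches_for_devices(reader, num_places, drop_last=True):
    """Wrap `reader` so that it yields groups holding one batch per device."""

    def grouped_reader():
        pending = []
        for batch in reader():
            pending.append(batch)
            if len(pending) == num_places:
                yield pending
                pending = []
        # Leftover batches cannot be spread over all devices.
        if pending and not drop_last:
            raise ValueError(
                "%d leftover batch(es) cannot fit %d devices"
                % (len(pending), num_places))

    return grouped_reader


def five_batches():
    for i in range(5):
        yield [i]


groups = list(group_batches_for_devices(five_batches, num_places=2)())
print(groups)  # two full groups; the fifth batch is dropped
```

+
+With ``drop_last=False`` the same input raises ``ValueError``, which mirrors the exception documented above.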
+
+        
+
+
+
+
+
+
+
+
+
+.. _cn_api_fluid_default_main_program:
+
+default_main_program
+-------------------------------
+
+.. py:function:: paddle.fluid.default_main_program()
+
+
+
+
+
+Returns the default (global) main program, which is used for training and testing the model.
+
+All layer functions in ``fluid.layers`` add operators and variables to the ``default_main_program``.
+
+``default_main_program`` is the default value of the Program parameter in many fluid APIs. For example, ``Executor.run()`` runs the ``default_main_program`` when no program is passed in.
+
+
+Returns: the main program
+
+Return type: Program
+
+
+
+
+
+
+
+
+
+
+
+.. _cn_api_fluid_default_startup_program:
+
+
+
+
+default_startup_program
+-------------------------------
+
+.. py:function:: paddle.fluid.default_startup_program()
+
+
+
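+Returns the default (global) startup program.
+
+The layer functions in ``fluid.layers`` create parameters, readers and NCCL handles as global variables.
+
+The startup_program initializes them with its own operators, and the layer functions append these operators to the startup program.
+
+This function returns the default or current startup_program. Users can switch programs with ``fluid.program_guard``.
+
+Returns: the startup program
+
+Return type: Program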
+
+
+
+
+
+
+
+
+
+
+
+.. _cn_api_fluid_DistributeTranspiler:
+
+DistributeTranspiler
+-------------------------------
+
+.. py:class:: paddle.fluid.DistributeTranspiler (config=None)
+
+
+This class converts a fluid program into distributed data-parallel programs. It supports two modes: pserver and nccl2.
+In pserver (parameter server) mode, ``main_program`` is transformed to use remote parameter servers for parameter optimization, and the optimization graph is placed into a pserver program.
+In nccl2 mode, the transpiler appends an ``NCCL_ID`` broadcasting operator to ``startup_program`` so that all workers in the cluster share the ``NCCL_ID``.
+After calling ``transpile_nccl2``, you **must** pass ``trainer_id`` and ``num_trainers`` to ``ParallelExecutor`` to enable nccl2 distributed mode.
+
+
+
+
+**Example**
+
+..  code-block:: python
+
+    import os
+    import paddle.fluid as fluid
+
+    # pserver mode
+    pserver_endpoints = "192.168.0.1:6174,192.168.0.2:6174"
+    trainer_endpoints = "192.168.0.1:6174,192.168.0.2:6174"
+    current_endpoint = "192.168.0.1:6174"
+    trainer_id = 0
+    trainers = 4
+    role = os.getenv("PADDLE_TRAINING_ROLE")
+
+    t = fluid.DistributeTranspiler()
+    t.transpile(
+        trainer_id, pservers=pserver_endpoints, trainers=trainers)
+    if role == "PSERVER":
+        pserver_program = t.get_pserver_program(current_endpoint)
+        pserver_startup_program = t.get_startup_program(current_endpoint,
+                                                        pserver_program)
+    elif role == "TRAINER":
+        trainer_program = t.get_trainer_program()
+
+    # nccl2 mode: trainers is the comma-separated list of trainer endpoints
+    # (use_cuda and loss_var are assumed to be defined earlier)
+    config = fluid.DistributeTranspilerConfig()
+    config.mode = "nccl2"
+    t = fluid.DistributeTranspiler(config=config)
+    t.transpile(trainer_id, trainers=trainer_endpoints,
+                current_endpoint=current_endpoint)
+    exe = fluid.ParallelExecutor(
+        use_cuda,
+        loss_name=loss_var.name,
+        num_trainers=len(trainer_endpoints.split(",")),
+        trainer_id=trainer_id
+    )
+
+
+
+.. py:method:: transpile(trainer_id, program=None, pservers='127.0.0.1:6174', trainers=1, sync_mode=True, startup_program=None, current_endpoint='127.0.0.1:6174')
+
+Runs the transpiler.
+
+Parameters:
+    - **trainer_id** (int) – The id of the current trainer worker. With n trainer workers, the id ranges from 0 to n-1.
+    - **program** (Program|None) – The program to transpile. Defaults to ``fluid.default_main_program()``.
+    - **startup_program** (Program|None) – The ``startup_program`` to transpile. Defaults to ``fluid.default_startup_program()``.
+    - **pservers** (str) – A comma-separated list of Pserver endpoints, each in the form *ip address:port*.
+    - **trainers** (int|str) – In pserver mode, the number of trainers. In nccl2 mode, a string containing the list of trainer endpoints.
+    - **sync_mode** (bool) – Whether to do synchronous training. Defaults to True.
+    - **current_endpoint** (str) – The current endpoint, which must be passed when transpiling the program to nccl2 mode. Pserver mode does not use this parameter.
+
+.. py:method:: get_trainer_program(wait_port=True)
+
+
+Returns the program for the Trainer side.
+
+Returns: the Trainer-side program
+
+Return type: Program
+
+
+
+.. py:method:: get_pserver_program(endpoint)
+
+
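+Returns the program for the Pserver (parameter server) side.
+
+Parameters:
+    - **endpoint** (str) – The endpoint of the current Pserver.
+
+Returns: the program that the current Pserver needs to run
+
+Return type: Program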
+
+
+.. py:method:: get_pserver_programs(endpoint)
+
+
+Returns the ``main_program`` and ``startup_program`` used on the Pserver side for distributed training.
+
+Parameters:
+    - **endpoint** (str) – The endpoint of the current Pserver.
+
+Returns: (main_program, startup_program), a tuple of "Program" objects
+
+Return type: tuple
+ 
+ 
+.. py:method:: get_startup_program(endpoint, pserver_program=None, startup_program=None)
+
+
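+**This function is deprecated.**
+
+Gets the startup_program of the current Pserver. If several variables are split into different blocks, the input variables of the operators are modified accordingly.
+
+Parameters:
+    - **endpoint** (str) – The endpoint of the current Pserver.
+    - **pserver_program** (Program) – Deprecated. Call get_pserver_program first.
+    - **startup_program** (Program) – Deprecated. The startup_program should be passed in at initialization.
+
+Returns: the startup_program for the Pserver side
+
+Return type: Program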
+
+
+
+
+
+
+
+
+
+.. _cn_api_fluid_DistributeTranspilerConfig:
+
+DistributeTranspilerConfig
+-------------------------------
+
+.. py:class:: paddle.fluid.DistributeTranspilerConfig
+
+
+.. py:attribute:: slice_var_up (bool)
+
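+Whether to slice tensors for multiple Pservers (parameter servers). Defaults to True.
+
+.. py:attribute:: split_method (PSDispatcher)
+
+Either RoundRobin or HashName can be used.
+
+Note: try to choose the method that best balances the load across Pservers.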
+
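+The difference between the two dispatch styles can be sketched in pure Python. These helpers are hypothetical and are not the ``RoundRobin`` and ``HashName`` classes themselves. In particular, ``zlib.crc32`` is only a stand-in for whatever hash the library uses, chosen here because it gives the same result on every run.
+

```python
import zlib


def round_robin_dispatch(var_names, endpoints):
    """Assign variables to endpoints in turn, which spreads the count evenly."""
    return [endpoints[i % len(endpoints)] for i in range(len(var_names))]


def hash_name_dispatch(var_names, endpoints):
    """Assign each variable by a hash of its name, so the choice depends only on the name."""
    return [
        endpoints[zlib.crc32(name.encode("utf-8")) % len(endpoints)]
        for name in var_names
    ]


endpoints = ["192.168.0.1:6174", "192.168.0.2:6174"]
print(round_robin_dispatch(["w0", "w1", "w2"], endpoints))
print(hash_name_dispatch(["w0", "w1", "w2"], endpoints))
```

+
+Round-robin balances the number of variables per Pserver, while name hashing keeps the placement of a given variable stable regardless of the order in which variables are dispatched.
+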
+.. py:attribute:: min_block_size (int)
+
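+The minimum number of elements in a block after splitting.
+
+Note: according to `issuecomment-369912156 <https://github.com/PaddlePaddle/Paddle/issues/8638#issuecomment-369912156>`_ , bandwidth is used efficiently when the data block is larger than 2MB. If you want to change this value, read the ``slice_variable`` function carefully.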
+
+
+
+
+
+
+
+.. _cn_api_fluid_ExecutionStrategy:
+
+ExecutionStrategy
+-------------------------------
+
+.. py:class:: paddle.fluid.ExecutionStrategy
+
+``ExecutionStrategy`` gives users finer control over how a program runs in ``ParallelExecutor``. It takes effect when it is passed to ``ParallelExecutor``.
+
+**Example**
+
+..  code-block:: python
+
+  exec_strategy = fluid.ExecutionStrategy()
+  exec_strategy.num_threads = 4
+
+  train_exe = fluid.ParallelExecutor(use_cuda=True,
+                                     loss_name=loss.name,
+                                     exec_strategy=exec_strategy)
+
+  train_loss, = train_exe.run([loss.name], feed=feed_dict)
+
+
+
+.. py:attribute:: allow_op_delay
+   
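+Type: bool. Whether to delay the execution of communication operators, which can make the overall execution somewhat faster. In some models, however, allow_op_delay can cause the program to hang. Defaults to False.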
+  
+
+
+.. py:attribute:: num_iteration_per_drop_scope
+  
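+Type: int. The number of iterations to run before the temporary variables generated during execution are cleaned up. Because the shapes of temporary variables may stay the same between iterations, this can make the overall execution faster. Defaults to 100.
+
+.. note::
+  1. If you fetch data when calling ``run``, ``ParallelExecutor`` cleans up the temporary variables at the end of the current iteration.
+
+  2. In some NLP models, this member may cause GPU memory to run out. In that case, reduce the value of ``num_iteration_per_drop_scope``.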
+
+
+
+.. py:attribute:: num_threads
+
+Type: int. The size of the thread pool whose threads run the operators of the current program in ``ParallelExecutor``. If :math:`num\_threads=1`, all operators are executed one by one, although the execution order may differ between iterations. If this member is not set, ``ParallelExecutor`` sets it according to the device type and the device count: for GPU, :math:`num\_threads=device\_count \times 4`; for CPU, :math:`num\_threads=CPU\_NUM \times 4`. :math:`CPU\_NUM` is explained in detail in ``ParallelExecutor``. If :math:`CPU\_NUM` is not set, ``ParallelExecutor`` obtains the CPU count by calling ``multiprocessing.cpu_count()``. Defaults to 0.
+
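+The default thread-pool size can be worked out with a few lines of Python. ``default_num_threads`` is a hypothetical helper that only restates the rule above. It is not taken from Paddle's source, and ``device_count`` is supplied by the caller here instead of being detected.
+

```python
import multiprocessing
import os


def default_num_threads(use_cuda, device_count=None, environ=None):
    """Thread-pool size used when num_threads is left unset (sketch of the documented rule)."""
    environ = os.environ if environ is None else environ
    if use_cuda:
        if device_count is None:
            raise ValueError("device_count is required for the GPU case in this sketch")
        return device_count * 4
    # CPU: CPU_NUM if it is set, otherwise the number of CPUs on this machine.
    cpu_num = int(environ.get("CPU_NUM") or multiprocessing.cpu_count())
    return cpu_num * 4


print(default_num_threads(True, device_count=2))
print(default_num_threads(False, environ={"CPU_NUM": "3"}))
```

+
+For two GPUs this gives 8 threads, and for ``CPU_NUM=3`` it gives 12.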
+
+
+
+
+
+
+
+
+
+
+
+.. _cn_api_fluid_executor:
+
+Executor
+-------------------------------
+
+
+.. py:class:: paddle.fluid.Executor (place)
+
+
+
+
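+The Executor is driven by Python scripts and supports running on single/multiple GPUs and single/multiple CPUs.
+The Python Executor takes a program and adds feed operators and fetch operators to it
+according to the feed map and the fetch_list.
+The feed map provides the input data of the program. The fetch_list specifies the variables (or names) that the user wants to get after the program has run.
+
+Note that the Executor runs all operators in the program, not only those that fetch_list depends on.
+
+The Executor stores global variables in the global scope and creates a local scope for temporary variables.
+After the forward/backward computation on each mini-batch is done, the contents of the local scope are discarded,
+but the variables in the global scope persist across different runs of the Executor.
+
+
+**Example**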
+
+.. code-block:: python
+
+    import paddle.fluid as fluid
+    import paddle.fluid.compiler as compiler
+
+    # use_cuda, feed_dict and loss (the loss variable) are assumed to be defined earlier.
+    # Create an Executor named exe.
+    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
+    exe = fluid.Executor(place)
+
+    # Run the startup program once and only once.
+    # There is no need to optimize or compile the startup program.
+    exe.run(fluid.default_startup_program())
+
+    # Run the main program directly, without compiling it.
+    loss_value, = exe.run(fluid.default_main_program(),
+                          feed=feed_dict,
+                          fetch_list=[loss.name])
+
+    # Alternatively, compile the main program first and then run it. See CompiledProgram.
+    compiled_prog = compiler.CompiledProgram(
+        fluid.default_main_program()).with_data_parallel(
+            loss_name=loss.name)
+    loss_value, = exe.run(compiled_prog,
+                          feed=feed_dict,
+                          fetch_list=[loss.name])
+
+
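+Parameters:
+    - **place** (core.CPUPlace|core.CUDAPlace(n)) – The place where the ``Executor`` runs.
+
+
+
+.. py:method:: close()
+
+
+Closes this Executor.
+
+The Executor must not be used after this method is called. For distributed training, this method releases the resources on the PServers that are associated with the current Trainer.
+
+**Example**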
+
+..  code-block:: python
+
+    cpu = fluid.CPUPlace()
+    exe = fluid.Executor(cpu)
+    ...
+    exe.close()
+
+
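+.. py:method:: run(program=None, feed=None, fetch_list=None, feed_var_name='feed', fetch_var_name='fetch', scope=None, return_numpy=True, use_program_cache=False)
+
+
+Runs the program with this Executor. The feed map provides the input data, and fetch_list determines which results are returned.
+The Python Executor takes the program and adds feed operators and fetch operators to it
+according to the feed map and the fetch_list.
+The feed map provides the input data of the program. The fetch_list specifies the variables (or names) that the user wants to get after the program has run.
+
+Note that the Executor runs all operators in the program, not only those that fetch_list depends on.
+
+Parameters:
+    - **program** (Program|CompiledProgram) – The program to run. If not given, default_main_program (uncompiled) is used.
+    - **feed** (dict) – The input variables and their data, as a dict, for example ``{"image": ImageData, "label": LabelData}``.
+    - **fetch_list** (list) – A list of the variables or names that the user wants to get. The method returns results according to this list.
+    - **feed_var_name** (str) – The name of the input variable of the feed operator.
+    - **fetch_var_name** (str) – The name of the output variable of the fetch operator.
+    - **scope** (Scope) – The scope in which the program runs. Users can specify different scopes. Defaults to the global scope.
+    - **return_numpy** (bool) – If True, the fetched tensors are converted to numpy arrays.
+    - **use_program_cache** (bool) – Whether to reuse the cached program settings across batches. Setting it to True is faster only when (1) the program is not compiled with data parallel, and (2) the program, the feed variable names and the fetch_list variable names are unchanged from the previous step.
+
+Returns: the results fetched according to fetch_list
+
+Return type: list(numpy.array)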
+
+
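+**Example**
+
+..  code-block:: python
+
+    import numpy
+    import paddle.fluid as fluid
+
+    # Build a small network; out is the tensor that receives the fc output.
+    data = fluid.layers.data(name='X', shape=[1], dtype='float32')
+    out = fluid.layers.create_tensor(dtype='float32')
+    hidden = fluid.layers.fc(input=data, size=10)
+    fluid.layers.assign(hidden, out)
+    loss = fluid.layers.mean(out)
+    adam = fluid.optimizer.Adam()
+    adam.minimize(loss)
+
+
+..  code-block:: python
+
+    # Initialize the parameters on CPU.
+    cpu = fluid.CPUPlace()
+    exe = fluid.Executor(cpu)
+    exe.run(fluid.default_startup_program())
+
+..  code-block:: python
+
+    # Feed random data and fetch the loss.
+    x = numpy.random.random(size=(10, 1)).astype('float32')
+    outs = exe.run(
+        feed={'X': x},
+        fetch_list=[loss.name])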
+	
+
+
+
+
+
+
+
+
+
+
+
+
+.. _cn_api_fluid_global_scope:
+
+global_scope
+-------------------------------
+
+.. py:function:: paddle.fluid.global_scope()
+
+
+Returns the global (default) scope instance. Many APIs use the default ``global_scope``, for example ``Executor.run``.
+
+Returns: the global (default) scope instance
+
+Return type: Scope
+
+
+
+
+
+.. _cn_api_fluid_in_dygraph_mode:
+
+in_dygraph_mode
+-------------------------------
+
+.. py:function:: paddle.fluid.in_dygraph_mode()
+
+Returns: bool. True if the Program is running in dygraph (dynamic graph) mode.
+
+
+
+
+
+
+.. _cn_api_fluid_LoDTensor:
+
+LoDTensor
+-------------------------------
+
+.. py:class:: paddle.fluid.LoDTensor
+
+
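+LoDTensor is a Tensor with LoD information.
+
+``np.array(lod_tensor)`` converts a LoDTensor to a numpy array.
+
+``lod_tensor.lod()`` returns the LoD information.
+
+LoD is short for Level of Details and is typically used for sequences of different lengths. If you do not need to understand LoD, you can skip the explanation below.
+
+Example:
+
+X is a LoDTensor containing two sequences. The first has length 2 and the second has length 3.
+
+From the LoD, the first dimension of X is 5, since 5 = 2 + 3, which means X holds 5 sequence elements in total. Each element has 2 columns, so the shape of X is [5, 2].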
+
+::
+
+	x.lod  =  [[2, 3]] 
+	x.data = [[1, 2], [3, 4], // seq 1
+
+		  [5, 6], [7, 8], [9, 10]] // seq 2
+
+	x.shape = [5, 2]
+
+
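+LoD can have multiple levels (for example, a paragraph can have several sentences and a sentence can have several words). In the following example, Y is a LoDTensor with lod_level 2. It has 2 sequences: the first has length 2 (2 sub-sequences) and the second has length 1. The two sub-sequences of the first sequence have lengths 2 and 2, and the sub-sequence of the second sequence has length 3.
+
+
+::
+
+    y.lod = [[2, 1], [2, 2, 3]]
+    y.shape = [2+2+3, ...]
+
+
+.. note::
+
+    In the description above, LoD is length-based. In Paddle's internal implementation, LoD is offset-based. Internally, therefore, y.lod is represented as [[0, 2, 3], [0, 2, 4, 7]] (the length-based equivalent is [[2-0, 3-2], [2-0, 4-2, 7-4]]).
+
+    LoD can be understood as recursive_sequence_lengths, in which case it must be length-based. For historical reasons, when LoD is called lod in an API, it may be offset-based. Users should take care.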
+
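+The relationship between the two representations in the note can be checked with a short sketch. ``offsets_to_lengths`` is a hypothetical helper, not part of the LoDTensor API. It only takes the differences of neighbouring offsets on each level.
+

```python
def offsets_to_lengths(offset_lod):
    """Convert an offset-based LoD to a length-based one, level by level."""
    return [
        [end - start for start, end in zip(level, level[1:])]
        for level in offset_lod
    ]


print(offsets_to_lengths([[0, 2, 3], [0, 2, 4, 7]]))
```

+
+Applied to the internal form of y.lod, this recovers the length-based [[2, 1], [2, 2, 3]].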
+
+
+
+.. py:method::	has_valid_recursive_sequence_lengths(self: paddle.fluid.core.LoDTensor) → bool
+
+Checks whether the LoD of the LoDTensor is valid.
+
+Returns: whether the LoD is valid
+
+Return type: out (bool)
+
+.. py:method::	lod(self: paddle.fluid.core.LoDTensor) → List[List[int]]
+
+Returns the LoD of the LoDTensor.
+
+Returns: the LoD of the LoDTensor
+
+Return type: out (List[List[int]])
+
+
+.. py:method::	recursive_sequence_lengths(self: paddle.fluid.core.LoDTensor) → List[List[int]]
+
+Returns the sequence lengths of the LoDTensor that correspond to its LoD.
+
+Returns: the sequence lengths corresponding to the LoD
+
+Return type: out (List[List[int]])
+
+
+
+.. py:method::	set_lod(self: paddle.fluid.core.LoDTensor, lod: List[List[int]]) → None
+
+Sets the LoD of the LoDTensor.
+
+Parameters:
+  - **lod** (List[List[int]]) - The LoD to set.
+
+.. py:method::	set_recursive_sequence_lengths(self: paddle.fluid.core.LoDTensor, recursive_sequence_lengths: List[List[int]]) → None
+
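+Sets the LoD of the LoDTensor from the given recursive sequence lengths.
+
+::
+
+   For example, if recursive_sequence_lengths = [[2, 3]],
+   there are two sequences with lengths 2 and 3, and the corresponding LoD is [[0, 2, 2 + 3]], i.e. [[0, 2, 5]].
+
+Parameters:
+  - **recursive_sequence_lengths** (List[List[int]]) - The sequence lengths.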
+
+
+
+
+
+
+
+
+
+
+
+.. _cn_api_fluid_LoDTensorArray:
+
+LoDTensorArray
+-------------------------------
+
+.. py:class:: paddle.fluid.LoDTensorArray
+
+.. py:method:: append(self: paddle.fluid.core.LoDTensorArray, tensor: paddle.fluid.core.LoDTensor) → None
+
+Appends a LoDTensor to the end of the LoDTensorArray.
+
+
+
+
+
+
+
+
+
+.. _cn_api_fluid_memory_optimize:
+
+memory_optimize
+-------------------------------
+
+.. py:function:: paddle.fluid.memory_optimize(input_program, skip_opt_set=None, print_log=False, level=0, skip_grads=False)
+
+
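+Optimizes memory usage by reusing the memory of variables.
+
+.. note::
+    It does not support sub-blocks nested inside a block.
+
+Parameters:
+    - **input_program** (Program) – The input Program.
+    - **skip_opt_set** (set) – Variables in this set are not memory-optimized.
+    - **print_log** (bool) – Whether to print debug logs.
+    - **level** (int) – If level=0, memory is reused only when the shapes are exactly equal.
+
+Returns: None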
+
+
+
+
+
+
+
+
+.. _cn_api_fluid_name_scope:
+
+name_scope
+-------------------------------
+
+.. py:function:: paddle.fluid.name_scope(prefix=None)
+
+
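+Generates a hierarchical name prefix for operators.
+
+Note: this function should be used only for debugging and visualization. Do not use it for analysis such as graph/program transformations.
+
+Parameters:
+    - **prefix** (str) - The prefix.
+
+**Example**
+
+.. code-block:: python
+
+    with fluid.name_scope("encoder"):
+        ...
+    with fluid.name_scope("decoder"):
+        ...
+    with fluid.name_scope("attention"):
+        ...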
+
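+What "hierarchical" means can be pictured with a small stand-alone context manager. This is a hypothetical simplification and not Paddle's implementation: the ``/`` separator and the module-level stack are assumptions of this sketch, and this ``name_scope`` is a local function, not ``fluid.name_scope``.
+

```python
import contextlib

_scope_stack = []  # names of the scopes that are currently open


def current_prefix():
    """Prefix built from every open scope, outermost first."""
    return "".join(name + "/" for name in _scope_stack)


@contextlib.contextmanager
def name_scope(prefix):
    """Open a nested scope and yield the full hierarchical prefix."""
    _scope_stack.append(prefix)
    try:
        yield current_prefix()
    finally:
        _scope_stack.pop()


with name_scope("encoder") as outer:
    with name_scope("attention") as inner:
        print(outer, inner)
```

+
+Nesting two scopes yields a prefix that contains both names, and leaving a scope removes its name again.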
+
+
+
+
+
+
+.. _cn_api_fluid_ParallelExecutor:
+
+ParallelExecutor
+-------------------------------
+
+.. py:class:: paddle.fluid.ParallelExecutor(use_cuda, loss_name=None, main_program=None, share_vars_from=None, exec_strategy=None, build_strategy=None, num_trainers=1, trainer_id=0, scope=None)
+
+
+
+
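+``ParallelExecutor`` is designed for data parallelism: it distributes the data across different nodes and runs the computation on them in parallel. When running on GPU, a node is a GPU device, and ``ParallelExecutor`` automatically detects the GPU resources available on the current machine. When running on CPU, a node is a CPU device, and you can set the number of CPU devices with the environment variable ``CPU_NUM``, for example ``CPU_NUM=4``. If this environment variable is not set, the class calls ``multiprocessing.cpu_count`` to get the number of CPUs in the current system.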
+
+
+
+
+Parameters:
+    - **use_cuda** (bool) – Whether to use CUDA.
+    - **loss_name** (str) – The name of the loss variable. It must be provided for training. Defaults to None.
+    - **main_program** (Program) – The program to run. If not provided, ``default_main_program`` is used. Defaults to None.
+    - **share_vars_from** (ParallelExecutor) – If provided, this ``ParallelExecutor`` shares variables with the given ``ParallelExecutor``. Defaults to None.
+    - **exec_strategy** (ExecutionStrategy) – Controls how the program is executed in ``ParallelExecutor``, for example the number of threads used to run the program and the number of iterations between clean-ups of the temporary variables generated during execution. See ``fluid.ExecutionStrategy`` for details. Defaults to None.
+    - **build_strategy** (BuildStrategy) – Controls how the SSA Graph is built in ``ParallelExecutor``, for example ``reduce_strategy`` and ``gradient_scale_strategy``. See ``fluid.BuildStrategy`` for details. Defaults to None.
+    - **num_trainers** (int) – If greater than 1, NCCL is initialized with multiple ranks of nodes, and each node should have the same number of GPUs. Distributed training is then enabled. Defaults to 1.
+    - **trainer_id** (int) – Must be used together with ``num_trainers``. ``trainer_id`` is the rank of the current node, counting from 0. Defaults to 0.
+    - **scope** (Scope) – The scope in which the program runs. Defaults to ``fluid.global_scope()``.
+
+Returns: the initialized ``ParallelExecutor`` object
+
+Return type: ParallelExecutor
+
+Raises: ``TypeError`` - If ``share_vars_from`` is provided but is not a ``ParallelExecutor``.
+
+**Example**
+
+..  code-block:: python
+
+  train_exe = fluid.ParallelExecutor(use_cuda=True, loss_name=loss.name)
+  test_exe = fluid.ParallelExecutor(use_cuda=True,
+                                    main_program=test_program,
+                                    share_vars_from=train_exe)
+
+  train_loss, = train_exe.run([loss.name], feed=feed_dict)
+  test_loss, = test_exe.run([loss.name], feed=feed_dict)
+
+
+
+.. py:method::  run(fetch_list, feed=None, feed_dict=None, return_numpy=True)
+
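+Runs a ``ParallelExecutor`` object with ``fetch_list``.
+
+The ``feed`` parameter can be a ``dict`` or a ``list``. If it is a ``dict``, the data in feed is split and sent to the devices (CPU/GPU).
+If it is a ``list``, each element of the list is copied directly to the corresponding device.
+
+For example, if ``feed`` is a ``dict``:
+
+..  code-block:: python
+
+    # loss is assumed to be the loss variable of the default main program
+    exe = fluid.ParallelExecutor(use_cuda=True, loss_name=loss.name)
+    # The images are split across the devices. With two devices, each device processes images of shape (24, 1, 28, 28).
+    exe.run(fetch_list=[loss.name],
+            feed={'image': numpy.random.random(size=(48, 1, 28, 28))})
+
+If ``feed`` is a ``list``:
+
+..  code-block:: python
+
+    exe = fluid.ParallelExecutor(use_cuda=True, loss_name=loss.name)
+    # Each device processes one element of the list:
+    # the first device processes images of shape (48, 1, 28, 28),
+    # the second device processes images of shape (32, 1, 28, 28).
+    #
+    # Use exe.device_count to get the number of devices.
+    exe.run(fetch_list=[loss.name],
+            feed=[{"image": numpy.random.random(size=(48, 1, 28, 28))},
+                  {"image": numpy.random.random(size=(32, 1, 28, 28))},
+                  ])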
+
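+Parameters:
+    - **fetch_list** (list) – The list of names of the variables to fetch.
+    - **feed** (list|dict|None) – The feed variables. If it is a ``dict``, the data in feed is split and sent to the devices (CPU/GPU). If it is a ``list``, each element of the list is copied directly to the corresponding device. Defaults to None.
+    - **feed_dict** – Deprecated. An alias of feed, kept for backward compatibility. Defaults to None.
+    - **return_numpy** (bool) – Whether to convert the fetched tensors to numpy. Defaults to True.
+
+Returns: the list of fetched results
+
+Return type: List
+
+Raises:
+     - ``ValueError`` - If feed is a list but its length does not equal the number of available devices (places), or if the given feed is not a dict.
+     - ``TypeError`` - If feed is a list but its elements are not dicts.
+
+.. note::
+     1. If feed is a dict, the number of samples fed to ``ParallelExecutor`` *must* be larger than the number of available CPU cores or GPU cards. Otherwise the C++ side raises an exception. Take particular care to check whether the last batch of the dataset is larger than the number of available CPU cores or GPU cards.
+     2. If more than one CPU core or GPU card is available, the fetched result of each variable is a list, and each element of this list is the variable on one CPU core or GPU card.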
+
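+The effect of a ``dict`` feed can be pictured as cutting the batch into one contiguous part per device. The sketch below is an assumption-laden illustration, not the C++ splitting logic: ``split_batch`` is hypothetical, it hands any remainder to the first devices, and it treats "fewer samples than devices" as the error case.
+

```python
def split_batch(samples, num_devices):
    """Cut `samples` into `num_devices` contiguous parts of near-equal size."""
    if len(samples) < num_devices:
        raise ValueError("need at least one sample per device")
    base, extra = divmod(len(samples), num_devices)
    parts, start = [], 0
    for device_index in range(num_devices):
        # The first `extra` devices receive one additional sample.
        size = base + (1 if device_index < extra else 0)
        parts.append(samples[start:start + size])
        start += size
    return parts


parts = split_batch(list(range(48)), 2)
print([len(part) for part in parts])
```

+
+For 48 samples and two devices each part holds 24 samples, which matches the (24, 1, 28, 28) shape mentioned for the ``dict`` case.
+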
+**Example**
+
+..  code-block:: python
+
+        # use_cuda, avg_cost, feeder and cur_batch are assumed to be defined earlier
+        pe = fluid.ParallelExecutor(use_cuda=use_cuda,
+                                    loss_name=avg_cost.name,
+                                    main_program=fluid.default_main_program())
+        loss = pe.run(feed=feeder.feed(cur_batch),
+                      fetch_list=[avg_cost.name])
+
+
+
+
+
+
+
+
+
+.. _cn_api_fluid_ParamAttr:
+
+ 
+ParamAttr
+-------------------------------
+
+
+.. py:class:: paddle.fluid.ParamAttr(name=None, initializer=None, learning_rate=1.0, regularizer=None, trainable=True, gradient_clip=None, do_model_average=False)
+
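+This class represents the attributes of a parameter. To make the training of a neural network go more smoothly, users can adjust parameter attributes as needed, such as the learning rate, regularization, trainable, do_model_average and the parameter initialization method.
+
+Parameters:
+    - **name** (str) – The name of the parameter. Defaults to None.
+    - **initializer** (Initializer) – The method used to initialize the parameter. Defaults to None.
+    - **learning_rate** (float) – The learning rate of the parameter. The learning rate used in optimization is :math:`global\_lr \times parameter\_lr \times scheduler\_factor`. Defaults to 1.0.
+    - **regularizer** (WeightDecayRegularizer) – The regularizer. Defaults to None.
+    - **trainable** (bool) – Whether the parameter is trainable. Defaults to True.
+    - **gradient_clip** (BaseGradientClipAttr) – The method used to clip the gradient of the parameter. Defaults to None.
+    - **do_model_average** (bool) – Whether the parameter takes part in model averaging. Defaults to False.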
+    
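+As a worked example of the learning-rate formula, the following sketch multiplies the three factors. ``effective_learning_rate`` is a hypothetical helper and the numbers are made up for illustration.
+

```python
def effective_learning_rate(global_lr, parameter_lr, scheduler_factor=1.0):
    """Learning rate applied to one parameter: product of the three factors."""
    return global_lr * parameter_lr * scheduler_factor


# A global rate of 0.1 and a per-parameter rate of 0.5 with no scheduling
# halves the step size for this parameter.
print(effective_learning_rate(0.1, 0.5))
```

+
+A parameter with ``learning_rate=0.5`` is therefore updated with half the global step size.
+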
+**Example**
+
+..  code-block:: python
+
+   w_param_attrs = fluid.ParamAttr(name="fc_weight",
+                                   learning_rate=0.5,
+                                   regularizer=fluid.regularizer.L2Decay(1.0),
+                                   trainable=True)
+   y_predict = fluid.layers.fc(input=x, size=10, param_attr=w_param_attrs)
+
+
+
+
+
+
+
+
+
+
+
+
+
+.. _cn_api_fluid_Program:
+
+Program
+-------------------------------
+
+.. py:class::  paddle.fluid.Program
+
+
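+Creates a Python Program. Inside Paddle Fluid it is converted to a ProgramDesc, which is used to create the C++ Program. A Program is self-contained, like a container. It includes at least one Block. When a Block contains control flow ops (for example while_op), the Program contains nested blocks. See framework.proto for details.
+
+Note: by default, Paddle Fluid contains a ``default_startup_program`` and a ``default_main_program``, which share parameters. ``default_startup_program`` runs only once to initialize the parameters, while ``default_main_program`` runs on every mini-batch and adjusts the weights.
+
+Returns: an empty program
+
+**Example**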
+
+..  code-block:: python
+
+  main_program = fluid.Program()
+  startup_program = fluid.Program()
+  with fluid.program_guard(main_program=main_program, startup_program=startup_program):
+        x = fluid.layers.data(name="x", shape=[-1, 784], dtype='float32')
+        y = fluid.layers.data(name="y", shape=[-1, 1], dtype='int32')
+        # fc needs an input variable and an output size
+        z = fluid.layers.fc(input=x, size=10, act="relu", name="fc")
+
+
+
+.. py:attribute:: op_role
+
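+The role of the operator. The value must be one of the enum values {Forward, Backward, Optimize}.
+
+Note: this is a low-level API. It is used only by ``ParallelExecutor`` to duplicate or schedule operators to devices.
+
+For example, a Forward operator should be executed on every device. A Backward operator is executed on every device, and the parameter gradients from backward propagation (use ``op_role_var`` to get these variables) are merged to one device. An Optimize operator is executed on only one device, which then broadcasts the new parameters to the other devices.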
+
+
+
+.. py:attribute:: set_op_role
+
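+The role of the operator. The value must be one of the enum values {Forward, Backward, Optimize}.
+
+Note: this is a low-level API. It is used only by ``ParallelExecutor`` to duplicate or schedule operators to run on devices.
+
+For example, a Forward operator should be executed on every device. A Backward operator should be executed on every device, and the parameter gradients from backward propagation (use ``op_role_var`` to get these variables) are merged to one device. An Optimize operator is executed on only one device, which then broadcasts the new parameters to the other devices.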
+
+
+
+.. py:attribute:: op_role_var
+
+An auxiliary variable for ``op_role``.
+
+See the documentation of ``Program.op_role``.
+
+Note: this is a low-level API. Users should not use it directly.
+
+
+
+.. py:attribute:: set_op_role_var
+
+An auxiliary variable for ``op_role``.
+
+See the documentation of ``Program.op_role``.
+
+Note: this is a low-level API. Users should not use it directly.
+
+
+
+.. py:method:: to_string(throw_on_error, with_details=False)
+
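+Returns a string representation of the Program for debugging.
+
+Parameters:
+    - **throw_on_error** (bool): Raise a ``ValueError`` when any required field is not set.
+    - **with_details** (bool): When True, print more information about variables and parameters, such as ``trainable`` and ``optimize_attr``.
+
+Returns: the debug string
+
+Return type: str
+
+Raises:
+ - ``ValueError`` - if ``throw_on_error == True`` and any required field is not set.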
+
+
+
+.. py:method:: clone(for_test=False)
+
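+Creates a new, identical Program.
+
+Some operators, such as batch_norm, behave differently during training and testing. They have an ``is_test`` attribute that controls this behavior. When ``for_test=True``, this method sets their ``is_test`` attribute to True.
+
+- Set ``for_test`` to False when the cloned Program is used for training.
+- Set ``for_test`` to True when the cloned Program is used for testing.
+
+Note: this API does not remove any operators. Call ``clone(for_test=True)`` before the backward and optimization operators are added.
+
+**Code example**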
+
+..  code-block:: python
+
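+  import paddle.fluid as fluid
+
+  # `loss` is assumed to be the loss Variable already built in the default main program
+  test_program = fluid.default_main_program().clone(for_test=True)
+  optimizer = fluid.optimizer.Momentum(learning_rate=0.01, momentum=0.9)
+  optimizer.minimize(loss)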
+
+Parameters:
+    - **for_test** (bool) – When True, ``clone`` sets the ``is_test`` attribute of the operators to True.
+
+Returns: a new, identical Program
+
+Return type: Program
+
+**Code example**
+
+1. Clone a Program:
+
+..  code-block:: python
+
+  train_program = fluid.Program()
+  startup_program = fluid.Program()
+  with fluid.program_guard(train_program, startup_program):
+        img = fluid.layers.data(name='image', shape=[784])
+        hidden = fluid.layers.fc(input=img, size=200, act='relu')
+        hidden = fluid.layers.dropout(hidden, dropout_prob=0.5)
+        loss = fluid.layers.cross_entropy(
+                     input=fluid.layers.fc(hidden, size=10, act='softmax'),
+                     label=fluid.layers.data(name='label', shape=[1], dtype='int64'))
+  test_program = train_program.clone(for_test=True)
+  sgd = fluid.optimizer.SGD(learning_rate=1e-3)
+  with fluid.program_guard(train_program, startup_program):
+        sgd.minimize(loss)
+
+2. If the train Program and the test Program are built and run separately, ``clone`` is not needed.
+
+..  code-block:: python
+
+  import paddle.fluid as fluid
+
+  def network(is_test):
+      img = fluid.layers.data(name='image', shape=[784])
+      hidden = fluid.layers.fc(input=img, size=200, act='relu')
+      hidden = fluid.layers.dropout(hidden, dropout_prob=0.5, is_test=is_test)
+      loss = fluid.layers.cross_entropy(
+          input=fluid.layers.fc(hidden, size=10, act='softmax'),
+          label=fluid.layers.data(name='label', shape=[1], dtype='int64'))
+      return loss
+
+  train_program = fluid.Program()
+  startup_program = fluid.Program()
+  test_program = fluid.Program()
+
+  with fluid.program_guard(train_program, startup_program):
+      with fluid.unique_name.guard():
+          loss = network(is_test=False)
+          sgd = fluid.optimizer.SGD(learning_rate=1e-3)
+          sgd.minimize(loss)
+
+  # The startup program of the test phase is not used
+  with fluid.program_guard(test_program, fluid.Program()):
+      with fluid.unique_name.guard():
+          loss = network(is_test=True)
+
+The two code snippets above produce the same Programs.
+
+.. py:staticmethod:: parse_from_string(binary_str)
+
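+Deserializes a protobuf binary string into a Program.
+
+Note: all information about parameters is lost after serialization and deserialization.
+
+Parameters:
+    - **binary_str** (str) – The protobuf binary string.
+
+Returns: the deserialized Program
+
+Return type: Program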
+
+.. py:attribute:: num_blocks
+
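+The number of blocks in this Program.
+
+.. py:attribute:: random_seed
+
+
+The default random seed for the random operators in the Program. 0 means the seed is obtained from the random device.
+
+Note: it must be set before any operator is added.
+
+.. py:method:: global_block()
+
+Gets the first block of this Program.
+
+.. py:method:: block(index)
+
+Gets the block of this Program specified by ``index``. ``index`` is an int.
+
+Returns: the block at ``index``
+
+Return type: Block
+
+.. py:method:: current_block()
+
+Gets the current block. The current block is the one to which operators are added.
+
+.. py:method:: list_vars()
+
+Gets all variables in the current Program. The return value is an iterable object.
+
+Returns: a generator that yields every variable in the Program
+
+Return type: iterable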
+	
+
+
+
+
+
+
+
+.. _cn_api_fluid_program_guard:
+
+program_guard
+-------------------------------
+
+.. py:function::    paddle.fluid.program_guard(main_program, startup_program=None)
+
+
+
+Use this function with the Python ``with`` statement to change the global main program and startup program.
+
+Layer functions called inside the ``with`` block append operators and variables to the new main program.
+
+**Code example**
+
+..  code-block:: python
+
+    import paddle.fluid as fluid
+    main_program = fluid.Program()
+    startup_program = fluid.Program()
+    with fluid.program_guard(main_program, startup_program):
+        data = fluid.layers.data(...)
+        hidden = fluid.layers.fc(...)
+
+Note that if you do not need to build your own startup program or main program, a temporary Program can be passed in instead.
+
+**Code example**
+
+..  code-block:: python
+
+    import paddle.fluid as fluid
+    main_program = fluid.Program()
+    # If you do not care about the startup program, pass a temporary one
+    with fluid.program_guard(main_program, fluid.Program()):
+        data = ...
+
+
+Parameters:
+    - **main_program** (Program) – The new main program used inside the ``with`` statement.
+    - **startup_program** (Program) – The new startup program used inside the ``with`` statement. If ``None`` is passed, the current startup program is left unchanged.
+
+
+
+
+
+
+
+
+
+
+.. _cn_api_fluid_release_memory:
+
+release_memory
+-------------------------------
+
+.. py:function:: paddle.fluid.release_memory(input_program, skip_opt_set=None) 
+
+
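+Modifies the input Program by inserting ``delete_op`` operators so that variables that are no longer needed are deleted early.
+The modification is performed in place.
+
+**Note**: this API is experimental and will be removed in a later release. Its use is not recommended.
+
+Parameters:
+    - **input_program** (Program) – The Program into which ``delete_op`` is inserted.
+    - **skip_opt_set** (set) – The set of variables to skip during memory optimization.
+
+Returns: None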
+
+
+
+.. _cn_api_fluid_scope_guard:
+
+scope_guard
+-------------------------------
+
+.. py:function:: paddle.fluid.scope_guard(scope)
+
+
+Changes the global/default scope. All variables created at runtime are allocated in the new scope.
+
+Parameters:
+    - **scope** - The new global/default scope.
+
+**Code example**
+
+..  code-block:: python
+
+	import paddle.fluid as fluid
+	
+	new_scope = fluid.Scope()
+	with fluid.scope_guard(new_scope):
+		...
+
+
+
+
+
+
+
+
+.. _cn_api_fluid_Tensor:
+
+Tensor
+-------------------------------
+
+.. py:function:: paddle.fluid.Tensor
+
+    An alias of ``LoDTensor``.
+
+
+
+
+
+
+
+
+
+.. _cn_api_fluid_WeightNormParamAttr:
+
+WeightNormParamAttr
+-------------------------------
+
+.. py:class:: paddle.fluid.WeightNormParamAttr(dim=None, name=None, initializer=None, learning_rate=1.0, regularizer=None, trainable=True, gradient_clip=None, do_model_average=False)
+
+
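+Weight normalization. Weight normalization reparameterizes a weight vector so that its length is decoupled from its direction. It is described in the paper `Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks <https://arxiv.org/pdf/1602.07868.pdf>`_.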
+
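+As a brief sketch of the reparameterization described in the paper (the symbols :math:`w`, :math:`v` and :math:`g` follow the paper's notation and are not names used by this API), each weight vector :math:`w` is expressed through a direction vector :math:`v` and a scalar length :math:`g`:
+
+.. math::
+
+    w = \frac{g}{\|v\|} v
+
+With this form :math:`\|w\| = g` regardless of :math:`v`, and :math:`g` and :math:`v` are trained in place of :math:`w`.
+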
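+Parameters:
+    - **dim** (list) - The dimension(s) over which the norm is computed. Default: None.
+    - **name** (str) - The name of the parameter. Default: None.
+    - **initializer** (Initializer) - The method used to initialize the parameter. Default: None.
+    - **learning_rate** (float) - The learning rate of the parameter. The effective learning rate during optimization is :math:`global\_lr * parameter\_lr * scheduler\_factor`. Default: 1.0.
+    - **regularizer** (WeightDecayRegularizer) - The regularization factor. Default: None.
+    - **trainable** (bool) - Whether the parameter is trainable. Default: True.
+    - **gradient_clip** (BaseGradientClipAttr) - The gradient clipping method. Default: None.
+    - **do_model_average** (bool) - Whether the parameter should be included in model averaging. Default: False.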
+
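+As a worked illustration of the ``learning_rate`` formula above (the numbers are made up for this example): with a global learning rate of 0.01, a parameter ``learning_rate`` of 0.5 and a scheduler factor of 1.0, the effective learning rate of the parameter would be
+
+.. math::
+
+    0.01 * 0.5 * 1.0 = 0.005
+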
+**Code example**
+
+..  code-block:: python
+
+    import paddle.fluid as fluid
+
+    data = fluid.layers.data(name="data", shape=[3, 32, 32], dtype="float32")
+    fc = fluid.layers.fc(input=data,
+                         size=1000,
+                         param_attr=fluid.WeightNormParamAttr(
+                             dim=None,
+                             name='weight_norm_param'))
+
+
+
+
+
+
+
+
--- a/doc/fluid/api_cn/index_cn.rst
+++ b/doc/fluid/api_cn/index_cn.rst
@@ -5,12 +5,13 @@ API
 ..  toctree::
    :maxdepth: 1

-    ../api_guides/index.rst
+    ../api_guides/index_cn.rst
    fluid_cn.rst
    average_cn.rst
    backward_cn.rst
    clip_cn.rst
    data_feeder_cn.rst
+    dataset_cn.rst
    executor_cn.rst
    initializer_cn.rst
    io_cn.rst
@@ -21,6 +22,5 @@ API
    profiler_cn.rst
    regularizer_cn.rst
    transpiler_cn.rst
-    dataset_cn.rst
    data/dataset_cn.rst
    data/data_reader_cn.rst
--- a/doc/fluid/api_cn/io_cn.rst
+++ b/doc/fluid/api_cn/io_cn.rst
@@ -289,8 +289,6 @@ PyReader
  - **reader** (generator)  – 返回LoDTensor类型的批处理数据的Python生成器
  - **places** (None|list(CUDAPlace)|list(CPUPlace)) –  位置列表。当PyReader可迭代时必须被提供

-
-
 .. _cn_api_fluid_io_save_inference_model:

 save_inference_model
@@ -313,7 +311,9 @@ save_inference_model
  - **params_filename** (str|None) – 保存所有相关参数的文件名称。如果设置为None，则参数将保存在单独的文件中。
  - **export_for_deployment** (bool) – 如果为真，Program将被修改为只支持直接预测部署的Program。否则，将存储更多的信息，方便优化和再训练。目前只支持True。

-返回: None
+Returns: the list of names of the fetched target variables
+
+Return type: target_var_name_list (list)

 抛出异常：
 - ``ValueError`` – 如果 ``feed_var_names`` 不是字符串列表
@@ -406,8 +406,9 @@ save_persistables
    exe = fluid.Executor(fluid.CPUPlace())
    param_path = "./my_paddle_model"
    prog = fluid.default_main_program()
+    # `prog` can be any user-defined program
    fluid.io.save_persistables(executor=exe, dirname=param_path,
-                               main_program=None)
+                               main_program=prog)
    
    


--- a/doc/fluid/api_cn/layers_cn.rst
+++ b/doc/fluid/api_cn/layers_cn.rst
@@ -4,7 +4,7 @@ fluid.layers


 ============
-control_flow 
+control_flow
 ============


@@ -17,7 +17,7 @@ array_length

 **得到输入LoDTensorArray的长度**

-此功能用于查找输入数组LOD_TENSOR_ARRAY的长度。  
+此功能用于查找输入数组LOD_TENSOR_ARRAY的长度。

 相关API:
    - :ref:`cn_api_fluid_layers_array_read`
@@ -92,7 +92,7 @@ array_read


 .. _cn_api_fluid_layers_array_write:
-    
+
 array_write
 -------------------------------

@@ -106,8 +106,8 @@ array_write
    - **x** (Variable|list) – 待从中读取数据的输入张量(tensor)
    - **i** (Variable|list) – 输出结果 ``LOD_TENSOR_ARRAY`` 的下标, 该下标指向输入张量 ``x`` 写入输出数组的位置
    - **array** (Variable|list) – 会被输入张量 ``x`` 写入的输出结果 ``LOD_TENSOR_ARRAY`` 。如果该项值为None， 一个新的 ``LOD_TENSOR_ARRAY`` 将会被创建并作为结果返回
- 
-返回:	输入张量 ``x`` 所写入的输出结果 ``LOD_TENSOR_ARRAY``  
+
+返回:	输入张量 ``x`` 所写入的输出结果 ``LOD_TENSOR_ARRAY``

 返回类型:	变量（Variable）

@@ -139,7 +139,7 @@ create_array

 创建LoDTensorArray数组。它主要用于实现RNN与array_write, array_read和While。

-参数: 
+参数:
    - **dtype** (int |float) — lod_tensor_array中存储元素的数据类型。

 返回: lod_tensor_array， 元素数据类型为dtype。
@@ -150,10 +150,10 @@ create_array
 **代码示例**

 ..  code-block:: python
-  
+
  data = fluid.layers.create_array(dtype='float32')
-  
-  
+
+



@@ -205,14 +205,14 @@ memory用于缓存分段数据。memory的初始值可以是零，也可以是
 .. note::
    目前不支持在DynamicRNN中任何层上配置 is_sparse = True

-.. py:method:: step_input(x)
+.. py:method:: step_input(x, level=0)

    将序列标记为动态RNN输入。

 参数:
-    	- **x** (Variable) - 输入序列	
-	
-    	
+    	- **x** (Variable) - 输入序列
+    	- **level** (int) - The LoD level used to split the steps. Default: 0.
+
 返回:当前的输入序列中的timestep。

 .. py:method:: static_input(x)
@@ -231,13 +231,13 @@ memory用于缓存分段数据。memory的初始值可以是零，也可以是
 .. py:method:: memory(init=None, shape=None, value=0.0, need_reorder=False, dtype='float32')

 为动态rnn创建一个memory 变量。
-    
+
 如果 ``init`` 不是None， ``memory`` 将由这个变量初始化。参数 ``need_reorder`` 用于将memory重新排序作为输入变量。当memory初始化依赖于输入样本时，应该将其设置为true。

 **例如**

 ..  code-block:: python
-  
+
  	import paddle.fluid as fluid
  	sentence = fluid.layers.data(
                 name='sentence', dtype='float32', shape=[32])
@@ -252,15 +252,15 @@ memory用于缓存分段数据。memory的初始值可以是零，也可以是
 			 input=[word, memory], size=10, act='tanh')
 	     drnn.update_memory(ex_mem=memory, new_mem=hidden)
 	     drnn.output(hidden)
-	   
+
 	rnn_output = drnn()



 否则，如果已经设置 ``shape`` 、 ``value`` 、 ``dtype`` ，memory将被 ``value`` 初始化
-  
+
 ..  code-block:: python
-  
+
 	import paddle.fluid as fluid

 	sentence = fluid.layers.data(
@@ -292,7 +292,7 @@ memory用于缓存分段数据。memory的初始值可以是零，也可以是
 将内存从 ``ex_mem`` 更新到 ``new_mem`` 。注意， ``ex_mem`` 和 ``new_mem`` 的 ``shape`` 和数据类型必须相同。

 参数：
-	- **ex_mem** （memory Variable）-  memory 变量（Variable） 
+	- **ex_mem** （memory Variable）-  memory 变量（Variable）
 	- **new_mem** （memory Variable）- RNN块中生成的平坦变量（plain  variable）

 返回：None
@@ -306,8 +306,8 @@ memory用于缓存分段数据。memory的初始值可以是零，也可以是
    - **\*outputs** - 输出变量。

 返回:None
- 
- 
+
+



@@ -331,11 +331,11 @@ equal
    - **y** (Variable)-equal的第二个操作数
    - **cond** (Variable|None)-输出变量（可选），用来存储equal的结果

-返回：张量类型的变量，存储equal的输出结果 
+返回：张量类型的变量，存储equal的输出结果

-返回类型：变量（Variable） 
+返回类型：变量（Variable）

-**代码示例**: 
+**代码示例**:

 .. code-block:: python

@@ -356,7 +356,7 @@ IfElse

 .. py:class:: paddle.fluid.layers.IfElse(cond, name=None)

-if-else控制流。  
+if-else控制流。

 参数：
    - **cond** (Variable)-用于比较的条件
@@ -393,13 +393,13 @@ if-else控制流。


 .. _cn_api_fluid_layers_increment:
-  
+
 increment
 -------------------------------
-  
+
 .. py:function:: paddle.fluid.layers.increment(x, value=1.0, in_place=True)

-   
+
 该函数为输入 ``x`` 增加 ``value`` 大小, ``value`` 即函数中待传入的参数。该函数默认直接在原变量 ``x`` 上进行运算。

 .. note::
@@ -417,13 +417,13 @@ increment
 **代码示例**

 ..  code-block:: python
-  
+
    data = fluid.layers.data(name='data', shape=[1], dtype='float32',
                         append_batch_size=False)
    data = fluid.layers.increment(x=data, value=3.0, in_place=True)
- 
- 
- 
+
+
+



@@ -485,7 +485,7 @@ less_than
    import paddle.fluid as fluid
    less = fluid.layers.less_than(x=label, y=limit)

-参数：  
+参数：
    - **x** (Variable) – ``less_than`` 运算的左操作数
    - **y** (Variable) – ``less_than`` 运算的右操作数
    - **force_cpu** (BOOLEAN) – 值True则强制将输出变量写入CPU内存中。否则，将其写入目前所在的运算设备上。默认为True
@@ -563,19 +563,19 @@ reorder_lod_tensor_by_rank


 ::
-	
+
  例如:
- 
+
  假设在 RankTable 中存储的序列索引为 [3,0,2,1]， X 将会被这样被重新排列：
  X 中的第四个序列（即索引为3的序列，后面以此类推）会变成排列后的batch中的第一个，紧接着就是原来batch中的第一个元素，第三个元素，和第二个元素。
-  简言之，若有原batch：X = [Seq0, Seq1, Seq2, Seq3] 且 RankTable 中的索引为 [3,0,2,1]，那么输出即为 Out = [Seq3, Seq0, Seq2, Seq1] ，它携带着新的LoD信息。	
+  简言之，若有原batch：X = [Seq0, Seq1, Seq2, Seq3] 且 RankTable 中的索引为 [3,0,2,1]，那么输出即为 Out = [Seq3, Seq0, Seq2, Seq1] ，它携带着新的LoD信息。
  如果 X 的LoD信息是空的，这表明 X 不是序列型数据。这和由多个定长为1的序列组成的batch是相同的情况。此时，该函数将对 X 中的切片（slice） 在第一轴(axis)上按 rank_table 里的规则加以排列。
  例如，现有 X = [Slice0, Slice1, Slice2, Slice3] ，并且它LoD信息为空，在 RankTable 索引为[3, 0, 2, 1]。则 Out = [Slice3, Slice0, Slice2, Slice1] ，并且不在其中追加LoD信息。

 注意，该operator对 ``X`` 进行的排序所依据的 ``LoDRankTable`` 不一定是在 ``X`` 的基础上得出来的。它可以由
 其他不同的序列batch得出，并由该operator依据这个 ``LoDRankTable`` 来对  ``X`` 排序。

-参数：   
+参数：
    - **x** (LoDTensor)-待根据提供的 ``RankTable`` 进行排序的LoD tensor
    - **rank_table** (LoDRankTable)- ``X`` 重新排序的依据规则表

@@ -619,7 +619,7 @@ StaticRNN



- 
+



@@ -636,7 +636,7 @@ Switch
 .. py:class:: paddle.fluid.layers.Switch (name=None)

 Switch类实现的功能十分类似if-elif-else。它可以在学习率调度器(learning rate scheduler)中调整学习率。
-:: 
+::
  语义上，
      1. switch控制流挨个检查cases
      2. 各个case的条件是一个布尔值(boolean)，它是一个标量(scalar)变量
@@ -646,7 +646,7 @@ Switch类实现的功能十分类似if-elif-else。它可以在学习率调度
 **代码示例**

 ..  code-block:: python
-    
+
    lr = fluid.layers.tensor.create_global_var(
        shape=[1],
        value=0.0,
@@ -663,12 +663,12 @@ Switch类实现的功能十分类似if-elif-else。它可以在学习率调度
            fluid.layers.tensor.assign(input=one_var, output=lr)
        with switch.default():
            fluid.layers.tensor.assign(input=two_var, output=lr)
- 
+
 .. py:method:: case(condition)

 为该condition（情况，条件）建立新的block（块）。
-  
-  
+
+
 .. py:method:: default()

 为该switch建立default case。
@@ -695,11 +695,11 @@ While
 该类用于实现while循环控制功能。


-参数：  
+参数：
 		- **cond** (Variable) – 用于比较的条件
 		- **is_test** (bool) – 用于表明是不是在测试阶段执行
 		- **name** (str) - 该层的命名
- 
+
 **代码示例**

 ..  code-block:: python
@@ -707,7 +707,7 @@ While
  d0 = fluid.layers.data("d0", shape=[10], dtype='float32')
  data_array = fluid.layers.array_write(x=d0, i=i)
  array_len = fluid.layers.fill_constant(shape=[1],dtype='int64', value=3)
-  
+
  cond = fluid.layers.less_than(x=i, y=array_len)
  while_op = fluid.layers.While(cond=cond)
  with while_op.block():
@@ -727,7 +727,7 @@ While


 ============
- io 
+ io
 ============


@@ -799,12 +799,12 @@ create_py_reader_by_data
 **代码示例：**

 :code:`py_reader` 的基本用法如下所示：
-        
+
 .. code-block:: python

    import paddle.fluid as fluid
    import paddle.dataset.mnist as mnist
-    
+
    image = fluid.layers.data(name='image', shape=[3,224,224], dtypes='float32')
    label = fluid.layers.data(name='label', shape=[1], dtypes='int64')
    reader = fluid.layers.create_py_reader_by_data(capacity=64, feed_list=[image, label])
@@ -814,7 +814,7 @@ create_py_reader_by_data
    loss = network(img, label) # some network definition

    fluid.Executor(fluid.CUDAPlace(0)).run(fluid.default_startup_program())
-    
+
    exe = fluid.ParallelExecutor(use_cuda=True, loss_name=loss.name)
    for epoch_id in range(10):
        reader.start()
@@ -1043,7 +1043,7 @@ py_reader

 该layer返回一个Reader Variable。reader提供了 ``decorate_paddle_reader()`` 和 ``decorate_tensor_provider()`` 来设置Python generator，作为Python端的数据源。在c++端调用 ``Executor::Run()`` 时，来自generator的数据将被自动读取。与 ``DataFeeder.feed()`` 不同，数据读取进程和  ``Executor::Run()`` 进程可以使用 ``py_reader`` 并行运行。reader的 ``start()`` 方法应该在每次数据传递开始时调用，在传递结束和抛出  ``fluid.core.EOFException`` 后执行 ``reset()`` 方法。注意， ``Program.clone()`` 方法不能克隆 ``py_reader`` 。

-参数:	
+参数:
  - **capacity** (int) –  ``py_reader`` 维护的缓冲区容量
  - **shapes** (list|tuple) –数据形状的元组或列表.
  - **dtypes** (list|tuple) –  ``shapes`` 对应元素的数据类型
@@ -1054,7 +1054,7 @@ py_reader
 返回:    reader，从reader中可以获取feed的数据

 返回类型:	Variable
-	
+


 **代码示例**
@@ -1087,8 +1087,7 @@ py_reader
 	    except fluid.core.EOFException:
 		reader.reset()

-
-
+    fluid.io.save_inference_model(dirname='./model', feeded_var_names=[img.name, label.name], target_vars=[loss], executor=fluid.Executor(fluid.CUDAPlace(0)))


 2.训练和测试应使用不同的名称创建两个不同的py_reader，例如：
@@ -1279,7 +1278,7 @@ shuffle


 ============
- nn 
+ nn
 ============

 .. _cn_api_fluid_layers_adaptive_pool2d:
@@ -1439,14 +1438,14 @@ add_position_encoding

 接受形状为[N×M×P]的三维输入张量，并返回一个形为[N×M×P]的输出张量，且输出张量具有位置编码值。

-可参考论文: `Attention Is All You Need <http://arxiv.org/pdf/1706.03762.pdf>`_ 
+可参考论文: `Attention Is All You Need <http://arxiv.org/pdf/1706.03762.pdf>`_

 .. math::

  PE(pos, 2i) &= \sin{(pos / 10000^{2i / P})}\\
  PE(pos, 2i + 1) &= \cos{(pos / 10000^{2i / P})}\\
  Out(:, pos, i) &= \alpha * input(:, pos, i) + \beta * PE(pos, i)
-	
+
 其中:
    - PE(pos, 2i): 偶数位置上数字的增量
    - PE(pos, 2i + 1): 奇数位置上数字的增量
@@ -1483,7 +1482,7 @@ add_position_encoding
 affine_channel
 -------------------------------

-.. py:function:: paddle.fluid.layers.affine_channel(x, scale=None, bias=None, data_layout='NCHW', name=None)
+.. py:function:: paddle.fluid.layers.affine_channel(x, scale=None, bias=None, data_layout='NCHW', name=None, act=None)

 对输入的每个 channel 应用单独的仿射变换。用于将空间批处理范数替换为其等价的固定变换。

@@ -1495,6 +1494,7 @@ affine_channel
 	- **bias** (Variable):形状为(C)的一维输入，第C个元素是输入的第C个通道的仿射变换的偏置。
 	- **data_layout** (string, default NCHW): NCHW 或 NHWC，如果输入是一个2D张量，可以忽略该参数
 	- **name** (str, default None): 此层的名称
+        - **act** (str, default None): The activation function applied to the output of this layer

 返回： out (Variable): 与x具有相同形状和数据布局的张量。

@@ -1518,7 +1518,7 @@ affine_grid


 .. code-block:: text
-        
+
        * 例 1:
          给定:
              theta = [[[x_11, x_12, x_13]
@@ -1526,15 +1526,15 @@ affine_grid
                       [[x_21, x_22, x_23]
                        [x_24, x_25, x_26]]]
              out_shape = [2, 3, 5, 5]
-          
+
          Step 1:
-              
+
              根据out_shape生成标准化坐标

              归一化坐标的值在-1和1之间
-              
+
              归一化坐标的形状为[2,H, W]，如下所示:
-              
+
              C = [[[-1.  -1.  -1.  -1.  -1. ]
                    [-0.5 -0.5 -0.5 -0.5 -0.5]
                    [ 0.   0.   0.   0.   0. ]
@@ -1545,11 +1545,11 @@ affine_grid
                    [-1.  -0.5  0.   0.5  1. ]
                    [-1.  -0.5  0.   0.5  1. ]
                    [-1.  -0.5  0.   0.5  1. ]]]
-              
+
              C[0]是高轴坐标，C[1]是宽轴坐标。

          Step2:
-              
+
              将C转换并重组成形为[H * W, 2]的张量,并追加到最后一个维度

              我们得到:
@@ -1580,9 +1580,9 @@ affine_grid
                    [ 0.5  1.   1. ]
                    [ 1.   1.   1. ]]
          Step3:
-              按下列公式计算输出 
+              按下列公式计算输出
 .. math::
-  
+
  Output[i] = C\_ * Theta[i]^T

 参数：
@@ -1680,11 +1680,11 @@ batch_norm


 参数：
-    - **input** (Variable) - 输入变量，为LoDTensor
+    - **input** (Variable) - The input variable, whose rank can be 2, 3, 4 or 5
    - **act** （string，默认None）- 激活函数类型，linear|relu|prelu|...
-    - **is_test** （bool,默认False） - 标志位，是否用于测试或训练
-    - **momentum** （float，默认0.9）- （暂无说明，待更新）
-    - **epsilon** （float，默认1e-05）- （暂无说明，待更新）
+    - **is_test** (bool, default False) - Indicates whether the layer is in the test phase.
+    - **momentum** (float, default 0.9) - The value used to compute ``moving_mean`` and ``moving_var``. The update formulas are :math:`moving\_mean = moving\_mean * momentum + new\_mean * (1. - momentum)` and :math:`moving\_var = moving\_var * momentum + new\_var * (1. - momentum)`.
+    - **epsilon** (float, default 1e-05) - A value added to the denominator for numerical stability.
    - **param_attr** （ParamAttr|None） - batch_norm参数范围的属性，如果设为None或者是ParamAttr的一个属性，batch_norm创建ParamAttr为param_attr。如果没有设置param_attr的初始化函数，参数初始化为Xavier。默认：None
    - **bias_attr** （ParamAttr|None） - batch_norm bias参数的属性，如果设为None或者是ParamAttr的一个属性，batch_norm创建ParamAttr为bias_attr。如果没有设置bias_attr的初始化函数，参数初始化为0。默认：None
    - **data_layout** （string,默认NCHW) - NCHW|NHWC
@@ -1694,7 +1694,7 @@ batch_norm
    - **moving_variance_name** （string，默认None）- moving_variance的名称，存储全局变量
    - **do_model_average_for_mean_and_var** （bool，默认False）- 是否为mean和variance做模型均值
    - **fuse_with_relu** （bool）- 如果为True，batch norm后该操作符执行relu
-    - **use_global_stats** （bool, Default False） – 是否使用全局均值和方差。 在预测或测试模式下，将use_global_stats设置为true或将is_test设置为true，并且行为是等效的。 在训练模式中，当设置use_global_stats为True时，在训练期间也使用全局均值和方差。 
+    - **use_global_stats** （bool, Default False） – 是否使用全局均值和方差。 在预测或测试模式下，将use_global_stats设置为true或将is_test设置为true，并且行为是等效的。 在训练模式中，当设置use_global_stats为True时，在训练期间也使用全局均值和方差。
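+
+As a worked illustration of the ``momentum`` update above (the numbers are made up for this example): with ``momentum = 0.9``, a current ``moving_mean`` of 1.0 and a new mini-batch mean of 2.0, one update step gives
+
+.. math::
+
+    moving\_mean = 1.0 * 0.9 + 2.0 * (1. - 0.9) = 1.1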

 返回： 张量，在输入中运用批正则后的结果

@@ -1733,15 +1733,15 @@ beam_search
 注意，如果 ``is_accumulated`` 为 True，传入的 ``scores`` 应该是累积分数。反之，``scores`` 会被认为为直接得分(straightforward scores)， 并且会被转化为log值并且在此运算中会被累积到 ``pre_scores`` 中。在计算累积分数之前应该使用额外的 operators 进行长度惩罚。

 有关束搜索用法演示，请参阅以下示例：
-  
+
     fluid/tests/book/test_machine_translation.py
-  
+


 参数:
-  - **pre_ids** （Variable） -  LodTensor变量，它是上一步 ``beam_search`` 的输出。在第一步中。它应该是LodTensor，shape为 :math:`(batch\_size，1)` ， :math:`lod [[0,1，...，batch\_size]，[0,1，...，batch\_size]]` 
+  - **pre_ids** (Variable) - A LoDTensor variable that is the output of ``beam_search`` at the previous step. At the first step it should be a LoDTensor with shape :math:`(batch\_size, 1)` and lod :math:`[[0, 1, ..., batch\_size], [0, 1, ..., batch\_size]]`
  - **pre_scores** （Variable） -  LodTensor变量，它是上一步中beam_search的输出
-  - **ids** （Variable） - 包含候选ID的LodTensor变量。shpae为 :math:`（batch\_size×beam\_ize，K）` ，其中 ``K`` 应该是 ``beam_size``
+  - **ids** (Variable) - A LoDTensor variable containing the candidate ids. Its shape is :math:`(batch\_size \times beam\_size, K)`, where ``K`` should be ``beam_size``
  - **scores** （Variable） - 与 ``ids`` 及其shape对应的累积分数的LodTensor变量, 与 ``ids`` 的shape相同。
  - **beam_size** （int） - 束搜索中的束宽度。
  - **end_id** （int） - 结束标记的id。
@@ -1758,7 +1758,7 @@ beam_search
 **代码示例**

 ..  code-block:: python
-    
+
    # 假设 `probs` 包含计算神经元所得的预测结果
    # `pre_ids` 和 `pre_scores` 为beam_search之前时间步的输出
    topk_scores, topk_indices = fluid.layers.topk(probs, k=beam_size)
@@ -1804,7 +1804,7 @@ beam_search_decode
        - **beam_size** (int) - 束搜索中波束的宽度。
        - **end_id** (int) - 结束token的id。
        - **name** (str|None) - 该层的名称(可选)。如果设置为None，该层将被自动命名。
-    
+
 返回：	LodTensor 对（pair）， 由生成的id序列和相应的score序列组成。两个LodTensor的shape和lod是相同的。lod的level=2，这两个level分别表示每个源句有多少个假设，每个假设有多少个id。

 返回类型:	变量（variable）
@@ -1813,7 +1813,7 @@ beam_search_decode
 **代码示例**

 .. code-block:: python
-            
+
 	    # 假设 `ids` 和 `scores` 为 LodTensorArray变量，它们保留了
            # 选择出的所有时间步的id和score
            finished_ids, finished_scores = fluid.layers.beam_search_decode(
@@ -1881,12 +1881,12 @@ Bayesian Personalized Ranking Loss Operator. (贝叶斯个性化排序损失计
 该算子属于pairwise的排序类型，其标签是期望物品。在某次会话中某一给定点的损失值由下式计算而得:

 .. math::
-  
+
  Y[i] = -\frac{1}{N_{i}-1} * \sum_{0\le j<N_{i},~ j\neq Label[i]}\log(\sigma(X[i, Label[i]]-X[i, j]))

 更多细节请参考 `Session Based Recommendations with Recurrent Neural Networks <https://arxiv.org/abs/1511.06939>`_

-参数: 
+参数:
  - **input** (Variable|list):  一个形为[N x D]的2-D tensor , 其中 N 为批大小batch size ，D 为种类的数量。该输入为logits而非概率。
  - **label** (Variable|list):  2-D tensor<int64> 类型的真实值, 形为[N x 1]
  - **name** (str|None): （可选）该层的命名。 如果为None, 则自动为该层命名。 默认为None.
@@ -1915,7 +1915,7 @@ BRelu 激活函数

 .. math::   out=max(min(x,tmin),tmax)

-参数: 
+参数:
    - **x** (Variable) - BReluoperator的输入
    - **t_min** (FLOAT|0.0) - BRelu的最小值
    - **t_max** (FLOAT|24.0) - BRelu的最大值
@@ -2036,12 +2036,12 @@ clip
 -------------------------------

 .. py:function:: paddle.fluid.layers.clip(x, min, max, name=None)
-        
+
 clip算子

 clip算子限制给定输入的值在一个区间内。间隔使用参数"min"和"max"来指定：公式为

-.. math:: 
+.. math::
        Out=min(max(X,min),max)

 参数：
@@ -2052,7 +2052,7 @@ clip算子限制给定输入的值在一个区间内。间隔使用参数"min"

 返回：        （Tensor）clip操作后的输出和输入（X）具有形状（shape）

-返回类型：        输出（Variable）。        
+返回类型：        输出（Variable）。

 **代码示例：**

@@ -2073,13 +2073,13 @@ clip_by_norm
 -------------------------------

 .. py:function:: paddle.fluid.layers.clip_by_norm(x, max_norm, name=None)
-     
+
 ClipByNorm算子

 此算子将输入 ``X`` 的L2范数限制在 ``max_norm`` 内。如果 ``X`` 的L2范数小于或等于 ``max_norm``  ，则输出（Out）将与 ``X`` 相同。如果X的L2范数大于 ``max_norm`` ，则 ``X`` 将被线性缩放，使得输出（Out）的L2范数等于 ``max_norm`` ，如下面的公式所示：

-.. math:: 
-         Out = \frac{max\_norm * X}{norm(X)} 
+.. math::
+         Out = \frac{max\_norm * X}{norm(X)}

 其中， :math:`norm（X）` 代表 ``x`` 的L2范数。
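+
+As a worked illustration of the formula above (the numbers are made up for this example): for :math:`X = (3, 4)` the L2 norm is 5, so with ``max_norm = 1`` the input is scaled to
+
+.. math::
+
+    Out = \frac{1 * (3, 4)}{5} = (0.6, 0.8)
+
+whose L2 norm is exactly 1. With ``max_norm = 10`` the same input would be returned unchanged, because its norm is already below the limit.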

@@ -2091,7 +2091,7 @@ ClipByNorm算子

 返回：        (Tensor)clip_by_norm操作后的输出和输入(X)具有形状(shape).

-返回类型：       Variable        
+返回类型：       Variable

 **代码示例：**

@@ -2204,7 +2204,7 @@ conv2d_transpose

 输入 :math:`X` 和输出 :math:`Out` 函数关系如下：

-.. math::			   
+.. math::
                        Out=\sigma (W*X+b)\\

 其中：
@@ -2217,34 +2217,34 @@ conv2d_transpose
    -  :math:`b` : 偏置（bias），二维张量，shape为 ``[M,1]``

    -  :math:`σ` : 激活函数
- 
+
    -  :math:`Out` : 输出值，Out和 ``X`` 的 ``shape`` 可能不一样

 **样例**：

 输入：

-.. math:: 
+.. math::

    输入张量的shape :  （N，C_{in}， H_{in}， W_{in})

-    滤波器（filter）shape ： （C_{in}, C_{out}, H_f, W_f)  
+    滤波器（filter）shape ： （C_{in}, C_{out}, H_f, W_f)

 输出：
-        
-.. math:: 
+
+.. math::
    输出张量的 shape ： （N，C_{out}, H_{out}, W_{out})

 其中

-.. math:: 
+.. math::

        & H'_{out} = (H_{in}-1)*strides[0]-2*paddings[0]+dilations[0]*(H_f-1)+1\\
        & W'_{out} = (W_{in}-1)*strides[1]-2*paddings[1]+dilations[1]*(W_f-1)+1 \\
        & H_{out}\in[H'_{out},H'_{out} + strides[0])\\
        & W_{out}\in[W'_{out},W'_{out} + strides[1])\\

-            
+
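+As a worked illustration of the output-size formula above (the values are assumptions chosen for this example): with :math:`H_{in} = 32`, :math:`H_f = 3`, a stride of 1, a padding of 0 and a dilation of 1,
+
+.. math::
+
+    H'_{out} = (32 - 1) * 1 - 2 * 0 + 1 * (3 - 1) + 1 = 34
+
+and since :math:`H_{out} \in [34, 35)`, the output height is 34. The width follows the same computation with index 1.
+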

 参数:
 	- **input** （Variable）- 输入张量，格式为[N, C, H, W]
@@ -2272,11 +2272,11 @@ conv2d_transpose
 **代码示例**

 ..  code-block:: python
-  
+
    data = fluid.layers.data(name='data', shape=[3, 32, 32], dtype='float32')
    conv2d_transpose = fluid.layers.conv2d_transpose(input=data, num_filters=2, filter_size=3)
-    
-  
+
+



@@ -2313,11 +2313,11 @@ conv3d
 **示例**

 - 输入：
-    输入shape： :math:`(N, C_{in}, D_{in}, H_{in}, W_{in})` 
+    输入shape： :math:`(N, C_{in}, D_{in}, H_{in}, W_{in})`

-    滤波器shape： :math:`(C_{out}, C_{in}, D_f, H_f, W_f)` 
+    滤波器shape： :math:`(C_{out}, C_{in}, D_f, H_f, W_f)`
 - 输出：
-    输出shape： :math:`(N, C_{out}, D_{out}, H_{out}, W_{out})` 
+    输出shape： :math:`(N, C_{out}, D_{out}, H_{out}, W_{out})`

 其中

@@ -2392,7 +2392,7 @@ conv3d_transpose
    -  :math:`b` : 偏置（bias），二维张量，shape为 ``[M,1]``

    -  :math:`σ` : 激活函数
- 
+
    -  :math:`Out` : 输出值， ``Out`` 和 ``X`` 的 shape可能不一样


@@ -2400,33 +2400,33 @@ conv3d_transpose

 输入:

-.. math::   
-	
+.. math::
+
 		Input shape: (N,C_{in},D_{in},H_{in},W_{in})

 		Filter shape: (C_{in},C_{out},D_f,H_f,W_f)

-	
+

 输出:

-.. math::   
-	
+.. math::
+
 		Output shape: (N,C_{out},D_{out},H_{out},W_{out})

-	
+
 其中：

-.. math::   
-		
+.. math::
+


 		D_{out}=(D_{in}-1)*strides[0]-2*paddings[0]+dilations[0]*(D_f-1)+1
-	
+
 		H_{out}=(H_{in}-1)*strides[1]-2*paddings[1]+dilations[1]*(H_f-1)+1
-	
+
 		W_{out}=(W_{in}-1)*strides[2]-2*paddings[2]+dilations[2]*(W_f-1)+1
-		
+


 参数:
@@ -2456,7 +2456,7 @@ conv3d_transpose
 **代码示例**

 ..  code-block:: python
-  
+
    data = fluid.layers.data(name='data', shape=[3, 12, 32, 32], dtype='float32')
    conv3d_transpose = fluid.layers.conv3d_transpose(input=data, num_filters=2, filter_size=3)

@@ -2471,7 +2471,7 @@ conv3d_transpose

 .. _cn_api_fluid_layers_cos_sim:

-cos_sim 
+cos_sim
 -------------------------------

 .. py:function:: paddle.fluid.layers.cos_sim(X, Y)
@@ -2513,14 +2513,14 @@ crf_decoding
 本函数实现了Viterbi算法，可以动态地寻找隐藏状态最可能的序列，该序列也被称为Viterbi路径（Viterbi path），从而得出的标注(tags)序列。

 这个运算的结果会随着 ``Label`` 参数的有无而改变：
-      
+
      1. ``Label`` 非None的情况，在实际训练中时常发生。此时本函数会协同 ``chunk_eval`` 工作。本函数会返回一行形为[N X 1]的向量，其中值为0的部分代表该label不适合作为对应结点的标注，值为1的部分则反之。此类型的输出可以直接作为 ``chunk_eval`` 算子的输入
-      
+
      2. 当没有 ``Label`` 时，该函数会执行标准decoding过程

 （没有 ``Label`` 时）该运算返回一个形为 [N X 1]的向量，其中元素取值范围为 0 ~ 最大标注个数-1，分别为预测出的标注（tag）所在的索引。
-	
-参数：	
+
+参数：
    - **input** (Variable)(LoDTensor，默认类型为 LoDTensor<float>) — 一个形为 [N x D] 的LoDTensor，其中 N 是mini-batch的大小，D是标注（tag) 的总数。 该输入是 ``linear_chain_crf`` 的 unscaled emission weight matrix （未标准化的发射权重矩阵）
    - **param_attr** (ParamAttr) — 参与训练的参数的属性
    - **label** (Variable)(LoDTensor，默认类型为 LoDTensor<int64_t>) —  形为[N x 1]的正确标注（ground truth）。 该项可选择传入。 有关该参数的更多信息，请详见上述描述
@@ -2587,7 +2587,7 @@ crop
            Out = [[1, 2, 5],
                   [3, 4, 6]].

- 
+
 参数:
  - **x** (Variable): 输入张量。
  - **shape** (Variable|list/tuple of integer) - 输出张量的形状由参数shape指定，它可以是一个变量/整数的列表/整数元组。如果是张量变量，它的秩必须与x相同。该方式适可用于每次迭代时候需要改变输出形状的情况。如果是整数列表/tupe，则其长度必须与x的秩相同
@@ -2634,28 +2634,28 @@ cross_entropy
 以及soft-label cross-entropy computation（软标签交叉熵损失计算）

  1. One-hot cross-entropy算法
-     
-     soft_label = False, Label[i, 0] 指明样本i的类别所具的索引:        
+
+     soft_label = False, Label[i, 0] 指明样本i的类别所具的索引:
                            .. math::
                                     \\Y[i]=-log(X[i,Label[i]])\\
-  
+
  2. Soft-label cross-entropy算法
-     
-     soft_label = True, Label[i, j] 表明样本i对应类别j的soft label(软标签):        
+
+     soft_label = True, Label[i, j] 表明样本i对应类别j的soft label(软标签):
                            .. math::
                                     \\Y[i]= \sum_{j}-Label[i,j]*log(X[i,j])\\
-                                     
+
     **请确保采用此算法时识别为各软标签的概率总和为1**
-  
+
  3. One-hot cross-entropy with vecterized label（使用向量化标签的One-hot）算法
-        
+
     作为 *2* 的特殊情况，当软类标签内部只有一个非零概率元素，且它的值为1，那么 *2* 算法降级为一种仅有one-hot标签的one-hot交叉熵
-  
-  



-参数：  
+
+
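+As a worked illustration of the one-hot case above (the numbers are made up, and :math:`log` is assumed here to be the natural logarithm): if sample :math:`i` has predicted probabilities :math:`X[i] = [0.1, 0.7, 0.2]` and :math:`Label[i, 0] = 1`, then
+
+.. math::
+
+    Y[i] = -\log(0.7) \approx 0.357
+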
+参数：
    - **input** (Variable|list) – 一个形为[N x D]的二维tensor，其中N是batch大小，D是类别（class）数目。 这是由之前的operator计算出的概率，绝大多数情况下是由softmax operator得出的结果
    - **label** (Variable|list) – 一个二维tensor组成的正确标记的数据集(ground truth)。 当 ``soft_label`` 为False时，label为形为[N x 1]的tensor<int64>。 ``soft_label`` 为True时, label是形为 [N x D]的 tensor<float/double>
    - **soft_label** (bool) – 标志位，指明是否需要把给定的标签列表认定为软标签。默认为False。
@@ -2663,12 +2663,12 @@ cross_entropy

 返回： 一个形为[N x 1]的二维tensor，承载了交叉熵损失

-弹出异常： ``ValueError`` 
+弹出异常： ``ValueError``

                        1. 当 ``input`` 的第一维和 ``label`` 的第一维不相等时，弹出异常
                        2. 当 ``soft_label`` 值为True， 且 ``input`` 的第二维和 ``label`` 的第二维不相等时，弹出异常
                        3. 当 ``soft_label`` 值为False，且 ``label`` 的第二维不是1时，弹出异常
-                        
+


 **代码示例**
@@ -2740,16 +2740,16 @@ ctc_greedy_decoder
        - **input** (Variable) — (LoDTensor<float>)，变长序列的概率，它是一个具有LoD信息的二维张量。它的形状是[Lp, num_classes + 1]，其中Lp是所有输入序列长度的和，num_classes是真正的类别。(不包括空白标签)。
        - **blank** (int) — Connectionist Temporal Classification (CTC) loss空白标签索引,  属于半开区间[0,num_classes + 1）。
        - **name** (str) — 此层的名称。可选。
-   
+
 返回： CTC贪婪解码结果是一个形为(Lp,1)的二维张量，其中Lp是所有输出序列的长度之和。如果结果中的所有序列都为空，则输出LoDTensor 为[-1]，其中LoD[[]] 形为[1,1]。

 返回类型： 变量（Variable）
-    
+

 **代码示例**

 ..  code-block:: python
-        
+
    x = fluid.layers.data(name='x', shape=[8], dtype='float32')

    cost = fluid.layers.ctc_greedy_decoder(input=x, blank=0)
@@ -2763,10 +2763,10 @@ ctc_greedy_decoder
 data_norm
 -------------------------------

-.. py:function:: paddle.fluid.layers.data_norm(input, act=None, epsilon=1e-05, param_attr=None, data_layout='NCHW', in_place=False, use_mkldnn=False, name=None, moving_mean_name=None, moving_variance_name=None, do_model_average_for_mean_and_var=False)
+.. py:function:: paddle.fluid.layers.data_norm(input, act=None, epsilon=1e-05, param_attr=None, data_layout='NCHW', in_place=False, name=None, moving_mean_name=None, moving_variance_name=None, do_model_average_for_mean_and_var=False)

 **数据正则化层**
-    
+
 可用作conv2d和fully_connected操作的正则化函数。 此层所需的数据格式为以下之一：

 1. NHWC [batch, in_height, in_width, in_channels]
@@ -2774,28 +2774,27 @@ data_norm

 :math:`input` 为一个mini-batch上的特征:

-.. math::       
+.. math::
        \mu_{\beta} &\gets \frac{1}{m} \sum_{i=1}^{m} x_i \qquad &//\
        \ mini-batch\ mean \\
        \sigma_{\beta}^{2} &\gets \frac{1}{m} \sum_{i=1}^{m}(x_i - \
        \mu_{\beta})^2 \qquad &//\ mini-batch\ variance \\
        \hat{x_i} &\gets \frac{x_i - \mu_\beta} {\sqrt{\
        \sigma_{\beta}^{2} + \epsilon}} \qquad &//\ normalize \\
-        y_i &\gets \gamma \hat{x_i} + \beta \qquad &//\ scale\ and\ shift  
+        y_i &\gets \gamma \hat{x_i} + \beta \qquad &//\ scale\ and\ shift

 参数:
  - **input** （variable） - 输入变量，它是一个LoDTensor。
  - **act** （string，默认None） - 激活函数类型，线性| relu | prelu | ...
-  - **epsilon** （float，默认1e-05） - 
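+  - **epsilon** (float, default 1e-05) - A small value added to the variance in the normalization step above to avoid division by zero.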
  - **param_attr** （ParamAttr） - 参数比例的参数属性。
  - **data_layout** （string，默认NCHW） -  NCHW | NHWC
  - **in_place** （bool，默认值False） - 使data_norm的输入和输出复用同一块内存。
-  - **use_mkldnn** （bool，默认为false） -  是否使用mkldnn
  - **name** （string，默认None） - 此层的名称（可选）。 如果设置为None，则将自动命名该层。
  - **moving_mean_name** （string，Default None） - 存储全局Mean的moving_mean的名称。
  - **moving_variance_name** （string，默认None） - 存储全局Variance的moving_variance的名称。
  - **do_model_average_for_mean_and_var** （bool，默认值为false） - 是否为mean和variance进行模型平均。
-    
+
 返回: 张量变量，是对输入数据进行正则化后的结果。

 返回类型: Variable
@@ -2803,7 +2802,7 @@ data_norm
 **代码示例**

 ..  code-block:: python
-        
+
    hidden1 = fluid.layers.fc(input=x, size=200, param_attr='fc1.w')
    hidden2 = fluid.layers.data_norm(input=hidden1)

@@ -2823,19 +2822,19 @@ dice_loss
 .. py:function:: paddle.fluid.layers.dice_loss(input, label, epsilon=1e-05)

 dice_loss是比较两批数据相似度，通常用于二值图像分割，即标签为二值。
-    
+
 dice_loss定义为:

-.. math::       
+.. math::
        dice\_loss &= 1 - \frac{2 * intersection\_area}{total\_area}\\
                   &= \frac{(total\_area - intersection\_area) - intersection\_area}{total\_area}\\
-                   &= \frac{union\_area−intersection\_area}{total\_area}           
+                   &= \frac{union\_area - intersection\_area}{total\_area}

 参数:
    - **input** (Variable) - rank>=2的预测。第一个维度是batch大小，最后一个维度是类编号。
    - **label** (Variable) - The ground-truth labels, a tensor with the same rank as the input. The first dimension is the batch size and the last dimension is 1.
    - **epsilon** (float) - 将会加到分子和分母上。如果输入和标签都为空，则确保dice为1。默认值:0.00001
-    
+
 返回: dice_loss shape为[1]。

 返回类型:  dice_loss(Variable)
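
The dice_loss definition above can be checked by hand on small inputs. The sketch below is plain Python over flat lists, not the Paddle operator. It assumes ``epsilon`` is added to both the numerator and the denominator, as the parameter description states; the real operator may place it differently.

```python
def dice_loss(prediction, label, epsilon=1e-5):
    """Dice loss for two flat lists of equal length (illustrative only)."""
    # intersection_area: overlap between prediction and label
    intersection = sum(p * t for p, t in zip(prediction, label))
    # total_area: sum of both areas
    total = sum(prediction) + sum(label)
    # Assumption: epsilon is added to numerator and denominator,
    # so two empty inputs give a dice score of 1 (loss 0).
    return 1.0 - (2.0 * intersection + epsilon) / (total + epsilon)


perfect = dice_loss([1, 0, 1, 0], [1, 0, 1, 0])   # identical masks
disjoint = dice_loss([1, 0], [0, 1])              # no overlap
empty = dice_loss([0, 0], [0, 0])                 # both empty
```

Identical masks give a loss of 0, disjoint masks give a loss close to 1, and two empty masks give a loss of 0 because of the epsilon term.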
@@ -2843,7 +2842,7 @@ dice_loss定义为:
 **代码示例**

 ..  code-block:: python
-        
+
    predictions = fluid.layers.softmax(x)
    loss = fluid.layers.dice_loss(input=predictions, label=label)

@@ -2876,23 +2875,23 @@ dropout op可以从Program中删除，提高执行效率。
    - **is_test** (bool) - A flag indicating whether the layer runs in the test (inference) phase.
    - **seed** (int)-Python整型，用于创建随机种子。如果该参数设为None，则使用随机种子。注：如果给定一个整型种子，始终丢弃相同的输出单元。训练过程中勿用固定不变的种子。
    - **name** (str|None)-该层名称（可选）。如果设置为None,则自动为该层命名
-    - **dropout_implementation** (string) -   
+    - **dropout_implementation** (string) -

      [‘downgrade_in_infer’(default)|’upscale_in_train’] 其中:

-      1. downgrade_in_infer(default), 在预测时减小输出结果 
-
-         - train: out = input * mask 
+      1. downgrade_in_infer(default), 在预测时减小输出结果

-         - inference: out = input * dropout_prob 
+         - train: out = input * mask
+         
+         - inference: out = input * (1.0 - dropout_prob) 

         (mask是一个张量，维度和输入维度相同，值为0或1，值为0的比例即为 ``dropout_prob`` )
-        
+
      2. upscale_in_train, 增加训练时的结果

         - train: out = input * mask / ( 1.0 - dropout_prob )

-         - inference: out = input 
+         - inference: out = input

         (mask是一个张量，维度和输入维度相同，值为0或1，值为0的比例即为 ``dropout_prob`` ）
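
The two ``dropout_implementation`` modes differ only in where the rescaling happens. The sketch below applies the four formulas above to a flat list in plain Python. The helper name ``dropout_forward`` and the fixed mask are hypothetical and for illustration only; this is not the Paddle operator.

```python
def dropout_forward(values, mask, dropout_prob, implementation, is_test):
    """Apply the dropout formulas from the text to a flat list of floats."""
    if implementation == "downgrade_in_infer":
        if is_test:
            # inference: out = input * (1.0 - dropout_prob)
            return [v * (1.0 - dropout_prob) for v in values]
        # train: out = input * mask
        return [v * m for v, m in zip(values, mask)]
    if implementation == "upscale_in_train":
        if is_test:
            # inference: out = input
            return list(values)
        # train: out = input * mask / (1.0 - dropout_prob)
        return [v * m / (1.0 - dropout_prob) for v, m in zip(values, mask)]
    raise ValueError("unknown implementation: %s" % implementation)


values = [1.0, 2.0, 3.0, 4.0]
mask = [1, 0, 1, 1]      # hypothetical mask: one of four units is dropped
dropout_prob = 0.25

train_down = dropout_forward(values, mask, dropout_prob, "downgrade_in_infer", False)
infer_down = dropout_forward(values, mask, dropout_prob, "downgrade_in_infer", True)
train_up = dropout_forward(values, mask, dropout_prob, "upscale_in_train", False)
infer_up = dropout_forward(values, mask, dropout_prob, "upscale_in_train", True)
```

With ``upscale_in_train`` the inference output equals the input, so the dropout op can be removed from the inference program without changing the result.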

@@ -2933,7 +2932,7 @@ dynamic_gru

 公式如下：

-.. math:: 
+.. math::
  u_{t}=act_g(W_{ux}x_{t}+W_{uh}h_{t-1}+b_{u})
 .. math::
  r_{t}=act_g(W_{rx}x_{t}+W_{rh}h_{t-1}+b_{r})
@@ -2969,7 +2968,7 @@ dynamic_gru
 参数:
  - **input** (Variable) – dynamic_gru层的输入, 支持variable time length input sequence（可变时长输入序列）。 本变量底层的tensor是一个(T×3D)矩阵， 其中T是该mini-batch中总时间步数， D是隐藏状态的规模（hidden size）。
  - **size** (int) – GRU cell的维度
-  - **param_attr** (ParamAttr|None)  –  可学习的隐藏层权重矩阵的参数属性。 
+  - **param_attr** (ParamAttr|None)  –  可学习的隐藏层权重矩阵的参数属性。
    注意：
                                    - The matrix has shape (D x 3D), where D is the hidden size.
                                    - 该矩阵的所有元素由两部分组成。一是update gate和reset gate的权重，形为（D X 2D)，二是候选隐藏状态（candidate hidden state）的权重，形为 (D X D)
@@ -2979,8 +2978,8 @@ dynamic_gru
  - **gate_activation** (str) – update gate 和 reset gate的激励函数（activation）。 可选择[“sigmoid”, “tanh”, “relu”, “identity”]其一, 默认为 “sigmoid”
  - **candidate_activation** (str) – candidate hidden state（候选隐藏状态）计算所需的激励函数（activation）。 可从[“sigmoid”, “tanh”, “relu”, “identity”]中选择, 默认为 “tanh”
  - **h_0** (Variable) – 该函数参数为初始隐藏状态。若未赋值，则默认为0。它是一个 (N x D) tensor, 其中 N 为输入mini-batch的总时间步数， D 为 隐藏状态规模(hidden size)
-  
-  
+
+
 返回：	GRU的隐藏状态(hidden state)。形为（T X D），序列长度和输入相同。

 返回类型:	变量（variable）
@@ -3076,7 +3075,7 @@ W 代表了权重矩阵(weight matrix)，例如 :math:`W_{xi}` 是从输入门
  - **is_reverse** (bool) – （默认: False） 是否计算反LSTM(reversed LSTM)
  - **gate_activation** (str) – （默认: "sigmoid"）应用于input gate（输入门），forget gate（遗忘门）和 output gate（输出门）的激励函数（activation），默认为sigmoid
  - **cell_activation** (str) – （默认: tanh）用于神经元输出的激励函数(activation), 默认为tanh
-  - **candidate_activation** (str) – （默认: tanh）candidate hidden state（候选隐藏状态）的激励函数(activation), 默认为tanh 
+  - **candidate_activation** (str) – （默认: tanh）candidate hidden state（候选隐藏状态）的激励函数(activation), 默认为tanh
  - **dtype** (str) – 即 Data type（数据类型）。 可以选择 [“float32”, “float64”]，默认为“float32”
  - **name** (str|None) – 该层的命名，可选项。如果值为None, 将会自动对该层命名

@@ -3111,7 +3110,7 @@ W 代表了权重矩阵(weight matrix)，例如 :math:`W_{xi}` 是从输入门

 dynamic_lstmp
 -------------------------------
-.. py:function:: paddle.fluid.layers.dynamic_lstmp(input, size, proj_size, param_attr=None, bias_attr=None, use_peepholes=True, is_reverse=False, gate_activation='sigmoid', cell_activation='tanh', candidate_activation='tanh', proj_activation='tanh', dtype='float32', name=None)
+.. py:function:: paddle.fluid.layers.dynamic_lstmp(input, size, proj_size, param_attr=None, bias_attr=None, use_peepholes=True, is_reverse=False, gate_activation='sigmoid', cell_activation='tanh', candidate_activation='tanh', proj_activation='tanh', dtype='float32', name=None, h_0=None, c_0=None, cell_clip=None, proj_clip=None)

 动态LSTMP层(Dynamic LSTMP Layer)

@@ -3141,7 +3140,7 @@ LSTMP层(具有循环映射的LSTM)在LSTM层后有一个分离的映射层，
    - :math:`\tilde{c_t}` : 候选隐藏状态
    - :math:`\odot` : 向量的元素状态生成
    - :math:`act_g` 和 :math:`act_h` : cell输入和cell输出激活函数，通常使用 :math:`tanh`
-    - :math:`\overline{act_h}` : 映射输出的激活函数，通常用 :math:`identity` 或等同的 :math:`act_h` 
+    - :math:`\overline{act_h}` : 映射输出的激活函数，通常用 :math:`identity` 或等同的 :math:`act_h`

 将 ``use_peepholes`` 设置为False，断开窥视孔连接（peephole connection）。在此省略公式，详情请参照论文 `LONG SHORT-TERM MEMORY <http://www.bioinf.jku.at/publications/older/2604.pdf>`_ 。

@@ -3169,9 +3168,9 @@ LSTMP层(具有循环映射的LSTM)在LSTM层后有一个分离的映射层，
        2.use_peepholes = True
            - Biases = { :math:`b_{c},b_{i},b_{f},b_{o},W_{ic},W_{fc},W_{oc}`}
            - 维度为（1*7D）
-        
+
        如果设置为None或者ParamAttr的一个属性，dynamic_lstm将创建ParamAttr为bias_attr。bias_attr的初始函数未设置，bias则初始化为0.默认：None。
-        
+
    - **use_peepholes** (bool) - Whether to enable diagonal/peephole connections. Default: True.
    - **is_reverse** (bool) - Whether to compute the reversed LSTM. Default: False.
    - **gate_activation** (str) - Activation for the input gate, forget gate, and output gate. Choices = ["sigmoid", "tanh", "relu", "identity"]. Default: "sigmoid".
@@ -3179,7 +3178,11 @@ LSTMP层(具有循环映射的LSTM)在LSTM层后有一个分离的映射层，
    - **candidate_activation** (str) - 候选隐藏状态（candidate hidden state）的激活状态。Choices = [“sigmoid”，“tanh”，“relu”，“identity”]，默认“tanh”。
    - **proj_activation** (str) - 投影输出的激活函数。Choices = [“sigmoid”，“tanh”，“relu”，“identity”]，默认“tanh”。
    - **dtype** (str) - 数据类型。Choices = [“float32”，“float64”]，默认“float32”。
-    - **name** (str|None) - 该层名称（可选）。若设为None，则自动为该层命名。 
+    - **name** (str|None) - 该层名称（可选）。若设为None，则自动为该层命名。
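+    - **h_0** (Variable) - Optional initial hidden state; defaults to zero. A tensor of shape (N x D), where N is the batch size and D is the projection size.
+    - **c_0** (Variable) - Optional initial cell state; defaults to zero. A tensor of shape (N x D), where N is the batch size. ``h_0`` and ``c_0`` may be omitted, but only both together.
+    - **cell_clip** (float) - If provided, the cell state is clipped by this value before the cell output activation.
+    - **proj_clip** (float) - If ``num_proj > 0`` and ``proj_clip`` is provided, the projected values are clipped element-wise to [-proj_clip, proj_clip].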

 返回：含有两个输出变量的元组，隐藏状态（hidden state）的投影和LSTMP的cell状态。投影的shape为（T*P），cell state的shape为（T*D），两者的LoD和输入相同。

@@ -3280,12 +3283,12 @@ elementwise_add
 对于这个运算算子有2种情况：
        1. :math:`Y` 的形状（shape）与 :math:`X` 相同。
        2. :math:`Y` 的形状（shape）是 :math:`X` 的连续子序列。
-        
+
 对于情况2:
        1. 用 :math:`Y` 匹配 :math:`X` 的形状（shape），则 ``axis`` 为 :math:`Y` 传到 :math:`X` 上的起始维度索引。
        2. 如果 ``axis`` 为-1（默认值），则 :math:`axis= rank(X)-rank(Y)` 。
        3. 考虑到子序列， :math:`Y` 的大小为1的尾部尺寸将被忽略，例如shape（Y）=（2,1）=>（2）。
-        
+
 例如：

 ..  code-block:: python
@@ -3352,7 +3355,7 @@ elementwise_div
        shape(X) = (2, 3, 4, 5), shape(Y) = (3, 4), with axis=1
        shape(X) = (2, 3, 4, 5), shape(Y) = (2), with axis=0
        shape(X) = (2, 3, 4, 5), shape(Y) = (2, 1), with axis=0
-       
+
 输入 :math:`X` 和 :math:`Y` 可以携带不同的LoD信息。但输出仅与输入 :math:`X` 共享LoD信息。

 参数：
@@ -3363,8 +3366,8 @@ elementwise_div
        - **name** （basestring | None）- 输出的名称。

 返回：        元素运算的输出。
-        
-        
+
+



@@ -3385,7 +3388,7 @@ elementwise_max

 .. math::
        Out = max(X, Y)
-        
+
 - :math:`X` ：任何尺寸的张量（Tensor）。
 - :math:`Y` ：尺寸必须小于或等于X尺寸的张量（Tensor）。

@@ -3408,7 +3411,7 @@ elementwise_max
        shape(X) = (2, 3, 4, 5), shape(Y) = (3, 4), with axis=1
        shape(X) = (2, 3, 4, 5), shape(Y) = (2), with axis=0
        shape(X) = (2, 3, 4, 5), shape(Y) = (2, 1), with axis=0
-        
+
 输入X和Y可以携带不同的LoD信息。但输出仅与输入X共享LoD信息。

 参数：
@@ -3418,8 +3421,8 @@ elementwise_max
        - **act** （basestring | None）- 激活应用于输出。
        - **name** （basestring | None）- 输出的名称。

-返回：        元素运算的输出。        
-        
+返回：        元素运算的输出。
+



@@ -3442,7 +3445,7 @@ elementwise_min

 .. math::
        Out = min(X, Y)
-        
+
 - :math:`X` ：任何维数的张量（Tensor）。
 - :math:`Y` ：维数必须小于或等于X维数的张量（Tensor）。

@@ -3465,7 +3468,7 @@ elementwise_min
        shape(X) = (2, 3, 4, 5), shape(Y) = (3, 4), with axis=1
        shape(X) = (2, 3, 4, 5), shape(Y) = (2), with axis=0
        shape(X) = (2, 3, 4, 5), shape(Y) = (2, 1), with axis=0
-        
+
 输入X和Y可以携带不同的LoD信息。但输出仅与输入X共享LoD信息。

 参数：
@@ -3475,9 +3478,9 @@ elementwise_min
        - **act** （basestring | None）- 激活应用于输出。
        - **name** （basestring | None）- 输出的名称。

-返回：        元素运算的输出。   
- 
- 
+返回：        元素运算的输出。
+
+



@@ -3499,7 +3502,7 @@ elementwise_mul

 .. math::
        Out = X \odot Y
-        
+
 - **X** ：任何尺寸的张量（Tensor）。
 - **Y** ：尺寸必须小于或等于X尺寸的张量（Tensor）。

@@ -3511,7 +3514,7 @@ elementwise_mul
        1. 用 :math:`Y` 匹配 :math:`X` 的形状（shape），其中 ``axis`` 将是 :math:`Y` 传到 :math:`X` 上的起始维度索引。
        2. 如果 ``axis`` 为-1（默认值），则 :math:`axis = rank（X）-rank（Y）` 。
        3. 考虑到子序列， :math:`Y` 的大小为1的尾随尺寸将被忽略，例如shape（Y）=（2,1）=>（2）。
-        
+
 例如：

 ..  code-block:: python
@@ -3522,7 +3525,7 @@ elementwise_mul
        shape(X) = (2, 3, 4, 5), shape(Y) = (3, 4), with axis=1
        shape(X) = (2, 3, 4, 5), shape(Y) = (2), with axis=0
        shape(X) = (2, 3, 4, 5), shape(Y) = (2, 1), with axis=0
-        
+
 输入X和Y可以携带不同的LoD信息。但输出仅与输入X共享LoD信息。
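
The ``axis`` rule shared by the elementwise operators can be sketched as a small shape calculation. The helper below is hypothetical and plain Python, not part of Paddle. It returns the shape that Y is effectively viewed as when broadcast onto X, and assumes that trailing dimensions of size 1 are trimmed before ``axis=-1`` is resolved.

```python
def broadcast_shape_for_y(x_shape, y_shape, axis=-1):
    """Return the shape Y is viewed as when it is broadcast onto X."""
    y_dims = list(y_shape)
    # Trailing dimensions of size 1 in Y are ignored, e.g. (2, 1) => (2).
    while len(y_dims) > 1 and y_dims[-1] == 1:
        y_dims.pop()
    # axis == -1 means axis = rank(X) - rank(Y).
    if axis == -1:
        axis = len(x_shape) - len(y_dims)
    # Y must match a contiguous run of X's dimensions starting at `axis`.
    if list(x_shape[axis:axis + len(y_dims)]) != y_dims:
        raise ValueError("Y does not match X starting at axis %d" % axis)
    trailing = len(x_shape) - axis - len(y_dims)
    return tuple([1] * axis + y_dims + [1] * trailing)


x_shape = (2, 3, 4, 5)
case_a = broadcast_shape_for_y(x_shape, (3, 4), axis=1)
case_b = broadcast_shape_for_y(x_shape, (2,), axis=0)
case_c = broadcast_shape_for_y(x_shape, (2, 1), axis=0)
case_d = broadcast_shape_for_y(x_shape, (5,))
```

For the shapes listed in the examples above, ``(3, 4)`` with ``axis=1`` is viewed as ``(1, 3, 4, 1)``, and both ``(2)`` and ``(2, 1)`` with ``axis=0`` are viewed as ``(2, 1, 1, 1)``.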

 参数：
@@ -3532,8 +3535,8 @@ elementwise_mul
        - **act** （basestring | None）- 激活应用于输出。
        - **name** （basestring | None）- 输出的名称。

-返回：        元素运算的输出。        
-        
+返回：        元素运算的输出。
+



@@ -3555,7 +3558,7 @@ elementwise_pow

 .. math::
        Out = X ^ Y
-       
+
 - :math:`X` ：任何维数的张量（Tensor）。
 - :math:`Y` ：维数必须小于或等于X维数的张量（Tensor）。

@@ -3578,7 +3581,7 @@ elementwise_pow
        shape(X) = (2, 3, 4, 5), shape(Y) = (3, 4), with axis=1
        shape(X) = (2, 3, 4, 5), shape(Y) = (2), with axis=0
        shape(X) = (2, 3, 4, 5), shape(Y) = (2, 1), with axis=0
-        
+
 输入X和Y可以携带不同的LoD信息。但输出仅与输入X共享LoD信息。

 参数：
@@ -3588,8 +3591,8 @@ elementwise_pow
        - **act** （basestring | None）- 激活应用于输出。
        - **name** （basestring | None）- 输出的名称。

-返回：        元素运算的输出。   
-        
+返回：        元素运算的输出。
+



@@ -3612,7 +3615,7 @@ elementwise_sub

 .. math::
       Out = X - Y
-        
+
 - **X** ：任何尺寸的张量（Tensor）。
 - **Y** : a tensor whose dimensions must be smaller than or equal to those of **X**.

@@ -3624,7 +3627,7 @@ elementwise_sub
        1. 用 :math:`Y` 匹配 :math:`X` 的形状（shape），其中 ``axis`` 将是 :math:`Y` 传到 :math:`X` 上的起始维度索引。
        2. 如果 ``axis`` 为-1（默认值），则 :math:`axis = rank（X）-rank（Y）` 。
        3. 考虑到子序列， :math:`Y` 的大小为1的尾随尺寸将被忽略，例如shape（Y）=（2,1）=>（2）。
-        
+
 例如：

 ..  code-block:: python
@@ -3635,7 +3638,7 @@ elementwise_sub
        shape(X) = (2, 3, 4, 5), shape(Y) = (3, 4), with axis=1
        shape(X) = (2, 3, 4, 5), shape(Y) = (2), with axis=0
        shape(X) = (2, 3, 4, 5), shape(Y) = (2, 1), with axis=0
-        
+
 输入X和Y可以携带不同的LoD信息。但输出仅与输入X共享LoD信息。

 参数：
@@ -3646,7 +3649,7 @@ elementwise_sub
        - **name** （basestring | None）- 输出的名称。

 返回：        元素运算的输出。
-        
+



@@ -3665,8 +3668,8 @@ elu
 ELU激活层（ELU Activation Operator）

 根据 https://arxiv.org/abs/1511.07289 对输入张量中每个元素应用以下计算。
-    
-.. math::      
+
+.. math::
        out = \max(0, x) + \min(0, \alpha * (e^{x} - 1))
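
As a quick check of the ELU formula, here is a scalar version in plain Python. It is a sketch for illustration, not the Paddle operator, and the ``alpha`` value of 1.0 is chosen only for the example.

```python
import math


def elu(x, alpha):
    """Scalar ELU: max(0, x) + min(0, alpha * (exp(x) - 1))."""
    return max(0.0, x) + min(0.0, alpha * (math.exp(x) - 1.0))


positive = elu(2.0, alpha=1.0)    # positive inputs pass through unchanged
zero = elu(0.0, alpha=1.0)
negative = elu(-1.0, alpha=1.0)   # negative inputs saturate toward -alpha
```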

 参数:
@@ -3758,11 +3761,11 @@ expand运算会按给定的次数对输入各维度进行复制（tile）运算
                    [[1, 1], [2, 2], [3, 3], [1, 1], [2, 2], [3, 3]],
                    [[4, 4], [5, 5], [6, 6], [4, 4], [5, 5], [6, 6]]
                ]
- 
+
 参数:
        - **x** (Variable)- 一个秩在[1, 6]范围中的张量（Tensor）.
        - **expand_times** (list|tuple) - 每一个维度要扩展的次数.
-        
+
 返回：     expand变量是LoDTensor。expand运算后，输出（Out）的每个维度的大小等于输入（X）的相应维度的大小乘以 ``expand_times`` 给出的相应值。

 返回类型：   变量（Variable）
@@ -3773,8 +3776,8 @@ expand运算会按给定的次数对输入各维度进行复制（tile）运算

        x = fluid.layers.data(name='x', shape=[10], dtype='float32')
        out = fluid.layers.expand(x=x, expand_times=[1, 2, 2])
-               
-               
+
+



@@ -3793,9 +3796,17 @@ fc

 **全连接层**

-该函数在神经网络中建立一个全连接层。 它可以同时将多个tensor（ ``input`` 可使用多个tensor组成的一个list，详见参数说明）作为自己的输入，并为每个输入的tensor创立一个变量，称为“权”（weights），等价于一个从每个输入单元到每个输出单元的全连接权矩阵。FC层用每个tensor和它对应的权相乘得到输出tensor。如果有多个输入tensor，那么多个乘法运算将会加在一起得出最终结果。如果 ``bias_attr`` 非空，则会新创建一个偏向变量（bias variable），并把它加入到输出结果的运算中。最后，如果 ``act`` 非空，它也会加入最终输出的计算中。
+该函数在神经网络中建立一个全连接层。 它可以将一个或多个tensor（ ``input`` 可以是一个list或者Variable，详见参数说明）作为自己的输入，并为每个输入的tensor创立一个变量，称为“权”（weights），等价于一个从每个输入单元到每个输出单元的全连接权矩阵。FC层用每个tensor和它对应的权相乘得到形状为[M, size]输出tensor，M是批大小。如果有多个输入tensor，那么形状为[M, size]的多个输出张量的结果将会被加起来。如果 ``bias_attr`` 非空，则会新创建一个偏向变量（bias variable），并把它加入到输出结果的运算中。最后，如果 ``act`` 非空，它也会加入最终输出的计算中。

-这个过程可以通过如下公式表现：
+当输入为单个张量：
+
+.. math::
+
+        \\Out = Act({XW + b})\\
+
+
+
+当输入为多个张量：

 .. math::

@@ -3803,13 +3814,29 @@ fc


 上述等式中：
-  - :math:`N` ：输入tensor的数目
-  - :math:`X_i` : 输入的tensor
-  - :math:`W` ：该层创立的权
+  - :math:`N` ：输入的数目,如果输入是变量列表，N等于len（input）
+  - :math:`X_i` : 第i个输入的tensor
+  - :math:`W_i` ：对应第i个输入张量的第i个权重矩阵
  - :math:`b` ：该层创立的bias参数
  - :math:`Act` : activation function(激励函数)
  - :math:`Out` : 输出tensor
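
The multi-input case can be worked through with small matrices. The sketch below uses plain Python lists and assumes every input has already been flattened to 2-D with shape [M, K_i]. The helper names ``matmul`` and ``fc_forward`` are hypothetical, and the weights are fixed by hand rather than created by the layer.

```python
def matmul(x, w):
    """Multiply an [M, K] matrix by a [K, size] matrix."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)]
            for row in x]


def fc_forward(inputs, weights, bias, act=None):
    """Out = Act(sum_i X_i W_i + b) for a list of 2-D inputs."""
    out = None
    for x, w in zip(inputs, weights):
        y = matmul(x, w)                      # shape [M, size]
        if out is None:
            out = y
        else:
            out = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(out, y)]
    out = [[v + b for v, b in zip(row, bias)] for row in out]
    if act is not None:
        out = [[act(v) for v in row] for row in out]
    return out


x1 = [[1.0, 2.0]]                              # shape [1, 2]
w1 = [[1.0, 0.0], [0.0, 1.0]]                  # shape [2, 2]
x2 = [[1.0, 1.0, 1.0]]                         # shape [1, 3]
w2 = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]      # shape [3, 2]
out = fc_forward([x1, x2], [w1, w2], bias=[0.5, -0.5])
```

Each input gets its own weight matrix, the per-input products all have shape [M, size], and they are summed before the bias and the activation are applied.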

+::
+
+            Given:
+                data_1.data = [[[0.1, 0.2],
+                               [0.3, 0.4]]]
+                data_1.shape = (1, 2, 2) # 1 is batch_size
+         
+                data_2 = [[[0.1, 0.2, 0.3]]]
+                data_2.shape = (1, 1, 3)
+         
+                out = fluid.layers.fc(input=[data_1, data_2], size=2)
+         
+            Then:
+                out.data = [[0.18669507, 0.1893476]]
+                out.shape = (1, 2)
+

 参数:
  - **input** (Variable|list of Variable) – 该层的输入tensor(s)（张量），其维度至少是2
@@ -3832,9 +3859,15 @@ fc

 ..  code-block:: python

+         # 当输入为单个张量时
+
        data = fluid.layers.data(name="data", shape=[32, 32], dtype="float32")
        fc = fluid.layers.fc(input=data, size=1000, act="tanh")

+        # 当输入为多个张量时
+        data_1 = fluid.layers.data(name="data_1", shape=[32, 32], dtype="float32")
+        data_2 = fluid.layers.data(name="data_2", shape=[24, 36], dtype="float32")
+        fc = fluid.layers.fc(input=[data_1, data_2], size=1000, act="tanh")



@@ -3862,16 +3895,16 @@ flatten
 .. code-block:: text

    Case 1:
-      
+
      给定
        X.shape = (3, 100, 100, 4)
      且
        axis = 2
      得到:
        Out.shape = (3 * 100, 4 * 100)
-    
+
    Case 2:
-      
+
      给定
        X.shape = (3, 100, 100, 4)
      且
@@ -3929,7 +3962,7 @@ fsp_matrix
 **代码示例**

 ..  code-block:: python
-        
+
    feature_map_0 = fluid.layers.conv2d(x)
    feature_map_1 = fluid.layers.conv2d(feature_map_0)
    loss = fluid.layers.fsp_matrix(feature_map_0, feature_map_1)
@@ -3970,13 +4003,13 @@ gather
 参数:
        - **input** (Variable) - input 的rank >= 1。
        - **index** (Variable) - index的rank = 1。
-    
+
 返回：	output (Variable)

 **代码示例**

 ..  code-block:: python
-        
+
 	output = fluid.layers.gather(x, index)


@@ -4013,7 +4046,7 @@ gaussian_random算子。

 .. code-block:: python

-    out = fluid.layers.gaussian_random(shape=[20, 30])       
+    out = fluid.layers.gaussian_random(shape=[20, 30])



@@ -4122,7 +4155,7 @@ step 2：
      |          d_s          |
      |           |           |
      ws ------- y_s ------- es
-    
+
    x_w = floor(x)              // west side x coord
    x_e = x_w + 1               // east side x coord
    y_n = floor(y)              // north side y coord
@@ -4135,7 +4168,7 @@ step 2：
    en = X[:, :, y_n, x_e]      // north-east point value
    ws = X[:, :, y_s, x_w]      // south-west point value
    es = X[:, :, y_s, x_e]      // south-east point value
-    
+

    output = wn * d_e * d_s + en * d_w * d_s
           + ws * d_e * d_n + es * d_w * d_n
@@ -4243,8 +4276,8 @@ GRU单元的输入包括 :math:`z_t` ， :math:`h_{t-1}` 。在上述等式中
 :math:`u_t` 和 :math:`r_t` 分别代表了GRU神经元的update gates（更新门）和reset gates(重置门)。
 和LSTM不同，GRU少了一个门（它没有LSTM的forget gate）。但是它有一个叫做中间候选隐藏状态（intermediate candidate hidden output）的输出，
 记为 :math:`m_t` 。 该层有三个输出： :math:`h_t, dot(r_t,h_{t-1})` 以及 :math:`u_t，r_t，m_t` 的连结(concatenation)。
- 
- 
+
+


 参数:
@@ -4259,7 +4292,7 @@ GRU单元的输入包括 :math:`z_t` ， :math:`h_{t-1}` 。在上述等式中
  - **bias_attr** (ParamAttr|bool|None) - GRU的bias变量的参数属性。形为 :math:`(1x3D)` 的bias连结（concatenate）在update gates（更新门），reset gates(重置门)以及candidate calculations（候选隐藏状态计算）中的bias。如果值为False，那么上述三者将没有bias参与运算。若值为None或者 ``ParamAttr`` 类中的属性之一，gru_unit则会创建一个 ``ParamAttr`` 类的对象作为 bias_attr。如果bias_attr没有被初始化，那它会被默认初始化为0。默认值为None。
  - **activation** (string) –  神经元 “actNode” 的激励函数（activation）类型。默认类型为‘tanh’
  - **gate_activation** (string) – 门 “actGate” 的激励函数（activation）类型。 默认类型为 ‘sigmoid’
-  
+

 返回：	 hidden value（隐藏状态的值），reset-hidden value(重置隐藏状态值)，gate values(门值)

@@ -4299,10 +4332,10 @@ HardSigmoid激活算子。

 sigmoid的分段线性逼近(https://arxiv.org/abs/1603.00391)，比sigmoid快得多。

-.. math::   
+.. math::

      out = \max(0, \min(1, slope * x + shift))
- 
+
 斜率是正数。偏移量可正可负的。斜率和位移的默认值是根据上面的参考设置的。建议使用默认值。
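
The piecewise-linear shape of the hard sigmoid is easy to see on scalars. The sketch below is plain Python, not the Paddle operator. The values ``slope=0.2`` and ``shift=0.5`` are assumptions chosen for illustration, so check the operator's signature for its actual defaults.

```python
def hard_sigmoid(x, slope, shift):
    """Scalar hard sigmoid: max(0, min(1, slope * x + shift))."""
    return max(0.0, min(1.0, slope * x + shift))


# Assumed illustrative parameters, not confirmed defaults.
SLOPE, SHIFT = 0.2, 0.5

middle = hard_sigmoid(0.0, SLOPE, SHIFT)     # linear region
high = hard_sigmoid(10.0, SLOPE, SHIFT)      # clipped at 1
low = hard_sigmoid(-10.0, SLOPE, SHIFT)      # clipped at 0
```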

 参数：
@@ -4406,7 +4439,7 @@ hsigmoid

 .. py:function:: paddle.fluid.layers.hsigmoid(input, label, num_classes, param_attr=None, bias_attr=None, name=None, path_table=None, path_code=None, is_custom=False, is_sparse=False)

-层次sigmod（ hierarchical sigmoid ）加速语言模型的训练过程。这个operator将类别组织成一个完全二叉树，也可以使用 ``is_custom`` 参数来传入自定义的树结构来实现层次化。 
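+Hierarchical sigmoid speeds up the training of language models. This operator organizes the classes into a complete binary tree; alternatively, set ``is_custom`` to pass in a custom tree structure for the hierarchy.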

 树中每个叶节点表示一个类(一个单词)，每个内部节点进行一个二分类。对于每个单词，都有一个从根到它的叶子节点的唯一路径，hsigmoid计算路径上每个内部节点的损失（cost），并将它们相加得到总损失（cost）。

@@ -4426,25 +4459,25 @@ hsigmoid可以把时间复杂度 :math:`O(N)` 优化到 :math:`O(logN)` ,其中

 参数:
    - **input** (Variable) - 输入张量，shape为 ``[N×D]`` ,其中 ``N`` 是minibatch的大小，D是特征大小。
-    - **label** (Variable) - 训练数据的标签。该tensor的shape为 ``[N×1]``   
+    - **label** (Variable) - 训练数据的标签。该tensor的shape为 ``[N×1]``
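    - **num_classes** (int) - The number of classes; must be at least 2. With the default tree structure, the user must set this argument: it must not be None when ``is_custom=False``. When ``is_custom=True``, it should instead be the number of non-leaf nodes, which gives the number of classes used in the binary classifications.
    - **param_attr** (ParamAttr|None) - Parameter attribute for the learnable weights of hsigmoid. If set to None or to one attribute of ParamAttr, hsigmoid creates a ParamAttr as param_attr. If the initializer of param_attr is not set, the weights are initialized with Xavier. Default: None.
    - **bias_attr** (ParamAttr|bool|None) - Parameter attribute for the bias of hsigmoid. If set to False, no bias is added to the output. If set to None or to one attribute of ParamAttr, hsigmoid creates a ParamAttr as bias_attr. If the initializer of bias_attr is not set, the bias is initialized to zero. Default: None.
    - **name** (str|None) - Name of this layer (optional). If set to None, the layer is named automatically. Default: None.
    - **path_table** (Variable|None) – Stores, for each batch of samples, the path from the word to the root node, in leaf-to-root order. ``path_table`` and ``path_code`` should have the same shape. For each sample i, path_table[i] is an np.array-like structure whose elements are the indices of the parent nodes in the weight matrix.
    - **path_code** (Variable|None) – Stores the path code of each batch of samples, also in leaf-to-root order. The path code of each sample consists of the codes of its ancestor nodes.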
-    - **is_custom** (bool|False) – 使用用户自定义二叉树取代默认二叉树结构，如果该项为真， 请务必设置 ``path_table`` , ``path_code`` , ``num_classes`` , 否则就需要设置 num_classes 
+    - **is_custom** (bool|False) – 使用用户自定义二叉树取代默认二叉树结构，如果该项为真， 请务必设置 ``path_table`` , ``path_code`` , ``num_classes`` , 否则就需要设置 num_classes
    - **is_sparse** (bool|False) – 使用稀疏更新方式，而非密集更新。如果为真， W的梯度和输入梯度将会变得稀疏

 Returns:  (LoDTensor) The hierarchical sigmoid cost, with shape [N, 1].
-    
+
 返回类型:  Out


 **代码示例**

 ..  code-block:: python
-        
+
    x = fluid.layers.data(name='x', shape=[2], dtype='float32')
    y = fluid.layers.data(name='y', shape=[1], dtype='int64')
    out = fluid.layers.hsigmoid(input=x, label=y, num_classes=6)
@@ -4463,12 +4496,12 @@ Huber损失是更具鲁棒性的损失函数。 huber损失可以评估输入对

 当输入和标签之间的距离大于delta时:

-.. math:: 
+.. math::
        huber\_loss = delta * |label - input| - 0.5 * delta * delta

 当输入和标签之间的距离小于delta时:

-.. math:: 
+.. math::
        huber\_loss = 0.5 * (label - input) * (label - input)
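
The two branches of the Huber loss can be combined into one scalar function. The sketch below is plain Python, not the Paddle operator. It assumes the linear branch uses the absolute distance between label and input, which matches the "distance" wording above and makes the two branches meet at ``delta``.

```python
def huber_loss(input_value, label, delta):
    """Scalar Huber loss with threshold delta."""
    residual = abs(label - input_value)
    if residual <= delta:
        # quadratic branch: 0.5 * (label - input)^2
        return 0.5 * residual * residual
    # linear branch (assumption: absolute distance is used)
    return delta * residual - 0.5 * delta * delta


small = huber_loss(0.0, 0.5, delta=1.0)      # quadratic region
large = huber_loss(0.0, 3.0, delta=1.0)      # linear region
mirrored = huber_loss(3.0, 0.0, delta=1.0)   # symmetric in the residual
```

Because the loss grows only linearly for large residuals, a single outlier contributes much less than it would under a squared error.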


@@ -4504,7 +4537,7 @@ im2sequence
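 Extracts image patches from the input tensor, similar to im2col, with shape={input.batch_size * output_height * output_width, filter_size_H * filter_size_W * input.channels}. This op scans the images with a filter / kernel and converts them into sequences. After unfolding, one image has output_height * output_width timesteps, where output_height and output_width are computed as follows: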


-.. math:: 
+.. math::
                        output\_size = 1 + \frac{2 * padding + img\_size - block\_size + stride - 1}{stride}

 每个timestep的维度为 :math:`block\_y * block\_x * input.channels` 。
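
The output-size formula can be evaluated directly. The sketch below is plain Python and assumes the fraction is an integer (floor) division and that the same padding is applied on both sides. The helper name is hypothetical.

```python
def im2sequence_output_size(img_size, block_size, stride, padding):
    """Number of block positions along one image dimension."""
    # Assumption: the division in the formula is integer (floor) division.
    return 1 + (2 * padding + img_size - block_size + stride - 1) // stride


# A 3x3 image scanned by a 2x2 block with stride 1 and no padding.
out_h = im2sequence_output_size(img_size=3, block_size=2, stride=1, padding=0)
out_w = im2sequence_output_size(img_size=3, block_size=2, stride=1, padding=0)
timesteps = out_h * out_w

# With stride 2, a partially covered last block still counts.
out_stride2 = im2sequence_output_size(img_size=5, block_size=2, stride=2, padding=0)
```

For a single-channel 3x3 image this gives 2 x 2 = 4 timesteps, each of dimension 2 * 2 * 1 = 4.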
@@ -4569,7 +4602,7 @@ im2sequence
 **代码示例**

 ..  code-block:: python
-  
+
    output = fluid.layers.im2sequence(
        input=layer, stride=[1, 1], filter_size=[2, 2])

@@ -4590,10 +4623,10 @@ image_resize
 .. py:function:: paddle.fluid.layers.image_resize(input, out_shape=None, scale=None, name=None, resample='BILINEAR', actual_shape=None, align_corners=True, align_mode=1)

 调整一个batch中图片的大小。
-    
+
 输入张量的shape为(num_batch, channels, in_h, in_w)，并且调整大小只适用于最后两个维度(高度和宽度)。
-    
-支持重新取样方法: 
+
+支持重新取样方法:

    BILINEAR：双线性插值

@@ -4611,18 +4644,18 @@ Align_corners和align_mode是可选参数，插值的计算方法可以由它们
 ::

      For scale:
-      
+
        if align_corners = True && out_size > 1 :

          scale_factor = (in_size-1.0)/(out_size-1.0)
-        
+
        else:
-          
+
          scale_factor = float(in_size/out_size)
-        
-      
+
+
      Nearest neighbor interpolation:
-      
+
      if:
          align_corners = False

@@ -4645,16 +4678,16 @@ Align_corners和align_mode是可选参数，插值的计算方法可以由它们

      if:
          align_corners = False , align_mode = 0
-          
+
          input : (N,C,H_in,W_in)
          output: (N,C,H_out,W_out) where:
-          
+
          H_out = (H_{in}+0.5) * scale_{factor} - 0.5
          W_out = (W_{in}+0.5) * scale_{factor} - 0.5


      else:
-       
+
          input : (N,C,H_in,W_in)
          output: (N,C,H_out,W_out) where:

@@ -4695,9 +4728,9 @@ https://en.wikipedia.org/wiki/Bilinear_interpolation。
 **代码示例**

 ..  code-block:: python
-        
-	out = fluid.layers.image_resize(input, out_shape=[12, 12], resample="NEAREST") 
-  
+
+	out = fluid.layers.image_resize(input, out_shape=[12, 12], resample="NEAREST")
+



@@ -4722,7 +4755,7 @@ image_resize_short
        - **input** (Variable) -  图像调整图层的输入张量，这是一个4维的形状张量(num_batch, channels, in_h, in_w)。
        - **out_short_len** (int) -  输出图像的短边长度。
        - **resample** (str) - resample方法，默认为双线性插值。
-    
+
 Returns: a 4-D tensor with shape (num_batch, channels, out_h, out_w)

 返回类型:	变量（variable）
@@ -4800,11 +4833,11 @@ L2正则（L2 normalize Layer）
    - **axis** (int)-运用归一化的轴。如果轴小于0，归一化的维是rank(X)+axis。-1是最后维
    - **epsilon** (float)-epsilon用于避免分母为0，默认值为1e-10
    - **name** (str|None)-该层名称（可选）。如果设为空，则自动为该层命名
-    
+
    返回：输出张量，同x的维度一致
-    
+
    返回类型：变量
-    
+
 **代码示例**：

 .. code-block:: python
@@ -4880,8 +4913,8 @@ layer_norm

 假设特征向量存在于维度 ``begin_norm_axis ... rank (input）`` 上，计算大小为 ``H`` 的特征向量a在该维度上的矩统计量，然后使用相应的统计量对每个特征向量进行归一化。 之后，如果设置了 ``scale`` 和 ``shift`` ，则在标准化的张量上应用可学习的增益和偏差以进行缩放和移位。

-请参考 `Layer Normalization <https://arxiv.org/pdf/1607.06450v1.pdf>`_ 
-            
+请参考 `Layer Normalization <https://arxiv.org/pdf/1607.06450v1.pdf>`_
+
 公式如下

 .. math::
@@ -4890,7 +4923,7 @@ layer_norm
            \\\sigma=\sqrt{\frac{1}{H}\sum_i^H{(a_i-\mu)^2}}\\
 .. math::
             \\h=f(\frac{g}{\sigma}(a-\mu) + b)\\
-             
+
 - :math:`a` : vector of the summed inputs to the neurons in this layer
 - :math:`H` : 层中隐藏的神经元个数
 - :math:`g` : 可训练的缩放因子参数
@@ -4908,12 +4941,12 @@ layer_norm
  - **act** （str） - 激活函数。默认 None
  - **name** （str） - 该层的名称， 可选的。默认为None，将自动生成唯一名称。

-返回： 标准化后的结果   
+返回： 标准化后的结果

 **代码示例**

 ..  code-block:: python
-    
+
   data = fluid.layers.data(name='data', shape=[3, 32, 32],
                                           dtype='float32')
   x = fluid.layers.layer_norm(input=data, begin_norm_axis=1)
@@ -4993,7 +5026,7 @@ linear_chain_crf

 	5.Label用 :math:`s` 表示

-	
+


 **注意：**
@@ -5013,7 +5046,7 @@ linear_chain_crf
 返回：
    output(Variable，Tensor，默认float类型Tensor)：shape为[N*D]的二维张量。Emission的指数。这是前向计算中的中间计算结果，在后向计算中还会复用

-    output(Variable，Tensor，默认float类型Tensor)：shape为[(D+2)*D]的二维张量。Transition的指数。这是前向计算中的中间计算结果，在后向计算中还会复用 
+    output(Variable，Tensor，默认float类型Tensor)：shape为[(D+2)*D]的二维张量。Transition的指数。这是前向计算中的中间计算结果，在后向计算中还会复用

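    output(Variable, Tensor, float Tensor by default): the log of the conditional probability of each training sample in the mini-batch. A 2-D tensor of shape [S*1], where S is the number of sequences in the mini-batch. Note that the output is no longer a LoDTensor.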

@@ -5129,7 +5162,7 @@ log

 .. math::
                  \\Out=ln(x)\\
- 
+

 参数:
  - **x** (Variable) – 输入张量
@@ -5220,11 +5253,11 @@ logical_and算子
        - **out** （Tensor）- 输出逻辑运算的张量。
        - **name** （basestring | None）- 输出的名称。

-返回：        (LoDTensor)n-dim bool张量。每个元素的计算公式： :math:`Out = X \&\& Y` 
-        
-返回类型：        输出（Variable）。        
-        
-        
+返回：        (LoDTensor)n-dim bool张量。每个元素的计算公式： :math:`Out = X \&\& Y`
+
+返回类型：        输出（Variable）。
+
+
 **代码示例：**

 .. code-block:: python
@@ -5254,7 +5287,7 @@ logical_not算子

 它在X上以元素方式操作，并返回Out。X和Out是N维布尔张量（Tensor）。Out的每个元素的计算公式为：

-.. math:: 
+.. math::
        Out = !X

 参数：
@@ -5264,13 +5297,13 @@ logical_not算子

 返回：        (LoDTensor)n维布尔张量。

-返回类型：        输出（Variable）。        
+返回类型：        输出（Variable）。


 **代码示例：**

 .. code-block:: python
-    
+
    left = fluid.layers.data(
        name='left', shape=[1], dtype='int32')
    result = fluid.layers.logical_not(x=left)
@@ -5292,7 +5325,7 @@ logical_or算子

 它在X和Y上以元素方式操作，并返回Out。X、Y和Out是N维布尔张量（Tensor）。Out的每个元素的计算公式为：

-.. math:: 
+.. math::
        Out = X || Y

 参数：
@@ -5301,9 +5334,9 @@ logical_or算子
        - **out** （Tensor）- 输出逻辑运算的张量。
        - **name** （basestring | None）- 输出的名称。

-返回：        (LoDTensor)n维布尔张量。每个元素的计算公式： :math:`Out = X || Y` 
-        
-返回类型：        输出（Variable）。        
+返回：        (LoDTensor)n维布尔张量。每个元素的计算公式： :math:`Out = X || Y`
+
+返回类型：        输出（Variable）。



@@ -5334,7 +5367,7 @@ logical_xor算子

 它在X和Y上以元素方式操作，并返回Out。X、Y和Out是N维布尔张量（Tensor）。Out的每个元素的计算公式为：

-.. math:: 
+.. math::
        Out = (X || Y) \&\& !(X \&\& Y)

 参数：
@@ -5344,8 +5377,8 @@ logical_xor算子
        - **name** （basestring | None）- 输出的名称。

 返回：        (LoDTensor)n维布尔张量。
-       
-返回类型：        输出（Variable）。        
+
+返回类型：        输出（Variable）。



@@ -5379,7 +5412,7 @@ lrn

 .. math::

-    Output(i,x,y) = Input(i,x,y)/\left ( k+\alpha \sum_{j=max(0,c-n/2)}^{min(C,c+n/2)}(Input(j,x,y))^2 \right )^\beta 
+    Output(i,x,y) = Input(i,x,y)/\left ( k+\alpha \sum_{j=\max(0,i-n/2)}^{\min(C,i+n/2)}(Input(j,x,y))^2 \right )^\beta

 在以上公式中：
  - :math:`n` ：累加的通道数
@@ -5597,9 +5630,9 @@ margin rank loss（差距排序损失）层。在排序问题中，它可以比

 返回类型:	变量（Variable）

-抛出异常: 
+抛出异常:
  - ``ValueError`` - ``label`` , ``left`` , ``right`` 有一者不为Variable类型时，抛出此异常
- 
+
 **代码示例**

 ..  code-block:: python
@@ -5697,7 +5730,7 @@ maxout

 假设输入形状为(N, Ci, H, W)，输出形状为(N, Co, H, W)，则 :math:`Co=Ci/groups` 运算公式如下:

-.. math:: 
+.. math::

 	y_{si+j} &= \max_k x_{gsi + sk + j} \\
 	g &= groups \\
@@ -5735,18 +5768,18 @@ mean
 -------------------------------

 .. py:function:: paddle.fluid.layers.mean(x, name=None)
-       
+
 mean算子计算X中所有元素的平均值
-     
+
 参数：
        - **x** (Variable)- (Tensor) 均值运算的输入。
        - **name** (basestring | None)- 输出的名称。

 返回：       均值运算输出张量（Tensor）
-       
+
 返回类型：        Variable
-        
-        
+
+



@@ -5763,18 +5796,18 @@ mean_iou
 .. py:function:: paddle.fluid.layers.mean_iou(input, label, num_classes)

 均值IOU（Mean  Intersection-Over-Union）是语义图像分割中的常用的评价指标之一，它首先计算每个语义类的IOU，然后计算类之间的平均值。定义如下:
-      
-.. math::   
+
+.. math::

    IOU = \frac{true\_positive}{true\_positive+false\_positive+false\_negative}
-          
+
 在一个confusion矩阵中累积得到预测值，然后从中计算均值-IOU。
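
The computation described above, accumulating a confusion matrix and then averaging the per-class IOU, can be sketched in plain Python. This is not the Paddle operator. Skipping classes that never appear in either the predictions or the labels is an assumption made for this example.

```python
def mean_iou(predictions, labels, num_classes):
    """Mean IOU over flat lists of integer class ids."""
    # confusion[t][p] counts samples with true class t predicted as class p
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(predictions, labels):
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        true_positive = confusion[c][c]
        false_negative = sum(confusion[c]) - true_positive
        false_positive = sum(row[c] for row in confusion) - true_positive
        denominator = true_positive + false_positive + false_negative
        if denominator > 0:   # assumption: absent classes are skipped
            ious.append(true_positive / denominator)
    return sum(ious) / len(ious) if ious else 0.0


result = mean_iou(predictions=[0, 0, 1, 1], labels=[0, 1, 1, 1], num_classes=2)
```

Here class 0 has IOU 1/2 and class 1 has IOU 2/3, so the mean is 7/12.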

 参数:
    - **input** (Variable) - 类型为int32或int64的语义标签的预测结果张量。
    - **label** (Variable) - int32或int64类型的真实label张量。它的shape应该与输入相同。
    - **num_classes** (int) - 标签可能的类别数目。
-    
+
 返回: 返回三个变量:

 - mean_iou: 张量，形为[1]， 代表均值IOU。
@@ -5809,18 +5842,18 @@ merge_selected_rows
 该运算用于合并（值相加）输入张量中重复的行。输出行没有重复的行，并且按值从小到大顺序重新对行排序。

 ::
-    
+
    例如：
-          
-          输入: 
+
+          输入:
               X.rows = [0, 5, 5, 4, 19]
               X.height = 20
               X.value = [[1, 1] [2, 2] [3, 3] [4, 4] [6, 6]]
-          
-          
+
+
          输出：
-               Out.row is [0, 4, 5, 19] 
-               Out.height is 20 
+               Out.row is [0, 4, 5, 19]
+               Out.height is 20
               Out.value is: [[1, 1] [4, 4] [5, 5] [6, 6]]


@@ -5829,7 +5862,7 @@ merge_selected_rows
  - x (Variable) – 输入类型为SelectedRows, 选中行有可能重复
  - name (basestring|None) – 输出变量的命名

-返回: 输出类型为SelectedRows，并且选中行不会重复 
+返回: 输出类型为SelectedRows，并且选中行不会重复

 返回类型:	变量（Variable）
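
The merge semantics can be reproduced on plain Python lists. The sketch below is illustrative and does not use the SelectedRows type. It sums the values of duplicate rows and returns the rows in ascending order, using the same input as the example above.

```python
def merge_selected_rows(rows, values):
    """Sum the values of duplicate row ids and sort the result by row id."""
    merged = {}
    for row, value in zip(rows, values):
        if row in merged:
            merged[row] = [a + b for a, b in zip(merged[row], value)]
        else:
            merged[row] = list(value)
    out_rows = sorted(merged)
    return out_rows, [merged[r] for r in out_rows]


out_rows, out_values = merge_selected_rows(
    rows=[0, 5, 5, 4, 19],
    values=[[1, 1], [2, 2], [3, 3], [4, 4], [6, 6]],
)
```

The two entries for row 5 are added to give ``[5, 5]``, which matches the output listed in the example.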

@@ -5849,12 +5882,12 @@ mul
 -------------------------------

 .. py:function:: paddle.fluid.layers.mul(x, y, x_num_col_dims=1, y_num_col_dims=1, name=None)
-        
+
 mul算子
 此运算是用于对输入X和Y执行矩阵乘法。
 等式是：

-.. math:: 
+.. math::
        Out = X * Y

 输入X和Y都可以携带LoD（详细程度）信息。但输出仅与输入X共享LoD信息。
@@ -5867,10 +5900,10 @@ mul算子
        - **name** (basestring | None)- 输出的名称。

 返回：       乘法运算输出张量（Tensor）.
-       
-返回类型：    输出(Variable)。       
-        
-        
+
+返回类型：    输出(Variable)。
+
+



@@ -5886,18 +5919,18 @@ multiplex

 .. py:function:: paddle.fluid.layers.multiplex(inputs, index)

-引用给定的索引变量，该层从输入变量中选择行构造Multiplex变量。 
+引用给定的索引变量，该层从输入变量中选择行构造Multiplex变量。

-假设有 :math:`m` 个输入变量，:math:`I_{i}` 代表第i个输入变量，而且 :math:`i` is in :math:`[0,m)` 。 
+Suppose there are :math:`m` input variables, and :math:`I_{i}` denotes the i-th input variable, with :math:`i` in :math:`[0,m)` .

-所有输入变量都是具有相同形状的张量 :math:`[d_0,d_1, ... ,d_R]` 。 
+所有输入变量都是具有相同形状的张量 :math:`[d_0,d_1, ... ,d_R]` 。

-请注意，输入张量的秩应至少为2。每个输入变量将被视为形状为 :math:`[M，N]` 的二维矩阵，其中 :math:`M` 表示 :math:`d0` ，N表示 :math:`d_1 * d_2 * ... * d_R` 。 
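+Note that the rank of the input tensors should be at least 2. Each input variable is treated as a 2-D matrix of shape :math:`[M, N]` , where :math:`M` is :math:`d_0` and :math:`N` is :math:`d_1 * d_2 * ... * d_R` .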

-设 :math:`I_{i}[j]` 为第i个输入变量的第j行。 给定的索引变量是具有形状[M，1]的2-D张量。 设 :math:`ID[i]` 为索引变量的第i个索引值。 然后输出变量将是一个形状为 :math:`[d_0,d_1, ... ,d_R]` 的张量。 
+设 :math:`I_{i}[j]` 为第i个输入变量的第j行。 给定的索引变量是具有形状[M，1]的2-D张量。 设 :math:`ID[i]` 为索引变量的第i个索引值。 然后输出变量将是一个形状为 :math:`[d_0,d_1, ... ,d_R]` 的张量。
+
+If the output tensor is viewed as a 2-D matrix of shape [M, N] and O[i] denotes its i-th row, then O[i] equals :math:`I_{ID[i]}[i]` .

-如果将输出张量视为具有形状[M，N]的2-D矩阵,并且令O[i]为矩阵的第i行，则O[i]等于 :math:`I_{ID}[i][i]` 
-  
 - Ids: 索引张量
 - X[0 : N - 1]: the candidate tensors for the output (N >= 2).
 - 对于从 0 到 batchSize-1 的每个索引i，输出是第（Ids [i]）  张量的第i行
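
The row-selection rule above can be written as a one-line comprehension. The sketch below is plain Python over nested lists, not the Paddle layer. It assumes the index has shape [M, 1], as described, and that every candidate is an [M, N] matrix.

```python
def multiplex(inputs, index):
    """Row i of the output is row i of the candidate chosen by index[i][0]."""
    return [list(inputs[ids[0]][i]) for i, ids in enumerate(index)]


candidate_0 = [[0, 0], [1, 1], [2, 2]]
candidate_1 = [[10, 10], [11, 11], [12, 12]]
index = [[1], [0], [1]]   # hypothetical index values for illustration

out = multiplex([candidate_0, candidate_1], index)
```

Row 0 and row 2 come from ``candidate_1`` and row 1 comes from ``candidate_0``.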
@@ -5958,7 +5991,7 @@ multiplex
 ..  code-block:: python

   import paddle.fluid as fluid
-   
+
   x1 = fluid.layers.data(name='x1', shape=[4], dtype='float32')
   x2 = fluid.layers.data(name='x2', shape=[4], dtype='float32')
   index = fluid.layers.data(name='index', shape=[1], dtype='int32')
@@ -5980,8 +6013,8 @@ nce
 .. py:function:: paddle.fluid.layers.nce(input, label, num_total_classes, sample_weight=None, param_attr=None, bias_attr=None, num_neg_samples=None, name=None, sampler='uniform', custom_dist=None, seed=0, is_sparse=False)

 计算并返回噪音对比估计（ noise-contrastive estimation training loss）。
-`请参考 See Noise-contrastive estimation: A new estimation principle for unnormalized statistical models 
-<http://www.jmlr.org/proceedings/papers/v9/gutmann10a/gutmann10a.pdf>`_ 
+请参考 `Noise-contrastive estimation: A new estimation principle for unnormalized statistical models
+<http://www.jmlr.org/proceedings/papers/v9/gutmann10a/gutmann10a.pdf>`_
 该operator默认使用均匀分布进行抽样。

 参数:
@@ -6055,8 +6088,8 @@ npair_loss
 NPair损失需要成对的数据。NPair损失分为两部分：第一部分是嵌入向量上的L2正则化器；第二部分是以anchor的相似矩阵和正的相似矩阵为逻辑的交叉熵损失。

 参数:
-    - **anchor** (Variable) -  嵌入锚定图像的向量。尺寸=[batch_size, embedding_dims] 
-    - **positive** (Variable) -  嵌入正图像的向量。尺寸=[batch_size, embedding_dims] 
+    - **anchor** (Variable) -  嵌入锚定图像的向量。尺寸=[batch_size, embedding_dims]
+    - **positive** (Variable) -  嵌入正图像的向量。尺寸=[batch_size, embedding_dims]
    - **labels** (Variable) - 1维张量，尺寸=[batch_size]
    - **l2_reg** (float32) - 嵌入向量的L2正则化项，默认值：0.002

@@ -6066,7 +6099,7 @@ NPair损失需要成对的数据。NPair损失分为两部分：第一部分是

 **代码示例**：

-.. code-block:: python 
+.. code-block:: python

    anchor = fluid.layers.data(
              name = 'anchor', shape = [18, 6], dtype = 'float32', append_batch_size=False)
@@ -6084,7 +6117,7 @@ NPair损失需要成对的数据。NPair损失分为两部分：第一部分是

 .. _cn_api_fluid_layers_one_hot:

-one_hot 
+one_hot
 -------------------------------

 .. py:function:: paddle.fluid.layers.one_hot(input, depth)
@@ -6101,7 +6134,7 @@ one_hot

 **代码示例**：

-.. code-block:: python 
+.. code-block:: python

    label = fluid.layers.data(name="label", shape=[1], dtype="float32")
    one_hot_label = fluid.layers.one_hot(input=label, depth=10)
@@ -6123,7 +6156,7 @@ pad

 在张量上加上一个由 ``pad_value`` 给出的常数值，填充宽度由 ``paddings`` 指定。
 其中，维度 ``i`` 中 ``x`` 内容前填充的值个数用 ``paddings[2*i]`` 表示，维度 ``i`` 中 ``x`` 内容后填充的值个数用 ``paddings[2*i+1]`` 表示。
-   
+
 一个例子:

 ::
@@ -6152,12 +6185,12 @@ pad
 返回：	填充后的张量变量

 返回类型： 变量（Variable）
-    
+

 **代码示例**

 ..  code-block:: python
-        
+
    out = fluid.layers.pad(
    x=x, paddings=[0, 1, 1, 2], pad_value=0.)
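
The effect of the flat `paddings` list can be checked with NumPy. This is a hedged sketch rather than the operator itself: `pad_reference` is a hypothetical helper, and it assumes the list holds one (before, after) pair per dimension, in dimension order.

```python
import numpy as np


def pad_reference(x, paddings, pad_value=0.0):
    """Pad x with a constant, reading paddings as (before, after) pairs per dimension."""
    pairs = [(paddings[2 * i], paddings[2 * i + 1]) for i in range(x.ndim)]
    return np.pad(x, pairs, mode="constant", constant_values=pad_value)


x = np.array([[1.0, 2.0], [3.0, 4.0]])
# Dimension 0: 0 rows before, 1 row after. Dimension 1: 1 column before, 2 after.
out = pad_reference(x, [0, 1, 1, 2], pad_value=0.0)
print(out.shape)
```

Under that assumption the same `paddings=[0, 1, 1, 2]` used in the example above turns a 2 x 2 input into a 3 x 5 output.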

@@ -6187,7 +6220,7 @@ pad2d

      X = [[1, 2, 3],
           [4, 5, 6]]
-     
+
     Case 0:
        paddings = [0, 1, 2, 3],
        mode = 'constant'
@@ -6195,14 +6228,14 @@ pad2d
        Out = [[0, 0, 1, 2, 3, 0, 0, 0]
               [0, 0, 4, 5, 6, 0, 0, 0]
               [0, 0, 0, 0, 0, 0, 0, 0]]
-     
+
     Case 1:
        paddings = [0, 1, 2, 1],
        mode = 'reflect'
        Out = [[3, 2, 1, 2, 3, 2]
               [6, 5, 4, 5, 6, 5]
               [3, 2, 1, 2, 3, 2]]
-     
+
     Case 2:
        paddings = [0, 1, 2, 1],
        mode = 'edge'
@@ -6273,7 +6306,6 @@ pad_constant_like
              [[41, 42, 43]]]]
        Y.shape = (1, 3, 1, 3)

-
 参数：
          - **x** （Variable）- 输入Tensor变量。
          - **y** （Variable）- 输出Tensor变量。
@@ -6449,7 +6481,7 @@ pooling3d操作根据input，pool_type，pool_size，strides和paddings参数计

 例如，

-输入X形为 :math:`(N, C, D_{in}, H_{in}, W_{in})` ，输出形为 :math:`(N, C, D_{out}, H_{out}, W_{out})` 
+输入X形为 :math:`(N, C, D_{in}, H_{in}, W_{in})` ，输出形为 :math:`(N, C, D_{out}, H_{out}, W_{out})`

 当ceil_mode = false时，

@@ -6588,7 +6620,7 @@ prelu
 返回： 输出Tensor与输入shape相同。

 返回类型：  变量（Variable）
-  
+
 **代码示例：**

 .. code-block:: python
@@ -6646,7 +6678,7 @@ PyFunc运算。

 在调用此函数之前，应正确设置 ``out`` 的数据类型和形状。 但是，``out`` 和 ``x`` 对应梯度的数据类型和形状将自动推断而出。

-``backward_func`` 的输入顺序为：前向输入x，前向输出 ``out`` 和反向输入 ``out`` 的梯度。 如果 ``out`` 的某些变量没有梯度，则输入张量在Python端将为None。 
+``backward_func`` 的输入顺序为：前向输入x，前向输出 ``out`` 和反向输入 ``out`` 的梯度。 如果 ``out`` 的某些变量没有梯度，则输入张量在Python端将为None。

 如果in的某些变量没有梯度，则用户应返回None。

@@ -6814,7 +6846,7 @@ reduce_max
 返回：  运算、减少维度之后的Tensor变量。

 返回类型：  变量（Variable）
-          
+
 **代码示例**

 ..  code-block:: python
@@ -6862,7 +6894,7 @@ reduce_mean
 返回：  运算、减少维度之后的Tensor变量。

 返回类型：  变量（Variable）
-          
+
 **代码示例**

 ..  code-block:: python
@@ -6911,7 +6943,7 @@ reduce_min
 返回：  运算、减少维度之后的Tensor变量。

 返回类型：  变量（Variable）
-          
+
 **代码示例**

 ..  code-block:: python
@@ -6959,7 +6991,7 @@ reduce_prod
 返回：  运算、减少维度之后的Tensor变量。

 返回类型：  变量（Variable）
-          
+
 **代码示例**

 ..  code-block:: python
@@ -7008,7 +7040,7 @@ reduce_sum
 返回：  运算、减少维度之后的Tensor变量。

 返回类型：  变量（Variable）
-          
+
 **代码示例**

 ..  code-block:: python
@@ -7028,7 +7060,7 @@ reduce_sum
      # 接下来的示例中，我们在每处函数调用后面都标注出了它的结果张量。
      fluid.layers.reduce_sum(x, dim=[1, 2]) # [10, 26]
      fluid.layers.reduce_sum(x, dim=[0, 1]) # [16, 20]
-      
+



@@ -7046,10 +7078,10 @@ relu
 .. py:function:: paddle.fluid.layers.relu(x, name=None)

 Relu接受一个输入数据(张量)，输出一个张量。将线性函数y = max(0, x)应用到张量中的每个元素上。
-    
-.. math::                 
+
+.. math::
              \\Out=\max(0,x)\\
- 
+

 参数:
  - **x** (Variable):输入张量。
@@ -7084,7 +7116,7 @@ relu6
 relu6激活算子（Relu6 Activation Operator）

 .. math::
-  
+
    \\out=min(max(0, x), 6)\\


@@ -7126,7 +7158,7 @@ reshape
 在指定目标shape时存在一些技巧：

 .. code-block:: text
-	
+
 	1. -1表示这个维度的值是从x的元素总数和剩余维度推断出来的。因此，有且只有一个维度可以被设置为-1。
 	2. 0表示实际的维数是从x的对应维数中复制出来的，因此shape中0的索引值不能超过秩(x)。

@@ -7191,22 +7223,22 @@ align_corners和align_mode是可选参数，插值的计算方法可以由它们
    Example:

      For scale:
-      
+
        if align_corners = True && out_size > 1 :

          scale_factor = (in_size-1.0)/(out_size-1.0)
-        
+
        else:
-          
-          scale_factor = float(in_size/out_size)     
+
+          scale_factor = float(in_size/out_size)

    Bilinear interpolation:

      if align_corners = False , align_mode = 0
-          
+
          input : (N,C,H_in,W_in)
          output: (N,C,H_out,W_out) where:
-          
+
          H_out = (H_{in}+0.5) * scale_{factor} - 0.5
          W_out = (W_{in}+0.5) * scale_{factor} - 0.5

@@ -7222,7 +7254,7 @@ align_corners和align_mode是可选参数，插值的计算方法可以由它们


 参数:
-    - **input** (Variable) - 双线性插值的输入张量，是一个shpae为(N x C x h x w)的4d张量。
+    - **input** (Variable) - 双线性插值的输入张量，是一个shape为(N x C x h x w)的4d张量。
    - **out_shape** (Variable) - 一维张量，包含两个数。第一个数是高度，第二个数是宽度。
    - **scale** (float|None) - 用于输入高度或宽度的乘数因子。out_shape和scale至少要设置一个。out_shape的优先级高于scale。默认值:None。
    - **name** (str|None) - 输出变量名。
@@ -7262,18 +7294,18 @@ resize_nearest
    Example:

          For scale:
-          
+
            if align_corners = True && out_size > 1 :

              scale_factor = (in_size-1.0)/(out_size-1.0)
-            
+
            else:
-              
+
              scale_factor = float(in_size/out_size)
-            
-          
+
+
          Nearest neighbor interpolation:
-          
+
          if align_corners = False

              input : (N,C,H_in,W_in)
@@ -7376,9 +7408,9 @@ roi_pool

 .. py:function:: paddle.fluid.layers.roi_pool(input, rois, pooled_height=1, pooled_width=1, spatial_scale=1.0)

-    
+
 roi池化是对非均匀大小的输入执行最大池化，以获得固定大小的特征映射(例如7*7)。
-    
+
 该operator有三个步骤:

    1. 用pooled_width和pooled_height将每个区域划分为大小相等的部分
@@ -7387,7 +7419,7 @@ roi池化是对非均匀大小的输入执行最大池化，以获得固定大

 Faster-RCNN使用了roi池化。关于roi池化请参考 https://stackoverflow.com/questions/43430056/what-is-roi-layer-in-fast-rcnn

-参数:    
+参数:
    - **input** (Variable) - 张量，ROIPoolOp的输入。输入张量的格式是NCHW。其中N为batch大小，C为输入通道数，H为特征高度，W为特征宽度
    - **rois** (Variable) -  roi区域。
    - **pooled_height** (integer) - (int，默认1)，池化输出的高度。默认:1
@@ -7395,9 +7427,9 @@ Faster-RCNN.使用了roi池化。roi关于roi池化请参考 https://stackoverfl
    - **spatial_scale** (float) - (float，默认1.0)，用于将ROI coords从输入比例转换为池化时使用的比例。默认1.0

 返回: (张量)，ROIPoolOp的输出是一个shape为(num_rois, channel, pooled_h, pooled_w)的4d张量。
-    
+
 返回类型: 变量（Variable）
-    
+

 **代码示例**

@@ -7421,17 +7453,17 @@ row_conv

 .. py:function:: paddle.fluid.layers.row_conv(input, future_context_size, param_attr=None, act=None)

-行卷积（Row-convolution operator）称为超前卷积（lookahead convolution）。下面关于DeepSpeech2的paper中介绍了这个operator 
-    
-    `<http://www.cs.cmu.edu/~dyogatam/papers/wang+etal.iclrworkshop2016.pdf>`_ 
+行卷积（Row-convolution operator）称为超前卷积（lookahead convolution）。下面关于DeepSpeech2的paper中介绍了这个operator
+
+    `<http://www.cs.cmu.edu/~dyogatam/papers/wang+etal.iclrworkshop2016.pdf>`_

 双向的RNN在深度语音模型中很有用，它通过对整个序列执行正向和反向传递来学习序列的表示。然而，与单向RNNs不同的是，在线部署和低延迟设置中，双向RNNs具有难度。超前卷积将来自未来子序列的信息以一种高效的方式进行计算，以改进单向递归神经网络。 row convolution operator 与一维序列卷积不同，计算方法如下:
-   
+
 给定输入序列长度为 :math:`t` 的输入序列 :math:`X` 和输入维度 :math:`D` ，以及一个大小为 :math:`context * D` 的滤波器 :math:`W` ，输出序列卷积为:

-.. math::   
+.. math::
 		out_i = \sum_{j=i}^{i+context-1} X_{j} \cdot W_{j-i}
-    
+
 公式中：
    - :math:`out_i` : 第i行输出变量形为[1, D].
    - :math:`context` ： 下文（future context）大小
@@ -7445,7 +7477,7 @@ row_conv
    - **future_context_size** (int) -- 下文大小。请注意，卷积核的shape是[future_context_size + 1, D]。
    - **param_attr** (ParamAttr) --  参数的属性，包括名称、初始化器等。
    - **act** (str) -- 非线性激活函数。
-    
+
 返回: 输出(Out)是一个LodTensor，它支持可变时间长度的输入序列。这个LodTensor的内部量是一个形状为 T x N 的矩阵，和X的 shape 一样。


@@ -7454,7 +7486,7 @@ row_conv
 ..  code-block:: python

 	import paddle.fluid as fluid
-     
+
 	x = fluid.layers.data(name='x', shape=[16],
 	                      dtype='float32', lod_level=1)
 	out = fluid.layers.row_conv(input=x, future_context_size=2)
@@ -7644,15 +7676,15 @@ selu
 .. math::
    selu= \lambda*
    \begin{cases}
-         x                      &\quad \text{ if } x>0 \\ 
-         \alpha * e^x - \alpha  &\quad \text{ if } x<=0 
+         x                      &\quad \text{ if } x>0 \\
+         \alpha * e^x - \alpha  &\quad \text{ if } x<=0
    \end{cases}

 输入 ``x`` 可以选择性携带LoD信息。输出和它共享此LoD信息(如果有)。

 参数:
  - **x** (Variable) – 输入张量
-  - **scale** (float, None) – 如果标度没有设置，其默认值为 1.0507009873554804934193349852946。 详情请见： `Self-Normalizing Neural Networks <https://arxiv.org/abs/1706.02515.pdf>`_ 
+  - **scale** (float, None) – 如果标度没有设置，其默认值为 1.0507009873554804934193349852946。 详情请见： `Self-Normalizing Neural Networks <https://arxiv.org/abs/1706.02515.pdf>`_
  - **alpha** (float, None) – 如果没有设置该参数, 其默认值为 1.6732632423543772848170429916717。 详情请见： `Self-Normalizing Neural Networks <https://arxiv.org/abs/1706.02515.pdf>`_
  - **name** (str|None, default None) – 该层命名，若为None则自动为其命名
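
The piecewise definition above can be written out for a single scalar. This is a sketch only: `selu_reference` is a hypothetical helper, and the constants are the default `scale` and `alpha` quoted in the parameter list, truncated to double precision.

```python
import math

SCALE = 1.0507009873554804934193349852946
ALPHA = 1.6732632423543772848170429916717


def selu_reference(x, scale=SCALE, alpha=ALPHA):
    """Scalar SELU: scale * x for x > 0, scale * (alpha * e^x - alpha) otherwise."""
    if x > 0:
        return scale * x
    return scale * (alpha * math.exp(x) - alpha)


print(selu_reference(1.0), selu_reference(0.0), selu_reference(-1.0))
```

Both branches give 0 at x = 0, so the function is continuous there, and for large negative x the output approaches -scale * alpha.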

@@ -7690,7 +7722,7 @@ sequence_concat操作通过序列信息连接LoD张量（Tensor）。例如：X1
 参数:
        - **input** (list) – 要连接变量的列表
        - **name** (str|None) – 此层的名称(可选)。如果没有设置，该层将被自动命名。
-        
+
 返回:     连接好的输出变量。

 返回类型:   变量（Variable）
@@ -7701,7 +7733,7 @@ sequence_concat操作通过序列信息连接LoD张量（Tensor）。例如：X1
 ..  code-block:: python

        out = fluid.layers.sequence_concat(input=[seq1, seq2, seq3])
-        
+



@@ -7713,7 +7745,7 @@ sequence_concat操作通过序列信息连接LoD张量（Tensor）。例如：X1

 .. _cn_api_fluid_layers_sequence_conv:

-sequence_conv 
+sequence_conv
 -------------------------------

 .. py:function:: paddle.fluid.layers.sequence_conv(input, num_filters, filter_size=3, filter_stride=1, padding=None, bias_attr=None, param_attr=None, act=None, name=None)
@@ -7761,16 +7793,16 @@ sequence_enumerate
            win_size = 2  pad_value = 0
        输出：
            Out.lod = [[0, 3, 5]]  Out.data = [[1, 2], [2, 3], [3, 0], [4, 5], [5, 0]]  Out.dims = [5, 2]
-        
-参数:   
+
+参数:
        - **input** （Variable）- 作为索引序列的输入变量。
        - **win_size** （int）- 枚举所有子序列的窗口大小。
        - **pad_value** （int）- 填充值，默认为0。
-          
+
 返回:      枚举序列变量是LoD张量（LoDTensor）。

 返回类型:   Variable
-          
+
 **代码示例**

 ..  code-block:: python
@@ -7788,7 +7820,7 @@ sequence_enumerate

 .. _cn_api_fluid_layers_sequence_expand:

-sequence_expand 
+sequence_expand
 -------------------------------

 .. py:function:: paddle.fluid.layers.sequence_expand(x, y, ref_level=-1, name=None)
@@ -7860,7 +7892,7 @@ sequence_expand

 .. _cn_api_fluid_layers_sequence_expand_as:

-sequence_expand_as 
+sequence_expand_as
 -------------------------------

 .. py:function:: paddle.fluid.layers.sequence_expand_as(x, y, name=None)
@@ -7890,13 +7922,13 @@ Sequence Expand As Layer
    给定一个 input(X)：
        X.data = [[a, b], [c, d], [e, f]]
        X.dims = [3, 2]
-    
+
    和 input(Y):
        Y.lod = [[0, 2, 3, 6]]
    ref_level: 0

    得到输出张量：
-    
+
        Out.lod =  [[0,             2,     3,                    6]]
         Out.data = [[a, b], [a, b], [c, d], [e, f], [e, f], [e, f]]
        Out.dims = [6, 2]
@@ -7995,9 +8027,9 @@ sequence_last_step
    输出为Tensor:

        out.dim = [3, 1]
-        
+
        且 len(x.lod[-1]) == out.dims[0]
-        
+
        out.data = [3, 6, 1], where 3=last(1,3), 6=last(2,4,6), 1=last(5,1)

 参数：**input** (variable)-输入变量，为LoDTensor
@@ -8074,13 +8106,13 @@ sequence_pad
    例1:

    给定 1-level LoDTensor
-    
+
    input(X):
        X.lod = [[0,2,5]]
        X.data = [a,b,c,d,e]
    input(PadValue):
        PadValue.data = [0]
-    
+
    'padded_length'=4

    得到LoDTensor:
@@ -8090,17 +8122,17 @@ sequence_pad
 ::

    例2:
-    
+
    给定 1-level LoDTensor
-    
+
    input(X):
        X.lod = [[0,2,5]]
        X.data = [[a1,a2],[b1,b2],[c1,c2],[d1,d2],[e1,e2]]
    input(PadValue):
        PadValue.data = [0]
-    
+
    'padded_length' = -1,表示用最长输入序列的长度(此例中为3)
-    
+
    得到LoDTensor:
        Out.data = [[[a1,a2],[b1,b2],[0,0]],[[c1,c2],[d1,d2],[e1,e2]]]
        Length.data = [[2],[3]]
@@ -8109,17 +8141,17 @@ sequence_pad
 ::

    例3:
-    
+
    给定 1-level LoDTensor
-    
+
    input(X):
        X.lod = [[0,2,5]]
        X.data = [[a1,a2],[b1,b2],[c1,c2],[d1,d2],[e1,e2]]
    input(PadValue):
        PadValue.data = [p1,p2]
-    
+
    'padded_length' = -1,表示用最长输入序列的长度（此例中为3）
-    
+
    得到LoDTensor:
        Out.data = [[[a1,a2],[b1,b2],[p1,p2]],[[c1,c2],[d1,d2],[e1,e2]]]
        Length.data = [[2],[3]]
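
The padding rule in the examples above can be sketched with nested lists. This is an illustration under stated assumptions, not the operator's code: `sequence_pad_reference` is a hypothetical helper, `lod` holds a single level of offsets, and the caller passes one complete padding time step (the sketch does not broadcast a one-element `PadValue` as example 2 does).

```python
def sequence_pad_reference(data, lod, pad_step, padded_length=-1):
    """Split data by LoD offsets and pad every sequence to a common length."""
    offsets = lod[0]
    seqs = [data[start:end] for start, end in zip(offsets[:-1], offsets[1:])]
    lengths = [len(seq) for seq in seqs]
    # padded_length == -1 means "use the longest input sequence"
    target = max(lengths) if padded_length == -1 else padded_length
    padded = [seq + [pad_step] * (target - len(seq)) for seq in seqs]
    return padded, [[n] for n in lengths]


data = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"], ["d1", "d2"], ["e1", "e2"]]
out, length = sequence_pad_reference(data, [[0, 2, 5]], ["p1", "p2"])
print(out)
print(length)
```

The second return value mirrors `Length.data`: it records the original length of each sequence so that the padding can later be removed.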
@@ -8157,7 +8189,7 @@ sequence_pad

 .. _cn_api_fluid_layers_sequence_pool:

-sequence_pool 
+sequence_pool
 -------------------------------

 .. py:function:: paddle.fluid.layers.sequence_pool(input, pool_type, is_test=False)
@@ -8223,7 +8255,7 @@ sequence_pool
 sequence_reshape
 -------------------------------

-.. py:function:: paddle.fluid.layers.sequence_reshape(input, new_dim) 
+.. py:function:: paddle.fluid.layers.sequence_reshape(input, new_dim)

 Sequence Reshape Layer
 该层重排输入序列。用户设置新维度。每一个序列的长度通过原始长度、原始维度和新的维度计算得出。以下实例帮助解释该层的功能
@@ -8294,7 +8326,7 @@ sequence_reverse

 ::

-    Y.data() = [ [5, 6, 7, 8], [1, 2, 3, 4], # 索引为0，长度为2的逆序列 
+    Y.data() = [ [5, 6, 7, 8], [1, 2, 3, 4], # 索引为0，长度为2的逆序列
                 [17, 18, 19, 20], [13, 14, 15, 16], [9, 10, 11, 12] # 索引为1，长度为3的逆序列

 该运算在建立反dynamic RNN 网络中十分有用。
@@ -8331,7 +8363,7 @@ sequence_scatter
 这个operator将更新张量X，它使用Ids的LoD信息来选择要更新的行，并使用Ids中的值作为列来更新X的每一行。

 **样例**:
- 
+
 ::

    输入：
@@ -8410,7 +8442,7 @@ sequence_slice
        out.dims = (3, 2).

 .. note::
-   ``input`` ， ``offset`` ， ``length`` 的第一维大小应相同。 
+   ``input`` ， ``offset`` ， ``length`` 的第一维大小应相同。
   ``offset`` 从0开始。

 参数:
@@ -8536,7 +8568,7 @@ sequence_unpad
    x = fluid.layers.data(name='x', shape=[10, 5], dtype='float32')
    len = fluid.layers.data(name='length', shape=[1], dtype='int64')
    out = fluid.layers.sequence_unpad(x=x, length=len)
-    
+



@@ -8566,14 +8598,13 @@ shape层。

 返回类型：    Variable
        
-        
 **代码示例：**

 .. code-block:: python

    input = fluid.layers.data(
        name="input", shape=[3, 100, 100], dtype="float32")
-    out = fluid.layers.shape(input)        
+    out = fluid.layers.shape(input)



@@ -8632,14 +8663,14 @@ shuffle_channel
 返回：通道混洗结果是一个张量变量，其形状和类型与输入相同。

 返回类型：输出（Variable）
-        
-        
+
+
 **代码示例：**

 .. code-block:: python

    input = fluid.layers.data(name='input', shape=[4,2,2], dtype='float32')
-    out = fluid.layers.shuffle_channel(x=input, group=2)    
+    out = fluid.layers.shuffle_channel(x=input, group=2)



@@ -8662,7 +8693,7 @@ sigmoid_cross_entropy_with_logits

 .. math::
    loss = -Labels * log(sigma(X)) - (1 - Labels) * log(1 - sigma(X))
- 
+
 已知:

 .. math::
@@ -8683,7 +8714,7 @@ sigmoid_cross_entropy_with_logits


 参数:
-  - **x** (Variable) - (Tensor, 默认 Tensor<float>)，形为 N x D 的二维张量，N为batch大小，D为类别数目。该输入是一个由先前运算得出的logit组成的张量。logit是未标准化(unscaled)的log概率， 公式为 :math:`log(\frac{p}{1-p})` 
+  - **x** (Variable) - (Tensor, 默认 Tensor<float>)，形为 N x D 的二维张量，N为batch大小，D为类别数目。该输入是一个由先前运算得出的logit组成的张量。logit是未标准化(unscaled)的log概率， 公式为 :math:`log(\frac{p}{1-p})`
  - **label** (Variable) -  (Tensor, 默认 Tensor<float>) 具有和X相同类型，相同形状的二维张量。该输入张量代表了每个logit的可能标签
  - **ignore_index** （int） - （int，默认kIgnoreIndex）指定被忽略的目标值，它不会影响输入梯度
  - **name** (basestring|None) - 输出的名称
@@ -8742,7 +8773,7 @@ similarity_focus

    给定四维张量 x 形为 (BatchSize, C, A, B), 其中C 为通道Channel数目，
    特征图（feature map）的形为（A,B）：
-    
+
        x.shape = (2, 3, 2, 2)
        x.data = [[[[0.8, 0.1],
                    [0.4, 0.5]],
@@ -8829,17 +8860,17 @@ slice算子。

 ::

-        案例1：给定：data=[[1,2,3,4],[5,6,7,8],] 
-                     axes=[0,1] 
-                     starts=[1,0] 
-                     ends=[2,3] 
+        案例1：给定：data=[[1,2,3,4],[5,6,7,8],]
+                     axes=[0,1]
+                     starts=[1,0]
+                     ends=[2,3]
               则：
                     result=[[5,6,7],]

        案例2：给定：
-                     data=[[1,2,3,4],[5,6,7,8],] 
-                     starts=[0,1] 
-                     ends=[-1,1000] 
+                     data=[[1,2,3,4],[5,6,7,8],]
+                     starts=[0,1]
+                     ends=[-1,1000]
               则：
                     result=[[2,3,4],]
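
Both cases above match ordinary NumPy slicing, which gives a quick way to check them. This is a hedged sketch: `slice_reference` is a hypothetical helper, and for case 2, where `axes` is not listed, it assumes `axes=[0, 1]`.

```python
import numpy as np


def slice_reference(data, axes, starts, ends):
    """Slice data along the given axes; negative and out-of-range ends follow NumPy rules."""
    index = [slice(None)] * data.ndim
    for axis, start, end in zip(axes, starts, ends):
        index[axis] = slice(start, end)
    return data[tuple(index)]


data = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
case1 = slice_reference(data, axes=[0, 1], starts=[1, 0], ends=[2, 3])
case2 = slice_reference(data, axes=[0, 1], starts=[0, 1], ends=[-1, 1000])
print(case1.tolist(), case2.tolist())
```

An end of -1 stops before the last element of that axis, and an end beyond the axis length is clamped to the axis length.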

@@ -8889,15 +8920,15 @@ smooth_l1
        - **inside_weight** (Variable|None) - rank至少为2的张量。这个输入是可选的，与x的形状应该相同。如果给定， ``(x - y)`` 的结果将乘以这个张量元素。
        - **outside_weight** (变量|None) - 一个rank至少为2的张量。这个输入是可选的，它的形状应该与 ``x`` 相同。如果给定，那么 smooth L1 loss 就会乘以这个张量元素。
        - **sigma** (float|None) - smooth L1 loss layer的超参数。标量，默认值为1.0。
-   
+
 返回：	smooth L1 loss, shape为 [batch_size, 1]

-返回类型:  Variable    
+返回类型:  Variable

 **代码示例**

 ..  code-block:: python
-        
+
    data = fluid.layers.data(name='data', shape=[128], dtype='float32')
    label = fluid.layers.data(
        name='label', shape=[100], dtype='float32')
@@ -8923,7 +8954,7 @@ soft_relu
 SoftRelu 激活函数

 .. math::   out=\ln(1+\exp(\max(\min(x,threshold),-threshold)))
- 
+
 参数:
    - **x** (variable) - SoftRelu operator的输入
    - **threshold** (FLOAT|40.0) - SoftRelu的阈值
@@ -8933,7 +8964,7 @@ SoftRelu 激活函数

 .. code-block:: python

-    x = fluid.layers.data(name=”x”, shape=[2,3,16,16], dtype=”float32”) 
+    x = fluid.layers.data(name="x", shape=[2,3,16,16], dtype="float32")
    y = fluid.layers.soft_relu(x, threshold=20.0)
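
For reference, the activation can be sketched for one scalar in plain Python. This is an assumption-laden illustration, not the operator's code: `soft_relu_reference` is a hypothetical helper, and it assumes the input is clipped to [-threshold, threshold] before the softplus is applied.

```python
import math


def soft_relu_reference(x, threshold=40.0):
    """Clip x to [-threshold, threshold], then apply ln(1 + e^x)."""
    clipped = max(min(x, threshold), -threshold)
    return math.log(1.0 + math.exp(clipped))


print(soft_relu_reference(0.0))
print(soft_relu_reference(100.0, threshold=20.0))
```

Clipping keeps the exponential inside a safe range, so very large inputs saturate instead of overflowing.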


@@ -9026,6 +9057,7 @@ softmax_with_cross_entropy


 参数:
+
  - **logits** (Variable) - 未标准化(unscaled)的log概率,一个形为 N X K 的二维张量。 N是batch大小，K是类别总数。
  - **label** (Variable) - 2-D 张量，代表了正确标注（ground truth）, 如果 ``soft_label`` 为  False，则该参数是一个形为 N X 1 的Tensor<int64> 。如果 ``soft_label`` 为 True，它是 Tensor<float/double> ，形为 N X K 。
  - **soft_label** (bool) - 是否将输入标签当作软标签。默认为False。
@@ -9033,7 +9065,7 @@ softmax_with_cross_entropy
  - **numeric_stable_mode** (bool) – 标志位，指明是否使用一个具有更佳数学稳定性的算法。仅在 ``soft_label`` 为 False的GPU模式下生效. 若 ``soft_label`` 为 True 或者执行场所为CPU, 算法一直具有数学稳定性。 注意使用稳定算法时速度可能会变慢。默认为 True。
  - **return_softmax** (bool) – 标志位，指明是否额外返回一个softmax值， 同时返回交叉熵计算结果。默认为False。

-返回: 
+返回:
  - 如果 ``return_softmax`` 为 False， 则返回交叉熵损失
  - 如果 ``return_softmax`` 为 True，则返回元组 (loss, softmax) ，其中交叉熵损失为形为[N x 1]的二维张量，softmax为[N x K]的二维张量

@@ -9049,8 +9081,8 @@ softmax_with_cross_entropy
        fc = fluid.layers.fc(input=data, size=100)
        out = fluid.layers.softmax_with_cross_entropy(
        logits=fc, label=label)
-      
-      
+
+



@@ -9078,7 +9110,7 @@ space_to_depth

 - 在各位置上，不重叠的，大小为 :math:`block\_size * block\_size` 的块重组入深度depth
 - 输出张量的深度为 :math:`block\_size * block\_size * input\_channel`
- - 输入各个块中的Y,X坐标变为输出张量通道索引的高序部位 
+ - 输入各个块中的Y,X坐标变为输出张量通道索引的高序部位
 - channel可以被blocksize的平方整除
 - 高度，宽度可以被blocksize整除

@@ -9090,7 +9122,7 @@ space_to_depth

 返回类型：Variable

-抛出异常： 
+抛出异常：
  - ``TypeError`` - ``blocksize`` 必须是long类型

 **代码示例**
@@ -9125,7 +9157,7 @@ spectral_norm

 步骤1：生成形状为[H]的向量U,以及形状为[W]的向量V,其中H是输入权重的第 ``dim`` 个维度，W是剩余维度的乘积。

-步骤2： ``power_iters`` 应该是一个正整数，用U和V迭代计算 ``power_iters`` 轮。 
+步骤2： ``power_iters`` 应该是一个正整数，用U和V迭代计算 ``power_iters`` 轮。

 .. math::

@@ -9138,7 +9170,7 @@ spectral_norm
    \sigma(\mathbf{W}) &= \mathbf{u}^{T} \mathbf{W} \mathbf{v}\\
    \mathbf{W} &= \frac{\mathbf{W}}{\sigma(\mathbf{W})}

-可参考: `Spectral Normalization <https://arxiv.org/abs/1802.05957>`_ 
+可参考: `Spectral Normalization <https://arxiv.org/abs/1802.05957>`_

 参数：
    - **weight** (Variable)-spectral_norm算子的输入权重张量，可以是2-D, 3-D, 4-D, 5-D张量，它是fc、conv1d、conv2d、conv3d层的权重。
@@ -9206,7 +9238,7 @@ split

 .. _cn_api_fluid_layers_square_error_cost:

-square_error_cost 
+square_error_cost
 -------------------------------

 .. py:function:: paddle.fluid.layers.square_error_cost(input,label)
@@ -9252,7 +9284,7 @@ square_error_cost

 .. _cn_api_fluid_layers_squeeze:

-squeeze 
+squeeze
 -------------------------------

 .. py:function:: paddle.fluid.layers.squeeze(input, axes, name=None)
@@ -9291,7 +9323,7 @@ squeeze
 .. code-block:: python

    x = fluid.layers.data(name='x', shape=[5, 1, 10])
-    y = fluid.layers.sequeeze(input=x, axes=[1])      
+    y = fluid.layers.squeeze(input=x, axes=[1])



@@ -9356,6 +9388,7 @@ stack
        Out.dims = [1, 3, 2]

 参数:	
+
  - **x** (Variable|list(Variable)|tuple(Variable)) – 输入变量
  - **axis** (int|None) – 对输入进行stack运算所在的轴

@@ -9382,7 +9415,7 @@ stanh

 STanh 激活算子（STanh Activation Operator.）

-.. math::      
+.. math::
          \\out=b*\frac{e^{a*x}-e^{-a*x}}{e^{a*x}+e^{-a*x}}\\

 参数：
@@ -9449,7 +9482,7 @@ swish

 Swish 激活函数

-.. math::   
+.. math::
          out = \frac{x}{1 + e^{- \beta x}}

 参数：
@@ -9498,7 +9531,7 @@ teacher_student_sigmoid_loss

 **代码示例**：

-.. code-block:: python 
+.. code-block:: python

    cost = fluid.layers.teacher_student_sigmoid_loss(input=similarity, label=label)

@@ -9533,7 +9566,7 @@ temporal_shift

 步骤4：沿第3(C)维连接三个切片，并将结果重塑为[N*T, C, H, W]。

-有关时间移动的详细信息，请参阅文件： `Temporal Shift Module <https://arxiv.org/abs/1811.08383>`_ 
+有关时间移动的详细信息，请参阅文件： `Temporal Shift Module <https://arxiv.org/abs/1811.08383>`_

 参数：
  - **x**  (Variable) – 时移算符的输入张量。这是一个4维张量，形状为[N*T，C，H，W]。N为批量大小，T为时间段数，C为信道数，H为特征高度，W为特征宽度
@@ -9550,7 +9583,7 @@ temporal_shift

 **代码示例**：

-.. code-block:: python 
+.. code-block:: python

    input = fluid.layers.data(name='input', shape=[4,2,2], dtype='float32')
    out = fluid.layers.temporal_shift(x=input, seg_num=2, shift_ratio=0.2)
@@ -9603,7 +9636,7 @@ topk

 **代码示例**：

-.. code-block:: python 
+.. code-block:: python

    top5_values, top5_indices = fluid.layers.topk(input, k=5)

@@ -9689,7 +9722,7 @@ tree_conv
    # 输出的形会是[None, 10, 6, 1],
    # None 代表batch size, 10数据集的最大节点大小max_node_size, 6 代表输出大小output size, 1 代表 1 个filter
    out_vector = fluid.layers.reshape(out_vector, shape=[None, 10, 6])
-    # reshape之后, 输出张量output tensor为下一个树卷积的nodes_vector 
+    # reshape之后, 输出张量output tensor为下一个树卷积的nodes_vector
    out_vector_2 = fluid.layers.tree_conv(out_vector, edge_set, 3, 4, 2, 'tanh',
         ParamAttr(initializer=Constant(1.0)), ParamAttr(initializer=Constant(1.0)))
    # 输出tensor也可以用来池化(论文中称为global pooling)
@@ -9798,9 +9831,9 @@ unstack

 如果 ``num`` 为 None，则它可以从 ``x.shape[axis]`` 中推断而来。

-如果 ``x.shape[axis]`` <= 0或者Unknown, 则抛出异常 ``ValueError`` 。 
+如果 ``x.shape[axis]`` <= 0或者Unknown, 则抛出异常 ``ValueError`` 。

-参数:	
+参数:
  - **x** (Variable|list(Variable)|tuple(Variable)) – 输入变量
  - **axis** (int|None) – 对输入进行unstack运算所在的轴
  - **num** (int|None) - 输出变量的数目
@@ -9808,7 +9841,7 @@ unstack
 返回: 经unstack运算后的变量

 返回类型: list(Variable)
-  
+



@@ -9858,7 +9891,7 @@ warpctc


 ============
- ops 
+ ops
 ============


@@ -9875,6 +9908,7 @@ abs
    out = |x|

 参数:
+
    - **x** - abs算子的输入 
    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn
 
@@ -9958,12 +9992,13 @@ ceil


 参数:
+
    - **x** - Ceil算子的输入 
    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn

 返回：        Ceil算子的输出。
-        
-        
+
+



@@ -9988,6 +10023,7 @@ Cosine余弦激活函数。


 参数:
+
    - **x** - cos算子的输入 
    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn

@@ -10048,6 +10084,7 @@ Exp激活函数(Exp指以自然常数e为底的指数运算)。
    out = e^x

 参数:
+
    - **x** - Exp算子的输入 
    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn

@@ -10078,6 +10115,7 @@ floor


 参数:
+
    - **x** - Floor算子的输入 
    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn

@@ -10105,7 +10143,7 @@ HardShrink激活函数(HardShrink activation operator)


 .. math::
-	
+
 	out = \begin{cases}
        x, \text{if } x > \lambda \\
        x, \text{if } x < -\lambda \\
@@ -10123,7 +10161,7 @@ HardShrink激活函数(HardShrink activation operator)
 .. code-block:: python

    data = fluid.layers.data(name="input", shape=[784])
-    result = fluid.layers.hard_shrink(x=data, threshold=0.3)    
+    result = fluid.layers.hard_shrink(x=data, threshold=0.3)



@@ -10150,6 +10188,7 @@ Logsigmoid激活函数。

 参数:
    - **x** - LogSigmoid算子的输入
+    
    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn 

 返回：        LogSigmoid算子的输出
@@ -10177,15 +10216,13 @@ Reciprocal（取倒数）激活函数
    out = \frac{1}{x}

 参数:
+
    - **x** - reciprocal算子的输入 
    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn 

-返回：        Reciprocal算子的输出。        
-
+返回：        Reciprocal算子的输出。


-        
-        



@@ -10209,12 +10246,13 @@ Round取整激活函数。


 参数:
+
    - **x** - round算子的输入 
    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn 

 返回：        Round算子的输出。
-        
-        
+
+



@@ -10237,13 +10275,14 @@ sigmoid激活函数


 参数:
+
    - **x** - Sigmoid算子的输入 
    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn

 返回：     Sigmoid运算输出.


- 
+



@@ -10265,6 +10304,7 @@ sin


 参数:
+
    - **x** - sin算子的输入 
    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn

@@ -10312,7 +10352,7 @@ softplus激活函数。
 softshrink
 -------------------------------

-.. py:function:: paddle.fluid.layers.softshrink(x, name=None)       
+.. py:function:: paddle.fluid.layers.softshrink(x, name=None)

 Softshrink激活算子

@@ -10322,9 +10362,9 @@ Softshrink激活算子
                    x + \lambda, \text{if } x < -\lambda \\
                    0,  \text{otherwise}
              \end{cases}
-       
+
 参数：
-        - **x** - Softshrink算子的输入 
+        - **x** - Softshrink算子的输入
        - **lambda** （FLOAT）- 非负偏移量。

 返回：       Softshrink算子的输出
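
The three cases of the formula above can be sketched for one scalar. `softshrink_reference` is a hypothetical helper written for illustration, and the offset is passed explicitly as `lambda_` because `lambda` is a reserved word in Python.

```python
def softshrink_reference(x, lambda_):
    """Shrink x toward zero by lambda_; values inside [-lambda_, lambda_] become 0."""
    if x > lambda_:
        return x - lambda_
    if x < -lambda_:
        return x + lambda_
    return 0.0


print([softshrink_reference(v, 0.5) for v in (-1.0, 0.2, 1.0)])
```

Unlike hard shrinkage, values outside the band are moved toward zero by `lambda_` rather than passed through unchanged.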
@@ -10382,6 +10422,7 @@ sqrt
    out = \sqrt{x}

 参数:
+
    - **x** - Sqrt算子的输入 
    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn

@@ -10440,6 +10481,7 @@ tanh 激活函数。


 参数:
+
    - **x** - Tanh算子的输入  
    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn

@@ -10468,6 +10510,7 @@ tanh_shrink激活函数。
    out = x - \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}

 参数:
+
    - **x** - TanhShrink算子的输入 
    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn

@@ -10493,8 +10536,8 @@ ThresholdedRelu激活函数
 .. math::

 	out = \left\{\begin{matrix}
-	    x, if&x > threshold\\ 
-	    0, &otherwise 
+	    x, if&x > threshold\\
+	    0, &otherwise
 	    \end{matrix}\right.

 参数：
@@ -10551,7 +10594,7 @@ uniform_random


 ============
- tensor 
+ tensor
 ============


@@ -10561,7 +10604,7 @@ argmax
 -------------------------------

 .. py:function:: paddle.fluid.layers.argmax(x,axis=0)
-    
+
 **argmax**

 该功能计算输入张量元素中最大元素的索引，张量的元素在提供的轴上。
@@ -10595,7 +10638,7 @@ argmin
 -------------------------------

 .. py:function:: paddle.fluid.layers.argmin(x,axis=0)
-    
+
 **argmin**

 该功能计算输入张量元素中最小元素的索引，张量元素在提供的轴上。
@@ -10614,7 +10657,7 @@ argmin

     out = fluid.layers.argmin(x=input, axis=0)
     out = fluid.layers.argmin(x=input, axis=-1)
-    
+



@@ -10634,7 +10677,7 @@ argsort

 .. code-block:: text

-    例如： 
+    例如：
 	给定 input 并指定 axis=-1

        input = [[0.15849551, 0.45865775, 0.8563702 ],
@@ -10644,7 +10687,7 @@ argsort

        out = [[0.15849551, 0.45865775, 0.8563702 ],
            [0.12070083, 0.18776911, 0.28766365]],
-	
+
 	根据指定axis排序后的数据indices变为:

        indices = [[0, 1, 2],
@@ -10711,7 +10754,7 @@ assign

 .. _cn_api_fluid_layers_cast:

-cast 
+cast
 -------------------------------

 .. py:function:: paddle.fluid.layers.cast(x,dtype)
@@ -10748,7 +10791,7 @@ concat

 .. py:function:: paddle.fluid.layers.concat(input,axis=0,name=None)

-**Concat** 
+**Concat**

 这个函数将输入连接在前面提到的轴上，并将其作为输出返回。

@@ -11028,7 +11071,7 @@ isfinite

 .. _cn_api_fluid_layers_ones:

-ones 
+ones
 -------------------------------

 .. py:function:: paddle.fluid.layers.ones(shape,dtype,force_cpu=False)
@@ -11253,13 +11296,13 @@ zeros


 ============
- learning_rate_scheduler 
+ learning_rate_scheduler
 ============


 .. _cn_api_fluid_layers_append_LARS:

-append_LARS 
+append_LARS
 -------------------------------

 .. py:function:: paddle.fluid.layers.append_LARS(params_grads,learning_rate,weight_decay)
@@ -11318,7 +11361,7 @@ cosine_decay

 .. _cn_api_fluid_layers_exponential_decay:

-exponential_decay 
+exponential_decay
 -------------------------------

 .. py:function:: paddle.fluid.layers.exponential_decay(learning_rate,decay_steps,decay_rate,staircase=False)
@@ -11331,7 +11374,7 @@ exponential_decay
    if staircase == True:
        decayed_learning_rate = learning_rate * decay_rate ^ floor(global_step / decay_steps)
    else:
-        decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)    
+        decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
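
The two branches of the schedule can be evaluated directly in plain Python. This is a minimal sketch: `exponential_decay_reference` is a hypothetical helper that takes `global_step` as an explicit argument, whereas the layer reads the step counter from the program.

```python
import math


def exponential_decay_reference(learning_rate, global_step, decay_steps,
                                decay_rate, staircase=False):
    """Return learning_rate * decay_rate ** (global_step / decay_steps)."""
    exponent = global_step / decay_steps
    if staircase:
        # Decay in discrete jumps: the exponent only changes every decay_steps steps.
        exponent = math.floor(exponent)
    return learning_rate * decay_rate ** exponent


smooth = exponential_decay_reference(0.1, 150, 100, 0.5)
stepped = exponential_decay_reference(0.1, 150, 100, 0.5, staircase=True)
print(smooth, stepped)
```

With `staircase=True` the rate stays constant within each block of `decay_steps` steps, so at step 150 it still equals the value reached at step 100.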

 参数：
    - **learning_rate** (Variable|float)-初始学习率
@@ -11500,7 +11543,7 @@ Noam衰减方法。noam衰减的numpy实现如下。
                           np.power(current_steps, -0.5),
                           np.power(warmup_steps, -1.5) * current_steps])

-请参照 `attention is all you need <https://arxiv.org/pdf/1706.03762.pdf>`_ 
+请参照 `attention is all you need <https://arxiv.org/pdf/1706.03762.pdf>`_

 参数：
    - **d_model** (Variable)-模型的输入和输出维度
@@ -11554,7 +11597,7 @@ piecewise_decay

 .. _cn_api_fluid_layers_polynomial_decay:

-polynomial_decay 
+polynomial_decay
 -------------------------------

 .. py:function:: paddle.fluid.layers.polynomial_decay(learning_rate,decay_steps,end_learning_rate=0.0001,power=1.0,cycle=False)
@@ -11592,7 +11635,7 @@ polynomial_decay


 ============
- detection 
+ detection
 ============


@@ -11618,7 +11661,7 @@ anchor_generator

 返回：
     - Anchors(Variable): 输出anchor，布局[H,W,num_anchors,4] , ``H``  是输入的高度， ``W`` 是输入的宽度， ``num_anchors`` 是输入每个位置的框数,每个anchor格式（未归一化）为(xmin,ymin,xmax,ymax)
-    
+
    - Variances(Variable): anchor的扩展变量布局为 [H,W,num_priors,4]。 ``H`` 是输入的高度， ``W`` 是输入的宽度， ``num_priors`` 是输入每个位置的框数,每个变量的格式为(xcenter,ycenter,w,h)。

 返回类型：Anchors(Variable),Variances(Variable)
@@ -11644,7 +11687,7 @@ anchor_generator


 .. _cn_api_fluid_layers_bipartite_match:
-        
+
 bipartite_match
 -------------------------------

@@ -11686,7 +11729,7 @@ bipartite_match


 .. _cn_api_fluid_layers_box_clip:
-        
+
 box_clip
 -------------------------------

@@ -11714,7 +11757,7 @@ box_clip
    - **im_info (variable)**  – 具有（高度height，宽度width，比例scale）排列的形为[N，3]的图像的信息。高度和宽度是输入大小，比例是输入大小和原始大小的比率
    - **name (str)**  – 该层的名称。 为可选项

-返回：剪切后的tensor 
+返回：剪切后的tensor

 返回类型： Variable

@@ -11963,7 +12006,7 @@ density prior box的量由fixed_sizes and fixed_ratios决定。显然地，fixed


 .. _cn_api_fluid_layers_detection_map:
-        
+
 detection_map
 -------------------------------

@@ -11972,7 +12015,7 @@ detection_map
 检测mAP评估算子。一般步骤如下：首先，根据检测输入和标签计算TP（true positive）和FP（false positive），然后计算mAP评估值。支持'11 point'和积分mAP算法。请从以下文章中获取更多信息：

        https://sanchom.wordpress.com/tag/average-precision/
-        
+
        https://arxiv.org/abs/1512.02325

 参数：
@@ -11986,19 +12029,19 @@ detection_map
        - **input_states** - 如果不是None，它包含3个元素：

            1、pos_count（Tensor）是一个shape为[Ncls，1]的张量，存储每类的输入正例的数量，Ncls是输入分类的数量。此输入用于在执行多个小批量累积计算时传递最初小批量生成的AccumPosCount。当输入（PosCount）为空时，不执行累积计算，仅计算当前小批量的结果。
-        
+
            2、true_pos（LoDTensor）是一个shape为[Ntp，2]的2-D LoDTensor，存储每个类输入的正实例。此输入用于在执行多个小批量累积计算时传递最初小批量生成的AccumPosCount。
-        
+
            3、false_pos（LoDTensor）是一个shape为[Nfp，2]的2-D LoDTensor，存储每个类输入的负实例。此输入用于在执行多个小批量累积计算时传递最初小批量生成的AccumPosCount。
-        
+
        - **out_states** - 如果不是None，它包含3个元素：

-            1、accum_pos_count（Tensor）是一个shape为[Ncls，1]的Tensor，存储每个类的实例数。它结合了输入（PosCount）和从输入中的（Detection）和（label）计算的正例数。 
-        
-            2、accum_true_pos（LoDTensor）是一个shape为[Ntp'，2]的LoDTensor，存储每个类的正实例。它结合了输入（TruePos）和从输入中（Detection）和（label）计算的正实例数。 。 
-        
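+            1. accum_pos_count (Tensor): a Tensor with shape [Ncls, 1] that stores the number of positive examples of each class. It combines the input (PosCount) with the positive example count computed from the inputs (Detection) and (label).
+
+            2. accum_true_pos (LoDTensor): a LoDTensor with shape [Ntp', 2] that stores the true positive examples of each class. It combines the input (TruePos) with the true positives computed from the inputs (Detection) and (label).
+
             3. accum_false_pos (LoDTensor): a LoDTensor with shape [Nfp', 2] that stores the false positive examples of each class. It combines the input (FalsePos) with the false positives computed from the inputs (Detection) and (label).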
-        
+
        - **ap_version** （string，默认'integral'）- AP算法类型，'integral'或'11 point'。

 返回：        具有形状[1]的（Tensor），存储mAP的检测评估结果。
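
The two AP variants selected by ``ap_version`` can be illustrated with a small, framework-independent sketch. The helper ``average_precision`` below is hypothetical and is not part of PaddlePaddle. It assumes a precision/recall curve that is already sorted by increasing recall, and the mode strings are this sketch's own spelling.

```python
def average_precision(recalls, precisions, ap_version="integral"):
    """Compute AP from a precision/recall curve sorted by increasing recall."""
    if ap_version == "11point":
        # Average the best precision reachable at recall >= t
        # for the 11 thresholds t = 0.0, 0.1, ..., 1.0.
        total = 0.0
        for i in range(11):
            t = i / 10.0
            reachable = [p for r, p in zip(recalls, precisions) if r >= t]
            total += max(reachable) if reachable else 0.0
        return total / 11.0
    if ap_version == "integral":
        # Sum the precision weighted by each increase in recall.
        ap, prev_recall = 0.0, 0.0
        for r, p in zip(recalls, precisions):
            ap += (r - prev_recall) * p
            prev_recall = r
        return ap
    raise ValueError("ap_version must be 'integral' or '11point'")
```

mAP is then the mean of the per-class AP values. How the operator accumulates true and false positives across mini-batches is not modeled here.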
@@ -12168,7 +12211,7 @@ generate_mask_labels
    feeder.feed(batch_masks)


-参数： 
+参数：
    - **im_info**  (Variable) – 具有形状[N，3]的2-D张量。 N是批量大小，其每个元素是图像的[高度，宽度，比例]，对应第二维中的3。图像比例是 :math:`\frac{target\_size}{original\_size}` 。
    - **gt_classes**  (Variable) – 形为[M，1]的2-D LoDTensor。 M是真实值的总数，其每个元素都是一个类标签，对应第二维中的1。
    - **is_crowd**  (Variable) – 一个形为 ``gt_classes`` 的2-D LoDTensor，每个元素都是一个标志，指示一个groundtruth是否为crowd（群）。
@@ -12180,7 +12223,7 @@ generate_mask_labels

 返回：
    - 形为[P，4]的2D LoDTensor。 P是采样出的RoI总数。每个元素都是在原始图像大小范围内具有[xmin，ymin，xmax，ymax]格式的边界框(bounding box)。
-    - mask_rois_has_mask_int32（Variable）：形状为[P，1]的2D LoDTensor，其中每个元素为对于输入的RoI进行输出的mask RoI 索引 
+    - mask_rois_has_mask_int32（Variable）：形状为[P，1]的2D LoDTensor，其中每个元素为对于输入的RoI进行输出的mask RoI 索引
    - mask_int32（Variable）：形状为[P，K * M * M]的2D LoDTensor，K为种类数，M为mask预测的分辨率，每个元素都是二进制目标mask值。

 返回类型：mask_rois (Variable)
@@ -12224,7 +12267,7 @@ generate_proposal_labels

 该函数可以根据 ``GenerateProposals`` 的输出结果，即bounding boxes（区域框），groundtruth（正确标记数据）来对foreground boxes和background boxes进行采样，并计算loss值。

-RpnRois 是RPN的输出box， 并由 ``GenerateProposals`` 来进一步处理, 这些box将与groundtruth boxes合并， 并根据 ``batch_size_per_im`` 和 ``fg_fraction`` 进行采样。 
+RpnRois 是RPN的输出box， 并由 ``GenerateProposals`` 来进一步处理, 这些box将与groundtruth boxes合并， 并根据 ``batch_size_per_im`` 和 ``fg_fraction`` 进行采样。

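 An instance is treated as a foreground sample if its overlap with the ground truth is greater than ``fg_thresh`` (the foreground overlap threshold).
 An instance is treated as a background sample if its overlap with the ground truth is greater than ``bg_thresh_lo`` and less than ``bg_thresh_hi`` (see the parameter descriptions).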
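
The foreground/background rule above can be sketched as a plain function. ``classify_roi`` is a hypothetical helper, not a PaddlePaddle API. It takes the largest ground-truth overlap of one box as input, and its handling of values exactly on a threshold is an assumption based on the wording above.

```python
def classify_roi(max_overlap, fg_thresh, bg_thresh_lo, bg_thresh_hi):
    """Label one box from its largest overlap with any ground-truth box."""
    if max_overlap > fg_thresh:
        return "foreground"
    if bg_thresh_lo < max_overlap < bg_thresh_hi:
        return "background"
    # Boxes that satisfy neither rule are not sampled in this sketch.
    return "ignored"
```

For example, with ``fg_thresh=0.5``, ``bg_thresh_lo=0.1`` and ``bg_thresh_hi=0.5``, an overlap of 0.7 yields a foreground sample and an overlap of 0.3 yields a background sample.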
@@ -12232,7 +12275,7 @@ RpnRois 是RPN的输出box， 并由 ``GenerateProposals`` 来进一步处理, 

 Each box in Rois is assigned a class label and regression targets (box label). Finally, ``BboxInsideWeights`` and ``BboxOutsideWeights`` specify whether the box contributes to the training loss.

-参数:	
+参数:
  - **rpn_rois** (Variable) – 形为[N, 4]的二维LoDTensor。 N 为 ``GenerateProposals`` 的输出结果, 其中各元素为 :math:`[x_{min}, y_{min}, x_{max}, y_{max}]` 格式的边界框
  - **gt_classes** (Variable) – 形为[M, 1]的二维LoDTensor。 M 为正确标记数据数目, 其中各元素为正确标记数据的类别标签
  - **is_crowd** (Variable) – 形为[M, 1]的二维LoDTensor。M 为正确标记数据数目, 其中各元素为一个标志位，表明一个正确标记数据是不是crowd
@@ -12265,7 +12308,7 @@ RpnRois 是RPN的输出box， 并由 ``GenerateProposals`` 来进一步处理, 
 generate_proposals
 -------------------------------

-.. py:function:: paddle.fluid.layers.generate_proposals(scores, bbox_deltas, im_info, anchors, variances, pre_nms_top_n=6000, post_nms_top_n=1000, nms_thresh=0.5, min_size=0.1, eta=1.0, name=None) 
+.. py:function:: paddle.fluid.layers.generate_proposals(scores, bbox_deltas, im_info, anchors, variances, pre_nms_top_n=6000, post_nms_top_n=1000, nms_thresh=0.5, min_size=0.1, eta=1.0, name=None)

 生成proposal的Faster-RCNN

@@ -12274,15 +12317,15 @@ generate_proposals
 To generate ``proposals``, this operation performs the following steps:

        1. Transpose and reshape scores and bbox_deltas to (H * W * A, 1) and (H * W * A, 4), respectively.
-        
+
        2. Compute the box locations to use as candidate ``proposals``.
-        
+
        3. Clip the boxes to the image.
-        
+
        4. Remove predicted boxes with a small area.
-        
+
        5. Apply NMS to obtain the final ``proposals`` as output.
-        
+
 参数：
        - **scores** (Variable)- 是一个shape为[N，A，H，W]的4-D张量，表示每个框成为object的概率。N是批量大小，A是anchor数，H和W是feature map的高度和宽度。
        - **bbox_deltas** （Variable）- 是一个shape为[N，4 * A，H，W]的4-D张量，表示预测框位置和anchor位置之间的差异。
@@ -12290,9 +12333,9 @@ generate_proposals
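        - **anchors** (Variable) - a 4-D Tensor with shape [H, W, A, 4]. H and W are the height and width of the ``feature map``, and A (num_anchors) is the number of boxes at each position. Each anchor is defined in unnormalized (xmin, ymin, xmax, ymax) format.
        - **variances** (Variable) - the variances of the anchors, with shape [H, W, num_priors, 4]. Each variance is in (xcenter, ycenter, w, h) format.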
-        - **pre_nms_top_n** （float）- 每个图在NMS之前要保留的总框数。默认为6000。 
-        - **post_nms_top_n** （float）- 每个图在NMS后要保留的总框数。默认为1000。 
-        - **nms_thresh** （float）- NMS中的阈值，默认为0.5。 
+        - **pre_nms_top_n** （float）- 每个图在NMS之前要保留的总框数。默认为6000。
+        - **post_nms_top_n** （float）- 每个图在NMS后要保留的总框数。默认为1000。
+        - **nms_thresh** （float）- NMS中的阈值，默认为0.5。
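        - **min_size** (float) - predicted boxes whose height or width is smaller than min_size are removed. Default: 0.1.
        - **eta** (float) - used in adaptive NMS: if the adaptive threshold is greater than 0.5, adaptive_threshold = adaptive_threshold * eta is applied in each iteration.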
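
Steps 3 to 5 of the procedure above (clipping, size filtering, and NMS) can be sketched in pure Python. The helpers ``iou`` and ``select_proposals`` are hypothetical and are not the operator's implementation. The sketch assumes the boxes are already decoded from ``bbox_deltas`` and the anchors, and it does not model the adaptive threshold controlled by ``eta``.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def select_proposals(boxes, scores, im_h, im_w, pre_nms_top_n=6000,
                     post_nms_top_n=1000, nms_thresh=0.5, min_size=0.1):
    """Clip, filter, and NMS already-decoded boxes; returns (box, score) pairs."""
    # Visit boxes from highest to lowest score, keeping at most pre_nms_top_n.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    candidates = []
    for i in order[:pre_nms_top_n]:
        x1, y1, x2, y2 = boxes[i]
        # Step 3: clip the box to the image.
        x1, x2 = min(max(x1, 0), im_w), min(max(x2, 0), im_w)
        y1, y2 = min(max(y1, 0), im_h), min(max(y2, 0), im_h)
        # Step 4: drop boxes whose width or height is below min_size.
        if x2 - x1 >= min_size and y2 - y1 >= min_size:
            candidates.append(((x1, y1, x2, y2), scores[i]))
    # Step 5: greedy NMS. A box survives only if its IoU with every
    # already kept box is at most nms_thresh.
    kept = []
    for box, score in candidates:
        if len(kept) == post_nms_top_n:
            break
        if all(iou(box, k) <= nms_thresh for k, _ in kept):
            kept.append((box, score))
    return kept
```

Because candidates are visited in descending score order, each suppressed box is always removed in favor of a higher-scoring one.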

@@ -12337,7 +12380,7 @@ iou_similarity


 .. _cn_api_fluid_layers_multi_box_head:
-        
+
 multi_box_head
 -------------------------------

@@ -12379,13 +12422,13 @@ multi_box_head
    - **variances** ： ``PriorBox`` 的方差。布局是[num_priors，4]。 ``num_priors`` 是每个输入位置的总窗口数。

 返回类型：元组（tuple）
-        
+
 **代码示例**

 ..  code-block:: python

        mbox_locs, mbox_confs, box, var = fluid.layers.multi_box_head(
-          inputs=[conv1, conv2, conv3, conv4, conv5, conv5],
+          inputs=[conv1, conv2, conv3, conv4, conv5, conv6],
          image=images,
          num_classes=21,
          min_ratio=20,
@@ -12404,7 +12447,7 @@ multi_box_head
 multiclass_nms
 -------------------------------

-.. py:function:: paddle.fluid.layers.multiclass_nms(bboxes, scores, score_threshold, nms_top_k, keep_top_k, nms_threshold=0.3, normalized=True, nms_eta=1.0, background_label=0, name=None)  
+.. py:function:: paddle.fluid.layers.multiclass_nms(bboxes, scores, score_threshold, nms_top_k, keep_top_k, nms_threshold=0.3, normalized=True, nms_eta=1.0, background_label=0, name=None)

 **多分类NMS**

@@ -12423,10 +12466,10 @@ multiclass_nms

    - **scores**  (Variable) – 支持两种类型的分数：

-      1. （tensor）具有形状[N，C，M]的3-D张量表示预测的置信度。 N是批量大小 batch size，C是种类数目，M是边界框bounding box的数量。对于每个类别，存在对应于M个边界框的总M个分数。请注意，M等于bboxes的第二维。 
+      1. （tensor）具有形状[N，C，M]的3-D张量表示预测的置信度。 N是批量大小 batch size，C是种类数目，M是边界框bounding box的数量。对于每个类别，存在对应于M个边界框的总M个分数。请注意，M等于bboxes的第二维。
      2. （LoDTensor）具有形状[M，C]的2-D LoDTensor。 M是bbox的数量，C是种类数目。在这种情况下，输入bboxes应该是形为[M，C，4]的第二种情况。
-            
-    - **background_label**  (int) – 背景标签（类别）的索引，背景标签（类别）将被忽略。如果设置为-1，则将考虑所有类别。默认值：0 
+
+    - **background_label**  (int) – 背景标签（类别）的索引，背景标签（类别）将被忽略。如果设置为-1，则将考虑所有类别。默认值：0
    - **score_threshold**  (float) – 过滤掉低置信度分数的边界框的阈值。如果没有提供，请考虑所有边界框。
    - **nms_top_k**  (int) – 根据通过score_threshold的过滤后而得的检测(detection)的置信度，所需要保留的最大检测数。
    - **nms_threshold**  (float) – 在NMS中使用的阈值。默认值：0.3 。
@@ -12463,7 +12506,7 @@ multiclass_nms
 polygon_box_transform
 -------------------------------

-.. py:function:: paddle.fluid.layers.polygon_box_transform(input, name=None)  
+.. py:function:: paddle.fluid.layers.polygon_box_transform(input, name=None)

 PolygonBoxTransform 算子。

@@ -12474,7 +12517,7 @@ PolygonBoxTransform 算子。
 参数：
    - **input** （Variable） - shape 为[batch_size，geometry_channels，height，width]的张量

-返回：与输入 shpae 相同
+返回：与输入 shape 相同

 返回类型：output（Variable）

@@ -12490,7 +12533,7 @@ PolygonBoxTransform 算子。

 .. _cn_api_fluid_layers_prior_box:

-prior_box 
+prior_box
 -------------------------------
 .. py:function:: paddle.fluid.layers.prior_box(input,image,min_sizes=None,max_sizes=None,aspect_ratios=[1.0],variance=[0.1,0.1,0.2,0.2],flip=False,clip=False,steps=[0.0,0.0],offset=0.5,name=None,min_max_aspect_ratios_order=False)

@@ -12615,10 +12658,10 @@ rpn_target_assign

 返回:

-返回元组 (predicted_scores, predicted_location, target_label, target_bbox, bbox_inside_weight) : 
-   - **predicted_scores** 和 **predicted_location** 是RPN的预测结果。 **target_label** 和 **target_bbox** 分别是真实准确数据(ground-truth)。 
-   - **predicted_location** 是一个形为[F，4]的2D Tensor， **target_bbox** 的形与 **predicted_location** 相同，F是foreground anchors的数量。 
-   - **predicted_scores** 是一个shape为[F + B，1]的2D Tensor， **target_label** 的形与 **predict_scores** 的形相同，B是background anchors的数量，F和B取决于此算子的输入。 
+Returns the tuple (predicted_scores, predicted_location, target_label, target_bbox, bbox_inside_weight):
+   - **predicted_scores** and **predicted_location** are the RPN predictions; **target_label** and **target_bbox** are the corresponding ground-truth targets.
+   - **predicted_location** is a 2D Tensor with shape [F, 4]; **target_bbox** has the same shape as **predicted_location**. F is the number of foreground anchors.
+   - **predicted_scores** is a 2D Tensor with shape [F + B, 1]; **target_label** has the same shape as **predicted_scores**. B is the number of background anchors. F and B depend on the inputs of this operator.
    - **bbox_inside_weight** marks whether each predicted_location is a fake_fg (fake foreground); its shape is [F, 4].

 返回类型：        元组(tuple)
@@ -12639,9 +12682,9 @@ rpn_target_assign
        loc_pred, score_pred, loc_target, score_target, bbox_inside_weight = \
                fluid.layers.rpn_target_assign(bbox_pred=bbox_pred,
                        cls_logits=cls_logits, anchor_box=anchor_box, gt_boxes=gt_boxes)
-        
-        
-        
+
+
+



@@ -12651,11 +12694,11 @@ rpn_target_assign


 .. _cn_api_fluid_layers_ssd_loss:
-        
+
 ssd_loss
 -------------------------------

-.. py:function:: paddle.fluid.layers.ssd_loss(location, confidence, gt_box, gt_label, prior_box, prior_box_var=None, background_label=0, overlap_threshold=0.5, neg_pos_ratio=3.0, neg_overlap=0.5, loc_loss_weight=1.0, conf_loss_weight=1.0, match_type='per_prediction', mining_type='max_negative', normalize=True, sample_size=None) 
+.. py:function:: paddle.fluid.layers.ssd_loss(location, confidence, gt_box, gt_label, prior_box, prior_box_var=None, background_label=0, overlap_threshold=0.5, neg_pos_ratio=3.0, neg_overlap=0.5, loc_loss_weight=1.0, conf_loss_weight=1.0, match_type='per_prediction', mining_type='max_negative', normalize=True, sample_size=None)

 用于SSD的对象检测算法的多窗口损失层

@@ -12664,13 +12707,13 @@ ssd_loss
 1、通过二分匹配算法查找匹配的边界框。

        1.1、计算真实框与先验框之间的IOU相似度。
-        
+
        1.2、通过二分匹配算法计算匹配的边界框。

 2. Compute the confidence used for mining hard examples.

        2.1、根据匹配的索引获取目标标签。
-        
+
        2.2、计算置信度损失。

 3. Apply hard example mining to obtain the negative example indices and update the matched indices.
@@ -12678,19 +12721,19 @@ ssd_loss
 4. Assign classification and regression targets.

        4.1. Encode the bounding boxes according to the prior boxes.
-        
+
        4.2、分配回归目标。
-        
+
        4.3、分配分类目标。
-        
+
 5. Compute the overall objective loss.

        5.1. Compute the confidence loss.
-        
+
        5.2. Compute the localization loss.
-        
+
        5.3. Compute the overall weighted loss.
-        
+
 参数：
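        - **location** (Variable) - the location predictions, a 3D tensor with shape [N, Np, 4]. N is the batch size and Np is the total number of predictions per instance. 4 is the number of coordinate values, laid out as [xmin, ymin, xmax, ymax].
        - **confidence** (Variable) - the confidence predictions, a 3D tensor with shape [N, Np, C]. N and Np are the same as in ``location``, and C is the number of classes.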
@@ -12734,7 +12777,7 @@ ssd_loss
         gt_label = fluid.layers.data(
                 name='gt_label', shape=[1], lod_level=1, dtype='float32')
         loss = fluid.layers.ssd_loss(loc, scores, gt_box, gt_label, pb, pbv)
-        
+



@@ -12871,7 +12914,7 @@ yolo_box
 yolov3_loss
 -------------------------------

-.. py:function:: paddle.fluid.layers.yolov3_loss(x, gtbox, gtlabel, anchors, anchor_mask, class_num, ignore_thresh, downsample_ratio, gtscore=None, use_label_smooth=True, name=None)
+.. py:function:: paddle.fluid.layers.yolov3_loss(x, gt_box, gt_label, anchors, anchor_mask, class_num, ignore_thresh, downsample_ratio, gt_score=None, use_label_smooth=True, name=None)

 该运算通过给定的预测结果和真实框生成yolov3损失。

@@ -12912,7 +12955,7 @@ yolov3_loss
         $$
         loss = (loss_{xy} + loss_{wh}) * weight_{box} + loss_{conf} + loss_{class}
         $$
-         
+

 当 ``use_label_smooth`` 设置为 ``True`` 时，在计算分类损失时将平滑分类目标，将正样本的目标平滑到1.0-1.0 / class_num，并将负样本的目标平滑到1.0 / class_num。
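
The label smoothing rule above can be written out as a short sketch. ``smooth_class_targets`` is a hypothetical helper for illustration only and is not a PaddlePaddle API. It builds the per-class target vector for one ground-truth box.

```python
def smooth_class_targets(label, class_num, use_label_smooth=True):
    """Return the classification targets for a box whose class id is `label`."""
    if use_label_smooth:
        positive = 1.0 - 1.0 / class_num  # target for the true class
        negative = 1.0 / class_num        # target for every other class
    else:
        positive, negative = 1.0, 0.0     # plain one-hot targets
    return [positive if c == label else negative for c in range(class_num)]
```

With ``class_num=4`` and ``label=1``, the smoothed targets are ``[0.25, 0.75, 0.25, 0.25]`` instead of the one-hot ``[0.0, 1.0, 0.0, 0.0]``.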

@@ -12922,15 +12965,15 @@ yolov3_loss

 参数：
    - **x**  (Variable) – YOLOv3损失运算的输入张量，这是一个形状为[N，C，H，W]的四维张量。H和W应该相同，第二维（C）存储框的位置信息，以及每个anchor box的置信度得分和one-hot分类
-    - **gtbox**  (Variable) – 真实框，应该是[N，B，4]的形状。第三维用来承载x、y、w、h，x、y、w、h应该是输入图像相对值。 N是batch size，B是图像中所含有的的最多的box数目
-    - **gtlabel**  (Variable) – 真实框的类id，应该形为[N，B]。
+    - **gt_box**  (Variable) – 真实框，应该是[N，B，4]的形状。第三维用来承载x、y、w、h，x、y、w、h应该是输入图像相对值。 N是batch size，B是图像中所含有的的最多的box数目
+    - **gt_label**  (Variable) – 真实框的类id，应该形为[N，B]。
    - **anchors**  (list|tuple) – 指定anchor框的宽度和高度，它们将逐对进行解析
    - **anchor_mask**  (list|tuple) – 当前YOLOv3损失计算中使用的anchor的mask索引
    - **class_num**  (int) – 要预测的类数
    - **ignore_thresh**  (float) – 一定条件下忽略某框置信度损失的忽略阈值
    - **downsample_ratio**  (int) – 从网络输入到YOLOv3 loss输入的下采样率，因此应为第一，第二和第三个YOLOv3损失运算设置32,16,8
    - **name** (string) – yolov3损失层的命名
-    - **gtscore** （Variable） - 真实框的混合得分，形为[N，B]。 默认None。
+    - **gt_score** （Variable） - 真实框的混合得分，形为[N，B]。 默认None。
    - **use_label_smooth** (bool） - 是否使用平滑标签。 默认为True


@@ -12938,7 +12981,7 @@ yolov3_loss

 返回类型:   变量（Variable）

-抛出异常: 
+抛出异常:
    - ``TypeError``  – the input x of yolov3_loss must be a Variable
    - ``TypeError``  – the input gt_box of yolov3_loss must be a Variable
    - ``TypeError``  – the input gt_label of yolov3_loss must be None or a Variable
@@ -12953,13 +12996,13 @@ yolov3_loss
 .. code-block:: python

    x = fluid.layers.data(name='x', shape=[255, 13, 13], dtype='float32')
-    gtbox = fluid.layers.data(name='gtbox', shape=[6, 4], dtype='float32')
-    gtlabel = fluid.layers.data(name='gtlabel', shape=[6], dtype='int32')
-    gtscore = fluid.layers.data(name='gtscore', shape=[6], dtype='float32')
+    gt_box = fluid.layers.data(name='gtbox', shape=[6, 4], dtype='float32')
+    gt_label = fluid.layers.data(name='gtlabel', shape=[6], dtype='int32')
+    gt_score = fluid.layers.data(name='gtscore', shape=[6], dtype='float32')
    anchors = [10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326]
    anchor_mask = [0, 1, 2]
-    loss = fluid.layers.yolov3_loss(x=x, gtbox=gtbox, gtlabel=gtlabel,
-                                    gtscore=gtscore, anchors=anchors,
+    loss = fluid.layers.yolov3_loss(x=x, gt_box=gt_box, gt_label=gt_label,
+                                    gt_score=gt_score, anchors=anchors,
                                    anchor_mask=anchor_mask, class_num=80,
                                    ignore_thresh=0.7, downsample_ratio=32)

@@ -12971,7 +13014,7 @@ yolov3_loss


 ============
- metric_op 
+ metric_op
 ============



--- a/doc/fluid/api_cn/profiler_cn.rst
+++ b/doc/fluid/api_cn/profiler_cn.rst
@@ -12,7 +12,7 @@ cuda_profiler
 .. py:function:: paddle.fluid.profiler.cuda_profiler(output_file, output_mode=None, config=None)


-CUDA分析器。通过CUDA运行时应用程序编程接口对CUDA程序进行性能分析。分析结果将以键-值对格式或逗号分隔的格式写入output_file。用户可以通过output_mode参数设置输出模式，并通过配置参数设置计数器/选项。默认配置是[' gpustarttimestamp '， ' gpustarttimestamp '， ' gridsize3d '， ' threadblocksize '， ' streamid '， ' enableonstart 0 '， ' conckerneltrace ']。然后，用户可使用 `NVIDIA Visual Profiler <https://developer.nvidia.com/nvidia-visualprofiler>`_ 工具来加载这个输出文件以可视化结果。
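+CUDA profiler. It profiles CUDA programs through the CUDA runtime API. The results are written to output_file in either key-value format or comma-separated format. Users can choose the output mode with the output_mode parameter and set counters/options with the config parameter. The default config is ['gpustarttimestamp', 'gpuendtimestamp', 'gridsize3d', 'threadblocksize', 'streamid', 'enableonstart 0', 'conckerneltrace']. The output file can then be loaded into the `NVIDIA Visual Profiler <https://developer.nvidia.com/nvidia-visual-profiler>`_ to visualize the results.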


 参数:
@@ -28,7 +28,7 @@ CUDA分析器。通过CUDA运行时应用程序编程接口对CUDA程序进行


 ..  code-block:: python
-  
+
    import paddle.fluid as fluid
    import paddle.fluid.profiler as profiler

@@ -46,7 +46,7 @@ CUDA分析器。通过CUDA运行时应用程序编程接口对CUDA程序进行
        for i in range(epoc):
            input = np.random.random(dshape).astype('float32')
            exe.run(fluid.default_main_program(), feed={'data': input})
-            
+
    # The results can then be visualized with the NVIDIA Visual Profiler


@@ -67,20 +67,20 @@ profiler
 profile interface 。与cuda_profiler不同，此profiler可用于分析CPU和GPU程序。默认情况下，它记录CPU和GPU kernel，如果想分析其他程序，可以参考教程来在c++代码中添加更多代码。


-如果 state== ' All '，在profile_path 中写入文件 profile proto 。该文件记录执行期间的时间顺序信息。然后用户可以看到这个文件的时间轴，请参考 `https://github.com/PaddlePaddle/Paddle/blob/develop/doc/fluid/howto/optimization/timeline.md <https://github.com/PaddlePaddle/Paddle/blob/develop/doc/fluid/howto/optimization/timeline.md>`_ 
+如果 state== ' All '，在profile_path 中写入文件 profile proto 。该文件记录执行期间的时间顺序信息。然后用户可以看到这个文件的时间轴，请参考 `这里 <../advanced_usage/development/profiling/timeline_cn.html>`_

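 Parameters:
  - **state** (string) – the profiling state, one of 'CPU', 'GPU', or 'All'. With 'CPU' or 'GPU', the profiler uses the CPU timer or the GPU timer for profiling. Although users may have specified the execution place (CPUPlace/CUDAPlace) at the beginning, the profiler does not use it, for flexibility.
  - **sorted_key** (string) – if None, the profiling results are printed in the order of the first end time of the events. Otherwise, the results are sorted by the given flag, one of 'calls', 'total', 'max', 'min', or 'ave': 'calls' sorts by the number of calls, 'total' by total execution time, 'max' by maximum execution time, 'min' by minimum execution time, and 'ave' by average execution time.
  - **profile_path** (string) – if state == 'All', the results are written to this file as a profile proto.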
-  
+
 Raises:
  - ``ValueError`` – if state is not one of ['CPU', 'GPU', 'All'], or if sorted_key is not one of ['calls', 'total', 'max', 'min', 'ave'].
-  
+
 **代码示例**

 ..  code-block:: python
-    
+
    import paddle.fluid.profiler as profiler

    with profiler.profiler('All', 'total', '/tmp/profile') as prof:
@@ -110,7 +110,7 @@ reset_profiler
 **代码示例**

 ..  code-block:: python
-  
+
    import paddle.fluid.profiler as profiler
    with profiler.profiler(state, 'total', '/tmp/profile'):
        for iter in range(10):
@@ -133,10 +133,10 @@ start_profiler
 .. py:function:: paddle.fluid.profiler.start_profiler(state)

 激活使用 profiler， 用户可以使用 ``fluid.profiler.start_profiler`` 和 ``fluid.profiler.stop_profiler`` 插入代码
-不能使用 ``fluid.profiler.profiler`` 
+不能使用 ``fluid.profiler.profiler``


-如果 state== ' All '，在profile_path 中写入文件 profile proto 。该文件记录执行期间的时间顺序信息。然后用户可以看到这个文件的时间轴，请参考 `https://github.com/PaddlePaddle/Paddle/blob/develop/doc/fluid/howto/optimization/timeline.md <https://github.com/PaddlePaddle/Paddle/blob/develop/doc/fluid/howto/optimization/timeline.md>`_ 
+如果 state== ' All '，在profile_path 中写入文件 profile proto 。该文件记录执行期间的时间顺序信息。然后用户可以看到这个文件的时间轴，请参考 `这里 <../advanced_usage/development/profiling/timeline_cn.html>`_

 参数:
  - **state** (string) – profiling state, 取值为 'CPU' 或 'GPU' 或 'All', 'CPU' 代表只分析 cpu. 'GPU' 代表只分析 GPU . 'All' 会产生 timeline.
@@ -147,7 +147,7 @@ start_profiler
 **代码示例**

 ..  code-block:: python
-    
+
    import paddle.fluid.profiler as profiler

    profiler.start_profiler('GPU')
@@ -174,12 +174,12 @@ stop_profiler
 .. py:function:: paddle.fluid.profiler.stop_profiler(sorted_key=None, profile_path='/tmp/profile')

 停止 profiler， 用户可以使用 ``fluid.profiler.start_profiler`` 和 ``fluid.profiler.stop_profiler`` 插入代码
-不能使用 ``fluid.profiler.profiler`` 
+不能使用 ``fluid.profiler.profiler``

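 Parameters:
  - **sorted_key** (string) – if None, the profiling results are printed in the order of the first end time of the events. Otherwise, the results are sorted by the given flag, one of 'calls', 'total', 'max', 'min', or 'ave': 'calls' sorts by the number of calls, 'total' by total execution time, 'max' by maximum execution time, 'min' by minimum execution time, and 'ave' by average execution time.
  - **profile_path** (string) - if state == 'All', the results are written to this file as a profile proto.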
-  
+

 抛出异常:
  - ``ValueError`` – 如果state 取值不在 ['CPU', 'GPU', 'All']中
@@ -187,7 +187,7 @@ stop_profiler
 **代码示例**

 ..  code-block:: python
-    
+
    import paddle.fluid.profiler as profiler

    profiler.start_profiler('GPU')

--- a/doc/fluid/api_guides/X2Paddle/Caffe-Fluid.rst
+++ b/doc/fluid/api_guides/X2Paddle/Caffe-Fluid.rst
+.. _Caffe-Fluid:
+
+########################
+Caffe-Fluid常用层对应表
+########################
+
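+This document maps commonly used Caffe layers to the corresponding PaddlePaddle APIs and analyzes their differences. Users with Caffe experience can use this mapping to get familiar with the PaddlePaddle interfaces quickly.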
+
+
+..  csv-table:: 
+    :header: "序号", "Caffe Layer", "Fluid接口", "备注"
+    :widths: 1, 8, 8, 3
+
+    "1",  "`AbsVal <http://caffe.berkeleyvision.org/tutorial/layers/absval.html>`_", ":ref:`cn_api_fluid_layers_abs`",  "功能一致"
+    "2",  "`Accuracy <http://caffe.berkeleyvision.org/tutorial/layers/accuracy.html>`_", ":ref:`cn_api_fluid_layers_accuracy`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Accuracy.md>`_"
+    "3",  "`ArgMax <http://caffe.berkeleyvision.org/tutorial/layers/argmax.html>`_", ":ref:`cn_api_fluid_layers_argmax`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/ArgMax.md>`_"
+    "4",  "`BatchNorm <http://caffe.berkeleyvision.org/tutorial/layers/batchnorm.html>`_", ":ref:`cn_api_fluid_layers_batch_norm`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/BatchNorm.md>`_"
+    "5",  "`BNLL <http://caffe.berkeleyvision.org/tutorial/layers/bnll.html>`_", ":ref:`cn_api_fluid_layers_softplus`",  "功能一致"
+    "6",  "`Concat <http://caffe.berkeleyvision.org/tutorial/layers/concat.html>`_", ":ref:`cn_api_fluid_layers_concat`",  "功能一致"
+    "7",  "`Convolution <http://caffe.berkeleyvision.org/tutorial/layers/convolution.html>`_", ":ref:`cn_api_fluid_layers_conv2d`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Convolution.md>`_"
+    "8",  "`Crop <http://caffe.berkeleyvision.org/tutorial/layers/crop.html>`_", ":ref:`cn_api_fluid_layers_crop`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Crop.md>`_"
+    "9",  "`Deconvolution <http://caffe.berkeleyvision.org/tutorial/layers/deconvolution.html>`_", ":ref:`cn_api_fluid_layers_conv2d_transpose`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Deconvolution.md>`_"
+    "10",  "`Dropout <http://caffe.berkeleyvision.org/tutorial/layers/dropout.html>`_", ":ref:`cn_api_fluid_layers_dropout`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Dropout.md>`_"
+    "11",  "`Eltwise <http://caffe.berkeleyvision.org/tutorial/layers/eltwise.html>`_",  "无相应接口",  "`Fluid实现 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Eltwise.md>`_"
+    "12",  "`ELU <http://caffe.berkeleyvision.org/tutorial/layers/elu.html>`_", ":ref:`cn_api_fluid_layers_elu`",  "功能一致"
+    "13",  "`EuclideanLoss <http://caffe.berkeleyvision.org/tutorial/layers/euclideanloss.html>`_", ":ref:`cn_api_fluid_layers_square_error_cost`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/EuclideanLoss.md>`_"
+    "14",  "`Exp <http://caffe.berkeleyvision.org/tutorial/layers/exp.html>`_", ":ref:`cn_api_fluid_layers_exp`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Exp.md>`_"
+    "15",  "`Flatten <http://caffe.berkeleyvision.org/tutorial/layers/flatten.html>`_", ":ref:`cn_api_fluid_layers_reshape`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Flatten.md>`_"
+    "16",  "`InnerProduct <http://caffe.berkeleyvision.org/tutorial/layers/innerproduct.html>`_", ":ref:`cn_api_fluid_layers_fc`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/InnerProduct.md>`_"
+    "17",  "`Input <http://caffe.berkeleyvision.org/tutorial/layers/input.html>`_", ":ref:`cn_api_fluid_layers_data`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Input.md>`_"
+    "18",  "`Log <http://caffe.berkeleyvision.org/tutorial/layers/log.html>`_", ":ref:`cn_api_fluid_layers_log`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Log.md>`_"
+    "19",  "`LRN <http://caffe.berkeleyvision.org/tutorial/layers/lrn.html>`_", ":ref:`cn_api_fluid_layers_lrn`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/LRN.md>`_"
+    "20",  "`Pooling <http://caffe.berkeleyvision.org/tutorial/layers/pooling.html>`_", ":ref:`cn_api_fluid_layers_pool2d`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Pooling.md>`_"
+    "21",  "`Power <http://caffe.berkeleyvision.org/tutorial/layers/power.html>`_", ":ref:`cn_api_fluid_layers_pow`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Power.md>`_"
+    "22",  "`PReLU <http://caffe.berkeleyvision.org/tutorial/layers/prelu.html>`_", ":ref:`cn_api_fluid_layers_prelu`",  "功能一致"
+    "23",  "`Reduction <http://caffe.berkeleyvision.org/tutorial/layers/reduction.html>`_",  "无相应接口",  "`Fluid实现 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Reduction.md>`_"
+    "24",  "`ReLU <http://caffe.berkeleyvision.org/tutorial/layers/relu.html>`_", ":ref:`cn_api_fluid_layers_leaky_relu`",  "功能一致"
+    "25",  "`Reshape <http://caffe.berkeleyvision.org/tutorial/layers/reshape.html>`_", ":ref:`cn_api_fluid_layers_reshape`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Reshape.md>`_"
+    "26",  "`SigmoidCrossEntropyLoss <http://caffe.berkeleyvision.org/tutorial/layers/sigmoidcrossentropyloss.html>`_", ":ref:`cn_api_fluid_layers_sigmoid_cross_entropy_with_logits`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/SigmoidCrossEntropyLoss.md>`_"
+    "27",  "`Sigmoid <http://caffe.berkeleyvision.org/tutorial/layers/sigmoid.html>`_", ":ref:`cn_api_fluid_layers_sigmoid`",  "功能一致"
+    "28",  "`Slice <http://caffe.berkeleyvision.org/tutorial/layers/slice.html>`_", ":ref:`cn_api_fluid_layers_slice`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Slice.md>`_"
+    "29",  "`SoftmaxWithLoss <http://caffe.berkeleyvision.org/tutorial/layers/softmaxwithloss.html>`_", ":ref:`cn_api_fluid_layers_softmax_with_cross_entropy`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/SofmaxWithLoss.md>`_"
+    "30",  "`Softmax <http://caffe.berkeleyvision.org/tutorial/layers/softmax.html>`_", ":ref:`cn_api_fluid_layers_softmax`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Sofmax.md>`_"
+    "31",  "`TanH <http://caffe.berkeleyvision.org/tutorial/layers/tanh.html>`_", ":ref:`cn_api_fluid_layers_tanh`",  "功能一致"
+    "32",  "`Tile <http://caffe.berkeleyvision.org/tutorial/layers/tile.html>`_", ":ref:`cn_api_fluid_layers_expand`",  "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/caffe2fluid/doc/Tile.md>`_"
--- a/doc/fluid/api_guides/X2Paddle/TensorFlow-Fluid.rst
+++ b/doc/fluid/api_guides/X2Paddle/TensorFlow-Fluid.rst
+.. _TensorFlow-Fluid:
+
+###############################
+TensorFlow-Fluid常用接口对应表
+###############################
+
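+Based on TensorFlow v1.13, this document maps commonly used TensorFlow APIs to the corresponding PaddlePaddle APIs and analyzes their differences. Users with TensorFlow experience can use this mapping to get familiar with the PaddlePaddle interfaces quickly.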
+
+..  csv-table:: 
+    :header: "序号", "TensorFlow接口", "Fluid接口", "备注"
+    :widths: 1, 8, 8, 3
+
+    "1", "`tf.abs <https://www.tensorflow.org/api_docs/python/tf/abs>`_", ":ref:`cn_api_fluid_layers_abs`", "功能一致"
+    "2", "`tf.add <https://www.tensorflow.org/api_docs/python/tf/add>`_", ":ref:`cn_api_fluid_layers_elementwise_add`", "功能一致"
+    "3", "`tf.argmax <https://www.tensorflow.org/api_docs/python/tf/argmax>`_", ":ref:`cn_api_fluid_layers_argmax`", "功能一致"
+    "4", "`tf.argmin <https://www.tensorflow.org/api_docs/python/tf/argmin>`_", ":ref:`cn_api_fluid_layers_argmin`", "功能一致"
+    "5", "`tf.assign <https://www.tensorflow.org/api_docs/python/tf/assign>`_", ":ref:`cn_api_fluid_layers_assign`", "功能一致"
+    "6", "`tf.assign_add <https://www.tensorflow.org/api_docs/python/tf/assign_add>`_", ":ref:`cn_api_fluid_layers_increment`", "功能一致"
+    "7", "`tf.case <https://www.tensorflow.org/api_docs/python/tf/case>`_", ":ref:`cn_api_fluid_layers_Switch`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.case.md>`_"
+    "8", "`tf.cast <https://www.tensorflow.org/api_docs/python/tf/cast>`_", ":ref:`cn_api_fluid_layers_cast`", "功能一致"
+    "9", "`tf.clip_by_global_norm <https://www.tensorflow.org/api_docs/python/tf/clip_by_global_norm>`_", ":ref:`cn_api_fluid_clip_GradientClipByGlobalNorm`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.clip_by_global_norm.md>`_"
+    "10", "`tf.clip_by_norm <https://www.tensorflow.org/api_docs/python/tf/clip_by_norm>`_", ":ref:`cn_api_fluid_layers_clip_by_norm`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.clip_by_norm.md>`_"
+    "11", "`tf.clip_by_value <https://www.tensorflow.org/api_docs/python/tf/clip_by_value>`_", ":ref:`cn_api_fluid_layers_clip`", "功能一致"
+    "12", "`tf.concat <https://www.tensorflow.org/api_docs/python/tf/concat>`_", ":ref:`cn_api_fluid_layers_concat`", "功能一致"
+    "13", "`tf.cond <https://www.tensorflow.org/api_docs/python/tf/cond>`_", ":ref:`cn_api_fluid_layers_ifElse`", "功能一致"
+    "14", "`tf.constant <https://www.tensorflow.org/api_docs/python/tf/constant>`_", ":ref:`cn_api_fluid_layers_fill_constant`", "功能一致"
+    "15", "`tf.contrib.layers.batch_norm <https://www.tensorflow.org/api_docs/python/tf/contrib/layers/batch_norm>`_", ":ref:`cn_api_fluid_layers_batch_norm`", "功能一致"
+    "16", "`tf.contrib.layers.flatten <https://www.tensorflow.org/api_docs/python/tf/contrib/layers/flatten>`_", ":ref:`cn_api_fluid_layers_flatten`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.contrib.layers.flatten.md>`_"
+    "17", "`tf.contrib.layers.fully_connected <https://www.tensorflow.org/api_docs/python/tf/contrib/layers/fully_connected>`_", ":ref:`cn_api_fluid_layers_fc`", "功能一致"
+    "18", "`tf.contrib.layers.one_hot_encoding <https://www.tensorflow.org/api_docs/python/tf/contrib/layers/one_hot_encoding>`_", ":ref:`cn_api_fluid_layers_one_hot`", "功能一致"
+    "19", "`tf.contrib.layers.softmax <https://www.tensorflow.org/api_docs/python/tf/contrib/layers/softmax>`_", ":ref:`cn_api_fluid_layers_softmax`", "功能一致"
+    "20", "`tf.contrib.layers.xavier_initializer <https://www.tensorflow.org/api_docs/python/tf/contrib/layers/xavier_initializer>`_", ":ref:`cn_api_fluid_initializer_Xavier`", "功能一致"
+    "21", "`tf.contrib.rnn.GRUCell <https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/GRUCell>`_", ":ref:`cn_api_fluid_layers_gru_unit`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.contrib.rnn.GRUCell.md>`_"
+    "22", "`tf.contrib.rnn.MultiRNNCell <https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/MultiRNNCell>`_", "无相应接口", "`Fluid实现 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.nn.rnn_cell.MultiRNNCell.md>`_"
+    "23", "`tf.contrib.rnn.static_rnn <https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/static_rnn>`_", ":ref:`cn_api_fluid_layers_DynamicRNN`", "功能一致"
+    "24", "`tf.convert_to_tensor <https://www.tensorflow.org/api_docs/python/tf/convert_to_tensor>`_", ":ref:`cn_api_fluid_layers_assign`", "功能一致"
+    "25", "`tf.cos <https://www.tensorflow.org/api_docs/python/tf/cos>`_", ":ref:`cn_api_fluid_layers_cos`", "功能一致"
+    "26", "`tf.div <https://www.tensorflow.org/api_docs/python/tf/div>`_", ":ref:`cn_api_fluid_layers_elementwise_div`", "功能一致"
+    "27", "`tf.divide <https://www.tensorflow.org/api_docs/python/tf/divide>`_", ":ref:`cn_api_fluid_layers_elementwise_div`", "功能一致"
+    "28", "`tf.dropout <https://www.tensorflow.org/api_docs/python/tf/dropout>`_", ":ref:`cn_api_fluid_layers_dropout`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.nn.dropout.md>`_"
+    "29", "`tf.equal <https://www.tensorflow.org/api_docs/python/tf/equal>`_", "`运算符== <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/compare_op.md>`_", "功能一致"
+    "30", "`tf.exp <https://www.tensorflow.org/api_docs/python/tf/exp>`_", ":ref:`cn_api_fluid_layers_exp`", "功能一致"
+    "31", "`tf.expand_dims <https://www.tensorflow.org/api_docs/python/tf/expand_dims>`_", ":ref:`cn_api_fluid_layers_unsqueeze`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.expand_dims.md>`_"
+    "32", "`tf.fill <https://www.tensorflow.org/api_docs/python/tf/fill>`_", ":ref:`cn_api_fluid_layers_fill_constant`", "功能一致"
+    "33", "`tf.floor <https://www.tensorflow.org/api_docs/python/tf/floor>`_", ":ref:`cn_api_fluid_layers_floor`", "功能一致"
+    "34", "`tf.gather <https://www.tensorflow.org/api_docs/python/tf/gather>`_", ":ref:`cn_api_fluid_layers_gather`", "功能一致"
+    "35", "`tf.greater <https://www.tensorflow.org/api_docs/python/tf/greater>`_", "`运算符> <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/compare_op.md>`_", "功能一致"
+    "36", "`tf.greater_equal <https://www.tensorflow.org/api_docs/python/tf/greater_equal>`_", "`运算符>= <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/compare_op.md>`_", "功能一致"
+    "37", "`tf.image.non_max_suppression <https://www.tensorflow.org/api_docs/python/tf/image/non_max_suppression>`_", ":ref:`cn_api_fluid_layers_multiclass_nms`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.image.non_max_suppression.md>`_"
+    "38", "`tf.image.resize_bilinear <https://www.tensorflow.org/api_docs/python/tf/image/resize_bilinear>`_", ":ref:`cn_api_fluid_layers_resize_bilinear`", "功能一致"
+    "39", "`tf.image.resize_images <https://www.tensorflow.org/api_docs/python/tf/image/resize_images>`_", ":ref:`cn_api_fluid_layers_image_resize`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.image.resize_images.md>`_"
+    "40", "`tf.image.resize_nearest_neighbor <https://www.tensorflow.org/api_docs/python/tf/image/resize_nearest_neighbor>`_", ":ref:`cn_api_fluid_layers_resize_nearest`", "功能一致"
+    "41", "`tf.is_finite <https://www.tensorflow.org/api_docs/python/tf/is_finite>`_", ":ref:`cn_api_fluid_layers_isfinite`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.math.is_finite.md>`_"
+    "42", "`tf.layers.batch_normalization <https://www.tensorflow.org/api_docs/python/tf/layers/batch_normalization>`_", ":ref:`cn_api_fluid_layers_batch_norm`", "功能一致"
+    "43", "`tf.layers.conv2d <https://www.tensorflow.org/api_docs/python/tf/layers/conv2d>`_", ":ref:`cn_api_fluid_layers_conv2d`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.layers.conv2d.md>`_"
+    "44", "`tf.layers.dense <https://www.tensorflow.org/api_docs/python/tf/layers/dense>`_", ":ref:`cn_api_fluid_layers_fc`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.layers.dense.md>`_"
+    "45", "`tf.layers.dropout <https://www.tensorflow.org/api_docs/python/tf/layers/dropout>`_", ":ref:`cn_api_fluid_layers_dropout`", "功能一致"
+    "46", "`tf.layers.Dropout <https://www.tensorflow.org/api_docs/python/tf/layers/Dropout>`_", ":ref:`cn_api_fluid_layers_dropout`", "功能一致"
+    "47", "`tf.layers.flatten <https://www.tensorflow.org/api_docs/python/tf/layers/flatten>`_", ":ref:`cn_api_fluid_layers_flatten`", "功能一致"
+    "48", "`tf.less <https://www.tensorflow.org/api_docs/python/tf/less>`_", "`运算符< <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/compare_op.md>`_", "功能一致"
+    "49", "`tf.less_equal <https://www.tensorflow.org/api_docs/python/tf/less_equal>`_", "`运算符<= <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/compare_op.md>`_", "功能一致"
+    "50", "`tf.log <https://www.tensorflow.org/api_docs/python/tf/log>`_", ":ref:`cn_api_fluid_layers_log`", "功能一致"
+    "51", "`tf.logical_and <https://www.tensorflow.org/api_docs/python/tf/logical_and>`_", ":ref:`cn_api_fluid_layers_logical_and`", "功能一致"
+    "52", "`tf.logical_not <https://www.tensorflow.org/api_docs/python/tf/logical_not>`_", ":ref:`cn_api_fluid_layers_logical_not`", "功能一致"
+    "53", "`tf.logical_or <https://www.tensorflow.org/api_docs/python/tf/logical_or>`_", ":ref:`cn_api_fluid_layers_logical_or`", "功能一致"
+    "54", "`tf.losses.mean_squared_error <https://www.tensorflow.org/api_docs/python/tf/losses/mean_squared_error>`_", ":ref:`cn_api_fluid_layers_square_error_cost`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.losses.mean_and_squared_error.md>`_"
+    "55", "`tf.losses.sigmoid_cross_entropy <https://www.tensorflow.org/api_docs/python/tf/losses/sigmoid_cross_entropy>`_", ":ref:`cn_api_fluid_layers_sigmoid_cross_entropy_with_logits`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.losses.sigmoid_cross_entropy.md>`_"
+    "56", "`tf.losses.softmax_cross_entropy <https://www.tensorflow.org/api_docs/python/tf/losses/softmax_cross_entropy>`_", ":ref:`cn_api_fluid_layers_softmax_with_cross_entropy`", "功能一致"
+    "57", "`tf.matmul <https://www.tensorflow.org/api_docs/python/tf/matmul>`_", ":ref:`cn_api_fluid_layers_matmul`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.matmul.md>`_"
+    "58", "`tf.maximum <https://www.tensorflow.org/api_docs/python/tf/maximum>`_", ":ref:`cn_api_fluid_layers_elementwise_max`", "功能一致"
+    "59", "`tf.metrics.accuracy <https://www.tensorflow.org/api_docs/python/tf/metrics/accuracy>`_", ":ref:`cn_api_fluid_layers_accuracy`", "功能一致"
+    "60", "`tf.metrics.mean <https://www.tensorflow.org/api_docs/python/tf/metrics/mean>`_", ":ref:`cn_api_fluid_layers_mean`", "功能一致"
+    "61", "`tf.minimum <https://www.tensorflow.org/api_docs/python/tf/minimum>`_", ":ref:`cn_api_fluid_layers_elementwise_min`", "功能一致"
+    "62", "`tf.multiply <https://www.tensorflow.org/api_docs/python/tf/multiply>`_", ":ref:`cn_api_fluid_layers_elementwise_mul`", "功能一致"
+    "63", "`tf.nn.avg_pool <https://www.tensorflow.org/api_docs/python/tf/nn/avg_pool>`_", ":ref:`cn_api_fluid_layers_pool2d`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.nn.avg_pool.md>`_"
+    "64", "`tf.nn.batch_normalization <https://www.tensorflow.org/api_docs/python/tf/nn/batch_normalization>`_", ":ref:`cn_api_fluid_layers_batch_norm`", "功能一致"
+    "65", "`tf.nn.bidirectional_dynamic_rnn <https://www.tensorflow.org/api_docs/python/tf/nn/bidirectional_dynamic_rnn>`_", "无相应接口", "`Fluid实现 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.nn.bidirectional_dynamic_rnn.md>`_"
+    "66", "`tf.nn.conv2d <https://www.tensorflow.org/api_docs/python/tf/nn/conv2d>`_", ":ref:`cn_api_fluid_layers_conv2d`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.nn.conv2d.md>`_"
+    "67", "`tf.nn.conv2d_transpose <https://www.tensorflow.org/api_docs/python/tf/nn/conv2d_transpose>`_", ":ref:`cn_api_fluid_layers_conv2d_transpose`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.nn.conv2d_transpose.md>`_"
+    "68", "`tf.nn.conv3d_transpose <https://www.tensorflow.org/api_docs/python/tf/nn/conv3d_transpose>`_", ":ref:`cn_api_fluid_layers_conv3d_transpose`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.nn.conv3d_transpose.md>`_"
+    "69", "`tf.nn.depthwise_conv2d <https://www.tensorflow.org/api_docs/python/tf/nn/depthwise_conv2d>`_", ":ref:`cn_api_fluid_layers_conv2d`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.nn.depthwise_conv2d.md>`_"
+    "70", "`tf.nn.dynamic_rnn <https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn>`_", ":ref:`cn_api_fluid_layers_DynamicRNN`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.nn.dynamic_rnn.md>`_"
+    "71", "`tf.nn.l2_normalize <https://www.tensorflow.org/api_docs/python/tf/nn/l2_normalize>`_", ":ref:`cn_api_fluid_layers_l2_normalize`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.nn.l2_normalize.md>`_"
+    "72", "`tf.nn.leaky_relu <https://www.tensorflow.org/api_docs/python/tf/nn/leaky_relu>`_", ":ref:`cn_api_fluid_layers_leaky_relu`", "功能一致"
+    "73", "`tf.nn.lrn <https://www.tensorflow.org/api_docs/python/tf/nn/lrn>`_", ":ref:`cn_api_fluid_layers_lrn`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.nn.lrn.md>`_"
+    "74", "`tf.nn.max_pool <https://www.tensorflow.org/api_docs/python/tf/nn/max_pool>`_", ":ref:`cn_api_fluid_layers_pool2d`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.nn.max_pool.md>`_"
+    "75", "`tf.nn.relu <https://www.tensorflow.org/api_docs/python/tf/nn/relu>`_", ":ref:`cn_api_fluid_layers_relu`", "功能一致"
+    "76", "`tf.nn.relu6 <https://www.tensorflow.org/api_docs/python/tf/nn/relu6>`_", ":ref:`cn_api_fluid_layers_relu6`", "功能一致"
+    "77", "`tf.nn.rnn_cell.LSTMCell <https://www.tensorflow.org/api_docs/python/tf/nn/rnn_cell/LSTMCell>`_", ":ref:`cn_api_fluid_layers_lstm_unit`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.nn.rnn_cell.LSTMCell.md>`_"
+    "78", "`tf.nn.separable_conv2d <https://www.tensorflow.org/api_docs/python/tf/nn/separable_conv2d>`_", "无相应接口", "`Fluid实现 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.nn.separable_conv2d.md>`_"
+    "79", "`tf.nn.sigmoid <https://www.tensorflow.org/api_docs/python/tf/nn/sigmoid>`_", ":ref:`cn_api_fluid_layers_sigmoid`", "功能一致"
+    "80", "`tf.nn.sigmoid_cross_entropy_with_logits <https://www.tensorflow.org/api_docs/python/tf/nn/sigmoid_cross_entropy_with_logits>`_", ":ref:`cn_api_fluid_layers_sigmoid_cross_entropy_with_logits`", "功能一致"
+    "81", "`tf.nn.softmax <https://www.tensorflow.org/api_docs/python/tf/nn/softmax>`_", ":ref:`cn_api_fluid_layers_softmax`", "功能一致"
+    "82", "`tf.nn.softmax_cross_entropy_with_logits <https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits>`_", ":ref:`cn_api_fluid_layers_softmax_with_cross_entropy`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.nn.softmax_cross_entropy_with_logits.md>`_"
+    "83", "`tf.nn.softplus <https://www.tensorflow.org/api_docs/python/tf/nn/softplus>`_", ":ref:`cn_api_fluid_layers_softplus`", "功能一致"
+    "84", "`tf.nn.softsign <https://www.tensorflow.org/api_docs/python/tf/nn/softsign>`_", ":ref:`cn_api_fluid_layers_softsign`", "功能一致"
+    "85", "`tf.nn.tanh <https://www.tensorflow.org/api_docs/python/tf/nn/tanh>`_", ":ref:`cn_api_fluid_layers_tanh`", "功能一致"
+    "86", "`tf.one_hot <https://www.tensorflow.org/api_docs/python/tf/one_hot>`_", ":ref:`cn_api_fluid_layers_one_hot`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.one_hot.md>`_"
+    "87", "`tf.ones <https://www.tensorflow.org/api_docs/python/tf/ones>`_", ":ref:`cn_api_fluid_layers_ones`", "功能一致"
+    "88", "`tf.ones_initializer <https://www.tensorflow.org/api_docs/python/tf/ones_initializer>`_", ":ref:`cn_api_fluid_initializer_Constant`", "功能一致"
+    "89", "`tf.pad <https://www.tensorflow.org/api_docs/python/tf/pad>`_", ":ref:`cn_api_fluid_layers_pad`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.pad.md>`_"
+    "90", "`tf.placeholder <https://www.tensorflow.org/api_docs/python/tf/placeholder>`_", ":ref:`cn_api_fluid_layers_data`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.placeholder.md>`_"
+    "91", "`tf.pow <https://www.tensorflow.org/api_docs/python/tf/pow>`_", ":ref:`cn_api_fluid_layers_pow`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.pow.md>`_"
+    "92", "`tf.print <https://www.tensorflow.org/api_docs/python/tf/print>`_", ":ref:`cn_api_fluid_layers_print`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.print.md>`_"
+    "93", "`tf.py_func <https://www.tensorflow.org/api_docs/python/tf/py_func>`_", ":ref:`cn_api_fluid_layers_py_func`", "功能一致"
+    "94", "`tf.random_normal <https://www.tensorflow.org/api_docs/python/tf/random_normal>`_", ":ref:`cn_api_fluid_layers_gaussian_random`", "功能一致"
+    "95", "`tf.random_normal_initializer <https://www.tensorflow.org/api_docs/python/tf/random_normal_initializer>`_", ":ref:`cn_api_fluid_initializer_Normal`", "功能一致"
+    "96", "`tf.random_uniform <https://www.tensorflow.org/api_docs/python/tf/random_uniform>`_", ":ref:`cn_api_fluid_layers_uniform_random`", "功能一致"
+    "97", "`tf.random_uniform_initializer <https://www.tensorflow.org/api_docs/python/tf/random_uniform_initializer>`_", ":ref:`cn_api_fluid_initializer_UniformInitializer`", "功能一致"
+    "98", "`tf.reduce_logsumexp <https://www.tensorflow.org/api_docs/python/tf/reduce_logsumexp>`_", "无相应接口", "`Fluid实现 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.nn.reduce_logsumexp.md>`_"
+    "99", "`tf.reduce_max <https://www.tensorflow.org/api_docs/python/tf/reduce_max>`_", ":ref:`cn_api_fluid_layers_reduce_max`", "功能一致"
+    "100", "`tf.reduce_mean <https://www.tensorflow.org/api_docs/python/tf/reduce_mean>`_", ":ref:`cn_api_fluid_layers_reduce_mean`", "功能一致"
+    "101", "`tf.reduce_min <https://www.tensorflow.org/api_docs/python/tf/reduce_min>`_", ":ref:`cn_api_fluid_layers_reduce_min`", "功能一致"
+    "102", "`tf.reduce_sum <https://www.tensorflow.org/api_docs/python/tf/reduce_sum>`_", ":ref:`cn_api_fluid_layers_reduce_sum`", "功能一致"
+    "103", "`tf.reshape <https://www.tensorflow.org/api_docs/python/tf/reshape>`_", ":ref:`cn_api_fluid_layers_reshape`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.reshape.md>`_"
+    "104", "`tf.reverse <https://www.tensorflow.org/api_docs/python/tf/reverse>`_", ":ref:`cn_api_fluid_layers_reverse`", "功能一致"
+    "105", "`tf.reverse_sequence <https://www.tensorflow.org/api_docs/python/tf/reverse_sequence>`_", ":ref:`cn_api_fluid_layers_sequence_reverse`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.reverse_sequence.md>`_"
+    "106", "`tf.reverse_v2 <https://www.tensorflow.org/api_docs/python/tf/reverse_v2>`_", ":ref:`cn_api_fluid_layers_reverse`", "功能一致"
+    "107", "`tf.round <https://www.tensorflow.org/api_docs/python/tf/round>`_", ":ref:`cn_api_fluid_layers_round`", "功能一致"
+    "108", "`tf.rsqrt <https://www.tensorflow.org/api_docs/python/tf/rsqrt>`_", "无相应接口", "`Fluid实现 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.math.rsqrt.md>`_"
+    "109", "`tf.scalar_mul <https://www.tensorflow.org/api_docs/python/tf/scalar_mul>`_", ":ref:`cn_api_fluid_layers_scale`", "功能一致"
+    "110", "`tf.scatter_update <https://www.tensorflow.org/api_docs/python/tf/scatter_update>`_", ":ref:`cn_api_fluid_layers_scatter`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.scatter_update.md>`_"
+    "111", "`tf.sequence_mask <https://www.tensorflow.org/api_docs/python/tf/sequence_mask>`_", ":ref:`cn_api_fluid_layers_sequence_mask`", "功能一致"
+    "112", "`tf.shape <https://www.tensorflow.org/api_docs/python/tf/shape>`_", ":ref:`cn_api_fluid_layers_shape`", "功能一致"
+    "113", "`tf.sigmoid <https://www.tensorflow.org/api_docs/python/tf/sigmoid>`_", ":ref:`cn_api_fluid_layers_sigmoid`", "功能一致"
+    "114", "`tf.sin <https://www.tensorflow.org/api_docs/python/tf/sin>`_", ":ref:`cn_api_fluid_layers_sin`", "功能一致"
+    "115", "`tf.slice <https://www.tensorflow.org/api_docs/python/tf/slice>`_", ":ref:`cn_api_fluid_layers_slice`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.slice.md>`_"
+    "116", "`tf.split <https://www.tensorflow.org/api_docs/python/tf/split>`_", ":ref:`cn_api_fluid_layers_split`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.split.md>`_"
+    "117", "`tf.sqrt <https://www.tensorflow.org/api_docs/python/tf/sqrt>`_", ":ref:`cn_api_fluid_layers_sqrt`", "功能一致"
+    "118", "`tf.square <https://www.tensorflow.org/api_docs/python/tf/square>`_", ":ref:`cn_api_fluid_layers_square`", "功能一致"
+    "119", "`tf.squared_difference <https://www.tensorflow.org/api_docs/python/tf/squared_difference>`_", "无相应接口", "`Fluid实现 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.squared_difference.md>`_"
+    "120", "`tf.squeeze <https://www.tensorflow.org/api_docs/python/tf/squeeze>`_", ":ref:`cn_api_fluid_layers_squeeze`", "功能一致"
+    "121", "`tf.stack <https://www.tensorflow.org/api_docs/python/tf/stack>`_", ":ref:`cn_api_fluid_layers_stack`", "功能一致"
+    "122", "`tf.stop_gradient <https://www.tensorflow.org/api_docs/python/tf/stop_gradient>`_", "无相应接口", "`Fluid实现 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.stop_gradient.md>`_"
+    "123", "`tf.subtract <https://www.tensorflow.org/api_docs/python/tf/subtract>`_", ":ref:`cn_api_fluid_layers_elementwise_sub`", "功能一致"
+    "124", "`tf.tanh <https://www.tensorflow.org/api_docs/python/tf/tanh>`_", ":ref:`cn_api_fluid_layers_tanh`", "功能一致"
+    "125", "`tf.tile <https://www.tensorflow.org/api_docs/python/tf/tile>`_", ":ref:`cn_api_fluid_layers_expand`", "功能一致"
+    "126", "`tf.top_k <https://www.tensorflow.org/api_docs/python/tf/top_k>`_", ":ref:`cn_api_fluid_layers_topk`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.nn.top_k.md>`_"
+    "127", "`tf.train.AdagradOptimizer <https://www.tensorflow.org/api_docs/python/tf/train/AdagradOptimizer>`_", ":ref:`cn_api_fluid_optimizer_AdagradOptimizer`", "功能一致"
+    "128", "`tf.train.AdamOptimizer <https://www.tensorflow.org/api_docs/python/tf/train/AdamOptimizer>`_", ":ref:`cn_api_fluid_optimizer_Adam`", "功能一致"
+    "129", "`tf.train.exponential_decay <https://www.tensorflow.org/api_docs/python/tf/train/exponential_decay>`_", ":ref:`cn_api_fluid_layers_exponential_decay`", "功能一致"
+    "130", "`tf.train.GradientDescentOptimizer <https://www.tensorflow.org/api_docs/python/tf/train/GradientDescentOptimizer>`_", ":ref:`cn_api_fluid_optimizer_SGDOptimizer`", "功能一致"
+    "131", "`tf.train.MomentumOptimizer <https://www.tensorflow.org/api_docs/python/tf/train/MomentumOptimizer>`_", ":ref:`cn_api_fluid_optimizer_MomentumOptimizer`", "功能一致"
+    "132", "`tf.train.polynomial_decay <https://www.tensorflow.org/api_docs/python/tf/train/polynomial_decay>`_", ":ref:`cn_api_fluid_layers_polynomial_decay`", "功能一致"
+    "133", "`tf.train.RMSPropOptimizer <https://www.tensorflow.org/api_docs/python/tf/train/RMSPropOptimizer>`_", ":ref:`cn_api_fluid_optimizer_RMSPropOptimizer`", "功能一致"
+    "134", "`tf.transpose <https://www.tensorflow.org/api_docs/python/tf/transpose>`_", ":ref:`cn_api_fluid_layers_transpose`", "功能一致"
+    "135", "`tf.truediv <https://www.tensorflow.org/api_docs/python/tf/truediv>`_", ":ref:`cn_api_fluid_layers_elementwise_div`", "功能一致"
+    "136", "`tf.truncated_normal <https://www.tensorflow.org/api_docs/python/tf/truncated_normal>`_", ":ref:`cn_api_fluid_initializer_TruncatedNormal`", "功能一致"
+    "137", "`tf.truncated_normal_initializer <https://www.tensorflow.org/api_docs/python/tf/truncated_normal_initializer>`_", ":ref:`cn_api_fluid_initializer_TruncatedNormal`", "功能一致"
+    "138", "`tf.unstack <https://www.tensorflow.org/api_docs/python/tf/unstack>`_", ":ref:`cn_api_fluid_layers_unstack`", "功能一致"
+    "139", "`tf.Variable <https://www.tensorflow.org/api_docs/python/tf/Variable>`_", ":ref:`cn_api_fluid_layers_create_parameter`", "功能一致"
+    "140", "`tf.while_loop <https://www.tensorflow.org/api_docs/python/tf/while_loop>`_", ":ref:`cn_api_fluid_layers_While`", "`差异对比 <https://github.com/PaddlePaddle/X2Paddle/blob/master/tensorflow2fluid/doc/tf.while_loop.md>`_"
+    "141", "`tf.zeros <https://www.tensorflow.org/api_docs/python/tf/zeros>`_", ":ref:`cn_api_fluid_layers_zeros`", "功能一致"
+    "142", "`tf.zeros_initializer <https://www.tensorflow.org/api_docs/python/tf/zeros_initializer>`_", ":ref:`cn_api_fluid_initializer_Constant`", "功能一致"
--- a/doc/fluid/api_guides/high_low_level_api.md
+++ b/doc/fluid/api_guides/high_low_level_api.md
-## High/Low-level API简介
-
-PaddlePaddle Fluid目前有2套API接口：
-
- Low-level（底层） API：
-	
-	- 灵活性强并且已经相对成熟，使用它训练的模型，能直接支持C++预测上线。
-	- 提供了大量的模型作为使用示例，包括[Book](https://github.com/PaddlePaddle/book)中的全部章节，以及[models](https://github.com/PaddlePaddle/models)中的所有章节。
-	- 适用人群：对深度学习有一定了解，需要自定义网络进行训练/预测/上线部署的用户。
-
- High-level（高层）API：
-	
-	- 使用简单
-	- 尚未成熟，接口暂时在[paddle.fluid.contrib](https://github.com/PaddlePaddle/Paddle/tree/develop/python/paddle/fluid/contrib)下面。
--- a/doc/fluid/api_guides/high_low_level_api_en.md
+++ b/doc/fluid/api_guides/high_low_level_api_en.md
-## Introduction to High/Low-level API
-
-Currently PaddlePaddle Fluid has 2 branches of API interfaces:
-
- Low-level API:
-
-	- It is highly flexible and relatively mature. The model trained by it can directly support C++ inference deployment and release.
-	- There are a large number of models as examples, including all chapters in [book](https://github.com/PaddlePaddle/book), and [models](https://github.com/PaddlePaddle/models).
-	- Recommended for users who have a certain understanding of deep learning and need to customize a network for training/inference/online deployment.
-
- High-level API:
-
-	- Simple to use
-    - Still under development. the interface is temporarily in [paddle.fluid.contrib](https://github.com/PaddlePaddle/Paddle/tree/develop/python/paddle/fluid/contrib).
\ No newline at end of file
--- a/doc/fluid/api_guides/index.rst
+++ b/doc/fluid/api_guides/index.rst
 ===========
-API使用指南
+API分类检索
 ===========

-API使用指南分功能向您介绍PaddlePaddle Fluid的API体系和用法，帮助您快速了解PaddlePaddle Fluid API的全貌，包括以下几个模块：
+本模块分功能向您介绍PaddlePaddle Fluid的API体系和用法，提高您的查找效率，帮助您快速了解PaddlePaddle Fluid API的全貌，包括以下几个模块：

 ..  toctree::
    :maxdepth: 1

-    high_low_level_api.md
+    low_level/program.rst
    low_level/layers/index.rst
-    low_level/executor.rst
+    low_level/nets.rst
    low_level/optimizer.rst
+    low_level/backward.rst
    low_level/metrics.rst
    low_level/model_save_reader.rst
    low_level/inference.rst
-    low_level/distributed/index.rst
    low_level/memory_optimize.rst
-    low_level/nets.rst
+    low_level/executor.rst
    low_level/parallel_executor.rst
-    low_level/backward.rst
+    low_level/compiled_program.rst
    low_level/parameter.rst
-    low_level/program.rst
+    low_level/distributed/index.rst
+    X2Paddle/TensorFlow-Fluid.rst
+    X2Paddle/Caffe-Fluid.rst
--- a/doc/fluid/api_guides/index_en.rst
+++ b/doc/fluid/api_guides/index_en.rst
-===========
-API Guides
-===========
+=================
+API Quick Search
+=================

 This section introduces the Fluid API structure and usage, to help you quickly get the full picture of the PaddlePaddle Fluid API. This section is divided into the following modules:

 ..  toctree::
    :maxdepth: 1

-    high_low_level_api_en.md
+    low_level/program_en.rst
    low_level/layers/index_en.rst
-    low_level/executor_en.rst
+    low_level/nets_en.rst
    low_level/optimizer_en.rst
+    low_level/backward_en.rst
    low_level/metrics_en.rst
    low_level/model_save_reader_en.rst
    low_level/inference_en.rst
-    low_level/distributed/index_en.rst
    low_level/memory_optimize_en.rst
-    low_level/nets_en.rst
+    low_level/executor_en.rst
    low_level/parallel_executor_en.rst
    low_level/compiled_program_en.rst
-    low_level/backward_en.rst
    low_level/parameter_en.rst
-    low_level/program_en.rst
+    low_level/distributed/index_en.rst
--- a/doc/fluid/api_guides/low_level/compiled_program_cn.rst
+++ b/doc/fluid/api_guides/low_level/compiled_program_cn.rst
--- a/doc/fluid/api_guides/low_level/distributed/async_training.rst
+++ b/doc/fluid/api_guides/low_level/distributed/async_training.rst
@@ -4,7 +4,7 @@
 分布式异步训练
 ############

-Fluid支持数据并行的分布式异步训练，API使用 :code:`DistributedTranspiler` 将单机网络配置转换成可以多机执行的
+Fluid支持数据并行的分布式异步训练，API使用 :code:`DistributeTranspiler` 将单机网络配置转换成可以多机执行的
 :code:`pserver` 端程序和 :code:`trainer` 端程序。用户在不同的节点执行相同的一段代码，根据环境变量或启动参数，
 可以执行对应的 :code:`pserver` 或 :code:`trainer` 角色。Fluid异步训练只支持pserver模式，异步训练和 `同步训练 <../distributed/sync_training.html>`_ 的主要差异在于：异步训练每个trainer的梯度是单独更新到参数上的，
 而同步训练是所有trainer的梯度合并之后统一更新到参数上，因此，同步训练和异步训练的超参数需要分别调节。
@@ -16,17 +16,17 @@ API详细使用方法参考 :ref:`cn_api_fluid_DistributeTranspiler` ，简单

 .. code-block:: python

-    config = fluid.DistributedTranspilerConfig()
+    config = fluid.DistributeTranspilerConfig()
    # 配置策略config
    config.slice_var_up = False
-    t = fluid.DistributedTranspiler(config=config)
-    t.transpile(trainer_id, 
+    t = fluid.DistributeTranspiler(config=config)
+    t.transpile(trainer_id,
                program=main_program,
                pservers="192.168.0.1:6174,192.168.0.2:6174",
                trainers=1,
                sync_mode=False)

-以上参数说明请参考 `同步训练 <../distributed/sync_training.html>`_ 
+以上参数说明请参考 `同步训练 <../distributed/sync_training.html>`_

 需要注意的是：进行异步训练时，请修改 :code:`sync_mode` 的值


--- a/doc/fluid/api_guides/low_level/distributed/async_training_en.rst
+++ b/doc/fluid/api_guides/low_level/distributed/async_training_en.rst
@@ -4,21 +4,21 @@
 Asynchronous Distributed Training
 ####################################

-Fluid supports parallelism asynchronous distributed training. :code:`DistributedTranspiler` converts a single node network configuration into a :code:`pserver` side program and the :code:`trainer` side program that can be executed on multiple machines. The user executes the same piece of code on different nodes. Depending on the environment variables or startup parameters, the corresponding :code:`pserver` or :code:`trainer` role can be executed. 
+Fluid supports data-parallel asynchronous distributed training. :code:`DistributeTranspiler` converts a single-node network configuration into a :code:`pserver` side program and a :code:`trainer` side program that can be executed on multiple machines. Every node runs the same code, and environment variables or startup parameters determine whether it takes the :code:`pserver` or the :code:`trainer` role.

 **Asynchronous distributed training in Fluid only supports the pserver mode** . The main difference between asynchronous training and `synchronous training <../distributed/sync_training_en.html>`_ is that the gradients of each trainer are asynchronously applied on the parameters, but in synchronous training, the gradients of all trainers must be combined first and then they are used to update the parameters. Therefore, the hyperparameters of synchronous training and asynchronous training need to be adjusted separately.

-Asynchronous distributed training in Pserver mode 
+Asynchronous distributed training in Pserver mode
 ==================================================

 For detailed API, please refer to :ref:`api_fluid_transpiler_DistributeTranspiler` . A simple example:

 .. code-block:: python

-	config = fluid.DistributedTranspilerConfig()
-	#Configuring config policy 
+	config = fluid.DistributeTranspilerConfig()
+	#Configuring config policy
 	config.slice_var_up = False
-	t = fluid.DistributedTranspiler(config=config)
+	t = fluid.DistributeTranspiler(config=config)
 	t.transpile(trainer_id,
 				program=main_program,
 				pservers="192.168.0.1:6174,192.168.0.2:6174",
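
Note on the hunk above: the text contrasts asynchronous training, where each trainer's gradient is applied to the parameters on its own, with synchronous training, where the gradients of all trainers are combined first. The toy sketch below illustrates only that difference. It is not Fluid code: the function names are hypothetical, the "combine" step is assumed to be an average, and the gradients are fixed numbers rather than values recomputed from possibly stale parameters as in a real asynchronous run.

```python
# Toy model: one scalar parameter, plain SGD, several trainers.
# All names here are hypothetical and exist only for this illustration.

def sync_step(param, grads, lr):
    """Synchronous: merge (here: average) all trainer gradients, then update once."""
    merged = sum(grads) / len(grads)
    return param - lr * merged


def async_step(param, grads, lr):
    """Asynchronous: apply each trainer's gradient separately, in arrival order."""
    for g in grads:
        param = param - lr * g
    return param


if __name__ == "__main__":
    grads = [0.5, 1.5]  # gradients reported by two trainers
    print("sync :", sync_step(1.0, grads, lr=0.1))   # one update with the merged gradient
    print("async:", async_step(1.0, grads, lr=0.1))  # two independent updates
```

With the same learning rate the two schemes move the parameter by different amounts, which is why the text says the hyperparameters of the two modes need to be tuned separately.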

--- a/doc/fluid/api_guides/low_level/distributed/index.rst
+++ b/doc/fluid/api_guides/low_level/distributed/index.rst
@@ -7,8 +7,5 @@

    sync_training.rst
    async_training.rst
-    cpu_train_best_practice.rst
    large_scale_sparse_feature_training.rst
    cluster_train_data_cn.rst
-
-
--- a/doc/fluid/api_guides/low_level/distributed/index_en.rst
+++ b/doc/fluid/api_guides/low_level/distributed/index_en.rst
@@ -7,7 +7,6 @@ Distributed Training

    sync_training_en.rst
    async_training_en.rst
-    cpu_train_best_practice_en.rst
    large_scale_sparse_feature_training_en.rst
    cluster_train_data_en.rst


--- a/doc/fluid/api_guides/low_level/distributed/sync_training.rst
+++ b/doc/fluid/api_guides/low_level/distributed/sync_training.rst
@@ -4,7 +4,7 @@
 分布式同步训练
 ############

-Fluid支持数据并行的分布式同步训练，API使用 :code:`DistributedTranspiler` 将单机网络配置转换成可以多机执行的
+Fluid支持数据并行的分布式同步训练，API使用 :code:`DistributeTranspiler` 将单机网络配置转换成可以多机执行的
 :code:`pserver` 端程序和 :code:`trainer` 端程序。用户在不同的节点执行相同的一段代码，根据环境变量或启动参数，
 可以执行对应的 :code:`pserver` 或 :code:`trainer` 角色。Fluid分布式同步训练同时支持pserver模式和NCCL2模式，
 在API使用上有差别，需要注意。
@@ -16,11 +16,11 @@ API详细使用方法参考 :ref:`DistributeTranspiler` ，简单实例用法：

 .. code-block:: python

-    config = fluid.DistributedTranspilerConfig()
+    config = fluid.DistributeTranspilerConfig()
    # 配置策略config
    config.slice_var_up = False
-    t = fluid.DistributedTranspiler(config=config)
-    t.transpile(trainer_id, 
+    t = fluid.DistributeTranspiler(config=config)
+    t.transpile(trainer_id,
                program=main_program,
                pservers="192.168.0.1:6174,192.168.0.2:6174",
                trainers=1,
@@ -68,8 +68,8 @@ NCCL2模式分布式训练

    config = fluid.DistributeTranspilerConfig()
    config.mode = "nccl2"
-    t = fluid.DistributedTranspiler(config=config)
-    t.transpile(trainer_id, 
+    t = fluid.DistributeTranspiler(config=config)
+    t.transpile(trainer_id,
                program=main_program,
                startup_program=startup_program,
                trainers="192.168.0.1:6174,192.168.0.2:6174",

--- a/doc/fluid/api_guides/low_level/distributed/sync_training_en.rst
+++ b/doc/fluid/api_guides/low_level/distributed/sync_training_en.rst
@@ -4,19 +4,19 @@
 Synchronous Distributed Training
 ####################################

-Fluid supports parallelism distributed synchronous training, the API uses the :code:`DistributedTranspiler` to convert a single node network configuration into a :code:`pserver` side and :code:`trainer` side program that can be executed on multiple machines. The user executes the same piece of code on different nodes. Depending on the environment variables or startup parameters, you can execute the corresponding :code:`pserver` or :code:`trainer` role. Fluid distributed synchronous training supports both pserver mode and NCCL2 mode. There are differences in the use of the API, to which you need to pay attention.
+Fluid supports data-parallel synchronous distributed training. The API uses :code:`DistributeTranspiler` to convert a single-node network configuration into a :code:`pserver` side program and a :code:`trainer` side program that can be executed on multiple machines. Every node runs the same code, and environment variables or startup parameters determine whether it takes the :code:`pserver` or the :code:`trainer` role. Synchronous training supports both pserver mode and NCCL2 mode; note that the API is used differently in each.

-Distributed training in pserver mode 
+Distributed training in pserver mode
 ======================================

 For API Reference, please refer to :ref:`DistributeTranspiler`. A simple example :

 .. code-block:: python

-	config = fluid.DistributedTranspilerConfig()
+	config = fluid.DistributeTranspilerConfig()
 	#Configuring policy config
 	config.slice_var_up = False
-	t = fluid.DistributedTranspiler(config=config)
+	t = fluid.DistributeTranspiler(config=config)
 	t.transpile(trainer_id,
 				program=main_program,
 				pservers="192.168.0.1:6174,192.168.0.2:6174",
@@ -51,7 +51,7 @@ Configuration for general environment variables:
 - :code:`FLAGS_rpc_deadline` : int, the longest waiting time for RPC communication, in milliseconds, default 180000


-Distributed training in NCCL2 mode 
+Distributed training in NCCL2 mode
 ====================================

 The multi-node synchronous training mode based on NCCL2 (Collective Communication) is only supported in the GPU cluster.
@@ -65,7 +65,7 @@ Use the following code to convert the current :code:`Program` to a Fluid :code:`

 	config = fluid.DistributeTranspilerConfig()
 	config.mode = "nccl2"
-	t = fluid.DistributedTranspiler(config=config)
+	t = fluid.DistributeTranspiler(config=config)
 	t.transpile(trainer_id,
 				program=main_program,
 				startup_program=startup_program,
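
Note on the sync_training files above: the docs say every node runs the same script and takes the :code:`pserver` or :code:`trainer` role depending on environment variables or startup parameters. A minimal sketch of that role selection follows, using only the standard library. The environment variable names (``TRAINING_ROLE``, ``TRAINER_ID``, ``PSERVER_ENDPOINTS``, ``TRAINERS``) and the helper ``resolve_role`` are assumptions made for this illustration, not names defined by the documents in this diff.

```python
import os


def resolve_role(env=None):
    """Decide which role this process plays from environment variables.

    The variable names are hypothetical; adapt them to your launcher.
    """
    env = os.environ if env is None else env
    role = env.get("TRAINING_ROLE", "TRAINER").upper()
    if role not in ("PSERVER", "TRAINER"):
        raise ValueError("unknown role: %r" % role)
    return {
        "role": role,
        # Index of this trainer among all trainers.
        "trainer_id": int(env.get("TRAINER_ID", "0")),
        # Comma-separated "ip:port" list, same format as the pservers argument.
        "pservers": env.get("PSERVER_ENDPOINTS", ""),
        # Total number of trainers.
        "trainers": int(env.get("TRAINERS", "1")),
    }


if __name__ == "__main__":
    print(resolve_role())
```

The returned values line up with the ``trainer_id``, ``pservers`` and ``trainers`` arguments passed to ``t.transpile(...)`` in the pserver-mode examples; the script would then branch on ``role`` to run either the pserver or the trainer program.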

--- a/doc/fluid/api_guides/low_level/executor.rst
+++ b/doc/fluid/api_guides/low_level/executor.rst
@@ -4,11 +4,11 @@
 执行引擎
 ##########

-:code:`Executor` 实现了一个简易的执行器，所有的操作在其中顺序执行。你可以在Python脚本中运行:code:`Executor`。PaddlePaddle Fluid中有两种执行器。一种是:code:`Executor` 默认的单线程执行器，另一种是并行计算执行器，在:ref:`api_guide_parallel_executor`中进行了解释。`Executor`和:ref:`api_guide_parallel_executor`的配置不同，这可能会给部分用户带来困惑。为使执行器更加灵活，我们引入了:ref:`api_guide_compiled_program`，:ref:`api_guide_compiled_program`用于把一个程序转换为不同的优化组合，可以通过:code:`Executor`运行。
+:code:`Executor` 实现了一个简易的执行器，所有的操作在其中顺序执行。你可以在Python脚本中运行 :code:`Executor` 。PaddlePaddle Fluid中有两种执行器。一种是 :code:`Executor` 默认的单线程执行器，另一种是并行计算执行器，在 :ref:`api_guide_parallel_executor` 中进行了解释。``Executor`` 和 :ref:`api_guide_parallel_executor` 的配置不同，这可能会给部分用户带来困惑。为使执行器更加灵活，我们引入了 :ref:`api_guide_compiled_program` ， :ref:`api_guide_compiled_program` 用于把一个程序转换为不同的优化组合，可以通过 :code:`Executor` 运行。

-:code:`Executor`的逻辑非常简单。建议在调试阶段用:code:`Executor`在一台计算机上完整地运行模型，然后转向多设备或多台计算机计算。
+:code:`Executor` 的逻辑非常简单。建议在调试阶段用 :code:`Executor` 在一台计算机上完整地运行模型，然后转向多设备或多台计算机计算。

-:code:`Executor`在构造时接受一个:code:`Place`，它既可能是:ref:`api_fluid_CPUPlace`也可能是:ref:`api_fluid_CUDAPlace`。
+:code:`Executor` 在构造时接受一个 :code:`Place` ，它既可能是 :ref:`api_fluid_CPUPlace` 也可能是 :ref:`api_fluid_CUDAPlace` 。

 .. code-block:: python
    # 首先创建Executor。
@@ -16,12 +16,12 @@
    exe = fluid.Executor(place)
    # 运行启动程序仅一次。
    exe.run(fluid.default_startup_program())
-    
+
    # 直接运行主程序。
    loss, = exe.run(fluid.default_main_program(),
                    feed=feed_dict,
                    fetch_list=[loss.name])
-简单样例请参照 `quick_start_fit_a_line <http://paddlepaddle.org/documentation/docs/zh/1.1/beginners_guide/quick_start/fit_a_line/README.html>`_ 
+简单样例请参照 `basics_fit_a_line <../../beginners_guide/basics/fit_a_line/README.cn.html>`_

 - 相关API :
- - :ref:`cn_api_fluid_Executor` 
+ - :ref:`cn_api_fluid_Executor`
--- a/doc/fluid/api_guides/low_level/executor_en.rst
+++ b/doc/fluid/api_guides/low_level/executor_en.rst
@@ -8,7 +8,7 @@ Executor

 The logic of :code:`Executor` is very simple. We suggest running the model end to end with :code:`Executor` on a single computer during debugging, and then switching to multiple devices or multiple computers.

-:code:`Executor` receives a :code:`Place` at construction, which can either be :ref:`api_fluid_CPUPlace` or :ref:`api_fluid_CUDAPlace`. 
+:code:`Executor` receives a :code:`Place` at construction, which can either be :ref:`api_fluid_CPUPlace` or :ref:`api_fluid_CUDAPlace`.

 .. code-block:: python

@@ -18,14 +18,14 @@ The logic of :code:`Executor` is very simple. It is suggested to thoroughly run

    # Run the startup program once and only once.
    exe.run(fluid.default_startup_program())
-    
+
    # Run the main program directly.
    loss, = exe.run(fluid.default_main_program(),
                    feed=feed_dict,
                    fetch_list=[loss.name])


-For simple example please refer to `quick_start_fit_a_line <http://paddlepaddle.org/documentation/docs/zh/1.1/beginners_guide/quick_start/fit_a_line/README.html>`_ 
+For a simple example, please refer to `basics_fit_a_line <../../beginners_guide/basics/fit_a_line/README.html>`_

 - Related API :
 - :ref:`api_fluid_Executor`

--- a/doc/fluid/api_guides/low_level/layers/conv.rst
+++ b/doc/fluid/api_guides/low_level/layers/conv.rst
@@ -14,30 +14,31 @@
 ---------------------

 卷积需要依据滑动步长(stride)、填充长度(padding)、卷积核窗口大小(filter size)、分组数(groups)、扩张系数(dilation rate)来决定如何计算。groups最早在 `AlexNet <https://www.nvidia.cn/content/tesla/pdf/machine-learning/imagenet-classification-with-deep-convolutional-nn.pdf>`_ 中引入, 可以理解为将原始的卷积分为独立若干组卷积计算。
-  
+
  **注意**: 同cuDNN的方式，Fluid目前只支持在特征图上下填充相同的长度，左右也是。

- 输入输出Layout: 
+- 输入输出Layout:

  2D卷积输入特征的Layout为[N, C, H, W]或[N, H, W, C], N即batch size，C是通道数，H、W是特征的高度和宽度，输出特征和输入特征的Layout一致。(相应的3D卷积输入特征的Layout为[N, C, D, H, W]或[N, D, H, W, C]，但**注意**，Fluid的卷积当前只支持[N, C, H, W]，[N, C, D, H, W]。)
-   
- 卷积核的Layout: 
-  
+
+- 卷积核的Layout:
+
  Fluid中2D卷积的卷积核(也称权重)的Layout为[C_o, C_in / groups, f_h, f_w]，C_o、C_in表示输出、输入通道数，f_h、f_w表示卷积核窗口的高度和宽度，按行序存储。(相应的3D卷积的卷积核Layout为[C_o, C_in / groups, f_d, f_h, f_w]，同样按行序存储。)
-  
- 深度可分离卷积(depthwise separable convolution): 
-   
+
+- 深度可分离卷积(depthwise separable convolution):
+
  在深度可分离卷积中包括depthwise convolution和pointwise convolution两组，这两个卷积的接口和上述普通卷积接口相同。前者可以通过给普通卷积设置groups来做，后者通过设置卷积核filters的大小为1x1，深度可分离卷积减少参数的同时减少了计算量。
-  
+
  对于depthwise convolution，可以设置groups等于输入通道数，此时，2D卷积的卷积核形状为[C_o, 1, f_h, f_w]。
  对于pointwise convolution，卷积核的形状为[C_o, C_in, 1, 1]。
-  
-  **注意**：Fluid针对depthwise convolution的GPU计算做了高度优化，您可以通过在 :code:`fluid.layers.conv2d`接口设置 :code:`use_cudnn=False`来使用Fluid自身优化的CUDA程序。
-   
+
+  **注意**：Fluid针对depthwise convolution的GPU计算做了高度优化，您可以通过在
+  :code:`fluid.layers.conv2d` 接口设置 :code:`use_cudnn=False` 来使用Fluid自身优化的CUDA程序。
+
 - 空洞卷积(dilated convolution):
-  
+
  空洞卷积相比普通卷积而言，卷积核在特征图上取值时不再连续，而是间隔的，这个间隔数称作dilation，等于1时，即为普通卷积，空洞卷积相比普通卷积的感受野更大。
-  
+
 - API汇总:
 - :ref:`cn_api_fluid_layers_conv2d`
 - :ref:`cn_api_fluid_layers_conv3d`
@@ -50,14 +51,14 @@

 Fluid可以表示变长的序列结构，这里的变长是指不同样本的时间步(step)数不一样，通常用一个2D的Tensor和一个能够区分样本长度的辅助结构来表示。假定，2D的Tensor的形状是shape，shape[0]是所有样本的总时间步数，shape[1]是序列特征的大小。

-基于此数据结构的卷积在Fluid里称作序列卷积，也表示一维卷积。同图像卷积，序列卷积的输入参数有卷积核大小、填充大小、滑动步长，但与2D卷积不同的是，这些参数个数都为1。**注意**，目前仅支持stride为1的情况，输出序列的时间步数和输入序列相同。 
+基于此数据结构的卷积在Fluid里称作序列卷积，也表示一维卷积。同图像卷积，序列卷积的输入参数有卷积核大小、填充大小、滑动步长，但与2D卷积不同的是，这些参数个数都为1。**注意**，目前仅支持stride为1的情况，输出序列的时间步数和输入序列相同。

 假如：输入序列形状为(T, N)， T即该序列的时间步数，N是序列特征大小；卷积核的上下文步长为K，输出序列的特征大小为M，则卷积核权重形状为(K * N, M)，输出序列形状为(T, M)。
-  
+
 另外，参考DeepSpeech，Fluid实现了行卷积row convolution, 或称
 `look ahead convolution <http://www.cs.cmu.edu/~dyogatam/papers/wang+etal.iclrworkshop2016.pdf>`_ ，
 该卷积相比上述普通序列卷积可以减少参数。
- 
+

 - API汇总:
 - :ref:`cn_api_fluid_layers_sequence_conv`
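The filter layouts described above for grouped, depthwise and pointwise convolution, and the weight shape of sequence convolution, can be checked with a small shape calculation. The following is a minimal plain-Python sketch, not Fluid code: the helper names `conv2d_filter_shape`, `sequence_conv_shapes` and `numel` are hypothetical and only restate the shape rules from the text.

```python
def conv2d_filter_shape(c_out, c_in, f_h, f_w, groups=1):
    """Return the 2D filter layout [C_o, C_in / groups, f_h, f_w]."""
    if c_in % groups != 0:
        raise ValueError("input channels must be divisible by groups")
    return [c_out, c_in // groups, f_h, f_w]


def sequence_conv_shapes(t, n, k, m):
    """Return (filter_shape, output_shape) for an input sequence of shape (T, N)."""
    return (k * n, m), (t, m)


def numel(shape):
    """Number of elements in a tensor with the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


# Depthwise convolution: groups equals the number of input channels.
depthwise = conv2d_filter_shape(c_out=32, c_in=32, f_h=3, f_w=3, groups=32)
# Pointwise convolution: a 1x1 filter that mixes channels.
pointwise = conv2d_filter_shape(c_out=64, c_in=32, f_h=1, f_w=1)
# An ordinary 3x3 convolution with the same input and output channels.
ordinary = conv2d_filter_shape(c_out=64, c_in=32, f_h=3, f_w=3)

print(depthwise, pointwise, ordinary)
print(numel(depthwise) + numel(pointwise), numel(ordinary))
print(sequence_conv_shapes(t=10, n=8, k=3, m=16))
```

Comparing the two parameter counts illustrates why the depthwise plus pointwise pair needs far fewer weights than a single ordinary convolution of the same window size.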

--- a/doc/fluid/api_guides/low_level/layers/learning_rate_scheduler.rst
+++ b/doc/fluid/api_guides/low_level/layers/learning_rate_scheduler.rst
@@ -38,3 +38,9 @@
 * :code:`append_LARS`: 通过Layer-wise Adaptive Rate Scaling算法获得学习率，相关算法请参考 `《Train Feedfoward Neural Network with Layer-wise Adaptive Rate via Approximating Back-matching Propagation》 <https://arxiv.org/abs/1802.09750>`_ 。
  相关API Reference请参考 :ref:`cn_api_fluid_layers_append_LARS`

+* :code:`cosine_decay`: 余弦衰减，即学习率随step数变化呈余弦函数。
+  相关API Reference请参考 :ref:`cn_api_fluid_layers_cosine_decay`
+
+* :code:`linear_lr_warmup`: 学习率随step数线性增加到指定学习率。
+  相关API Reference请参考 :ref:`cn_api_fluid_layers_linear_lr_warmup`
+
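The two schedules added above can be illustrated without any framework. The following is a plain-Python sketch of the general idea, assuming a half-cosine curve evaluated once per epoch and a linear ramp from a start rate to an end rate; the function and parameter names are illustrative assumptions, and the exact behavior of the Fluid layers should be checked against their API reference.

```python
import math


def cosine_decay(base_lr, global_step, step_each_epoch, epochs):
    """Learning rate follows half a cosine period over the given number of epochs."""
    epoch = global_step // step_each_epoch
    return base_lr * 0.5 * (math.cos(epoch * math.pi / epochs) + 1.0)


def linear_lr_warmup(target_lr, global_step, warmup_steps, start_lr, end_lr):
    """Learning rate rises linearly during warmup, then the target rate is used."""
    if global_step < warmup_steps:
        return start_lr + (end_lr - start_lr) * global_step / warmup_steps
    return target_lr


# Print the cosine schedule at the start, middle and end of training.
for step in (0, 500, 1000):
    print(step, cosine_decay(0.1, step, step_each_epoch=100, epochs=10))

# Print the warmup schedule before, during and after the warmup phase.
for step in (0, 50, 100):
    print(step, linear_lr_warmup(0.1, step, warmup_steps=100, start_lr=0.0, end_lr=0.1))
```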
--- a/doc/fluid/api_guides/low_level/layers/sparse_update.rst
+++ b/doc/fluid/api_guides/low_level/layers/sparse_update.rst
@@ -37,9 +37,9 @@ API详细使用方法参考 :ref:`cn_api_fluid_layers_embedding` ，以下是一

 以上参数中：

- :code:`is_sparse` ： 反向计算的时候梯度是否为sparse tensor。如果不设置，梯度是一个 `LodTensor <https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/user_guides/howto/prepare_data/lod_tensor.md>`_  。默认为False。
+- :code:`is_sparse` ： 反向计算的时候梯度是否为sparse tensor。如果不设置，梯度是一个 `LodTensor <https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/user_guides/howto/basic_concept/lod_tensor.html>`_  。默认为False。

- :code:`is_distributed` ： 标志是否是用在分布式的场景下。一般大规模稀疏更新（embedding的第0维维度很大，比如几百万以上）才需要设置。具体可以参考大规模稀疏的API guide  :ref:`api_guide_async_training`  。默认为False。
+- :code:`is_distributed` ： 标志是否是用在分布式的场景下。一般大规模稀疏更新（embedding的第0维维度很大，比如几百万以上）才需要设置。具体可以参考大规模稀疏的API guide  :ref:`cn_api_guide_async_training`  。默认为False。

 - API汇总:
 - :ref:`cn_api_fluid_layers_embedding`
--- a/doc/fluid/api_guides/low_level/layers/sparse_update_en.rst
+++ b/doc/fluid/api_guides/low_level/layers/sparse_update_en.rst
@@ -37,9 +37,9 @@ API reference :ref:`api_fluid_layers_embedding` . Here is a simple example:

 The parameters:

- :code:`is_sparse` : Whether the gradient is a sparse tensor in the backward calculation. If not set, the gradient is a `LodTensor <https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/user_guides/howto/prepare_data/lod_tensor.md>`_ . The default is False.
+- :code:`is_sparse` : Whether the gradient is a sparse tensor in the backward calculation. If not set, the gradient is a `LodTensor <https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/user_guides/howto/basic_concept/lod_tensor_en.html>`_ . The default is False.

 - :code:`is_distributed` : Whether the current training is in a distributed scenario. Generally, this parameter only needs to be set for large-scale sparse updates (the 0th dimension of the embedding is very large, such as several million or more). For details, please refer to the large-scale sparse API guide :ref:`api_guide_async_training`. The default is False.

 - API :
-   - :ref:`api_fluid_layers_embedding`
\ No newline at end of file
+   - :ref:`api_fluid_layers_embedding`
--- a/doc/fluid/api_guides/low_level/nets.rst
+++ b/doc/fluid/api_guides/low_level/nets.rst
@@ -33,8 +33,9 @@ API Reference 请参考 :ref:`cn_api_fluid_nets_img_conv_group`

 :code:`sequence_conv_pool` 是由 :ref:`cn_api_fluid_layers_sequence_conv` 与 :ref:`cn_api_fluid_layers_sequence_pool` 串联而成。
 该模块在 `自然语言处理 <https://zh.wikipedia.org/wiki/自然语言处理>`_ 以及 `语音识别 <https://zh.wikipedia.org/wiki/语音识别>`_ 等领域均有广泛应用，
-比如 `文本分类模型 <https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleNLP/text_classification/nets.py>`_ , 
-`TagSpace <https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleRec/tagspace/train.py>`_  以及 `Multi-view Simnet <https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleRec/multiview_simnet/nets.py>`_ 等模型。
+比如 `文本分类模型 <https://github.com/PaddlePaddle/models/blob/develop/PaddleNLP/text_classification/nets.py>`_ ,
+`TagSpace <https://github.com/PaddlePaddle/models/blob/develop/PaddleRec/tagspace/train.py>`_  以及
+`Multi-view Simnet <https://github.com/PaddlePaddle/models/blob/develop/PaddleRec/multiview_simnet/nets.py>`_ 等模型。

 API Reference 请参考 :ref:`cn_api_fluid_nets_sequence_conv_pool`

@@ -55,7 +56,7 @@ API Reference 请参考 :ref:`cn_api_fluid_nets_glu`
 .. math::
 Attention(Q, K, V)= softmax(QK^\mathrm{T})V

-该模块广泛使用在 `机器翻译 <https://zh.wikipedia.org/zh/机器翻译>`_ 的模型中，比如 `Transformer <https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleNLP/neural_machine_translation/transformer>`_ 。
+该模块广泛使用在 `机器翻译 <https://zh.wikipedia.org/zh/机器翻译>`_ 的模型中，比如 `Transformer <https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/neural_machine_translation/transformer>`_ 。

 API Reference 请参考 :ref:`cn_api_fluid_nets_scaled_dot_product_attention`

--- a/doc/fluid/api_guides/low_level/nets_en.rst
+++ b/doc/fluid/api_guides/low_level/nets_en.rst
@@ -32,8 +32,8 @@ For API Reference, please refer to :ref:`api_fluid_nets_img_conv_group`
 --------------------

 :code:`sequence_conv_pool` is built by chaining :ref:`api_fluid_layers_sequence_conv` and :ref:`api_fluid_layers_sequence_pool`.
-The module is widely used in the field of `natural language processing <https://en.wikipedia.org/wiki/Natural_language_processing>`_ and `speech recognition <https://en.wikipedia.org/wiki/Speech_recognition>`_ .  Models such as the `text classification model <https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleNLP/text_classification/nets.py>`_ ,
-`TagSpace <https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleRec/tagspace/train.py>`_ and `Multi view Simnet <https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleRec/multiview_simnet/nets.py>`_.
+The module is widely used in `natural language processing <https://en.wikipedia.org/wiki/Natural_language_processing>`_ and `speech recognition <https://en.wikipedia.org/wiki/Speech_recognition>`_ , for example in the `text classification model <https://github.com/PaddlePaddle/models/blob/develop/PaddleNLP/text_classification/nets.py>`_ ,
+`TagSpace <https://github.com/PaddlePaddle/models/blob/develop/PaddleRec/tagspace/train.py>`_ and `Multi view Simnet <https://github.com/PaddlePaddle/models/blob/develop/PaddleRec/multiview_simnet/nets.py>`_ models.

 For API Reference, please refer to :ref:`api_fluid_nets_sequence_conv_pool`

@@ -54,6 +54,6 @@ For the input data :code:`Queries` , :code:`Key` and :code:`Values`, calculate t
 .. math::
 Attention(Q, K, V)= softmax(QK^\mathrm{T})V

-This module is widely used in the model of `machine translation <https://en.wikipedia.org/wiki/Machine_translation>`_, such as `Transformer <https://github.com/PaddlePaddle/models/tree/develop/Fluid/PaddleNLP/neural_machine_translation/transformer>`_ .
+This module is widely used in `machine translation <https://en.wikipedia.org/wiki/Machine_translation>`_ models, such as `Transformer <https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/neural_machine_translation/transformer>`_ .

 For API Reference, please refer to :ref:`api_fluid_nets_scaled_dot_product_attention`
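The attention formula above can be evaluated by hand on tiny inputs. The following is a minimal plain-Python sketch of `softmax(QK^T)V` exactly as written in the formula; it is not the Fluid implementation, the helper names are hypothetical, and any extra scaling of the scores that a real layer may apply is deliberately left out.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def attention(q, k, v):
    """Compute softmax(Q K^T) V, with the softmax applied to each row of scores."""
    scores = matmul(q, transpose(k))
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
out, weights = attention(q, k, v)
print(weights)
print(out)
```

Each row of `weights` sums to 1, so every output row is a weighted average of the rows of `v`, with more weight on the value whose key matches the query.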
--- a/doc/fluid/api_guides/low_level/parallel_executor.rst
+++ b/doc/fluid/api_guides/low_level/parallel_executor.rst
@@ -29,7 +29,7 @@

 **注意** ：如果在Reduce模式下使用 :code:`CPU` 多线程执行 :code:`Program` ， :code:`Program` 的参数在多个线程间是共享的，在某些模型上，Reduce模式可以大幅节省内存。

-鉴于模型的执行速率和模型结构及执行器的执行策略有关，:code:`ParallelExecutor` 允许你修改执行器的相关参数，例如线程池的规模( :code:`num_threads` )、为清除临时变量:code:`num_iteration_per_drop_scope`需要进行的循环次数。更多信息请参照:ref:`cn_api_fluid_ExecutionStrategy`。
+鉴于模型的执行速率和模型结构及执行器的执行策略有关，:code:`ParallelExecutor` 允许你修改执行器的相关参数，例如线程池的规模( :code:`num_threads` )、为清除临时变量 :code:`num_iteration_per_drop_scope` 需要进行的循环次数。更多信息请参照 :ref:`cn_api_fluid_ExecutionStrategy` 。


 .. code-block:: python
@@ -49,8 +49,8 @@
    exec_strategy.num_threads = dev_count * 4 # the size of thread pool.
    build_strategy = fluid.BuildStrategy()
    build_strategy.memory_optimize = True if memory_opt else False
-    train_exe = fluid.ParallelExecutor(use_cuda=use_cuda, 
-                                       main_program=train_program, 
+    train_exe = fluid.ParallelExecutor(use_cuda=use_cuda,
+                                       main_program=train_program,
                                       build_strategy=build_strategy,
                                       exec_strategy=exec_strategy,
                                       loss_name=loss.name)

--- a/doc/fluid/api_guides/low_level/program.rst
+++ b/doc/fluid/api_guides/low_level/program.rst
 .. _api_guide_Program:

-###############################
-Program/Block/Operator/Variable
-###############################
+#########
+基础概念
+#########

 ==================
 Program
@@ -13,13 +13,13 @@ Program

 总的来说：

-* 一个模型是一个 Fluid :code:`Program` ,一个模型可以含有多于一个 :code:`Program` ； 
+* 一个模型是一个 Fluid :code:`Program` ,一个模型可以含有多于一个 :code:`Program` ；

 * :code:`Program` 由嵌套的 :code:`Block` 构成，:code:`Block` 的概念可以类比到 C++ 或是 Java 中的一对大括号，或是 Python 语言中的一个缩进块；

 * :code:`Block` 中的计算由顺序执行、条件选择或者循环执行三种方式组合，构成复杂的计算逻辑；

-* :code:`Block` 中包含对计算和计算对象的描述。计算的描述称之为 Operator；计算作用的对象（或者说 Operator 的输入和输出）被统一为 Tensor，在Fluid中，Tensor 用层级为0的 `LoD-Tensor <http://paddlepaddle.org/documentation/docs/zh/1.2/user_guides/howto/prepare_data/lod_tensor.html#permalink-4-lod-tensor>`_ 表示。 
+* :code:`Block` 中包含对计算和计算对象的描述。计算的描述称之为 Operator；计算作用的对象（或者说 Operator 的输入和输出）被统一为 Tensor，在Fluid中，Tensor 用层级为0的 `LoD-Tensor <http://paddlepaddle.org/documentation/docs/zh/1.2/user_guides/howto/prepare_data/lod_tensor.html#permalink-4-lod-tensor>`_ 表示。



@@ -37,7 +37,7 @@ Block
 +----------------------+-------------------------+
 | if-else, switch      | IfElseOp, SwitchOp      |
 +----------------------+-------------------------+
-| 顺序执行              | 一系列 layers            | 
+| 顺序执行              | 一系列 layers            |
 +----------------------+-------------------------+

 如上文所说，Fluid 中的 :code:`Block` 描述了一组以顺序、选择或是循环执行的 Operator 以及 Operator 操作的对象：Tensor。
@@ -54,7 +54,7 @@ Operator
 这是因为一些常见的对 Tensor 的操作可能是由更多基础操作构成，为了提高使用的便利性，框架内部对基础 Operator 进行了一些封装，包括创建 Operator 依赖可学习参数，可学习参数的初始化细节等，减少用户重复开发的成本。


-更多内容可参考阅读 `Fluid设计思想 <../../advanced_usage/design_idea/fluid_design_idea.html>`_ 
+更多内容可参考阅读 `Fluid设计思想 <../../advanced_usage/design_idea/fluid_design_idea.html>`_


 =========
@@ -78,4 +78,4 @@ Fluid 中的 :code:`Variable` 可以包含任何类型的值———在大多

 * 用户还可以使用 :ref:`cn_api_fluid_program_guard` 配合 :code:`with` 语句，修改配置好的 :ref:`cn_api_fluid_default_startup_program` 和 :ref:`cn_api_fluid_default_main_program` 。

-* 在Fluid中，Block内部执行顺序由控制流决定，如 :ref:`cn_api_fluid_layers_IfElse` , :ref:`cn_api_fluid_layers_While`, :ref:`cn_api_fluid_layers_Switch` 等，更多内容可参考： :ref:`api_guide_control_flow` 
+* 在Fluid中，Block内部执行顺序由控制流决定，如 :ref:`cn_api_fluid_layers_IfElse` , :ref:`cn_api_fluid_layers_While`, :ref:`cn_api_fluid_layers_Switch` 等，更多内容可参考： :ref:`api_guide_control_flow`
--- a/doc/fluid/api_guides/low_level/program_en.rst
+++ b/doc/fluid/api_guides/low_level/program_en.rst
 .. _api_guide_Program_en:

-###############################
-Program/Block/Operator/Variable
-###############################
+###############
+Basic Concept
+###############

 ==================
 Program
@@ -36,7 +36,7 @@ Block
 +----------------------+-------------------------+
 | if-else, switch      | IfElseOp, SwitchOp      |
 +----------------------+-------------------------+
-| execute sequentially | a series of layers      | 
+| execute sequentially | a series of layers      |
 +----------------------+-------------------------+

 As mentioned above, :code:`Block` in Fluid describes a set of Operators that execute sequentially, conditionally or in a loop, together with the objects those Operators act on: Tensors.
@@ -53,7 +53,7 @@ This is because some common operations on Tensor may consist of more basic opera



-More information can be read for reference. `Fluid Design Idea <../../advanced_usage/design_idea/fluid_design_idea.html>`_ 
+For more information, see `Fluid Design Idea <../../advanced_usage/design_idea/fluid_design_idea.html>`_


 =========
@@ -75,4 +75,4 @@ Related API
 * Users can also use :ref:`api_fluid_program_guard` with :code:`with` to modify the configured :ref:`api_fluid_default_startup_program` and :ref:`api_fluid_default_main_program` .


-* In Fluid，the execution order in a Block is determined by control flow，such as :ref:`api_fluid_layers_IfElse` , :ref:`api_fluid_layers_While` and :ref:`api_fluid_layers_Switch` . For more information, please refer to： :ref:`api_guide_control_flow_en` 
+* In Fluid, the execution order in a Block is determined by control flow, such as :ref:`api_fluid_layers_IfElse` , :ref:`api_fluid_layers_While` and :ref:`api_fluid_layers_Switch` . For more information, please refer to :ref:`api_guide_control_flow_en`
--- a/doc/fluid/beginners_guide/quick_start/fit_a_line/README.cn.md
+++ b/doc/fluid/beginners_guide/quick_start/fit_a_line/README.cn.md
--- a/doc/fluid/beginners_guide/quick_start/fit_a_line/README.md
+++ b/doc/fluid/beginners_guide/quick_start/fit_a_line/README.md
--- a/doc/fluid/beginners_guide/quick_start/fit_a_line/image
+++ b/doc/fluid/beginners_guide/quick_start/fit_a_line/image
--- a/doc/fluid/beginners_guide/basics/index.rst
+++ b/doc/fluid/beginners_guide/basics/index.rst
 ################
-深度学习基础
+深度学习基础教程
 ################


-本章由7篇文档组成，它们按照简单到难的顺序排列，将指导您如何使用PaddlePaddle完成基础的深度学习任务
+本章由9篇文档组成，它们按照简单到难的顺序排列，将指导您如何使用PaddlePaddle完成基础的深度学习任务

 本章文档涉及了大量深度学习基础知识，也介绍了如何使用PaddlePaddle实现这些内容，请参阅以下说明了解如何使用：

@@ -15,6 +15,8 @@
 ..  toctree::
    :titlesonly:

+    fit_a_line/README.cn.md
+    recognize_digits/README.cn.md
    image_classification/index.md
    word2vec/index.md
    recommender_system/index.md

--- a/doc/fluid/beginners_guide/basics/index_en.rst
+++ b/doc/fluid/beginners_guide/basics/index_en.rst
-##########################
+############################
 Basic Deep Learning Models
-##########################
+############################

-This section collects six documents arranging from the simplest to the most challenging, which will guide you through the basic deep learning tasks in PaddlePaddle.
+This section collects 8 documents, arranged from the simplest to the most challenging, which will guide you through the basic deep learning tasks in PaddlePaddle.

 The documentation in this chapter covers a lot of deep learning basics and how to implement them with PaddlePaddle. See the instructions below for how to use:

@@ -15,6 +15,8 @@ The book you are reading is an "interactive" e-book - each chapter can be run in
 ..  toctree::
    :titlesonly:

+    fit_a_line/README.md
+    recognize_digits/README.md
    image_classification/index_en.md
    word2vec/index_en.md
    recommender_system/index_en.md
@@ -45,7 +47,7 @@ Just run these in shell:

 	docker run -d -p 8888:8888 paddlepaddle/book

-It downloads the Docker image for running books from DockerHub.com. 
+It downloads the Docker image for running books from DockerHub.com.
 To read and edit this book on-line, please visit http://localhost:8888 in your browser.

 If the Internet connection to DockerHub.com is compromised, try our spare docker image named docker.paddlepaddlehub.com:

--- a/doc/fluid/beginners_guide/quick_start/recognize_digits/README.cn.md
+++ b/doc/fluid/beginners_guide/quick_start/recognize_digits/README.cn.md
--- a/doc/fluid/beginners_guide/quick_start/recognize_digits/README.md
+++ b/doc/fluid/beginners_guide/quick_start/recognize_digits/README.md
--- a/doc/fluid/beginners_guide/quick_start/recognize_digits/image
+++ b/doc/fluid/beginners_guide/quick_start/recognize_digits/image
--- a/doc/fluid/beginners_guide/index.rst
+++ b/doc/fluid/beginners_guide/index.rst
@@ -6,23 +6,24 @@ PaddlePaddle (PArallel Distributed Deep LEarning)是一个易用、高效、灵

 您可参考PaddlePaddle的 `Github <https://github.com/PaddlePaddle/Paddle>`_ 了解详情，也可阅读 `版本说明 <../release_note.html>`_ 了解新版本的特性

-当您第一次来到PaddlePaddle，请您首先阅读以下文档，了解安装方法：
+让我们从这里开始：

-    - `安装说明 <../beginners_guide/install/index_cn.html>`_：我们支持在Ubuntu/CentOS/Windows/MacOS环境上的安装
+    - `快速开始 <../beginners_guide/quick_start.html>`_

-如果您已经具备一定的深度学习基础，第一次使用PaddlePaddle时，可以跟随下列简单的模型案例供您快速上手：
+当您第一次来到PaddlePaddle，请您首先阅读以下文档，了解安装方法：

-    - `Fluid编程指南 <../beginners_guide/programming_guide/programming_guide.html>`_：介绍 Fluid 的基本概念和使用方法
+    - `安装说明 <../beginners_guide/install/index_cn.html>`_：我们支持在Ubuntu/CentOS/Windows/MacOS环境上的安装

-    - `快速入门 <../beginners_guide/quick_start/index.html>`_：提供线性回归和识别数字两个入门级模型，帮助您快速上手训练网络
+这里为您提供了更多学习资料:

    - `深度学习基础 <../beginners_guide/basics/index.html>`_：覆盖图像分类、个性化推荐、机器翻译等多个深度领域的基础知识，提供 Fluid 实现案例

+    - `Fluid编程指南 <../beginners_guide/programming_guide/programming_guide.html>`_：介绍 Fluid 的基本概念和使用方法

 ..  toctree::
    :hidden:

+    quick_start_cn.rst
    install/index_cn.rst
-    quick_start/index.rst
-    basics/index.rst
+    basics/index_cn.rst
    programming_guide/programming_guide.md
--- a/doc/fluid/beginners_guide/index_en.rst
+++ b/doc/fluid/beginners_guide/index_en.rst
@@ -15,8 +15,6 @@ If you have been armed with certain level of deep learning knowledge, and it hap

    - `Programming with Fluid <../beginners_guide/programming_guide/programming_guide_en.html>`_ : Core concepts and basic usage of Fluid

-    - `Quick Start <../beginners_guide/quick_start/index_en.html>`_： Two easy-to-go models, linear regression model and digit recognition model, are in place to speed up your study of training neural networks
-
    - `Deep Learning Basics <../beginners_guide/basics/index_en.html>`_ : This section covers fundamental deep learning knowledge in various fields, such as image classification, personalized recommendation and machine translation, and provides examples implemented in Fluid.


@@ -24,6 +22,5 @@ If you have been armed with certain level of deep learning knowledge, and it hap
    :hidden:

    install/index_en.rst
-    quick_start/index_en.rst
    basics/index_en.rst
    programming_guide/programming_guide_en.md
--- a/doc/fluid/beginners_guide/install/Tables.md
+++ b/doc/fluid/beginners_guide/install/Tables.md
@@ -111,16 +111,6 @@
 		<td> 是否支持GPU </td>
 		<td> ON </td>
 	</tr>
-	<tr>
-		<td> WITH_C_API </td>
-		<td> 是否仅编译CAPI </td>
-		<td>  OFF </td>
-	</tr>
-		<tr>
-		<td> WITH_DOUBLE </td>
-		<td> 是否使用双精度浮点数 </td>
-		<td> OFF </td>
-	</tr>
 	<tr>
 		<td> WITH_DSO </td>
 		<td> 是否运行时动态加载CUDA动态库，而非静态加载CUDA动态库 </td>
@@ -136,30 +126,11 @@
 		<td> 是否内嵌PYTHON解释器 </td>
 		<td> ON </td>
 	</tr>
-	<tr>
-		<td> WITH_STYLE_CHECK </td>
-		<td> 是否编译时进行代码风格检查 </td>
-		<td> ON </td>
-	</tr>
 	<tr>
 		<td> WITH_TESTING </td>
 		<td> 是否开启单元测试 </td>
 		<td> OFF </td>
 	</tr>
-	<tr>
-		<td> WITH_DOC </td>
-		<td> 是否编译中英文文档 </td>
-		<td> OFF </td>
-	</tr>
-	<tr>
-		<td> WITH_SWIG_PY </td>
-		<td> 是否编译PYTHON的SWIG接口，该接口可用于预测和定制化训练 </td>
-		<td> Auto </td>
-	<tr>
-		<td> WITH_GOLANG </td>
-		<td> 是否编译go语言的可容错parameter server </td>
-		<td> OFF </td>
-	</tr>
 	<tr>
 		<td> WITH_MKL </td>
 		<td> 是否使用MKL数学库，如果为否则是用OpenBLAS </td>
@@ -175,11 +146,6 @@
 		<td> 是否编译带有分布式的版本 </td>
 		<td> OFF </td>
 	</tr>
-	<tr>
-		<td> WITH_RDMA </td>
-		<td> 是否编译支持RDMA的相关部分 </td>
-		<td> OFF </td>
-	</tr>
 	<tr>
 		<td> WITH_BRPC_RDMA </td>
 		<td> 是否使用BRPC RDMA作为RPC协议 </td>
@@ -190,11 +156,6 @@
 		<td> 是否打开预测优化 </td>
 		<td> OFF </td>
 	</tr>
-	<tr>
-		<td> DWITH_ANAKIN </td>
-		<td> 是否编译ANAKIN </td>
-		<td> OFF </td>
-	</tr>
   </tbody>
 </table>
 </p>

--- a/doc/fluid/beginners_guide/install/compile/compile_MacOS.md
+++ b/doc/fluid/beginners_guide/install/compile/compile_MacOS.md
@@ -186,6 +186,7 @@
 			For Python2: cmake .. -DWITH_FLUID_ONLY=ON -DWITH_GPU=OFF -DWITH_TESTING=OFF  -DCMAKE_BUILD_TYPE=Release
 			For Python3: cmake .. -DPY_VERSION=3.5 -DPYTHON_INCLUDE_DIR=${PYTHON_INCLUDE_DIRS} \
 			 -DPYTHON_LIBRARY=${PYTHON_LIBRARY} -DWITH_FLUID_ONLY=ON -DWITH_GPU=OFF -DWITH_TESTING=OFF  -DCMAKE_BUILD_TYPE=Release
+
 	>`-DPY_VERSION=3.5`请修改为安装环境的Python版本

 10. 使用以下命令来编译：

--- a/doc/fluid/beginners_guide/install/compile/compile_MacOS_en.md
+++ b/doc/fluid/beginners_guide/install/compile/compile_MacOS_en.md
@@ -68,7 +68,7 @@ Once you have **properly installed Docker**, you can start **compiling PaddlePad

 9. Execute cmake:

-	> For details on the compilation options, see the [compilation options table](../Tables.html/#Compile).
+	> For details on the compilation options, see the [compilation options table](../Tables_en.html/#Compile).

 	* For users who need to compile the **CPU version PaddlePaddle**:

@@ -121,7 +121,7 @@ Congratulations, you have now completed the process of compiling PaddlePaddle us

 4. (Only For Python3) Set Python-related environment variables:

-	- a. First use 
+	- a. First use
 			```find `dirname $(dirname
 			  $(which python3))` -name "libpython3.*.dylib"```
 			to find the path to the Python library (the first result is the dylib path for the python you need to use), then (replace [python-lib-path] below with the file path you found)
@@ -148,7 +148,7 @@ Congratulations, you have now completed the process of compiling PaddlePaddle us
 		Since we are using CMake3.4 please follow the steps below:

 		1. Download the CMake image from the [official CMake website](https://cmake.org/files/v3.4/cmake-3.4.3-Darwin-x86_64.dmg) and install it.
-	
+
 		2. Enter `sudo "/Applications/CMake.app/Contents/bin/cmake-gui" --install` in the console

 	- b. If you do not want to use the system default blas and want to use your own installed OPENBLAS please read [FAQ](../FAQ.html/#OPENBLAS)

--- a/doc/fluid/beginners_guide/install/install_Ubuntu_en.md
+++ b/doc/fluid/beginners_guide/install/install_Ubuntu_en.md
@@ -4,7 +4,7 @@

 This instruction describes how to install PaddlePaddle on a *64-bit desktop or laptop* and Ubuntu system. The Ubuntu systems we support must meet the following requirements:

-Please note: Attempts on other systems may cause the installation to fail. Please ensure that your environment meets the conditions. The installation we provide by default requires your computer processor to support the AVX instruction set. Otherwise, please select the version of `no_avx` in the [latest Release installation package list](./Tables.html/#ciwhls-release).
+Please note: Attempts on other systems may cause the installation to fail. Please ensure that your environment meets the conditions. The installation we provide by default requires your computer processor to support the AVX instruction set. Otherwise, please select the version of `no_avx` in the [latest Release installation package list](./Tables_en.html/#ciwhls-release).

 Under Ubuntu, you can use `cat /proc/cpuinfo | grep avx` to check if your processor supports the AVX instruction set.

@@ -80,9 +80,9 @@ Now let's install PaddlePaddle:
 	* For users who need **the GPU version PaddlePaddle**: `pip install paddlepaddle-gpu` or `pip3 install paddlepaddle-gpu`

 	> 1.To prevent the "nccl.h cannot be found" problem, please first install nccl2 with the following commands (these are the installation instructions for Ubuntu 16.04, CUDA 9, cuDNN v7 and nccl2). For more installation details, please refer to [the NVIDIA official website](https://developer.nvidia.com/nccl/nccl-download):
-	
+
 		i. `wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/nvidia-machine-learning-repo-ubuntu1604_1.0.0-1_amd64.deb`
-		ii. `dpkg -i nvidia-machine- Learning-repo-ubuntu1604_1.0.0-1_amd64.deb` 
+		ii. `dpkg -i nvidia-machine-learning-repo-ubuntu1604_1.0.0-1_amd64.deb`
 		iii. `sudo apt-get install -y libnccl2=2.2.13-1+cuda9.0 libnccl-dev=2.2.13-1+cuda9.0`

 	> 2.If you do not specify the pypi package version number, we will by default provide you with a version of PaddlePaddle that supports Cuda 9/cuDNN v7.

--- a/doc/fluid/beginners_guide/programming_guide/programming_guide.md
+++ b/doc/fluid/beginners_guide/programming_guide/programming_guide.md
@@ -236,7 +236,7 @@ Fluid的设计思想类似于高级编程语言C++和JAVA等。程序的执行
 #定义Executor
 cpu = fluid.core.CPUPlace() #定义运算场所，这里选择在CPU下训练
 exe = fluid.Executor(cpu) #创建执行器
-exe.run(fluid.default_startup_program()) #初始化Program
+exe.run(fluid.default_startup_program()) #用来进行初始化的program

 #训练Program，开始计算
 #feed以字典的形式定义了数据传入网络的顺序
@@ -407,17 +407,17 @@ outs = exe.run(
    ```
    可以看到100次迭代后，预测值已经非常接近真实值了，损失值也从初始值9.05下降到了0.01。

-    恭喜您！已经成功完成了第一个简单网络的搭建，想尝试线性回归的进阶版——房价预测模型，请阅读：[线性回归](../../beginners_guide/quick_start/fit_a_line/README.cn.html)。更多丰富的模型实例可以在[模型库](../../user_guides/models/index_cn.html)中找到。
+    恭喜您！已经成功完成了第一个简单网络的搭建，想尝试线性回归的进阶版——房价预测模型，请阅读：[线性回归](../../beginners_guide/basics/fit_a_line/README.cn.html)。更多丰富的模型实例可以在[模型库](../../user_guides/models/index_cn.html)中找到。

 <a name="what_next"></a>
 ## What's next

 如果您已经掌握了基本操作，可以进行下一阶段的学习了：

-跟随这一教程将学习到如何对实际问题建模并使用fluid构建模型：[配置简单的网络](../../user_guides/howto/configure_simple_model/index.html)。
+跟随这一教程将学习到如何对实际问题建模并使用fluid构建模型：[配置简单的网络](../../user_guides/howto/configure_simple_model/index_cn.html)。

-完成网络搭建后，可以开始在单机或多机上训练您的网络了，详细步骤请参考[训练神经网络](../../user_guides/howto/training/index.html)。
+完成网络搭建后，可以开始在单机或多机上训练您的网络了，详细步骤请参考[训练神经网络](../../user_guides/howto/training/index_cn.html)。

-除此之外，使用文档模块根据开发者的不同背景划分了三个学习阶段：[新手入门](../../beginners_guide/index.html)、[使用指南](../../user_guides/index.html)和[进阶使用](../../advanced_usage/index.html)。
+除此之外，使用文档模块根据开发者的不同背景划分了三个学习阶段：[新手入门](../../beginners_guide/index_cn.html)、[使用指南](../../user_guides/index_cn.html)和[进阶使用](../../advanced_usage/index_cn.html)。

-如果您希望阅读更多场景下的应用案例，可以跟随导航栏进入[快速入门](../../beginners_guide/quick_start/index.html)和[深度学习基础知识](../../beginners_guide/basics/index.html)。已经具备深度学习基础知识的用户，可以从[使用指南](../../user_guides/index.html)开始阅读。
+如果您希望阅读更多场景下的应用案例，可以参考[深度学习基础教程](../../beginners_guide/basics/index_cn.html)。已经具备深度学习基础知识的用户，可以从[使用指南](../../user_guides/index_cn.html)开始阅读。
--- a/doc/fluid/beginners_guide/programming_guide/programming_guide_en.md
+++ b/doc/fluid/beginners_guide/programming_guide/programming_guide_en.md
@@ -414,7 +414,7 @@ Firstly, define input data format, model structure,loss function and optimized a
    ```
     After 100 iterations, the predicted value is very close to the true value, and the loss has dropped from its initial value of 9.05 to 0.01.

-    Congratulations! You have succeed to create a simple network. If you want to try advanced linear regression —— predict model of housing price, please read [linear regression](../../beginners_guide/quick_start/fit_a_line/README.en.html). More examples of model can be found in [models](../../user_guides/models/index_en.html).
+    Congratulations! You have successfully created a simple network. If you want to try a more advanced linear regression example, a housing price prediction model, please read [linear regression](../../beginners_guide/basics/fit_a_line/README.en.html). More model examples can be found in [models](../../user_guides/models/index_en.html).

 <a name="what_next"></a>
 ## What's next
@@ -427,4 +427,4 @@ After the construction of network, you can start training your network in single

 In addition, there are three learning levels in documentation according to developer's background and experience: [Beginner's Guide](../../beginners_guide/index_en.html) , [User Guides](../../user_guides/index_en.html) and [Advanced User Guides](../../advanced_usage/index_en.html).

-If you want to read examples in more application scenarios, you can go to [quick start](../../beginners_guide/quick_start/index_en.html) and [basic knowledge of deep learning](../../beginners_guide/basics/index_en.html) .If you have learned basic knowledge of deep learning, you can read from [user guide](../../user_guides/index_en.html).
+If you want to read examples covering more application scenarios, you can go to [basic knowledge of deep learning](../../beginners_guide/basics/index_en.html). If you have already learned the basics of deep learning, you can start from the [user guide](../../user_guides/index_en.html).
--- a/doc/fluid/beginners_guide/quick_start/index.rst
+++ b/doc/fluid/beginners_guide/quick_start/index.rst
-########
-快速入门
-########
-
-欢迎来到快速入门部分，在这里，我们将向您介绍如何通过PaddlePaddle Fluid实现经典的线性回归和手写识别的模型，以下两篇文档将指导您使用真实数据集搭建起模型、进行训练和预测：
-
-..  toctree::
-    :titlesonly:
-
-    fit_a_line/README.cn.md
-    recognize_digits/README.cn.md
--- a/doc/fluid/beginners_guide/quick_start/index_en.rst
+++ b/doc/fluid/beginners_guide/quick_start/index_en.rst
-##############
-Quick Start
-##############
-
-Welcome to Quick Start! 
-
-This section will tutor you to invent your won models of classical *linear Regression* and *Handwritten Digits Recognition* tasks in PaddlePaddle Fluid. The following tutorials provide details on model definition, training, and inference in a friendly manner based on real-life datasets:
-
-..  toctree::
-    :titlesonly:
-
-    fit_a_line/README.md
-    recognize_digits/README.md
--- a/doc/fluid/beginners_guide/quick_start.rst
+++ b/doc/fluid/beginners_guide/quick_start.rst
--- a/doc/fluid/index_cn.rst
+++ b/doc/fluid/index_cn.rst
@@ -11,8 +11,8 @@
    :maxdepth: 1


-    beginners_guide/index.rst
-    user_guides/index.rst
-    advanced_usage/index.rst
+    beginners_guide/index_cn.rst
+    user_guides/index_cn.rst
+    advanced_usage/index_cn.rst
    api_cn/index_cn.rst
-    release_note.rst
+    release_note_cn.rst
--- a/doc/fluid/release_note.rst
+++ b/doc/fluid/release_note.rst
--- a/doc/fluid/user_guides/howto/configure_simple_model/index.rst
+++ b/doc/fluid/user_guides/howto/configure_simple_model/index.rst
--- a/doc/fluid/user_guides/howto/evaluation_and_debugging/index.rst
+++ b/doc/fluid/user_guides/howto/evaluation_and_debugging/index.rst
--- a/doc/fluid/user_guides/howto/prepare_data/feeding_data.rst
+++ b/doc/fluid/user_guides/howto/prepare_data/feeding_data.rst
@@ -53,6 +53,8 @@ PaddlePaddle Fluid支持使用 :code:`fluid.layers.data()` 配置数据层；
 .. code-block:: python

   exe = fluid.Executor(fluid.CPUPlace())
+   # init Program
+   exe.run(fluid.default_startup_program())
   exe.run(feed={
      "image": numpy.random.random(size=(32, 3, 224, 224)).astype('float32'),
      "label": numpy.random.random(size=(32, 1)).astype('int64')
@@ -84,7 +86,7 @@ PaddlePaddle Fluid支持使用 :code:`fluid.layers.data()` 配置数据层；
   exe.run(feed={
     "sentence": create_lod_tensor(
       data=numpy.array([1, 3, 4, 5, 3, 6, 8], dtype='int64').reshape(-1, 1),
-       lod=[[4, 1, 2]],
+       recursive_seq_lens=[[4, 1, 2]],
       place=fluid.CPUPlace()
     )
   })

--- a/doc/fluid/user_guides/howto/prepare_data/feeding_data_en.rst
+++ b/doc/fluid/user_guides/howto/prepare_data/feeding_data_en.rst
@@ -45,6 +45,8 @@ For example:
 .. code-block:: python

   exe = fluid.Executor(fluid.CPUPlace())
+   # init Program
+   exe.run(fluid.default_startup_program())
   exe.run(feed={
      "image": numpy.random.random(size=(32, 3, 224, 224)).astype('float32'),
      "label": numpy.random.random(size=(32, 1)).astype('int64')
@@ -81,7 +83,7 @@ For example:
   exe.run(feed={
     "sentence": create_lod_tensor(
       data=numpy.array([1, 3, 4, 5, 3, 6, 8], dtype='int64').reshape(-1, 1),
-       lod=[4, 1, 2],
+       recursive_seq_lens=[[4, 1, 2]],
       place=fluid.CPUPlace()
     )
   })

--- a/doc/fluid/user_guides/howto/prepare_data/index.rst
+++ b/doc/fluid/user_guides/howto/prepare_data/index.rst
--- a/doc/fluid/user_guides/howto/training/deploy_ctr_on_baidu_cloud_cn.rst
+++ b/doc/fluid/user_guides/howto/training/deploy_ctr_on_baidu_cloud_cn.rst
 ..  _deploy_ctr_on_baidu_cloud_cn:

-百度云分布式训练CTR
+在百度云分布式训练CTR
 =========================

 Fluid支持数据并行的分布式训练，也支持基于Kubernetes的分布式部署。本文以百度云为例，说明如何通过在云服务器上分布式训练Click-Through-Rate（以下简称ctr）任务。

--- a/doc/fluid/user_guides/howto/training/index.rst
+++ b/doc/fluid/user_guides/howto/training/index.rst
@@ -7,6 +7,6 @@ PaddlePaddle Fluid支持单机训练和多节点训练。每种训练模式下
 .. toctree::
   :maxdepth: 1

-   single_node
-   multi_node
-   save_load_variables
+   single_node.rst
+   multi_node.rst
+   save_load_variables.rst
--- a/doc/fluid/user_guides/howto/training/multi_node.rst
+++ b/doc/fluid/user_guides/howto/training/multi_node.rst
@@ -2,9 +2,10 @@
 多机训练
 ########

-.. toctree::
-   :maxdepth: 1
+..  toctree::
+    :maxdepth: 1

-   cluster_quick_start.rst
-   cluster_howto.rst
-   train_on_baidu_cloud_cn.rst
+    cluster_quick_start.rst
+    cluster_howto.rst
+    train_on_baidu_cloud_cn.rst
+    deploy_ctr_on_baidu_cloud_cn.rst
--- a/doc/fluid/user_guides/howto/training/save_load_variables.rst
+++ b/doc/fluid/user_guides/howto/training/save_load_variables.rst
 .. _user_guide_save_load_vars:

-##################
-模型/变量的保存、载入与增量训练
-##################
+#############################
+模型/变量的保存/载入与增量训练
+#############################

 模型变量分类
 ############
@@ -69,7 +69,7 @@
 载入模型用于对新样本的预测
 ==========================

-对于通过 :code:`fluid.io.save_params` 保存的模型，可以使用 :code:`fluid.io.load_params` 
+对于通过 :code:`fluid.io.save_params` 保存的模型，可以使用 :code:`fluid.io.load_params`
 来进行载入。

 例如：
@@ -149,7 +149,7 @@
    fluid.io.load_persistables(exe, path, startup_prog)
    main_prog = fluid.default_main_program()
    exe.run(main_prog)
-    
+
 上面的例子中，通过调用 :code:`fluid.io.load_persistables` 函数，PaddlePaddle Fluid会从默认
 :code:`fluid.Program` 也就是 :code:`prog` 的所有模型变量中找出长期变量，从指定的 :code:`path` 目录中将它们一一加载， 然后再继续进行训练。


--- a/doc/fluid/user_guides/howto/training/single_node.rst
+++ b/doc/fluid/user_guides/howto/training/single_node.rst
@@ -77,25 +77,25 @@

 多卡训练
 #######################
-在多卡训练中，你可以使用:code:`fluid.compiler.CompiledProgram`来编译:code:`fluid.Program`，然后调用:code:`with_data_parallel`。例如：
+在多卡训练中，你可以使用 :code:`fluid.compiler.CompiledProgram` 来编译 :code:`fluid.Program` ，然后调用 :code:`with_data_parallel` 。例如：

 .. code-block:: python
-   
+
    exe = fluid.Executor(...)
-    
+
    compiled_prog = fluid.compiler.CompiledProgram(
        fluid.default_main_program()).with_data_parallel(
            loss_name=loss.name)
-           
-    result = exe.run(program=compiled_prog, 
-                    fetch_list=[loss.name], 
-                    feed={"image": ..., "label": ...}) 
+
+    result = exe.run(program=compiled_prog,
+                    fetch_list=[loss.name],
+                    feed={"image": ..., "label": ...})

 注释：

-1. :ref:`cn_api_fluid_CompiledProgram`的构造函数需要经过:code:`fluid.Program`设置后运行，这在运行时内无法被修改。
-2. 如果:code:`exe`是用CUDAPlace来初始化的，模型会在GPU中运行。在显卡训练模式中，所有的显卡都将被占用。用户可以配置 `CUDA_VISIBLE_DEVICES <http://www.acceleware.com/blog/cudavisibledevices-masking-gpus>`_ 以更改被占用的显卡。
-3. 如果:code:`exe`是用CPUPlace来初始化的，模型会在CPU中运行。在这种情况下，多线程用于运行模型，同时线程的数目和逻辑核的数目相等。用户可以配置`CPU_NUM`以更改使用中的线程数目。
+1. :ref:`cn_api_fluid_CompiledProgram` 的构造函数需要经过 :code:`fluid.Program` 设置后运行，这在运行时内无法被修改。
+2. 如果 :code:`exe` 是用CUDAPlace来初始化的，模型会在GPU中运行。在显卡训练模式中，所有的显卡都将被占用。用户可以配置 `CUDA_VISIBLE_DEVICES <http://www.acceleware.com/blog/cudavisibledevices-masking-gpus>`_ 以更改被占用的显卡。
+3. 如果 :code:`exe` 是用CPUPlace来初始化的，模型会在CPU中运行。在这种情况下，多线程用于运行模型，同时线程的数目和逻辑核的数目相等。用户可以配置 ``CPU_NUM`` 以更改使用中的线程数目。

 进阶使用
 ###############

--- a/doc/fluid/user_guides/howto/training/train_on_baidu_cloud_cn.rst
+++ b/doc/fluid/user_guides/howto/training/train_on_baidu_cloud_cn.rst
 .. _train_on_baidu_cloud_cn:

-在百度云启动Fluid分布式训练
+在百度云启动分布式训练
 =========================

 PaddlePaddle Fluid分布式训练，可以不依赖集群系统（比如MPI，Kubernetes）启动分布式训练。

--- a/doc/fluid/user_guides/index.rst
+++ b/doc/fluid/user_guides/index.rst
@@ -8,29 +8,28 @@

    - `基本概念 <../user_guides/howto/basic_concept/index_cn.html>`_ ：介绍了Fluid的基本使用概念

-    - `准备数据 <../user_guides/howto/prepare_data/index.html>`_ ：介绍使用 Fluid 训练网络时，数据的支持类型及传输方法
+    - `准备数据 <../user_guides/howto/prepare_data/index_cn.html>`_ ：介绍使用 Fluid 训练网络时，数据的支持类型及传输方法

-    - `配置简单的网络 <../user_guides/howto/configure_simple_model/index.html>`_： 介绍如何针对问题建模，并利用 Fluid 中相关算子搭建网络
+    - `配置简单的网络 <../user_guides/howto/configure_simple_model/index_cn.html>`_： 介绍如何针对问题建模，并利用 Fluid 中相关算子搭建网络

-    - `训练神经网络 <../user_guides/howto/training/index.html>`_：介绍如何使用 Fluid 进行单机训练、多机训练、以及保存和载入模型变量
+    - `训练神经网络 <../user_guides/howto/training/index_cn.html>`_：介绍如何使用 Fluid 进行单机训练、多机训练、以及保存和载入模型变量

-    - `模型评估与调试 <../user_guides/howto/evaluation_and_debugging/index.html>`_：介绍在 Fluid 下进行模型评估和调试的方法，包括：

-	- `DyGraph模式 <../user_guides/howto/dygraph/DyGraph.md>`_：介绍在 Fluid 下使用DyGraph
+	- `DyGraph模式 <../user_guides/howto/dygraph/DyGraph.md>`_：介绍在 Fluid 下使用DyGraph       
+
+    - `模型评估与调试 <../user_guides/howto/evaluation_and_debugging/index_cn.html>`_：介绍在 Fluid 下进行模型评估和调试的方法，包括：

 基于 Fluid 复现的多领域经典模型：

    - `Fluid 模型库 <../user_guides/models/index_cn.html>`_

-
-
 ..  toctree::
    :hidden:

    howto/basic_concept/index_cn.rst
-    howto/prepare_data/index
-    howto/configure_simple_model/index
-    howto/training/index
-    howto/evaluation_and_debugging/index
+    howto/prepare_data/index_cn.rst
+    howto/configure_simple_model/index_cn.rst
+    howto/training/index_cn.rst
+    howto/evaluation_and_debugging/index_cn.rst
    howto/dygraph/DyGraph.md
    models/index_cn.rst
--- a/doc/fluid/user_guides/models/index_en.rst
+++ b/doc/fluid/user_guides/models/index_en.rst
@@ -87,7 +87,7 @@ Automatic Speech Recognition (ASR) is a technique for transcribing vocabulary co

 Different from the end-to-end direct prediction for word distribution of the deep learning model  `DeepSpeech <https://github.com/PaddlePaddle/DeepSpeech>`__ , this example is closer to the traditional language recognition process. With phoneme as the modeling unit, it focuses on the training of acoustic models in speech recognition, use `kaldi <http://www.kaldi-asr.org>`__ for feature extraction and label alignment of audio data, and integrate kaldi's decoder to complete decoding.

- `DeepASR <https://github.com/PaddlePaddle/models/blob/develop/DeepASR/README_cn.md>`__
+- `DeepASR <https://github.com/PaddlePaddle/models/blob/develop/PaddleSpeech/DeepASR/README.md>`__

 Machine Translation
 ---------------------