optimize the description for return value test=develop

f799229c · peizhilin · cf85767c · 58871b0d · f799229c · f799229c
93 changed files
--- a/.gitmodules
+++ b/.gitmodules
 [submodule "external/book"]
 	path = external/book
 	url = https://github.com/PaddlePaddle/book
-[submodule "external/Anakin"]
-	path = external/Anakin
-	url = https://github.com/PaddlePaddle/Anakin
-[submodule "external/paddle-mobile"]
-	path = external/paddle-mobile
-	url = https://github.com/PaddlePaddle/paddle-mobile
 [submodule "external/Paddle"]
 	path = external/Paddle
 	url = https://github.com/PaddlePaddle/Paddle
-[submodule "external/models"]
-	path = external/models
-	url = https://github.com/PaddlePaddle/models
--- a/README.md
+++ b/README.md
@@ -6,7 +6,7 @@ English | [简体中文](./README_cn.md)
 FluidDoc consolidates all the documentations related to Paddle. It supplies the contents to PaddlePaddle.org via CI. 

 # Architecture
-FluidDoc submodules Paddle, Book, Models, Mobile and Anakin under `external` folder. All submodules should be put under `external` as standard practice. 
+FluidDoc includes Paddle and Book as submodules under the `external` folder. All submodules should be placed under `external` as standard practice. 

 FluidDoc then uses them as references to load up the documents. The FluidDoc constructs the whole doc-tree under the `FluidDoc/doc/fluid` folder. The entry point is `FluidDoc/doc/fluid/index_cn.rst` and `FluidDoc/doc/fluid/index_en.rst`

@@ -22,7 +22,7 @@ To preview documents constructured by FluidDoc. Please follow the [regular previ
 # Publish New release
 1. Checkout a new release branch. The branch name should follow `release/<version>`
 1. Update the documentations on the submodules or within FluidDoc
-1. Make sure all the submodules are ready for release. Paddle, book, model, mobile and Anakin should all have stable commits. Note: Paddle repo should update the API RST files accordinly if Paddle changes the included module/classes. 
+1. Make sure all the submodules are ready for release. Paddle and book should both have stable commits. Note: the Paddle repo should update the API RST files accordingly if Paddle changes the included modules/classes. 
 1. Update the submodules under `external` folder and commit the changes.
 1. Git push the branch to Github, Travis CI will start several builds to publish the documents to the PaddlePaddle.org server
 1. Please notify the PaddlePaddle.org team that the release content is ready. PaddlePaddle.org team should enable the version and update the default version to the latest one. PaddlePaddle.org should also update the search index accordingly (Until the search server is up)
--- a/README_cn.md
+++ b/README_cn.md
@@ -9,7 +9,7 @@ FluidDoc包含了所有PaddlePaddle相关的文档，它通过CI系统为PaddleP

 # 架构

-FluidDoc将Paddle, Book, Models, Mobile and Anakin作为子模块，并放置在 `external` 目录下。按照标准做法，所有的子模块应当置于`external` 目录下
+FluidDoc将Paddle, Book 作为子模块，并放置在 `external` 目录下。按照标准做法，所有的子模块应当置于`external` 目录下

 FluidDoc通过引用这些子模块来加载这些Repo中的文档。FluidDoc在 `FluidDoc/doc/fluid` 目录下构建了文档的整体树形结构。可以分别在 `FluidDoc/doc/fluid/index_cn.rst` 和 `FluidDoc/doc/fluid/index_en.rst` 查看。

@@ -26,7 +26,7 @@ FluidDoc 需要Paddle Repo的python模块去编译生成API文档。但由于Pad
 ## 发布新的分支
 1. 创建一个新的分支，此分支的名字应遵循`release/<version>`
 1. 在FluidDoc和子模块中更新文档
-1. 确认所有的子模块中处于发布就绪的状态。Paddle, book, model, mobile and Anakin 应全部有稳定的commit
+1. 确认所有的子模块中处于发布就绪的状态。Paddle, book 应全部有稳定的commit
 请注意：如果Paddle Repo更改了module/classes，涉及API文档的RST文件应当也被更新
 1. 在 `external` 中更新文件然后commit文档变更
 1. 将这个分支push到Github，Travis CI将会启动几项构建工作以把文档发布到PaddlePaddle.org的服务器

--- a/doc/fluid/api_cn/fluid_cn/ParamAttr_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/ParamAttr_cn.rst
@@ -7,16 +7,20 @@ ParamAttr

 .. py:class:: paddle.fluid.ParamAttr(name=None, initializer=None, learning_rate=1.0, regularizer=None, trainable=True, gradient_clip=None, do_model_average=False)

-该类代表了参数的各种属性。 为了使神经网络训练过程更加流畅，用户可以根据需要调整参数属性。比如learning rate（学习率）, regularization（正则化）, trainable（可训练性）, do_model_average(平均化模型)和参数初始化方法.
+创建一个参数属性对象，用户可设置参数的名称、初始化方式、学习率、正则化规则、是否需要训练、梯度裁剪方式、是否做模型平均等属性。

 参数:
-    - **name** (str) – 参数名。默认为None。
-    - **initializer** (Initializer) – 初始化该参数的方法。 默认为None
-    - **learning_rate** (float) – 参数的学习率。计算方法为 :math:`global\_lr*parameter\_lr∗scheduler\_factor` 。 默认为1.0
-    - **regularizer** (WeightDecayRegularizer) – 正则因子. 默认为None
-    - **trainable** (bool) – 该参数是否可训练。默认为True
-    - **gradient_clip** (BaseGradientClipAttr) – 减少参数梯度的方法。默认为None
-    - **do_model_average** (bool) – 该参数是否服从模型平均值。默认为False
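+    - **name** (str, optional) - Name of the parameter. Default: None, meaning the framework generates the name automatically.
+    - **initializer** (Initializer, optional) - Initialization method for the parameter. Default: None, meaning weights use Xavier initialization and biases are initialized to zero.
+    - **learning_rate** (float) - Learning rate of the parameter. The effective learning rate equals the global learning rate multiplied by this value and by the learning rate schedule factor. Default: 1.0.
+    - **regularizer** (WeightDecayRegularizer, optional) - Regularization factor. Default: None, meaning no regularization.
+    - **trainable** (bool) - Whether the parameter is trained. Default: True.
+    - **gradient_clip** (BaseGradientClipAttr, optional) - Gradient clipping method. Default: None, meaning no gradient clipping.
+    - **do_model_average** (bool) - Whether to apply model averaging. Default: False.
+
+Returns: An object representing the parameter attributes.
+
+Return type: ParamAttr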

 **代码示例**

@@ -28,18 +32,8 @@ ParamAttr
                                   learning_rate=0.5,
                                   regularizer=fluid.regularizer.L2Decay(1.0),
                                   trainable=True)
+   print(w_param_attrs.name) # "fc_weight"
   x = fluid.layers.data(name='X', shape=[1], dtype='float32')
   y_predict = fluid.layers.fc(input=x, size=10, param_attr=w_param_attrs)


-
-
-
-
-
-
-
-
-
-
-
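Note on the `learning_rate` description in ParamAttr above: the effective learning rate is described as the product of three factors. The arithmetic can be sketched as follows. `effective_learning_rate` is a hypothetical helper written for illustration only, not a PaddlePaddle API, and the numeric values are assumed.

```python
def effective_learning_rate(global_lr, param_lr, scheduler_factor=1.0):
    """Hypothetical helper: combine the three factors named in the ParamAttr docs."""
    # global_lr: learning rate configured on the optimizer
    # param_lr: the learning_rate set in ParamAttr for one parameter
    # scheduler_factor: coefficient from the learning rate schedule (assumed 1.0 if unused)
    return global_lr * param_lr * scheduler_factor


# Assumed values: optimizer learning rate 0.1, ParamAttr(learning_rate=0.5), no schedule.
lr = effective_learning_rate(global_lr=0.1, param_lr=0.5)
print(lr)
```

With `learning_rate=0.5`, as in the code example in this file, the parameter is updated at half the global rate.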
--- a/doc/fluid/api_cn/fluid_cn/Program_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/Program_cn.rst
--- a/doc/fluid/api_cn/fluid_cn/create_lod_tensor_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/create_lod_tensor_cn.rst
@@ -6,45 +6,41 @@ create_lod_tensor

 .. py:function:: paddle.fluid.create_lod_tensor(data, recursive_seq_lens, place)

+从一个numpy数组、list或LoDTensor创建一个新的LoDTensor。

-该函数从一个numpy数组，列表或者已经存在的lod tensor中创建一个lod tensor。
+具体实现方法如下:

-通过一下几步实现:
+1. 检查基于序列长度的LoD（length-based LoD），即参数中的 :code:`recursive_seq_lens` 是否正确。

-1. 检查length-based level of detail (LoD,长度为基准的细节层次)，或称recursive_sequence_lengths(递归序列长度)的正确性
+2. 将 :code:`recursive_seq_lens` 转换为基于偏移量的LoD（offset-based LoD）。

-2. 将recursive_sequence_lengths转化为offset-based LoD(偏移量为基准的LoD)
+3. 根据place参数，把所提供的 :code:`data` （numpy数组、list或LoDTensor）的数据复制到CPU或GPU上。

-3. 把提供的numpy数组，列表或者已经存在的lod tensor复制到CPU或GPU中(依据执行场所确定)
+4. 将基于偏移量的LoD设置到输出的LoDTensor中。

-4. 利用offset-based LoD来设置LoD
+假设我们想创建一个LoDTensor表示词的序列，其中每个词用一个整数id表示。若待创建的LoDTensor表示2个句子，其中一个句子包含2个单词，另一个句子包含3个单词。

-例如：
-假如我们想用LoD Tensor来承载一词序列的数据，其中每个词由一个整数来表示。现在，我们意图创建一个LoD Tensor来代表两个句子，其中一个句子有两个词，另外一个句子有三个。那么数 ``data`` 可以是一个numpy数组，形状为（5,1）。同时， ``recursive_seq_lens`` 为 [[2, 3]]，表明各个句子的长度。这个长度为基准的 ``recursive_seq_lens`` 将在函数中会被转化为以偏移量为基准的 LoD [[0, 2, 5]]。
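+Then :code:`data` is a numpy integer array of shape (5, 1), and :code:`recursive_seq_lens` is [[2, 3]], giving the number of words in each sentence. Inside this API, the length-based
+:code:`recursive_seq_lens` [[2, 3]] is converted to the offset-based LoD [[0, 2, 5]].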

-.. code-block:: python
-
-        import paddle.fluid as fluid
-        import numpy as np
-     
-        t = fluid.create_lod_tensor(np.ndarray([5, 30]), [[2, 3]], fluid.CPUPlace())
-
-参考 :ref:`api_guide_tensor` 以获取更多关于LoD的信息。
+请查阅 :ref:`cn_user_guide_lod_tensor` 了解更多关于LoD的介绍。

 参数:
-  - **data** (numpy.ndarray|list|LoDTensor) – 容纳着待复制数据的一个numpy数组、列表或LoD Tensor
-  - **recursive_seq_lens** (list) – 一组列表的列表， 表明了由用户指明的length-based level of detail信息
-  - **place** (Place) – CPU或GPU。 指明返回的新LoD Tensor存储地点
-
-返回: 一个fluid LoDTensor对象，包含数据和 ``recursive_seq_lens`` 信息
-
-
-
-
+    - **data** (numpy.ndarray|list|LoDTensor) - 表示LoDTensor数据的numpy数组、list或LoDTensor。
+    - **recursive_seq_lens** (list[list[int]]) - 基于序列长度的LoD信息。
+    - **place** (CPUPlace|CUDAPlace) - 表示返回的LoDTensor存储在CPU或GPU place中。

+返回: 包含数据信息和序列长度信息的LoDTensor。

+返回类型: LoDTensor

+**代码示例**

+.. code-block:: python

+        import paddle.fluid as fluid
+        import numpy as np
+     
+        t = fluid.create_lod_tensor(np.ndarray([5, 30]), [[2, 3]], fluid.CPUPlace())
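
Note on step 2 of the `create_lod_tensor` description above: converting a length-based LoD to an offset-based LoD is a running sum per level. The sketch below is a plain-Python illustration under that assumption. `lengths_to_offsets` is a hypothetical name and is not the function PaddlePaddle uses internally.

```python
def lengths_to_offsets(recursive_seq_lens):
    """Convert a length-based LoD into an offset-based LoD, one level at a time."""
    offset_lod = []
    for level in recursive_seq_lens:
        offsets = [0]  # every level starts at offset 0
        for length in level:
            # each offset is the previous offset plus the current sequence length
            offsets.append(offsets[-1] + length)
        offset_lod.append(offsets)
    return offset_lod


# The example from the description: two sentences with 2 and 3 words.
print(lengths_to_offsets([[2, 3]]))  # [[0, 2, 5]]
```

The last offset of a level (5 here) equals the total number of elements, which is why `data` in the example has 5 rows.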


--- a/doc/fluid/api_cn/fluid_cn/create_random_int_lodtensor_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/create_random_int_lodtensor_cn.rst
@@ -6,29 +6,27 @@ create_random_int_lodtensor

 .. py:function:: paddle.fluid.create_random_int_lodtensor(recursive_seq_lens, base_shape, place, low, high)

+创建一个包含随机整数的LoDTensor。

+具体实现方法如下：

-该函数创建一个存储多个随机整数的LoD Tensor。
+1. 基于序列长度 :code:`recursive_seq_lens` 和 :code:`base_shape` 产生返回值的维度。返回值的第一维等于序列总长度，其余维度为 :code:`base_shape` 。

-该函数是经常在书中出现的案例，所以我们根据新的API： ``create_lod_tensor`` 更改它然后放在LoD Tensor板块里来简化代码。
+2. 创建一个包含随机整数的numpy数组，并作为 :code:`data` 参数传入 :ref:`cn_api_fluid_create_lod_tensor` 接口中创建LoDTensor返回。

-该函数实现以下功能：
-
-1. 根据用户输入的length-based ``recursive_seq_lens`` （基于长度的递归序列长）和在 ``basic_shape`` 中的基本元素形状计算LoDTensor的整体形状
-2. 由此形状，建立numpy数组
-3. 使用API： ``create_lod_tensor`` 建立LoDTensor
-
-
-假如我们想用LoD Tensor来承载一词序列，其中每个词由一个整数来表示。现在，我们意图创建一个LoD Tensor来代表两个句子，其中一个句子有两个词，另外一个句子有三个。那么 ``base_shape`` 为[1], 输入的length-based ``recursive_seq_lens`` 是 [[2, 3]]。那么LoDTensor的整体形状应为[5, 1]，并且为两个句子存储5个词。
+假设我们想创建一个LoDTensor表示序列信息，共包含2个序列，维度分别为[2, 30]和[3, 30]，那么序列长度 :code:`recursive_seq_lens` 传入[[2, 3]]，:code:`base_shape` 传入[30]（即除了序列长度以外的维度）。
+最后返回的LoDTensor的维度为[5, 30]，其中第一维5为序列总长度，其余维度为 :code:`base_shape` 。

 参数:
-    - **recursive_seq_lens** (list) – 一组列表的列表， 表明了由用户指明的length-based level of detail信息
-    - **base_shape** (list) – LoDTensor所容纳的基本元素的形状
-    - **place** (Place) –  CPU或GPU。 指明返回的新LoD Tensor存储地点
-    - **low** (int) – 随机数下限
-    - **high** (int) – 随机数上限
+    - **recursive_seq_lens** (list[list[int]]) - 基于序列长度的LoD信息。
+    - **base_shape** (list) - 除第一维以外输出结果的维度信息。
+    - **place** (CPUPlace|CUDAPlace) - 表示返回的LoDTensor存储在CPU或GPU place中。
+    - **low** (int) - 随机整数的下限值。
+    - **high** (int) - 随机整数的上限值，必须大于或等于low。
+
+返回: 包含随机整数数据信息和序列长度信息的LoDTensor，数值范围在[low, high]之间。

-返回: 一个fluid LoDTensor对象，包含张量数据和 ``recursive_seq_lens`` 信息
+返回类型: LoDTensor

 **代码示例**

@@ -37,4 +35,5 @@ create_random_int_lodtensor
        import paddle.fluid as fluid
     
        t = fluid.create_random_int_lodtensor(recursive_seq_lens=[[2, 3]],base_shape=[30], place=fluid.CPUPlace(), low=0, high=10)
+        print(t.shape()) # [5, 30]
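
Note on step 1 of the `create_random_int_lodtensor` description above: the output shape is the total sequence length followed by `base_shape`. A minimal sketch of that calculation follows. `lod_tensor_shape` is a hypothetical helper, and using the last LoD level for the total length is an assumption for the multi-level case.

```python
def lod_tensor_shape(recursive_seq_lens, base_shape):
    """Hypothetical helper: overall shape of the generated LoDTensor."""
    # Assumption: the total sequence length is the sum of the lengths
    # in the last (finest) LoD level.
    total_len = sum(recursive_seq_lens[-1])
    return [total_len] + list(base_shape)


# The example from the description: sequences of length 2 and 3, base_shape [30].
print(lod_tensor_shape([[2, 3]], [30]))  # [5, 30]
```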

--- a/doc/fluid/api_cn/fluid_cn/default_main_program_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/default_main_program_cn.rst
@@ -6,20 +6,20 @@ default_main_program
 .. py:function:: paddle.fluid.default_main_program()


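+This API returns the current ``default main program``, which stores the descriptions of ops and variables.

+Ops and variables added through the ``fluid.layers`` APIs are stored in the ``default main program``.

+The ``default main program`` is the default value of the Program argument in many fluid APIs. For example, ``Executor.run()`` uses the ``default main program`` when the user does not pass a Program.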

-此函数用于获取默认或全局main program(主程序)。该主程序用于训练和测试模型。
+Use :ref:`cn_api_fluid_program_guard` to replace the ``default main program``.

-``fluid.layers`` 中的所有layer函数可以向 ``default_main_program`` 中添加operators（算子）和variables（变量）。
+参数: 
+    - 无

-``default_main_program`` 是fluid的许多编程接口（API）的Program参数的缺省值。例如,当用户program没有传入的时候，
-``Executor.run()`` 会默认执行 ``default_main_program`` 。
+返回： 当前默认用于存储op和variable描述的Program

-
-返回： main program
-
-返回类型: Program
+返回类型： :ref:`cn_api_fluid_Program`

 **代码示例**


--- a/doc/fluid/api_cn/fluid_cn/memory_optimize_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/memory_optimize_cn.rst
@@ -5,48 +5,5 @@ memory_optimize

 .. py:function:: paddle.fluid.memory_optimize(input_program, skip_opt_set=None, print_log=False, level=0, skip_grads=True)

-历史遗留的内存优化策略，通过在不同operators间重用var内存来减少总内存消耗。
-用一个简单的示例来解释该算法：
-
-c = a + b  # 假设这里是最后一次使用a
-d = b * c
-
-鉴于在“c = a + b”之后不再使用a，且a和d的大小相同，我们可以用变量a来代替变量d，即实际上，上面的代码可以优化成：
-
-c = a + b
-a = b * c
-     
-请注意，在此历史遗存设计中，我们将直接用变量a代替变量d，这意味着在你调用该API后，某些变量将会消失，还有一些会取非预期值。正如上面的例子中，执行程序后，实际上a取d的值。
-    
-因此，为避免重要变量在优化过程中被重用或移除，我们支持用skip_opt_set指定一个变量白名单。skip_opt_set中的变量不会受memory_optimize API的影响。
-     
-     
-.. note::
-    
-     此API已被弃用，请不要在你新写的代码中使用它。它不支持block中嵌套子block，如While、IfElse等。
-
-参数:
-  - **input_program** (str) – 输入Program。
-  - **skip_opt_set** (set) – set中的vars将不被内存优化。
-  - **print_log** (bool) – 是否打印debug日志。
-  - **level** (int) - 值为0或1。如果level=0，则仅当a.size == b.size时我们才用b代替a；如果level=1，只要a.size <= b.size时我们就可以用b代替a。
-
-返回: None
-
-**示例代码**
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-    main_prog = fluid.Program()
-    startup_prog = fluid.Program()
-     
-    place = fluid.CPUPlace()
-    exe = fluid.Executor(place)
-     
-    exe.run(startup_prog)
-    fluid.memory_optimize(main_prog)
-
-
-
+**从1.6版本开始此接口不再推荐使用，请不要在新写的代码中使用它，1.6+版本已默认开启更优的存储优化策略**

--- a/doc/fluid/api_cn/initializer_cn/NormalInitializer_cn.rst
+++ b/doc/fluid/api_cn/initializer_cn/NormalInitializer_cn.rst
@@ -5,20 +5,20 @@ NormalInitializer

 .. py:class:: paddle.fluid.initializer.NormalInitializer(loc=0.0, scale=1.0, seed=0)

-随机正态（高斯）分布初始化器
+Random normal (Gaussian) distribution initializer.

 参数：
-        - **loc** （float） - 正态分布的平均值
-        - **scale** （float） - 正态分布的标准差
-        - **seed** （int） - 随机种子
+    - **loc** (float) - 正态分布的平均值
+    - **scale** (float) - 正态分布的标准差
+    - **seed** (int) - 随机种子

 **代码示例**

 .. code-block:: python

-        import paddle.fluid as fluid
-        x = fluid.layers.data(name="data", shape=[32, 32], dtype="float32")
-        fc = fluid.layers.fc(input=x, size=10,
-            param_attr=fluid.initializer.Normal(loc=0.0, scale=2.0))
+    import paddle.fluid as fluid
+    x = fluid.layers.data(name="data", shape=[32, 32], dtype="float32")
+    fc = fluid.layers.fc(input=x, size=10,
+        param_attr=fluid.initializer.Normal(loc=0.0, scale=2.0))


--- a/doc/fluid/api_cn/initializer_cn/TruncatedNormalInitializer_cn.rst
+++ b/doc/fluid/api_cn/initializer_cn/TruncatedNormalInitializer_cn.rst
@@ -5,12 +5,12 @@ TruncatedNormalInitializer

 .. py:class:: paddle.fluid.initializer.TruncatedNormalInitializer(loc=0.0, scale=1.0, seed=0)

-Random Truncated Normal（高斯）分布初始化器
+Random truncated normal (Gaussian) distribution initializer.

 参数：
-        - **loc** （float） - 正态分布的平均值
-        - **scale** （float） - 正态分布的标准差
-        - **seed** （int） - 随机种子
+    - **loc** (float) - 正态分布的平均值
+    - **scale** (float) - 正态分布的标准差
+    - **seed** (int) - 随机种子

 **代码示例**


--- a/doc/fluid/api_cn/initializer_cn/UniformInitializer_cn.rst
+++ b/doc/fluid/api_cn/initializer_cn/UniformInitializer_cn.rst
@@ -8,9 +8,9 @@ UniformInitializer
 随机均匀分布初始化器

 参数：
-        - **low** (float) - 下界 
-        - **high** (float) - 上界
-        - **seed** (int) - 随机种子
+    - **low** (float) - 下界 
+    - **high** (float) - 上界
+    - **seed** (int) - 随机种子

 **代码示例**


--- a/doc/fluid/api_cn/io_cn/DataLoader_cn.rst
+++ b/doc/fluid/api_cn/io_cn/DataLoader_cn.rst
+.. _cn_api_fluid_io_DataLoader:
+
+DataLoader
+-------------------------------
+
+.. py:class:: paddle.fluid.io.DataLoader
+
+
+.. py:method:: from_generator(feed_list=None, capacity=None, use_double_buffer=True, iterable=True, return_list=False)
+
+创建一个DataLoader对象用于加载Python生成器产生的数据。数据会由Python线程预先读取，并异步送入一个队列中。
+
+本方法创建的DataLoader对象提供了3个方法设置数据源，分别是 :code:`set_sample_generator` , :code:`set_sample_list_generator` 和
+:code:`set_batch_generator` 。请查阅下述示例代码了解它们的使用方法。
+
+If iterable = True, the DataLoader object created by this method is a Python generator that can be iterated over with a for loop.
+
+如果iterable = False，本方法创建的DataLoader对象提供 :code:`start()` 和 :code:`reset()` 方法控制数据读取过程。此模式用于兼容
+``fluid.layers.py_reader`` 的使用方式。用户可使用iterable = False模式，方便地将 ``fluid.layers.py_reader`` 的代码迁移至
+``fluid.io.DataLoader`` 。
+
+参数:
+    - **feed_list** (list(Variable)|tuple(Variable)) - feed变量列表，由 ``fluid.layers.data()`` 创建。
+    - **capacity** (int) - DataLoader对象内部维护队列的容量大小。单位是batch数量。若reader读取速度较快，建议设置较大的capacity值。
+    - **use_double_buffer** (bool) - 是否使用 ``double_buffer_reader`` 。若use_double_buffer=True，DataLoader会异步地预读取下一个batch的数据，可加速数据读取过程，但同时会占用少量的CPU/GPU存储，即一个batch输入数据的存储空间。
+    - **iterable** (bool) - 所创建的DataLoader对象是否可迭代。
+    - **return_list** (bool) - 每个设备上的数据是否以list形式返回。仅在iterable = True模式下有效。若return_list = False，每个设备上的返回数据均是str -> LoDTensor的映射表，其中映射表的key是每个输入变量的名称。若return_list = True，则每个设备上的返回数据均是list(LoDTensor)。推荐在静态图模式下使用return_list = False，在动态图模式下使用return_list = True。
+
+返回: 被创建的DataLoader对象
+
+返回类型: loader (DataLoader)
+
+**代码示例**
+
+.. code-block:: python
+
+            import paddle.fluid as fluid
+            import numpy as np
+
+            BATCH_NUM = 10
+            BATCH_SIZE = 16
+            EPOCH_NUM = 4
+
+            CLASS_NUM = 10
+
+            ITERABLE = True # whether the created DataLoader object is iterable
+            USE_GPU = False # whether to use GPU
+
+            DATA_FORMAT = 'batch_generator' # data format of data source user provides
+
+            def simple_net(image, label):
+                fc_tmp = fluid.layers.fc(image, size=CLASS_NUM)
+                cross_entropy = fluid.layers.softmax_with_cross_entropy(fc_tmp, label)
+                loss = fluid.layers.reduce_mean(cross_entropy)
+                sgd = fluid.optimizer.SGD(learning_rate=1e-3)
+                sgd.minimize(loss)
+                return loss
+
+            def get_random_images_and_labels(image_shape, label_shape):
+                image = np.random.random(size=image_shape).astype('float32')
+                label = np.random.randint(low=0, high=CLASS_NUM, size=label_shape).astype('int64')
+                return image, label
+
+            # If the data generator yields one sample each time,
+            # use DataLoader.set_sample_generator to set the data source.
+            def sample_generator_creator():
+                def __reader__():
+                    for _ in range(BATCH_NUM * BATCH_SIZE):
+                        image, label = get_random_images_and_labels([784], [1])
+                        yield image, label
+
+                return __reader__
+
+            # If the data generator yields a list of samples each time,
+            # use DataLoader.set_sample_list_generator to set the data source.
+            def sample_list_generator_creator():
+                def __reader__():
+                    for _ in range(BATCH_NUM):
+                        sample_list = []
+                        for _ in range(BATCH_SIZE):
+                            image, label = get_random_images_and_labels([784], [1])
+                            sample_list.append([image, label])
+
+                        yield sample_list
+
+                return __reader__
+
+            # If the data generator yields a batch each time,
+            # use DataLoader.set_batch_generator to set the data source.
+            def batch_generator_creator():
+                def __reader__():
+                    for _ in range(BATCH_NUM):
+                        batch_image, batch_label = get_random_images_and_labels([BATCH_SIZE, 784], [BATCH_SIZE, 1])
+                        yield batch_image, batch_label
+
+                return __reader__
+
+            # If DataLoader is iterable, use for loop to train the network
+            def train_iterable(exe, prog, loss, loader):
+                for _ in range(EPOCH_NUM):
+                    for data in loader():
+                        exe.run(prog, feed=data, fetch_list=[loss])
+
+            # If DataLoader is not iterable, use start() and reset() method to control the process
+            def train_non_iterable(exe, prog, loss, loader):
+                for _ in range(EPOCH_NUM):
+                    loader.start() # call DataLoader.start() before each epoch starts
+                    try:
+                        while True:
+                            exe.run(prog, fetch_list=[loss])
+                    except fluid.core.EOFException:
+                        loader.reset() # call DataLoader.reset() after catching EOFException
+
+            def set_data_source(loader, places):
+                if DATA_FORMAT == 'sample_generator':
+                    loader.set_sample_generator(sample_generator_creator(), batch_size=BATCH_SIZE, drop_last=True, places=places)
+                elif DATA_FORMAT == 'sample_list_generator':
+                    loader.set_sample_list_generator(sample_list_generator_creator(), places=places)
+                elif DATA_FORMAT == 'batch_generator':
+                    loader.set_batch_generator(batch_generator_creator(), places=places)
+                else:
+                    raise ValueError('Unsupported data format')
+
+            image = fluid.layers.data(name='image', shape=[784], dtype='float32')
+            label = fluid.layers.data(name='label', shape=[1], dtype='int64')
+
+            # Define DataLoader
+            loader = fluid.io.DataLoader.from_generator(feed_list=[image, label], capacity=16, iterable=ITERABLE)
+
+            # Define network
+            loss = simple_net(image, label)
+
+            # Set data source of DataLoader
+            #
+            # If DataLoader is iterable, places must be given and the number of places must equal the number of devices.
+            #  - If you are using GPU, call `fluid.cuda_places()` to get all GPU places.
+            #  - If you are using CPU, call `fluid.cpu_places()` to get all CPU places.
+            #
+            # If DataLoader is not iterable, places can be None.
+            places = fluid.cuda_places() if USE_GPU else fluid.cpu_places()
+            set_data_source(loader, places)
+
+            exe = fluid.Executor(places[0])
+            exe.run(fluid.default_startup_program())
+
+            prog = fluid.CompiledProgram(fluid.default_main_program()).with_data_parallel(loss_name=loss.name)
+
+            if loader.iterable:
+                train_iterable(exe, prog, loss, loader)
+            else:
+                train_non_iterable(exe, prog, loss, loader)
+
+
+            '''
+            Users can use return_list = True in dygraph mode.
+            '''
+            with fluid.dygraph.guard(places[0]):
+                loader = fluid.io.DataLoader.from_generator(capacity=2, return_list=True)
+                set_data_source(loader, places[0])
+                for image, label in loader():
+                    relu = fluid.layers.relu(image)
+                    assert image.shape == [BATCH_SIZE, 784]
+                    assert label.shape == [BATCH_SIZE, 1]
+                    assert relu.shape == [BATCH_SIZE, 784]
+
+
+.. py:method:: from_dataset(dataset, places, drop_last=True)
+
+创建一个DataLoader对象用于加载Dataset产生的数据。目前，Dataset仅支持Linux系统下使用。
+
+参数:
+    - **dataset** (InMemoryDataset|QueueDataset) - Dataset对象。
+    - **places** (list(CUDAPlace)|list(CPUPlace)) - DataLoader对象返回数据所在的place。
+    - **drop_last** (bool) - 是否丢弃最后样本数量不足batch size的batch。若drop_last = True则丢弃，若drop_last = False则不丢弃。
+
+返回: 被创建的DataLoader对象，可以for-range的方式循环迭代
+
+返回类型: loader (DataLoader)
+
+**代码示例**
+
+.. code-block:: python
+
+            import paddle.fluid as fluid
+
+            image = fluid.layers.data(name='image', shape=[784], dtype='float32')
+            label = fluid.layers.data(name='label', shape=[1], dtype='int64')
+
+            dataset = fluid.DatasetFactory().create_dataset("QueueDataset")
+            dataset.set_batch_size(32)
+            dataset.set_filelist(['a.txt', 'b.txt', 'c.txt'])
+            dataset.set_use_var([image, label])
+            dataset.set_pipe_command('cat')
+
+            loader = fluid.io.DataLoader.from_dataset(dataset, fluid.cpu_places())
+
+
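Note on `drop_last` in the DataLoader docs above (the `set_sample_generator` call in the example and the `from_dataset` parameter): the flag decides whether a final batch with fewer than `batch_size` samples is kept. The sketch below illustrates that behavior in plain Python. `batch_samples` is a hypothetical helper and does not reproduce the DataLoader implementation.

```python
def batch_samples(sample_generator, batch_size, drop_last=True):
    """Hypothetical helper: group samples from a generator creator into batches."""
    def batched():
        batch = []
        for sample in sample_generator():
            batch.append(sample)
            if len(batch) == batch_size:
                yield batch
                batch = []
        # A trailing partial batch is emitted only when drop_last is False.
        if batch and not drop_last:
            yield batch
    return batched


def samples():
    # Ten dummy samples, so the last batch of size 4 is incomplete.
    for i in range(10):
        yield i


kept = [len(b) for b in batch_samples(samples, 4, drop_last=False)()]
dropped = [len(b) for b in batch_samples(samples, 4, drop_last=True)()]
print(kept)     # [4, 4, 2]
print(dropped)  # [4, 4]
```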
--- a/doc/fluid/api_cn/io_cn/PyReader_cn.rst
+++ b/doc/fluid/api_cn/io_cn/PyReader_cn.rst
@@ -9,11 +9,12 @@ PyReader
 在python中为数据输入创建一个reader对象。将使用python线程预取数据，并将其异步插入队列。当调用Executor.run（…）时，将自动提取队列中的数据。 

 参数:
-  - **feed_list** (list(Variable)|tuple(Variable))  – feed变量列表，由 ``fluid.layers.data()`` 创建。在可迭代模式下它可以被设置为None。
-  - **capacity** (int) – 在Pyreader对象中维护的队列的容量。
-  - **use_double_buffer** (bool) – 是否使用 ``double_buffer_reader`` 来加速数据输入。
-  - **iterable** (bool) –  被创建的reader对象是否可迭代。
-  - **eturn_list** (bool) –  是否以list的形式将返回值
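+    - **feed_list** (list(Variable)|tuple(Variable)) - List of feed variables, created by ``fluid.layers.data()``.
+    - **capacity** (int) - Capacity of the queue maintained inside the PyReader object, measured in batches. A larger capacity is recommended if the reader is fast.
+    - **use_double_buffer** (bool) - Whether to use ``double_buffer_reader``. If use_double_buffer=True, PyReader asynchronously prefetches the next batch, which speeds up data feeding at the cost of a small amount of CPU/GPU memory (the storage for one batch of input data).
+    - **iterable** (bool) - Whether the created PyReader object is iterable.
+    - **return_list** (bool) - Whether the data on each device is returned as a list. Only takes effect when iterable = True. If return_list = False, the data returned on each device is a str -> LoDTensor map whose keys are the names of the input variables. If return_list = True, the data returned on each device is a list(LoDTensor). return_list = False is recommended in static graph mode, and return_list = True in dygraph mode.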
+

 返回: 被创建的reader对象

@@ -22,13 +23,13 @@ PyReader

 **代码示例**

-1.如果iterable=False，则创建的Pyreader对象几乎与 ``fluid.layers.py_reader（）`` 相同。算子将被插入program中。用户应该在每个epoch之前调用start（），并在epoch结束时捕获 ``Executor.run（）`` 抛出的 ``fluid.core.EOFException `` 。一旦捕获到异常，用户应该调用reset（）手动重置reader。
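+1. If iterable=False, the created PyReader object is almost the same as ``fluid.layers.py_reader()``. Operators are inserted into the program. The user should call ``start()`` before each epoch and catch the ``fluid.core.EOFException`` thrown by ``Executor.run()`` when the epoch ends. Once the exception is caught, the user should call ``reset()`` to reset the reader manually.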

 .. code-block:: python

-    import paddle
-    import paddle.fluid as fluid
-    import numpy as np
+    import paddle
+    import paddle.fluid as fluid
+    import numpy as np

    EPOCH_NUM = 3
    ITER_NUM = 5
@@ -67,13 +68,13 @@ PyReader
                break


-2.如果iterable=True，则创建的Pyreader对象与程序分离。程序中不会插入任何算子。在本例中，创建的reader是一个python生成器，它是可迭代的。用户应将从Pyreader对象生成的数据输入 ``Executor.run(feed=...)`` 。
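+2. If iterable=True, the created PyReader object is decoupled from the program. No operator is inserted into the program. In this case the created reader is a Python generator, which is iterable. The user should feed the data yielded by the PyReader object into ``Executor.run(feed=...)``.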

 .. code-block:: python

-   import paddle
-   import paddle.fluid as fluid
-   import numpy as np
+   import paddle
+   import paddle.fluid as fluid
+   import numpy as np

   EPOCH_NUM = 3
   ITER_NUM = 5
@@ -100,40 +101,34 @@ PyReader
       for data in reader():
           executor.run(feed=data)

-3. return_list=True，返回值将用list表示而非dict
+3. If return_list=True, the return value is a list rather than a dict. This is typically used in dygraph mode.

 .. code-block:: python

-   import paddle
-   import paddle.fluid as fluid
-   import numpy as np
-
-   EPOCH_NUM = 3
-   ITER_NUM = 5
-   BATCH_SIZE = 10
-
-   def reader_creator_random_image(height, width):
-       def reader():
-           for i in range(ITER_NUM):
-               yield np.random.uniform(low=0, high=255, size=[height, width]),
-       return reader
+    import paddle
+    import paddle.fluid as fluid
+    import numpy as np

-   image = fluid.layers.data(name='image', shape=[784, 784], dtype='float32')
-   reader = fluid.io.PyReader(feed_list=[image], capacity=4, iterable=True, return_list=True)
-
-   user_defined_reader = reader_creator_random_image(784, 784)
-   reader.decorate_sample_list_generator(
-       paddle.batch(user_defined_reader, batch_size=BATCH_SIZE),
-       fluid.core.CPUPlace())
-   # 此处省略网络定义
-   executor = fluid.Executor(fluid.core.CPUPlace())
-   executor.run(fluid.default_main_program())
-
-   for _ in range(EPOCH_NUM):
-       for data in reader():
-           executor.run(feed={"image": data[0]})
+    EPOCH_NUM = 3
+    ITER_NUM = 5
+    BATCH_SIZE = 10

+    def reader_creator_random_image(height, width):
+        def reader():
+            for i in range(ITER_NUM):
+                yield np.random.uniform(low=0, high=255, size=[height, width]), \
+                    np.random.randint(low=0, high=10, size=[1])
+        return reader

+    place = fluid.CPUPlace()
+    with fluid.dygraph.guard(place):
+        py_reader = fluid.io.PyReader(capacity=2, return_list=True)
+        user_defined_reader = reader_creator_random_image(784, 784)
+        py_reader.decorate_sample_list_generator(
+            paddle.batch(user_defined_reader, batch_size=BATCH_SIZE),
+            place)
+        for image, label in py_reader():
+            relu = fluid.layers.relu(image)

 .. py:method:: start()

@@ -145,7 +140,7 @@ PyReader

  import paddle
  import paddle.fluid as fluid
-  import numpy as np
+  import numpy as np

  BATCH_SIZE = 10
     
@@ -179,7 +174,7 @@ PyReader

            import paddle
            import paddle.fluid as fluid
-            import numpy as np
+            import numpy as np

            BATCH_SIZE = 10
     
@@ -205,11 +200,11 @@ PyReader

 .. py:method:: decorate_sample_generator(sample_generator, batch_size, drop_last=True, places=None)

-设置Pyreader对象的数据源。
+设置PyReader对象的数据源。

 提供的 ``sample_generator`` 应该是一个python生成器，它生成的数据类型应为list(numpy.ndarray)。

-当Pyreader对象可迭代时，必须设置 ``places`` 。
+当PyReader对象可迭代时，必须设置 ``places`` 。

 如果所有的输入都没有LOD，这个方法比 ``decorate_sample_list_generator(paddle.batch(sample_generator, ...))`` 更快。

@@ -223,8 +218,8 @@ PyReader

 .. code-block:: python
     
-            import paddle.fluid as fluid
-            import numpy as np
+            import paddle.fluid as fluid
+            import numpy as np

            EPOCH_NUM = 3
            ITER_NUM = 15
@@ -258,11 +253,11 @@ PyReader

 .. py:method:: decorate_sample_list_generator(reader, places=None)

-设置Pyreader对象的数据源。
+设置PyReader对象的数据源。

 提供的 ``reader`` 应该是一个python生成器，它生成列表（numpy.ndarray）类型的批处理数据。

-当Pyreader对象不可迭代时，必须设置 ``places`` 。
+当PyReader对象不可迭代时，必须设置 ``places`` 。

 参数:
  - **reader** (generator)  – 返回列表（numpy.ndarray）类型的批处理数据的Python生成器
@@ -274,7 +269,7 @@ PyReader
            
            import paddle
            import paddle.fluid as fluid
-            import numpy as np
+            import numpy as np

            EPOCH_NUM = 3
            ITER_NUM = 15
@@ -308,11 +303,11 @@ PyReader

 .. py:method:: decorate_batch_generator(reader, places=None)

-设置Pyreader对象的数据源。
+设置PyReader对象的数据源。

 提供的 ``reader`` 应该是一个python生成器，它生成列表（numpy.ndarray）类型或LoDTensor类型的批处理数据。

-当Pyreader对象不可迭代时，必须设置 ``places`` 。
+当PyReader对象不可迭代时，必须设置 ``places`` 。

 参数:
  - **reader** (generator)  – 返回LoDTensor类型的批处理数据的Python生成器
@@ -322,8 +317,8 @@ PyReader

 .. code-block:: python

-            import paddle.fluid as fluid
-            import numpy as np
+            import paddle.fluid as fluid
+            import numpy as np

            EPOCH_NUM = 3
            ITER_NUM = 15
@@ -354,3 +349,6 @@ PyReader
                    executor.run(feed=data)


+.. py:method:: next()
+
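+Gets the next item. Users should not call this method directly. It exists so that the PaddlePaddle framework can implement the Python 2.x iterator protocol internally.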
\ No newline at end of file
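Note on the `next()` method documented above: Python 2 calls `next()` on an iterator, while Python 3 calls `__next__()`, so a class that supports both usually defines one and aliases the other. The sketch below shows that pattern with a hypothetical `CountingReader` class. It is not PyReader code and makes no claim about how PyReader is implemented.

```python
class CountingReader(object):
    """Hypothetical iterator that works under both Python 2 and Python 3."""

    def __init__(self, limit):
        self._limit = limit
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        # Python 3 iterator protocol.
        if self._index >= self._limit:
            raise StopIteration
        value = self._index
        self._index += 1
        return value

    # Python 2 iterator protocol: same behavior under the older method name.
    next = __next__


items = list(CountingReader(3))
print(items)  # [0, 1, 2]
```

User code iterates with a `for` loop or `list()` and never calls `next()` directly, which matches the guidance in the method description.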
--- a/doc/fluid/api_cn/io_cn/load_inference_model_cn.rst
+++ b/doc/fluid/api_cn/io_cn/load_inference_model_cn.rst
@@ -5,26 +5,33 @@ load_inference_model

 .. py:function:: paddle.fluid.io.load_inference_model(dirname, executor, model_filename=None, params_filename=None, pserver_endpoints=None)

-从指定目录中加载预测模型(inference model)。通过这个API，您可以获得模型结构（预测程序）和模型参数。如果您只想下载预训练后的模型的参数，请使用load_params API。更多细节请参考 ``模型/变量的保存、载入与增量训练`` 。
+从指定文件路径中加载预测模型(Inference Model)，即调用该接口可获得模型结构（Inference Program）和模型参数。若只想加载预训练后的模型参数，请使用 :ref:`cn_api_fluid_io_load_params` 接口。更多细节请参考 :ref:`api_guide_model_save_reader` 。

-参数:
-  - **dirname** (str) – model的路径
-  - **executor** (Executor) – 运行 inference model的 ``executor``
-  - **model_filename** (str|None) –  存储着预测 Program 的文件名称。如果设置为None，将使用默认的文件名为： ``__model__``
-  - **params_filename** (str|None) –  加载所有相关参数的文件名称。如果设置为None，则参数将保存在单独的文件中。
-  - **pserver_endpoints** (list|None) – 只有在分布式预测时需要用到。 当在训练时使用分布式 look up table , 需要这个参数. 该参数是 pserver endpoints 的列表
+参数：
+  - **dirname** (str) – 待加载模型的存储路径。
+  - **executor** (Executor) – 运行 Inference Model 的 ``executor`` ，详见 :ref:`api_guide_executor` 。
+  - **model_filename** (str，可选) –  存储Inference Program结构的文件名称。如果设置为None，则使用 ``__model__`` 作为默认的文件名。默认值为None。
+  - **params_filename** (str，可选) –  存储所有模型参数的文件名称。当且仅当所有模型参数被保存在一个单独的二进制文件中，它才需要被指定。如果模型参数是存储在各自分离的文件中，设置它的值为None。默认值为None。
+  - **pserver_endpoints** (list，可选) – 只有在分布式预测时才需要用到。当训练过程中使用分布式查找表(distributed lookup table)时, 预测时需要指定pserver_endpoints的值。它是 pserver endpoints 的列表，默认值为None。

-返回: 这个函数的返回有三个元素的元组(Program，feed_target_names, fetch_targets)。Program 是一个 ``Program`` ，它是预测 ``Program``。  ``feed_target_names`` 是一个str列表，它包含需要在预测 ``Program`` 中提供数据的变量的名称。``fetch_targets`` 是一个 ``Variable`` 列表，从中我们可以得到推断结果。
+返回：该接口返回一个包含三个元素的列表(program，feed_target_names, fetch_targets)。它们的含义描述如下：
+  - **program** （Program）– ``Program`` （详见 :ref:`api_guide_Program` ）类的实例。此处它被用于预测，因此可被称为Inference Program。
+  - **feed_target_names** （list）– 字符串列表，包含着Inference Program预测时所需提供数据的所有变量名称（即所有输入变量的名称）。
+  - **fetch_targets** （list）– ``Variable`` （详见 :ref:`api_guide_Program` ）类型列表，包含着模型的所有输出变量。通过这些输出变量即可得到模型的预测结果。

-返回类型：元组(tuple)
+**Return type:** list

 Raises:
-   - ``ValueError`` – 如果 ``dirname`` 非法 
+  - ``ValueError`` – Raised if ``dirname`` points to a path that does not exist.
+
+**Code example**

 .. code-block:: python

        import paddle.fluid as fluid
        import numpy as np
+
+        # Build the model
        main_prog = fluid.Program()
        startup_prog = fluid.Program()
        with fluid.program_guard(main_prog, startup_prog):
@@ -36,26 +43,29 @@ load_inference_model
        place = fluid.CPUPlace()
        exe = fluid.Executor(place)
        exe.run(startup_prog)
+
+        # Save the inference model
        path = "./infer_model"
        fluid.io.save_inference_model(dirname=path, feeded_var_names=['img'],target_vars=[hidden_b], executor=exe, main_program=main_prog)
-        tensor_img = np.array(np.random.random((1, 64, 784)), dtype=np.float32)
+
+        # Example 1: loading a model that needs no distributed lookup table, i.e. none was used during training.
        [inference_program, feed_target_names, fetch_targets] = (fluid.io.load_inference_model(dirname=path, executor=exe))
-        
+        tensor_img = np.array(np.random.random((1, 64, 784)), dtype=np.float32)
        results = exe.run(inference_program,
                  feed={feed_target_names[0]: tensor_img},
                  fetch_list=fetch_targets)

-        # endpoints是pserver服务器终端列表，下面仅为一个样例
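+        # Example 2: if a distributed lookup table was used during training, pass the list of pserver endpoints through the pserver_endpoints argument when loading the model.
+        # The list is mainly used for ID lookups in the distributed lookup table. The ["127.0.0.1:2023","127.0.0.1:2024"] below is only a sample.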
        endpoints = ["127.0.0.1:2023","127.0.0.1:2024"]
-        # 如果需要查询表格，我们可以使用：
        [dist_inference_program, dist_feed_target_names, dist_fetch_targets] = (
            fluid.io.load_inference_model(dirname=path,
                                          executor=exe,
                                          pserver_endpoints=endpoints))

-        # 在这个示例中，inference program 保存在“ ./infer_model/__model__”中
-        # 参数保存在“./infer_mode ”单独的若干文件中
-        # 加载 inference program 后， executor 使用 fetch_targets 和 feed_target_names 执行Program，得到预测结果
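+        # In the examples above, the inference program is saved in the file "./infer_model/__model__",
+        # and the parameters are saved in separate files under "./infer_model".
+        # After loading the inference program, the executor can run the Program with fetch_targets and feed_target_names to get the prediction results.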




--- a/doc/fluid/api_cn/layers_cn.rst
+++ b/doc/fluid/api_cn/layers_cn.rst
@@ -61,6 +61,7 @@ fluid.layers
    layers_cn/create_tensor_cn.rst
    layers_cn/crf_decoding_cn.rst
    layers_cn/crop_cn.rst
+    layers_cn/crop_tensor_cn.rst
    layers_cn/cross_entropy_cn.rst
    layers_cn/ctc_greedy_decoder_cn.rst
    layers_cn/cumsum_cn.rst
@@ -157,7 +158,6 @@ fluid.layers
    layers_cn/lstm_cn.rst
    layers_cn/lstm_unit_cn.rst
    layers_cn/margin_rank_loss_cn.rst
-    layers_cn/match_matrix_tensor_cn.rst
    layers_cn/matmul_cn.rst
    layers_cn/maxout_cn.rst
    layers_cn/mean_cn.rst
@@ -285,7 +285,6 @@ fluid.layers
    layers_cn/thresholded_relu_cn.rst
    layers_cn/topk_cn.rst
    layers_cn/transpose_cn.rst
-    layers_cn/tree_conv_cn.rst
    layers_cn/unfold_cn.rst
    layers_cn/Uniform_cn.rst
    layers_cn/uniform_random_cn.rst
@@ -294,7 +293,6 @@ fluid.layers
    layers_cn/unique_with_counts_cn.rst
    layers_cn/unsqueeze_cn.rst
    layers_cn/unstack_cn.rst
-    layers_cn/var_conv_2d_cn.rst
    layers_cn/warpctc_cn.rst
    layers_cn/where_cn.rst
    layers_cn/While_cn.rst

--- a/doc/fluid/api_cn/layers_cn/abs_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/abs_cn.rst
@@ -11,11 +11,12 @@ abs
    out = |x|

 Parameters:
+    - **x** (Variable) - A multi-dimensional Tensor with data type float32 or float64.
+    - **name** (str) – Used by developers when printing debug information. See :ref:`api_guide_Name` for details. Default: None.

-    - **x** - abs算子的输入
-    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn
+Returns: a Tensor holding the absolute values, with the same data type as x.

-返回：        abs算子的输出。
+Return type: Variable

 **代码示例**：

@@ -24,5 +25,3 @@ abs
        import paddle.fluid as fluid
        data = fluid.layers.data(name="input", shape=[32, 784])
        result = fluid.layers.abs(data)
-
-
--- a/doc/fluid/api_cn/layers_cn/accuracy_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/accuracy_cn.rst
@@ -7,37 +7,43 @@ accuracy

 Accuracy layer. See https://en.wikipedia.org/wiki/Precision_and_recall

-使用输入和标签计算准确率。 每个类别中top k 中正确预测的个数。注意：准确率的 dtype 由输入决定。 输入和标签 dtype 可以不同。
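+Computes accuracy from the input and the label. If the correct label is among the top-k predictions, the count of correct predictions is incremented by 1. Note: the data type of the output accuracy is determined by the type of input; input and label may have different types.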

 参数：
-    - **input** (Variable)-该层的输入，即网络的预测。支持 Carry LoD。
-    - **label** (Variable)-数据集的标签。
-    - **k** (int) - 每个类别的 top k
-    - **correct** (Variable)-正确的预测个数。
-    - **total** (Variable)-总共的样本数。
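+    - **input** (Tensor|LoDTensor) - The predictions of the network, with data type float32 or float64.
+    - **label** (Tensor|LoDTensor) - The labels of the dataset, with data type int64 or int32.
+    - **k** (int64|int32) - The top k predictions for each class are used in the computation.
+    - **correct** (int64|int32) - The number of correct predictions.
+    - **total** (int64|int32) - The total number of predictions.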

-返回: 正确率
+Returns: the computed accuracy.

-返回类型: 变量（Variable）
+Return type: Variable (Tensor), a Tensor with data type float32.

 **代码示例**

 .. code-block:: python

    import paddle.fluid as fluid
-    data = fluid.layers.data(name="data", shape=[-1, 32, 32], dtype="float32")
-    label = fluid.layers.data(name="label", shape=[-1,1], dtype="int32")
-    predict = fluid.layers.fc(input=data, size=10)
-    accuracy_out = fluid.layers.accuracy(input=predict, label=label, k=5)
-
-
-
-
-
-
-
-
-
-
-
-
+    import numpy as np
+
+    data = fluid.layers.data(name="input", shape=[-1, 32, 32], dtype="float32")
+    label = fluid.layers.data(name="label", shape=[-1,1], dtype="int")
+    fc_out = fluid.layers.fc(input=data, size=10)
+    predict = fluid.layers.softmax(input=fc_out)
+    result = fluid.layers.accuracy(input=predict, label=label, k=5)
+
+    place = fluid.CPUPlace()
+    exe = fluid.Executor(place)
+
+    exe.run(fluid.default_startup_program())
+    x = np.random.rand(3, 32, 32).astype("float32")
+    y = np.array([[1],[0],[1]])
+    output= exe.run(feed={"input": x,"label": y},
+                     fetch_list=[result[0]])
+    print(output)
+    
+    """
+    Output:
+    [array([0.6666667], dtype=float32)]
+    """
\ No newline at end of file
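
The top-k rule described in the accuracy change above (a sample counts as correct when its true label is among the k highest-scoring predictions) can be sketched without Paddle. This is a NumPy-only illustration under stated assumptions: `topk_accuracy` is a hypothetical helper, not part of `paddle.fluid`, and it makes no claim about how the real operator breaks ties or accumulates `correct` and `total`.

```python
import numpy as np


def topk_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores.

    Hypothetical helper for illustration only; not a paddle.fluid API.
    """
    scores = np.asarray(scores)
    labels = np.asarray(labels).reshape(-1)
    # Column indices of the k largest scores in each row.
    topk = np.argsort(-scores, axis=1)[:, :k]
    # A row is a hit if any of its top-k indices equals the label.
    hits = (topk == labels[:, None]).any(axis=1)
    return float(hits.mean())


scores = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.3, 0.5]])
labels = np.array([[1], [1], [2]])

print(topk_accuracy(scores, labels, k=1))  # 2 of 3 samples hit
print(topk_accuracy(scores, labels, k=2))  # all 3 samples hit
```

Raising k can only keep or increase the result, which is why the second call reports every sample as correct.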
--- a/doc/fluid/api_cn/layers_cn/auc_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/auc_cn.rst
@@ -3,7 +3,7 @@
 auc
 -------------------------------

-.. py:function:: paddle.fluid.layers.auc(input, label, curve='ROC', num_thresholds=4095, topk=1, slide_steps=1)
+.. py:function:: paddle.fluid.layers.auc(input, label, curve='ROC', num_thresholds=200, topk=1, slide_steps=1)

 **Area Under the Curve(AUC) Layer**

@@ -18,32 +18,49 @@ auc
 2. PR:准确率召回率曲线

 参数：
-    - **input** (Variable) - 浮点二维变量，值的范围为[0,1]。每一行降序排列。输入应为topk的输出。该变量显示了每个标签的概率。
-    - **label** (Variable) - 二维整型变量，表示训练数据的标注。批尺寸的高度和宽度始终为1.
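+    - **input** (Tensor|LoDTensor) - A 2-D floating-point Tensor with data type float32 or float64 and values in [0, 1]. Each row is sorted in descending order. This input holds the predictions of the network.
+    - **label** (Tensor|LoDTensor) - A 2-D integer Tensor with data type int32 or int64, holding the labels of the training data.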
    - **curve** (str) - 曲线类型，可以为 ``ROC`` 或 ``PR``，默认 ``ROC``。
-    - **num_thresholds** (int) - 将roc曲线离散化时使用的临界值数。默认200
-    - **topk** (int) - 只有预测输出的topk数才被用于auc
-    - **slide_steps** - 计算批auc时，不仅用当前步也用先前步。slide_steps=1，表示用当前步；slide_steps = 3表示用当前步和前两步；slide_steps = 0，则用所有步
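+    - **num_thresholds** (int) - Number of thresholds used to discretize the ROC curve. Default: 200.
+    - **topk** (int) - Only the top-k outputs are used in the computation.
+    - **slide_steps** (int) - When computing the batch AUC, earlier steps can be used in addition to the current one. slide_steps=1 uses only the current step; slide_steps=3 uses the current step and the two previous steps; slide_steps=0 uses all steps.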

-返回：代表当前AUC的一个元组
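+Returns: a tuple representing the current AUC.
 The returned tuple is auc_out, batch_auc_out, [batch_stat_pos, batch_stat_neg, stat_pos, stat_neg].
+auc_out is the resulting AUC value.
+batch_auc_out is the resulting AUC value for the batch.
+batch_stat_pos is the statistic for label=1 in the batch computation.
+batch_stat_neg is the statistic for label=0 in the batch computation.
+stat_pos is the statistic for label=1 in the overall computation.
+stat_neg is the statistic for label=0 in the overall computation.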

-返回类型：变量（Variable）
+Return type: Variable (Tensor), with data type float32 or float64.

 **代码示例**：

 .. code-block:: python

    import paddle.fluid as fluid
-    data = fluid.layers.data(name="data", shape=[32, 32], dtype="float32")
-    label = fluid.layers.data(name="label", shape=[1], dtype="int32")
-    predict = fluid.layers.fc(input=data, size=2)
-    auc_out=fluid.layers.auc(input=predict, label=label)
-
-
-
-
-
-
+    import numpy as np
+
+    data = fluid.layers.data(name="input", shape=[-1, 32,32], dtype="float32")
+    label = fluid.layers.data(name="label", shape=[1], dtype="int")
+    fc_out = fluid.layers.fc(input=data, size=2)
+    predict = fluid.layers.softmax(input=fc_out)
+    result=fluid.layers.auc(input=predict, label=label)
+
+    place = fluid.CPUPlace()
+    exe = fluid.Executor(place)
+
+    exe.run(fluid.default_startup_program())
+    x = np.random.rand(3,32,32).astype("float32")
+    y = np.array([1,0,1])
+    output= exe.run(feed={"input": x,"label": y},
+                     fetch_list=[result[0]])
+    print(output)
+    """
+    output:
+    [array([0.5])]
+    """


--- a/doc/fluid/api_cn/layers_cn/crop_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/crop_cn.rst
@@ -7,6 +7,8 @@ crop

 根据偏移量（offsets）和形状（shape），裁剪输入张量。

+**Note:** This API is deprecated and will be removed in a future release. Use `fluid.layers.crop_tensor <https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/layers_cn/nn_cn.html#crop_tensor>`_ instead.
+
 **样例**：

 ::

--- a/doc/fluid/api_cn/layers_cn/crop_tensor_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/crop_tensor_cn.rst
+.. _cn_api_fluid_layers_crop_tensor:
+
+crop_tensor
+-------------------------------
+
+.. py:function:: paddle.fluid.layers.crop_tensor(x, shape=None, offsets=None, name=None)
+
+Crops the input Tensor (x) according to the offsets and the shape.
+
+**Examples**:
+
+::
+
+    * Example 1 (the input is a 2-D Tensor):
+
+        Input:
+            X.shape = [3, 5]
+            X.data = [[0, 1, 2, 0, 0],
+                      [0, 3, 4, 0, 0],
+                      [0, 0, 0, 0, 0]]
+
+        Parameters:
+            shape = [2, 2]
+            offsets = [0, 1]
+
+        Output:
+            Out.shape = [2, 2]
+            Out.data = [[1, 2],
+                        [3, 4]]
+
+    * Example 2 (the input is a 3-D Tensor):
+
+        Input:
+
+            X.shape = [2, 3, 4]
+            X.data =  [[[0, 1, 2, 3],
+                        [0, 5, 6, 7],
+                        [0, 0, 0, 0]],
+                       [[0, 3, 4, 5],
+                        [0, 6, 7, 8],
+                        [0, 0, 0, 0]]]
+
+        Parameters:
+            shape = [2, 2, 3]
+            offsets = [0, 0, 1]
+
+        Output:
+            Out.shape = [2, 2, 3]
+            Out.data = [[[1, 2, 3],
+                         [5, 6, 7]],
+                        [[3, 4, 5],
+                         [6, 7, 8]]]
+
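+Parameters:
+  - **x** (Variable) - A 1-D to 6-D Tensor with data type float32 or float64.
+  - **shape** (list|tuple|Variable) - Shape of the output Tensor, with data type int32. If it is a list or tuple, its length must equal the number of dimensions of x; if it is a Variable, it must be a 1-D Tensor. When it is a list, each element can be an integer or a Tensor of shape [1]. Passing Variables suits cases where the output shape changes at every iteration. In a list or tuple only the first element may be set to -1, which means the first dimension of the output has the same size as that of the input.
+  - **offsets** (list|tuple|Variable, optional) - Crop offset along each dimension, with data type int32. If it is a list or tuple, its length must equal the number of dimensions of x; if it is a Variable, it must be a 1-D Tensor. When it is a list, each element can be an integer or a Variable of shape [1]. Passing Variables suits cases where the offsets may change at every iteration. Default: None, meaning the offset is 0 in every dimension.
+  - **name** (str, optional) - Used by developers when printing debug information. See :ref:`api_guide_Name` for details. Default: None.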
+
+Returns: the cropped Tensor, with the same data type as the input (x).
+
+Return type: Variable
+
+Raises:
+    - :code:`ValueError` - shape must be a list, tuple or Variable.
+    - :code:`ValueError` - offsets must be a list, tuple, Variable or None.
+
+**Code example**:
+
+..  code-block:: python
+    
+    import paddle.fluid as fluid
+    x = fluid.layers.data(name="x", shape=[3, 5], dtype="float32")
+    # x.shape = [-1, 3, 5], where -1 indicates batch size, and it will get the exact value in runtime.
+
+    # shape is a 1-D tensor variable
+    crop_shape = fluid.layers.data(name="crop_shape", shape=[3], dtype="int32", append_batch_size=False)
+    crop0 = fluid.layers.crop_tensor(x, shape=crop_shape)
+    # crop0.shape = [-1, -1, -1], it means crop0.shape[0] = x.shape[0] in runtime.
+
+    # or shape is a list in which each element is a constant
+    crop1 = fluid.layers.crop_tensor(x, shape=[-1, 2, 3])
+    # crop1.shape = [-1, 2, 3]
+
+    # or shape is a list in which each element is a constant or variable
+    y = fluid.layers.data(name="y", shape=[3, 8, 8], dtype="float32")
+    dim1 = fluid.layers.data(name="dim1", shape=[1], dtype="int32", append_batch_size=False)
+    crop2 = fluid.layers.crop_tensor(y, shape=[-1, 3, dim1, 4])
+    # crop2.shape = [-1, 3, -1, 4]
+
+    # offsets is a 1-D tensor variable
+    crop_offsets = fluid.layers.data(name="crop_offsets", shape=[3], dtype="int32", append_batch_size=False)
+    crop3 = fluid.layers.crop_tensor(x, shape=[-1, 2, 3], offsets=crop_offsets)
+    # crop3.shape = [-1, 2, 3]
+
+    # offsets is a list in which each element is a constant or variable
+    offsets_var = fluid.layers.data(name="offsets_var", shape=[1], dtype="int32", append_batch_size=False)
+    crop4 = fluid.layers.crop_tensor(x, shape=[-1, 2, 3], offsets=[0, 1, offsets_var])
+    # crop4.shape = [-1, 2, 3]
+
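
The two worked examples in the new crop_tensor page can be reproduced with plain array slicing. The snippet below is a NumPy sketch rather than Paddle code: `crop_like` is a hypothetical helper, it only handles constant integer `shape` and `offsets`, and it deliberately leaves out the `-1` and Variable cases described in the parameter list.

```python
import numpy as np


def crop_like(x, shape, offsets=None):
    """Crop `x` to `shape`, starting at `offsets` along each dimension.

    Hypothetical helper for illustration; constant integers only.
    """
    x = np.asarray(x)
    if offsets is None:
        offsets = [0] * x.ndim  # same default as documented: no offset
    if len(shape) != x.ndim or len(offsets) != x.ndim:
        raise ValueError("shape and offsets must have one entry per dimension of x")
    # One slice per dimension: [offset, offset + size).
    index = tuple(slice(o, o + s) for o, s in zip(offsets, shape))
    return x[index]


x2d = np.array([[0, 1, 2, 0, 0],
                [0, 3, 4, 0, 0],
                [0, 0, 0, 0, 0]])
out2d = crop_like(x2d, shape=[2, 2], offsets=[0, 1])
print(out2d.tolist())  # [[1, 2], [3, 4]]

x3d = np.array([[[0, 1, 2, 3], [0, 5, 6, 7], [0, 0, 0, 0]],
                [[0, 3, 4, 5], [0, 6, 7, 8], [0, 0, 0, 0]]])
out3d = crop_like(x3d, shape=[2, 2, 3], offsets=[0, 0, 1])
print(out3d.shape)  # (2, 2, 3)
```

Both results match the `Out.data` values listed in Example 1 and Example 2 of the page.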
--- a/doc/fluid/api_cn/layers_cn/deformable_conv_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/deformable_conv_cn.rst
@@ -3,16 +3,21 @@
 deformable_conv
 -------------------------------

-.. py:function:: paddle.fluid.layers.deformable_conv(input, offset, mask, num_filters, filter_size, stride=1, padding=0, dilation=1, groups=None, deformable_groups=None, im2col_step=None, param_attr=None, bias_attr=None, name=None)
+.. py:function:: paddle.fluid.layers.deformable_conv(input, offset, mask, num_filters, filter_size, stride=1, padding=0, dilation=1, groups=None, deformable_groups=None, im2col_step=None, param_attr=None, bias_attr=None, modulated=True, name=None)

 可变形卷积层

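 Computes a 2-D deformable convolution on a 4-D input. Given an input image x and an output feature map y, the deformable convolution is defined as follows.
+Deformable convolution v2: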

- :math:`y(p) = \sum_{k=1}^{K}{w_k * x(p + p_k + \Delta p_k) * \Delta m_k}`
+  :math:`y(p) = \sum_{k=1}^{K}{w_k * x(p + p_k + \Delta p_k) * \Delta m_k}`

-其中 :math:`\Delta p_k 和 \Delta m_k` 分别为第k个位置的可学习偏移和调制标量。
-参考可变形卷积网络v2: `可变形程度越高，结果越好 <https://arxiv.org/abs/1811.11168v2>`_。
+Deformable convolution v1:
+
+  :math:`y(p) = \sum_{k=1}^{K}{w_k * x(p + p_k + \Delta p_k)}`
+
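+where :math:`\Delta p_k` and :math:`\Delta m_k` are the learnable offset and the modulation scalar for the k-th location, respectively. In deformable convolution v1, :math:`\Delta m_k` is 1.
+See `Deformable ConvNets v2: More Deformable, Better Results <https://arxiv.org/abs/1811.11168v2>`_ and `Deformable Convolutional Networks <https://arxiv.org/abs/1703.06211>`_ .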

 **示例**
     
@@ -51,6 +56,7 @@ deformable_conv
    - **im2col_step** (int) – 每个im2col计算的最大图像数。总batch大小应该可以被该值整除或小于该值。如果您面临内存问题，可以尝试在此处使用一个更小的值。默认im2col_step = 64。
    - **param_attr** (ParamAttr|None) – 可变形卷积的可学习参数/权重的参数属性。如果将其设置为None或ParamAttr的一个属性，可变形卷积将创建ParamAttr作为param_attr。如果没有设置此param_attr的Initializer，该参数将被Normal(0.0, std)初始化，且其中的std为 :math:`(\frac{2.0 }{filter\_elem\_num})^{0.5}`。默认值None。
    - **bias_attr** (ParamAttr|bool|None) – 可变形卷积层的偏置的参数属性。如果设为False，则输出单元不会加偏置。如果设为None或者ParamAttr的一个属性，conv2d会创建ParamAttr作为bias_attr。如果不设置bias_attr的Initializer，偏置会被初始化为0。默认值None。
+    - **modulated** (bool) - Selects between v1 and v2. If True, v2 is used. Default: True.
    - **name** (str|None) – 该层的名字（可选项）。如果设为None，该层将会被自动命名。默认值None。
 
 返回：储存可变形卷积结果的张量变量。
@@ -63,13 +69,22 @@ deformable_conv

 ..  code-block:: python

+    #deformable conv v2:
+         
    import paddle.fluid as fluid
    data = fluid.layers.data(name='data', shape=[3, 32, 32], dtype='float32')
    offset = fluid.layers.data(name='offset', shape=[18, 32, 32], dtype='float32')
    mask = fluid.layers.data(name='mask', shape=[9, 32, 32], dtype='float32')
-    out = fluid.layers.deformable_conv(input=data, offset=offset, mask=mask, num_filters=2, filter_size=3, padding=1)
+    out = fluid.layers.deformable_conv(input=data, offset=offset, mask=mask,
+                                       num_filters=2, filter_size=3, padding=1, modulated=True)

+    #deformable conv v1:

+    import paddle.fluid as fluid
+    data = fluid.layers.data(name='data', shape=[3, 32, 32], dtype='float32')
+    offset = fluid.layers.data(name='offset', shape=[18, 32, 32], dtype='float32')
+    out = fluid.layers.deformable_conv(input=data, offset=offset, mask=None,
+                                       num_filters=2, filter_size=3, padding=1, modulated=False)




--- a/doc/fluid/api_cn/layers_cn/deformable_roi_pooling_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/deformable_roi_pooling_cn.rst
@@ -5,31 +5,45 @@ deformable_roi_pooling

 .. py:function:: paddle.fluid.layers.deformable_roi_pooling(input, rois, trans, no_trans=False, spatial_scale=1.0, group_size=[1, 1], pooled_height=1, pooled_width=1, part_size=None, sample_per_part=1, trans_std=0.1, position_sensitive=False, name=None)

-可变形PSROI池层
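+Deformable region of interest (ROI) pooling layer
+
+This OP performs deformable ROI pooling on the input. As described in `Deformable Convolutional Networks <https://arxiv.org/abs/1703.06211>`_ , it obtains an offset for the pixels in each bin so that pooling happens at a suitable position. After deformable ROI pooling, the batch size becomes the number of region proposals.
+
+Deformable ROI pooling consists of three steps:
+
+1. Divide each region proposal into equally sized bins according to the configured pooled width and pooled height.
+
+2. Add the obtained offsets to the pixels of the region proposal to get the new positions, and use bilinear interpolation to get the values at positions that are no longer integers after the shift.
+
+3. Sample several pixels uniformly in each bin and output their mean.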
+

 参数:
-    - **input** (Variable) - 可变形PSROI池层的输入。输入张量的形状为[N，C，H，W]。其中N是批量大小，C是输入通道的数量，H是特征的高度，W是特征的宽度。
-    - **rois** （Variable）- 将池化的ROIs（感兴趣区域）。应为一个形状为(num_rois, 4)的2-D LoDTensor，且lod level为1。给出[[x1, y1, x2, y2], ...]，(x1, y1)为左上角坐标，(x2, y2)为右下角坐标。
-    - **trans** （Variable）- 池化时ROIs上的特征偏移。格式为NCHW，其中N是ROIs的数量，C是通道的数量，指示x和y方向上的偏移距离，H是池化的高度，W是池化的宽度。
-    - **no_trans** （bool）- roi池化阶段是否加入偏移以获取新值。取True或False。默认为False。
-    - **spatial_scale** (float) - 输入特征图的高度（或宽度）与原始图像高度（或宽度）的比率。等于卷积图层中总步长的倒数，默认为1.0。
-    - **group_size** （list|tuple）- 输入通道划分成的组数（例如，输入通道的数量是k1 * k2 *（C + 1），其中k1和k2是组宽度和高度，C + 1是输出通道的数量。如（ 4,6）中4是组的高度，6是组的宽度）。默认为[1,1]。
-    - **pooled_height** （integer）- 池化后输出的高度。
-    - **pooled_width** （integer）- 池化后输出的宽度。
-    - **part_size** （list|tuple）- 偏移高度和宽度，如(4, 6)代表高度为4、宽度为6，默认为None，此时默认值[pooled_height, pooled_width]。
-    - **sample_per_part** （integer）- 每个bin中的样本数量，默认为1。
-    - **trans_std** （float）- 偏移系数，默认为0.1。
-    - **position_sensitive** （bool）- 是否选择可变形psroi池化模式，默认为False。
-    - **name** （str）- 层名，默认为None。
-
-返回: 存储可变形psroi池层的张量变量
-
-返回类型:  变量(Variable)
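+    - **input** (Variable) - Input of the deformable ROI pooling layer, a Tensor with data type float32. Its shape is [N, C, H, W], where N is the batch size, C is the number of input channels, H is the height of the feature and W is the width of the feature.
+    - **rois** (Variable) - The ROIs (regions of interest) to pool over. A 2-D LoDTensor of shape (num_rois, 4) with lod level 1. Its values are [[x1, y1, x2, y2], ...], where (x1, y1) is the top-left corner and (x2, y2) is the bottom-right corner.
+    - **trans** (Variable) - Feature offsets on the ROIs during pooling, a Tensor with data type float32. Its format is [N, C, H, W], where N is the number of ROIs, C is the number of channels and indicates the offset distance in the x and y directions, H is the pooled height and W is the pooled width.
+    - **no_trans** (bool) - Whether to skip adding offsets in the ROI pooling stage. If True, no offset is added. Default: False.
+    - **spatial_scale** (float) - Ratio of the height (or width) of the input feature map to the height (or width) of the original image, of type float32. It equals the reciprocal of the total stride in the convolution layers. Default: 1.0.
+    - **group_size** (list|tuple) - Number of groups the input channels are divided into, given as a list or tuple of int32 values (for example, the number of input channels is k1 * k2 * (C + 1), where k1 and k2 are the group width and height and C + 1 is the number of output channels; in (4, 6), 4 is the group height and 6 is the group width). Default: [1, 1].
+    - **pooled_height** (int) - Height of the output after pooling, of type int32. Default: 1.
+    - **pooled_width** (int) - Width of the output after pooling, of type int32. Default: 1.
+    - **part_size** (list|tuple) - Height and width of the offset, e.g. (4, 6) means height 4 and width 6. They usually equal pooled_height and pooled_width. Default: None, in which case [pooled_height, pooled_width] is used.
+    - **sample_per_part** (int) - Number of samples in each bin. A larger value gives finer sampling at a higher computational cost. Default: 1.
+    - **trans_std** (float) - Offset coefficient that controls the magnitude of the offsets. Default: 0.1.
+    - **position_sensitive** (bool) - Whether to use the deformable position-sensitive ROI (PSROI) pooling mode. If False, the input dimension equals the output dimension. If True, the input dimension equals the output dimension multiplied by pooled_width and pooled_height. Default: False.
+    - **name** (str) - Name of this layer. Default: None.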
+
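+Returns: the output of deformable ROI pooling. If position_sensitive is False, the output dimension equals the input dimension. If position_sensitive is True, the output dimension equals the input dimension divided by pooled_width and pooled_height.
+
+
+Return type: Variable, with data type float32.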

 **代码示例**

 ..  code-block:: python

+    # position_sensitive is False
+
    import paddle.fluid as fluid
    input = fluid.layers.data(name="input",
                              shape=[2, 192, 64, 64],
@@ -56,3 +70,31 @@ deformable_roi_pooling
                                                 trans_std=0.1,
                                                 position_sensitive=False)

+    # position_sensitive is True
+
+    import paddle.fluid as fluid
+    input = fluid.layers.data(name="input",
+                              shape=[2, 192, 64, 64],
+                              dtype='float32',
+                              append_batch_size=False)
+    rois = fluid.layers.data(name="rois",
+                             shape=[4],
+                             dtype='float32',
+                             lod_level=1)
+    trans = fluid.layers.data(name="trans",
+                              shape=[2, 384, 64, 64],
+                              dtype='float32',
+                              append_batch_size=False)
+    x = fluid.layers.nn.deformable_roi_pooling(input=input,
+                                                 rois=rois,
+                                                 trans=trans,
+                                                 no_trans=False,
+                                                 spatial_scale=1.0,
+                                                 group_size=(1, 1),
+                                                 pooled_height=8,
+                                                 pooled_width=8,
+                                                 part_size=(8, 8),
+                                                 sample_per_part=4,
+                                                 trans_std=0.1,
+                                                 position_sensitive=True)
+
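
Steps 2 and 3 of the deformable ROI pooling description (shift the sampling positions by an offset, read non-integer positions with bilinear interpolation, then average the samples in a bin) can be sketched in pure Python. This is a simplified illustration, not the operator's kernel: `bilinear_sample` and `pool_bin` are hypothetical helpers, the sample grid (bin-cell centers) is an assumption, and all positions are assumed to stay inside the feature map.

```python
import math


def bilinear_sample(feature, y, x):
    """Read a 2-D list `feature` at a fractional position (y, x)."""
    h, w = len(feature), len(feature[0])
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    # Clamp the far neighbours so positions on the last row/column work.
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def pool_bin(feature, y_start, x_start, bin_h, bin_w, samples, offset=(0.0, 0.0)):
    """Average samples x samples evenly spaced points of one shifted bin."""
    total = 0.0
    for i in range(samples):
        for j in range(samples):
            # Assumed sampling grid: centres of samples x samples sub-cells.
            y = y_start + offset[0] + (i + 0.5) * bin_h / samples
            x = x_start + offset[1] + (j + 0.5) * bin_w / samples
            total += bilinear_sample(feature, y, x)
    return total / (samples * samples)


feature = [[0.0, 1.0],
           [2.0, 3.0]]
print(bilinear_sample(feature, 0.5, 0.5))              # 1.5
print(pool_bin(feature, 0.0, 0.0, 1.0, 1.0, samples=2))  # 1.5
```

A larger `samples` value reads more points per bin, which mirrors the trade-off noted for `sample_per_part`: finer sampling at a higher cost.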
--- a/doc/fluid/api_cn/layers_cn/density_prior_box_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/density_prior_box_cn.rst
@@ -6,13 +6,7 @@ density_prior_box
 .. py:function:: paddle.fluid.layers.density_prior_box(input, image, densities=None, fixed_sizes=None, fixed_ratios=None, variance=[0.1, 0.1, 0.2, 0.2], clip=False, steps=[0.0, 0.0], offset=0.5, flatten_to_2d=False, name=None)


-**Density Prior Box Operator**
-
-为SSD算法(Single Shot MultiBox Detector)生成density prior box。
-每个 ``input`` 的位置产生N个prior box，其中，N通过 ``densities`` , ``fixed_sizes`` 和 ``fixed_ratios``
-的量来决定。在每个input位置附近的box center格点，通过此op生成。格点坐标由 ``densities`` 决定，
-density prior box的量由 ``fixed_sizes`` 和 ``fixed_ratios`` 决定。显然地，``fixed_sizes``
-和 ``densities`` 相等。对于 ``densities`` 中的densities_i：
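+This OP generates density prior boxes for the SSD (Single Shot MultiBox Detector) algorithm. It produces N prior boxes at each position of ``input`` , where N is determined by ``densities`` , ``fixed_sizes`` and ``fixed_ratios`` . The centers (grid points) of the boxes around each input position are computed from ``densities`` and the number of density prior boxes, which in turn is determined by ``fixed_sizes`` and ``fixed_ratios`` . ``fixed_sizes`` and ``densities`` have the same length.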

 .. math::

@@ -20,40 +14,33 @@ density prior box的量由 ``fixed_sizes`` 和 ``fixed_ratios`` 决定。显然


 参数：
-  - **input** (Variable) - 输入变量，格式为NCHW
-  - **image** (Variable) - PriorBoxOp的输入图像数据，格式为NCHW
-  - **densities** (list|tuple|None) - 被生成的density prior boxes的densities，此属性应该是一个整数列表或数组。默认值为None
-  - **fixed_sizes** (list|tuple|None) - 被生成的density prior boxes的固定大小，此属性应该为和 :attr:`densities` 有同样长度的列表或数组。默认值为None
-  - **fixed_ratios** (list|tuple|None) - 被生成的density prior boxes的固定长度，如果该属性未被设置，同时 :attr:`densities` 和 :attr:`fix_sizes` 被设置，则 :attr:`aspect_ratios` 被用于生成 density prior boxes
+  - **input** (Variable) - A 4-D Tensor in NCHW layout with data type float32 or float64.
+  - **image** (Variable) - The input image, a 4-D Tensor in NCHW layout with data type float32 or float64.
+  - **densities** (list|tuple|None) - Densities of the generated density prior boxes. Must be a list or tuple of integers. Default: None.
+  - **fixed_sizes** (list|tuple|None) - Sizes of the generated density prior boxes. Must be a list or tuple with the same length as :attr:`densities` . Default: None.
+  - **fixed_ratios** (list|tuple|None) - Ratios of the generated density prior boxes. If this attribute is not set while :attr:`densities` and :attr:`fixed_sizes` are set, :attr:`aspect_ratios` is used to generate the density prior boxes.
   - **variance** (list|tuple) - Variances used to encode the density prior boxes. Default: [0.1, 0.1, 0.2, 0.2].
-  - **clip(bool)** - 是否clip超出范围的box。默认值：False
-  - **step** (list|tuple) - Prior boxes在宽度和高度的步长，如果step[0] == 0.0/step[1] == 0.0, input的the density prior boxes的高度/宽度的步长将被自动计算。默认值：Default: [0., 0.]
-  - **offset** (float) - Prior boxes中心补偿值，默认为：0.5
-  - **flatten_to_2d** (bool) - 是否将output prior boxes和方差 ``flatten`` 至2维形状，第二个dim为4。默认值：False
-  - **name(str)** - density prior box op的名字，默认值: None
-
-返回：
-  tuple: 有两个变量的数组 (boxes, variances)
-
-  boxes: PriorBox的输出density prior boxes
-
-    当flatten_to_2d为False时，形式为[H, W, num_priors, 4]
-
-    当flatten_to_2d为True时，形式为[H * W * num_priors, 4]
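+  - **clip** (bool) - Whether to clip boxes that fall out of range. Default: False.
+  - **steps** (list|tuple) - Step sizes of the prior boxes along width and height. If steps[0] or steps[1] is 0.0, the step of the density prior boxes along the height/width of input is computed automatically. Default: [0., 0.].
+  - **offset** (float) - Offset of the prior box centers. Default: 0.5.
+  - **flatten_to_2d** (bool) - Whether to flatten the output prior boxes and variances to 2-D, where the second dimension is 4. Default: False.
+  - **name** (str|None) - Used by developers when printing debug information. See :ref:`api_guide_Name` for details. Default: None.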

-    H是输入的高度，W是输入的宽度

-    num_priors是输入中每个位置的总box count
+Returns: a tuple of two variables:
+  Prior boxes:

-  variances:  PriorBox的expanded variance
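+    When flatten_to_2d is False, a 4-D Tensor of shape [H, W, num_priors, 4].
+    When flatten_to_2d is True, a 2-D Tensor of shape [H * W * num_priors, 4].
+    Here H is the height of the input, W is the width of the input, and num_priors is the number of prior boxes at each position of the input.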

-    当flatten_to_2d为False时，形式为[H, W, num_priors, 4]
+  Variances of the prior boxes:

-    当flatten_to_2d为True时，形式为[H * W * num_priors, 4]
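+    When flatten_to_2d is False, a 4-D Tensor of shape [H, W, num_priors, 4].
+    When flatten_to_2d is True, a 2-D Tensor of shape [H * W * num_priors, 4].
+    Here H is the height of the input, W is the width of the input, and num_priors is the number of prior boxes at each position of the input.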

-    H是输入的高度，W是输入的宽度
-
-    num_priors是输入中每个位置的总box count
+Return type: tuple

 **代码示例**

@@ -70,14 +57,3 @@ density prior box的量由 ``fixed_sizes`` 和 ``fixed_ratios`` 决定。显然
        fixed_ratios=[1.],
        clip=True,
        flatten_to_2d=True)
-
-
-
-
-
-
-
-
-
-
-
--- a/doc/fluid/api_cn/layers_cn/diag_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/diag_cn.rst
@@ -5,14 +5,14 @@ diag

 .. py:function:: paddle.fluid.layers.diag(diagonal)

-该功能创建一个方阵，含有diagonal指定的对角线值。
+This OP creates a square matrix whose diagonal elements are given by the input diagonal.

 参数：
-    - **diagonal** (Variable|numpy.ndarray) - 指定对角线值的输入张量，其秩应为1。
+    - **diagonal** (Variable|numpy.ndarray) - A 1-D Tensor of shape :math:`[N]` whose elements are placed on the diagonal of the square matrix. The data type can be float32, float64, int32 or int64.

-返回：存储着方阵的张量变量
+Returns: a 2-D Tensor of shape :math:`[N, N]` holding the square matrix, whose diagonal values are those of the input Tensor diagonal.

-返回类型：变量（Variable）
+Return type: Variable, with the same data type as the input.

 **代码示例**：

@@ -24,7 +24,9 @@ diag

        import paddle.fluid as fluid
        import numpy as np
-        data = fluid.layers.diag(np.arange(3, 6, dtype='int32'))
+        diagonal = np.arange(3, 6, dtype='int32')
+        data = fluid.layers.diag(diagonal)
+        # diagonal.shape=(3,) data.shape=(3, 3)




--- a/doc/fluid/api_cn/layers_cn/edit_distance_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/edit_distance_cn.rst
@@ -6,9 +6,7 @@ edit_distance

 .. py:function:: paddle.fluid.layers.edit_distance(input,label,normalized=True,ignored_tokens=None, input_length=None, label_length=None)

-编辑距离算子
-
-计算一批给定字符串及其参照字符串间的编辑距离。编辑距离也称Levenshtein距离，通过计算从一个字符串变成另一个字符串所需的最少操作步骤来衡量两个字符串的相异度。这里的操作包括插入、删除和替换。
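+This OP computes the edit distance between a batch of hypothesis strings and their reference strings. The edit distance, also known as the Levenshtein distance, measures how different two strings are by counting the minimum number of operations needed to turn one string into the other. The operations are insertion, deletion and substitution.

 For example, given the hypothesis string A="kitten" and the reference string B="sitting", the edit distance from A to B is 3, since at least two substitutions and one insertion are required: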

@@ -19,14 +17,15 @@ edit_distance
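 The output contains batch_size results, each being the edit distance of one pair of strings. If Attr(normalized) is true, the edit distance is divided by the length of the reference string.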

 参数：
-    - **input** (Variable)-假设字符串的索引，为两列并且类型为int64
-    - **label** (Variable)-参照字符串的索引，为两列并且类型为int64
-    - **normalized** (bool,默认为True)-表示是否用参照字符串的长度进行归一化
-    - **ignored_tokens** (list<int>,默认为None)-计算编辑距离前需要移除的token
-    - **name** (str)-该层名称，可选
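+    - **input** (Variable) - Indices of the hypothesis strings, a Tensor or LoDTensor of rank 2 with data type int64.
+    - **label** (Variable) - Indices of the reference strings, a Tensor or LoDTensor of rank 2 with data type int64.
+    - **normalized** (bool) - Whether to normalize by the length of the reference string. Default: True.
+    - **ignored_tokens** (list<int>) - Tokens to remove before computing the edit distance. Default: None.
+    - **name** (None|str) - Used by developers when printing debug information. See :ref:`api_guide_Name` for details. Default: None.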
+
+Returns: a tuple containing the edit distances, of shape [batch_size, 1], and the number of sequences, of shape [ ].

-返回：形为[batch_size,1]的编辑距离。
-sequence_num(Variable):形为[ ]的序列数
+Return type: tuple

 **代码示例**

@@ -47,4 +46,4 @@ sequence_num(Variable):形为[ ]的序列数
    x_len = fluid.layers.data(name='x_len', shape=[], dtype='int64')
    y_len = fluid.layers.data(name='y_len', shape=[], dtype='int64')
    distance_pad, seq_num_pad = fluid.layers.edit_distance(
-                    input=x_pad, label=y_pad, input_length=x_len, label_length=y_len)
\ No newline at end of file
+                    input=x_pad, label=y_pad, input_length=x_len, label_length=y_len)
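
The "kitten" to "sitting" example in the edit_distance page can be checked with a short dynamic-programming sketch. It is plain Python on strings, not the Paddle operator (which works on batches of int64 index tensors); `edit_distance` here is a hypothetical stand-alone function, and the `normalized` flag simply divides by the reference length as the description says.

```python
def edit_distance(hyp, ref, normalized=False):
    """Levenshtein distance between two sequences (illustrative helper).

    Counts the minimum number of insertions, deletions and substitutions
    needed to turn `hyp` into `ref`.
    """
    # prev[j] holds the distance between the processed prefix of hyp and ref[:j].
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            cur.append(min(prev[j] + 1,          # delete h
                           cur[j - 1] + 1,       # insert r
                           prev[j - 1] + cost))  # substitute or match
        prev = cur
    dist = prev[-1]
    # Normalisation assumes a non-empty reference.
    return dist / len(ref) if normalized else dist


print(edit_distance("kitten", "sitting"))                   # 3
print(edit_distance("kitten", "sitting", normalized=True))  # 3 / 7
```

The result 3 corresponds to the two substitutions and one insertion mentioned in the page.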
--- a/doc/fluid/api_cn/layers_cn/expand_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/expand_cn.rst
@@ -27,7 +27,7 @@ expand运算会按给定的次数对输入各维度进行复制（tile）运算

 参数:
        - **x** (Variable)- 一个秩在[1, 6]范围中的张量（Tensor）.
-        - **expand_times** (list|tuple) - 每一个维度要扩展的次数.
+        - **expand_times** (list|tuple|Variable) - Number of times to expand each dimension.

 返回：     expand变量是LoDTensor。expand运算后，输出（Out）的每个维度的大小等于输入（X）的相应维度的大小乘以 ``expand_times`` 给出的相应值。

@@ -38,10 +38,17 @@ expand运算会按给定的次数对输入各维度进行复制（tile）运算
 ..  code-block:: python

        import paddle.fluid as fluid
-        x = fluid.layers.fill_constant(shape=[2, 3, 1], dtype='int32', value=0)
-        out = fluid.layers.expand(x=x, expand_times=[1, 2, 2])

+        # example 1:
+        data_1 = fluid.layers.fill_constant(shape=[2, 3, 1], dtype='int32', value=0)
+        expanded_1 = fluid.layers.expand(data_1, expand_times=[1, 2, 2])
+        # the shape of expanded_1 is [2, 6, 2].

+        # example 2:
+        data_2 = fluid.layers.fill_constant(shape=[12, 14], dtype="int32", value=3)
+        expand_times = fluid.layers.fill_constant(shape=[2], dtype="int32", value=4)
+        expanded_2 = fluid.layers.expand(data_2, expand_times=expand_times)
+        # the shape of expanded_2 is [48, 56].




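
The shapes quoted in the two expand examples above follow the rule that each output dimension equals the input dimension multiplied by the matching entry of `expand_times`. NumPy's `np.tile` follows the same rule when the repetition list has one entry per dimension, so it can serve as a quick cross-check; this is an analogy for the shape arithmetic only, not a statement about how the Paddle operator is implemented.

```python
import numpy as np

# Example 1: shape [2, 3, 1] expanded by [1, 2, 2].
data_1 = np.zeros((2, 3, 1), dtype="int32")
expanded_1 = np.tile(data_1, (1, 2, 2))
print(expanded_1.shape)  # (2, 6, 2)

# Example 2: shape [12, 14] expanded 4 times along both dimensions.
data_2 = np.full((12, 14), 3, dtype="int32")
expanded_2 = np.tile(data_2, (4, 4))
print(expanded_2.shape)  # (48, 56)
```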
--- a/doc/fluid/api_cn/layers_cn/fc_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/fc_cn.rst
@@ -3,14 +3,14 @@
 fc
 -------------------------------

-.. py:function::  paddle.fluid.layers.fc(input, size, num_flatten_dims=1, param_attr=None, bias_attr=None, act=None, is_test=False, name=None)
+.. py:function::  paddle.fluid.layers.fc(input, size, num_flatten_dims=1, param_attr=None, bias_attr=None, act=None, name=None)


 **全连接层**

-该函数在神经网络中建立一个全连接层。 它可以将一个或多个tensor（ ``input`` 可以是一个list或者Variable，详见参数说明）作为自己的输入，并为每个输入的tensor创立一个变量，称为“权”（weights），等价于一个从每个输入单元到每个输出单元的全连接权矩阵。FC层用每个tensor和它对应的权相乘得到形状为[M, size]输出tensor，M是批大小。如果有多个输入tensor，那么形状为[M, size]的多个输出张量的结果将会被加起来。如果 ``bias_attr`` 非空，则会新创建一个偏向变量（bias variable），并把它加入到输出结果的运算中。最后，如果 ``act`` 非空，它也会加入最终输出的计算中。
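+This OP builds a fully connected layer in a neural network. Its input can be a single Tensor (or LoDTensor) or a list of Tensors (or LoDTensors); see the parameter description for details. For each input Tensor the OP creates a weight variable, that is, a fully connected weight matrix from every input unit to every output unit. The FC layer multiplies each input Tensor by its weight to produce an output Tensor of shape :math:`[M, size]` , where ``M`` is the batch size. With multiple input Tensors, the resulting Tensors of shape :math:`[M, size]` are summed to form the final output. If ``bias_attr`` is not None, a bias variable is created and added to the output. If ``act`` is not None, the corresponding activation function is applied to the output.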

-当输入为单个张量：
+When the input is a single Tensor (or LoDTensor):

 .. math::

@@ -18,7 +18,7 @@ fc



-当输入为多个张量：
+When the input is a list of Tensors (or LoDTensors):

 .. math::

@@ -26,46 +26,59 @@ fc


 In the equations above:
-  - :math:`N` ：输入的数目,如果输入是变量列表，N等于len（input）
-  - :math:`X_i` ：第i个输入的tensor
+  - :math:`N` : the number of inputs. If the input is a list of Tensors, N equals len(input)
+  - :math:`X_i` : the i-th input Tensor
   - :math:`W_i` : the i-th weight matrix, corresponding to the i-th input Tensor
-  - :math:`b` ：该层创立的bias参数
-  - :math:`Act` ：activation function(激励函数)
-  - :math:`Out` ：输出tensor
+  - :math:`b` : the bias parameter created by this layer
+  - :math:`Act` : the activation function
+  - :math:`Out` : the output Tensor

 ::
+            
+        Case 1:
+            Given a single input Tensor data_1 and num_flatten_dims = 2:
+                data_1.data = [[[0.1, 0.2],
+                               [0.3, 0.4]]]
+                data_1.shape = (1, 2, 2) # 1 is batch_size
+
+                out = fluid.layers.fc(input=data_1, size=1, num_flatten_dims=2)
+
+            Then the output is:
+                out.data = [[[0.83234344], [0.34936576]]]
+                out.shape = (1, 2, 1)
+

-            Given:
+        Case 2:
+            Given a list of Tensors:
                data_1.data = [[[0.1, 0.2],
                               [0.3, 0.4]]]
-                data_1.shape = (1, 2, 2) # 1 is batch_size
+                data_1.shape = (1, 2, 2) # 1 is batch_size

                 data_2.data = [[[0.1, 0.2, 0.3]]]
                data_2.shape = (1, 1, 3)

                out = fluid.layers.fc(input=[data_1, data_2], size=2)

-            Then:
+            Then the output is:
                out.data = [[0.18669507, 0.1893476]]
                out.shape = (1, 2)


 参数:
-  - **input** (Variable|list of Variable) – 该层的输入tensor(s)（张量），其维度至少是2
-  - **size** (int) – 该层输出单元的数目
-  - **num_flatten_dims** (int, default 1) – fc层可以接受一个维度大于2的tensor。此时， 它首先会被扁平化(flattened)为一个二维矩阵。 参数``num_flatten_dims`` 决定了输入tensor的flattened方式: 前 ``num_flatten_dims`` (包含边界，从1开始数) 个维度会被扁平化为最终矩阵的第一维 (维度即为矩阵的高), 剩下的 rank(X) - num_flatten_dims 维被扁平化为最终矩阵的第二维 (即矩阵的宽)。 例如， 假设X是一个五维tensor，其形可描述为(2, 3, 4, 5, 6), 且num_flatten_dims = 3。那么扁平化的矩阵形状将会如此： (2 x 3 x 4, 5 x 6) = (24, 30)
-  - **param_attr** (ParamAttr|list of ParamAttr, default None) – 该层可学习的参数/权的参数属性
-  - **bias_attr** (ParamAttr|list of ParamAttr, default None) – 该层bias变量的参数属性。如果值为False， 则bias变量不参与输出单元运算。 如果值为None，bias变量被初始化为0。默认为 None。
-  - **act** (str, default None) – 应用于输出的Activation（激励函数）
-  - **is_test** (bool) – 表明当前执行是否处于测试阶段的标志
-  - **name** (str, default None) – 该层的命名
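+  - **input** (Variable|list of Variable) – A multi-dimensional Tensor (or LoDTensor) of shape :math:`[N_1, N_2, ..., N_k]` , or a list of such Tensors (or LoDTensors). Each input Tensor must have at least 2 dimensions.
+  - **size** (int) – The number of output units of the fully connected layer, that is, the feature dimension of the output Tensor (or LoDTensor).
+  - **num_flatten_dims** (int) – The input may be a Tensor with more than 2 dimensions. During computation the input is first flattened into a 2-D matrix and then multiplied by the weights. ``num_flatten_dims`` determines how the input Tensor is flattened: the first ``num_flatten_dims`` dimensions (inclusive, counting from 1) are flattened into the first dimension of the 2-D matrix (its height), and the remaining :math:`rank(X) - num\_flatten\_dims` dimensions are flattened into the second dimension (its width). For example, if X is a 5-D Tensor of shape (2, 3, 4, 5, 6) and :math:`num\_flatten\_dims = 3` , the flattened matrix has shape :math:`(2 \times 3 \times 4, 5 \times 6) = (24, 30)` and the final output Tensor has shape :math:`(2, 3, 4, size)` . Default: 1.
+  - **param_attr** (ParamAttr) – The object that specifies the weight parameter attributes. Default: None, which means the default weight parameter attributes are used. See :ref:`cn_api_fluid_ParamAttr` for usage.
+  - **bias_attr** (ParamAttr) – The object that specifies the bias parameter attributes. Default: None, which means the default bias parameter attributes are used. See :ref:`cn_api_fluid_ParamAttr` for usage.
+  - **act** (str) – The activation function applied to the output, such as tanh, softmax, sigmoid or relu. For the supported list see :ref:`api_guide_activations` . Default: None.
+  - **name** (str) – Used by developers when printing debug information. See :ref:`api_guide_Name` for usage. Default: None.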


-返回：转换结果
+Returns: the Tensor or LoDTensor computed by the fully connected layer, with the same data type as input.

 返回类型: Variable

-弹出异常：``ValueError`` - 如果输入tensor的维度小于2
+Raises: ``ValueError`` - if the input Tensor (or LoDTensor) has fewer than 2 dimensions
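The flattening rule controlled by ``num_flatten_dims`` can be sketched in plain Python. This is only a shape calculation under the rule described in the parameter list; the helper name ``fc_shapes`` is hypothetical and is not part of the Paddle API.

```python
from functools import reduce
from operator import mul


def fc_shapes(input_shape, num_flatten_dims, size):
    """Return (flattened 2-D shape, output shape) for an fc-style layer.

    Hypothetical helper for illustration; it does not call Paddle.
    """
    rank = len(input_shape)
    if rank < 2:
        raise ValueError("input must have at least 2 dimensions")
    if not 1 <= num_flatten_dims < rank:
        raise ValueError("num_flatten_dims must be in [1, rank)")
    lead = tuple(input_shape[:num_flatten_dims])   # folded into the matrix height
    tail = tuple(input_shape[num_flatten_dims:])   # folded into the matrix width
    height = reduce(mul, lead, 1)
    width = reduce(mul, tail, 1)
    # The leading dimensions are kept and `size` replaces the folded tail.
    return (height, width), lead + (size,)


flat, out = fc_shapes((2, 3, 4, 5, 6), num_flatten_dims=3, size=10)
print(flat, out)  # (24, 30) (2, 3, 4, 10)
```

With the shapes from Case 1, ``fc_shapes((1, 2, 2), 2, 1)`` gives a flattened shape of ``(2, 2)`` and an output shape of ``(1, 2, 1)``.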

 **代码示例**


--- a/doc/fluid/api_cn/layers_cn/greater_equal_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/greater_equal_cn.rst
@@ -5,25 +5,30 @@ greater_equal

 .. py:function:: paddle.fluid.layers.greater_equal(x, y, cond=None)

-该层逐元素地返回 :math:`x >= y` 的逻辑值，和重载算子 `>=` 相同。
+This OP returns the element-wise truth value of :math:`x >= y` . The overloaded operator `>=` gives the same result.
+

 参数：
-    - **x** (Variable) - *greater_equal* 的第一个操作数
-    - **y** (Variable) - *greater_equal* 的第二个操作数
-    - **cond** (Variable|None) - 可选的输出变量，存储 *greater_equal* 的结果
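+    - **x** (Variable) – The first input of the comparison, a multi-dimensional Tensor. The data type can be float32, float64, int32 or int64.
+    - **y** (Variable) – The second input of the comparison, a multi-dimensional Tensor. The data type can be float32, float64, int32 or int64.
+    - **cond** (Variable, optional) – If None, a new Tensor is created to hold the comparison result. If not None, the given Tensor is used as the output of this OP. In both cases the output has the same shape as input x and, as stated under the return type, holds bool values. Default: None.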

-返回：存储 *greater_equal* 的输出的张量变量。
+Returns: the output Tensor, with the same shape as input x.

-返回类型：变量（Variable）
+Return type: Variable, with data type bool.

 **代码示例**:

 .. code-block:: python

     import paddle.fluid as fluid
-     label = fluid.layers.data(name='label', shape=[1], dtype='int64')
-     limit = fluid.layers.fill_constant(shape=[1], value=1, dtype='int64')
-     out = fluid.layers.greater_equal(x=label, y=limit)
+     import paddle.fluid.layers as layers
+     import numpy as np
+     label = layers.assign(np.array([2, 2], dtype='int32'))
+     limit = layers.assign(np.array([2, 3], dtype='int32'))
+     out = fluid.layers.greater_equal(x=label, y=limit) #out=[True, False]
+     out1 = label >= limit #out1=[True, False]
+
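The element-wise semantics shared by ``greater_equal``, ``greater_than``, ``less_equal`` and ``not_equal`` can be sketched without Paddle. The helper ``compare`` below is hypothetical and works on flat Python lists; it only mirrors the documented behaviour that the output has the same shape as ``x`` and holds bool values.

```python
import operator

# Map each documented OP name to the Python operator it corresponds to.
COMPARISONS = {
    "greater_equal": operator.ge,
    "greater_than": operator.gt,
    "less_equal": operator.le,
    "not_equal": operator.ne,
}


def compare(op_name, x, y):
    """Apply a comparison element by element to two equal-length lists."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    op = COMPARISONS[op_name]
    return [op(a, b) for a, b in zip(x, y)]


print(compare("greater_equal", [2, 2], [2, 3]))  # [True, False]
print(compare("greater_than", [2, 3], [3, 2]))   # [False, True]
print(compare("less_equal", [1, 3], [1, 2]))     # [True, False]
print(compare("not_equal", [2, 3], [3, 2]))      # [True, True]
```

The inputs are the same values used in the code examples of the four OPs, so the printed results match the comments in those examples.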



--- a/doc/fluid/api_cn/layers_cn/greater_than_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/greater_than_cn.rst
@@ -5,25 +5,28 @@ greater_than

 .. py:function:: paddle.fluid.layers.greater_than(x, y, cond=None)

-该层逐元素地返回 :math:`x > y` 的逻辑值，和重载算子 `>` 相同。
+This OP returns the element-wise truth value of :math:`x > y` . The overloaded operator `>` gives the same result.

 参数：
-    - **x** (Variable) - *greater_than* 的第一个操作数
-    - **y** (Variable) - *greater_than* 的第二个操作数
-    - **cond** (Variable|None) - 可选的输出变量，存储 *greater_than* 的结果
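+    - **x** (Variable) – The first input of the comparison, a multi-dimensional Tensor. The data type can be float32, float64, int32 or int64.
+    - **y** (Variable) – The second input of the comparison, a multi-dimensional Tensor. The data type can be float32, float64, int32 or int64.
+    - **cond** (Variable, optional) – If None, a new Tensor is created to hold the comparison result. If not None, the given Tensor is used as the output of this OP. In both cases the output has the same shape as input x and, as stated under the return type, holds bool values. Default: None.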

-返回：存储 *greater_than* 的输出的张量变量。
+Returns: the output Tensor, with the same shape as input x.

-返回类型：变量（Variable）
+Return type: Variable, with data type bool.

 **代码示例**:

 .. code-block:: python

     import paddle.fluid as fluid
-     label = fluid.layers.data(name='label', shape=[1], dtype='int64')
-     limit = fluid.layers.fill_constant(shape=[1], value=1, dtype='int64')
-     out = fluid.layers.greater_than(x=label, y=limit)
+     import paddle.fluid.layers as layers
+     import numpy as np
+     label = layers.assign(np.array([2, 3], dtype='int32'))
+     limit = layers.assign(np.array([3, 2], dtype='int32'))
+     out = fluid.layers.greater_than(x=label, y=limit) #out=[False, True]
+     out1 = label > limit #out1=[False, True]




--- a/doc/fluid/api_cn/layers_cn/group_norm_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/group_norm_cn.rst
@@ -8,16 +8,18 @@ group_norm
 参考论文： `Group Normalization <https://arxiv.org/abs/1803.08494>`_

 参数：
-  - **input** (Variable)：输入张量变量
-  - **groups** (int)：从 channel 中分离出来的 group 的数目
-  - **epsilon** (float)：为防止方差除零，增加一个很小的值
-  - **param_attr** (ParamAttr|None)：可学习标度的参数属性 :math:`g`,如果设置为False，则不会向输出单元添加标度。如果设置为0，偏差初始化为1。默认值:None
-  - **bias_attr** (ParamAttr|None)：可学习偏置的参数属性 :math:`b ` , 如果设置为False，则不会向输出单元添加偏置量。如果设置为零，偏置初始化为零。默认值:None。
-  - **act** (str):将激活应用于输出的 group normalizaiton
-  - **data_layout** (string|NCHW): 只支持NCHW。
-  - **name** (str):这一层的名称（可选）
-
-返回： Variable: 一个张量变量，它是对输入进行 group normalization 后的结果。
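+  - **input** (Variable): A 4-D Tensor with data type float32 or float64.
+  - **groups** (int): The number of groups the channels are divided into.
+  - **epsilon** (float): A small value added to the variance to avoid division by zero.
+  - **param_attr** (ParamAttr, optional): The parameter attribute of the learnable scale :math:`g` . If set to False, no scale is added to the output units. If set to None, the scale is initialized to 1. Default: None.
+  - **bias_attr** (ParamAttr, optional): The parameter attribute of the learnable bias :math:`b` . If set to False, no bias is added to the output units. If set to None, the bias is initialized to 0. Default: None.
+  - **act** (str, optional): The activation function applied to the output of group normalization.
+  - **data_layout** (str, optional): The data format, either NCHW (num_batches, channels, height, width) or NHWC (num_batches, height, width, channels). Default: 'NCHW'.
+  - **name** (str, optional): Used by developers when printing debug information. See :ref:`api_guide_Name` for usage. Default: None.
+
+Returns: the result of group normalization, with the same data type and format as input.
+
+Return type: Variable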

 **代码示例：**


--- a/doc/fluid/api_cn/layers_cn/image_resize_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/image_resize_cn.rst
@@ -5,11 +5,13 @@ image_resize

 .. py:function:: paddle.fluid.layers.image_resize(input, out_shape=None, scale=None, name=None, resample='BILINEAR', actual_shape=None, align_corners=True, align_mode=1)

-调整一个batch中图片的大小。
+**Note:** The parameter ``actual_shape`` will be deprecated; use ``out_shape`` instead.

-输入张量的shape为(num_batches, channels, in_h, in_w)或者(num_batches, channels, in_d, in_h, in_w)，并且调整大小只适用于最后两、三个维度(深度，高度和宽度)。
+This OP resizes the images in a batch.

-支持重新取样方法:
+The input Tensor has shape (num_batches, channels, in_h, in_w) or (num_batches, channels, in_d, in_h, in_w), and resizing applies only to the last two or three dimensions (depth, height and width).
+
+Supported interpolation methods:

    BILINEAR：双线性插值

@@ -105,25 +107,25 @@ Align_corners和align_mode是可选参数，插值的计算方法可以由它们


 有关最近邻插值的详细信息，请参阅维基百科：
-https://en.wikipedia.org/wiki/Nearest-neighbor_interpolation。
+https://en.wikipedia.org/wiki/Nearest-neighbor_interpolation

 有关双线性插值的详细信息，请参阅维基百科：
-https://en.wikipedia.org/wiki/Bilinear_interpolation。
+https://en.wikipedia.org/wiki/Bilinear_interpolation

 有关三线插值的详细信息，请参阅维基百科：
-https://en.wikipedia.org/wiki/Trilinear_interpolation。
+https://en.wikipedia.org/wiki/Trilinear_interpolation

 参数:
-    - **input** (Variable) - 图片调整层的输入张量，这是一个shape=4的张量(num_batches, channels, in_h, in_w)或者5维张量(num_batches, channels, in_d, in_h, in_w)。
-    - **out_shape** (list|tuple|Variable|None) - 图片调整层的输出，输入为4D张量时shape为(out_h, out_w)。输入为5D张量时shape为(out_d, out_h, out_w)，默认值:None
-    - **scale** (float|None)-输入的高度或宽度的乘数因子 。 out_shape和scale至少要设置一个。out_shape的优先级高于scale。默认值:None
-    - **name** (str|None) - 该层的名称(可选)。如果设置为None，该层将被自动命名
-    - **resample** (str) - 重采样方法。支持“双线性”,“三线性”,“临近插值”,。默认值:双线性插值
-    - **actual_shape** (Variable) - 可选输入，用于动态指定输出形状。如果指定actual_shape，图像将根据给定的形状调整大小，而不是根据指定形状的 :code:`out_shape` 和 :code:`scale` 进行调整。也就是说， :code:`actual_shape` 具有最高的优先级。如果希望动态指定输出形状，建议使用 :code:`actual_shape` 而不是 :code:`out_shape` 。在使用actual_shape指定输出形状时，还需要设置out_shape和scale之一，否则在图形构建阶段会出现错误。默认值:None
-    - **align_corners** （bool）- 一个可选的bool型参数，如果为True，则将输入和输出张量的4个角落像素的中心对齐，并保留角点像素的值。 默认值：True
-    - **align_mode** （int）- 双线性插值的可选项。 可以是 '0' 代表src_idx = scale *（dst_indx + 0.5）-0.5；可以为'1' ，代表src_idx = scale * dst_index。
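+    - **input** (Variable) - A 4-D Tensor of shape (num_batches, channels, in_h, in_w) or a 5-D Tensor of shape (num_batches, channels, in_d, in_h, in_w).
+    - **out_shape** (list|tuple|Variable|None) - The output shape. When the input is a 4-D Tensor, it has the form (out_h, out_w); when the input is a 5-D Tensor, it has the form (out_d, out_h, out_w). If :code:`out_shape` is a list, each element can be an integer or a Variable of shape [1]. If :code:`out_shape` is a Variable, it must be 1-D. Default: None.
+    - **scale** (float|Variable|None) - The multiplier applied to the input height or width. At least one of out_shape and scale must be set, and out_shape takes priority over scale. Default: None.
+    - **name** (str|None) - Used by developers when printing debug information. See :ref:`api_guide_Name` for usage. Default: None.
+    - **resample** (str) - The interpolation method. 'BILINEAR', 'TRILINEAR' and 'NEAREST' are supported. Default: 'BILINEAR'.
+    - **actual_shape** (Variable) - An optional input that specifies the output shape dynamically. If actual_shape is given, the image is resized to that shape rather than to the shape given by :code:`out_shape` and :code:`scale` ; that is, :code:`actual_shape` has the highest priority. To specify the output shape dynamically, :code:`out_shape` is recommended, because :code:`actual_shape` will be deprecated. When actual_shape is used, one of out_shape and scale must still be set, otherwise an error occurs during graph construction. Default: None.
+    - **align_corners** (bool) - An optional bool. If True, the centers of the 4 corner pixels of the input and output Tensors are aligned, preserving the values at the corner pixels. Default: True.
+    - **align_mode** (int) - An option for bilinear interpolation. 0 means src_idx = scale * (dst_index + 0.5) - 0.5; 1 means src_idx = scale * dst_index. Default: 1.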

-返回： 4维tensor，shape为 (num_batches, channls, out_h, out_w).或者5维tensor，shape为 (num_batches, channls, out_d, out_h, out_w).
+Returns: a 4-D Tensor of shape [num_batches, channels, out_h, out_w], or a 5-D Tensor of shape [num_batches, channels, out_d, out_h, out_w].

 返回类型: 变量（variable）
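The two index mappings selected by ``align_mode`` can be written out directly. The sketch below is not Paddle code: ``src_index`` is a hypothetical helper, and ``scale`` is assumed here to be the ratio of input size to output size along one axis.

```python
def src_index(dst_index, scale, align_mode):
    """Map an output pixel index to a (fractional) input pixel index.

    `scale` is assumed to be in_size / out_size for the axis in question.
    """
    if align_mode == 0:
        # Pixel centres are aligned: shift by half a pixel before scaling.
        return scale * (dst_index + 0.5) - 0.5
    if align_mode == 1:
        # Pixel origins are aligned: plain scaling.
        return scale * dst_index
    raise ValueError("align_mode must be 0 or 1")


# Resizing an axis of length 6 to length 12 gives scale = 6 / 12 = 0.5.
print(src_index(3, 0.5, align_mode=0))  # 1.25
print(src_index(3, 0.5, align_mode=1))  # 1.5
```

The fractional result is what the interpolation then uses to weight the neighbouring input pixels.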

@@ -145,9 +147,31 @@ https://en.wikipedia.org/wiki/Trilinear_interpolation。

  import paddle.fluid as fluid
  input = fluid.layers.data(name="input", shape=[3,6,9], dtype="float32")
-  out = fluid.layers.image_resize(input, out_shape=[12, 12], resample="NEAREST")
-
+  # input.shape = [-1, 3, 6, 9], where -1 indicates batch size, and it will get the exact value in runtime.

+  out = fluid.layers.image_resize(input, out_shape=[12, 12], resample="NEAREST")
+  out0 = fluid.layers.image_resize(input, out_shape=[12, 12], resample="NEAREST")
+  # out0.shape = [-1, 3, 12, 12], it means out0.shape[0] = input.shape[0] in runtime.
+
+  # out_shape is a list in which each element is an integer or a tensor Variable
+  dim1 = fluid.layers.data(name="dim1", shape=[1], dtype="int32", append_batch_size=False)
+  out1 = fluid.layers.image_resize(input, out_shape=[12, dim1], resample="NEAREST")
+  # out1.shape = [-1, 3, 12, -1]
+
+  # out_shape is a 1-D tensor Variable
+  shape_tensor = fluid.layers.data(name="shape_tensor", shape=[2], dtype="int32", append_batch_size=False)
+  out2 = fluid.layers.image_resize(input, out_shape=shape_tensor, resample="NEAREST")
+  # out2.shape = [-1, 3, -1, -1]
+
+  # when using actual_shape
+  actual_shape_tensor = fluid.layers.data(name="actual_shape_tensor", shape=[2], dtype="int32", append_batch_size=False)
+  out3 = fluid.layers.image_resize(input, out_shape=[4, 4], resample="NEAREST", actual_shape=actual_shape_tensor)
+  # out3.shape = [-1, 3, 4, 4]
+
+  # scale is a Variable
+  scale_tensor = fluid.layers.data(name="scale", shape=[1], dtype="float32", append_batch_size=False)
+  out4 = fluid.layers.image_resize(input, scale=scale_tensor)
+  # out4.shape = [-1, 3, -1, -1]




--- a/doc/fluid/api_cn/layers_cn/image_resize_short_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/image_resize_short_cn.rst
@@ -5,10 +5,10 @@ image_resize_short

 .. py:function:: paddle.fluid.layers.image_resize_short(input, out_short_len, resample='BILINEAR')

-调整一批图片的大小。输入图像的短边将被调整为给定的out_short_len 。输入图像的长边按比例调整大小，最终图像的长宽比保持不变。
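+This OP resizes a batch of images. The short edge of each input image is resized to the given out_short_len, and the long edge is scaled proportionally so that the aspect ratio is preserved.

 Parameters:
-        - **input** (Variable) -  图像调整图层的输入张量，这是一个4维的形状张量(num_batch, channels, in_h, in_w)。
+        - **input** (Variable) - The input Tensor of the resize layer, a 4-D Tensor of shape [num_batch, channels, in_h, in_w].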
        - **out_short_len** (int) -  输出图像的短边长度。
        - **resample** (str) - resample方法，默认为双线性插值。


--- a/doc/fluid/api_cn/layers_cn/l2_normalize_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/l2_normalize_cn.rst
@@ -5,25 +5,24 @@ l2_normalize

 .. py:function:: paddle.fluid.layers.l2_normalize(x,axis,epsilon=1e-12,name=None)

-L2正则（L2 normalize Layer）
-
-该层用欧几里得距离之和对维轴的x归一化。对于1-D张量（系数矩阵的维度固定为0），该层计算公式如下：
+This OP L2-normalizes x along the dimension ``axis`` . For a 1-D Tensor (in which case ``axis`` is fixed to 0),
+the formula is:

 .. math::

-    y=\frac{x}{\sqrt{\sum x^{2}+epsion}}
+    y=\frac{x}{\sqrt{\sum x^{2}+epsilon}}

-对于x多维的情况，该函数分别对维度轴上的每个1-D切片单独归一化
+When the input is a multi-dimensional Tensor, this OP normalizes each 1-D slice along ``axis`` independently.

 参数：
-    - **x** (Variable|list)- l2正则层（l2_normalize layer）的输入
-    - **axis** (int)-运用归一化的轴。如果轴小于0，归一化的维是rank(X)+axis。-1是最后维
-    - **epsilon** (float)-epsilon用于避免分母为0，默认值为1e-12
-    - **name** (str|None)-该层名称（可选）。如果设为空，则自动为该层命名
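+    - **x** (Variable) - A multi-dimensional Tensor of shape :math:`[N_1, N_2, ..., N_k, D]` . The data type is float32 or float64.
+    - **axis** (int) - The axis along which to normalize. If axis is less than 0, the normalized dimension is rank(X)+axis; -1 denotes the last dimension.
+    - **epsilon** (float) - A small value used to avoid division by zero. Default: 1e-12.
+    - **name** (str|None) - Used by developers when printing debug information. See :ref:`api_guide_Name` for usage. Default: None.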

-    返回：输出张量，同x的维度一致
+    Returns: a Tensor with the same dimensions as input x

-    返回类型：变量
+    Return type: Variable
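The 1-D formula above, and its per-slice application to a 2-D input along the last axis, can be sketched in plain Python. The helper names are hypothetical and the code does not call Paddle.

```python
import math


def l2_normalize_1d(x, epsilon=1e-12):
    """y = x / sqrt(sum(x^2) + epsilon) for a flat list of floats."""
    denom = math.sqrt(sum(v * v for v in x) + epsilon)
    return [v / denom for v in x]


def l2_normalize_last_axis(matrix, epsilon=1e-12):
    """Normalize every row of a 2-D nested list independently (axis=-1)."""
    return [l2_normalize_1d(row, epsilon) for row in matrix]


y = l2_normalize_1d([3.0, 4.0])
print([round(v, 6) for v in y])  # [0.6, 0.8]

rows = l2_normalize_last_axis([[3.0, 4.0], [0.0, 2.0]])
print([[round(v, 6) for v in row] for row in rows])  # [[0.6, 0.8], [0.0, 1.0]]
```

Because ``epsilon`` sits inside the square root, an all-zero slice is mapped to zeros instead of raising a division error.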

 **代码示例**：


--- a/doc/fluid/api_cn/layers_cn/less_equal_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/less_equal_cn.rst
@@ -5,25 +5,28 @@ less_equal

 .. py:function:: paddle.fluid.layers.less_equal(x, y, cond=None)

-该层逐元素地返回 :math:`x <= y` 的逻辑值，和重载算子 `<=` 相同。
+This OP returns the element-wise truth value of :math:`x <= y` . The overloaded operator `<=` gives the same result.

 参数：
-    - **x** (Variable) - *less_equal* 的第一个操作数
-    - **y** (Variable) - *less_equal* 的第二个操作数
-    - **cond** (Variable|None) - 可选的输出变量，存储 *less_equal* 的结果
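+    - **x** (Variable) – The first input of the comparison, a multi-dimensional Tensor. The data type can be float32, float64, int32 or int64.
+    - **y** (Variable) – The second input of the comparison, a multi-dimensional Tensor. The data type can be float32, float64, int32 or int64.
+    - **cond** (Variable, optional) – If None, a new Tensor is created to hold the comparison result. If not None, the given Tensor is used as the output of this OP. In both cases the output has the same shape as input x and, as stated under the return type, holds bool values. Default: None.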

-返回：存储 *less_equal* 的输出的张量变量。
+Returns: the output Tensor, with the same shape as input x.

-返回类型：变量（Variable）
+Return type: Variable, with data type bool.

 **代码示例**:

 .. code-block:: python

     import paddle.fluid as fluid
-     label = fluid.layers.data(name='label', shape=[1], dtype='int64')
-     limit = fluid.layers.fill_constant(shape=[1], value=1, dtype='int64')
-     out = fluid.layers.less_equal(x=label, y=limit)
+     import paddle.fluid.layers as layers
+     import numpy as np
+     label = layers.assign(np.array([1, 3], dtype='int32'))
+     limit = layers.assign(np.array([1, 2], dtype='int32'))
+     out = fluid.layers.less_equal(x=label, y=limit) #out=[True, False]
+     out1 = label <= limit #out1=[True, False]




--- a/doc/fluid/api_cn/layers_cn/linear_chain_crf_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/linear_chain_crf_cn.rst
@@ -3,7 +3,7 @@
 linear_chain_crf
 -------------------------------

-.. py:function:: paddle.fluid.layers.linear_chain_crf(input, label, param_attr=None)
+.. py:function:: paddle.fluid.layers.linear_chain_crf(input, label, param_attr=None, length=None)

 线性链条件随机场（Linear Chain CRF）

@@ -23,7 +23,7 @@ linear_chain_crf

 其中Z是归一化值，所有可能序列的P(s)之和为1，x是线性链条件随机场（linear chain CRF）的发射（emission）特征权重。

-线性链条件随机场最终输出mini-batch每个训练样本的条件概率的对数
+The linear chain CRF finally outputs the logarithm of the conditional probability of each training sample in the batch.


  1.这里 :math:`x` 代表Emission
@@ -48,41 +48,89 @@ linear_chain_crf
    3.Emission的第二维度必须和标记数字（tag number）相同。

 参数：
-    - **input** (Variable，LoDTensor，默认float类型LoDTensor) - 一个二维LoDTensor，shape为[N*D]，N是mini-batch的大小，D是总标记数。线性链条件随机场的未缩放发射权重矩阵
-    - **input** (Tensor，默认float类型LoDTensor) - 一个二维张量，shape为[(D+2)*D]。linear_chain_crf操作符的可学习参数。更多详情见operator注释
-    - **label** (Variable，LoDTensor，默认int64类型LoDTensor） - shape为[N*10的LoDTensor，N是mini-batch的总元素数
-    - **param_attr** (ParamAttr) - 可学习参数的属性
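+    - **input** (LoDTensor|Tensor) - A Tensor or LoDTensor with data type float32 or float64, holding the emission matrix of the linear chain CRF. When the input is a LoDTensor, it is a 2-D LoDTensor of shape [N x D], where N is the sum of the lengths of all sequences in the batch and D is the dimension. When the input is a Tensor, it has shape [N x S x D], where N is the batch size, S is the maximum sequence length and D is the dimension.
+    - **label** (Tensor|LoDTensor) - A Tensor or LoDTensor with data type int64, holding the label values. When the input is a LoDTensor, its shape is [N x 1], where N is the total number of elements in the mini-batch. When the input is a Tensor, its shape is [N x S], where N is the batch size and S is the maximum sequence length.
+    - **length** (Tensor) - A Tensor with data type int64 and shape [M x 1], where M is the number of sequences in the mini-batch.
+    - **param_attr** (ParamAttr) - The attribute of the learnable parameter, which is the transition matrix. See the code example for details.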

 返回：
-    output(Variable，Tensor，默认float类型Tensor)：shape为[N*D]的二维张量。Emission的指数。这是前向计算中的中间计算结果，在后向计算中还会复用
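+    The exponential of Emission, with the same shape as Emission. It is an intermediate result of the forward computation and is reused in the backward computation.

-    output(Variable，Tensor，默认float类型Tensor)：shape为[(D+2)*D]的二维张量。Transition的指数。这是前向计算中的中间计算结果，在后向计算中还会复用
+    The exponential of Transition, a 2-D Tensor of shape [(D+2) x D]. It is an intermediate result of the forward computation and is reused in the backward computation.

-    output(Variable,Tensor，默认float类型Tensor)：mini-batch每个训练样本的条件概率的对数。这是一个shape为[S*1]的二维张量，S是mini-batch的序列数。注：S等同于mini-batch的序列数。输出不再是LoDTensor
+    The logarithm of the conditional probability of each training sample in the batch. It is a 2-D Tensor of shape [S x 1], where S is the number of sequences in the mini-batch. The output is no longer a LoDTensor.
+
+Return type:
+    The exponential of Emission: Variable (Tensor|LoDTensor) with data type float32 or float64.
+
+    The exponential of Transition: Variable (Tensor|LoDTensor) with data type float32 or float64.
+
+    The log conditional probability: Variable (Tensor) with data type float32 or float64.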

-返回类型：output（Variable）

 **代码示例：**

 .. code-block:: python

    import paddle.fluid as fluid
-    emission = fluid.layers.data(name='emission', shape=[1000], dtype='float32')
-    target = fluid.layers.data(name='target', shape=[1], dtype='int32')
-    crf_cost = fluid.layers.linear_chain_crf(
-        input=emission,
-        label=target,
-        param_attr=fluid.ParamAttr(
+    import numpy as np
+
+    train_program = fluid.Program()
+    startup_program = fluid.Program()
+    with fluid.program_guard(train_program, startup_program):
+        input_data = fluid.layers.data(name='input_data', shape=[10], dtype='float32', lod_level=1)
+        label = fluid.layers.data(name='label', shape=[1], dtype='int', lod_level=1)
+        emission= fluid.layers.fc(input=input_data, size=10, act="tanh")
+        crf_cost = fluid.layers.linear_chain_crf(
+            input=emission,
+            label=label,
+            param_attr=fluid.ParamAttr(
            name='crfw',
-            learning_rate=0.2))
-
-
-
-
-
-
-
-
-
-
-
+            learning_rate=0.01))
+    use_cuda = False
+    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
+    exe = fluid.Executor(place)
+    exe.run(startup_program)
+    #using LoDTensor, define network
+    a = fluid.create_lod_tensor(np.random.rand(12,10).astype('float32'), [[3,3,4,2]], place)
+    b = fluid.create_lod_tensor(np.array([[1],[1],[2],[3],[1],[1],[1],[3],[1],[1],[1],[1]]),[[3,3,4,2]] , place)
+    feed1 = {'input_data':a,'label':b}
+    loss= exe.run(train_program,feed=feed1, fetch_list=[crf_cost])
+    print(loss)
+
+    #using padding, define network
+    train_program = fluid.Program()
+    startup_program = fluid.Program()
+    with fluid.program_guard(train_program, startup_program):
+        input_data2 = fluid.layers.data(name='input_data2', shape=[10,10], dtype='float32')
+        label2 = fluid.layers.data(name='label2', shape=[10,1], dtype='int')
+        label_length = fluid.layers.data(name='length', shape=[1], dtype='int')
+        emission2= fluid.layers.fc(input=input_data2, size=10, act="tanh", num_flatten_dims=2)
+        crf_cost2 = fluid.layers.linear_chain_crf(
+            input=emission2,
+            label=label2,
+            length=label_length,
+            param_attr=fluid.ParamAttr(
+             name='crfw',
+             learning_rate=0.01))
+
+    use_cuda = False
+    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
+    exe = fluid.Executor(place)
+    exe.run(startup_program)
+
+    #define input data
+    cc=np.random.rand(4,10,10).astype('float32')
+    dd=np.random.rand(4,10,1).astype('int64')
+    ll=np.array([[3,3,4,2]])
+    feed2 = {'input_data2':cc,'label2':dd,'length':ll}
+
+    loss2= exe.run(train_program,feed=feed2, fetch_list=[crf_cost2])
+    print(loss2)
+    """
+    output:
+    [array([[ 7.8902354],
+            [ 7.3602567],
+            [ 10.004011],
+            [ 5.86721  ]], dtype=float32)]
+    """
\ No newline at end of file
--- a/doc/fluid/api_cn/layers_cn/linspace_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/linspace_cn.rst
@@ -5,19 +5,17 @@ linspace

 .. py:function:: paddle.fluid.layers.linspace(start, stop, num, dtype)

-在给定区间内返回固定数目的均匀间隔的值。
+This OP returns a fixed number of evenly spaced values within a given interval.
 
-第一个entry是start，最后一个entry是stop。在Num为1的情况下，仅返回start。类似numpy的linspace功能。
-
 参数：
-    - **start** (float|Variable)-序列中的第一个entry。 它是一个浮点标量，或是一个数据类型为'float32'|'float64'、形状为[1]的张量。
-    - **stop** (float|Variable)-序列中的最后一个entry。 它是一个浮点标量，或是一个数据类型为'float32'|'float64'、形状为[1]的张量。
-    - **num** (int|Variable)-序列中的entry数。 它是一个整型标量，或是一个数据类型为int32、形状为[1]的张量。
-    - **dtype** (string)-‘float32’|’float64’，输出张量的数据类型。
+    - **start** (float|Variable) – The start of the interval. It can be a float scalar or a Tensor of shape [1] with data type float32 or float64.
+    - **stop** (float|Variable) – The end of the interval. It can be a float scalar or a Tensor of shape [1] with data type float32 or float64.
+    - **num** (int|Variable) – The number of values to generate in the given interval. It can be an int scalar or a Tensor of shape [1] with data type int32.
+    - **dtype** (string) – The data type of the output Tensor, either 'float32' or 'float64'.

-返回：存储一维张量的张量变量
+Returns: a 1-D Tensor holding the evenly spaced values, with shape :math:`[num]` . When num is 1, only a Tensor containing the value of start is returned.

-返回类型：变量（Variable）
+Return type: Variable
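The behaviour described above (``num`` evenly spaced values including both endpoints, and only ``start`` when ``num`` is 1) can be sketched in plain Python. The function below is a hypothetical stand-in for illustration, not the Paddle implementation.

```python
def linspace(start, stop, num):
    """Return `num` evenly spaced floats from start to stop, inclusive."""
    if num < 1:
        raise ValueError("num must be at least 1")
    if num == 1:
        # Only the start value is returned in this case.
        return [float(start)]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


print(linspace(0, 10, 5))  # [0.0, 2.5, 5.0, 7.5, 10.0]
print(linspace(0, 10, 1))  # [0.0]
```

The returned list always has length ``num``, matching the documented output shape :math:`[num]` .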

 **代码示例**：


--- a/doc/fluid/api_cn/layers_cn/logsigmoid_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/logsigmoid_cn.rst
@@ -5,7 +5,7 @@ logsigmoid

 .. py:function:: paddle.fluid.layers.logsigmoid(x, name=None)

-Logsigmoid激活函数。
+Logsigmoid activation function


 .. math::
@@ -14,18 +14,20 @@ Logsigmoid激活函数。


 参数:
-    - **x** - LogSigmoid算子的输入
-    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn
+    - **x** (Variable) - The input Tensor.
+    - **name** (str|None) - The name of this layer (optional). If set to None, the layer is named automatically.

-返回：        LogSigmoid算子的输出
+Returns: a Tensor
+
+Return type: Variable

 **代码示例**：

 .. code-block:: python

-        import paddle.fluid as fluid
-        data = fluid.layers.data(name="input", shape=[32, 784])
-        result = fluid.layers.logsigmoid(data)
+    import paddle.fluid as fluid
+    data = fluid.layers.data(name="input", shape=[32, 784])
+    result = fluid.layers.logsigmoid(data)




--- a/doc/fluid/api_cn/layers_cn/match_matrix_tensor_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/match_matrix_tensor_cn.rst
-.. _cn_api_fluid_layers_match_matrix_tensor:
-
-match_matrix_tensor
-------------------------------
-
-.. py:function:: paddle.fluid.layers.match_matrix_tensor(x, y, channel_num, act=None, param_attr=None, dtype='float32', name=None)
-
-计算两个长度可变词序列的语义匹配矩阵，给一个长度为n的问题A，和一个长度为m的标题B，输入形状为[n, h]和[m, h]，h为hidden_size。如果channel_num设置为3，将会生成一个形为[h, 3, h]的参数可学习的矩阵W。接着语义匹配矩阵将会通过A * W * B.T = [n, h]*[h, 3, h]*[h, m] = [n, 3, m]来计算A和B。可学习参数矩阵W在这个过程中相当于一个全链接层。如果提供了激活函数，相关激活函数将会被用到输出中。x和y应当为LodTensor并且仅支持一个level LoD。
-
-给一个1-level LoDTensor x:
-
-    x.lod =  [[2,                     3,                               ]]
-
-    x.data = [[0.3, 0.1], [0.2, 0.3], [0.5, 0.6], [0.7, 0.1], [0.3, 0.4]]
-
-    x.dims = [5, 2]
-
-y是一个Tensor:
-
-    y.lod =  [[3,                                 1,       ]]
-
-    y.data = [[0.1, 0.2], [0.3, 0.7], [0.9, 0.2], [0.4, 0.1]]
-
-    y.dims = [4, 2]
-
-channel_num设为2，我们就可以得到一个 1-level LoDTensor:
-
-    out.lod =  [[12, 6]]   # where 12 = channel_num * x.lod[0][0] * y.lod[0][0]
-
-    out.dims = [18, 1]     # where 18 = 12 + 6
-
-参数：
-    - **x** (Variable) - 1-level的输入LoDTensor。
-    - **y** (Variable) - 1-level的输入LoDTensor。
-    - **channel_num** (int) - 可学习参数W的通道数。
-    - **act** (str,默认为None) - 激活函数。
-    - **param_attr** (ParamAttr|ParamAttr的列表，默认为None) - 此层可学习参数的属性。
-    - **dtype** ('float32') - w数据的数据类型。
-    - **name** (str|None) - 层名，若为None，则自动设置。
-    
-返回：由此层指定LoD的输出
-
-返回类型：变量（Variable）
-
-**代码示例**：
-
-.. code-block:: python
-
-    import numpy as np
-    from paddle.fluid import layers
-
-    x_lod_tensor = layers.data(name='x', shape=[10], lod_level=1)
-    y_lod_tensor = layers.data(name='y', shape=[10], lod_level=1)
-    out, out_tmp = layers.match_matrix_tensor(x=x_lod_tensor, y=y_lod_tensor, channel_num=3)
-
-
-
-
-
-
-
--- a/doc/fluid/api_cn/layers_cn/not_equal_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/not_equal_cn.rst
@@ -5,26 +5,28 @@ not_equal

 .. py:function:: paddle.fluid.layers.not_equal(x, y, cond=None)

-该层逐元素地返回 :math:`x != y` 的逻辑值，和重载算子 `!=` 相同。
+This OP returns the element-wise truth value of :math:`x != y` . The overloaded operator `!=` gives the same result.

 参数：
-    - **x** (Variable) - *not_equal* 的第一个操作数
-    - **y** (Variable) - *not_equal* 的第二个操作数
-    - **cond** (Variable|None) - 可选的输出变量，存储 *not_equal* 的结果
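+    - **x** (Variable) – The first input of the comparison, a multi-dimensional Tensor. The data type can be float32, float64, int32 or int64.
+    - **y** (Variable) – The second input of the comparison, a multi-dimensional Tensor. The data type can be float32, float64, int32 or int64.
+    - **cond** (Variable, optional) – If None, a new Tensor is created to hold the comparison result. If not None, the given Tensor is used as the output of this OP. In both cases the output has the same shape as input x and, as stated under the return type, holds bool values. Default: None.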

-返回：存储 *not_equal* 的输出的张量变量。
+Returns: the output Tensor, with the same shape as input x.

-返回类型：变量（Variable）
+Return type: Variable, with data type bool.

 **代码示例**:

 .. code-block:: python

     import paddle.fluid as fluid
-
-     label = fluid.layers.data(name='label', shape=[1], dtype='int64')
-     limit = fluid.layers.fill_constant(shape=[1], value=1, dtype='int64')
-     out = fluid.layers.not_equal(x=label, y=limit)
+     import paddle.fluid.layers as layers
+     import numpy as np
+     label = layers.assign(np.array([2, 3], dtype='int32'))
+     limit = layers.assign(np.array([3, 2], dtype='int32'))
+     out = fluid.layers.not_equal(x=label, y=limit) #out=[True, True]
+     out1 = label != limit #out1=[True, True]




--- a/doc/fluid/api_cn/layers_cn/pixel_shuffle_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/pixel_shuffle_cn.rst
@@ -5,9 +5,7 @@ pixel_shuffle

 .. py:function:: paddle.fluid.layers.pixel_shuffle(x, upscale_factor)

-pixel shuffle 层（像素重组层）
-
-该层将一个形为[N, C, H, W]的张量重新排列成形为 [N, C/r**2, H*r, W*r] 的张量。这样做有利于实现步长（stride）为1/r的高效sub-pixel（亚像素）卷积。详见Shi等人在2016年发表的论文 `Real Time Single Image and Video Super Resolution Using an Efficient Sub Pixel Convolutional Neural Network <https://arxiv.org/abs/1609.05158v2>`_ 。
+This OP rearranges a Tensor of shape [N, C, H, W] into a Tensor of shape [N, C/r**2, H*r, W*r]. This is useful for implementing efficient sub-pixel convolution with a stride of 1/r. See the 2016 paper by Shi et al., `Real Time Single Image and Video Super Resolution Using an Efficient Sub Pixel Convolutional Neural Network <https://arxiv.org/abs/1609.05158v2>`_ .

 .. code-block:: text

@@ -16,7 +14,7 @@ pixel shuffle 层（像素重组层）
    那么输出张量的形为：[1, 1, 12, 12]

 参数：
-          - **x** （Variable）- 输入Tensor变量。
+          - **x** (Variable) - A 4-D Tensor of shape [N, C, H, W]. The data type is float32 or float64.
          - **upscale_factor** （int）- 增大空间分辨率的增大因子


@@ -24,7 +22,7 @@ pixel shuffle 层（像素重组层）

 返回类型：  Variable

-抛出异常： ``ValueError``  - 如果upscale_factor的平方不能整除输入的通道维(C)大小。
+Raises: ``ValueError``  - if the square of upscale_factor does not evenly divide the size of the input channel dimension (C).
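The shape rule [N, C, H, W] -> [N, C/r**2, H*r, W*r] and the documented ``ValueError`` condition can be sketched as a small shape calculator. ``pixel_shuffle_shape`` is a hypothetical helper; it computes shapes only and does not rearrange any data.

```python
def pixel_shuffle_shape(shape, upscale_factor):
    """Return the output shape of a pixel shuffle for an [N, C, H, W] input."""
    if len(shape) != 4:
        raise ValueError("expected a 4-D shape [N, C, H, W]")
    n, c, h, w = shape
    r2 = upscale_factor ** 2
    if c % r2 != 0:
        # Mirrors the documented error: C must be divisible by r**2.
        raise ValueError("upscale_factor**2 must evenly divide the channel size")
    return [n, c // r2, h * upscale_factor, w * upscale_factor]


print(pixel_shuffle_shape([1, 9, 4, 4], 3))  # [1, 1, 12, 12]
```

The example input shape [1, 9, 4, 4] with a factor of 3 is chosen for illustration; it yields the output shape [1, 1, 12, 12].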


 **示例代码**

--- a/doc/fluid/api_cn/layers_cn/pow_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/pow_cn.rst
@@ -12,8 +12,8 @@ pow
    out = x^{factor}

 参数
-    - **x** (Variable) - Pow operator的输入
-    - **factor** (FLOAT|1.0) - Pow的指数因子
+    - **x** (Variable) - The input of the Pow operator.
+    - **factor** (float|Variable) - The exponent of Pow. Default: 1.0.
    - **name** (str|None) -这个层的名称(可选)。如果设置为None，该层将被自动命名。

 返回: 输出Pow操作符
@@ -26,8 +26,17 @@ pow
 .. code-block:: python

    import paddle.fluid as fluid
+
    x = fluid.layers.data(name="x", shape=[3,10,32,32], dtype="float32")
-    y = fluid.layers.pow(x, factor=2.0)
+
+    # example 1: argument factor is float
+    y_1 = fluid.layers.pow(x, factor=2.0)
+    # y_1 is x^{2.0}
+
+    # example 2: argument factor is Variable
+    factor_tensor = fluid.layers.fill_constant([1], "float32", 3.0)
+    y_2 = fluid.layers.pow(x, factor=factor_tensor)
+    # y_2 is x^{3.0}




--- a/doc/fluid/api_cn/layers_cn/prior_box_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/prior_box_cn.rst
@@ -4,28 +4,25 @@ prior_box
 -------------------------------
 .. py:function:: paddle.fluid.layers.prior_box(input,image,min_sizes=None,max_sizes=None,aspect_ratios=[1.0],variance=[0.1,0.1,0.2,0.2],flip=False,clip=False,steps=[0.0,0.0],offset=0.5,name=None,min_max_aspect_ratios_order=False)

-**Prior Box操作符**
-
-为SSD(Single Shot MultiBox Detector)算法生成先验框。输入的每个位产生N个先验框，N由min_sizes,max_sizes和aspect_ratios的数目决定，先验框的尺寸在(min_size,max_size)之间，该尺寸根据aspect_ratios在序列中生成。
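+This OP generates prior boxes for the SSD (Single Shot MultiBox Detector) algorithm. Each position of the input produces N prior boxes, where N is determined by the number of min_sizes, max_sizes and aspect_ratios. The size of the prior boxes lies within (min_size, max_size) and is generated in sequence according to aspect_ratios.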

 参数：
-    - **input** (Variable)-输入变量，格式为NCHW
-    - **image** (Variable)-PriorBoxOp的输入图像数据，布局为NCHW
-    - **min_sizes** (list|tuple|float值)-生成的先验框的最小尺寸
-    - **max_sizes** (list|tuple|None)-生成的先验框的最大尺寸。默认：None
-    - **aspect_ratios** (list|tuple|float值)-生成的先验框的纵横比。默认：[1.]
-    - **variance** (list|tuple)-先验框中的变量，会被解码。默认：[0.1,0.1,0.2,0.2]
-    - **flip** (bool)-是否忽略纵横比。默认：False。
-    - **clip** (bool)-是否修建溢界框。默认：False。
-    - **step** (list|tuple)-先验框在width和height上的步长。如果step[0] == 0.0/step[1] == 0.0，则自动计算先验框在宽度和高度上的步长。默认：[0.,0.]
-    - **offset** (float)-先验框中心位移。默认：0.5
-    - **name** (str)-先验框操作符名称。默认：None
-    - **min_max_aspect_ratios_order** (bool)-若设为True,先验框的输出以[min,max,aspect_ratios]的顺序，和Caffe保持一致。请注意，该顺序会影响后面卷基层的权重顺序，但不影响最后的检测结果。默认：False。
-
-返回：
-    含有两个变量的元组(boxes,variances)
-    boxes:PriorBox的输出先验框。布局是[H,W,num_priors,4]。H是输入的高度，W是输入的宽度，num_priors是输入每位的总框数
-    variances:PriorBox的扩展变量。布局上[H,W,num_priors,4]。H是输入的高度，W是输入的宽度，num_priors是输入每位的总框数
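+    - **input** (Variable) - A 4-D Tensor in NCHW layout, with data type float32 or float64.
+    - **image** (Variable) - The input image data of PriorBoxOp, a 4-D Tensor in NCHW layout, with data type float32 or float64.
+    - **min_sizes** (list|tuple|float) - The minimum sizes of the generated prior boxes.
+    - **max_sizes** (list|tuple|None) - The maximum sizes of the generated prior boxes. Default: None.
+    - **aspect_ratios** (list|tuple|float) - The aspect ratios of the generated prior boxes. Default: [1.].
+    - **variance** (list|tuple) - The variances to be encoded in the prior boxes. Default: [0.1,0.1,0.2,0.2].
+    - **flip** (bool) - Whether to flip the aspect ratios. Default: False.
+    - **clip** (bool) - Whether to clip out-of-boundary boxes. Default: False.
+    - **steps** (list|tuple) - The step sizes of the prior boxes along width and height. If steps[0] equals 0.0 or steps[1] equals 0.0, the step sizes along width and height are computed automatically. Default: [0.,0.].
+    - **offset** (float) - The offset of the prior box centers. Default: 0.5.
+    - **name** (str|None) - Used by developers when printing debug information. See :ref:`api_guide_Name` for usage. Default: None.
+    - **min_max_aspect_ratios_order** (bool) - If True, the prior boxes are output in the order [min, max, aspect_ratios], consistent with Caffe. Note that this order affects the weight order of the following convolution layer but does not affect the final detection results. Default: False.
+
+Returns: a tuple of two variables:
+    boxes: the prior boxes, a 4-D Tensor of shape [H,W,num_priors,4], where H is the height of the input, W is the width of the input and num_priors is the total number of boxes at each position of the input.
+    variances: the variances of the prior boxes, a 4-D Tensor of shape [H,W,num_priors,4], where H is the height of the input, W is the width of the input and num_priors is the total number of boxes at each position of the input.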

 返回类型：元组

@@ -42,5 +39,3 @@ prior_box
        min_sizes=[100.],
        flip=True,
        clip=True)
-
-
--- a/doc/fluid/api_cn/layers_cn/rank_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/rank_cn.rst
@@ -5,24 +5,22 @@ rank

 .. py:function::  paddle.fluid.layers.rank(input)

-排序层
+This OP computes the number of dimensions (the rank) of the input Tensor.

-返回张量的维数，一个数据类型为int32的0-D Tensor。
+Parameters:
+    - **input** (Variable) — A multi-dimensional Tensor of shape :math:`[N_1, N_2, ..., N_k]` . Any data type is allowed.

-参数:
-    - **input** (Variable)：输入变量
+Returns: the rank of the input Tensor, as a 0-D Tensor.

-返回：输入变量的秩
-
-返回类型： 变量（Variable）
+Return type: Variable, with data type int32.

 **代码示例**

 .. code-block:: python

       import paddle.fluid as fluid
-       input = layers.data(
+       input = fluid.layers.data(
            name="input", shape=[3, 100, 100], dtype="float32")
-       rank = layers.rank(input) # 4
+       rank = fluid.layers.rank(input) # 4, because layers.data prepends the batch dimension


--- a/doc/fluid/api_cn/layers_cn/reduce_all_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/reduce_all_cn.rst
@@ -5,17 +5,17 @@ reduce_all

 .. py:function:: paddle.fluid.layers.reduce_all(input, dim=None, keep_dim=False, name=None)

-计算给定维度上张量（Tensor）元素的与逻辑。
+该OP是对指定维度上的Tensor元素进行与逻辑（&）计算，并输出相应的计算结果。

 参数：
-          - **input** （Variable）：输入变量为Tensor或LoDTensor。
-          - **dim** （list | int | None）：与逻辑运算的维度。如果为None，则计算所有元素的与逻辑并返回包含单个元素的Tensor变量，否则必须在  :math:`[−rank(input),rank(input))` 范围内。如果 :math:`dim [i] <0` ，则维度将减小为 :math:`rank+dim[i]` 。
-          - **keep_dim** （bool | False）：是否在输出Tensor中保留减小的维度。除非 ``keep_dim`` 为true，否则结果张量的维度将比输入张量小。
-          - **name** （str | None）：这一层的名称（可选）。如果设置为None，则将自动命名这一层。
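+    - **input** (Variable) - The input is a multi-dimensional Tensor or LoDTensor with data type bool.
+    - **dim** (list | int, optional) - The dimensions along which the logical AND is computed. If None, the logical AND of all elements is computed and a Tensor variable with a single element is returned; otherwise each value must be in the range :math:`[-rank(input), rank(input))`. If :math:`dim[i] < 0`, the dimension to reduce is :math:`rank + dim[i]`. Default: None.
+    - **keep_dim** (bool) - Whether to keep the reduced dimensions in the output Tensor. Unless keep_dim is true, the result Tensor has fewer dimensions than the input. Default: False.
+    - **name** (str, optional) - The name of this layer. If None, the layer is named automatically. Default: None.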

-返回：  减少维度之后的Tensor变量。
+返回：在指定dim上进行与逻辑计算的Tensor，数据类型为bool类型。

-返回类型：  变量（Variable）
+返回类型：Variable，数据类型为bool类型。

 **代码示例**

@@ -35,6 +35,8 @@ reduce_all
        out = layers.reduce_all(x)  # False
        out = layers.reduce_all(x, dim=0)  # [True, False]
        out = layers.reduce_all(x, dim=-1)  # [False, True]
+        # keep_dim=False, x.shape=(2,2), out.shape=(2,)
        
        out = layers.reduce_all(x, dim=1, keep_dim=True)  # [[False], [True]]
+        # keep_dim=True, x.shape=(2,2), out.shape=(2,1)
     
--- a/doc/fluid/api_cn/layers_cn/reduce_any_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/reduce_any_cn.rst
@@ -5,17 +5,17 @@ reduce_any

 .. py:function:: paddle.fluid.layers.reduce_any(input, dim=None, keep_dim=False, name=None)

-计算给定维度上张量（Tensor）元素的或逻辑。     
+该OP是对指定维度上的Tensor元素进行或逻辑（|）计算，并输出相应的计算结果。

 参数：
-          - **input** （Variable）：输入变量为Tensor或LoDTensor。
-          - **dim** （list | int | None）：或逻辑运算的维度。如果为None，则计算所有元素的或逻辑并返回仅包含单个元素的Tensor变量，否则必须在  :math:`[−rank(input),rank(input))` 范围内。如果 :math:`dim [i] <0` ，则维度将减小为 :math:`rank+dim[i]` 。
-          - **keep_dim** （bool | False）：是否在输出Tensor中保留减小的维度。除非 ``keep_dim`` 为true，否则结果张量的维度将比输入张量小。
-          - **name** （str | None）：这一层的名称（可选）。如果设置为None，则将自动命名这一层。
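+    - **input** (Variable) - The input is a multi-dimensional Tensor or LoDTensor with data type bool.
+    - **dim** (list | int, optional) - The dimensions along which the logical OR is computed. If None, the logical OR of all elements is computed and a Tensor variable with a single element is returned; otherwise each value must be in the range :math:`[-rank(input), rank(input))`. If :math:`dim[i] < 0`, the dimension to reduce is :math:`rank + dim[i]`. Default: None.
+    - **keep_dim** (bool) - Whether to keep the reduced dimensions in the output Tensor. Unless keep_dim is true, the result Tensor has fewer dimensions than the input. Default: False.
+    - **name** (str, optional) - The name of this layer. If None, the layer is named automatically. Default: None.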

-返回：  减少维度之后的Tensor变量。
+返回：在指定dim上进行或逻辑计算的Tensor，数据类型为bool类型。

-返回类型：  变量（Variable）
+返回类型：Variable，数据类型为bool类型。

 **代码示例**

@@ -35,8 +35,12 @@ reduce_any
        out = layers.reduce_any(x)  # True
        out = layers.reduce_any(x, dim=0)  # [True, False]
        out = layers.reduce_any(x, dim=-1)  # [True, False]
+        # keep_dim=False, x.shape=(2,2), out.shape=(2,)
+
        out = layers.reduce_any(x, dim=1,
                         keep_dim=True)  # [[True], [False]]
+        # keep_dim=True, x.shape=(2,2), out.shape=(2,1)
+




--- a/doc/fluid/api_cn/layers_cn/reshape_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/reshape_cn.rst
@@ -7,8 +7,8 @@ reshape

 保持输入张量数据不变的情况下，改变张量的形状。

-目标形状可由 ``shape`` 或 ``actual_shape`` 给出。``shape`` 是一个整数列表，而 ``actual_shape`` 是一个张量变量。
-当两个属性同时被指定时，``actual_shape`` 的优先级高于 ``shape`` ，但在编译时仍然应该正确地设置 ``shape`` 以保证形状推断。
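+The target shape can be given by ``shape`` or ``actual_shape``. ``shape`` can be a list containing integers or tensors, or a tensor variable, while ``actual_shape`` is a tensor variable.
+When both are specified, ``actual_shape`` takes priority over ``shape``; in that case ``shape`` can only be a list of integers, and it should still be set correctly at compile time to guarantee shape inference.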

 在指定目标shape时存在一些技巧：

@@ -26,27 +26,40 @@ reshape
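  2. Given a 3-D tensor x of shape [2,4,6] and a target shape of [2,3,-1,2], ``reshape`` transforms x into a 4-D tensor of shape [2,3,4,2] without changing the data of x. In this case, one dimension of the target shape is set to -1, and its value is inferred from the total number of elements in x and the remaining dimensions.
  3. Given a 3-D tensor x of shape [2,4,6] and a target shape of [-1,0,3,2], ``reshape`` transforms x into a 4-D tensor of shape [2,4,3,2] without changing the data of x. In this case, 0 means the actual dimension value is copied from the corresponding dimension of x, and the dimension at the -1 position is computed from the total number of elements in x and the remaining dimensions.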

+**Note:** The parameter ``actual_shape`` will be deprecated in the future; only the parameter ``shape`` will be used to represent the target shape.
+
 参数：
-  - **x** (variable) - 输入张量
-  - **shape** (list) - 新的形状。新形状最多只能有一个维度为-1。
-  - **actual_shape** (variable) - 一个可选的输入。如果提供，则根据 ``actual_shape`` 进行 reshape，而不是指定 ``shape`` 。也就是说，actual_shape具有比shape更高的优先级。
-  - **act** (str) - 对reshpe后的tensor变量执行非线性激活
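+  - **x** (Variable) - The input tensor.
+  - **shape** (list|tuple|Variable) - The new shape. At most one dimension of the new shape can be -1. If ``shape`` is a list or tuple, it can contain integers or Variable elements, but each Variable element must have shape [1].
+  - **actual_shape** (Variable) - An optional input. If provided, the reshape follows ``actual_shape`` instead of ``shape``; that is, ``actual_shape`` has higher priority than ``shape``, and in that case ``shape`` can only be a list of integers. ``actual_shape`` will be removed in a future version; use ``shape`` instead.
+  - **act** (str) - The non-linear activation applied to the reshaped tensor variable.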
  - **inplace** (bool) - 如果 ``inplace`` 为True，则 ``layers.reshape`` 的输入和输出是同一个变量，否则， ``layers.reshape`` 的输入和输出是不同的变量。请注意，如果x作为多个层的输入，则 ``inplace`` 必须为False。
-  - **name** (str) -  可选变量，此层的名称
+  - **name** (str) -  可选变量，此层的名称。

 返回：如果 ``act`` 为 ``None``,返回reshape后的tensor变量。如果 ``inplace`` 为 ``False`` ,将返回一个新的Tensor变量，否则，将改变x自身。如果 ``act`` 不是 ``None`` ，则返回激活的张量变量。

-抛出异常：``TypeError`` - 如果 actual_shape 既不是变量也不是None
+抛出异常：``TypeError`` - 如果 actual_shape 既不是变量也不是None.

 **代码示例**

 .. code-block:: python

  import paddle.fluid as fluid
-  data = fluid.layers.data(
-      name='data', shape=[2, 4, 6], dtype='float32')
-  reshaped = fluid.layers.reshape(
-      x=data, shape=[-1, 0, 3, 2], inplace=True)
+
+  # example 1:
+  # attr shape is a list which doesn't contain tensor Variable.
+  data_1 = fluid.layers.data(
+      name='data_1', shape=[2, 4, 6], dtype='float32')
+  reshaped_1 = fluid.layers.reshape(
+      x=data_1, shape=[-1, 0, 3, 2], inplace=True)
+  # the shape of reshaped_1 is [2,4,3,2].
+
+  # example 2:
+  # attr shape is a list which contains tensor Variable.
+  data_2 = fluid.layers.fill_constant([2,25], "int32", 3)
+  dim = fluid.layers.fill_constant([1], "int32", 5)
+  reshaped_2 = fluid.layers.reshape(data_2, shape=[dim, 10])
+  # the shape of reshaped_2 is [5,10].




--- a/doc/fluid/api_cn/layers_cn/resize_bilinear_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/resize_bilinear_cn.rst
@@ -5,8 +5,9 @@ resize_bilinear

 .. py:function:: paddle.fluid.layers.resize_bilinear(input, out_shape=None, scale=None, name=None, actual_shape=None, align_corners=True, align_mode=1)

+**注意:** 参数 ``actual_shape`` 将被弃用，请使用 ``out_shape`` 替代。

-根据指定的out_shape执行双线性插值调整输入大小，输出形状按优先级由actual_shape、out_shape和scale指定。
+This OP resizes the input image using bilinear interpolation. The output shape is determined by actual_shape, out_shape, and scale, in that order of priority.

 双线性插值是对线性插值的扩展,即二维变量方向上(如h方向和w方向)插值。关键思想是先在一个方向上执行线性插值，然后再在另一个方向上执行线性插值。

@@ -51,16 +52,16 @@ align_corners和align_mode是可选参数，插值的计算方法可以由它们


 参数:
-    - **input** (Variable) - 输入为4d张量。
-    - **out_shape** (list|tuple|Variable|None) - 调整双线性层的输出形状，形式为(out_h, out_w)。默认值：None。
-    - **scale** (float|None) - 用于输入高度或宽度的乘数因子。out_shape和scale至少要设置一个。out_shape的优先级高于scale。默认值：None。
-    - **name** (str|None) - 输出变量名。
-    - **actual_shape** (Variable) - 可选输入，用于动态指定输出形状。如果指定actual_shape，图像将根据给定的形状调整大小，而不是根据指定形状的 :code:`out_shape` 和 :code:`scale` 进行调整。也就是说， :code:`actual_shape` 具有最高的优先级。如果希望动态指定输出形状，建议使用 :code:`actual_shape` 而不是 :code:`out_shape` 。在使用actual_shape指定输出形状时，还需要设置out_shape和scale之一，否则在图形构建阶段会出现错误。默认值:None
-    - **align_corners** （bool）- 一个可选的bool型参数，如果为True，则将输入和输出张量的4个角落像素的中心对齐，并保留角点像素的值。 默认值：True
-    - **align_mode** （int）- 双线性插值的可选项。 可以是'0'代表src_idx = scale *（dst_indx + 0.5）-0.5；可以为'1' ，代表src_idx = scale * dst_index。
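+    - **input** (Variable) - A 4-D Tensor of shape [num_batches, channels, in_h, in_w].
+    - **out_shape** (list|tuple|Variable|None) - The output shape of the bilinear layer, in the form [out_h, out_w]. If :code:`out_shape` is a list, each element can be an integer or a Variable of shape [1]. If :code:`out_shape` is a Variable, it must be 1-D. Default: None.
+    - **scale** (float|Variable|None) - The multiplier for the input height or width. At least one of out_shape and scale must be set, and out_shape has higher priority than scale. Default: None.
+    - **name** (str|None) - Used by developers when printing debugging information; see :ref:`api_guide_Name` for details. Default: None.
+    - **actual_shape** (Variable) - An optional input for specifying the output shape dynamically. If actual_shape is given, the image is resized to that shape rather than to the shape given by :code:`out_shape` and :code:`scale`; that is, :code:`actual_shape` has the highest priority. To specify the output shape dynamically, :code:`out_shape` is recommended, because :code:`actual_shape` will be deprecated. When actual_shape is used to specify the output shape, one of out_shape and scale must also be set, otherwise an error occurs in the graph construction phase. Default: None.
+    - **align_corners** (bool) - An optional bool. If True, the centers of the 4 corner pixels of the input and output tensors are aligned, preserving the values at the corner pixels. Default: True.
+    - **align_mode** (int) - An option for bilinear interpolation. '0' means src_idx = scale * (dst_index + 0.5) - 0.5; '1' means src_idx = scale * dst_index.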


-返回： 4D张量，shape为(num_batches, channels, out_h, out_w)
+返回：维度为[num_batches, channels, out_h, out_w]的4-D Tensor。


 **代码示例**
@@ -69,12 +70,27 @@ align_corners和align_mode是可选参数，插值的计算方法可以由它们
  
  import paddle.fluid as fluid
  input = fluid.layers.data(name="input", shape=[3,6,9], dtype="float32")
-  out = fluid.layers.resize_bilinear(input, out_shape=[12, 12])
-
-
-
-
-
-
-
-
+  # input.shape = [-1, 3, 6, 9], where -1 indicates batch size, and it will get the exact value in runtime.
+
+  out0 = fluid.layers.resize_bilinear(input, out_shape=[12, 12])
+  # out0.shape = [-1, 3, 12, 12], it means out0.shape[0] = input.shape[0] in runtime.
+
+  # out_shape is a list in which each element is an integer or a tensor Variable
+  dim1 = fluid.layers.data(name="dim1", shape=[1], dtype="int32", append_batch_size=False)
+  out1 = fluid.layers.resize_bilinear(input, out_shape=[12, dim1])
+  # out1.shape = [-1, 3, 12, -1]
+
+  # out_shape is a 1-D tensor Variable
+  shape_tensor = fluid.layers.data(name="shape_tensor", shape=[2], dtype="int32", append_batch_size=False)
+  out2 = fluid.layers.resize_bilinear(input, out_shape=shape_tensor)
+  # out2.shape = [-1, 3, -1, -1]
+
+  # when using actual_shape
+  actual_shape_tensor = fluid.layers.data(name="actual_shape_tensor", shape=[2], dtype="int32", append_batch_size=False)
+  out3 = fluid.layers.resize_bilinear(input, out_shape=[4, 4], actual_shape=actual_shape_tensor)
+  # out3.shape = [-1, 3, 4, 4]
+
+  # scale is a Variable
+  scale_tensor = fluid.layers.data(name="scale", shape=[1], dtype="float32", append_batch_size=False)
+  out4 = fluid.layers.resize_bilinear(input, scale=scale_tensor)
+  # out4.shape = [-1, 3, -1, -1]
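The two ``align_mode`` formulas in the parameter list above can be illustrated in one dimension; bilinear interpolation applies the same index mapping along height and then along width. The following is a minimal pure-Python sketch, not Paddle's implementation. It assumes the ratio ``scale = in_size / out_size`` (this hunk does not show how the OP derives the ratio) and clamps source indices at the borders; ``src_index`` and ``resize_linear_1d`` are hypothetical helper names.

```python
def src_index(dst_index, scale, align_mode):
    """Map an output index to a fractional input index."""
    if align_mode == 0:
        return scale * (dst_index + 0.5) - 0.5
    return scale * dst_index  # align_mode == 1


def resize_linear_1d(values, out_size, align_mode=1):
    """Linearly resample a 1-D list to out_size elements."""
    in_size = len(values)
    scale = in_size / out_size  # assumption: plain size ratio
    out = []
    for dst in range(out_size):
        src = src_index(dst, scale, align_mode)
        src = min(max(src, 0.0), in_size - 1)  # clamp to the valid range
        lo = int(src)                          # left neighbour (src >= 0)
        hi = min(lo + 1, in_size - 1)          # right neighbour
        frac = src - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


print(resize_linear_1d([0.0, 10.0], 4, align_mode=1))  # [0.0, 5.0, 10.0, 10.0]
print(resize_linear_1d([0.0, 10.0], 4, align_mode=0))  # [0.0, 2.5, 7.5, 10.0]
```

With ``align_mode=0`` the samples are taken at pixel centers, so the output is symmetric around the middle of the input; with ``align_mode=1`` the first output element coincides with the first input element.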
--- a/doc/fluid/api_cn/layers_cn/resize_nearest_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/resize_nearest_cn.rst
--- a/doc/fluid/api_cn/layers_cn/resize_trilinear_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/resize_trilinear_cn.rst
@@ -5,6 +5,8 @@ resize_trilinear

 .. py:function:: paddle.fluid.layers.resize_trilinear(input, out_shape=None, scale=None, name=None, actual_shape=None, align_corners=True, align_mode=1)

+**注意:** 参数 ``actual_shape`` 将被弃用，请使用 ``out_shape`` 替代。
+
 该层对输入进行放缩，基于给定的由 ``actual_shape`` , ``out_shape`` , ``scale`` 确定的输出shape，进行三线插值。三线插值是包含三个参数的线性插值方程（D方向，H方向， W方向）,在一个3D格子上进行三个方向的线性插值。更多细节，请参考维基百科：https://en.wikipedia.org/wiki/Trilinear_interpolation
 Align_corners和align_mode都是可选参数，可以用来设置插值的计算方法，如下：

@@ -45,11 +47,11 @@ Align_corners和align_mode都是可选参数，可以用来设置插值的计算
                W_out = W_{in} * scale_{factor}

 参数:
-  - **input** (Variable) – 输入的四维张量
-  - **out_shape** (list|tuple|Variable|None) – 调整最近邻层的输出形状，形式为(out_h, out_w)。默认值：None。
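+  - **input** (Variable) – A 5-D tensor of shape (num_batches, channels, in_d, in_h, in_w).
+  - **out_shape** (list|tuple|Variable|None) – The output shape of the trilinear layer, in the form (out_d, out_h, out_w). If :code:`out_shape` is a list, each element can be an integer or a Variable of shape [1]. If :code:`out_shape` is a Variable, it must be 1-D. Default: None.
   - **scale** (float|None) – The multiplier for the input depth, height, or width. At least one of ``out_shape`` and ``scale`` must be set, and ``out_shape`` has higher priority than ``scale``. Default: None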
  - **name** (str|None) – 输出变量的命名
-  - **actual_shape** (Variable) – 可选输入， 动态设置输出张量的形状。 如果提供该值， 图片放缩会依据此形状进行， 而非依据 ``out_shape`` 和 ``scale`` 。 即为， ``actual_shape`` 具有最高的优先级。 如果想动态指明输出形状，推荐使用 ``actual_shape`` 取代 ``out_shape`` 。 当使用 ``actual_shape`` 来指明输出形状， ``out_shape`` 和 ``scale`` 也应该进行设置, 否则在图形生成阶段将会报错。默认: None
+  - **actual_shape** (Variable) – 可选输入， 动态设置输出张量的形状。 如果提供该值， 图片放缩会依据此形状进行， 而非依据 ``out_shape`` 和 ``scale`` 。 即为， ``actual_shape`` 具有最高的优先级。 如果想动态指明输出形状，推荐使用 ``out_shape`` ，因为 ``actual_shape`` 未来将被弃用。 当使用 ``actual_shape`` 来指明输出形状， ``out_shape`` 和 ``scale`` 也应该进行设置, 否则在图形生成阶段将会报错。默认: None
  - **align_corners** （bool）- 一个可选的bool型参数，如果为True，则将输入和输出张量的4个角落像素的中心对齐，并保留角点像素的值。 默认值：True
   - **align_mode** (int) - An option for the interpolation. '0' means src_idx = scale*(dst_index+0.5)-0.5; '1' means src_idx = scale*dst_index. Default: 1.

@@ -61,5 +63,27 @@ Align_corners和align_mode都是可选参数，可以用来设置插值的计算
    
    import paddle.fluid as fluid
    input = fluid.layers.data(name="input", shape=[3,6,9,11], dtype="float32")
-    out = fluid.layers.resize_trilinear(input, out_shape=[12, 12, 12])
-
+    # input.shape = [-1, 3, 6, 9, 11], where -1 indicates batch size, and it will get the exact value in runtime.
+
+    out0 = fluid.layers.resize_trilinear(input, out_shape=[12, 12, 12])
+    # out0.shape = [-1, 3, 12, 12, 12], it means out0.shape[0] = input.shape[0] in runtime.
+
+    # out_shape is a list in which each element is an integer or a tensor Variable
+    dim1 = fluid.layers.data(name="dim1", shape=[1], dtype="int32", append_batch_size=False)
+    out1 = fluid.layers.resize_trilinear(input, out_shape=[12, dim1, 4])
+    # out1.shape = [-1, 3, 12, -1, 4]
+
+    # out_shape is a 1-D tensor Variable
+    shape_tensor = fluid.layers.data(name="shape_tensor", shape=[3], dtype="int32", append_batch_size=False)
+    out2 = fluid.layers.resize_trilinear(input, out_shape=shape_tensor)
+    # out2.shape = [-1, 3, -1, -1, -1]
+
+    # when using actual_shape
+    actual_shape_tensor = fluid.layers.data(name="actual_shape_tensor", shape=[3], dtype="int32", append_batch_size=False)
+    out3 = fluid.layers.resize_trilinear(input, out_shape=[4, 4, 8], actual_shape=actual_shape_tensor)
+    # out3.shape = [-1, 3, 4, 4, 8]
+
+    # scale is a Variable
+    scale_tensor = fluid.layers.data(name="scale", shape=[1], dtype="float32", append_batch_size=False)
+    out4 = fluid.layers.resize_trilinear(input, scale=scale_tensor)
+    # out4.shape = [-1, 3, -1, -1, -1]
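The parameter list above states that ``out_shape`` takes priority over ``scale`` and that at least one of them must be set. A minimal sketch of that resolution for static (non-Variable) arguments follows. ``resolve_output_size`` is a hypothetical helper, not a Paddle API, and truncating the scaled size with ``int()`` is an assumption, since the hunk only shows the formula ``W_out = W_in * scale_factor``.

```python
def resolve_output_size(in_size, out_shape=None, scale=None):
    """Return the output (d, h, w) given the input (d, h, w).

    out_shape wins over scale; at least one of them must be given.
    """
    if out_shape is not None:
        if len(out_shape) != len(in_size):
            raise ValueError("out_shape must have one entry per spatial dim")
        return tuple(int(v) for v in out_shape)
    if scale is not None:
        # Assumption: the scaled size is truncated toward zero.
        return tuple(int(v * scale) for v in in_size)
    raise ValueError("one of out_shape and scale must be set")


print(resolve_output_size((6, 9, 11), out_shape=(12, 12, 12), scale=2.0))  # (12, 12, 12)
print(resolve_output_size((6, 9, 11), scale=2.0))                          # (12, 18, 22)
```

When ``out_shape`` contains Variables or ``actual_shape`` is passed, the sizes are only known at run time, which is why the corresponding entries show up as -1 in the shape comments of the example above.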
--- a/doc/fluid/api_cn/layers_cn/retinanet_detection_output_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/retinanet_detection_output_cn.rst
@@ -5,32 +5,30 @@ retinanet_detection_output

 .. py:function:: paddle.fluid.layers.retinanet_detection_output(bboxes, scores, anchors, im_info, score_threshold=0.05, nms_top_k=1000, keep_top_k=100, nms_threshold=0.3, nms_eta=1.0)

-**Retinanet的检测输出层**
+**注意：该OP目前仅支持CPU** 。

-此操作通过执行以下步骤获取检测结果：
+在 `RetinaNet <https://arxiv.org/abs/1708.02002>`_ 中，有多个 `FPN <https://arxiv.org/abs/1612.03144>`_ 层会输出用于分类的预测值和位置回归的预测值，该OP通过执行以下步骤将这些预测值转换成最终的检测结果：

-1. 根据anchor框解码每个FPN级别的最高得分边界框预测。
-2. 合并所有级别的顶级预测并对其应用多级非最大抑制（NMS）以获得最终检测。
+1. 在每个FPN层上，先剔除分类预测值小于score_threshold的anchor，然后按分类预测值从大到小排序，选出排名前nms_top_k的anchor，并将这些anchor与其位置回归的预测值做解码操作得到检测框。
+2. 合并全部FPN层上的检测框，对这些检测框进行非极大值抑制操作（NMS）以获得最终的检测结果。


 参数：
-    - **bboxes**  (List) – 来自多个FPN级别的张量列表。每个元素都是一个三维张量，形状[N，Mi，4]代表Mi边界框的预测位置。N是batch大小，Mi是第i个FPN级别的边界框数，每个边界框有四个坐标值，布局为[xmin，ymin，xmax，ymax]。
-    - **scores**  (List) – 来自多个FPN级别的张量列表。每个元素都是一个三维张量，各张量形状为[N，Mi，C]，代表预测的置信度预测。 N是batch大小，C是类编号（不包括背景），Mi是第i个FPN级别的边界框数。对于每个边界框，总共有C个评分。
-    - **anchors**  (List) – 具有形状[Mi，4]的2-D Tensor表示来自所有FPN级别的Mi anchor框的位置。每个边界框有四个坐标值，布局为[xmin，ymin，xmax，ymax]。
-    - **im_info**  (Variable) – 形状为[N，3]的2-D LoDTensor表示图像信息。 N是batch大小，每个图像信息包括高度，宽度和缩放比例。
-    - **score_threshold**  (float) – 用置信度分数剔除边界框的过滤阈值。
-    - **nms_top_k**  (int) – 根据NMS之前的置信度保留每个FPN层的最大检测数。
-    - **keep_top_k**  (int) – NMS步骤后每个图像要保留的总边界框数。 -1表示在NMS步骤之后保留所有边界框。
-    - **nms_threshold**  (float) – NMS中使用的阈值.
-    - **nms_eta**  (float) – adaptive NMS的参数.
+    - **bboxes**  (List) – 由来自不同FPN层的Tensor组成的列表，表示全部anchor的位置回归预测值。列表中每个元素是一个维度为 :math:`[N, Mi, 4]` 的3-D Tensor，其中，第一维N表示批量训练时批量内的图片数量，第二维Mi表示每张图片第i个FPN层上的anchor数量，第三维4表示每个anchor有四个坐标值。数据类型为float32或float64。
+    - **scores**  (List) – 由来自不同FPN层的Tensor组成的列表，表示全部anchor的分类预测值。列表中每个元素是一个维度为 :math:`[N, Mi, C]` 的3-D Tensor，其中第一维N表示批量训练时批量内的图片数量，第二维Mi表示每张图片第i个FPN层上的anchor数量，第三维C表示类别数量（ **不包括背景类** ）。数据类型为float32或float64。
+    - **anchors**  (List) – 由来自不同FPN层的Tensor组成的列表，表示全部anchor的坐标值。列表中每个元素是一个维度为 :math:`[Mi, 4]` 的2-D Tensor，其中第一维Mi表示第i个FPN层上的anchor数量，第二维4表示每个anchor有四个坐标值（[xmin, ymin, xmax, ymax]）。数据类型为float32或float64。
+    - **im_info**  (Variable) – 维度为 :math:`[N, 3]` 的2-D Tensor，表示输入图片的尺寸信息。 其中，第一维N表示批量训练时各批量内的图片数量，第二维3表示各图片的尺寸信息，分别是网络输入尺寸的高和宽，以及原图缩放至网络输入大小时的缩放比例。数据类型为float32或float64。
+    - **score_threshold**  (float32) – 在NMS步骤之前，用于滤除每个FPN层的检测框的阈值，默认值为0.05。
+    - **nms_top_k**  (int32) – 在NMS步骤之前，保留每个FPN层的检测框的数量，默认值为1000。
+    - **keep_top_k**  (int32) – 在NMS步骤之后，每张图像要保留的检测框数量，默认值为100，若设为-1，则表示保留NMS步骤后剩下的全部检测框。
+    - **nms_threshold**  (float32) – NMS步骤中用于剔除检测框的Intersection-over-Union（IoU）阈值，默认为0.3。
+    - **nms_eta**  (float32) – NMS步骤中用于调整nms_threshold的参数。默认值为1.，表示nms_threshold的取值在NMS步骤中一直保持不变，即其设定值。若nms_eta小于1.，则表示当nms_threshold的取值大于0.5时，每保留一个检测框就调整一次nms_threshold的取值，即nms_threshold = nms_threshold * nms_eta，直到nms_threshold的取值小于等于0.5后结束调整。
+**注意：在模型输入尺寸特别小的情况，此时若用score_threshold滤除anchor，可能会导致没有任何检测框剩余。为避免这种情况出现，该OP不会对最高FPN层上的anchor做滤除。因此，要求bboxes、scores、anchors中最后一个元素是来自最高FPN层的Tensor** 。

+返回：维度是 :math:`[No, 6]` 的2-D LoDTensor，表示批量内的检测结果。第一维No表示批量内的检测框的总数，第二维6表示每行有六个值：[label， score，xmin，ymin，xmax，ymax]。该LoDTensor的LoD中存放了每张图片的检测框数量，第i张图片的检测框数量为 :math:`LoD[i + 1] - LoD[i]` 。如果 :math:`LoD[i + 1] - LoD[i]` 为0，则第i个图像没有检测结果。 如果批量内的全部图像都没有检测结果，则LoD中所有元素被设置为0，LoDTensor被赋为空（None）。


-返回：
-检测输出是具有形状[No，6]的LoDTensor。 每行有六个值：[标签，置信度，xmin，ymin，xmax，ymax]。 No是此mini batch中的检测总数。 对于每个实例，第一维中的偏移称为LoD，偏移值为N + 1，N是batch大小。 第i个图像具有LoD [i + 1]  -  LoD [i]检测结果，如果为0，则第i个图像没有检测到结果。 如果所有图像都没有检测到结果，则LoD将设置为0，输出张量为空（None）。
-
-
-返回类型：变量（Variable）
+返回类型：变量（Variable），数据类型为float32或float64。

 **代码示例**

@@ -38,24 +36,27 @@ retinanet_detection_output
 
  import paddle.fluid as fluid
 
-  bboxes = layers.data(name='bboxes', shape=[1, 21, 4],
+  bboxes_low = fluid.layers.data(name='bboxes_low', shape=[1, 44, 4],
+      append_batch_size=False, dtype='float32')
+  bboxes_high = fluid.layers.data(name='bboxes_high', shape=[1, 11, 4],
+      append_batch_size=False, dtype='float32')
+  scores_low = fluid.layers.data(name='scores_low', shape=[1, 44, 10],
+      append_batch_size=False, dtype='float32')
+  scores_high = fluid.layers.data(name='scores_high', shape=[1, 11, 10],
      append_batch_size=False, dtype='float32')
-  scores = layers.data(name='scores', shape=[1, 21, 10],
+  anchors_low = fluid.layers.data(name='anchors_low', shape=[44, 4],
      append_batch_size=False, dtype='float32')
-  anchors = layers.data(name='anchors', shape=[21, 4],
+  anchors_high = fluid.layers.data(name='anchors_high', shape=[11, 4],
      append_batch_size=False, dtype='float32')
-  im_info = layers.data(name="im_info", shape=[1, 3],
+  im_info = fluid.layers.data(name="im_info", shape=[1, 3],
      append_batch_size=False, dtype='float32')
  nmsed_outs = fluid.layers.retinanet_detection_output(
-                                          bboxes=[bboxes, bboxes],
-                                          scores=[scores, scores],
-                                          anchors=[anchors, anchors],
+                                          bboxes=[bboxes_low, bboxes_high],
+                                          scores=[scores_low, scores_high],
+                                          anchors=[anchors_low, anchors_high],
                                          im_info=im_info,
                                          score_threshold=0.05,
                                          nms_top_k=1000,
                                          keep_top_k=100,
-                                          nms_threshold=0.3,
+                                          nms_threshold=0.45,
                                          nms_eta=1.)
-
-
-
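Step 2 of the procedure above (NMS with the ``nms_threshold``, ``nms_eta`` and ``keep_top_k`` parameters) can be sketched as a greedy loop. This is a simplified single-class illustration in pure Python under stated assumptions, not the OP's actual kernel: boxes are plain ``(xmin, ymin, xmax, ymax)`` tuples, and ``iou`` and ``greedy_nms`` are hypothetical helper names.

```python
def iou(a, b):
    """Intersection-over-Union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, nms_threshold=0.3, nms_eta=1.0, keep_top_k=100):
    """Return the indices of the boxes kept, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    threshold = nms_threshold
    for i in order:
        # Keep a box only if it does not overlap any kept box too much.
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
            # Adaptive NMS: shrink the threshold while it is above 0.5.
            if nms_eta < 1.0 and threshold > 0.5:
                threshold *= nms_eta
    if keep_top_k > -1:
        keep = keep[:keep_top_k]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores, nms_threshold=0.3))               # [0, 2]
print(greedy_nms(boxes, scores, nms_threshold=0.7))               # [0, 1, 2]
print(greedy_nms(boxes, scores, nms_threshold=0.7, nms_eta=0.9))  # [0, 2]
```

The last call shows the effect of ``nms_eta``: after the first box is kept the threshold drops from 0.7 to 0.63, so the second box (IoU of about 0.68 with the first) is now suppressed.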
--- a/doc/fluid/api_cn/layers_cn/retinanet_target_assign_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/retinanet_target_assign_cn.rst
@@ -5,70 +5,71 @@ retinanet_target_assign

 .. py:function:: paddle.fluid.layers.retinanet_target_assign(bbox_pred, cls_logits, anchor_box, anchor_var, gt_boxes, gt_labels, is_crowd, im_info, num_classes=1, positive_overlap=0.5, negative_overlap=0.4)

-**Retinanet的目标分配层**
+**注意：该OP目前仅支持CPU** 。

-对于给定anchors和真实(ground-truth)框之间的Intersection-over-Union（IoU）重叠，该层可以为每个anchor分配分类和回归目标，同时这些目标标签用于训练Retinanet。每个anchor都分配有长度为num_classes的一个one-hot分类目标向量，以及一个4向量的框回归目标。分配规则如下：
+该OP是从输入anchor中找出训练检测模型 `RetinaNet <https://arxiv.org/abs/1708.02002>`_ 所需的正负样本，并为每个正负样本分配用于分类的目标值和位置回归的目标值，同时从全部anchor的类别预测值cls_logits、位置预测值bbox_pred中取出属于各正负样本的部分。

-1.在以下情况下，anchor被分配到真实框：
-（i）它与真实框具有最高的IoU重叠，或者（ii）与任何真实框具有高于positive_overlap（0.5）的IoU重叠。
+正负样本的查找准则如下：
+    - 若anchor与某个真值框之间的Intersection-over-Union（IoU）大于其他anchor与该真值框的IoU，则该anchor是正样本，且被分配给该真值框；
+    - 若anchor与某个真值框之间的IoU大于等于positive_overlap，则该anchor是正样本，且被分配给该真值框；
+    - 若anchor与某个真值框之间的IoU介于[0, negative_overlap)，则该anchor是负样本；
+    - 不满足以上准则的anchor不参与模型训练。

-2.对于所有真实框，当其IoU比率低于negative_overlap（0.4）时，将anchor点分配给背景。
-
-当为锚点分配了第i个类别的真实框时，其C向量目标中的第i项设置为1，所有其他条目设置为0.当anchor被分配支背景时，所有项都设置为0。未被分配的锚点不会影响训练目标。回归目标是与指定anchor相关联的已编码真实框。
+在RetinaNet中，对于每个anchor，模型都会预测一个C维的向量用于分类，和一个4维的向量用于位置回归，因此各正负样本的分类目标值也是一个C维向量，各正样本的位置回归目标值也是一个4维向量。对于正样本而言，若其被分配的真值框的类别是i，则其分类目标值的第i-1维为1，其余维度为0；其位置回归的目标值由anchor和真值框之间位置差值计算得到。对于负样本而言，其分类目标值的所有维度都为0，因负样本不参与位置回归的训练，故负样本无位置回归的目标值。

+分配结束后，从全部anchor的类别预测值cls_logits中取出属于各正负样本的部分，从针对全部anchor的位置预测值bbox_pred中取出属于各正样本的部分。


 参数：
-    - **bbox_pred**  (Variable) – 具有形状[N，M，4]的3-D张量表示M个边界框(bounding box)的预测位置。 N是batch大小，每个边界框有四个坐标值，为[xmin，ymin，xmax，ymax]。
-    - **cls_logits**  (Variable) – 具有形状[N，M，C]的3-D张量，表示预测的置信度。 N是batch大小，C是类别的数量（不包括背景），M是边界框的数量。
-    - **anchor_box**  (Variable) – 具有形状[M，4]的2-D张量，存有M个框，每个框表示为[xmin，ymin，xmax，ymax]，[xmin，ymin]是anchor的左上顶部坐标，如果输入是图像特征图，则它们接近坐标系的原点。 [xmax，ymax]是anchor的右下坐标。
-    - **anchor_var**  (Variable) – 具有形状[M，4]的2-D张量，存有anchor的扩展方差。
-    - **gt_boxes**  (Variable) – 真实框是具有形状[Ng，4]的2D LoDTensor，Ng是mini batch中真实框的总数。
-    - **gt_labels**  (variable) – 真实值标签是具有形状[Ng，1]的2D LoDTensor，Ng是mini batch输入真实值标签的总数。
-    - **is_crowd**  (Variable) – 1-D LoDTensor，标志真实值是聚群。
-    - **im_info**  (Variable) – 具有形状[N，3]的2-D LoDTensor。 N是batch大小，3分别为高度，宽度和比例。
-    - **num_classes**  (int32) – 种类数量。
-    - **positive_overlap**  (float) – 判定（anchor，gt框）对是一个正例的anchor和真实框之间最小重叠阀值。
-    - **negative_overlap**  (float) – （锚点，gt框）对是负例时anchor和真实框之间允许的最大重叠阈值。
+    - **bbox_pred**  (Variable) – 维度为 :math:`[N, M, 4]` 的3-D Tensor，表示全部anchor的位置回归预测值。其中，第一维N表示批量训练时批量内的图片数量，第二维M表示每张图片的全部anchor的数量，第三维4表示每个anchor有四个坐标值。数据类型为float32或float64。
+    - **cls_logits**  (Variable) – 维度为 :math:`[N, M, C]` 的3-D Tensor，表示全部anchor的分类预测值。 其中，第一维N表示批量训练时批量内的图片数量，第二维M表示每张图片的全部anchor的数量，第三维C表示每个anchor需预测的类别数量（ **注意：不包括背景** ）。数据类型为float32或float64。
+
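+    - **anchor_box**  (Variable) – A 2-D Tensor of shape :math:`[M, 4]` holding the coordinates of all anchors. The first dimension M is the number of anchors per image, and the second dimension 4 means each anchor has four coordinate values :math:`[xmin, ymin, xmax, ymax]`, where :math:`[xmin, ymin]` is the top-left corner of the anchor and :math:`[xmax, ymax]` is the bottom-right corner. The data type is float32 or float64. For generating anchor_box, see the OP `anchor_generator <https://www.paddlepaddle.org.cn/documentation/docs/en/1.5/api/layers/detection.html#anchor-generator>`_ .
+    - **anchor_var**  (Variable) – A 2-D Tensor of shape :math:`[M, 4]` holding the scaling factors applied to the anchor coordinates when the loss is computed later. The first dimension M is the number of anchors per image, and the second dimension 4 means each anchor has four coordinate scaling factors. The data type is float32 or float64. For generating anchor_var, see the OP `anchor_generator <https://www.paddlepaddle.org.cn/documentation/docs/en/1.5/api/layers/detection.html#anchor-generator>`_ .
+    - **gt_boxes**  (Variable) – A 2-D LoDTensor of shape :math:`[G, 4]` whose LoD level must be 1, holding the positions of the ground-truth boxes in the batch. The first dimension G is the total number of ground-truth boxes in the batch, and the second dimension means each ground-truth box has four coordinate values. The data type is float32 or float64.
+    - **gt_labels**  (Variable) – A 2-D LoDTensor of shape :math:`[G, 1]` whose LoD level must be 1, holding the classes of the ground-truth boxes in the batch, with values in the range :math:`[1, C]`. The first dimension G is the total number of ground-truth boxes in the batch, and the second dimension means each ground-truth box has exactly one class. The data type is int32.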
+    - **is_crowd**  (Variable) – 维度为 :math:`[G]` 且LoD level必须为1的1-D LoDTensor，表示各真值框是否位于重叠区域，值为1表示重叠，则不参与训练。第一维G表示批量内真值框的总数。数据类型为int32。
+    - **im_info**  (Variable) – 维度为 :math:`[N, 3]` 的2-D Tensor，表示输入图片的尺寸信息。其中，第一维N表示批量训练时批量内的图片数量，第二维3表示各图片的尺寸信息，分别是网络输入尺寸的高和宽，以及原图缩放至网络输入尺寸的缩放比例。数据类型为float32或float64。
+    - **num_classes**  (int32) – 分类的类别数量，默认值为1。
+    - **positive_overlap**  (float32) – 判定anchor是一个正样本时anchor和真值框之间的最小IoU，默认值为0.5。
+    - **negative_overlap**  (float32) – 判定anchor是一个负样本时anchor和真值框之间的最大IoU，默认值为0.4。该参数的设定值应小于等于positive_overlap的设定值，若大于，则positive_overlap的取值为negative_overlap的设定值。


 返回：
-返回元组（predict_scores，predict_location，target_label，target_bbox，bbox_inside_weight，fg_num）。 predict_scores和predict_location是Retinanet的预测结果。target_label和target_bbox为真实值。 predict_location是形为[F，4]的2D张量，target_bbox的形状与predict_location的形状相同，F是前景anchor的数量。 predict_scores是具有形状[F + B，C]的2D张量，target_label的形状是[F + B，1]，B是背景anchor的数量，F和B取决于此算子的输入。 Bbox_inside_weight标志预测位置是否为假前景，形状为[F，4]。 Fg_num是focal loss所需的前景数（包括假前景）。
+    - **predict_scores** (Variable) – 维度为 :math:`[F + B, C]` 的2-D Tensor，表示正负样本的分类预测值。其中，第一维F为批量内正样本的数量，B为批量内负样本的数量，第二维C为分类的类别数量。数据类型为float32或float64。
+    - **predict_location** (Variable) — 维度为 :math:`[F, 4]` 的2-D Tensor，表示正样本的位置回归预测值。其中，第一维F为批量内正样本的数量，第二维4表示每个样本有4个坐标值。数据类型为float32或float64。
+    - **target_label** (Variable) — 维度为 :math:`[F + B, 1]` 的2-D Tensor，表示正负样本的分类目标值。其中，第一维F为正样本的数量，B为负样本的数量，第二维1表示每个样本的真值类别只有1类。数据类型为int32。
+    - **target_bbox** (Variable) — 维度为 :math:`[F, 4]` 的2-D Tensor，表示正样本的位置回归目标值。其中，第一维F为正样本的数量，第二维4表示每个样本有4个坐标值。数据类型为float32或float64。
+    - **bbox_inside_weight** (Variable) — 维度为 :math:`[F, 4]` 的2-D LoDTensor，表示位置回归预测值中是否属于假正样本，若某个正样本为假，则bbox_inside_weight中对应维度的值为0，否则为1。第一维F为正样本的数量，第二维4表示每个样本有4个坐标值。数据类型为float32或float64。
+    - **fg_num** (Variable) — 维度为 :math:`[N, 1]` 的2-D Tensor，表示正样本的数量。其中，第一维N表示批量内的图片数量。 **注意：由于正样本数量会用作后续损失函数的分母，为避免出现除以0的情况，该OP已将每张图片的正样本数量做加1操作** 。数据类型为int32。


-返回类型：tuple
+返回类型：元组(tuple)，元组中的元素predict_scores，predict_location，target_label，target_bbox，bbox_inside_weight，fg_num都是Variable。
+

 **代码示例**

 .. code-block:: python

    import paddle.fluid as fluid
-    bbox_pred = layers.data(name='bbox_pred', shape=[1, 100, 4],
+    import numpy as np
+ 
+    bbox_pred = fluid.layers.data(name='bbox_pred', shape=[1, 100, 4],
                      append_batch_size=False, dtype='float32')
-    cls_logits = layers.data(name='cls_logits', shape=[1, 100, 10],
+    cls_logits = fluid.layers.data(name='cls_logits', shape=[1, 100, 10],
                      append_batch_size=False, dtype='float32')
-    anchor_box = layers.data(name='anchor_box', shape=[100, 4],
+    anchor_box = fluid.layers.data(name='anchor_box', shape=[100, 4],
                      append_batch_size=False, dtype='float32')
-    anchor_var = layers.data(name='anchor_var', shape=[100, 4],
+    anchor_var = fluid.layers.data(name='anchor_var', shape=[100, 4],
                      append_batch_size=False, dtype='float32')
-    gt_boxes = layers.data(name='gt_boxes', shape=[10, 4],
+    gt_boxes = fluid.layers.data(name='gt_boxes', shape=[10, 4],
                      append_batch_size=False, dtype='float32')
-    gt_labels = layers.data(name='gt_labels', shape=[10, 1],
+    gt_labels = fluid.layers.data(name='gt_labels', shape=[10, 1],
                      append_batch_size=False, dtype='int32')
    is_crowd = fluid.layers.data(name='is_crowd', shape=[1],
                      append_batch_size=False, dtype='int32')
-    im_info = fluid.layers.data(name='im_infoss', shape=[1, 3],
+    im_info = fluid.layers.data(name='im_info', shape=[1, 3],
                      append_batch_size=False, dtype='float32')
    score_pred, loc_pred, score_target, loc_target, bbox_inside_weight, fg_num = \
          fluid.layers.retinanet_target_assign(bbox_pred, cls_logits, anchor_box,
          anchor_var, gt_boxes, gt_labels, is_crowd, im_info, 10)
-
-
-
-
-
-
-
-
-
-
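The positive/negative sample criteria and the classification target described above can be sketched as follows. This is a simplified pure-Python illustration, not the OP's implementation: it takes a precomputed IoU matrix, ignores ``is_crowd`` and ``im_info``, and resolves ties by taking the first maximum. ``assign_anchors`` and ``one_hot_target`` are hypothetical helper names.

```python
def assign_anchors(iou, positive_overlap=0.5, negative_overlap=0.4):
    """iou[a][g] is the IoU between anchor a and ground-truth box g.

    Returns one entry per anchor: a ground-truth index (positive sample),
    -1 (negative sample), or None (not used for training).
    """
    num_anchors = len(iou)
    num_gt = len(iou[0]) if iou else 0
    if num_gt == 0:
        return [-1] * num_anchors
    assignment = [None] * num_anchors
    for a in range(num_anchors):
        best_gt = max(range(num_gt), key=lambda g: iou[a][g])
        best = iou[a][best_gt]
        if best >= positive_overlap:
            assignment[a] = best_gt   # IoU >= positive_overlap
        elif best < negative_overlap:
            assignment[a] = -1        # IoU in [0, negative_overlap)
    for g in range(num_gt):
        # The anchor with the highest IoU for a ground-truth box is positive.
        best_anchor = max(range(num_anchors), key=lambda a: iou[a][g])
        assignment[best_anchor] = g
    return assignment


def one_hot_target(label, num_classes):
    """Class i sets dimension i-1 to 1; label 0 (negative) stays all zero."""
    target = [0] * num_classes
    if label > 0:
        target[label - 1] = 1
    return target


iou_matrix = [[0.6, 0.1], [0.45, 0.2], [0.1, 0.3], [0.05, 0.0]]
print(assign_anchors(iou_matrix))  # [0, None, 1, -1]
print(one_hot_target(3, 4))        # [0, 0, 1, 0]
```

In the example, the second anchor falls between the two thresholds and is left out of training, while the third anchor becomes positive only because it is the best match for the second ground-truth box.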
--- a/doc/fluid/api_cn/layers_cn/rsqrt_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/rsqrt_cn.rst
@@ -5,20 +5,23 @@ rsqrt

 .. py:function:: paddle.fluid.layers.rsqrt(x, name=None)

-rsqrt激活函数
+该OP为rsqrt激活函数。

-请确保输入合法以免出现数字错误。
+注：输入x应确保为非 **0** 值，否则程序会抛异常退出。
+
+其运算公式如下：

 .. math::
    out = \frac{1}{\sqrt{x}}


 参数:
+    - **x** (Variable) – The input is a multi-dimensional Tensor or LoDTensor with data type float32 or float64.
+    - **name** (str, optional) - The name of this layer. If None, the layer is named automatically. Default: None.

-    - **x** - rsqrt算子的输入 
-    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn
+返回：对输入x进行rsqrt激活函数计算后的Tensor或LoDTensor，数据shape和输入x的shape一致。

-返回：     rsqrt运算输出
+返回类型：Variable，数据类型和输入数据类型一致。

 **代码示例**：

@@ -28,5 +31,3 @@ rsqrt激活函数
        data = fluid.layers.data(name="input", shape=[32, 784])
        result = fluid.layers.rsqrt(data)

-
-
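For reference, the formula ``out = 1 / sqrt(x)`` can be checked element-wise in plain Python. This sketch is an illustration only: ``rsqrt`` here is a hypothetical list-based helper, and rejecting every non-positive input with ``ValueError`` is an assumption (the documentation above only states that x must be non-zero).

```python
import math


def rsqrt(values):
    """Element-wise reciprocal square root of a list of floats."""
    out = []
    for x in values:
        if x <= 0:
            # Assumption: non-positive inputs are rejected outright.
            raise ValueError("rsqrt expects positive inputs, got %r" % (x,))
        out.append(1.0 / math.sqrt(x))
    return out


print(rsqrt([4.0, 0.25, 1.0]))  # [0.5, 2.0, 1.0]
```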
--- a/doc/fluid/api_cn/layers_cn/sampling_id_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/sampling_id_cn.rst
@@ -5,18 +5,18 @@ sampling_id

 .. py:function:: paddle.fluid.layers.sampling_id(x, min=0.0, max=1.0, seed=0, dtype='float32')

-sampling_id算子。用于从输入的多项分布中对id进行采样的图层。为一个样本采样一个id。
+该OP从输入的多项分布中进行采样。

 参数：
-        - **x** （Variable）- softmax的输入张量（Tensor）。2-D形状[batch_size，input_feature_dimensions]
-        - **min** （Float）- 随机的最小值。（浮点数，默认为0.0）
-        - **max** （Float）- 随机的最大值。（float，默认1.0）
-        - **seed** （Float）- 用于随机数引擎的随机种子。0表示使用系统生成的种子。请注意，如果seed不为0，则此算子将始终每次生成相同的随机数。（int，默认为0）
-        - **dtype** （np.dtype | core.VarDesc.VarType | str）- 输出数据的类型为float32，float_16，int等。
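+        - **x** (Variable) - The input Tensor, a 2-D Tensor of shape [batch_size, input_feature_dimensions].
+        - **min** (float) - The minimum value for random generation. Default: 0.0.
+        - **max** (float) - The maximum value for random generation. Default: 1.0.
+        - **seed** (int) - The random seed. 0 means a system-generated seed is used. Note that if seed is not 0, this operator generates the same random numbers every time. Default: 0.
+        - **dtype** (np.dtype | core.VarDesc.VarType | str) - The data type of the output.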

-返回：       Id采样的数据张量。
+返回：采样的数据Tensor

-返回类型：        输出（Variable）。
+返回类型：Variable


 **代码示例：**

--- a/doc/fluid/api_cn/layers_cn/shuffle_channel_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/shuffle_channel_cn.rst
@@ -5,9 +5,7 @@ shuffle_channel

 .. py:function:: paddle.fluid.layers.shuffle_channel(x, group, name=None)

-**Shuffle Channel 运算（通道重排运算）**
-
-该算子将输入 ``x`` 的通道混洗重排。 它将每个组中的输入通道分成 ``group`` 个子组，并通过逐个从每个子组中选择元素来获得新的顺序。
+该OP将输入 ``x`` 的通道混洗重排。 它将每个组中的输入通道分成 ``group`` 个子组，并通过逐一从每个子组中选择元素来获得新的顺序。

 请参阅 https://arxiv.org/pdf/1707.01083.pdf

@@ -45,12 +43,12 @@ shuffle_channel
                  [0.8, 0.9]]]]

 参数：
-  - **x** (Variable) – 输入张量变量。 应是形状为[N，C，H，W]的4-D张量
+  - **x** (Variable) – 输入Tensor。 维度为[N，C，H，W]的4-D Tensor。
  - **group** (int) – 表示子组的数目，它应该整除通道数。

-返回：通道混洗结果是一个张量变量，其形状和类型与输入相同。
+返回：一个形状和类型与输入相同的Tensor。

-返回类型：输出（Variable）
+返回类型：Variable


 **代码示例：**

--- a/doc/fluid/api_cn/layers_cn/sigmoid_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/sigmoid_cn.rst
@@ -11,22 +11,35 @@ sigmoid激活函数
    out = \frac{1}{1 + e^{-x}}


-参数:
+参数：

-    - **x** - Sigmoid算子的输入
-    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn
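+    - **x** (Tensor|LoDTensor) - The input of the activation function, with data type float32 or float64.
+    - **name** (str|None) - The name of this layer (optional). If None, the layer is named automatically. Default: None.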

-返回：     Sigmoid运算输出.
+返回：激活函数的输出值
+
+返回类型：Variable（Tensor），数据类型为float32的Tensor。

 **代码示例**：

 .. code-block:: python

        import paddle.fluid as fluid
-        data = fluid.layers.data(name="input", shape=[32, 784])
-        result = fluid.layers.sigmoid(data)
-
+        import numpy as np

+        data = fluid.layers.data(name="input", shape=[-1, 3])
+        result = fluid.layers.sigmoid(data)
+        place = fluid.CPUPlace()
+        exe = fluid.Executor(place)
+        exe.run(fluid.default_startup_program())
+        x = np.random.rand(3, 3).astype("float32")  # match the float32 input variable
+        output = exe.run(feed={"input": x},
+                         fetch_list=[result[0]])
+        print(output)
+        """
+        output:
+        [array([0.50797188, 0.71353652, 0.5452265 ])]
+        """




--- a/doc/fluid/api_cn/layers_cn/sigmoid_focal_loss_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/sigmoid_focal_loss_cn.rst
@@ -5,35 +5,36 @@ sigmoid_focal_loss

 .. py:function:: paddle.fluid.layers.sigmoid_focal_loss(x, label, fg_num, gamma=2, alpha=0.25)

-**Sigmoid Focal loss损失计算**
+`Focal Loss <https://arxiv.org/abs/1708.02002>`_ 被提出用于解决计算机视觉任务中前景-背景不平衡的问题。该OP先计算输入x中每个元素的sigmoid值，然后计算sigmoid值与类别目标值label之间的Focal Loss。

-focal损失用于解决在one-stage探测器的训练阶段存在的前景 - 背景类不平衡问题。 此运算符计算输入张量中每个元素的sigmoid值，然后计算focal损失。
-
-focal损失计算过程：
+Focal Loss的计算过程如下：

 .. math::

-  loss_j = (-label_j * alpha * {(1 - \sigma(x_j))}^{gamma} * \log(\sigma(x_j)) -
-  (1 - labels_j) * (1 - alpha) * {(\sigma(x_j)}^{ gamma} * \log(1 - \sigma(x_j)))
-  / fg\_num, j = 1,...,K
+  \mathop{loss_{i,\,j}}\limits_{i\in\mathbb{[0,\,N-1]},\,j\in\mathbb{[0,\,C-1]}}=\left\{
+  \begin{array}{rcl}
+  - \frac{1}{fg\_num} * \alpha * {(1 - \sigma(x_{i,\,j}))}^{\gamma} * \log(\sigma(x_{i,\,j})) & & {(j +1) = label_{i,\,0}}\\
+  - \frac{1}{fg\_num} * (1 - \alpha) * {\sigma(x_{i,\,j})}^{ \gamma} * \log(1 - \sigma(x_{i,\,j})) & & {(j +1) \neq label_{i,\,0}}
+  \end{array} \right.

 其中，已知：

 .. math::

-  \sigma(x_j) = \frac{1}{1 + \exp(-x_j)}
+  \sigma(x_{i,\,j}) = \frac{1}{1 + \exp(-x_{i,\,j})}
+

 参数：
-    - **x**  (Variable) – 具有形状[N，D]的2-D张量，其中N是batch大小，D是类的数量（不包括背景）。 此输入是由前一个运算符计算出的logits张量。
-    - **label**  (Variable) – 形状为[N，1]的二维张量，是所有可能的标签。
-    - **fg_num**  (Variable) – 具有形状[1]的1-D张量，是前景的数量。
-    - **gamma**  (float) –  用于平衡简单和复杂实例的超参数。 默认值设置为2.0。
-    - **alpha**  (float) – 用于平衡正面和负面实例的超参数。 默认值设置为0.25。
+    - **x**  (Variable) – 维度为 :math:`[N, D]` 的2-D Tensor，表示全部样本的分类预测值。其中，第一维N是批量内参与训练的样本数量，例如在目标检测中，样本为框级别，N为批量内所有图像的正负样本的数量总和；在图像分类中，样本为图像级别，N为批量内的图像数量总和。第二维D是类别数量（ **不包括背景类** ）。数据类型为float32或float64。
+    - **label**  (Variable) – 维度为 :math:`[N, 1]` 的2-D Tensor，表示全部样本的分类目标值。其中，第一维N是批量内参与训练的样本数量，第二维1表示每个样本只有一个类别目标值。正样本的目标类别值的取值范围是 :math:`[1, D]` , 负样本的目标类别值是0。数据类型为int32。
+    - **fg_num**  (Variable) – 维度为 :math:`[1]` 的1-D Tensor，表示批量内正样本的数量，需在进入此OP前获取正样本的数量。数据类型为int32。
+    - **gamma**  (float) –  用于平衡易分样本和难分样本的超参数， 默认值设置为2.0。
+    - **alpha**  (float) – 用于平衡正样本和负样本的超参数，默认值设置为0.25。


-返回：  具有形状[N，D]的2-D张量，即focal损失。
+返回：  输入x中每个元素的Focal loss，即维度为 :math:`[N, D]` 的2-D Tensor。

-返回类型： out(Variable)
+返回类型： 变量（Variable），数据类型为float32或float64。

 **代码示例**

@@ -53,7 +54,3 @@ focal损失计算过程：
                                           fg_num=fg_num,
                                           gamma=2.,
                                           alpha=0.25)
-
-
-
-
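The piecewise Focal Loss formula documented in sigmoid_focal_loss_cn.rst above can be sketched in plain NumPy. This is only an illustrative reference under stated assumptions, not the operator's actual kernel: the helper name `sigmoid_focal_loss_ref` is hypothetical, `fg_num` is taken as a plain Python number rather than a 1-D Tensor, and no numerical-stability tricks are applied.

```python
import numpy as np


def sigmoid_focal_loss_ref(x, label, fg_num, gamma=2.0, alpha=0.25):
    """Hypothetical NumPy reference for the documented formula.

    x:      [N, D] logits (D excludes the background class)
    label:  [N, 1] integers in [0, D]; 0 marks a negative (background) sample
    fg_num: number of positive samples in the batch
    """
    x = np.asarray(x, dtype=np.float64)
    label = np.asarray(label).reshape(-1, 1)
    num_classes = x.shape[1]

    p = 1.0 / (1.0 + np.exp(-x))  # sigma(x_{i,j})

    # Column j is the target of sample i when (j + 1) == label[i, 0].
    is_target = np.arange(1, num_classes + 1)[None, :] == label

    target_term = -alpha * (1.0 - p) ** gamma * np.log(p)
    other_term = -(1.0 - alpha) * p ** gamma * np.log(1.0 - p)
    return np.where(is_target, target_term, other_term) / fg_num
```

With all-zero logits every sigmoid value is 0.5, so each entry reduces to `alpha * 0.25 * ln 2` for the target column and `(1 - alpha) * 0.25 * ln 2` elsewhere, which gives a quick sanity check of the two branches.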
--- a/doc/fluid/api_cn/layers_cn/sign_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/sign_cn.rst
@@ -5,14 +5,14 @@ sign

 .. py:function:: paddle.fluid.layers.sign(x)

-此函数返回x中每个元素的正负号：1代表正，-1代表负，0代表零。
+此OP对输入x中每个元素进行正负判断，并且输出正负判断值：1代表正，-1代表负，0代表零。

 参数：
-    - **x** (Variable|numpy.ndarray) – 输入张量。
+    - **x** (Variable|numpy.ndarray) – 进行正负值判断的多维Tensor或者是多维的numpy数组，数据类型为 float32，float64。

-返回：输出正负号张量，和x有着相同的形状和数据类型。
+返回：输出正负号Tensor，数据的shape大小和输入x的数据shape一致。

-返回类型：Variable
+返回类型：Variable，数据类型和输入数据类型一致。

 **代码示例**

@@ -21,10 +21,6 @@ sign
    import paddle.fluid as fluid
    import numpy as np

-    # [1, 0, -1]
-    data = fluid.layers.sign(np.array([3, 0, -2], dtype='int32'))
-
-
-
-
+    data = fluid.layers.sign(np.array([3.0, 0.0, -2.0], dtype='float32'))
+    # data=[1.0, 0.0, -1.0]

--- a/doc/fluid/api_cn/layers_cn/slice_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/slice_cn.rst
@@ -31,8 +31,8 @@ slice算子。
 参数：
        - **input** （Variable）- 提取切片的数据张量（Tensor）。
        - **axes** （List）- （list <int>）开始和结束的轴适用于。它是可选的。如果不存在，将被视为[0,1，...，len（starts）- 1]。
-        - **starts** （List）- （list <int>）在轴上开始相应轴的索引。
-        - **ends** （List）- （list <int>）在轴上结束相应轴的索引。
+        - **starts** （List|Variable）- （list <int>）在轴上开始相应轴的索引。
+        - **ends** （List|Variable）- （list <int>）在轴上结束相应轴的索引。

 返回：        切片数据张量（Tensor）.

@@ -45,15 +45,23 @@ slice算子。

    import paddle.fluid as fluid

-    starts = [1, 0, 2]
-    ends = [3, 3, 4]
-    axes = [0, 1, 2]
-
    input = fluid.layers.data(
        name="input", shape=[3, 4, 5, 6], dtype='float32')

-    out = fluid.layers.slice(input, axes=axes, starts=starts, ends=ends)
+    # example 1:
+    # attr starts is a list which doesn't contain any tensor Variable.
+    axes = [0, 1, 2]
+    starts = [-3, 0, 2]
+    ends = [3, 2, 4]
+    sliced_1 = fluid.layers.slice(input, axes=axes, starts=starts, ends=ends)
+    # sliced_1 is input[:, 0:3, 0:2, 2:4].
+

+    # example 2:
+    # attr starts is a list which contains a tensor Variable.
+    minus_3 = fluid.layers.fill_constant([1], "int32", -3)
+    sliced_2 = fluid.layers.slice(input, axes=axes, starts=[minus_3, 0, 2], ends=ends)
+    # sliced_2 is input[:, 0:3, 0:2, 2:4].




--- a/doc/fluid/api_cn/layers_cn/smooth_l1_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/smooth_l1_cn.rst
@@ -9,26 +9,39 @@ smooth_l1


 参数:
-        - **x** (Variable) - rank至少为2的张量。输入x的smmoth L1 loss 的op，shape为[batch_size, dim1，…],dimN]。
-        - **y** (Variable) - rank至少为2的张量。与 ``x`` 形状一致的的smooth L1 loss  op目标值。
-        - **inside_weight** (Variable|None) - rank至少为2的张量。这个输入是可选的，与x的形状应该相同。如果给定， ``(x - y)`` 的结果将乘以这个张量元素。
-        - **outside_weight** (变量|None) - 一个rank至少为2的张量。这个输入是可选的，它的形状应该与 ``x`` 相同。如果给定，那么 smooth L1 loss 就会乘以这个张量元素。
-        - **sigma** (float|None) - smooth L1 loss layer的超参数。标量，默认值为1.0。
+        - **x** (Tensor|LoDTensor) - 数据类型为float32，rank至少为2的张量。smooth L1损失函数的输入，shape为[batch_size, dim1，…，dimN]。
+        - **y** (Tensor|LoDTensor) - 数据类型为float32，rank至少为2的张量。与 ``x`` shape相同的目标值。
+        - **inside_weight** (Tensor|None) - 数据类型为float32，rank至少为2的张量。这个输入是可选的，与x的shape应该相同。如果给定， ``(x - y)`` 的结果将乘以这个张量元素。
+        - **outside_weight** (Tensor|None) - 数据类型为float32，一个rank至少为2的张量。这个输入是可选的，它的shape应该与 ``x`` 相同。 smooth L1 loss的输出会乘以这个张量。
+        - **sigma** (float|NoneType) - smooth L1 loss layer的超参数。标量，默认值为1.0。

-返回： smooth L1 loss, shape为 [batch_size, 1]
+返回： smooth L1损失的输出值, shape为 [batch_size, 1]

-返回类型:  Variable
+返回类型：Variable（Tensor），数据类型为float32的Tensor。

 **代码示例**

 ..  code-block:: python
-
+    
    import paddle.fluid as fluid
-    data = fluid.layers.data(name='data', shape=[128], dtype='float32')
-    label = fluid.layers.data(
-        name='label', shape=[100], dtype='float32')
-    fc = fluid.layers.fc(input=data, size=100)
-    out = fluid.layers.smooth_l1(x=fc, y=label)
+    import numpy as np
+    data = fluid.layers.data(name="x", shape=[-1, 3], dtype="float32")
+    label = fluid.layers.data(name="y", shape=[-1, 3], dtype="float32")
+    result = fluid.layers.smooth_l1(data,label)
+    place = fluid.CPUPlace()
+    exe = fluid.Executor(place)
+    exe.run(fluid.default_startup_program())
+    x = np.random.rand(3,3).astype("float32")
+    y = np.random.rand(3,3).astype("float32")
+    output= exe.run(feed={"x":x, "y":y},
+                     fetch_list=[result])
+    print(output)
+    """
+    output:
+    [array([[0.08220536],
+           [0.36652038],
+           [0.20541131]], dtype=float32)]
+    """




--- a/doc/fluid/api_cn/layers_cn/softmax_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/softmax_cn.rst
@@ -5,41 +5,50 @@ softmax

 .. py:function:: paddle.fluid.layers.softmax(input, use_cudnn=False, name=None, axis=-1)

-softmax操作符的输入是任意阶的张量，输出张量和输入张量的维度相同。
+softmax输出张量和输入张量的维度相同。

-输入变量的 ``axis`` 维会被排列到最后一维。然后逻辑上将输入张量压平至二维矩阵。矩阵的第二维（行数）和输入张量的 ``axis`` 维相同。第一维（列数）
-是输入张量除最后一维之外的所有维长度乘积。对矩阵的每一行来说,softmax操作将含有任意实数值的K维向量(K是矩阵的宽度,也就是输入张量 ``axis`` 维度的大小)压缩成K维含有取值为[0,1]中实数的向量，并且这些值和为1。
+输入变量的 ``axis`` 维会被排列到最后一维。在OP底层计算上将输入张量resize成二维矩阵。矩阵的第二维（行长度）和输入张量的 ``axis`` 维相同，第一维（列长度）
+是输入张量除最后一维之外的所有维长度乘积。对矩阵的每一行来说，softmax函数将含有任意实数值的K维向量（K是矩阵的宽度，也就是输入张量 ``axis`` 维度的大小）压缩成K维含有取值为[0,1]的向量，并且这些值和为1。

-
-softmax操作符计算k维向量输入中所有其他维的指数和指数值的累加和。维的指数比例和所有其他维的指数值之和作为softmax操作符的输出。
+softmax函数计算K维向量输入中指定维的指数值，以及所有维的指数值的累加和。指定维的指数值与所有维的指数值之和的比值即为softmax函数的输出。

 对矩阵中的每行i和每列j有：

 .. math::

+
    Out[i,j] = \frac{exp(X[i,j])}{\sum_j exp(X[i,j])}

 参数：
-    - **input** (Variable) - 输入变量
-    - **use_cudnn** (bool) - 是否用cudnn核，只有在cudnn库安装时有效。为了数学稳定性，默认该项为False。
+    - **input** (Tensor|LoDTensor)- 数据类型为float32，float64。激活函数softmax的输入。
+    - **use_cudnn** (bool) - 是否使用cudnn，只有cudnn库安装时该参数才有效。为了底层数学计算的稳定性，默认该项为False。
    - **name** (str|None) - 该层名称（可选）。若为空，则自动为该层命名。默认：None
-    - **axis** (int) - 执行softmax计算的维度索引，应该在 :math:`[-1，rank-1]` 范围内，其中rank是输入变量的秩。 默认值：-1。
+    - **axis** (int) - 指定softmax计算的轴，应该在 :math:`[-1，rank-1]` 范围内，其中rank是输入变量的秩。 默认值：-1。-1为最后一维。

-返回： softmax输出
+返回： softmax函数的输出。

-返回类型：变量（Variable）
+返回类型：Variable（Tensor），数据类型为float32或float64的Tensor。

 **代码示例**

 .. code-block:: python

    import paddle.fluid as fluid
-    x = fluid.layers.data(name='x', shape=[2], dtype='float32')
-    fc = fluid.layers.fc(input=x, size=10)
-    # 在第二维执行softmax
-    softmax = fluid.layers.softmax(input=fc, axis=1)
-    # 在最后一维执行softmax
-    softmax = fluid.layers.softmax(input=fc, axis=-1)
+    import numpy as np
+
+    data = fluid.layers.data(name="input", shape=[-1, 3],dtype="float32")
+    result = fluid.layers.softmax(data,axis=1)
+    place = fluid.CPUPlace()
+    exe = fluid.Executor(place)
+    exe.run(fluid.default_startup_program())
+    x = np.random.rand(3, 3).astype("float32")
+    output= exe.run(feed={"input": x},
+                     fetch_list=[result[0]])
+    print(output)
+    """
+    output:
+    [array([0.22595254, 0.39276356, 0.38128382], dtype=float32)]
+    """




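The formula `Out[i,j] = exp(X[i,j]) / sum_j exp(X[i,j])` from softmax_cn.rst above can be checked with a small NumPy sketch. The helper name `softmax_ref` is hypothetical, and subtracting the per-slice maximum is a common stability trick assumed here (it does not change the result); it is not a statement about how the Paddle kernel is implemented.

```python
import numpy as np


def softmax_ref(x, axis=-1):
    """Hypothetical NumPy reference: softmax along `axis`."""
    x = np.asarray(x, dtype=np.float64)
    # Shifting by the maximum leaves the ratio unchanged but avoids overflow in exp.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp_x = np.exp(shifted)
    return exp_x / exp_x.sum(axis=axis, keepdims=True)
```

Every slice taken along `axis` of the result lies in [0, 1] and sums to 1, which matches the description of the K-dimensional output vector.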
--- a/doc/fluid/api_cn/layers_cn/softplus_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/softplus_cn.rst
@@ -5,24 +5,26 @@ softplus

 .. py:function:: paddle.fluid.layers.softplus(x,name=None)

-softplus激活函数。
+softplus激活函数

 .. math::
    out = \ln(1 + e^{x})

 参数：
-    - **x** - Softplus操作符的输入
-    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn
+    - **x** (Variable) - 张量（Tensor）
+    - **name** (str|None) - 该层名称（可选）。若设为None，则自动为该层命名。

-返回：Softplus操作后的结果
+返回: 张量(Tensor)
+
+返回类型: 变量(Variable)

 **代码示例**：

 .. code-block:: python

-        import paddle.fluid as fluid
-        data = fluid.layers.data(name="input", shape=[32, 784])
-        result = fluid.layers.softplus(data)
+    import paddle.fluid as fluid
+    data = fluid.layers.data(name="input", shape=[32, 784])
+    result = fluid.layers.softplus(data)




--- a/doc/fluid/api_cn/layers_cn/softshrink_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/softshrink_cn.rst
@@ -3,30 +3,32 @@
 softshrink
 -------------------------------

-.. py:function:: paddle.fluid.layers.softshrink(x, name=None)
+.. py:function:: paddle.fluid.layers.softshrink(x, alpha=None)

-Softshrink激活算子
+Softshrink激活函数

 .. math::
-        out = \begin{cases}
-                    x - \lambda, \text{if } x > \lambda \\
-                    x + \lambda, \text{if } x < -\lambda \\
-                    0,  \text{otherwise}
-              \end{cases}
+    out = \begin{cases}
+        x - \alpha, \text{if } x > \alpha \\
+        x + \alpha, \text{if } x < -\alpha \\
+        0,  \text{otherwise}
+        \end{cases}

 参数：
-        - **x** - Softshrink算子的输入
-        - **lambda** （FLOAT）- 非负偏移量。
+    - **x** (Variable) - 张量（Tensor）
+    - **alpha** (float) - 上面公式中alpha的值

-返回：       Softshrink算子的输出
+返回: 张量(Tensor)
+
+返回类型: 变量(Variable)

 **代码示例**：

 .. code-block:: python

-        import paddle.fluid as fluid
-        data = fluid.layers.data(name="input", shape=[32, 784])
-        result = fluid.layers.softshrink(data)
+    import paddle.fluid as fluid
+    data = fluid.layers.data(name="input", shape=[32, 784])
+    result = fluid.layers.softshrink(data)




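The three-case Softshrink formula in softshrink_cn.rst above maps directly onto nested `np.where` calls. This is a sketch only: the helper name `softshrink_ref` is hypothetical, and `alpha` is a required argument here because the diff does not state what value is used when `alpha=None`.

```python
import numpy as np


def softshrink_ref(x, alpha):
    """Hypothetical NumPy reference for the documented Softshrink formula."""
    x = np.asarray(x, dtype=np.float64)
    return np.where(x > alpha, x - alpha,          # x > alpha   -> x - alpha
                    np.where(x < -alpha, x + alpha,  # x < -alpha  -> x + alpha
                             0.0))                   # otherwise   -> 0
```

Values whose magnitude is at most `alpha` are zeroed, and everything else is pulled toward zero by `alpha`.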
--- a/doc/fluid/api_cn/layers_cn/softsign_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/softsign_cn.rst
@@ -6,25 +6,26 @@ softsign
 .. py:function:: paddle.fluid.layers.softsign(x,name=None)


-softsign激活函数。
+softsign激活函数

 .. math::
    out = \frac{x}{1 + |x|}

 参数：
-    - **x** : Softsign操作符的输入
-    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn
+    - **x** (Variable) - 张量（Tensor）
+    - **name** (str|None) - 该层名称（可选）。若设为None，则自动为该层命名。

+返回: 张量(Tensor)

-返回：Softsign操作后的结果
+返回类型: 变量(Variable)

 **代码示例**：

 .. code-block:: python

-        import paddle.fluid as fluid
-        data = fluid.layers.data(name="input", shape=[32, 784])
-        result = fluid.layers.softsign(data)
+    import paddle.fluid as fluid
+    data = fluid.layers.data(name="input", shape=[32, 784])
+    result = fluid.layers.softsign(data)




--- a/doc/fluid/api_cn/layers_cn/square_error_cost_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/square_error_cost_cn.rst
@@ -5,26 +5,19 @@ square_error_cost

 .. py:function:: paddle.fluid.layers.square_error_cost(input,label)

-方差估计层（Square error cost layer）
+该OP用于计算预测值和目标值的方差估计。

-该层接受输入预测值和目标值，并返回方差估计
-
-对于预测值X和目标值Y，公式为：
+对于预测值input和目标值label，公式为：

 .. math::

-    Out = (X-Y)^{2}
-
-在以上等式中：
-    - **X** : 输入预测值，张量（Tensor)
-    - **Y** : 输入目标值，张量（Tensor）
-    - **Out** : 输出值，维度和X的相同
+    Out = (input-label)^{2}

 参数：
-    - **input** (Variable) - 输入张量（Tensor），带有预测值
-    - **label** (Variable) - 标签张量（Tensor），带有目标值
+    - **input** (Variable) - 预测值，维度为 :math:`[N_1, N_2, ..., N_k, D]` 的多维Tensor，其中最后一维D是类别数目。数据类型为float32或float64。
+    - **label** (Variable) - 目标值，维度为 :math:`[N_1, N_2, ..., N_k, D]` 的多维Tensor，其中最后一维D是类别数目。数据类型为float32或float64。

-返回：张量变量，存储输入张量和标签张量的方差
+返回：预测值和目标值的方差

 返回类型：变量（Variable）


--- a/doc/fluid/api_cn/layers_cn/stanh_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/stanh_cn.rst
@@ -3,7 +3,7 @@
 stanh
 -------------------------------

-.. py:function:: paddle.fluid.layers.stanh(x, scale_a=0.6666666666666666, scale_b=1.7159, name=None)
+.. py:function:: paddle.fluid.layers.stanh(x, scale_a=0.67, scale_b=1.7159, name=None)

 STanh 激活算子（STanh Activation Operator.）

@@ -11,26 +11,35 @@ STanh 激活算子（STanh Activation Operator.）
          \\out=b*\frac{e^{a*x}-e^{-a*x}}{e^{a*x}+e^{-a*x}}\\

 参数：
-    - **x** (Variable) - STanh operator的输入
-    - **scale_a** (FLOAT|2.0 / 3.0) - 输入的a的缩放参数
-    - **scale_b** (FLOAT|1.7159) - b的缩放参数
-    - **name** (str|None) - 这个层的名称(可选)。如果设置为None，该层将被自动命名。
+    - **x** (Tensor|LoDTensor) - 数据类型为float32,float64。STanh operator的输入
+    - **scale_a** (float) - 输入的a的缩放参数
+    - **scale_b** (float) - b的缩放参数
+    - **name** (str|None) - 这个层的名称(可选)。如果设置为None，该层将被自动命名

-返回: STanh操作符的输出
+返回: 与输入shape相同的张量

-返回类型: 输出(Variable)
+返回类型: Variable（Tensor），数据类型为float32的Tensor。

 **代码示例：**

 .. code-block:: python

    import paddle.fluid as fluid
-    x = fluid.layers.data(name="x", shape=[3,10,32,32], dtype="float32")
-    y = fluid.layers.stanh(x, scale_a=0.67, scale_b=1.72)
-
-
-
-
-
+    import numpy as np
+    data = fluid.layers.data(name="input", shape=[-1, 3])
+    result = fluid.layers.stanh(data,scale_a=0.67, scale_b=1.72)
+    place = fluid.CPUPlace()
+    exe = fluid.Executor(place)
+    exe.run(fluid.default_startup_program())
+    x = np.random.random(size=(3, 3)).astype('float32')
+    output= exe.run(feed={"input": x},
+                 fetch_list=[result])
+    print(output)
+    """
+    output:
+    [array([[0.626466  , 0.89842904, 0.7501062 ],
+           [0.25147712, 0.7484996 , 0.22902708],
+           [0.62705994, 0.23110689, 0.56902856]], dtype=float32)]
+    """


--- a/doc/fluid/api_cn/layers_cn/sums_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/sums_cn.rst
@@ -5,43 +5,50 @@ sums

 .. py:function:: paddle.fluid.layers.sums(input,out=None)

-该函数对输入进行求和，并返回求和结果作为输出。
+该OP计算多个输入Tensor逐个元素相加的和。

-参数：
-    - **input** (Variable|list)-输入张量，有需要求和的元素
-    - **out** (Variable|None)-输出参数。求和结果。默认：None
-
-返回：输入的求和。和参数'out'等同
-
-返回类型：变量（Variable）
-
-**代码示例**：
+- 示例：3个Tensor求和

 .. code-block:: python

-    import paddle.fluid as fluid
-     
-    # sum of several tensors
-    a0 = fluid.layers.fill_constant(shape=[1], dtype='int64', value=1)
-    a1 = fluid.layers.fill_constant(shape=[1], dtype='int64', value=2)
-    a2 = fluid.layers.fill_constant(shape=[1], dtype='int64', value=3)
-    sums = fluid.layers.sums(input=[a0, a1, a2])
+  输入：
+      x0.shape = [2, 3]
+      x0.data = [[1., 2., 3.],
+                 [4., 5., 6.]]
+      x1.shape = [2, 3]
+      x1.data = [[10., 20., 30.],
+                 [40., 50., 60.]]
+      x2.shape = [2, 3]
+      x2.data = [[100., 200., 300.],
+                 [400., 500., 600.]]

-    # sum of a tensor array
-    array = fluid.layers.create_array('int64')
-    i = fluid.layers.zeros(shape=[1], dtype='int64', force_cpu=True)
-    fluid.layers.array_write(a0, array=array, i=i)
-    i = fluid.layers.increment(x=i)
-    fluid.layers.array_write(a1, array=array, i=i)
-    i = fluid.layers.increment(x=i)
-    fluid.layers.array_write(a2, array=array, i=i)
-    sums = fluid.layers.sums(input=array)
+  输出：
+      out.shape = [2, 3]
+      out.data = [[111., 222., 333.],
+                  [444., 555., 666.]]


+参数：
+    - **input** (list) - 多个维度相同的Tensor组成的列表。支持的数据类型：float32，float64，int32，int64。
+    - **out** (Variable，可选) - 指定求和的结果Tensor，可以是程序中已经创建的任何Variable。默认值为None，此时将创建新的Variable来保存输出结果。

+返回：输入的和，数据类型和维度与输入Tensor相同。若 ``out`` 为 ``None`` ，返回值是一个新的Variable；否则，返回值就是 ``out`` 。

+返回类型：Variable

+**代码示例**：

+.. code-block:: python
+
+    import paddle.fluid as fluid

+    x0 = fluid.layers.fill_constant(shape=[16, 32], dtype='int64', value=1)
+    x1 = fluid.layers.fill_constant(shape=[16, 32], dtype='int64', value=2)
+    x2 = fluid.layers.fill_constant(shape=[16, 32], dtype='int64', value=3)
+    x3 = fluid.layers.fill_constant(shape=[16, 32], dtype='int64', value=0)

+    # 多个Tensor求和，结果保存在一个新建的Variable sum0，即sum0=x0+x1+x2，值为[[6, ..., 6], ..., [6, ..., 6]]
+    sum0 = fluid.layers.sums(input=[x0, x1, x2])

+    # 多个Tensor求和，sum1和x3是同一个Variable，相当于x3=x0+x1+x2，值为[[6, ..., 6], ..., [6, ..., 6]]
+    sum1 = fluid.layers.sums(input=[x0, x1, x2], out=x3)
--- a/doc/fluid/api_cn/layers_cn/tanh_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/tanh_cn.rst
@@ -6,9 +6,7 @@ tanh
 .. py:function:: paddle.fluid.layers.tanh(x, name=None)


-
-
-tanh 激活函数。
+tanh 激活函数

 .. math::
    out = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}
@@ -17,17 +15,19 @@ tanh 激活函数。
 参数:

    - **x** - Tanh算子的输入
-    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn
+    - **name** (str|None) - 该层名称（可选）。若设为None，则自动为该层命名。
+
+返回: 张量(Tensor)

-返回：     Tanh算子的输出。
+返回类型: 变量(Variable)

 **代码示例**：

 .. code-block:: python

-        import paddle.fluid as fluid
-        data = fluid.layers.data(name="input", shape=[32, 784])
-        result = fluid.layers.tanh(data)
+    import paddle.fluid as fluid
+    data = fluid.layers.data(name="input", shape=[32, 784])
+    result = fluid.layers.tanh(data)




--- a/doc/fluid/api_cn/layers_cn/tanh_shrink_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/tanh_shrink_cn.rst
@@ -5,7 +5,7 @@ tanh_shrink

 .. py:function:: paddle.fluid.layers.tanh_shrink(x, name=None)

-tanh_shrink激活函数。
+tanh_shrink激活函数

 .. math::
    out = x - \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}
@@ -13,17 +13,19 @@ tanh_shrink激活函数。
 参数:

    - **x** - TanhShrink算子的输入
-    - **use_cudnn** (BOOLEAN) – （bool，默认为false）是否仅用于cudnn核，需要安装cudnn
+    - **name** (str|None) - 该层名称（可选）。若设为None，则自动为该层命名。

-返回：     tanh_shrink算子的输出
+返回: 张量(Tensor)
+
+返回类型: 变量(Variable)

 **代码示例**：

 .. code-block:: python

-        import paddle.fluid as fluid
-        data = fluid.layers.data(name="input", shape=[32, 784])
-        result = fluid.layers.tanh_shrink(data)
+    import paddle.fluid as fluid
+    data = fluid.layers.data(name="input", shape=[32, 784])
+    result = fluid.layers.tanh_shrink(data)




--- a/doc/fluid/api_cn/layers_cn/tree_conv_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/tree_conv_cn.rst
-.. _cn_api_fluid_layers_tree_conv:
-
-tree_conv
-------------------------------
-
-.. py:function:: paddle.fluid.layers.tree_conv(nodes_vector, edge_set, output_size, num_filters=1, max_depth=2, act='tanh', param_attr=None, bias_attr=None, name=None)
-
-基于树结构的卷积Tree-Based Convolution运算。
-
-基于树的卷积是基于树的卷积神经网络（TBCNN，Tree-Based Convolution Neural Network）的一部分，它用于对树结构进行分类，例如抽象语法树。 Tree-Based Convolution提出了一种称为连续二叉树的数据结构，它将多路（multiway）树视为二叉树。 提出基于树的卷积论文： https：//arxiv.org/abs/1409.5718v1
-
-参数：
-    - **nodes_vector**  (Variable) – (Tensor) 树上每个节点的特征向量(vector)。特征向量的形状必须为[max_tree_node_size，feature_size]
-    - **edge_set**  (Variable) – (Tensor) 树的边。边必须带方向。边集的形状必须是[max_tree_node_size，2]
-    - **output_size**  (int) – 输出特征宽度
-    - **num_filters**  (int) – filter数量，默认值1
-    - **max_depth**  (int) – filter的最大深度，默认值2
-    - **act**  (str) – 激活函数，默认 tanh
-    - **param_attr**  (ParamAttr) – filter的参数属性，默认None
-    - **bias_attr**  (ParamAttr) – 此层bias的参数属性，默认None
-    - **name**  (str) – 此层的名称（可选）。如果设置为None，则将自动命名层，默认为None
-
-
-返回： （Tensor）子树的特征向量。输出张量的形状是[max_tree_node_size，output_size，num_filters]。输出张量可以是下一个树卷积层的新特征向量
-
-返回类型：out（Variable）
-
-**代码示例**:
-
-.. code-block:: python
-    
-    import paddle.fluid as fluid
-    # 10 代表数据集的最大节点大小max_node_size，5 代表向量宽度
-    nodes_vector = fluid.layers.data(name='vectors', shape=[10, 5], dtype='float32')
-    # 10 代表数据集的最大节点大小max_node_size, 2 代表每条边连接两个节点
-    # 边必须为有向边
-    edge_set = fluid.layers.data(name='edge_set', shape=[10, 2], dtype='float32')
-
-    # 输出的形状会是[None, 10, 6, 1],
-    # 10 代表数据集的最大节点大小max_node_size, 6 代表输出大小output size, 1 代表 1 个filter
-    
-    out_vector = fluid.layers.tree_conv(nodes_vector, edge_set, 6, 1, 2)
-    # reshape之后, 输出张量output tensor为下一个树卷积的nodes_vector
-    out_vector = fluid.layers.reshape(out_vector, shape=[-1, 10, 6])
-    
-    
-    out_vector_2 = fluid.layers.tree_conv(out_vector, edge_set, 3, 4, 2)
-    
-    # 输出tensor也可以用来池化(论文中称为global pooling)
-    pooled = fluid.layers.reduce_max(out_vector, dims=2) # 全局池化
-
-
-
-
-
-
--- a/doc/fluid/api_cn/layers_cn/unique_with_counts_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/unique_with_counts_cn.rst
@@ -4,16 +4,20 @@ unique_with_counts
 -------------------------------

 .. py:function:: paddle.fluid.layers.unique_with_counts(x, dtype='int32')
+该OP对输入Tensor元素进行去重，获取去重后结果Tensor，同时获取去重后结果在原始输入中的计数Tensor以及在原始输入中的索引Tensor。

-unique_with_count为 ``x`` 返回一个unique张量和一个指向该unique张量的索引以及 ``x`` 中unique元素的数量。
+注:该OP仅支持 **CPU** ，同时仅支持 **Tensor**

 参数：
-    - **x** (Variable) - 一个1维输入张量
-    - **dtype** (np.dtype|core.VarDesc.VarType|str) – 索引张量的类型，int32，int64。
+    - **x** (Variable) – 数据shape为 :math:`[N]` 的一维Tensor，数据类型为 float32，float64，int32，int64。
+    - **dtype** (np.dtype|core.VarDesc.VarType|str) – 索引和计数Tensor的类型，默认为 int32，数据类型需要为 int32或int64。

-返回：元组(out, index, count)。 ``out`` 为 ``x`` 的指定dtype的unique张量, ``index`` 是一个指向 ``out`` 的索引张量, 用户可以通过该函数来转换原始的 ``x`` 张量的索引， ``count`` 是 ``x`` 中unique元素的数量。
+返回: 
+    - **out** 表示对输入进行去重后结果一维Tensor，数据shape为 :math:`[K]` ，K和输入x的shape中的N可能不一致。 
+    - **index** 表示原始输入在去重后结果中的索引Tensor :math:`[N]` ，shape和输入x的shape一致。 
+    - **count** 表示去重后元素的计数结果Tensor，数据shape为 :math:`[K]` ，数据shape和out的shape一致。 

-返回类型：元组(tuple)
+返回类型：tuple，tuple中元素类型为Variable(Tensor)，输出中的out和输入x的数据类型一致，输出中index以及count的数据类型为 int32，int64。

 **代码示例**：

@@ -21,17 +25,9 @@ unique_with_count为 ``x`` 返回一个unique张量和一个指向该unique张

    import numpy as np
    import paddle.fluid as fluid
-    x = fluid.assign(np.array([2, 3, 3, 1, 5, 3], dtype='int32'))
+    x = fluid.layers.assign(np.array([2, 3, 3, 1, 5, 3], dtype='int32'))
    out, index, count = fluid.layers.unique_with_counts(x) # out is [2, 3, 1, 5];
                                               # index is [0, 1, 1, 2, 3, 1];
                                               # count is [1, 3, 1, 1]
-
-
-
-
-
-
-
-
-
+    # x.shape=(6,) out.shape=(4,), index.shape=(6,), count.shape=(4,)

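The relationship between `out`, `index` and `count` described in unique_with_counts_cn.rst above can be illustrated without Paddle. The helper `unique_with_counts_ref` below is a hypothetical pure-Python/NumPy sketch that keeps values in first-appearance order, as the documented example output suggests; it is not the operator's implementation.

```python
import numpy as np


def unique_with_counts_ref(x, dtype="int32"):
    """Hypothetical reference: de-duplicate a 1-D array in first-appearance order."""
    x = np.asarray(x)
    out, index, count = [], [], []
    position = {}  # value -> its position in `out`
    for value in x.tolist():
        if value not in position:
            position[value] = len(out)
            out.append(value)
            count.append(0)
        index.append(position[value])   # where this input element lives in `out`
        count[position[value]] += 1     # how often that unique value occurs
    return (np.array(out, dtype=x.dtype),
            np.array(index, dtype=dtype),
            np.array(count, dtype=dtype))
```

For the input `[2, 3, 3, 1, 5, 3]` used in the documented example, `out[index]` reconstructs the original input, and `count` sums to the input length N.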
--- a/doc/fluid/api_cn/layers_cn/var_conv_2d_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/var_conv_2d_cn.rst
-.. _cn_api_fluid_layers_var_conv_2d:
-
-var_conv_2d
-------------------------------
-
-.. py:function:: paddle.fluid.layers.var_conv_2d(input, row, col, input_channel, output_channel, filter_size, stride=1, param_attr=None, act=None, dtype='float32', name=None)
-
-var_conv_2d层依据给定的参数来计算输出， ``input`` 、 ``row`` 和 ``col`` 都是1-level的 ``LodTensor`` 卷积操作与普通的conv2d卷积层一样，值得注意的是，输入数据的第二个维度即input.dim[1]应该为1。
-如果 ``input_channel`` 是2，并且给了如下的row lodTensor 和 col lodTensor:
-
-.. code-block:: text
-
-    row.lod = [[5, 4]]
-    col.lod = [[6, 7]]
-    输入是一个lodTensor:
-    input.lod = [[60, 56]]  # where 60 = input_channel * 5 * 6
-    input.dims = [116, 1]   # where 116 = 60 + 56
-    如果设置 output_channel 为3, filter_size 为 [3, 3], stride 为 [1, 1]:
-    output.lod = [[90, 84]] # where 90 = output_channel * [(5-1)/stride + 1] * [(6-1)/stride + 1]
-    output.dims = [174, 1]  # where 174 = 90 + 84
-
-参数:
-    - **input** (Variable) – dims[1]等于1的1-level的LodTensor。
-    - **row** (Variable) – 1-level的LodTensor提供height。
-    - **col** (Variable) – 1-level的LodTensor提供width。
-    - **input_channel** (int) – 输入通道的数目。
-    - **output_channel** (int) – 输出通道的数目。
-    - **filter_size** (int|tuple|None) – 过滤器尺寸。 如果是元组，则应当为两个整型数字(filter_size_H, filter_size_W)。否则，过滤器会变为正方形。
-    - **stride** (int|tuple) – 步长。 如果是元组，则应当为两个整型数字(stride_H, stride_W)。否则，stride_H = stride_W = stride。默认: stride = 1.
-    - **param_attr** (ParamAttr|None) – 为var_conv2d可学习的权重分配参数属性如果设置为None，或者ParamAttr的一个属性, var_conv2d将会创建ParamAttr做为param_attr。如果param_attr的初始化没有设定，参数将会以 \(Normal(0.0, std)\),进行初始化，\(std\) 为 \((\frac{2.0 }{filter\_elem\_num})^{0.5}\). 默认: None。
-    - **act** (str) – 激活类型，如果设置为None，则不会激活。默认:None
-    - **dtype** ('float32') – 输出与参数的数据类型
-    - **name** (str|None) – 层名。如果没有设置，将会被自动命名。默认: None。
-
-
-返回: 由该层指定LoD的输出变量
-
-返回类型: 变量(Variable)
-
-**代码示例**：
-
-.. code-block:: python
-
-    import numpy as np
-    from paddle.fluid import layers
-
-    x_lod_tensor = layers.data(name='x', shape=[1], lod_level=1)
-    row_lod_tensor = layers.data(name='row', shape=[6], lod_level=1)
-    col_lod_tensor = layers.data(name='col', shape=[6], lod_level=1)
-    out = layers.var_conv_2d(input=x_lod_tensor,
-                             row=row_lod_tensor,
-                             col=col_lod_tensor,
-                             input_channel=3,
-                             output_channel=5,
-                             filter_size=[3, 3],
-                             stride=1)
-
-
-
-
-
-
-
--- a/doc/fluid/api_cn/layers_cn/where_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/where_cn.rst
@@ -4,17 +4,15 @@ where
 -------------------------------

 .. py:function:: paddle.fluid.layers.where(condition)
-     
-返回一个秩为2的int64型张量，指定condition中真实元素的坐标。
-     
-输出的第一维是真实元素的数量，第二维是condition的秩（维数）。如果没有真实元素，则将生成空张量。
+
+该OP计算输入元素中为True的元素在输入中的坐标（index）。
        
 参数：
-    - **condition** （Variable） - 秩至少为1的布尔型张量。
+    - **condition** （Variable）– 输入秩至少为1的多维Tensor，数据类型是bool类型。

-返回：存储一个二维张量的张量变量
+返回：输出condition元素为True的坐标（index），将所有的坐标（index）组成一个2-D的Tensor。

-返回类型：变量（Variable）
+返回类型：Variable，数据类型是int64。
     
 **代码示例**：

@@ -39,5 +37,3 @@ where
        out = layers.where(condition) # [[]]


-
-
--- a/doc/fluid/api_cn/layers_cn/zeros_like_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/zeros_like_cn.rst
@@ -5,17 +5,16 @@ zeros_like

 .. py:function:: paddle.fluid.layers.zeros_like(x, out=None)

-**zeros_like**

-该函数创建一个和x具有相同的形状和数据类型的全零张量
+该OP创建一个和x具有相同的形状和数据类型的全零Tensor。

 参数：
-    - **x** (Variable)-指定形状和数据类型的输入张量
-    - **out** (Variable)-输出张量
+    - **x** (Variable) – 指定输入为一个多维的Tensor，数据类型可以是bool，float32，float64，int32，int64。
+    - **out** (Variable|可选) – 如果为None，则创建一个Variable作为输出，创建后的Variable的数据类型，shape大小和输入变量x一致。如果是输入的一个Tensor，数据类型和数据shape大小需要和输入变量x一致。默认值为None。
    
-返回：存储输出的张量变量
+返回：返回一个多维的Tensor，具体的元素值和输入的数据类型相关，如果是bool类型的，则全False，其它均为0。数据shape大小和输入x一致。

-返回类型：变量（Variable）
+返回类型：Variable

 **代码示例**：

@@ -25,8 +24,3 @@ zeros_like
    x = fluid.layers.data(name='x', dtype='float32', shape=[3], append_batch_size=False)
    data = fluid.layers.zeros_like(x) # [0.0, 0.0, 0.0]

-
-
-
-
-
--- a/doc/fluid/api_cn/optimizer_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn.rst
@@ -9,6 +9,7 @@ fluid.optimizer
    :maxdepth: 1

    optimizer_cn/Adadelta_cn.rst
+    optimizer_cn/AdadeltaOptimizer_cn.rst
    optimizer_cn/Adagrad_cn.rst
    optimizer_cn/AdagradOptimizer_cn.rst
    optimizer_cn/Adam_cn.rst

--- a/doc/fluid/api_cn/optimizer_cn/AdadeltaOptimizer_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/AdadeltaOptimizer_cn.rst
+.. _cn_api_fluid_optimizer_AdadeltaOptimizer:
+
+AdadeltaOptimizer
+-------------------------------
+
+.. py:class:: paddle.fluid.optimizer.AdadeltaOptimizer(learning_rate, epsilon=1.0e-6, rho=0.95, regularization=None, name=None)
+
+**注意：此接口不支持稀疏参数更新。**
+
+Adadelta优化器，具体细节可参考论文 `ADADELTA: AN ADAPTIVE LEARNING RATE METHOD <https://arxiv.org/abs/1212.5701>`_ 。
+
+更新公式如下：
+
+.. math::
+
+    E(g_t^2) &= \rho * E(g_{t-1}^2) + (1-\rho) * g^2\\
+    learning\_rate &= \sqrt{ ( E(dx_{t-1}^2) + \epsilon ) / ( E(g_t^2) + \epsilon ) }\\
+    E(dx_t^2) &= \rho * E(dx_{t-1}^2) + (1-\rho) * (-g*learning\_rate)^2
+
+
+参数：
+    - **learning_rate** (float|Variable) - 全局学习率。
+    - **epsilon** (float) - 维持数值稳定性的浮点型值，默认值为1.0e-6。
+    - **rho** (float) - 算法中的衰减率，默认值为0.95。
+    - **regularization** (WeightDecayRegularizer，可选) - 正则化方法，例如fluid.regularizer.L2DecayRegularizer等。默认值为None，表示无正则化。
+    - **name** (str，可选) – 该参数供开发人员打印调试信息时使用，具体用法请参见 :ref:`api_guide_Name` ，默认值为None。
+
+**代码示例**
+
+.. code-block:: python
+
+    import paddle.fluid as fluid
+
+    image = fluid.layers.data(name='image', shape=[28], dtype='float32')
+    fc = fluid.layers.fc(image, size=10)
+    cost = fluid.layers.reduce_mean(fc)
+    optimizer = fluid.optimizer.AdadeltaOptimizer(
+        learning_rate=0.0003, epsilon=1.0e-6, rho=0.95)
+    optimizer_ops, params_grads = optimizer.minimize(cost)
+
+
+.. py:method:: minimize(loss, startup_program=None, parameter_list=None, no_grad_set=None, grad_clip=None)
+
+为训练网络添加反向和参数优化部分，进而使损失最小化。
+
+参数：
+    - **loss** (Variable) – 优化器的损失变量。
+    - **startup_program** (Program，可选) – 参数所在的startup program。默认值为None，表示 :ref:`cn_api_fluid_default_startup_program` 。
+    - **parameter_list** (list(Variable)，可选) – 待更新的参数列表。默认值为None，表示所有参数均需要更新。
+    - **no_grad_set** (set，可选) – 无需计算梯度的变量集合。默认值为None，表示所有变量均需计算梯度。
+    - **grad_clip** (GradClipBase，可选) – 梯度裁剪的策略，目前仅在动态图模式下有效。
+
+返回: tuple(optimize_ops, params_grads)，其中optimize_ops为参数优化OP列表；params_grads为由(param, param_grad)组成的列表，其中param和param_grad分别为参数和参数的梯度。
+
+返回类型: tuple
+
+**代码示例**
+
+.. code-block:: python
+
+    import paddle.fluid as fluid
+
+    image = fluid.layers.data(name='image', shape=[28], dtype='float32')
+    fc = fluid.layers.fc(image, size=10)
+    cost = fluid.layers.reduce_mean(fc)
+    optimizer = fluid.optimizer.AdadeltaOptimizer(
+        learning_rate=0.0003, epsilon=1.0e-6, rho=0.95)
+    optimizer_ops, params_grads = optimizer.minimize(cost)
+
+
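The three update formulas in AdadeltaOptimizer_cn.rst above can be traced with a single-step NumPy sketch. The function name `adadelta_step` and its state arguments are hypothetical. The final parameter update (`param + update`) is an assumption taken from the ADADELTA paper, since the documented formulas stop at the accumulator updates.

```python
import numpy as np


def adadelta_step(param, grad, avg_sq_grad, avg_sq_update, rho=0.95, epsilon=1.0e-6):
    """Hypothetical single Adadelta step following the documented formulas."""
    # E(g_t^2) = rho * E(g_{t-1}^2) + (1 - rho) * g^2
    avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * grad ** 2
    # learning_rate = sqrt((E(dx_{t-1}^2) + epsilon) / (E(g_t^2) + epsilon))
    learning_rate = np.sqrt((avg_sq_update + epsilon) / (avg_sq_grad + epsilon))
    update = -grad * learning_rate
    # E(dx_t^2) = rho * E(dx_{t-1}^2) + (1 - rho) * (-g * learning_rate)^2
    avg_sq_update = rho * avg_sq_update + (1.0 - rho) * update ** 2
    # Assumption (from the paper, not from this diff): the parameter moves by `update`.
    return param + update, avg_sq_grad, avg_sq_update
```

Both accumulators start at zero in this sketch, so the first step is very small because the effective learning rate is dominated by `epsilon`.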
--- a/doc/fluid/api_cn/optimizer_cn/AdamOptimizer_cn.rst
+++ b/doc/fluid/api_cn/optimizer_cn/AdamOptimizer_cn.rst
@@ -17,7 +17,7 @@ Adam更新如下：
 .. math::
    learning\_rate=\frac{learning\_rate}{1-\beta_1^t}
 .. math::
-    param\_out=param−learning\_rate*\frac{moment\_out}{inf\_norm\_out}\\
+    param\_out=param-learning\_rate*\frac{moment\_1}{\sqrt{moment\_2}+\epsilon}\\

 参数: 
    - **learning_rate** (float|Variable)-学习率，用于更新参数。作为数据参数，可以是一个浮点类型值或有一个浮点类型值的变量

--- a/doc/fluid/api_guides/low_level/program.rst
+++ b/doc/fluid/api_guides/low_level/program.rst
@@ -65,6 +65,17 @@ Fluid 中的 :code:`Variable` 可以包含任何类型的值———在大多

 模型中所有的可学习参数都以 :code:`Variable` 的形式保留在内存空间中，您在绝大多数情况下都不需要自己来创建网络中的可学习参数， Fluid 为几乎常见的神经网络基本计算模块都提供了封装。以最简单的全连接模型为例，调用 :code:`fluid.layers.fc` 会直接为全连接层创建连接权值( W )和偏置（ bias ）两个可学习参数，无需显示地调用 :code:`variable` 相关接口创建可学习参数。

+.. _api_guide_Name:
+
+=========
+Name
+=========
+
+.. _api_guide_ParamAttr:
+
+=========
+ParamAttr
+=========

 =========
 相关API

--- a/doc/fluid/api_guides/low_level/program_en.rst
+++ b/doc/fluid/api_guides/low_level/program_en.rst
@@ -64,6 +64,18 @@ In Fluid， :code:`Variable` can contain any type of value -- in most cases a Lo

 All the learnable parameters in the model are kept in the memory space in form of :code:`Variable` . In most cases, you do not need to create the learnable parameters in the network by yourself. Fluid provides encapsulation for almost common basic computing modules of the neural network. Taking the simplest full connection model as an example, calling :code:`fluid.layers.fc` directly creates two learnable parameters for the full connection layer, namely, connection weight (W) and bias, without explicitly calling :code:`Variable` related interfaces to create learnable parameters.

+.. _api_guide_Name:
+
+=========
+Name
+=========
+
+.. _api_guide_ParamAttr:
+
+=========
+ParamAttr
+=========
+
 ==================
 Related API
 ==================

--- a/doc/fluid/beginners_guide/install/Tables.md
+++ b/doc/fluid/beginners_guide/install/Tables.md
@@ -242,7 +242,7 @@ PaddePaddle通过编译时指定路径来实现引用各种BLAS/CUDA/cuDNN库。
 您可以在 [Release History](https://pypi.org/project/paddlepaddle-gpu/#history) 中找到PaddlePaddle-gpu的各个发行版本。
 > 其中`postXX` 对应的是CUDA和cuDNN的版本，`postXX`之前的数字代表Paddle的版本

-需要注意的是，命令中<code> paddlepaddle-gpu </code> 在windows环境下，会默认安装支持CUDA 9和cuDNN 7的对应[版本号]的PaddlePaddle安装包
+需要注意的是，命令中<code> paddlepaddle-gpu </code> 在windows环境下，会默认安装支持CUDA 10.0和cuDNN 7的对应[版本号]的PaddlePaddle安装包
 ***

 <a name="ciwhls-release"></a>
@@ -323,40 +323,100 @@ PaddePaddle通过编译时指定路径来实现引用各种BLAS/CUDA/cuDNN库。
 		paddlepaddle_gpu-1.5.2-cp37-cp37m-linux_x86_64.whl</a></td>
 	</tr> 
 	<tr>
-		<td> win_cpu_openblas </td>
+		<td> win_cpu_mkl </td>
+		<td> - </td>
+		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.2-win-mkl/paddlepaddle-1.5.2-cp27-cp27m-win_amd64.whl">
+		paddlepaddle-1.5.2-cp27-cp27m-win_amd64.whl</a></td>
+		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.2-win-mkl/paddlepaddle-1.5.2-cp35-cp35m-win_amd64.whl">
+		paddlepaddle-1.5.2-cp35-cp35m-win_amd64.whl</a></td>
+		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.2-win-mkl/paddlepaddle-1.5.2-cp36-cp36m-win_amd64.whl">
+		paddlepaddle-1.5.2-cp36-cp36m-win_amd64.whl</a></td>
+		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.2-win-mkl/paddlepaddle-1.5.2-cp37-cp37m-win_amd64.whl">
+		paddlepaddle-1.5.2-cp37-cp37m-win_amd64.whl</a></td>
+	</tr> 
+	<tr>
+		<td> win_cuda8_cudnn7_mkl </td>
+		<td> - </td>
+		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.2-win-mkl/paddlepaddle_gpu-1.5.2.post87-cp27-cp27m-win_amd64.whl">
+		paddlepaddle_gpu-1.5.2-cp27-cp27m-win_amd64.whl</a></td>
+		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.2-win-mkl/paddlepaddle_gpu-1.5.2.post87-cp35-cp35m-win_amd64.whl">
+		paddlepaddle_gpu-1.5.2-cp35-cp35m-win_amd64.whl</a></td>
+		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.2-win-mkl/paddlepaddle_gpu-1.5.2.post87-cp36-cp36m-win_amd64.whl">
+		paddlepaddle_gpu-1.5.2-cp36-cp36m-win_amd64.whl</a></td>
+		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.2-win-mkl/paddlepaddle_gpu-1.5.2.post87-cp37-cp37m-win_amd64.whl">
+		paddlepaddle_gpu-1.5.2-cp37-cp37m-win_amd64.whl</a></td>
+	</tr>  
+	<tr>
+		<td> win_cuda9_cudnn7_mkl </td>
+		<td> - </td>
+		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.2-win-mkl/paddlepaddle_gpu-1.5.2.post97-cp27-cp27m-win_amd64.whl">
+		paddlepaddle_gpu-1.5.2-cp27-cp27m-win_amd64.whl</a></td>
+		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.2-win-mkl/paddlepaddle_gpu-1.5.2.post97-cp35-cp35m-win_amd64.whl">
+		paddlepaddle_gpu-1.5.2-cp35-cp35m-win_amd64.whl</a></td>
+		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.2-win-mkl/paddlepaddle_gpu-1.5.2.post97-cp36-cp36m-win_amd64.whl">
+		paddlepaddle_gpu-1.5.2-cp36-cp36m-win_amd64.whl</a></td>
+		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.2-win-mkl/paddlepaddle_gpu-1.5.2.post97-cp37-cp37m-win_amd64.whl">
+		paddlepaddle_gpu-1.5.2-cp37-cp37m-win_amd64.whl</a></td>
+	</tr>  
+	<tr>
+		<td> win_cuda10_cudnn7_mkl </td>
 		<td> - </td>
-		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-win-open/paddlepaddle-1.5.1-cp27-cp27m-win_amd64.whl">
-		paddlepaddle-1.5.1-cp27-cp27m-win_amd64.whl</a></td>
-		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-win-open/paddlepaddle-1.5.1-cp35-cp35m-win_amd64.whl">
-		paddlepaddle-1.5.1-cp35-cp35m-win_amd64.whl</a></td>
-		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-win-open/paddlepaddle-1.5.1-cp36-cp36m-win_amd64.whl">
-		paddlepaddle-1.5.1-cp36-cp36m-win_amd64.whl</a></td>
-		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-win-open/paddlepaddle-1.5.1-cp37-cp37m-win_amd64.whl">
-		paddlepaddle-1.5.1-cp37-cp37m-win_amd64.whl</a></td>
+		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.2-win-mkl/paddlepaddle_gpu-1.5.2.post107-cp27-cp27m-win_amd64.whl">
+		paddlepaddle_gpu-1.5.2-cp27-cp27m-win_amd64.whl</a></td>
+		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.2-win-mkl/paddlepaddle_gpu-1.5.2.post107-cp35-cp35m-win_amd64.whl">
+		paddlepaddle_gpu-1.5.2-cp35-cp35m-win_amd64.whl</a></td>
+		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.2-win-mkl/paddlepaddle_gpu-1.5.2.post107-cp36-cp36m-win_amd64.whl">
+		paddlepaddle_gpu-1.5.2-cp36-cp36m-win_amd64.whl</a></td>
+		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.2-win-mkl/paddlepaddle_gpu-1.5.2.post107-cp37-cp37m-win_amd64.whl">
+		paddlepaddle_gpu-1.5.2-cp37-cp37m-win_amd64.whl</a></td>
 	</tr>
+	<tr>
+		<td> win_cpu_openblas </td>
+		<td> - </td>
+		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.2-win-open/paddlepaddle-1.5.2-cp27-cp27m-win_amd64.whl">
+		paddlepaddle-1.5.2-cp27-cp27m-win_amd64.whl</a></td>
+		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.2-win-open/paddlepaddle-1.5.2-cp35-cp35m-win_amd64.whl">
+		paddlepaddle-1.5.2-cp35-cp35m-win_amd64.whl</a></td>
+		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.2-win-open/paddlepaddle-1.5.2-cp36-cp36m-win_amd64.whl">
+		paddlepaddle-1.5.2-cp36-cp36m-win_amd64.whl</a></td>
+		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.2-win-open/paddlepaddle-1.5.2-cp37-cp37m-win_amd64.whl">
+		paddlepaddle-1.5.2-cp37-cp37m-win_amd64.whl</a></td>
+	</tr>  
 	<tr>
 		<td> win_cuda8_cudnn7_openblas </td>
 		<td> - </td>
-		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-win-open/paddlepaddle_gpu-1.5.1.post87-cp27-cp27m-win_amd64.whl">
-		paddlepaddle_gpu-1.5.1-cp27-cp27m-win_amd64.whl</a></td>
-		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-win-open/paddlepaddle_gpu-1.5.1.post87-cp35-cp35m-win_amd64.whl">
-		paddlepaddle_gpu-1.5.1-cp35-cp35m-win_amd64.whl</a></td>
-		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-win-open/paddlepaddle_gpu-1.5.1.post87-cp36-cp36m-win_amd64.whl">
-		paddlepaddle_gpu-1.5.1-cp36-cp36m-win_amd64.whl</a></td>
-		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-win-open/paddlepaddle_gpu-1.5.1.post87-cp37-cp37m-win_amd64.whl">
-		paddlepaddle_gpu-1.5.1-cp37-cp37m-win_amd64.whl</a></td>
-	</tr>
+		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.2-win-open/paddlepaddle_gpu-1.5.2.post87-cp27-cp27m-win_amd64.whl">
+		paddlepaddle_gpu-1.5.2-cp27-cp27m-win_amd64.whl</a></td>
+		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.2-win-open/paddlepaddle_gpu-1.5.2.post87-cp35-cp35m-win_amd64.whl">
+		paddlepaddle_gpu-1.5.2-cp35-cp35m-win_amd64.whl</a></td>
+		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.2-win-open/paddlepaddle_gpu-1.5.2.post87-cp36-cp36m-win_amd64.whl">
+		paddlepaddle_gpu-1.5.2-cp36-cp36m-win_amd64.whl</a></td>
+		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.2-win-open/paddlepaddle_gpu-1.5.2.post87-cp37-cp37m-win_amd64.whl">
+		paddlepaddle_gpu-1.5.2-cp37-cp37m-win_amd64.whl</a></td>
+	</tr>  
 	<tr>
 		<td> win_cuda9_cudnn7_openblas </td>
 		<td> - </td>
-		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-win-open/paddlepaddle_gpu-1.5.1.post97-cp27-cp27m-win_amd64.whl">
-		paddlepaddle_gpu-1.5.1-cp27-cp27m-win_amd64.whl</a></td>
-		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-win-open/paddlepaddle_gpu-1.5.1.post97-cp35-cp35m-win_amd64.whl">
-		paddlepaddle_gpu-1.5.1-cp35-cp35m-win_amd64.whl</a></td>
-		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-win-open/paddlepaddle_gpu-1.5.1.post97-cp36-cp36m-win_amd64.whl">
-		paddlepaddle_gpu-1.5.1-cp36-cp36m-win_amd64.whl</a></td>
-		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.1-win-open/paddlepaddle_gpu-1.5.1.post97-cp37-cp37m-win_amd64.whl">
-		paddlepaddle_gpu-1.5.1-cp37-cp37m-win_amd64.whl</a></td>
+		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.2-win-open/paddlepaddle_gpu-1.5.2.post97-cp27-cp27m-win_amd64.whl">
+		paddlepaddle_gpu-1.5.2-cp27-cp27m-win_amd64.whl</a></td>
+		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.2-win-open/paddlepaddle_gpu-1.5.2.post97-cp35-cp35m-win_amd64.whl">
+		paddlepaddle_gpu-1.5.2-cp35-cp35m-win_amd64.whl</a></td>
+		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.2-win-open/paddlepaddle_gpu-1.5.2.post97-cp36-cp36m-win_amd64.whl">
+		paddlepaddle_gpu-1.5.2-cp36-cp36m-win_amd64.whl</a></td>
+		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.2-win-open/paddlepaddle_gpu-1.5.2.post97-cp37-cp37m-win_amd64.whl">
+		paddlepaddle_gpu-1.5.2-cp37-cp37m-win_amd64.whl</a></td>
+	</tr>  
+	<tr>
+		<td> win_cuda10_cudnn7_openblas </td>
+		<td> - </td>
+		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.2-win-open/paddlepaddle_gpu-1.5.2.post107-cp27-cp27m-win_amd64.whl">
+		paddlepaddle_gpu-1.5.2-cp27-cp27m-win_amd64.whl</a></td>
+		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.2-win-open/paddlepaddle_gpu-1.5.2.post107-cp35-cp35m-win_amd64.whl">
+		paddlepaddle_gpu-1.5.2-cp35-cp35m-win_amd64.whl</a></td>
+		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.2-win-open/paddlepaddle_gpu-1.5.2.post107-cp36-cp36m-win_amd64.whl">
+		paddlepaddle_gpu-1.5.2-cp36-cp36m-win_amd64.whl</a></td>
+		<td> <a href="https://paddle-wheel.bj.bcebos.com/1.5.2-win-open/paddlepaddle_gpu-1.5.2.post107-cp37-cp37m-win_amd64.whl">
+		paddlepaddle_gpu-1.5.2-cp37-cp37m-win_amd64.whl</a></td>
 	</tr>  
 	<tr>
 		<td> mac_cpu </td>

--- a/doc/fluid/design/mkldnn/data_transformation/data_transform.md
+++ b/doc/fluid/design/mkldnn/data_transformation/data_transform.md
+# Design Doc: MKL-DNN Data Transformation
+
+When Fluid uses the MKL-DNN engine to execute a program, not all operators have MKL-DNN kernels, so some operators are executed by Paddle(CPU) kernels. MKL-DNN kernels expect input Tensors in MKL-DNN layout, while Paddle(CPU) kernels expect input Tensors in Paddle layout.
+
+We can distinguish the following scenarios (shown in the picture below):
+* Paddle(CPU) kernel is followed by MKL-DNN kernel
+* MKL-DNN kernel is followed by Paddle(CPU) kernel
+* MKL-DNN kernel is followed by fetch operator
+
+
+![](images/data_transform.svg)
+
+
+### Paddle(CPU) kernel is followed by MKL-DNN kernel
+When a Paddle(CPU) kernel finishes execution, its output is one or more Tensors in Paddle layout. Each of those
+Tensors must be transformed to MKL-DNN layout before being fed into an MKL-DNN kernel. In this scenario, the conversion from Paddle Tensor to MKL-DNN Tensor is done by just
+changing the layout flag to MKL-DNN and picking the MKL-DNN format that matches the Paddle Tensor rank. This is a computationally cheap operation, as no data is actually rearranged.
+
+This scenario is drawn in the picture with bold lines: start from the Paddle(CPU) op on the left side, follow the bold arrows, and finish at the MKL-DNN op on the right side of the picture.
+
+### MKL-DNN kernel is followed by Paddle(CPU) kernel
+In this situation an MKL-DNN kernel has finished its execution and produced one or more output Tensors. Each of those Tensors is in MKL-DNN layout, and to be fed into a Paddle(CPU) kernel
+it needs to be converted into Paddle layout. In detail, the MKL-DNN Tensor arrangement (MKL-DNN memory format) is checked for compatibility with Paddle(CPU) layout; if it is compatible,
+the layout of the Tensor is simply set to Paddle and the MKL-DNN format is set to ``undef``. If the MKL-DNN Tensor data arrangement is not compatible with Paddle layout, an actual data rearrangement
+is performed. For example, if an MKL-DNN Tensor is 4D with format ``NCHW16C``, converting it into Paddle layout requires rearranging the data into ``NCHW`` format. To do so,
+an MKL-DNN Reorder primitive is created to perform the data rearrangement.
+
+This scenario is marked in the picture with outlined (hollow) arrows: start from the MKL-DNN op on the left side and follow the hollow arrows to the Paddle(CPU) op on the right side of the picture.
+### MKL-DNN kernel is followed by fetch operator
+This situation is conceptually similar to the previous section, but because the fetch operator has no kernel, it does not share data transformation code with operators that have a registered kernel.
+Hence the execution flow looks a bit different, although conceptually the conversion of an MKL-DNN Tensor into a Paddle(CPU) Tensor is the same as described above.
+
+This scenario is marked in the picture with regular arrows: start from the MKL-DNN op on the left side and follow the regular arrows to the fetch op on the right side of the picture.
+### GPU and MKL-DNN kernels interoperability
+Currently Fluid does not support executing programs with a combination of MKL-DNN and GPU kernels.
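
The three scenarios in this design doc can be summarised as a toy model. This is a sketch under stated assumptions, not Fluid's code: the `Tensor` class, the `PLAIN_FORMAT_BY_RANK` table, and the functions `to_mkldnn` and `to_paddle` are hypothetical names invented for illustration, and "compatible with Paddle layout" is simplified to "the format equals the plain format for the tensor's rank".

```python
from dataclasses import dataclass

# Hypothetical rank -> plain format table; the names are illustrative only.
PLAIN_FORMAT_BY_RANK = {1: "X", 2: "NC", 3: "NCW", 4: "NCHW", 5: "NCDHW"}


@dataclass(frozen=True)
class Tensor:
    layout: str  # "paddle" or "mkldnn"
    fmt: str     # MKL-DNN memory format; "undef" for Paddle layout
    rank: int


def to_mkldnn(t):
    """Paddle(CPU) op -> MKL-DNN op: relabel only, no data movement."""
    relabeled = Tensor("mkldnn", PLAIN_FORMAT_BY_RANK[t.rank], t.rank)
    return relabeled, False  # False: no reorder was needed


def to_paddle(t):
    """MKL-DNN op -> Paddle(CPU) op or fetch op.

    A reorder is needed only when the MKL-DNN format is not the plain
    format for this rank (for example a blocked format such as NCHW16C).
    """
    needs_reorder = t.fmt != PLAIN_FORMAT_BY_RANK[t.rank]
    return Tensor("paddle", "undef", t.rank), needs_reorder


if __name__ == "__main__":
    print(to_mkldnn(Tensor("paddle", "undef", 4)))
    print(to_paddle(Tensor("mkldnn", "NCHW16C", 4)))
```

In the toy model the fetch-op path and the Paddle(CPU)-op path share `to_paddle`, reflecting the doc's statement that the conversion is conceptually the same even though the real execution flow differs.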
--- a/doc/fluid/design/mkldnn/data_transformation/images/data_transform.svg
+++ b/doc/fluid/design/mkldnn/data_transformation/images/data_transform.svg
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
+ "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
+<!-- Generated by graphviz version 2.38.0 (20140413.2041)
+ -->
+<!-- Title: Q Pages: 1 -->
+<svg width="1314pt" height="456pt"
+ viewBox="0.00 0.00 1313.58 456.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
+<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 452)">
+<title>Q</title>
+<polygon fill="white" stroke="none" points="-4,4 -4,-452 1309.58,-452 1309.58,4 -4,4"/>
+<g id="clust1" class="cluster"><title>cluster_in</title>
+<polygon fill="none" stroke="black" stroke-dasharray="1,5" points="0,-68 0,-399 186.788,-399 186.788,-68 0,-68"/>
+<text text-anchor="start" x="7.89389" y="-384.8" font-family="Times,serif" font-size="14.00">Possible </text>
+<text text-anchor="start" x="56.8939" y="-384.8" font-family="Times,serif" font-weight="bold" font-size="14.00">i</text>
+<text text-anchor="start" x="60.8939" y="-384.8" font-family="Times,serif" font-size="14.00"> &#160;order operator types</text>
+</g>
+<g id="clust2" class="cluster"><title>cluster_out</title>
+<polygon fill="none" stroke="black" stroke-dasharray="1,5" points="1103.79,-8 1103.79,-440 1305.58,-440 1305.58,-8 1103.79,-8"/>
+<text text-anchor="start" x="1111.68" y="-425.8" font-family="Times,serif" font-size="14.00">Possible </text>
+<text text-anchor="start" x="1160.68" y="-425.8" font-family="Times,serif" font-weight="bold" font-size="14.00">i+1</text>
+<text text-anchor="start" x="1179.68" y="-425.8" font-family="Times,serif" font-size="14.00"> &#160;order operator types</text>
+</g>
+<g id="clust3" class="cluster"><title>cluster_Trans</title>
+<polygon fill="none" stroke="black" stroke-dasharray="1,5" points="316.788,-164 316.788,-293 1041.79,-293 1041.79,-164 316.788,-164"/>
+<text text-anchor="middle" x="679.288" y="-277.8" font-family="Times,serif" font-size="14.00">Transformation of Tensors</text>
+</g>
+<!-- TransData -->
+<g id="node1" class="node"><title>TransData</title>
+<polygon fill="none" stroke="black" points="424.788,-226 324.788,-226 324.788,-190 424.788,-190 424.788,-226"/>
+<text text-anchor="middle" x="374.788" y="-204.3" font-family="Times,serif" font-size="14.00">TransformData</text>
+</g>
+<!-- toMKLDNN -->
+<g id="node2" class="node"><title>toMKLDNN</title>
+<polygon fill="none" stroke="black" points="753.288,-262 581.288,-262 581.288,-226 753.288,-226 753.288,-262"/>
+<text text-anchor="middle" x="667.288" y="-240.3" font-family="Times,serif" font-size="14.00">Label Tensor as MKL&#45;DNN</text>
+</g>
+<!-- TransData&#45;&gt;toMKLDNN -->
+<g id="edge7" class="edge"><title>TransData&#45;&gt;toMKLDNN</title>
+<path fill="none" stroke="black" stroke-width="2" d="M425.067,-214.107C464.972,-219.052 522.371,-226.165 571.166,-232.212"/>
+<polygon fill="black" stroke="black" stroke-width="2" points="570.766,-235.689 581.12,-233.446 571.626,-228.742 570.766,-235.689"/>
+</g>
+<!-- fromMKLDNN -->
+<g id="node3" class="node"><title>fromMKLDNN</title>
+<polygon fill="none" stroke="black" points="767.788,-208 566.788,-208 566.788,-172 767.788,-172 767.788,-208"/>
+<text text-anchor="middle" x="667.288" y="-186.3" font-family="Times,serif" font-size="14.00">TransDataLayoutFromMKLDNN</text>
+</g>
+<!-- TransData&#45;&gt;fromMKLDNN -->
+<g id="edge8" class="edge"><title>TransData&#45;&gt;fromMKLDNN</title>
+<path fill="none" stroke="black" d="M424.943,-202.95C460.912,-200.722 511.091,-197.612 556.374,-194.807"/>
+<path fill="none" stroke="none" d="M425.067,-204.947C461.035,-202.718 511.215,-199.609 556.497,-196.803"/>
+<path fill="none" stroke="black" d="M425.191,-206.943C461.159,-204.714 511.339,-201.605 556.621,-198.799"/>
+<polygon fill="black" stroke="black" points="556.915,-200.284 566.679,-196.172 556.482,-193.297 556.915,-200.284"/>
+</g>
+<!-- mkldnnop2 -->
+<g id="node9" class="node"><title>mkldnnop2</title>
+<ellipse fill="none" stroke="black" stroke-width="2" cx="1204.68" cy="-345" rx="63.8893" ry="63.8893"/>
+<text text-anchor="middle" x="1204.68" y="-341.3" font-family="Times,serif" font-size="14.00">MKL&#45;DNN op</text>
+</g>
+<!-- toMKLDNN&#45;&gt;mkldnnop2 -->
+<g id="edge4" class="edge"><title>toMKLDNN&#45;&gt;mkldnnop2</title>
+<path fill="none" stroke="black" stroke-width="2" d="M753.418,-260.06C857.998,-279.788 1033.58,-312.91 1131.73,-331.426"/>
+<polygon fill="black" stroke="black" stroke-width="2" points="1131.38,-334.921 1141.85,-333.336 1132.67,-328.043 1131.38,-334.921"/>
+<text text-anchor="middle" x="919.288" y="-317.8" font-family="Times,serif" font-size="14.00">MKL&#45;DNN Tensor </text>
+</g>
+<!-- innerMKLDNN -->
+<g id="node4" class="node"><title>innerMKLDNN</title>
+<polygon fill="none" stroke="black" points="1033.79,-208 804.788,-208 804.788,-172 1033.79,-172 1033.79,-208"/>
+<text text-anchor="start" x="812.788" y="-193.8" font-family="Times,serif" font-size="14.00">innerTransDataLayoutFromMKLDNN</text>
+<text text-anchor="start" x="850.288" y="-179.8" font-family="Times,serif" font-size="14.00"> {</text>
+<text text-anchor="start" x="861.288" y="-179.8" font-family="Times,serif" font-weight="bold" font-size="14.00">MKL&#45;DNN Reorder</text>
+<text text-anchor="start" x="981.288" y="-179.8" font-family="Times,serif" font-size="14.00">}</text>
+</g>
+<!-- fromMKLDNN&#45;&gt;innerMKLDNN -->
+<g id="edge9" class="edge"><title>fromMKLDNN&#45;&gt;innerMKLDNN</title>
+<path fill="none" stroke="black" d="M767.813,-188C776.618,-188 785.599,-188 794.571,-188"/>
+<path fill="none" stroke="none" d="M767.813,-190C776.618,-190 785.599,-190 794.571,-190"/>
+<path fill="none" stroke="black" d="M767.813,-192C776.618,-192 785.599,-192 794.571,-192"/>
+<polygon fill="black" stroke="black" points="794.688,-193.5 804.688,-190 794.688,-186.5 794.688,-193.5"/>
+</g>
+<!-- cpuop2 -->
+<g id="node7" class="node"><title>cpuop2</title>
+<ellipse fill="none" stroke="black" cx="1204.68" cy="-190" rx="68.7937" ry="68.7937"/>
+<ellipse fill="none" stroke="black" cx="1204.68" cy="-190" rx="72.7879" ry="72.7879"/>
+<text text-anchor="middle" x="1204.68" y="-186.3" font-family="Times,serif" font-size="14.00">Paddle(CPU) op</text>
+</g>
+<!-- innerMKLDNN&#45;&gt;cpuop2 -->
+<g id="edge5" class="edge"><title>innerMKLDNN&#45;&gt;cpuop2</title>
+<path fill="none" stroke="black" d="M1033.92,-188C1063.14,-188 1094.02,-188 1121.32,-188"/>
+<path fill="none" stroke="none" d="M1033.92,-190C1063.14,-190 1094.02,-190 1121.32,-190"/>
+<path fill="none" stroke="black" d="M1033.92,-192C1063.14,-192 1094.02,-192 1121.32,-192"/>
+<polygon fill="black" stroke="black" points="1121.59,-193.5 1131.59,-190 1121.59,-186.5 1121.59,-193.5"/>
+<text text-anchor="middle" x="1072.79" y="-193.8" font-family="Times,serif" font-size="14.00">Tensor </text>
+</g>
+<!-- fetchop -->
+<g id="node8" class="node"><title>fetchop</title>
+<ellipse fill="none" stroke="black" cx="1204.68" cy="-58" rx="41.6928" ry="41.6928"/>
+<text text-anchor="middle" x="1204.68" y="-54.3" font-family="Times,serif" font-size="14.00">Fetch op</text>
+</g>
+<!-- innerMKLDNN&#45;&gt;fetchop -->
+<g id="edge6" class="edge"><title>innerMKLDNN&#45;&gt;fetchop</title>
+<path fill="none" stroke="black" d="M959.098,-171.923C1010.83,-147.827 1102.26,-105.242 1157.48,-79.5189"/>
+<polygon fill="black" stroke="black" points="1159.1,-82.6258 1166.69,-75.2308 1156.15,-76.2804 1159.1,-82.6258"/>
+<text text-anchor="middle" x="1072.79" y="-130.8" font-family="Times,serif" font-size="14.00">Tensor </text>
+</g>
+<!-- cpuop -->
+<g id="node5" class="node"><title>cpuop</title>
+<ellipse fill="none" stroke="black" stroke-width="2" cx="92.8939" cy="-299" rx="68.7879" ry="68.7879"/>
+<text text-anchor="middle" x="92.8939" y="-295.3" font-family="Times,serif" font-size="14.00">Paddle(CPU) op</text>
+</g>
+<!-- cpuop&#45;&gt;TransData -->
+<g id="edge1" class="edge"><title>cpuop&#45;&gt;TransData</title>
+<path fill="none" stroke="black" stroke-width="2" d="M158.811,-277.894C205.902,-262.584 269.086,-242.041 314.756,-227.193"/>
+<polygon fill="black" stroke="black" stroke-width="2" points="316.085,-230.441 324.513,-224.02 313.921,-223.784 316.085,-230.441"/>
+<text text-anchor="middle" x="251.788" y="-267.8" font-family="Times,serif" font-size="14.00">Tensor </text>
+</g>
+<!-- mkldnnop -->
+<g id="node6" class="node"><title>mkldnnop</title>
+<ellipse fill="none" stroke="black" cx="92.8939" cy="-144" rx="63.8777" ry="63.8777"/>
+<ellipse fill="none" stroke="black" cx="92.8939" cy="-144" rx="67.8893" ry="67.8893"/>
+<text text-anchor="middle" x="92.8939" y="-140.3" font-family="Times,serif" font-size="14.00">MKL&#45;DNN op</text>
+</g>
+<!-- mkldnnop&#45;&gt;TransData -->
+<g id="edge2" class="edge"><title>mkldnnop&#45;&gt;TransData</title>
+<path fill="none" stroke="black" d="M159.609,-156.975C206.615,-167.723 269.517,-182.106 315.062,-192.52"/>
+<path fill="none" stroke="none" d="M159.163,-158.924C206.169,-169.673 269.071,-184.056 314.616,-194.47"/>
+<path fill="none" stroke="black" d="M158.718,-160.874C205.723,-171.622 268.625,-186.005 314.17,-196.42"/>
+<polygon fill="black" stroke="black" points="314.109,-197.944 324.638,-196.761 315.67,-191.12 314.109,-197.944"/>
+<text text-anchor="middle" x="251.788" y="-195.8" font-family="Times,serif" font-size="14.00">MKL&#45;DNN Tensor </text>
+</g>
+<!-- mkldnnop&#45;&gt;innerMKLDNN -->
+<g id="edge3" class="edge"><title>mkldnnop&#45;&gt;innerMKLDNN</title>
+<path fill="none" stroke="black" d="M160.586,-142.592C281.825,-140.763 546.292,-140.138 767.788,-163 784.784,-164.754 802.748,-167.333 820.118,-170.211"/>
+<polygon fill="black" stroke="black" points="819.82,-173.711 830.266,-171.937 820.994,-166.81 819.82,-173.711"/>
+<text text-anchor="middle" x="495.788" y="-150.8" font-family="Times,serif" font-size="14.00">MKL&#45;DNN Tensor</text>
+</g>
+</g>
+</svg>
--- a/doc/fluid/design/mkldnn/data_transformation/index_en.rst
+++ b/doc/fluid/design/mkldnn/data_transformation/index_en.rst
+MKL-DNN Data Transformation
+--------------------------------------
+
+.. toctree::
+  :maxdepth: 1
+
+  data_transform.md
--- a/doc/fluid/design/mkldnn/data_transformation/scripts/data_transform.dot
+++ b/doc/fluid/design/mkldnn/data_transformation/scripts/data_transform.dot
+
+digraph Q {
+
+  rankdir=LR
+  node[shape=box]
+
+  TransData[label="TransformData"] 
+  toMKLDNN[label="Label Tensor as MKL-DNN"]
+  fromMKLDNN[label="TransDataLayoutFromMKLDNN"]
+  innerMKLDNN[label=<innerTransDataLayoutFromMKLDNN<br/> {<b>MKL-DNN Reorder</b>}>]
+
+
+ node[shape=circle]
+
+ subgraph cluster_in {
+ label=<Possible <b>i</b>  order operator types>
+ style=dotted
+ cpuop[label="Paddle(CPU) op",style=bold]
+ mkldnnop[label="MKL-DNN op",shape=doublecircle]
+ }
+
+ subgraph cluster_out {
+ label=<Possible <b>i+1</b>  order operator types>
+ style=dotted
+ cpuop2[label="Paddle(CPU) op",shape=doublecircle]
+ fetchop[label="Fetch op"]
+ mkldnnop2[label="MKL-DNN op", style=bold]
+ }
+
+
+   cpuop -> TransData[label="Tensor ", style=bold] 
+   mkldnnop -> TransData[label="MKL-DNN Tensor ", color="black:invis:black"]
+   mkldnnop -> innerMKLDNN[label="MKL-DNN Tensor"]
+   toMKLDNN -> mkldnnop2[style=bold, label="MKL-DNN Tensor "] 
+   innerMKLDNN -> cpuop2[label="Tensor ", color="black:invis:black"]
+   innerMKLDNN -> fetchop[label="Tensor "] 
+ subgraph cluster_Trans {
+ label="Transformation of Tensors"
+ style=dotted
+   TransData -> toMKLDNN[style=bold]
+   TransData -> fromMKLDNN[color="black:invis:black"]
+   fromMKLDNN -> innerMKLDNN[color="black:invis:black"]
+    
+ }
+}
+
--- a/doc/fluid/flags/cudnn_cn.rst
+++ b/doc/fluid/flags/cudnn_cn.rst
@@ -11,7 +11,7 @@ FLAGS_conv_workspace_size_limit

 取值范围
 ---------------
-Uint64型，缺省值为4096。即4G内存工作区。
+Uint64型，缺省值为512。即512MB显存工作区。

 示例
 -------

--- a/doc/fluid/flags/cudnn_en.rst
+++ b/doc/fluid/flags/cudnn_en.rst
@@ -11,7 +11,7 @@ The workspace limit size in MB unit for choosing cuDNN convolution algorithms. T

 Values accepted
 ---------------
-Uint64. The default value is 4096. That is to say, 4G memory workspace.
+Uint64. The default value is 512, that is, a 512 MB memory workspace.

 Example
 -------

--- a/doc/fluid/flags/memory_cn.rst
+++ b/doc/fluid/flags/memory_cn.rst
@@ -7,17 +7,17 @@ FLAGS_allocator_strategy
 ********************
 (始于1.2)

-用于选择PaddlePaddle的分配器策略。 分配器策略正在开发中，且非legacy分配器尚未稳定。
+用于选择PaddlePaddle的分配器策略。其中auto_growth策略尚未稳定。

 取值范围
 ---------------
-String型，['legacy', 'naive_best_fit']中的一个。缺省值为'legacy'。
+String型，['naive_best_fit', 'auto_growth']中的一个。缺省值为'naive_best_fit'。

 示例
 --------
-FLAGS_allocator_strategy=legacy - 使用legacy分配器。
+FLAGS_allocator_strategy=naive_best_fit - 使用预分配best fit分配器。

-FLAGS_allocator_strategy=naive_best_fit - 使用新设计的分配器。
+FLAGS_allocator_strategy=auto_growth - 使用auto growth分配器。


 FLAGS_eager_delete_scope
@@ -39,15 +39,15 @@ FLAGS_eager_delete_tensor_gb
 *******************************************
 (始于1.0.0)

-表示是否使用垃圾回收策略来优化网络的内存使用。如果FLAGS_eager_delete_tensor_gb >= 0，则启用垃圾回收策略，并在运行网络时回收内存垃圾，这有利于节省内存使用量。它仅在您使用Executor运行程序、编译程序或使用并行数据编译程序时才有用。如果FLAGS_eager_delete_tensor_gb < 0，则禁用垃圾回收策略。垃圾回收器直到垃圾的内存大小达到FLAGS_eager_delete_tensor_gb GB时才会释放内存垃圾。
+表示是否使用垃圾回收策略来优化网络的内存使用。如果FLAGS_eager_delete_tensor_gb < 0，则禁用垃圾回收策略。如果FLAGS_eager_delete_tensor_gb >= 0，则启用垃圾回收策略，并在运行网络时回收内存垃圾，这有利于节省内存使用量。它仅在您使用Executor运行程序、编译程序或使用并行数据编译程序时才有用。垃圾回收器直到垃圾的内存大小达到FLAGS_eager_delete_tensor_gb GB时才会释放内存垃圾。

 取值范围
 ---------------
-Double型，单位为GB，缺省值为-1.0。
+Double型，单位为GB，缺省值为0.0。

 示例
 -------
-FLAGS_eager_delete_tensor_gb=0.0 - 一旦不再使用即释放内存垃圾。
+FLAGS_eager_delete_tensor_gb=0.0 - 垃圾占用大小达到0.0GB时释放内存垃圾，即一旦出现垃圾则马上释放。

 FLAGS_eager_delete_tensor_gb=1.0 - 垃圾占用内存大小达到1.0GB时释放内存垃圾。

@@ -58,72 +58,70 @@ FLAGS_eager_delete_tensor_gb=-1.0 - 禁用垃圾回收策略。
 建议用户在训练大型网络时设置FLAGS_eager_delete_tensor_gb=0.0以启用垃圾回收策略。


-FLAGS_enable_inplace_whitelist
+FLAGS_fast_eager_deletion_mode
 *******************************************
-(始于1.4)
+(始于1.3)

-该flag用于调试，在某些ops中禁止内存原位复用。设置后，一些ops不会执行原位复用优化以节省内存。这些Ops包括：sigmoid, exp, relu, tanh, sqrt, ceil, floor, reciprocal, relu6, soft_relu, hard_sigmoid, batch_norm, batch_norm_grad, sum, sum_grad, scale, reshape, elementwise_add, and elementwise_add_grad。
+是否使用快速垃圾回收策略。如果未设置，则在CUDA内核结束时释放gpu内存。否则gpu内存将在CUDA内核尚未结束的情况下被释放，从而使垃圾回收策略更快。仅在启用垃圾回收策略时有效。

 取值范围
 ---------------
-Bool型，缺省值为False。
+Bool型，缺省值为True。

 示例
 -------
-FLAGS_enable_inplace_whitelist=True - 在特定op上禁止内存原位复用优化。
+FLAGS_fast_eager_deletion_mode=True - 启用快速垃圾回收策略。

+FLAGS_fast_eager_deletion_mode=False - 禁用快速垃圾回收策略。

-FLAGS_fast_eager_deletion_mode
+
+FLAGS_fraction_of_cpu_memory_to_use
 *******************************************
-(始于1.3)
+(始于1.2.0)

-是否使用快速垃圾回收策略。如果未设置，则在CUDA内核结束时释放gpu内存。否则gpu内存将在CUDA内核尚未结束的情况下被释放，从而使垃圾回收策略更快。仅在启用垃圾回收策略时有效。
+表示分配的内存块占CPU总内存大小的比例。将来的内存使用将从该内存块分配。 如果内存块没有足够的cpu内存，将从cpu请求分配与内存块相同大小的新的内存块，直到cpu没有足够的内存为止。

 取值范围
 ---------------
-Bool型，缺省值为True。
+Double型，范围[0, 1]，表示初始分配的内存块占CPU内存的比例。缺省值为1.0。

 示例
 -------
-FLAGS_fast_eager_deletion_mode=True - 启用快速垃圾回收策略。
-
-FLAGS_fast_eager_deletion_mode=False - 禁用快速垃圾回收策略。
+FLAGS_fraction_of_cpu_memory_to_use=0.1 - 分配总CPU内存大小的10%作为初始CPU内存块。


-FLAGS_fraction_of_gpu_memory_to_use
+FLAGS_fraction_of_cuda_pinned_memory_to_use
 *******************************************
 (始于1.2.0)

-表示分配的内存块占GPU总内存大小的比例。将来的内存使用将从该内存块分配。 如果内存块没有足够的gpu内存，将从gpu请求分配与内存块同样大小的新的内存块，直到gpu没有足够的内存为止。
+表示分配的CUDA Pinned内存块占CPU总内存大小的比例。将来的CUDA Pinned内存使用将从该内存块分配。 如果内存块没有足够的cpu内存，将从cpu请求分配与内存块相同大小的新的内存块，直到cpu没有足够的内存为止。

 取值范围
 ---------------
-Uint64型，大于0，表示初始分配的内存块占GPU内存的比例。
+Double型，范围[0, 1]，表示初始分配的内存块占CPU内存的比例。缺省值为0.5。

 示例
 -------
-FLAGS_fraction_of_gpu_memory_to_use=0.1 - 分配总GPU内存大小的10%作为初始GPU 内存块。
-
-注意
-------
-Windows系列平台会将FLAGS_fraction_of_gpu_memory_to_use默认设为0.5，Linux则会默认设为0.92。
+FLAGS_fraction_of_cuda_pinned_memory_to_use=0.1 - 分配总CPU内存大小的10%作为初始CUDA Pinned内存块。


-FLAGS_free_idle_memory
+FLAGS_fraction_of_gpu_memory_to_use
 *******************************************
-(始于0.15.0)
+(始于1.2.0)

-是否在运行时释放从系统预分配的空闲内存。设置后，如果预分配的分配器中有太多空闲内存，则释放空闲内存。
+表示分配的显存块占GPU总可用显存大小的比例。将来的显存使用将从该显存块分配。 如果显存块没有足够的gpu显存，将从gpu请求分配与显存块同样大小的新的显存块，直到gpu没有足够的显存为止。

 取值范围
 ---------------
-Bool型，缺省值为False。
+Double型，范围[0, 1]，表示初始分配的显存块占GPU可用显存的比例。

 示例
 -------
-FLAGS_free_idle_memory=True - 空闲内存太多时释放。
+FLAGS_fraction_of_gpu_memory_to_use=0.1 - 分配GPU总可用显存大小的10%作为初始GPU显存块。

-FLAGS_free_idle_memory=False - 不释放空闲内存。
+注意
+-------
+Windows系列平台会将FLAGS_fraction_of_gpu_memory_to_use默认设为0.5，Linux则会默认设为0.92。


 FLAGS_fuse_parameter_groups_size
@@ -207,21 +205,6 @@ FLAGS_initial_gpu_memory_in_mb=4096 - 分配4GB作为初始GPU内存块大小。
 如果设置该flag，则FLAGS_fraction_of_gpu_memory_to_use设置的内存大小将被该flag覆盖。如果未设置该flag，PaddlePaddle将使用FLAGS_fraction_of_gpu_memory_to_use分配GPU内存。


-FLAGS_limit_of_tmp_allocation
-*******************************************
-(始于1.3)
-
-FLAGS_limit_of_tmp_allocation表示temporary_allocation大小的上限，单位为字节。如果FLAGS_limit_of_tmp_allocation为-1，temporary_allocation的大小将没有限制。
-
-取值范围
---------------
-Int64型，缺省值为-1。
-
-示例
-------
-FLAGS_limit_of_tmp_allocation=1024 - 将temporary_allocation大小的上限设为1024字节。
-
-
 FLAGS_memory_fraction_of_eager_deletion
 *******************************************
 (始于1.4)
@@ -261,21 +244,6 @@ FLAGS_reallocate_gpu_memory_in_mb=1024 - 如果耗尽了分配的GPU内存块，
 如果设置了该flag，PaddlePaddle将重新分配该flag指定大小的gpu内存。否则分配FLAGS_fraction_of_gpu_memory_to_use指定比例的gpu内存。


-FLAGS_times_excess_than_required_tmp_allocation
-*******************************************
-(始于1.3)
-
-FLAGS_times_excess_than_required_tmp_allocation表示TemporaryAllocator可以返回的最大大小。例如，如果所需的内存大小为N，且times_excess_than_required_tmp_allocation为2.0，则TemporaryAllocator将返回大小范围为N~2*N的可用分配。
-
-取值范围
---------------
-Int64型，缺省值为2。
-
-示例
-------
-FLAGS_times_excess_than_required_tmp_allocation=1024 - 设置TemporaryAllocator可以返回的最大大小为1024*N。
-
-
 FLAGS_use_pinned_memory
 *******************************************
 (始于0.12.0)

--- a/doc/fluid/flags/memory_en.rst
+++ b/doc/fluid/flags/memory_en.rst
@@ -7,17 +7,17 @@ FLAGS_allocator_strategy
 **************************************
 (since 1.2)

-Use to choose allocator strategy of PaddlePaddle. The allocator strategy is under development, and the non-legacy allocator is not stable yet.
+Used to choose the allocator strategy of PaddlePaddle. The auto growth allocator is not stable yet.

 Values accepted
 ---------------
-String, enum in ['legacy', 'naive_best_fit']. The default value is 'legacy'.
+String, enum in ['naive_best_fit', 'auto_growth']. The default value is 'naive_best_fit'.

 Example
 --------
-FLAGS_allocator_strategy=legacy would use the legacy allocator.
+FLAGS_allocator_strategy=naive_best_fit would use the pre-allocated best fit allocator.

-FLAGS_allocator_strategy=naive_best_fit would use the new-designed allocator.
+FLAGS_allocator_strategy=auto_growth would use the auto growth allocator.



@@ -40,15 +40,15 @@ FLAGS_eager_delete_tensor_gb
 *******************************************
 (since 1.0.0)

-Whether to use garbage collection strategy to optimize the memory usage of network. If FLAGS_eager_delete_tensor_gb >= 0, garbage collection strategy would be enabled, and collect memory garbages when running network, which is beneficial to saving memory usage. It is only useful when you use Executor to run program, or compile program, or compile program with data parallel. If FLAGS_eager_delete_tensor_gb < 0, garbage collection strategy is disabled. Garbage collector would not release memory garbages until the memory size of garbages reaches FLAGS_eager_delete_tensor_gb GB.
+Whether to use the garbage collection strategy to optimize the memory usage of the network. If FLAGS_eager_delete_tensor_gb < 0, the garbage collection strategy is disabled. If FLAGS_eager_delete_tensor_gb >= 0, the garbage collection strategy is enabled and memory garbage is collected while the network runs, which reduces memory usage. It only takes effect when you use Executor to run a program, a compiled program, or a compiled program with data parallel. The garbage collector does not release memory garbage until its total size reaches FLAGS_eager_delete_tensor_gb GB.

 Values accepted
 ---------------
-Double, in GB unit. The default value is -1.0.
+Double, in GB unit. The default value is 0.0.

 Example
 -------
-FLAGS_eager_delete_tensor_gb=0.0 would make memory garbage release immediately once it is not used. 
+FLAGS_eager_delete_tensor_gb=0.0 would release memory garbage once its total size reaches 0.0GB, i.e., immediately once there is any garbage.

 FLAGS_eager_delete_tensor_gb=1.0 would make memory garbage release till the memory size of garbages reaches 1.0GB. 

@@ -59,75 +59,70 @@ Note
 It is recommended that users enable garbage collection strategy by setting FLAGS_eager_delete_tensor_gb=0.0 when training large network.


-
-FLAGS_enable_inplace_whitelist
+FLAGS_fast_eager_deletion_mode
 *******************************************
-(since 1.4)
+(since 1.3)

-Debug use to disable memory in-place in some ops. If set, some ops would not perform in-place optimization to save memory. These ops include: sigmoid, exp, relu, tanh, sqrt, ceil, floor, reciprocal, relu6, soft_relu, hard_sigmoid, batch_norm, batch_norm_grad, sum, sum_grad, scale, reshape, elementwise_add, and elementwise_add_grad.
+Whether to use the fast garbage collection strategy. If not set, gpu memory would be released when the CUDA kernel ends. Otherwise, gpu memory would be released without waiting for the CUDA kernel to end, making the garbage collection strategy faster. Only valid when the garbage collection strategy is enabled.

 Values accepted
 ---------------
-Bool. The default value is False.
+Bool. The default value is True.

 Example
 -------
-FLAGS_enable_inplace_whitelist=True would disable memory in-place optimization on certain ops.
-
+FLAGS_fast_eager_deletion_mode=True would turn on fast garbage collection strategy. 

+FLAGS_fast_eager_deletion_mode=False would turn off fast garbage collection strategy.

-FLAGS_fast_eager_deletion_mode
+FLAGS_fraction_of_cpu_memory_to_use
 *******************************************
-(since 1.3)
+(since 1.2.0)

-Whether to use fast garbage collection strategy. If not set, gpu memory would be released when CUDA kernel ends. Otherwise, gpu memory would be released without waiting CUDA kernel ends, making garbage collection strategy faster. Only valid when garbage collection strategy is enabled.
+Allocate a chunk of cpu memory that is this fraction of the total cpu memory size. Future memory usage will be allocated from the chunk. If the chunk doesn't have enough cpu memory, additional chunks of the same size will be requested from cpu until the cpu has no memory left for another chunk.

 Values accepted
 ---------------
-Bool. The default value is True.
+Double value in range [0, 1] which is the initial CPU memory percentage. The default value is 1.0.

 Example
 -------
-FLAGS_fast_eager_deletion_mode=True would turn on fast garbage collection strategy. 
+FLAGS_fraction_of_cpu_memory_to_use=0.1 will allocate 10% of the total cpu memory size as the initial CPU chunk.

-FLAGS_fast_eager_deletion_mode=False would turn off fast garbage collection strategy.

-
-FLAGS_fraction_of_gpu_memory_to_use
+FLAGS_fraction_of_cuda_pinned_memory_to_use
 *******************************************
 (since 1.2.0)

-Allocate a chunk of gpu memory that is this fraction of the total gpu memory size. Future memory usage will be allocated from the chunk. If the chunk doesn't have enough gpu memory, additional chunks of the same size will be requested from gpu until the gpu has no memory left for another chunk.
+Allocate a chunk of CUDA pinned memory that is this fraction of the total cpu memory size. Future memory usage will be allocated from the chunk. If the chunk doesn't have enough cpu memory, additional chunks of the same size will be requested from cpu until the cpu has no memory left for another chunk.

 Values accepted
 ---------------
-Uint64 value greater than 0 which is the initial GPU memory percentage.
+Double value in range [0, 1] which is the initial CUDA pinned memory percentage. The default value is 0.5.

 Example
 -------
-FLAGS_fraction_of_gpu_memory_to_use=0.1 will allocate 10% total gpu memory size as initial GPU chunk.
+FLAGS_fraction_of_cuda_pinned_memory_to_use=0.1 will allocate 10% of the total cpu memory size as the initial CUDA pinned chunk.

-Note
-------
-Windows series platform will set FLAGS_fraction_of_gpu_memory_to_use to 0.5 by default.
-Linux will set FLAGS_fraction_of_gpu_memory_to_use to 0.92 by default.

-
-FLAGS_free_idle_memory
+FLAGS_fraction_of_gpu_memory_to_use
 *******************************************
-(since 0.15.0)
+(since 1.2.0)

-Whether to free idle memory pre-allocated from system during runtime. If set, free idle memory would be released if there is too much free idle memory in the pre-allocated allocator.
+Allocate a chunk of gpu memory that is this fraction of the available gpu memory size. Future memory usage will be allocated from the chunk. If the chunk doesn't have enough gpu memory, additional chunks of the same size will be requested from gpu until the gpu has no memory left for another chunk.

 Values accepted
 ---------------
-Bool. The default value is False.
+Double value in range [0, 1] which is the initial GPU memory percentage.

 Example
 -------
-FLAGS_free_idle_memory=True will free idle memory when there is too much of it. 
+FLAGS_fraction_of_gpu_memory_to_use=0.1 will allocate 10% of the available gpu memory size as the initial GPU chunk.

-FLAGS_free_idle_memory=False will not free idle memory.
+Note
+-------
+Windows series platform will set FLAGS_fraction_of_gpu_memory_to_use to 0.5 by default.
+Linux will set FLAGS_fraction_of_gpu_memory_to_use to 0.92 by default.


 FLAGS_fuse_parameter_groups_size
@@ -213,20 +208,6 @@ If you set this flag, the memory size set by FLAGS_fraction_of_gpu_memory_to_use
 If you don't set this flag, PaddlePaddle will use FLAGS_fraction_of_gpu_memory_to_use to allocate gpu memory.


-FLAGS_limit_of_tmp_allocation
-*******************************************
-(since 1.3)
-
-The FLAGS_limit_of_tmp_allocation indicates the up limit of temporary_allocation size, the unit is byte. If the FLAGS_limit_of_tmp_allocation is -1, the size of temporary_allocation will not be limited.
-
-Values accepted
---------------
-Int64. The default value is -1.
-
-Example
-------
-FLAGS_limit_of_tmp_allocation=1024 will set the up limit of temporary_allocation size to 1024 bytes.
-

 FLAGS_memory_fraction_of_eager_deletion
 *******************************************
@@ -268,21 +249,6 @@ If this flag is set, PaddlePaddle will reallocate the gpu memory with size speci
 Else PaddlePaddle will reallocate with size set by FLAGS_fraction_of_gpu_memory_to_use.


-FLAGS_times_excess_than_required_tmp_allocation
-*******************************************
-(since 1.3)
-
-The FLAGS_times_excess_than_required_tmp_allocation indicates the max size the TemporaryAllocator can return. For Example
-, if the required memory size is N, and FLAGS_times_excess_than_required_tmp_allocation is 2.0, the TemporaryAllocator will return the available allocation that the range of size is N ~ 2*N.
-
-Values accepted
---------------
-Int64. The default value is 2.
-
-Example
-------
-FLAGS_times_excess_than_required_tmp_allocation=1024 will set the max size of the TemporaryAllocator can return to 1024*N.
-

 FLAGS_use_pinned_memory
 *******************************************

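Review note on the memory_en.rst changes above: the accepted values described in the updated text can be summarized as a small validation sketch. This is only an illustration under stated assumptions. `validate_memory_flags` and `gc_enabled` are hypothetical helpers, not part of PaddlePaddle, and the ranges are copied from the documentation text in this diff rather than from the framework source.

```python
# Hypothetical helpers; names are illustrative and not PaddlePaddle APIs.
# Accepted values are taken from the updated memory_en.rst text in this diff.
ALLOCATOR_STRATEGIES = ("naive_best_fit", "auto_growth")
FRACTION_FLAGS = (
    "FLAGS_fraction_of_cpu_memory_to_use",
    "FLAGS_fraction_of_cuda_pinned_memory_to_use",
    "FLAGS_fraction_of_gpu_memory_to_use",
)


def validate_memory_flags(flags):
    """Check flag values against the documented ranges.

    Returns a dict of strings suitable for os.environ.update().
    """
    env = {}
    for name, value in flags.items():
        if name == "FLAGS_allocator_strategy":
            if value not in ALLOCATOR_STRATEGIES:
                raise ValueError("%s must be one of %s" % (name, ALLOCATOR_STRATEGIES))
        elif name in FRACTION_FLAGS:
            # Documented as a double in the range [0, 1].
            if not 0.0 <= float(value) <= 1.0:
                raise ValueError("%s must be in [0, 1]" % name)
        elif name == "FLAGS_eager_delete_tensor_gb":
            float(value)  # any double; a negative value disables garbage collection
        elif name == "FLAGS_fast_eager_deletion_mode":
            if not isinstance(value, bool):
                raise ValueError("%s must be a bool" % name)
        else:
            raise ValueError("flag not covered by this sketch: %s" % name)
        env[name] = str(value)
    return env


def gc_enabled(eager_delete_tensor_gb):
    """Garbage collection is documented as enabled when the threshold is >= 0."""
    return float(eager_delete_tensor_gb) >= 0.0
```

A possible use is `os.environ.update(validate_memory_flags({"FLAGS_eager_delete_tensor_gb": 0.0}))`, on the assumption that the flags are read from the environment and therefore need to be set before the framework is imported.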
--- a/Anakin @ beec126e
+++ b/Anakin @ beec126e
-Subproject commit beec126e4cfe762e4b6b542496069323dca35ee7
--- a/book @ cdb5d193
+++ b/book @ cdb5d193
-Subproject commit 4d3d1663c2dd28241ab7ee32396fb0d92793f9fc
+Subproject commit cdb5d19301ad7757272342933430dfbaaf6acec7
--- a/models @ bc0200b9
+++ b/models @ bc0200b9
-Subproject commit bc0200b971b0e951b4a3f13822a1e1db33388b29
--- a/paddle-mobile @ 2c088e20
+++ b/paddle-mobile @ 2c088e20
-Subproject commit 2c088e20d8083accacaf2057bc35531ac7fba7ce
--- a/scripts/checkapproval.sh
+++ b/scripts/checkapproval.sh
@@ -6,12 +6,12 @@ for API_FILE in ${API_FILES[*]}; do
  if [ "${API_CHANGE}" ];then
    approval_line=`curl -H "Authorization: token ${GITHUB_API_TOKEN}" https://api.github.com/repos/PaddlePaddle/FluidDoc/pulls/${GIT_PR_ID}/reviews?per_page=10000`
    if [ "${API_FILE}" == "doc/fluid" ];then
-      APPROVALS=`echo ${approval_line}|python ./scripts/check_pr_approval.py 2 7534971 14105589 12605721 3064195 328693 47554610 39645414 11195205 20274488 45024560 ` 
+      APPROVALS=`echo ${approval_line}|python ./scripts/check_pr_approval.py 1 7534971 14105589 12605721 3064195 328693 47554610 39645414 11195205 20274488 45024560 ` 
    fi
  fi
  if [ "${APPROVALS}" == "FALSE" ]; then
    if [ "${API_FILE}" == "doc/fluid" ];then
-      echo "You must have two RD (wanghaoshuang or guoshengCS or heavengate or kuke or Superjomn or lanxianghit or cyj1986 or hutuxian or frankwhzhang or nepeplwu) approval for the api change! ${API_FILE} for the management reason of API interface and API document."
+      echo "You must have one RD (wanghaoshuang or guoshengCS or heavengate or kuke or Superjomn or lanxianghit or cyj1986 or hutuxian or frankwhzhang or nepeplwu) approval for the api change! ${API_FILE} for the management reason of API interface and API document."
    fi
    exit 1
  fi
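
Review note on the checkapproval.sh change above: the first argument passed to `check_pr_approval.py` drops from 2 to 1, which appears to be the number of required approvals, followed by the allowed reviewer IDs. The script itself is not part of this diff, so the following is only a sketch of the check it presumably performs. `has_enough_approvals` is a hypothetical name, and the `state` and `user.id` fields are those returned by the GitHub pull request reviews endpoint.

```python
import json


def has_enough_approvals(reviews_json, required, approver_ids):
    """Sketch (assumed behavior): count distinct allowed reviewers who approved."""
    reviews = json.loads(reviews_json)
    approved = set()
    for review in reviews:
        user_id = review["user"]["id"]
        if review.get("state") == "APPROVED" and user_id in approver_ids:
            approved.add(user_id)
    return len(approved) >= required
```

With a single approval from an allowed reviewer, this check fails when `required` is 2 and passes when it is 1, which matches the relaxed message in the hunk above.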