Commit 0fab5dcc authored by: D dengkaipeng

Merge branch 'develop' of https://github.com/PaddlePaddle/FluidDoc into refine_dataloader_doc

......@@ -181,7 +181,7 @@ REGISTER_OPERATOR(
- The `DefaultGradOpMaker` provided by Fluid will, by default, take all inputs (`Input`) and outputs (`Output`) of the forward op, together with the gradients of its outputs (`Output@Grad`), as inputs of the backward op, and take the gradients of the forward op's inputs (`Input@Grad`) as outputs of the backward op. So when using `DefaultGradOpMaker`, consider whether some of these variables are actually unused in the backward computation.
- If `DefaultGradOpMaker` does not meet your needs, you need to build a `GradOpMaker` by hand; for the concrete implementation, please refer to the [related documentation](new_op.html#gradopmaker);
- If a backward op depends on the Shape or LoD of an input or output variable of the forward op, but not on the buffer of that variable's Tensor, and the Shape and LoD cannot be inferred from other variables, you can register that variable (called `X` below) as `NoNeedBufferVars` in the backward op through the `DECLARE_NO_NEED_BUFFER_VARS_INFERER` interface. **Once `NoNeedBufferVars` is registered, the backward op must not read or write the buffer of that variable's Tensor; only the Tensor's dims() and lod() methods may be called. In addition, the backward op's `GetExpectedKernelType()` must be overridden, and it must not access the type() method of `X`'s Tensor.** For example, `SliceOpGrad` only uses the Shape information of the `Input` variable, so `Input` needs to be registered on `SliceOpGrad`:
```
namespace paddle {
namespace operators {
......@@ -230,8 +230,8 @@ class SliceOpGradMaker : public framework::SingleGradOpMaker<T> {
}
};
DECLARE_NO_NEED_BUFFER_VARS_INFERER(SliceOpGradNoNeedBufferVarsInference,
                                    "Input");
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators;
......
......@@ -14,7 +14,7 @@
*
   [x] Paddle Fluid installed successfully. If it is not installed yet, please refer to `Quick Start <https://www.paddlepaddle.org.cn/documentation/docs/zh/1.7/beginners_guide/quick_start_cn.html>`_
*
   [x] Learned the most basic single-node training method. Please refer to the single-card training described in `Single-node training <https://www.paddlepaddle.org.cn/documentation/docs/zh/1.5/user_guides/howto/training/single_node.html>`_
......@@ -113,7 +113,7 @@
main_function(args.is_local)
* Note: the IO method used in the example is dataset; for the documentation and usage, please refer to the `Dataset API <https://www.paddlepaddle.org.cn/documentation/docs/zh/1.7/api_cn/dataset_cn.html>`_ . For the ``train_from_dataset`` interface used in the example, please refer to the `Executor API <https://www.paddlepaddle.org.cn/documentation/docs/zh/1.7/api_cn/executor_cn.html>`_ . The ``from paddle.fluid.incubate.fleet.parameter_server.distribute_transpiler import fleet`` in the example means adopting the parameter server architecture for distributed training. For more options and examples of the Fleet API, please refer to `Fleet API <https://www.paddlepaddle.org.cn/documentation/docs/zh/1.6/user_guides/howto/training/fleet_api_howto_cn.html>`_
Start command for single-node training
......
.. _cluster_quick_start_en:

Quick Start with Distributed Training
=====================================

Distributed training with Fleet API
-----------------------------------

Since Paddle Fluid `Release 1.5.1 <https://github.com/PaddlePaddle/Paddle/releases/tag/v1.5.1>`__, it is officially recommended to use the Fleet API for distributed training. For an introduction to the Fleet API, please refer to the `Fleet Design Doc <https://github.com/PaddlePaddle/Fleet>`__.

Preparation
~~~~~~~~~~~
- [x] Install Paddle Fluid. If not already installed, please refer to
`Beginner’s
Guide <https://www.paddlepaddle.org.cn/documentation/docs/en/1.7/beginners_guide/index_en.html>`__.
- [x] Master the most basic single node training method. Please refer
to the single card training described in `Single-node
training <https://www.paddlepaddle.org.cn/documentation/docs/en/1.5/user_guides/howto/training/single_node_en.html>`__.
Click-through rate prediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Here we use a simple example, a click-through rate (CTR) prediction task,
to illustrate how to configure the Fleet API for distributed training, and
we use a single-node environment to simulate the distributed environment.
The source code of the example comes from `CTR with
Fleet <https://github.com/PaddlePaddle/Fleet/tree/develop/examples/ctr>`__.
To make learning easier, the example given here mixes single-node and
multi-node code; you can start single-node or multi-node tasks with
different start commands. For obtaining the data and the data
preprocessing logic, please refer to the source code and description of
`CTR with
Fleet <https://github.com/PaddlePaddle/Fleet/tree/develop/examples/ctr>`__.
.. code:: python

    from __future__ import print_function
    from args import parse_args
    import os
    import sys
    import paddle
    import paddle.fluid as fluid
    from network_conf import ctr_dnn_model_dataset
    import paddle.fluid.incubate.fleet.base.role_maker as role_maker
    from paddle.fluid.incubate.fleet.parameter_server.distribute_transpiler import fleet
    from paddle.fluid.transpiler.distribute_transpiler import DistributeTranspilerConfig

    dense_feature_dim = 13
    sparse_feature_dim = 10000001
    batch_size = 100
    thread_num = 10
    embedding_size = 10
    args = parse_args()

    def main_function(is_local):
        # common code for local training and distributed training
        dense_input = fluid.layers.data(
            name="dense_input", shape=[dense_feature_dim], dtype='float32')
        sparse_input_ids = [
            fluid.layers.data(name="C" + str(i), shape=[1], lod_level=1,
                              dtype="int64") for i in range(1, 27)]
        label = fluid.layers.data(name="label", shape=[1], dtype="int64")

        # use dataset as the IO method
        dataset = fluid.DatasetFactory().create_dataset()
        dataset.set_use_var([dense_input] + sparse_input_ids + [label])
        pipe_command = "python criteo_reader.py %d" % sparse_feature_dim
        dataset.set_pipe_command(pipe_command)
        dataset.set_batch_size(batch_size)
        dataset.set_thread(thread_num)
        whole_filelist = ["raw_data/part-%d" % x
                          for x in range(len(os.listdir("raw_data")))]
        dataset.set_filelist(whole_filelist)

        loss, auc_var, batch_auc_var = ctr_dnn_model_dataset(
            dense_input, sparse_input_ids, label, embedding_size,
            sparse_feature_dim)

        exe = fluid.Executor(fluid.CPUPlace())

        def train_loop(epoch=20):
            for i in range(epoch):
                exe.train_from_dataset(program=fluid.default_main_program(),
                                       dataset=dataset,
                                       fetch_list=[auc_var],
                                       fetch_info=["auc"],
                                       debug=False)

        # local training
        def local_train():
            optimizer = fluid.optimizer.SGD(learning_rate=1e-4)
            optimizer.minimize(loss)
            exe.run(fluid.default_startup_program())
            train_loop()

        # distributed training
        def dist_train():
            role = role_maker.PaddleCloudRoleMaker()
            fleet.init(role)
            strategy = DistributeTranspilerConfig()
            strategy.sync_mode = False
            optimizer = fluid.optimizer.SGD(learning_rate=1e-4)
            optimizer = fleet.distributed_optimizer(optimizer, strategy)
            optimizer.minimize(loss)

            if fleet.is_server():
                fleet.init_server()
                fleet.run_server()
            elif fleet.is_worker():
                fleet.init_worker()
                exe.run(fluid.default_startup_program())
                train_loop()

        if is_local:
            local_train()
        else:
            dist_train()

    if __name__ == '__main__':
        main_function(args.is_local)
- Note: The IO method used in this example is dataset; please refer to
  the `Dataset
  API <https://www.paddlepaddle.org.cn/documentation/docs/en/1.7/api/dataset.html>`__
  for the documentation and usage. For the ``train_from_dataset``
  interface, please refer to the `Executor
  API <https://www.paddlepaddle.org.cn/documentation/docs/en/1.7/api/executor.html>`__.
  ``from paddle.fluid.incubate.fleet.parameter_server.distribute_transpiler import fleet``
  in this example means adopting the parameter server architecture for
  distributed training; refer to the `Fleet
  API <https://www.paddlepaddle.org.cn/documentation/docs/zh/1.6/user_guides/howto/training/fleet_api_howto_cn.html>`__
  for more options and examples of the Fleet API.
elif PADDLE_TRAINER_ROLE == "PSERVER":
# fetch the pserver program and execute it
pserver_prog = t.get_pserver_program(current_endpoint)
...
Start command of single node training
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: bash

    python train.py --is_local 1
Start command of single machine simulation distributed training
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here we use launch_ps, a built-in launcher of paddle, with which users can
specify the number of workers and servers to start the parameter server
tasks.

.. code:: bash

    python -m paddle.distributed.launch_ps --worker_num 2 --server_num 2 train.py
The task running log can be viewed in the logs directory under the working
directory. Once you can use a single machine to simulate distributed
training, you can perform true multi-node distributed training. We
recommend that users refer directly to
`the example of running a distributed task on Baidu Cloud <https://www.paddlepaddle.org.cn/documentation/docs/zh/1.5/user_guides/howto/training/deploy_ctr_on_baidu_cloud_cn.html>`__.
......@@ -13,9 +13,17 @@ fluid.dygraph
dygraph/Conv3D.rst
dygraph/Conv3DTranspose.rst
dygraph/CosineDecay.rst
dygraph/DataParallel.rst
dygraph/disable_dygraph.rst
dygraph/dygraph_to_static_code.rst
dygraph/dygraph_to_static_func.rst
dygraph/dygraph_to_static_output.rst
dygraph/dygraph_to_static_program.rst
dygraph/Embedding.rst
dygraph/enable_dygraph.rst
dygraph/enabled.rst
dygraph/ExponentialDecay.rst
dygraph/grad.rst
dygraph/GroupNorm.rst
dygraph/GRUUnit.rst
dygraph/guard.rst
......
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_dygraph_DataParallel:
DataParallel
------------
.. autoclass:: paddle.fluid.dygraph.DataParallel
:members:
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_dygraph_disable_dygraph:
disable_dygraph
---------------
.. autofunction:: paddle.fluid.dygraph.disable_dygraph
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_dygraph_dygraph_to_static_code:
dygraph_to_static_code
----------------------
.. autofunction:: paddle.fluid.dygraph.dygraph_to_static_code
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_dygraph_dygraph_to_static_func:
dygraph_to_static_func
----------------------
.. autofunction:: paddle.fluid.dygraph.dygraph_to_static_func
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_dygraph_dygraph_to_static_program:
dygraph_to_static_program
-------------------------
.. autofunction:: paddle.fluid.dygraph.dygraph_to_static_program
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_dygraph_enable_dygraph:
enable_dygraph
--------------
.. autofunction:: paddle.fluid.dygraph.enable_dygraph
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_dygraph_enabled:
enabled
-------
.. autofunction:: paddle.fluid.dygraph.enabled
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_dygraph_grad:
grad
----
.. autofunction:: paddle.fluid.dygraph.grad
:noindex:
......@@ -20,14 +20,15 @@ fluid
fluid/DataFeeder.rst
fluid/default_main_program.rst
fluid/default_startup_program.rst
fluid/disable_dygraph.rst
fluid/device_guard.rst
fluid/disable_dygraph.rst
fluid/DistributeTranspiler.rst
fluid/DistributeTranspilerConfig.rst
fluid/embedding.rst
fluid/enable_dygraph.rst
fluid/ExecutionStrategy.rst
fluid/Executor.rst
fluid/get_flags.rst
fluid/global_scope.rst
fluid/gradients.rst
fluid/in_dygraph_mode.rst
......@@ -47,6 +48,7 @@ fluid
fluid/require_version.rst
fluid/save.rst
fluid/scope_guard.rst
fluid/set_flags.rst
fluid/Tensor.rst
fluid/Variable.rst
fluid/WeightNormParamAttr.rst
......@@ -4,7 +4,7 @@
.. _api_fluid_device_guard:
device_guard
------------
.. autofunction:: paddle.fluid.device_guard
:noindex:
......
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_get_flags:
get_flags
---------
.. autofunction:: paddle.fluid.get_flags
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_set_flags:
set_flags
---------
.. autofunction:: paddle.fluid.set_flags
:noindex:
......@@ -19,6 +19,9 @@ import types
import os
import contextlib
import paddle.fluid as fluid
import paddle.tensor as tensor
import paddle.nn as nn
#import paddle.framework as framework
def parse_arg():
parser = argparse.ArgumentParser()
......@@ -29,8 +32,13 @@ def parse_arg():
'--module_prefix', type=str, help='Generate the prefix of module')
parser.add_argument(
'--output', type=str, help='Output file or output directory for output rst')
    parser.add_argument(
        '--output_name', type=str, help='Name of the root module to generate rst for (e.g. fluid, tensor, nn)')
    parser.add_argument(
        '--output_dir', type=str, help='Output directory for the generated rst files')
parser.add_argument(
'--to_multiple_files', type=bool, default=False, help='Whether to separate to multiple files')
return parser.parse_args()
def print_item(self, name):
......@@ -140,7 +148,7 @@ class DocGenerator(object):
self.stream.write(".. _api_{0}_{1}:\n\n".format("_".join(
self.module_prefix.split(".")), name))
def generate_doc(module_name, module_prefix, output, output_name, to_multiple_files, output_dir):
if module_name == "":
module_name = None
......@@ -150,24 +158,29 @@ def generate_doc(module_name, module_prefix, output, to_multiple_files):
    gen = DocGenerator()
    if module_name is None:
        gen.module = eval(output_name)
        gen.module_name = str(output_name)
    else:
        gen.module = eval(output_name)
        for each_module_name in module_name.split('.'):
            if not hasattr(gen.module, each_module_name):
                raise ValueError("Cannot find fluid.{0}".format(module_name))
            else:
                gen.module = getattr(gen.module, each_module_name)
        gen.module_name = output_name + "." + module_name
    if module_prefix is None:
        gen.module_prefix = gen.module_name
    else:
        gen.module_prefix = output_name + "." + module_prefix
    dirname = output if to_multiple_files else os.path.dirname(output)
    if output_dir is not None:
        dirname = output_dir + "/" + dirname
        output = output_dir + "/" + output
    if len(dirname) > 0 and (not os.path.exists(dirname) or not os.path.isdir(dirname)):
        os.makedirs(dirname)
......@@ -199,7 +212,7 @@ def generate_doc(module_name, module_prefix, output, to_multiple_files):
def main():
args = parse_arg()
    generate_doc(args.module_name, args.module_prefix, args.output, args.output_name, args.to_multiple_files, args.output_dir)
if __name__ == '__main__':
......
#!/bin/bash
#for module in nn
#do
# python gen_doc.py --module_name layers.${module} --module_prefix layers --output layers/${module} --to_multiple_files True
#done
#for module in control_flow nn io ops tensor learning_rate_scheduler detection metric_op
#do
# python gen_doc.py --module_name layers.${module} --module_prefix layers --output layers/${module}.rst
#done
for module in layers dataset clip metrics executor initializer io nets optimizer profiler regularizer transpiler backward profiler unique_name dygraph
do
python gen_doc.py --module_name ${module} --module_prefix ${module} --output ${module} --output_name fluid --to_multiple_files True
python gen_module_index.py ${module} fluid.${module}
done
python gen_doc.py --module_name "" --module_prefix "" --output fluid --to_multiple_files True
python gen_doc.py --module_name "" --module_prefix "" --output fluid --output_name fluid --to_multiple_files True
python gen_module_index.py fluid fluid
for module in math random stat
do
python gen_doc.py --module_name ${module} --module_prefix ${module} --output ${module} --output_name tensor --to_multiple_files True --output_dir tensor
python gen_module_index.py tensor.${module} ${module}
done
python gen_module_index.py tensor paddle.tensor
for module in loss
do
python gen_doc.py --module_name ${module} --module_prefix ${module} --output ${module} --output_name nn --to_multiple_files True --output_dir nn
python gen_module_index.py nn.${module} ${module}
done
python gen_module_index.py nn paddle.nn
python gen_index.py
......@@ -19,8 +19,10 @@ API Reference
layers.rst
metrics.rst
nets.rst
nn.rst
optimizer.rst
profiler.rst
regularizer.rst
tensor.rst
transpiler.rst
unique_name.rst
......@@ -11,4 +11,3 @@ ComposeNotAligned
:inherited-members:
:noindex:
This indicates an error state of the compose API; it is raised when the outputs of the readers are not aligned.
......@@ -25,6 +25,7 @@ fluid.layers
layers/atan.rst
layers/auc.rst
layers/autoincreased_step_counter.rst
layers/BasicDecoder.rst
layers/batch_norm.rst
layers/beam_search.rst
layers/beam_search_decode.rst
......@@ -68,6 +69,7 @@ fluid.layers
layers/cumsum.rst
layers/data.rst
layers/data_norm.rst
layers/DecodeHelper.rst
layers/Decoder.rst
layers/deformable_conv.rst
layers/deformable_roi_pooling.rst
......@@ -104,6 +106,7 @@ fluid.layers
layers/eye.rst
layers/fc.rst
layers/fill_constant.rst
layers/fill_constant_batch_size_like.rst
layers/filter_by_instag.rst
layers/flatten.rst
layers/floor.rst
......@@ -112,6 +115,7 @@ fluid.layers
layers/gather_nd.rst
layers/gather_tree.rst
layers/gaussian_random.rst
layers/gaussian_random_batch_size_like.rst
layers/gelu.rst
layers/generate_mask_labels.rst
layers/generate_proposal_labels.rst
......@@ -119,6 +123,7 @@ fluid.layers
layers/get_tensor_from_selected_rows.rst
layers/greater_equal.rst
layers/greater_than.rst
layers/GreedyEmbeddingHelper.rst
layers/grid_sampler.rst
layers/group_norm.rst
layers/gru_unit.rst
......@@ -136,6 +141,7 @@ fluid.layers
layers/image_resize.rst
layers/image_resize_short.rst
layers/increment.rst
layers/inplace_abn.rst
layers/instance_norm.rst
layers/inverse_time_decay.rst
layers/iou_similarity.rst
......@@ -237,6 +243,7 @@ fluid.layers
layers/rpn_target_assign.rst
layers/rsqrt.rst
layers/sampled_softmax_with_cross_entropy.rst
layers/SampleEmbeddingHelper.rst
layers/sampling_id.rst
layers/scale.rst
layers/scatter.rst
......@@ -302,10 +309,12 @@ fluid.layers
layers/tensor_array_to_tensor.rst
layers/thresholded_relu.rst
layers/topk.rst
layers/TrainingHelper.rst
layers/transpose.rst
layers/unfold.rst
layers/Uniform.rst
layers/uniform_random.rst
layers/uniform_random_batch_size_like.rst
layers/unique.rst
layers/unique_with_counts.rst
layers/unsqueeze.rst
......
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_layers_BasicDecoder:
BasicDecoder
------------
.. autoclass:: paddle.fluid.layers.BasicDecoder
:members:
:inherited-members:
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_layers_DecodeHelper:
DecodeHelper
------------
.. autoclass:: paddle.fluid.layers.DecodeHelper
:members:
:inherited-members:
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_layers_GreedyEmbeddingHelper:
GreedyEmbeddingHelper
---------------------
.. autoclass:: paddle.fluid.layers.GreedyEmbeddingHelper
:members:
:inherited-members:
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_layers_SampleEmbeddingHelper:
SampleEmbeddingHelper
---------------------
.. autoclass:: paddle.fluid.layers.SampleEmbeddingHelper
:members:
:inherited-members:
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_layers_TrainingHelper:
TrainingHelper
--------------
.. autoclass:: paddle.fluid.layers.TrainingHelper
:members:
:inherited-members:
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_layers_fill_constant_batch_size_like:
fill_constant_batch_size_like
-----------------------------
.. autofunction:: paddle.fluid.layers.fill_constant_batch_size_like
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_layers_gaussian_random_batch_size_like:
gaussian_random_batch_size_like
-------------------------------
.. autofunction:: paddle.fluid.layers.gaussian_random_batch_size_like
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_layers_inplace_abn:
inplace_abn
-----------
.. autofunction:: paddle.fluid.layers.inplace_abn
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_fluid_layers_uniform_random_batch_size_like:
uniform_random_batch_size_like
------------------------------
.. autofunction:: paddle.fluid.layers.uniform_random_batch_size_like
:noindex:
=========
paddle.nn
=========
.. toctree::
:maxdepth: 1
nn/loss.rst
====
loss
====
.. toctree::
:maxdepth: 1
loss/L1Loss.rst
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_nn_loss_L1Loss:
L1Loss
------
.. autoclass:: paddle.nn.loss.L1Loss
:members:
:inherited-members:
:noindex:
=============
paddle.tensor
=============
.. toctree::
:maxdepth: 1
tensor/math.rst
tensor/random.rst
====
math
====
.. toctree::
:maxdepth: 1
math/add.rst
math/atan.rst
math/div.rst
math/elementwise_sum.rst
math/mm.rst
math/mul.rst
math/pow.rst
math/sin.rst
math/sqrt.rst
math/sum.rst
math/tanh.rst
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_tensor_math_add:
add
---
.. autofunction:: paddle.tensor.math.add
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_tensor_math_atan:
atan
----
.. autofunction:: paddle.tensor.math.atan
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_tensor_math_div:
div
---
.. autofunction:: paddle.tensor.math.div
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_tensor_math_elementwise_sum:
elementwise_sum
---------------
.. autofunction:: paddle.tensor.math.elementwise_sum
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_tensor_math_mm:
mm
--
.. autofunction:: paddle.tensor.math.mm
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_tensor_math_mul:
mul
---
.. autofunction:: paddle.tensor.math.mul
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_tensor_math_pow:
pow
---
.. autofunction:: paddle.tensor.math.pow
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_tensor_math_sin:
sin
---
.. autofunction:: paddle.tensor.math.sin
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_tensor_math_sqrt:
sqrt
----
.. autofunction:: paddle.tensor.math.sqrt
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_tensor_math_sum:
sum
---
.. autofunction:: paddle.tensor.math.sum
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_tensor_math_tanh:
tanh
----
.. autofunction:: paddle.tensor.math.tanh
:noindex:
======
random
======
.. toctree::
:maxdepth: 1
random/randint.rst
random/randperm.rst
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_tensor_random_randint:
randint
-------
.. autofunction:: paddle.tensor.random.randint
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
.. _api_tensor_random_randperm:
randperm
--------
.. autofunction:: paddle.tensor.random.randperm
:noindex:
......@@ -3,13 +3,19 @@
GradientClipByGlobalNorm
-------------------------------

.. py:class:: paddle.fluid.clip.GradientClipByGlobalNorm(clip_norm, group_name='default_group', need_clip=None)

Limits the sum of the L2 norms of all Tensors in a Tensor list :math:`t\_list` to within ``clip_norm``.

- If the sum of norms is greater than ``clip_norm``, every Tensor is multiplied by a scaling factor to compress it.

- If the sum of norms is less than or equal to ``clip_norm``, nothing is done.

The input Tensor list is not passed in through this class; by default all gradients in the ``Program`` are selected. If ``need_clip`` is not None, only part of the parameters are selected for gradient clipping.

This class takes effect only after it is set through ``optimizer.minimize(grad_clip)``; see the ``optimizer`` documentation (for example, :ref:`cn_api_fluid_optimizer_SGDOptimizer`).

The clipping formula is:
.. math::
\\t\_list[i]=t\_list[i]∗\frac{clip\_norm}{max(global\_norm,clip\_norm)}\\
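
The formula can be checked numerically; below is a quick sketch with plain NumPy (illustrative only, independent of the Paddle API):

.. code-block:: python

    import numpy as np

    t_list = [np.array([3.0, 4.0]), np.array([0.0, 12.0])]  # L2 norms: 5 and 12
    clip_norm = 1.0
    # global_norm = sqrt(5^2 + 12^2) = 13
    global_norm = np.sqrt(sum((t ** 2).sum() for t in t_list))
    clipped = [t * clip_norm / max(global_norm, clip_norm) for t in t_list]
    # after clipping, the global norm equals clip_norm (= 1.0) exactly
    assert np.isclose(np.sqrt(sum((t ** 2).sum() for t in clipped)), clip_norm)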
......@@ -21,67 +27,73 @@ GradientClipByGlobalNorm
Parameters:
    - **clip_norm** (float) - the maximum allowed sum of norms
    - **group_name** (str, optional) - the group name for clipping
    - **need_clip** (function, optional) - a function used to select the parameters that need gradient clipping. It receives a ``Parameter`` and returns a ``bool`` (True means the parameter needs clipping, False means it does not). Defaults to None, in which case all parameters in the network are clipped.
**Code example 1: static graph**
.. code-block:: python

    import paddle.fluid as fluid
    import numpy as np

    main_prog = fluid.Program()
    startup_prog = fluid.Program()
    with fluid.program_guard(
            main_program=main_prog, startup_program=startup_prog):
        image = fluid.data(
            name='x', shape=[-1, 2], dtype='float32')
        predict = fluid.layers.fc(input=image, size=3, act='relu')  # trainable parameters: fc_0.w.0, fc_0.b.0
        loss = fluid.layers.mean(predict)

        # clip all parameters in the network:
        clip = fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0)

        # to clip only the parameter fc_0.w_0:
        # pass a function filter_func to the need_clip argument; it receives a Parameter and returns a bool
        # def filter_func(param):
        #     # checking Parameter.name is convenient (name can be set in fluid.ParamAttr; defaults are fc_0.w_0, fc_0.b_0)
        #     return param.name == "fc_0.w_0"
        # clip = fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0, need_clip=filter_func)

        sgd_optimizer = fluid.optimizer.SGDOptimizer(learning_rate=0.1)
        sgd_optimizer.minimize(loss, grad_clip=clip)

    place = fluid.CPUPlace()
    exe = fluid.Executor(place)
    x = np.random.uniform(-100, 100, (10, 2)).astype('float32')
    exe.run(startup_prog)
    out = exe.run(main_prog, feed={'x': x}, fetch_list=loss)
**Code example 2: dynamic graph**
.. code-block:: python

    import paddle
    import paddle.fluid as fluid

    with fluid.dygraph.guard():
        linear = fluid.dygraph.Linear(10, 10)  # trainable parameters: linear_0.w.0, linear_0.b.0
        inputs = fluid.layers.uniform_random([32, 10]).astype('float32')
        out = linear(fluid.dygraph.to_variable(inputs))
        loss = fluid.layers.reduce_mean(out)
        loss.backward()

        # clip all parameters in the network:
        clip = fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0)

        # to clip only the parameter linear_0.w_0:
        # pass a function filter_func to the need_clip argument; it receives a ParamBase and returns a bool
        # def filter_func(param):
        #     # checking ParamBase.name is convenient (name can be set in fluid.ParamAttr; defaults are linear_0.w_0, linear_0.b_0)
        #     return param.name == "linear_0.w_0"
        #     # note: linear.weight and linear.bias return the weight and bias of the dygraph.Linear layer, so this also works:
        #     return param.name == linear.weight.name
        # clip = fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0, need_clip=filter_func)

        sgd_optimizer = fluid.optimizer.SGD(
            learning_rate=0.1, parameter_list=linear.parameters())
        sgd_optimizer.minimize(loss, grad_clip=clip)
......@@ -3,11 +3,19 @@
GradientClipByNorm
-------------------------------

.. py:class:: paddle.fluid.clip.GradientClipByNorm(clip_norm, need_clip=None)

Limits the L2 norm of the input multidimensional Tensor :math:`X` to within ``clip_norm``.

- If the L2 norm is greater than ``clip_norm``, the Tensor is multiplied by a scaling factor to compress it.

- If the L2 norm is less than or equal to ``clip_norm``, nothing is done.

The input Tensor is not passed in through this class; by default all gradients in the ``Program`` are selected. If ``need_clip`` is not None, only part of the parameters are selected for gradient clipping.

This class takes effect only after it is set through ``optimizer.minimize(grad_clip)``; see the ``optimizer`` documentation (for example, :ref:`cn_api_fluid_optimizer_SGDOptimizer`).
The clipping formula is:

.. math::

    \\Out = \begin{cases} X, & if (norm(X) \leq clip\_norm) \\ \frac{clip\_norm * X}{norm(X)}, & if (norm(X) > clip\_norm) \end{cases}\\

......@@ -26,54 +34,72 @@ GradientClipByNorm

where :math:`norm(X)` is the L2 norm of :math:`X`:

.. math::

    \\norm(X) = (\sum_{i=1}^{n}|x_i|^2)^{\frac{1}{2}}\\
Parameters:
    - **clip_norm** (float) - the maximum allowed L2 norm.
    - **need_clip** (function, optional) - a function used to select the parameters that need gradient clipping. It receives a ``Parameter`` and returns a ``bool`` (True means the parameter needs clipping, False means it does not). Defaults to None, in which case all parameters in the network are clipped.
**Code example 1: static graph**
.. code-block:: python

    import paddle
    import paddle.fluid as fluid
    import numpy as np

    main_prog = fluid.Program()
    startup_prog = fluid.Program()
    with fluid.program_guard(
            main_program=main_prog, startup_program=startup_prog):
        image = fluid.data(
            name='x', shape=[-1, 2], dtype='float32')
        predict = fluid.layers.fc(input=image, size=3, act='relu')  # trainable parameters: fc_0.w.0, fc_0.b.0
        loss = fluid.layers.mean(predict)

        # clip all parameters in the network:
        clip = fluid.clip.GradientClipByNorm(clip_norm=1.0)

        # to clip only the parameter fc_0.w_0:
        # pass a function filter_func to the need_clip argument; it receives a Parameter and returns a bool
        # def filter_func(param):
        #     # checking Parameter.name is convenient (name can be set in fluid.ParamAttr; defaults are fc_0.w_0, fc_0.b_0)
        #     return param.name == "fc_0.w_0"
        # clip = fluid.clip.GradientClipByNorm(clip_norm=1.0, need_clip=filter_func)

        sgd_optimizer = fluid.optimizer.SGDOptimizer(learning_rate=0.1)
        sgd_optimizer.minimize(loss, grad_clip=clip)

    place = fluid.CPUPlace()
    exe = fluid.Executor(place)
    x = np.random.uniform(-100, 100, (10, 2)).astype('float32')
    exe.run(startup_prog)
    out = exe.run(main_prog, feed={'x': x}, fetch_list=loss)
**Code example 2: dynamic graph**
.. code-block:: python

    import paddle
    import paddle.fluid as fluid

    with fluid.dygraph.guard():
        linear = fluid.dygraph.Linear(10, 10)  # trainable parameters: linear_0.w.0, linear_0.b.0
        inputs = fluid.layers.uniform_random([32, 10]).astype('float32')
        out = linear(fluid.dygraph.to_variable(inputs))
        loss = fluid.layers.reduce_mean(out)
        loss.backward()

        # clip all parameters in the network:
        clip = fluid.clip.GradientClipByNorm(clip_norm=1.0)

        # to clip only the parameter linear_0.w_0:
        # pass a function filter_func to the need_clip argument; it receives a ParamBase and returns a bool
        # def filter_func(param):
        #     # checking ParamBase.name is convenient (name can be set in fluid.ParamAttr; defaults are linear_0.w_0, linear_0.b_0)
        #     return param.name == "linear_0.w_0"
        #     # note: linear.weight and linear.bias return the weight and bias of the dygraph.Linear layer, so this also works:
        #     return param.name == linear.weight.name
        # clip = fluid.clip.GradientClipByNorm(clip_norm=1.0, need_clip=filter_func)

        sgd_optimizer = fluid.optimizer.SGD(
            learning_rate=0.1, parameter_list=linear.parameters())
        sgd_optimizer.minimize(loss, grad_clip=clip)
......@@ -3,10 +3,14 @@
GradientClipByValue
-------------------------------

.. py:class:: paddle.fluid.clip.GradientClipByValue(max, min=None, need_clip=None)

Limits the values of the input multidimensional Tensor :math:`X` to the range [min, max].

The input Tensor is not passed in through this class; by default all gradients in the ``Program`` are selected. If ``need_clip`` is not None, only part of the parameters are selected for gradient clipping.

This class takes effect only after it is set through ``optimizer.minimize(grad_clip)``; see the ``optimizer`` documentation (for example, :ref:`cn_api_fluid_optimizer_SGDOptimizer`).
......@@ -16,25 +20,75 @@ GradientClipByValue
Parameters:
    - **max** (float) - the maximum value for clipping.
    - **min** (float, optional) - the minimum value for clipping. If not set by the user, it is automatically set to ``-max`` (in this case ``max`` must be greater than 0).
    - **need_clip** (function, optional) - a function used to select the parameters that need gradient clipping. It receives a ``Parameter`` and returns a ``bool`` (True means the parameter needs clipping, False means it does not). Defaults to None, in which case all parameters in the network are clipped.
**Code example 1: static graph**
.. code-block:: python

    import paddle
    import paddle.fluid as fluid
    import numpy as np

    main_prog = fluid.Program()
    startup_prog = fluid.Program()
    with fluid.program_guard(
            main_program=main_prog, startup_program=startup_prog):
        image = fluid.data(
            name='x', shape=[-1, 2], dtype='float32')
        predict = fluid.layers.fc(input=image, size=3, act='relu')  # trainable parameters: fc_0.w.0, fc_0.b.0
        loss = fluid.layers.mean(predict)

        # clip all parameters in the network:
        clip = fluid.clip.GradientClipByValue(min=-1, max=1)

        # to clip only the parameter fc_0.w_0:
        # pass a function filter_func to the need_clip argument; it receives a Parameter and returns a bool
        # def filter_func(param):
        #     # checking Parameter.name is convenient (name can be set in fluid.ParamAttr; defaults are fc_0.w_0, fc_0.b_0)
        #     return param.name == "fc_0.w_0"
        # clip = fluid.clip.GradientClipByValue(min=-1, max=1, need_clip=filter_func)

        sgd_optimizer = fluid.optimizer.SGDOptimizer(learning_rate=0.1)
        sgd_optimizer.minimize(loss, grad_clip=clip)

    place = fluid.CPUPlace()
    exe = fluid.Executor(place)
    x = np.random.uniform(-100, 100, (10, 2)).astype('float32')
    exe.run(startup_prog)
    out = exe.run(main_prog, feed={'x': x}, fetch_list=loss)
**Code example 2: dynamic graph**
.. code-block:: python

    import paddle
    import paddle.fluid as fluid

    with fluid.dygraph.guard():
        linear = fluid.dygraph.Linear(10, 10)  # trainable parameters: linear_0.w.0, linear_0.b.0
        inputs = fluid.layers.uniform_random([32, 10]).astype('float32')
        out = linear(fluid.dygraph.to_variable(inputs))
        loss = fluid.layers.reduce_mean(out)
        loss.backward()

        # clip all parameters in the network:
        clip = fluid.clip.GradientClipByValue(min=-1, max=1)

        # to clip only the parameter linear_0.w_0:
        # pass a function filter_func to the need_clip argument; it receives a ParamBase and returns a bool
        # def filter_func(param):
        #     # checking ParamBase.name is convenient (name can be set in fluid.ParamAttr; defaults are linear_0.w_0, linear_0.b_0)
        #     return param.name == "linear_0.w_0"
        #     # note: linear.weight and linear.bias return the weight and bias of the dygraph.Linear layer, so this also works:
        #     return param.name == linear.weight.name
        # clip = fluid.clip.GradientClipByValue(min=-1, max=1, need_clip=filter_func)

        sgd_optimizer = fluid.optimizer.SGD(
            learning_rate=0.1, parameter_list=linear.parameters())
        sgd_optimizer.minimize(loss, grad_clip=clip)
......@@ -7,12 +7,17 @@ set_gradient_clip
.. py:function:: paddle.fluid.clip.set_gradient_clip(clip, param_list=None, program=None)

.. warning::
    This API is sensitive to where it is called: it must be placed after the network is built and before ``minimize``, so it may be removed in a future release and is not recommended. It is recommended to use ``minimize(loss, grad_clip=clip)`` for gradient clipping instead.

    There are three clipping strategies: :ref:`cn_api_fluid_clip_GradientClipByGlobalNorm`, :ref:`cn_api_fluid_clip_GradientClipByNorm`, and :ref:`cn_api_fluid_clip_GradientClipByValue`.
    If ``set_gradient_clip(clip)`` and ``minimize(loss, grad_clip=clip)`` are used at the same time, ``set_gradient_clip`` will not take effect.

Performs gradient clipping on the specified parameters.

Parameters:
    - **clip** (GradientClipBase) - the gradient clipping strategy, such as :ref:`cn_api_fluid_clip_GradientClipByGlobalNorm`, describing the concrete clipping method and attributes.
    - **param_list** (list(Variable), optional) - the list of parameters to clip; it can be a list of parameters or of parameter names. Defaults to None, which means all parameters in ``program`` are clipped.
    - **program** (Program, optional) - the Program the parameters belong to. Defaults to None, which means :ref:`cn_api_fluid_default_main_program` is used.

Returns: None.
......@@ -59,3 +64,17 @@ set_gradient_clip
            param_list=[param_var1, param_var2])
        sgd = fluid.optimizer.SGD(learning_rate=1e-3)
        sgd.minimize(loss)

    # network 4: use set_gradient_clip and minimize(grad_clip=clip) together
    with fluid.program_guard(fluid.Program(), fluid.Program()):
        loss = network()
        param_var1 = fluid.default_main_program().global_block().var("fc1_param")
        param_var2 = fluid.default_main_program().global_block().var("fc2_param")
        clip1 = fluid.clip.GradientClipByValue(min=-1.0, max=1.0)
        clip2 = fluid.clip.GradientClipByNorm(clip_norm=1.0)
        # set the gradient clipping strategy: clip1
        fluid.clip.set_gradient_clip(clip1)
        sgd = fluid.optimizer.SGD(learning_rate=1e-3)
        # set the gradient clipping strategy: clip2
        sgd.minimize(loss, grad_clip=clip2)
        # when the settings conflict, set_gradient_clip does not take effect;
        # gradients are clipped with the clip2 strategy
......@@ -19,6 +19,7 @@ fluid.dygraph
dygraph_cn/Embedding_cn.rst
dygraph_cn/ExponentialDecay_cn.rst
dygraph_cn/FC_cn.rst
dygraph_cn/grad_cn.rst
dygraph_cn/GroupNorm_cn.rst
dygraph_cn/GRUUnit_cn.rst
dygraph_cn/guard_cn.rst
......
......@@ -47,7 +47,7 @@ LayerNorm
x = numpy.random.random((3, 32, 32)).astype('float32')
with fluid.dygraph.guard():
x = to_variable(x)
        layerNorm = fluid.LayerNorm([32, 32])
        ret = layerNorm(x)
......@@ -21,6 +21,100 @@ The full name of the Layer, composed as: ``name_scope`` + "/" + MyLayer.__class__
Return type: str
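
A minimal sketch of reading ``full_name`` (the printed value is illustrative; the exact suffix depends on how many layers of the same class have been created):

.. code-block:: python

    import paddle.fluid as fluid

    with fluid.dygraph.guard():
        linear = fluid.dygraph.Linear(4, 2)
        # composed from the name scope and the lower-cased class name,
        # e.g. "linear_0"
        print(linear.full_name())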
.. py:method:: register_forward_pre_hook(hook)

Registers a ``forward pre-hook`` function for the Layer; the ``hook`` function will be called before the ``forward`` function is invoked.

The ``hook`` function takes the ``Layer`` and the ``input`` of its ``forward`` call, and may return a tuple or a single modified value; a single modified value is wrapped into a tuple. It can be used to inspect or modify the input of the ``Layer``'s ``forward`` function:

hook(Layer, input) -> None or modified input

Parameters:
    - **hook** (function) - the function to register as a ``forward pre-hook``

Returns: a ``HookRemoveHelper`` object; the registered hook function can be removed by calling ``hook_remove_helper.remove()``.

Return type: ``HookRemoveHelper`` object
**Code example**

.. code-block:: python

    import paddle.fluid as fluid
    import numpy as np

    # the forward_pre_hook function modifies the layer's input: input = input * 2
    def forward_pre_hook(layer, input):
        # change the input value
        input_return = (input[0] * 2)
        return input_return

    with fluid.dygraph.guard():
        linear = fluid.Linear(13, 5, dtype="float32")

        # register the hook
        forward_pre_hook_handle = linear.register_forward_pre_hook(forward_pre_hook)

        value0 = np.arange(26).reshape(2, 13).astype("float32")
        in0 = fluid.dygraph.to_variable(value0)
        out0 = linear(in0)

        # remove the hook
        forward_pre_hook_handle.remove()

        value1 = value0 * 2
        in1 = fluid.dygraph.to_variable(value1)
        out1 = linear(in1)

        # the hook changed the layer's input (input = input * 2), so out0 equals out1
        assert (out0.numpy() == out1.numpy()).all()
.. py:method:: register_forward_post_hook(hook)

Registers a ``forward post-hook`` function for the Layer; the ``hook`` function will be called after the ``forward`` function is invoked.

The ``hook`` function takes the ``Layer`` together with the ``input`` and ``output`` of its ``forward`` call. It can be used to inspect or modify the output of the ``Layer``'s ``forward`` function:

hook(Layer, input, output) -> None or modified output

Parameters:
    - **hook** (function) - the function to register as a ``forward post-hook``

Returns: a ``HookRemoveHelper`` object; the registered hook function can be removed by calling ``hook_remove_helper.remove()``.

Return type: ``HookRemoveHelper`` object
**Code example**

.. code-block:: python

    import paddle.fluid as fluid
    import numpy as np

    # the forward_post_hook function changes the layer's output: output = output * 2
    def forward_post_hook(layer, input, output):
        # change the output value
        return output * 2

    with fluid.dygraph.guard():
        linear = fluid.Linear(13, 5, dtype="float32")

        # register the hook
        forward_post_hook_handle = linear.register_forward_post_hook(forward_post_hook)

        value1 = np.arange(26).reshape(2, 13).astype("float32")
        in1 = fluid.dygraph.to_variable(value1)
        out0 = linear(in1)

        # remove the hook
        forward_post_hook_handle.remove()

        out1 = linear(in1)

        # the hook changed the layer's output (output = output * 2), so out0 equals out1 * 2
        assert (out0.numpy() == (out1.numpy()) * 2).all()
.. py:method:: create_parameter(shape, attr=None, dtype="float32", is_bias=False, default_initializer=None)
Layer创建参数。
......
......@@ -5,7 +5,7 @@ NoamDecay
**Note: this API only supports dynamic-graph mode**

.. py:class:: paddle.fluid.dygraph.NoamDecay(d_model, warmup_steps, begin=1, step=1, dtype='float32', learning_rate=1.0)

This interface provides Noam learning rate decay.
......@@ -13,7 +13,7 @@ Noam decay is computed as follows.
.. math::

    decayed\_learning\_rate = learning\_rate * d_{model}^{-0.5} * min(global\_steps^{-0.5}, global\_steps * warmup\_steps^{-1.5})

For more details about Noam decay, please refer to `attention is all you need <https://arxiv.org/pdf/1706.03762.pdf>`_
......@@ -28,6 +28,7 @@ Noam decay is computed as follows.
    - **begin** (int, optional) - the starting step, i.e. the initial value of global_steps in the formula above. Defaults to 0.
    - **step** (int, optional) - the step size, i.e. the increment of global_steps in the formula above. Defaults to 1.
    - **dtype** (str, optional) - the data type of the learning rate value, which can be "float32" or "float64". Defaults to "float32".
    - **learning_rate** (Variable|float|int, optional) - the initial learning rate. If the type is Variable, it is a Tensor of shape [1] with data type float32 or float64; it can also be a Python int. Defaults to 1.0.
Returns: None
......@@ -39,7 +40,9 @@ Noam decay is computed as follows.
    warmup_steps = 100
    learning_rate = 0.01
    with fluid.dygraph.guard():
        emb = fluid.dygraph.Embedding([10, 10])
        optimizer = fluid.optimizer.SGD(
            learning_rate = fluid.dygraph.NoamDecay(
                1/(warmup_steps *(learning_rate ** 2)),
                warmup_steps),
            parameter_list = emb.parameters())
......@@ -35,8 +35,10 @@ PiecewiseDecay
    boundaries = [10000, 20000]
    values = [1.0, 0.5, 0.1]
    with fluid.dygraph.guard():
        emb = fluid.dygraph.Embedding([10, 10])
        optimizer = fluid.optimizer.SGD(
            learning_rate=fluid.dygraph.PiecewiseDecay(boundaries, values, 0),
            parameter_list=emb.parameters())
......
.. _cn_api_fluid_dygraph_grad:

grad
-------------------------------

**Note: this API only supports dynamic-graph mode**

.. py:method:: paddle.fluid.dygraph.grad(outputs, inputs, grad_outputs=None, retain_graph=None, create_graph=False, only_inputs=True, allow_unused=False, no_grad_vars=None, backward_strategy=None)

For every variable in `inputs`, computes the sum of gradients of all `outputs` with respect to it.

Parameters:
    - **outputs** (Variable|list(Variable)|tuple(Variable)) - the output variables of the graph used to compute gradients, or a list/tuple of several output variables.
    - **inputs** (Variable|list(Variable)|tuple(Variable)) - the input variables of the graph used to compute gradients, or a list/tuple of several input variables. Each return value of this API corresponds to the gradient of one of the `inputs`.
    - **grad_outputs** (Variable|list(Variable|None)|tuple(Variable|None), optional) - the initial gradient values of the `outputs`. If `grad_outputs` is None, the initial gradient of every output is a Tensor of all ones. If `grad_outputs` is not None, it must have the same length as `outputs`; in that case, if the i-th element of `grad_outputs` is None, the initial gradient of the i-th output is a Tensor of all ones, and if the i-th element is a Variable, the initial gradient of the i-th output is that Variable. Defaults to None.
    - **retain_graph** (bool, optional) - whether to retain the forward graph used to compute gradients. If True, the forward graph is kept and the user can run backward on the same graph twice. If False, the forward graph is released. Defaults to None, which means the value equals `create_graph`.
    - **create_graph** (bool, optional) - whether to build the backward graph of the computation. If True, higher-order derivatives can be computed. If False, the backward graph is released after the computation. Defaults to False.
    - **only_inputs** (bool, optional) - whether to compute gradients only for `inputs`. If False, the gradients of all leaf variables in the graph are computed and accumulated. If True, only the gradients of `inputs` are computed. Defaults to True. only_inputs=False is under development and not yet supported.
    - **allow_unused** (bool, optional) - decides whether to raise an error or return None when some `inputs` variables are not in the computation graph. If some `inputs` are not in the graph (i.e. their gradients are None), an error is raised when allow_unused=False, and None is returned as their gradients when allow_unused=True. Defaults to False.
    - **no_grad_vars** (Variable|list(Variable)|tuple(Variable)|set(Variable), optional) - the variables that do not need gradients. Defaults to None.
    - **backward_strategy** (BackwardStrategy, optional) - the strategy for computing gradients; see :ref:`cn_api_fluid_dygraph_BackwardStrategy` for details. Defaults to None.

Returns: a tuple whose length equals the number of variables in `inputs`; its i-th entry is the sum of gradients of all `outputs` with respect to the i-th input.

Return type: tuple
**Example code 1**

.. code-block:: python

    import paddle.fluid as fluid

    def test_dygraph_grad(create_graph):
        with fluid.dygraph.guard():
            x = fluid.layers.ones(shape=[1], dtype='float32')
            x.stop_gradient = False
            y = x * x

            # Since y = x * x, dx = 2 * x
            dx = fluid.dygraph.grad(
                    outputs=[y],
                    inputs=[x],
                    create_graph=create_graph,
                    retain_graph=True)[0]

            z = y + dx

            # If create_graph = False, the gradient of dx
            # would not be backpropagated. Therefore,
            # z = x * x + dx, and x.gradient() = 2 * x = 2.0

            # If create_graph = True, the gradient of dx
            # would be backpropagated. Therefore,
            # z = x * x + dx = x * x + 2 * x, and
            # x.gradient() = 2 * x + 2 = 4.0

            z.backward()
            return x.gradient()

    print(test_dygraph_grad(create_graph=False)) # [2.]
    print(test_dygraph_grad(create_graph=True)) # [4.]
**Example code 2**

.. code-block:: python

    import paddle.fluid as fluid

    fluid.enable_dygraph()

    def test_dygraph_grad(grad_outputs=None):
        x = fluid.layers.fill_constant(shape=[1], value=2.0, dtype='float32')
        x.stop_gradient = False

        y1 = x * x
        y2 = x * 3

        # If grad_outputs=None, dy1 = [1], dy2 = [1].
        # If grad_outputs=[g1, g2], then:
        #   - dy1 = [1] if g1 is None else g1
        #   - dy2 = [1] if g2 is None else g2

        # Since y1 = x * x, dx = 2 * x * dy1.
        # Since y2 = x * 3, dx = 3 * dy2.
        # Therefore, the final result would be:
        # dx = 2 * x * dy1 + 3 * dy2 = 4 * dy1 + 3 * dy2.
        dx = fluid.dygraph.grad(
                outputs=[y1, y2],
                inputs=[x],
                grad_outputs=grad_outputs)[0]

        return dx.numpy()

    THREE = fluid.layers.fill_constant(shape=[1], value=3.0, dtype='float32')
    FOUR = fluid.layers.fill_constant(shape=[1], value=4.0, dtype='float32')

    # dy1 = [1], dy2 = [1]
    print(test_dygraph_grad(None)) # [7.]

    # dy1 = [1], dy2 = [4]
    print(test_dygraph_grad([None, FOUR])) # [16.]

    # dy1 = [4], dy2 = [1]
    print(test_dygraph_grad([FOUR, None])) # [19.]

    # dy1 = [3], dy2 = [4]
    print(test_dygraph_grad([THREE, FOUR])) # [24.]
......@@ -96,6 +96,7 @@ Executor supports single-GPU, multi-GPU, and CPU execution. When constructing an Executor, you need to pass
    - **scope** (Scope) - the scope in which the current program runs; users can specify different scopes for different programs. Default: fluid.global_scope().
    - **return_numpy** (bool) - whether to convert the returned computation results (the variables specified in the fetch list) to numpy. If False, each variable is returned as a LoDTensor; otherwise it is returned as numpy.ndarray. Default: True.
    - **use_program_cache** (bool) - whether to cache the input Program. If True, the model may run faster when the input program is a ``fluid.Program`` and the arguments of this interface (the program, the feed variable names, and the fetch_list variables) stay unchanged across calls. Default: False.
    - **use_prune** (bool) - whether to prune the input Program. If True, the program is pruned before running according to ``feed`` and ``fetch_list``: the ``Variable`` s and ``Operator`` s that produce the ``feed`` variables, as well as those that do not contribute to producing the ``fetch_list``, are cut away. Default: False, i.e. no pruning. Note that if the ``tuple`` returned by ``Optimizer.minimize()`` is passed into ``fetch_list``, ``use_prune`` is overridden to True and pruning is enabled. See the sketch below.
Returns: the values of the variables specified in fetch_list
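
A minimal sketch of calling ``run`` with pruning enabled (the network and shapes are illustrative):

.. code-block:: python

    import numpy as np
    import paddle.fluid as fluid

    x = fluid.data(name='x', shape=[None, 2], dtype='float32')
    hidden = fluid.layers.fc(input=x, size=3)
    loss = fluid.layers.mean(hidden)
    fluid.optimizer.SGD(learning_rate=0.1).minimize(loss)

    exe = fluid.Executor(fluid.CPUPlace())
    exe.run(fluid.default_startup_program())

    # with use_prune=True, operators and variables that do not lead from
    # the feeds to the fetch targets are cut from the executed program
    out, = exe.run(fluid.default_main_program(),
                   feed={'x': np.random.random((4, 2)).astype('float32')},
                   fetch_list=[loss],
                   use_prune=True)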
......
......@@ -7,7 +7,7 @@ BuildStrategy
.. py:class:: paddle.fluid.BuildStrategy
``BuildStrategy`` allows users to more conveniently control how the computation graph is built in :ref:`cn_api_fluid_ParallelExecutor`, by setting the ``BuildStrategy`` member of ``ParallelExecutor``.
**Code example**
......@@ -68,6 +68,7 @@ bool type. Indicates whether to fuse broadcast ops; in Reduce mode this option
**Code example**
.. code-block:: python
import paddle.fluid as fluid
build_strategy = fluid.BuildStrategy()
build_strategy.fuse_broadcast_ops = True
......@@ -108,6 +109,7 @@ bool type. Indicates whether to fuse relu and depthwise_conv2d to save GPU memory
import os
import numpy as np
import paddle.fluid as fluid
import paddle.fluid.compiler as compiler
use_cuda = True
......
......@@ -22,34 +22,29 @@ CompiledProgram transforms the input Program or Graph according to the `build_strategy` configuration
.. code-block:: python
    import paddle.fluid as fluid
    import numpy
    import os

    place = fluid.CUDAPlace(0) # fluid.CPUPlace()
    exe = fluid.Executor(place)

    data = fluid.data(name='X', shape=[None, 1], dtype='float32')
    hidden = fluid.layers.fc(input=data, size=10)
    loss = fluid.layers.mean(hidden)
    fluid.optimizer.SGD(learning_rate=0.01).minimize(loss)

    exe.run(fluid.default_startup_program())
    compiled_prog = fluid.CompiledProgram(
        fluid.default_main_program())

    x = numpy.random.random(size=(10, 1)).astype('float32')
    loss_data, = exe.run(compiled_prog,
                         feed={"X": x},
                         fetch_list=[loss.name])
.. py:method:: with_data_parallel(loss_name=None, build_strategy=None, exec_strategy=None, share_vars_from=None, places=None)
This method converts the input Program or Graph so that the model can be run in data-parallel mode. Users can configure optimizations for graph construction and execution through `build_strategy` and `exec_strategy`, e.g. fusing the AllReduce operations that aggregate gradients, or setting the size of the thread pool used while running the graph. **Note: if build_strategy is specified both when constructing the CompiledProgram and when calling with_data_parallel, the build_strategy in the CompiledProgram is overwritten; therefore, for data-parallel training, it is recommended to set build_strategy in the with_data_parallel call.**
Parameters:
    - **loss_name** (str) - the name of the loss variable produced by the model. **Note: for model training, loss_name must be set, otherwise the results may be wrong.** Default: None.
......@@ -70,45 +65,47 @@ CompiledProgram根据 `build_strategy` 的配置将输入的Program或Graph进
**Code Example**

.. code-block:: python

    import paddle.fluid as fluid
    import numpy
    import os

    use_cuda = True
    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
    parallel_places = [fluid.CUDAPlace(0), fluid.CUDAPlace(1)] if use_cuda else [fluid.CPUPlace()] * 2

    # NOTE: if you run the program on CPU, you need to set CPU_NUM explicitly,
    # otherwise fluid sets it to the number of all logical cores.
    # In that case, the input batch size should be greater than CPU_NUM,
    # otherwise the program will abort with an exception.
    if not use_cuda:
        os.environ['CPU_NUM'] = str(2)

    exe = fluid.Executor(place)

    data = fluid.data(name='X', shape=[None, 1], dtype='float32')
    hidden = fluid.layers.fc(input=data, size=10)
    loss = fluid.layers.mean(hidden)

    test_program = fluid.default_main_program().clone(for_test=True)
    fluid.optimizer.SGD(learning_rate=0.01).minimize(loss)

    exe.run(fluid.default_startup_program())
    compiled_train_prog = fluid.CompiledProgram(
        fluid.default_main_program()).with_data_parallel(
            loss_name=loss.name, places=parallel_places)
    # NOTE: if share_vars_from=compiled_train_prog is not set here,
    # the parameters used for testing will differ from those used for training
    compiled_test_prog = fluid.CompiledProgram(
        test_program).with_data_parallel(
            share_vars_from=compiled_train_prog,
            places=parallel_places)

    train_data = numpy.random.random(size=(10, 1)).astype('float32')
    loss_data, = exe.run(compiled_train_prog,
                         feed={"X": train_data},
                         fetch_list=[loss.name])
    test_data = numpy.random.random(size=(10, 1)).astype('float32')
    loss_data, = exe.run(compiled_test_prog,
                         feed={"X": test_data},
                         fetch_list=[loss.name])
......@@ -33,7 +33,7 @@ ExecutionStrategy
train_exe = fluid.ParallelExecutor(use_cuda=False,
loss_name=avg_loss.name,
exec_strategy=exec_strategy)
.. py:attribute:: num_iteration_per_drop_scope
......
......@@ -5,7 +5,11 @@ ParamAttr
-------------------------------
.. py:class:: paddle.fluid.ParamAttr(name=None, initializer=None, learning_rate=1.0, regularizer=None, trainable=True, do_model_average=False)
.. note::
    The ``gradient_clip`` attribute of this class will be deprecated in version 2.0. It is recommended to use ``minimize(loss, grad_clip=clip)`` for gradient clipping instead. Three clipping strategies are available: :ref:`cn_api_fluid_clip_GradientClipByGlobalNorm`,
    :ref:`cn_api_fluid_clip_GradientClipByNorm` and :ref:`cn_api_fluid_clip_GradientClipByValue`.
Creates a parameter attribute object. The user can set the parameter's name, initialization method, learning rate, regularization rule, trainability, gradient clipping method, model averaging and other properties; a short example follows the parameter list below.
......@@ -13,9 +17,10 @@ ParamAttr
    - **name** (str, optional) - the name of the parameter. Default: None, meaning the framework creates the parameter name automatically.
    - **initializer** (Initializer, optional) - the initialization method of the parameter. Default: None, meaning weight parameters use Xavier initialization and bias parameters are initialized to all zeros.
    - **learning_rate** (float) - the learning rate of the parameter. The effective learning rate equals the global learning rate multiplied by the parameter's learning rate, multiplied by the learning rate schedule factor.
    - **regularizer** (WeightDecayRegularizer, optional) - the regularization method. Two strategies are supported: :ref:`cn_api_fluid_regularizer_L1Decay` and
      :ref:`cn_api_fluid_regularizer_L2Decay`. If regularization is also set in the ``optimizer`` (for example :ref:`cn_api_fluid_optimizer_SGDOptimizer`), the regularization in the ``optimizer`` is ignored. Default: None, meaning no regularization.
    - **trainable** (bool) - whether the parameter is trainable. Default: True.
    - **do_model_average** (bool) - whether to apply model averaging. Default: False.

Returns: an object representing the parameter attributes.
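A minimal sketch of the recommended usage (all names below are illustrative):

.. code-block:: python

    import paddle.fluid as fluid

    w_param = fluid.ParamAttr(name='fc_weight',
                              learning_rate=0.5,
                              regularizer=fluid.regularizer.L2Decay(1.0),
                              trainable=True)
    x = fluid.data(name='X', shape=[None, 1], dtype='float32')
    y_predict = fluid.layers.fc(input=x, size=10, param_attr=w_param)
    loss = fluid.layers.mean(y_predict)

    # gradient clipping is passed to minimize() rather than set on ParamAttr
    clip = fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0)
    sgd = fluid.optimizer.SGD(learning_rate=0.01)
    sgd.minimize(loss, grad_clip=clip)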
......
......@@ -57,13 +57,12 @@ Program是Paddle Fluid对于计算图的一种静态描述,使用Program的构
import paddle.fluid as fluid

prog = fluid.default_main_program()
x = fluid.layers.data(name="X", shape=[2,3], dtype="float32", append_batch_size=False)
pred = fluid.layers.fc(x, size=3)
prog_string = prog.to_string(throw_on_error=True, with_details=False)
prog_string_with_details = prog.to_string(throw_on_error=False, with_details=True)
print("program string without detail: {}".format(prog_string))
print("program string with detail: {}".format(prog_string_with_details))
.. py:method:: clone(for_test=False)
......@@ -82,16 +81,19 @@ Program是Paddle Fluid对于计算图的一种静态描述,使用Program的构
**Code Example**

.. code-block:: python

    import paddle.fluid as fluid

    img = fluid.layers.data(name='image', shape=[784])
    pred = fluid.layers.fc(input=img, size=10, act='relu')
    loss = fluid.layers.mean(pred)
    ## We recommend calling clone() before using the Optimizer
    test_program = fluid.default_main_program().clone(for_test=True)
    optimizer = fluid.optimizer.Momentum(learning_rate=0.01, momentum=0.9)
    optimizer.minimize(loss)
Parameters:
    - **for_test** (bool) – if True, clone() sets the ``is_test`` attribute of the operators to True and prunes the backward OPs and parameter-optimization OPs. Default: False.

Returns: when ``for_test=True``, a new Program containing only the forward part of the current Program; otherwise, a new Program identical to the current one.
......@@ -150,7 +152,7 @@ Program是Paddle Fluid对于计算图的一种静态描述,使用Program的构
input=fluid.layers.fc(hidden, size=10, act='softmax'),
label=fluid.layers.data(name='label', shape=[1], dtype='int64'))
avg_loss = fluid.layers.mean(loss)
test_program = train_program.clone(for_test=True)
print_prog(test_program)
# Since the training and test parameters need to be shared, we use the training ``startup_program``
......@@ -182,7 +184,8 @@ Program是Paddle Fluid对于计算图的一种静态描述,使用Program的构
for key, value in sorted(six.iteritems(op.all_attrs())):
if key not in ['op_callstack', 'op_role_var']:
print(" [ attrs: {}: {} ]".format(key, value))
def network():
img = fluid.layers.data(name='image', shape=[784])
hidden = fluid.layers.fc(input=img, size=200, act='relu')
hidden = fluid.layers.dropout(hidden, dropout_prob=0.5)
......@@ -192,19 +195,19 @@ Program是Paddle Fluid对于计算图的一种静态描述,使用Program的构
avg_loss = fluid.layers.mean(loss)
return avg_loss
train_program_2 = fluid.Program()
startup_program_2 = fluid.Program()
test_program_2 = fluid.Program()
with fluid.program_guard(train_program_2, startup_program_2):
    with fluid.unique_name.guard():
        avg_loss = network()
        sgd = fluid.optimizer.SGD(learning_rate=1e-3)
        sgd.minimize(avg_loss)

# do not use a separate startup program for the test phase
with fluid.program_guard(test_program_2, startup_program_2):
    with fluid.unique_name.guard():
        avg_loss = network()
print_prog(test_program_2)
The two code snippets above construct and print identical Programs.
......@@ -268,24 +271,7 @@ Program是Paddle Fluid对于计算图的一种静态描述,使用Program的构
.. py:attribute:: random_seed
**Note: it must be set before the related OPs are added.**
The default random seed of the random operators in the Program. 0 means the random seed is generated randomly.
......@@ -301,12 +287,16 @@ Program是Paddle Fluid对于计算图的一种静态描述,使用Program的构
prog = fluid.default_main_program()
random_seed = prog.random_seed
x_var = fluid.layers.data(name="X", shape=[3,3], dtype="float32", append_batch_size=False)
print(random_seed)
## 0
## the default random seed is 0

# Here we must set random_seed before fluid.layers.dropout
prog.random_seed = 1
z_var = fluid.layers.dropout(x_var, 0.7)

print(prog.random_seed)
## 1
## the random seed has been changed to 1
......
......@@ -5,8 +5,11 @@ WeightNormParamAttr
**Note: this API only supports the static graph mode.**
.. py:class:: paddle.fluid.WeightNormParamAttr(dim=None, name=None, initializer=None, learning_rate=1.0, regularizer=None, trainable=True, do_model_average=False)
.. note::
    The ``gradient_clip`` attribute of this class will be deprecated in version 2.0. It is recommended to use ``minimize(loss, grad_clip=clip)`` for gradient clipping instead. Three clipping strategies are available: :ref:`cn_api_fluid_clip_GradientClipByGlobalNorm`,
    :ref:`cn_api_fluid_clip_GradientClipByNorm` and :ref:`cn_api_fluid_clip_GradientClipByValue`.
This class defines the parameters of Weight Normalization. Weight normalization decouples the length of a weight vector in a neural network from its direction; see the paper `Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks <https://arxiv.org/pdf/1602.07868.pdf>`_ for the detailed definition and implementation.
......@@ -15,9 +18,10 @@ WeightNormParamAttr
    - **name** (None|str) - used by developers when printing debugging information; see :ref:`api_guide_Name` for usage. Default: None.
    - **initializer** (Initializer) - the parameter initialization method, e.g. ``initializer = fluid.initializer.ConstantInitializer(1.0)``. Default: None, meaning the default `Xavier()` initializer is used.
    - **learning_rate** (float32) - the learning rate; the effective rate during optimization is :math:`global\_lr∗parameter\_lr∗scheduler\_factor`. Default: 1.0.
    - **regularizer** (WeightDecayRegularizer, optional) - the regularization method. Two strategies are supported: :ref:`cn_api_fluid_regularizer_L1Decay` and
      :ref:`cn_api_fluid_regularizer_L2Decay`. If regularization is also set in the ``optimizer`` (for example :ref:`cn_api_fluid_optimizer_SGDOptimizer`), the regularization in the ``optimizer`` is ignored. Default: None, meaning no regularization.
    - **trainable** (bool, optional) - whether the parameter is trainable. Default: True.
    - **do_model_average** (bool, optional) - whether the parameter needs model averaging. Default: False.
......@@ -36,7 +40,6 @@ WeightNormParamAttr
learning_rate=1.0,
regularizer=fluid.regularizer.L2DecayRegularizer(regularization_coeff=0.1),
trainable=True,
do_model_average=False))
......
......@@ -26,7 +26,7 @@ gradients
import paddle.fluid as fluid
x = fluid.data(name='x', shape=[None,2,8,8], dtype='float32')
x.stop_gradient=False
y = fluid.layers.conv2d(x, 4, 1, bias_attr=False)
y = fluid.layers.relu(y)
......
......@@ -23,7 +23,7 @@ program_guard
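A complete sketch of the typical usage (the shapes are illustrative; ``fluid.gradients`` returns the gradients of the targets with respect to the inputs):

.. code-block:: python

    import paddle.fluid as fluid

    x = fluid.data(name='x', shape=[None, 2, 8, 8], dtype='float32')
    x.stop_gradient = False
    y = fluid.layers.conv2d(x, 4, 1, bias_attr=False)
    y = fluid.layers.relu(y)
    # z is a list holding the gradient of y with respect to x (x@GRAD)
    z = fluid.gradients([y], x)
    print(z)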
main_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(main_program, startup_program):
data = fluid.data(name='image', shape=[None, 784, 784], dtype='float32')
hidden = fluid.layers.fc(input=data, size=10, act='relu')
For example, when the network being built does not need the startup_program to initialize its variables, a temporary Program can be passed in.
......@@ -36,5 +36,5 @@ program_guard
main_program = fluid.Program()
# if you don't need to care about the startup program, just pass in a temporary one
with fluid.program_guard(main_program, fluid.Program()):
data = fluid.data(name='image', shape=[None, 784, 784], dtype='float32')
=======================
paddle.framework
=======================
.. toctree::
:maxdepth: 1
framework_cn/get_default_dtype.rst
framework_cn/manual_seed.rst
framework_cn/set_default_dtype.rst
......@@ -7,6 +7,7 @@ API Reference
../api_guides/index_cn.rst
fluid_cn.rst
api_tree_cn.rst
backward_cn.rst
clip_cn.rst
dataset_cn.rst
......
......@@ -138,7 +138,10 @@ DataLoader当前仅支持 ``map-style`` 的数据集(可通过下标索引样本
# -------------------------------------------------------
.. py:method:: from_generator(feed_list=None, capacity=None, use_double_buffer=True, iterable=True, return_list=False, use_multiprocess=False, drop_last=True)
.. note::
    The framework guarantees that the data loading order of the DataLoader is consistent with the order in which the user-provided data source is read.
Creates a DataLoader object for loading data produced by a Python generator. The data is read ahead by a Python thread and pushed into a queue asynchronously.
......@@ -158,12 +161,13 @@ DataLoader当前仅支持 ``map-style`` 的数据集(可通过下标索引样本
    - **iterable** (bool) - whether the created DataLoader object is iterable.
    - **return_list** (bool) - whether the data on each device is returned as a list. Valid only when iterable = True. If return_list = False, the data returned on each device is a str -> LoDTensor mapping, where the keys are the names of the input variables. If return_list = True, the data returned on each device is a list(LoDTensor). return_list = False is recommended in static graph mode, and return_list = True in dynamic graph mode.
    - **use_multiprocess** (bool) - whether to use multiprocessing to speed up data loading in dynamic graph mode. Note: this option only takes effect in dynamic graph mode; in static graph mode it has no effect either way. Default: False.
    - **drop_last** (bool) - whether to drop the last batch that cannot fill all CPU/GPU devices. Default: True. For training, users must not set drop_last=False, because every CPU/GPU device should read data from the DataLoader. For inference, users may set drop_last=False so that the last incomplete batch can still be predicted.

Returns: the created DataLoader object

Return type: loader (DataLoader)
**Code Example 1**
.. code-block:: python
......@@ -297,6 +301,50 @@ DataLoader当前仅支持 ``map-style`` 的数据集(可通过下标索引样本
assert relu.shape == [BATCH_SIZE, 784]
**Code Example 2**
.. code-block:: python
import paddle.fluid as fluid
import numpy as np
import os
# We use 2 CPU cores to run inference network
os.environ['CPU_NUM'] = '2'
# The data source has only 3 batches, which can not be
# divided evenly to each CPU core
def batch_generator():
for i in range(3):
yield np.array([i+1]).astype('float32'),
x = fluid.data(name='x', shape=[None], dtype='float32')
y = x * x
def run_inference(drop_last):
loader = fluid.io.DataLoader.from_generator(feed_list=[x],
capacity=8, drop_last=drop_last)
loader.set_batch_generator(batch_generator, fluid.cpu_places())
exe = fluid.Executor(fluid.CPUPlace())
prog = fluid.CompiledProgram(fluid.default_main_program())
prog = prog.with_data_parallel()
result = []
for data in loader():
each_ret, = exe.run(prog, feed=data, fetch_list=[y])
result.extend(each_ret)
return result
# Set drop_last to True, so that the last batch whose
# number is less than CPU core number would be discarded.
print(run_inference(drop_last=True)) # [1.0, 4.0]
# Set drop_last to False, so that the last batch whose
# number is less than CPU core number can be tested.
print(run_inference(drop_last=False)) # [1.0, 4.0, 9.0]
.. py:method:: from_dataset(dataset, places, drop_last=True)
Creates a DataLoader object for loading data produced by a Dataset. Currently, Dataset is only supported on Linux.
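A minimal sketch (assuming a Linux environment; the data file and slot setup below are illustrative):

.. code-block:: python

    import paddle.fluid as fluid

    image = fluid.data(name='image', shape=[None, 784], dtype='float32')
    label = fluid.data(name='label', shape=[None, 1], dtype='int64')

    dataset = fluid.DatasetFactory().create_dataset("QueueDataset")
    dataset.set_batch_size(32)
    dataset.set_filelist(['train_data.txt'])  # hypothetical data file
    dataset.set_use_var([image, label])
    dataset.set_pipe_command('cat')

    loader = fluid.io.DataLoader.from_dataset(dataset, fluid.cpu_places())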
......
......@@ -38,7 +38,7 @@ LSTMCell
.. code-block:: python
import paddle.fluid.layers as layers
cell = layers.LSTMCell(hidden_size=256)
.. py:method:: call(inputs, states)
......@@ -61,4 +61,4 @@ LSTMCell的 :code:`state_shape` 是一个具有两个形状的列表::math:`[[
Returns: the :code:`state_shape` of LSTMCell

Return type: list
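For instance (a small sketch; the hidden size is illustrative):

.. code-block:: python

    import paddle.fluid.layers as layers

    cell = layers.LSTMCell(hidden_size=256)
    print(cell.state_shape)  # [[256], [256]]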
......@@ -34,14 +34,13 @@ add_position_encoding
.. code-block:: python

    import paddle.fluid as fluid

    tensor = fluid.data(
        name='tensor',
        shape=[None, 64, 512],
        dtype='float32')
    position_tensor = fluid.layers.add_position_encoding(
        input=tensor, alpha=1.0, beta=1.0)
......@@ -53,4 +52,3 @@ add_position_encoding
......@@ -9,7 +9,7 @@ argsort
Parameters:
    - **input** (Variable) - the input multi-dimensional ``Tensor``; supported data types: float32, float64, int16, int32, int64, uint8.
    - **axis** (int, optional) - the axis along which the input Tensor is sorted. The valid range of ``axis`` is [-R, R), where R is the rank of the input ``x``; a negative ``axis`` is equivalent to ``axis`` + R. Default: 0.
    - **descending** (bool, optional) - the direction of the sort. If True, sort in descending order; if False or unset, sort in ascending order. Default: False.
    - **name** (str, optional) – see :ref:`api_guide_Name`; usually there is no need to set it. Default: None.
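A small usage sketch in dygraph mode (the input values are illustrative):

.. code-block:: python

    import numpy as np
    import paddle.fluid as fluid

    fluid.enable_dygraph()
    x = fluid.dygraph.to_variable(
        np.array([[5, 8, 5], [6, 1, 2]]).astype('float32'))
    out, indices = fluid.layers.argsort(input=x, axis=-1)
    # out.numpy():     [[5., 5., 8.], [1., 2., 6.]]
    # indices.numpy(): [[0, 2, 1], [1, 2, 0]]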
......
......@@ -26,7 +26,7 @@ create_global_var
import paddle.fluid as fluid
import paddle.fluid.layers as layers
var = layers.create_global_var(shape=[2,3], value=1.0, dtype='float32',
persistable=True, force_cpu=True, name='new_var')
......
......@@ -5,17 +5,18 @@ ctc_greedy_decoder
.. py:function:: paddle.fluid.layers.ctc_greedy_decoder(input, blank, input_length=None, name=None)
This OP decodes sequences with a greedy strategy, as follows:

1. Take the index of the maximum value in each row of the input, i.e. numpy.argmax(input, axis=0).
2. For each sequence in the result of step 1, merge repeated tokens between two blanks and remove all blanks.

This API supports two kinds of input, LoDTensor and Tensor; code examples for both are given below:
**Example**:
::
# for lod tensor input
Given:
input.data = [[0.6, 0.1, 0.3, 0.1],
......@@ -45,13 +46,38 @@ ctc_greedy_decoder
output.lod = [[2, 1]]
# for tensor input
input.data = [[[0.6, 0.1, 0.3, 0.1],
[0.3, 0.2, 0.4, 0.1],
[0.1, 0.5, 0.1, 0.3],
[0.5, 0.1, 0.3, 0.1]],
[[0.5, 0.1, 0.3, 0.1],
[0.2, 0.2, 0.2, 0.4],
[0.2, 0.2, 0.1, 0.5],
[0.5, 0.1, 0.3, 0.1]]]
input_length.data = [[4], [4]]
input.shape = [2, 4, 4]
step1: Apply argmax to first input sequence which is input.data[0:4]. Then we get:
[[0], [2], [1], [0]], for input.data[4:8] is [[0], [3], [3], [0]], shape is [2,4,1]
step2: Change the argmax result to use padding mode, then argmax result is
[[0, 2, 1, 0], [0, 3, 3, 0]], shape is [2, 4], lod is [], input_length is [[4], [4]]
step3: Apply ctc_align to padding argmax result, padding_value is 0
Finally:
output.data = [[2, 1, 0, 0],
[3, 0, 0, 0]]
output_length.data = [[2], [1]]
Parameters:
    - **input** (Variable) — the probabilities of variable-length sequences. For LoDTensor input, it is a 2-D LoDTensor with LoD information, of shape [Lp, num_classes + 1], where Lp is the sum of the lengths of all input sequences and num_classes is the number of true classes (excluding the blank label). For Tensor input, it is a padded 3-D Tensor of shape [batch_size, N, num_classes + 1]. The data type can be float32 or float64.
    - **blank** (int) — the blank label index for the Connectionist Temporal Classification (CTC) loss, a value in the half-open interval [0, num_classes + 1).
    - **input_length** (Variable, optional) — required for padded Tensor input: a 2-D Tensor of shape [batch_size, 1], data type int64, giving the valid length of each input sequence. Default: None.
    - **name** (str, optional) — used by developers when printing debugging information; see :ref:`api_guide_Name`. Default: None.
Returns: for LoDTensor input, the result of the CTC greedy decoder, a 2-D LoDTensor of shape [Lp, 1] with data type int64, where "Lp" is the sum of the lengths of all output sequences; if all decoded sequences are empty, the result LoDTensor is [-1] with LoD [[]]. For Tensor input, a tuple (output, output_length), where output is a Tensor of shape [batch_size, N] and type int64, and output_length is a Tensor of shape [batch_size, 1] and type int64 holding the length of each output sequence.

Return type: Variable
......@@ -60,9 +86,15 @@ ctc_greedy_decoder
.. code-block:: python

    # for lod mode
    import paddle.fluid as fluid
    x = fluid.data(name='x', shape=[None, 8], dtype='float32', lod_level=1)
    cost = fluid.layers.ctc_greedy_decoder(input=x, blank=0)

    # for padding mode
    x_pad = fluid.data(name='x_pad', shape=[10, 4, 8], dtype='float32')
    x_pad_len = fluid.data(name='x_pad_len', shape=[10, 1], dtype='int64')
    out, out_len = fluid.layers.ctc_greedy_decoder(input=x_pad, blank=0,
                                                   input_length=x_pad_len)
......
.. _cn_api_fluid_layers_inplace_abn:
inplace_abn
-------------------------------
**Note: this API only supports the static graph mode.**
.. py:function:: paddle.fluid.layers.inplace_abn(input, act=None, is_test=False, momentum=0.9, epsilon=1e-05, param_attr=None, bias_attr=None, data_layout='NCHW', name=None, moving_mean_name=None, moving_variance_name=None, do_model_average_for_mean_and_var=False, use_global_stats=False, act_alpha=1.0)
In-place Activated Batch Normalization Layer.

This layer saves memory by computing batch normalization and the activation in place. For the batch normalization computation, see ``fluid.layers.batch_norm``; for the in-place activated batch normalization computation, see `In-Place Activated BatchNorm for Memory-Optimized Training of DNNs <https://arxiv.org/abs/1712.02616>`_.
Parameters:
    - **input** (Variable) - the input feature of the inplace_abn operator, a Variable with 2, 3, 4 or 5 dimensions. Data types: float16, float32, float64.
    - **act** (string) - the activation type, e.g. leaky_relu, relu, prelu. Default: None.
    - **is_test** (bool) - whether it is in the test phase; outside training, the global mean and global variance accumulated during training are used. Default: False.
    - **momentum** (float|Variable) - the value used to compute moving_mean and moving_var; a float or a Variable of shape [1] with data type float32. The update rules are :math:`moving\_mean = moving\_mean * momentum + new\_mean * (1. - momentum)` and :math:`moving\_var = moving\_var * momentum + new\_var * (1. - momentum)`. Default: 0.9.
    - **epsilon** (float) - the value added to the denominator for numerical stability. Default: 1e-5.
    - **param_attr** (ParamAttr|None) - the attribute object of the weight parameter. Default: None, meaning the default weight attribute is used; see :ref:`cn_api_fluid_ParamAttr`. The default weight initialization of the inplace_abn operator is 1.0.
    - **bias_attr** (ParamAttr|None) - the attribute object of the bias parameter. Default: None, meaning the default bias attribute is used; see :ref:`cn_api_fluid_ParamAttr`. The default bias initialization of the inplace_abn operator is 0.0.
    - **data_layout** (string) - the data format of the input; the output format matches the input. Either "NCHW" or "NHWC", where N is the batch size, C the number of channels, H the feature height and W the feature width. Default: "NCHW".
    - **name** (str|None) – see :ref:`cn_api_guide_Name`; usually there is no need to set it. Default: None.
    - **moving_mean_name** (string) - the name of moving_mean, which stores the global mean. If None, ``inplace_abn`` names the global mean randomly; otherwise it names it ``moving_mean_name``. Default: None.
    - **moving_variance_name** (string) - the name of moving_variance, which stores the global variance. If None, ``inplace_abn`` names the global variance randomly; otherwise it names it ``moving_variance_name``. Default: None.
    - **do_model_average_for_mean_and_var** (bool) - whether to apply model averaging to the mean and variance. Default: False.
    - **use_global_stats** (bool) – whether to use the global mean and variance. In prediction or test mode, setting use_global_stats to True or is_test to True is equivalent. In training mode, setting use_global_stats to True also uses the global mean and variance during training. Default: False.
    - **act_alpha** (float) – when ``act`` is None, leaky-relu or elu, the in-place activated batch normalization algorithm is used, and this parameter supplies the ``alpha`` value of leaky-relu or elu. Default: 1.0.

Returns: a Tensor with the same shape as the input, holding the result of batch normalization applied to the input.

Return type: Variable

**Code Example**:
.. code-block:: python
import paddle.fluid as fluid
x = fluid.data(name='x', shape=[3, 7, 3, 7], dtype='float32')
hidden1 = fluid.layers.fc(input=x, size=200, param_attr='fc1.w')
hidden2 = fluid.layers.inplace_abn(input=hidden1)
hidden3 = fluid.layers.inplace_abn(input=hidden2, act='leaky_relu', act_alpha=0.2)
......@@ -57,7 +57,7 @@ lstm
Returns: a tuple of the three Tensors output by the lstm computation:

    - rnn_out: the Tensor holding the LSTM hidden output, with the same data type as input and shape :math:`[batch\_size, seq\_len, hidden\_size]`; if ``is_bidirec`` is True, the shape is :math:`[batch\_size, seq\_len, hidden\_size*2]`
    - last_h: the Tensor holding the hidden state of the last LSTM step, with the same data type as input and shape :math:`[num\_layers, batch\_size, hidden\_size]`; if ``is_bidirec`` is True, the shape is :math:`[num\_layers*2, batch\_size, hidden\_size]`
    - last_c: the Tensor holding the cell state of the last LSTM step, with the same data type as input and shape :math:`[num\_layers, batch\_size, hidden\_size]`; if ``is_bidirec`` is True, the shape is :math:`[num\_layers*2, batch\_size, hidden\_size]`
......@@ -73,12 +73,11 @@ lstm
emb_dim = 256
vocab_size = 10000
data = fluid.layers.data(name='x', shape=[-1, 100, 1],
dtype='int64')
emb = fluid.layers.embedding(input=data, size=[vocab_size, emb_dim], is_sparse=True)
batch_size = 20
max_len = 100
dropout_prob = 0.2
seq_len = 100
hidden_size = 150
num_layers = 1
init_h = layers.fill_constant( [num_layers, batch_size, hidden_size], 'float32', 0.0 )
......@@ -87,7 +86,7 @@ lstm
rnn_out, last_h, last_c = layers.lstm(emb, init_h, init_c, max_len, hidden_size, num_layers, dropout_prob=dropout_prob)
rnn_out.shape # (-1, 100, 150)
last_h.shape # (1, 20, 150)
last_c.shape  # (1, 20, 150)
......
......@@ -3,7 +3,7 @@
noam_decay
-------------------------------
.. py:function:: paddle.fluid.layers.noam_decay(d_model, warmup_steps, learning_rate=1.0)
The Noam learning rate decay method.
......@@ -14,11 +14,12 @@ noam衰减的numpy实现如下:
import paddle.fluid as fluid
import numpy as np

# set hyper parameters
base_lr = 0.01
d_model = 2
current_steps = 20
warmup_steps = 200

# compute
lr_value = base_lr * np.power(d_model, -0.5) * np.min([
        np.power(current_steps, -0.5),
        np.power(warmup_steps, -1.5) * current_steps])
......@@ -27,6 +28,7 @@ noam衰减的numpy实现如下:
Parameters:
    - **d_model** (Variable|int) - the feature dimension of the model's input and output vectors; can be a scalar Tensor or an int value.
    - **warmup_steps** (Variable|int) - the number of warmup steps; can be a scalar Tensor or an int value.
    - **learning_rate** (Variable|float|int, optional) - the initial learning rate. If it is a Variable, it is a Tensor of shape [1] with data type float32 or float64; it can also be a Python int. Default: 1.0.

Returns: the decayed learning rate
......@@ -41,7 +43,8 @@ noam衰减的numpy实现如下:
learning_rate = 0.01
lr = fluid.layers.learning_rate_scheduler.noam_decay(
1/(warmup_steps *(learning_rate ** 2)),
        warmup_steps,
        learning_rate)
......
......@@ -19,36 +19,34 @@ pad2d
Return type: Variable
**Example**
.. code-block:: text
    Suppose the input image is:

    Input = [[[[1., 2., 3.],
               [4., 5., 6.]]]]

    Case 0:
        paddings = [0, 1, 2, 3],
        mode = 'constant'
        pad_value = 0
        Out = [[[[0., 0., 1., 2., 3., 0., 0., 0.],
                 [0., 0., 4., 5., 6., 0., 0., 0.],
                 [0., 0., 0., 0., 0., 0., 0., 0.]]]]

    Case 1:
        paddings = [0, 1, 2, 1],
        mode = 'reflect'
        Out = [[[[3., 2., 1., 2., 3., 2.],
                 [6., 5., 4., 5., 6., 5.],
                 [3., 2., 1., 2., 3., 2.]]]]

    Case 2:
        paddings = [0, 1, 2, 1],
        mode = 'edge'
        Out = [[[[1., 1., 1., 2., 3., 3.],
                 [4., 4., 4., 5., 6., 6.],
                 [4., 4., 4., 5., 6., 6.]]]]
......@@ -56,8 +54,6 @@ pad2d
.. code-block:: python

    import paddle.fluid as fluid

    data = fluid.data(name='data', shape=[None, 3, 32, 32], dtype='float32')
    result = fluid.layers.pad2d(input=data, paddings=[0, 1, 2, 3], mode='reflect')
......@@ -8,23 +8,21 @@ pad
This OP pads the Tensor with a constant value given by ``pad_value``; the padding widths are specified by ``paddings``. The number of values padded before the contents of ``x`` in dimension ``i`` is given by ``paddings[2*i]``, and the number padded after by ``paddings[2*i+1]``.
**Example**:

.. code-block:: text

    Given:
        x = [[1, 2], [3, 4]]

        paddings = [0, 1, 1, 2]

        pad_value = 0

    Return:
        out = [[0, 1, 2, 0, 0]
               [0, 3, 4, 0, 0]
               [0, 0, 0, 0, 0]]
Parameters:
......@@ -44,15 +42,7 @@ pad
# x is a rank-2 Tensor
import paddle.fluid as fluid
x = fluid.data(name='data', shape=[300, 300], dtype='float32')
out = fluid.layers.pad(x=x, paddings=[0, 1, 1, 2], pad_value=0.)
......@@ -7,9 +7,9 @@ pad_constant_like
This OP pads ``y`` with ``pad_value``; the amount of padding in each dimension is determined by the shapes of x and y: ((0, x.shape[0] - y.shape[0]), ..., (0, x.shape[i] - y.shape[i]), ..., (0, x.shape[n] - y.shape[n])) gives the padding width per dimension. For dimension i, the padding width ``(0, x.shape[i] - y.shape[i])`` means nothing is padded at the beginning of y's i-th dimension and ``x.shape[i] - y.shape[i]`` positions are padded at the end. This OP requires that y has the same rank as x and that ``y.shape[i] <= x.shape[i]`` for every dimension i.
**Example**:

.. code-block:: text
Given:
X = [[[[ 0, 1, 2],
......@@ -24,30 +24,34 @@ pad_constant_like
[27, 28, 29]],
[[30, 31, 32],
[33, 34, 35]]]]
X.shape = (2, 3, 2, 3)
Y = [[[[35, 36, 37]],
[[38, 39, 40]],
[[41, 42, 43]]]]
Y.shape = (1, 3, 1, 3)
    And
        pad_value = 0.

    Return:
        Out = [[[[35, 36, 37],
                 [ 0,  0,  0]],
                [[38, 39, 40],
                 [ 0,  0,  0]],
                [[41, 42, 43],
                 [ 0,  0,  0]]],
               [[[ 0,  0,  0],
                 [ 0,  0,  0]],
                [[ 0,  0,  0],
                 [ 0,  0,  0]],
                [[ 0,  0,  0],
                 [ 0,  0,  0]]]]

        Out.shape = [2, 3, 2, 3]
Parameters:
    - **x** (Variable) - a multi-dimensional Tensor
......@@ -66,8 +70,8 @@ pad_constant_like
# x is a rank-4 tensor, x.shape = (2, 3, 2, 3)
# y is a rank-4 tensor, y.shape = (1, 3, 1, 3)
import paddle.fluid as fluid
x = fluid.data(name='x', shape=[2,3,2,3], dtype='float32')
y = fluid.data(name='y', shape=[1,3,1,3], dtype='float32')
out = fluid.layers.pad_constant_like(x=x, y=y, pad_value=0.)
# out is a rank-4 tensor, out.shape = [2, 3, 2, 3]
......
......@@ -54,10 +54,10 @@ reshape
# example 1:
# attr shape is a list which doesn't contain tensor Variable.
data_1 = fluid.data(
    name='data_1', shape=[2, 4, 6], dtype='float32')
reshaped_1 = fluid.layers.reshape(
    x=data_1, shape=[-1, 0, 3, 2], inplace=True)
# the shape of reshaped_1 is [2,4,3,2].
# example 2:
......@@ -69,7 +69,7 @@ reshape
# example 3:
data_3 = fluid.data(
name="data_3", shape=[2,4,6], dtype='float32')
name="data_3", shape=[2,4,6], dtype='float32')
reshaped_3 = fluid.layers.reshape(x=data_3, shape=[6,8])
# the shape of reshaped_3 is [6,8].
......
......@@ -34,27 +34,27 @@ retinanet_detection_output
import paddle.fluid as fluid
bboxes_low = fluid.data(
    name='bboxes_low', shape=[1, 44, 4], dtype='float32')
bboxes_high = fluid.data(
    name='bboxes_high', shape=[1, 11, 4], dtype='float32')
scores_low = fluid.data(
    name='scores_low', shape=[1, 44, 10], dtype='float32')
scores_high = fluid.data(
    name='scores_high', shape=[1, 11, 10], dtype='float32')
anchors_low = fluid.data(
    name='anchors_low', shape=[44, 4], dtype='float32')
anchors_high = fluid.data(
    name='anchors_high', shape=[11, 4], dtype='float32')
im_info = fluid.data(
    name="im_info", shape=[1, 3], dtype='float32')
nmsed_outs = fluid.layers.retinanet_detection_output(
    bboxes=[bboxes_low, bboxes_high],
    scores=[scores_low, scores_high],
    anchors=[anchors_low, anchors_high],
    im_info=im_info,
    score_threshold=0.05,
    nms_top_k=1000,
    keep_top_k=100,
    nms_threshold=0.45,
    nms_eta=1.0)
......@@ -50,7 +50,6 @@ retinanet_target_assign
.. code-block:: python
import paddle.fluid as fluid
bbox_pred = fluid.data(name='bbox_pred', shape=[1, 100, 4],
dtype='float32')
......
......@@ -3,7 +3,7 @@
sigmoid_focal_loss
-------------------------------
.. py:function:: paddle.fluid.layers.sigmoid_focal_loss(x, label, fg_num, gamma=2, alpha=0.25)
.. py:function:: paddle.fluid.layers.sigmoid_focal_loss(x, label, fg_num, gamma=2.0, alpha=0.25)
`Focal Loss <https://arxiv.org/abs/1708.02002>`_ 被提出用于解决计算机视觉任务中前景-背景不平衡的问题。该OP先计算输入x中每个元素的sigmoid值,然后计算sigmoid值与类别目标值label之间的Focal Loss。
`Focal Loss <https://arxiv.org/abs/1708.02002>`_ was proposed to address the foreground-background class imbalance in computer vision tasks. This OP first computes the sigmoid of every element of the input x, and then computes the Focal Loss between the sigmoid values and the class target label.
loss = fluid.layers.sigmoid_focal_loss(x=input,
label=label,
fg_num=fg_num,
gamma=2.0,
alpha=0.25)
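A self-contained variant of the example (shapes and names are illustrative):

.. code-block:: python

    import paddle.fluid as fluid

    num_classes = 10  # number of foreground classes, excluding background
    input = fluid.data(name='data', shape=[None, num_classes], dtype='float32')
    label = fluid.data(name='label', shape=[None, 1], dtype='int32')
    fg_num = fluid.data(name='fg_num', shape=[1], dtype='int32')
    loss = fluid.layers.sigmoid_focal_loss(x=input,
                                           label=label,
                                           fg_num=fg_num,
                                           gamma=2.0,
                                           alpha=0.25)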
......@@ -43,31 +43,12 @@ scaled_dot_product_attention
.. code-block:: python

    import paddle.fluid as fluid

    queries = fluid.data(name="queries", shape=[3, 5, 9], dtype="float32")
    keys = fluid.data(name="keys", shape=[3, 6, 9], dtype="float32")
    values = fluid.data(name="values", shape=[3, 6, 10], dtype="float32")
    contexts = fluid.nets.scaled_dot_product_attention(queries, keys, values)
    contexts.shape  # [3, 5, 10]
=======================
paddle.nn
=======================
.. toctree::
:maxdepth: 1
nn_cn/Conv1D.rst
nn_cn/Conv2D.rst
nn_cn/diag_embed.rst
nn_cn/interpolate.rst
nn_cn/Linear.rst
nn_cn/log_softmax.rst
nn_cn/ReLU.rst
nn_cn/Upsample.rst
nn_cn/activation_cn.rst
nn_cn/loss_cn.rst
=======================
activation
=======================
.. toctree::
:maxdepth: 1
activation_cn/Sigmoid.rst