Merge branch 'develop' of https://github.com/PaddlePaddle/FluidDoc into develop

09ab2844 · sandyhouse · fc7da52f · 8f63db58
75 changed files
--- a/doc/fluid/advanced_guide/addon_development/contribute_code/local_dev_guide.md
+++ b/doc/fluid/advanced_guide/addon_development/contribute_code/local_dev_guide.md
@@ -9,7 +9,24 @@
 - 通过所有单元测试。
 - 请遵守[提交代码的一些约定](#提交代码的一些约定)。
-以下教程将指导您提交代码。
+## 使用官方开发镜像（推荐）
+```
+# 第一次启动（CPU开发）
+docker run -it --cpu-shares=20000 --name=username --net=host --privileged --rm -v $(pwd):/Paddle hub.baidubce.com/paddlepaddle/paddle:latest-dev /bin/bash
+# 第一次启动（GPU开发）
+nvidia-docker run -it --cpu-shares=20000 --name=username --net=host --privileged --rm -v $(pwd):/Paddle hub.baidubce.com/paddlepaddle/paddle:latest-dev /bin/bash
+# 后面几次启动
+docker exec -it username bash
+```
+不同开发者启动docker的命令不一样，以上只是推荐命令。如果使用自己习惯的命令，一定要加参数--privileged（GPU的CUPTI库调用需要）
+**推荐使用官方开发镜像 hub.baidubce.com/paddlepaddle/paddle:latest-dev 提交代码。**
+**以下教程将指导您提交代码。**
 ## [Fork](https://help.github.com/articles/fork-a-repo/)
 跳转到[PaddlePaddle](https://github.com/PaddlePaddle/Paddle) GitHub首页，然后单击 `Fork` 按钮，生成自己目录下的仓库，比如 <https://github.com/USERNAME/Paddle>。
@@ -42,7 +59,7 @@ Paddle 目前使用[Git流分支模型](http://nvie.com/posts/a-successful-git-b
 Paddle 开发人员使用 [pre-commit](http://pre-commit.com/) 工具来管理 Git 预提交钩子。 它可以帮助我们格式化源代码（C++，Python），在提交（commit）前自动检查一些基本事宜（如每个文件只有一个 EOL，Git 中不要添加大文件等）。
-`pre-commit`测试是 Travis-CI 中单元测试的一部分，不满足钩子的 PR 不能被提交到 Paddle，首先安装并在当前目录运行它：
+`pre-commit`测试是 CI 中单元测试的一部分，不满足钩子的 PR 不能被提交到 Paddle，首先安装并在当前目录运行它：
 ```bash
 ➜  pip install pre-commit
@@ -51,7 +68,7 @@ Paddle 开发人员使用 [pre-commit](http://pre-commit.com/) 工具来管理 G
 Paddle 使用 `clang-format` 来调整 C/C++ 源代码格式，请确保 `clang-format` 版本在 3.8 以上。
-注：通过`pip install pre-commit`和`conda install -c conda-forge pre-commit`安装的`yapf`稍有不同的，Paddle 开发人员使用的是`pip install pre-commit`。
+注：通过`pip install pre-commit`和`conda install -c conda-forge pre-commit`安装的`yapf`稍有不同的，Paddle 开发人员使用的是`pip install pre-commit`，使用Paddle docker镜像会自带`pre-commit`不需要单独安装。
 ## 开始开发
@@ -66,19 +83,53 @@ Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)
-	modified:   README.md
+    modified:   README.md
 Untracked files:
  (use "git add <file>..." to include in what will be committed)
-	test
+    test
 no changes added to commit (use "git add" and/or "git commit -a")
 ```
-## 编译和单元测试
+## 编译
+创建并进入/Paddle/build路径下：
+    mkdir -p /Paddle/build && cd /Paddle/build
+执行cmake：
+    * 对于需要编译**CPU版本PaddlePaddle**的用户：
+    For Python2: cmake .. -DWITH_GPU=OFF -DWITH_TESTING=OFF -DCMAKE_BUILD_TYPE=Release
+    For Python3: cmake .. -DPY_VERSION=3.5 -DWITH_GPU=OFF -DWITH_TESTING=OFF -DCMAKE_BUILD_TYPE=Release
+    * 对于需要编译**GPU版本PaddlePaddle**的用户：
+    For Python2: cmake .. -DWITH_GPU=ON -DWITH_TESTING=OFF -DCMAKE_BUILD_TYPE=Release
+    For Python3: cmake .. -DPY_VERSION=3.5 -DWITH_GPU=ON -DWITH_TESTING=OFF -DCMAKE_BUILD_TYPE=Release
+执行编译：
+    make -j$(nproc)
+    如：make -j16，使用16核编译
+安装编译好的whl包：首先进入/Paddle/build/python/dist目录，找到生成的.whl包，然后在当前机器或目标机器上安装该.whl包：
+    For Python2: pip install -U（whl包的名字）
+    For Python3: pip3.5 install -U（whl包的名字）
 关于编译 PaddlePaddle 的源码，请参见[从源码编译](../../../install/compile/fromsource.html) 选择对应的操作系统。
+## 单元测试
+    单测运行（重复运行多次，避免随机失败）如重复运行100次的命令如下:
+    ctest --repeat-until-fail 100 -R test_xx
 关于单元测试，可参考[Op单元测试](../new_op/new_op.html#id7) 的运行方法。
 ## 提交（commit）
@@ -92,7 +143,7 @@ On branch test
 Untracked files:
  (use "git add <file>..." to include in what will be committed)
-	test
+    test
 nothing added to commit but untracked files present (use "git add" to track)
 ➜  git add test
@@ -126,8 +177,8 @@ clang-formater.......................................(no files to check)Skipped
 ➜  git remote
 origin
 ➜  git remote -v
-origin	https://github.com/USERNAME/Paddle (fetch)
+origin    https://github.com/USERNAME/Paddle (fetch)
-origin	https://github.com/USERNAME/Paddle (push)
+origin    https://github.com/USERNAME/Paddle (push)
 ```
 这里 origin 是我们 clone 的远程仓库的名字，也就是自己用户名下的 Paddle，接下来我们创建一个原始 Paddle 仓库的远程主机，命名为 upstream。

--- a/doc/fluid/advanced_guide/addon_development/contribute_code/local_dev_guide_en.md
+++ b/doc/fluid/advanced_guide/addon_development/contribute_code/local_dev_guide_en.md
@@ -9,7 +9,22 @@ You will learn how to develop programs in local environment under the guidelines
 - Pass through all unit tests.
 - Please follow [regulations of submitting codes](#regulations of submitting codes).
-The following guidiance tells you how to submit code.
+## Use the official development image (recommended)
+```
+# First start (CPU development)
+docker run -it --cpu-shares=20000 --name=username --net=host --privileged --rm -v $(pwd):/Paddle hub.baidubce.com/paddlepaddle/paddle:latest-dev /bin/bash
+# First start (GPU development)
+nvidia-docker run -it --cpu-shares=20000 --name=username --net=host --privileged --rm -v $(pwd):/Paddle hub.baidubce.com/paddlepaddle/paddle:latest-dev /bin/bash
+# Subsequent starts
+docker exec -it username bash
+```
+Different developers may start Docker with different commands; the commands above are only a recommendation. If you use your own command, be sure to add the `--privileged` flag (required for GPU CUPTI library calls).
+**We recommend using the official development image hub.baidubce.com/paddlepaddle/paddle:latest-dev when submitting code.**
+**The following guide tells you how to submit code.**
 ## [Fork](https://help.github.com/articles/fork-a-repo/)
 Transfer to the home page of Github [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) ,and then click button `Fork`  to generate the git under your own file directory,such as <https://github.com/USERNAME/Paddle>。
@@ -44,7 +59,7 @@ It is worth noting that before the checkout, you need to keep the current branch
 Paddle developers use the [pre-commit](http://pre-commit.com/) tool to manage Git pre-commit hooks. It helps us format the source code (C++, Python) and automatically check some basic things before committing (such as having only one EOL per file, not adding large files in Git, etc.).
-The `pre-commit` test is part of the unit test in Travis-CI. A PR that does not satisfy the hook cannot be submitted to Paddle. Install `pre-commit` first and then run it in current directory：
+The `pre-commit` check is part of the unit tests in CI. A PR that does not satisfy the hooks cannot be submitted to Paddle. Install `pre-commit` first and then run it in the current directory:
 ```bash
@@ -54,7 +69,7 @@ The `pre-commit` test is part of the unit test in Travis-CI. A PR that does not
 Paddle modify the format of C/C++ source code with `clang-format` .Make sure the version of `clang-format` is above 3.8.
-Note：There are differences between the installation of `yapf` with `pip install pre-commit` and that with `conda install -c conda-forge pre-commit` . Paddle developers use `pip install pre-commit` 。
+Note: The `yapf` installed by `pip install pre-commit` differs slightly from the one installed by `conda install -c conda-forge pre-commit`. Paddle developers use `pip install pre-commit`. The Paddle Docker image ships with `pre-commit`, so no separate installation is needed there.
 ## Start development
@@ -76,7 +91,45 @@ Untracked files:
 no changes added to commit (use "git add" and/or "git commit -a")
 ```
-## Build and test
+## Build
+Create and enter the /Paddle/build directory:
+    mkdir -p /Paddle/build && cd /Paddle/build
+Execute cmake:
+    * For users who need to compile the **CPU version PaddlePaddle**:
+    For Python2: cmake .. -DWITH_GPU=OFF -DWITH_TESTING=OFF -DCMAKE_BUILD_TYPE=Release
+    For Python3: cmake .. -DPY_VERSION=3.5 -DWITH_GPU=OFF -DWITH_TESTING=OFF -DCMAKE_BUILD_TYPE=Release
+    * For users who need to compile the **GPU version PaddlePaddle**:
+    For Python2: cmake .. -DWITH_GPU=ON -DWITH_TESTING=OFF -DCMAKE_BUILD_TYPE=Release
+    For Python3: cmake .. -DPY_VERSION=3.5 -DWITH_GPU=ON -DWITH_TESTING=OFF -DCMAKE_BUILD_TYPE=Release
+Execute compilation:
+    make -j$(nproc)
+    For example, make -j16 compiles using 16 cores
+After compiling successfully, go to the `/Paddle/build/python/dist` directory and find the generated `.whl` package. Install it on the current machine or the target machine:
+    For Python2: pip install -U (whl package name)
+    For Python3: pip3.5 install -U (whl package name)
+Please refer to [Compile From Source Code](../../../install/compile/fromsource_en.html) for more information about building PaddlePaddle from source.
+## Test
+    Run a unit test repeatedly to rule out random failures, for example 100 times:
+    ctest --repeat-until-fail 100 -R test_xx
 Please refer to [Compile From Source Code](../../../install/compile/fromsource_en.html) about more information of building PaddlePaddle source codes.
 Please refer to [Op Unit Tests](../new_op/new_op_en.html#unit-tests) about more information of running unit tests.
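
The four cmake invocations in the Build section differ only in the GPU switch and the Python version. A minimal sketch of that option matrix follows. `cmake_command` is a hypothetical helper written for illustration, not part of Paddle, and it uses only the flags quoted in the guide.

```python
def cmake_command(with_gpu, python3=False):
    """Assemble the cmake line from the guide for one CPU/GPU and Python 2/3 choice.

    Hypothetical helper for illustration only; it just concatenates the
    flags that the guide lists.
    """
    args = ["cmake", ".."]
    if python3:
        # The guide pins Python 3 builds to 3.5 through PY_VERSION.
        args.append("-DPY_VERSION=3.5")
    args.append("-DWITH_GPU=" + ("ON" if with_gpu else "OFF"))
    args += ["-DWITH_TESTING=OFF", "-DCMAKE_BUILD_TYPE=Release"]
    return " ".join(args)


if __name__ == "__main__":
    # Print all four combinations listed in the guide.
    for gpu in (False, True):
        for py3 in (False, True):
            print(cmake_command(gpu, py3))
```

Run the printed line from inside `/Paddle/build`, as the guide does.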

--- a/doc/fluid/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.rst
+++ b/doc/fluid/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.rst
@@ -7,15 +7,15 @@
 -------------
 ..  csv-table:: 
-    :header: "版本说明", "预测库(1.8.3版本)", "预测库(develop版本)"
+    :header: "版本说明", "预测库(1.8.4版本)", "预测库(develop版本)"
    :widths: 3, 2, 2
-    "ubuntu14.04_cpu_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.3-cpu-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-avx-mkl/fluid_inference.tgz>`_"
+    "ubuntu14.04_cpu_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-cpu-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-avx-mkl/fluid_inference.tgz>`_"
-    "ubuntu14.04_cpu_avx_openblas", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.3-cpu-avx-openblas/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-avx-openblas/fluid_inference.tgz>`_"
+    "ubuntu14.04_cpu_avx_openblas", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-cpu-avx-openblas/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-avx-openblas/fluid_inference.tgz>`_"
-    "ubuntu14.04_cpu_noavx_openblas", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.3-cpu-noavx-openblas/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-noavx-openblas/fluid_inference.tgz>`_"
+    "ubuntu14.04_cpu_noavx_openblas", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-cpu-noavx-openblas/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-noavx-openblas/fluid_inference.tgz>`_"
-    "ubuntu14.04_cuda9.0_cudnn7_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.3-gpu-cuda9-cudnn7-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-gpu-cuda9-cudnn7-avx-mkl/fluid_inference.tgz>`_"
+    "ubuntu14.04_cuda9.0_cudnn7_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-gpu-cuda9-cudnn7-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-gpu-cuda9-cudnn7-avx-mkl/fluid_inference.tgz>`_"
-    "ubuntu14.04_cuda10.0_cudnn7_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.3-gpu-cuda10-cudnn7-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-gpu-cuda10-cudnn7-avx-mkl/fluid_inference.tgz>`_"
+    "ubuntu14.04_cuda10.0_cudnn7_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-gpu-cuda10-cudnn7-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-gpu-cuda10-cudnn7-avx-mkl/fluid_inference.tgz>`_"
-    "ubuntu14.04_cuda10.1_cudnn7.6_avx_mkl_trt6", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.3-gpu-cuda10.1-cudnn7.6-avx-mkl-trt6%2Ffluid_inference.tgz>`_", 
+    "ubuntu14.04_cuda10.1_cudnn7.6_avx_mkl_trt6", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-gpu-cuda10.1-cudnn7.6-avx-mkl-trt6%2Ffluid_inference.tgz>`_", 
    "nv-jetson-cuda10-cudnn7.5-trt5", "`fluid_inference.tar.gz <https://paddle-inference-lib.bj.bcebos.com/1.7.1-nv-jetson-cuda10-cudnn7.5-trt5/fluid_inference.tar.gz>`_", 
@@ -46,7 +46,7 @@ WITH_NV_JETSON                OFF            在NV Jetson硬件上编译时需
  git clone https://github.com/paddlepaddle/Paddle
  cd Paddle
  # 建议使用git checkout切换到Paddle稳定的版本，如：
-  git checkout v1.7.2
+  git checkout v1.8.4
 **note**: 如果您是多卡机器，建议安装NCCL；如果您是单卡机器则可以在编译时显示指定WITH_NCCL=OFF来跳过这一步。注意如果WITH_NCCL=ON，且没有安装NCCL，则编译会报错。

--- a/doc/fluid/advanced_guide/inference_deployment/inference/build_and_install_lib_en.rst
+++ b/doc/fluid/advanced_guide/inference_deployment/inference/build_and_install_lib_en.rst
@@ -7,15 +7,15 @@ Direct Download and Installation
 ---------------------------------
 ..  csv-table:: c++ inference library list
-    :header: "version description", "inference library(1.8.3 version)", "inference library(develop version)"
+    :header: "version description", "inference library(1.8.4 version)", "inference library(develop version)"
    :widths: 3, 2, 2
-    "ubuntu14.04_cpu_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.3-cpu-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-avx-mkl/fluid_inference.tgz>`_"
+    "ubuntu14.04_cpu_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-cpu-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-avx-mkl/fluid_inference.tgz>`_"
-    "ubuntu14.04_cpu_avx_openblas", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.3-cpu-avx-openblas/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-avx-openblas/fluid_inference.tgz>`_"
+    "ubuntu14.04_cpu_avx_openblas", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-cpu-avx-openblas/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-avx-openblas/fluid_inference.tgz>`_"
-    "ubuntu14.04_cpu_noavx_openblas", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.3-cpu-noavx-openblas/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-noavx-openblas/fluid_inference.tgz>`_"
+    "ubuntu14.04_cpu_noavx_openblas", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-cpu-noavx-openblas/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-noavx-openblas/fluid_inference.tgz>`_"
-    "ubuntu14.04_cuda9.0_cudnn7_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.3-gpu-cuda9-cudnn7-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-gpu-cuda9-cudnn7-avx-mkl/fluid_inference.tgz>`_"
+    "ubuntu14.04_cuda9.0_cudnn7_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-gpu-cuda9-cudnn7-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-gpu-cuda9-cudnn7-avx-mkl/fluid_inference.tgz>`_"
-    "ubuntu14.04_cuda10.0_cudnn7_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.3-gpu-cuda10-cudnn7-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-gpu-cuda10-cudnn7-avx-mkl/fluid_inference.tgz>`_"
+    "ubuntu14.04_cuda10.0_cudnn7_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-gpu-cuda10-cudnn7-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-gpu-cuda10-cudnn7-avx-mkl/fluid_inference.tgz>`_"
-    "ubuntu14.04_cuda10.1_cudnn7.6_avx_mkl_trt6", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.3-gpu-cuda10.1-cudnn7.6-avx-mkl-trt6%2Ffluid_inference.tgz>`_", 
+    "ubuntu14.04_cuda10.1_cudnn7.6_avx_mkl_trt6", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-gpu-cuda10.1-cudnn7.6-avx-mkl-trt6%2Ffluid_inference.tgz>`_", 
    "nv-jetson-cuda10-cudnn7.5-trt5", "`fluid_inference.tar.gz <https://paddle-inference-lib.bj.bcebos.com/1.7.1-nv-jetson-cuda10-cudnn7.5-trt5/fluid_inference.tar.gz>`_", 
 Build from Source Code
@@ -46,8 +46,8 @@ Firstly we pull the latest code from github.
  git clone https://github.com/paddlepaddle/Paddle
  cd Paddle
-  # Use git checkout to switch to stable versions such as v1.7.2
+  # Use git checkout to switch to stable versions such as v1.8.4
-  git checkout v1.7.2
+  git checkout v1.8.4
 **note**: If your environment is a multi-card machine, it is recommended to install nccl; otherwise, you can skip this step by specifying WITH_NCCL = OFF during compilation. Note that if WITH_NCCL = ON, and NCCL is not installed, the compiler will report an error.

--- a/doc/fluid/api/gen_doc.sh
+++ b/doc/fluid/api/gen_doc.sh
@@ -30,7 +30,7 @@ python gen_module_index.py framework paddle.framework
 # nn
-for module in loss
+for module in loss activation
 do
  python gen_doc.py --module_name ${module} --module_prefix ${module} --output ${module} --output_name nn --to_multiple_files True --output_dir nn
  python gen_module_index.py nn.${module} ${module}

--- a/doc/fluid/api/nn.rst
+++ b/doc/fluid/api/nn.rst
@@ -5,6 +5,7 @@ paddle.nn
 ..  toctree::
    :maxdepth: 1
+    nn/activation.rst
    nn/adaptive_pool2d.rst
    nn/adaptive_pool3d.rst
    nn/add_position_encoding.rst
@@ -60,7 +61,7 @@ paddle.nn
    nn/GradientClipByValue.rst
    nn/grid_sampler.rst
    nn/GroupNorm.rst
-    nn/hard_shrink.rst
+    nn/hardshrink.rst
    nn/hard_sigmoid.rst
    nn/hard_swish.rst
    nn/hash.rst
@@ -81,6 +82,7 @@ paddle.nn
    nn/Linear.rst
    nn/linear_lr_warmup.rst
    nn/log_loss.rst
+    nn/log_softmax.rst
    nn/logsigmoid.rst
    nn/loss.rst
    nn/lrn.rst

--- a/doc/fluid/api/nn/activation.rst
+++ b/doc/fluid/api/nn/activation.rst
+==========
+activation
+==========
+..  toctree::
+    :maxdepth: 1
+    activation/ELU.rst
+    activation/GELU.rst
+    activation/Hardshrink.rst
+    activation/ReLU.rst
+    activation/LogSigmoid.rst
--- a/doc/fluid/api/nn/activation/Hardshrink.rst
+++ b/doc/fluid/api/nn/activation/Hardshrink.rst
+..  THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
+    !DO NOT EDIT THIS FILE MANUALLY!
+.. _api_nn_activation_Hardshrink:
+Hardshrink
+----------
+..  autoclass:: paddle.nn.activation.Hardshrink
+    :members:
+    :inherited-members:
+    :noindex:
--- a/doc/fluid/api/nn/functional.rst
+++ b/doc/fluid/api/nn/functional.rst
@@ -7,3 +7,4 @@ functional
    functional/l1_loss.rst
    functional/nll_loss.rst
+    functional/mse_loss.rst
--- a/doc/fluid/api/nn/functional/mse_loss.rst
+++ b/doc/fluid/api/nn/functional/mse_loss.rst
+.. _api_nn_functional_mse_loss:
+mse_loss
+--------
+..  autofunction:: paddle.nn.functional.mse_loss
+    :noindex:
--- a/doc/fluid/api/nn/hard_shrink.rst
+++ b/doc/fluid/api/nn/hard_shrink.rst
-.. _api_nn_hard_shrink:
-hard_shrink
-------------------------------
-:doc_source: paddle.fluid.layers.hard_shrink
--- a/doc/fluid/api/nn/hardshrink.rst
+++ b/doc/fluid/api/nn/hardshrink.rst
+..  THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
+    !DO NOT EDIT THIS FILE MANUALLY!
+.. _api_nn_hardshrink:
+hardshrink
+----------
+..  autofunction:: paddle.nn.functional.hardshrink
+    :noindex:
--- a/doc/fluid/api/nn/log_softmax.rst
+++ b/doc/fluid/api/nn/log_softmax.rst
+..  THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
+    !DO NOT EDIT THIS FILE MANUALLY!
+.. _api_nn_log_softmax:
+log_softmax
+-----------
+..  autofunction:: paddle.nn.functional.log_softmax
+    :noindex:
\ No newline at end of file
--- a/doc/fluid/api/tensor.rst
+++ b/doc/fluid/api/tensor.rst
@@ -52,6 +52,7 @@ paddle.tensor
    tensor/isfinite.rst
    tensor/less_equal.rst
    tensor/less_than.rst
+    tensor/logic.rst
    tensor/linalg.rst
    tensor/linspace.rst
    tensor/load.rst

--- a/doc/fluid/api/tensor/logic.rst
+++ b/doc/fluid/api/tensor/logic.rst
+======
+logic
+======
+..  toctree::
+    :maxdepth: 1
+    logic/allclose.rst
\ No newline at end of file
--- a/doc/fluid/api/tensor/logic/allclose.rst
+++ b/doc/fluid/api/tensor/logic/allclose.rst
+..  THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
+    !DO NOT EDIT THIS FILE MANUALLY!
+.. _api_tensor_logic_allclose:
+allclose
+--------
+..  autofunction:: paddle.tensor.logic.allclose
+    :noindex:
\ No newline at end of file
--- a/doc/fluid/api_cn/dygraph_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn.rst
@@ -17,6 +17,7 @@ fluid.dygraph
    dygraph_cn/Conv3DTranspose_cn.rst
    dygraph_cn/CosineAnnealingDecay_cn.rst
    dygraph_cn/CosineDecay_cn.rst
+    dygraph_cn/DataParallel_cn.rst
    dygraph_cn/declarative_cn.rst
    dygraph_cn/Dropout_cn.rst
    dygraph_cn/Embedding_cn.rst
@@ -58,3 +59,4 @@ fluid.dygraph
    dygraph_cn/Tracer_cn.rst
    dygraph_cn/TranslatedLayer_cn.rst
    dygraph_cn/TreeConv_cn.rst
+    dygraph_cn/enabled_cn.rst
--- a/doc/fluid/api_cn/dygraph_cn/DataParallel_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/DataParallel_cn.rst
+.. _cn_api_fluid_dygraph_DataParallel:
+DataParallel
+------------
+.. py:class:: paddle.fluid.dygraph.DataParallel(layers, strategy)
+:api_attr: 命令式编程模式（动态图)
+通过数据并行模式执行动态图模型。
+目前，``DataParallel`` 仅支持以多进程的方式执行动态图模型。使用方式如下：
+``python -m paddle.distributed.launch --selected_gpus=0,1 dynamic_graph_test.py``
+其中 ``dynamic_graph_test.py`` 脚本的代码可以是下面的示例代码。
+参数：
+    - **layers** (Layer) - 需要通过数据并行方式执行的模型。
+    - **strategy** (ParallelStrategy) - 数据并行的策略，包括并行执行的环境配置。
+返回：支持数据并行的 ``Layer``
+返回类型：Layer实例
+**代码示例**：
+.. code-block:: python
+    import numpy as np
+    import paddle.fluid as fluid
+    place = fluid.CUDAPlace(fluid.dygraph.ParallelEnv().dev_id)
+    with fluid.dygraph.guard(place):
+        # prepare the data parallel context
+        strategy = fluid.dygraph.prepare_context()
+        linear = fluid.dygraph.Linear(1, 10, act="softmax")
+        adam = fluid.optimizer.AdamOptimizer(
+            learning_rate=0.001, parameter_list=linear.parameters())
+        # make the module become the data parallelism module
+        linear = fluid.dygraph.DataParallel(linear, strategy)
+        x_data = np.random.random(size=[10, 1]).astype(np.float32)
+        data = fluid.dygraph.to_variable(x_data)
+        hidden = linear(data)
+        avg_loss = fluid.layers.mean(hidden)
+        # scale the loss according to the number of trainers.
+        avg_loss = linear.scale_loss(avg_loss)
+        avg_loss.backward()
+        # collect the gradients of trainers.
+        linear.apply_collective_grads()
+        adam.minimize(avg_loss)
+        linear.clear_gradients()
+.. py:method:: scale_loss(loss)
+缩放模型损失值 ``loss`` 。在数据并行模式中，损失值 ``loss`` 需要根据并行训练进程的数目进行缩放。
+如果不在数据并行模式下，会直接返回原 ``loss`` 。
+参数：
+    - **loss** (Variable) - 当前模型的损失值。
+返回：缩放后的损失值 ``loss``
+返回类型：Variable
+**代码示例**
+.. code-block:: python
+    import numpy as np
+    import paddle.fluid as fluid
+    place = fluid.CUDAPlace(fluid.dygraph.ParallelEnv().dev_id)
+    with fluid.dygraph.guard(place):
+        # prepare the data parallel context
+        strategy = fluid.dygraph.prepare_context()
+        linear = fluid.dygraph.Linear(1, 10, act="softmax")
+        adam = fluid.optimizer.AdamOptimizer(
+            learning_rate=0.001, parameter_list=linear.parameters())
+        # make the module become the data parallelism module
+        linear = fluid.dygraph.DataParallel(linear, strategy)
+        x_data = np.random.random(size=[10, 1]).astype(np.float32)
+        data = fluid.dygraph.to_variable(x_data)
+        hidden = linear(data)
+        avg_loss = fluid.layers.mean(hidden)
+        # scale the loss according to the number of trainers.
+        avg_loss = linear.scale_loss(avg_loss)
+        avg_loss.backward()
+        # collect the gradients of trainers.
+        linear.apply_collective_grads()
+        adam.minimize(avg_loss)
+        linear.clear_gradients()
+.. py:method:: apply_collective_grads()
+AllReduce（规约）参数的梯度值。
+返回：无
+**代码示例**
+.. code-block:: python
+    import numpy as np
+    import paddle.fluid as fluid
+    place = fluid.CUDAPlace(fluid.dygraph.ParallelEnv().dev_id)
+    with fluid.dygraph.guard(place):
+        # prepare the data parallel context
+        strategy = fluid.dygraph.prepare_context()
+        linear = fluid.dygraph.Linear(1, 10, act="softmax")
+        adam = fluid.optimizer.AdamOptimizer(
+            learning_rate=0.001, parameter_list=linear.parameters())
+        # make the module become the data parallelism module
+        linear = fluid.dygraph.DataParallel(linear, strategy)
+        x_data = np.random.random(size=[10, 1]).astype(np.float32)
+        data = fluid.dygraph.to_variable(x_data)
+        hidden = linear(data)
+        avg_loss = fluid.layers.mean(hidden)
+        # scale the loss according to the number of trainers.
+        avg_loss = linear.scale_loss(avg_loss)
+        avg_loss.backward()
+        # collect the gradients of trainers.
+        linear.apply_collective_grads()
+        adam.minimize(avg_loss)
+        linear.clear_gradients()
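
The reason `scale_loss` scales by the number of trainers can be seen with plain numbers. The sketch below does not use Paddle. It assumes, for illustration only, that the scaling is a division by the trainer count and that `apply_collective_grads` sums gradients across trainers. `scale_loss` here is a stand-in, and `allreduce_sum` and the toy values are hypothetical.

```python
def scale_loss(loss, nranks):
    # Assumed behaviour: divide by the number of trainers;
    # return the loss unchanged when not running in parallel.
    if nranks < 2:
        return loss
    return loss / nranks


def allreduce_sum(values):
    # Stand-in for a sum all-reduce: every trainer ends up with the total.
    total = sum(values)
    return [total for _ in values]


# Toy model: loss = w * x, so d(loss)/dw = x. Each trainer sees a different x.
local_x = [2.0, 4.0]
nranks = len(local_x)

# Scaling is linear, so the gradient of the scaled loss is x / nranks.
local_grads = [scale_loss(x, nranks) for x in local_x]

# After the sum all-reduce, every trainer holds the mean of the raw gradients.
synced_grads = allreduce_sum(local_grads)
print(synced_grads)
```

Under these assumptions, every trainer ends up with the average gradient over all trainers, so the update matches single-process training on the combined batch.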
--- a/doc/fluid/api_cn/dygraph_cn/enabled_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/enabled_cn.rst
+.. _cn_api_fluid_dygraph_enabled:
+enabled
+-------------------------------
+.. py:function:: paddle.fluid.dygraph.enabled()
+这个函数用于检查程序是否运行在动态图模式。你可以使用 :ref:`cn_api_fluid_dygraph_guard` api进入动态图模式。或者使用 :ref:`cn_api_fluid_enable_dygraph` 和 :ref:`cn_api_fluid_disable_dygraph` api打开、关闭动态图模式。
+注意：   `fluid.dygraph.enabled` 实际上调用了 :ref:`cn_api_fluid_in_dygraph_mode` api，所以推荐使用 :ref:`cn_api_fluid_in_dygraph_mode` api。
+返回：   程序是否运行在动态图模式。
+返回类型：       bool
+**示例代码**
+.. code-block:: python
+            import paddle.fluid as fluid
+            fluid.enable_dygraph()  # Now we are in dygraph mode
+            print(fluid.dygraph.enabled())  # True
+            fluid.disable_dygraph()
+            print(fluid.dygraph.enabled())  # False
--- a/doc/fluid/api_cn/fluid_cn/data_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/data_cn.rst
@@ -6,10 +6,6 @@ data
 .. py:function:: paddle.fluid.data(name, shape, dtype='float32', lod_level=0)
-:api_attr: 声明式编程模式（静态图)
-:alias_main: paddle.nn.data
-:alias: paddle.nn.data,paddle.nn.input.data
-:old_api: paddle.fluid.data

--- a/doc/fluid/api_cn/layers_cn.rst
+++ b/doc/fluid/api_cn/layers_cn.rst
@@ -31,6 +31,7 @@ fluid.layers
    layers_cn/auc_cn.rst
    layers_cn/autoincreased_step_counter_cn.rst
    layers_cn/batch_norm_cn.rst
+    layers_cn/BasicDecoder_cn.rst
    layers_cn/beam_search_cn.rst
    layers_cn/beam_search_decode_cn.rst
    layers_cn/bilinear_tensor_product_cn.rst
@@ -87,6 +88,7 @@ fluid.layers
    layers_cn/dynamic_lstmp_cn.rst
    layers_cn/dynamic_decode_cn.rst
    layers_cn/Decoder_cn.rst
+    layers_cn/DecodeHelper_cn.rst
    layers_cn/DynamicRNN_cn.rst
    layers_cn/edit_distance_cn.rst
    layers_cn/elementwise_add_cn.rst
@@ -124,6 +126,7 @@ fluid.layers
    layers_cn/get_tensor_from_selected_rows_cn.rst
    layers_cn/greater_equal_cn.rst
    layers_cn/greater_than_cn.rst
+    layers_cn/GreedyEmbeddingHelper_cn.rst
    layers_cn/grid_sampler_cn.rst
    layers_cn/group_norm_cn.rst
    layers_cn/gru_unit_cn.rst
@@ -242,6 +245,7 @@ fluid.layers
    layers_cn/rsqrt_cn.rst
    layers_cn/RNNCell_cn.rst
    layers_cn/sampled_softmax_with_cross_entropy_cn.rst
+    layers_cn/SampleEmbeddingHelper_cn.rst
    layers_cn/sampling_id_cn.rst
    layers_cn/scale_cn.rst
    layers_cn/scatter_cn.rst
@@ -308,6 +312,7 @@ fluid.layers
    layers_cn/thresholded_relu_cn.rst
    layers_cn/topk_cn.rst
    layers_cn/transpose_cn.rst
+    layers_cn/TrainingHelper_cn.rst
    layers_cn/unfold_cn.rst
    layers_cn/Uniform_cn.rst
    layers_cn/uniform_random_cn.rst

--- a/doc/fluid/api_cn/layers_cn/BasicDecoder_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/BasicDecoder_cn.rst
+.. _cn_api_fluid_layers_BasicDecoder:
+BasicDecoder
+-------------------------------
+.. py:class:: paddle.fluid.layers.BasicDecoder(cell, helper, output_fn=None)
+BasicDecoder是 :ref:`cn_api_fluid_layers_Decoder` 的子类，它组装了 :ref:`cn_api_fluid_layers_RNNCell` 和 :ref:`cn_api_fluid_layers_DecodeHelper` 的实例作为成员，其中DecodeHelper用来实现不同的解码策略。它依次执行以下步骤来完成单步解码：
+1. 执行 :code:`cell_outputs, cell_states = cell.call(inputs, states)` 以获取输出和新的状态。
+2. 执行 :code:`sample_ids = helper.sample(time, cell_outputs, cell_states)` 以采样id并将其作为当前步的解码结果。
+3. 执行 :code:`finished, next_inputs, next_states = helper.next_inputs(time, cell_outputs, cell_states, sample_ids)` 以产生下一解码步的结束标识、输入和状态。
+参数：
+  - **cell** (RNNCell) - RNNCell的实例或者具有相同接口定义的对象。
+  - **helper** (DecodeHelper) - DecodeHelper的实例。
+  - **output_fn** (可选) - 处理cell输出的接口，在采样之前使用。默认值None。
+**示例代码**
+.. code-block:: python
+            import paddle.fluid as fluid
+            import paddle.fluid.layers as layers
+            start_tokens = fluid.data(name="start_tokens",
+                                 shape=[None],
+                                 dtype="int64")
+            trg_embeder = lambda x: fluid.embedding(
+                x, size=[10000, 128], param_attr=fluid.ParamAttr(name="trg_embedding"))
+            output_layer = lambda x: layers.fc(x,
+                                            size=10000,
+                                            num_flatten_dims=len(x.shape) - 1,
+                                            param_attr=fluid.ParamAttr(name=
+                                                                    "output_w"),
+                                            bias_attr=False)
+            helper = layers.SampleEmbeddingHelper(trg_embeder, start_tokens=start_tokens, end_token=1)
+            decoder_cell = layers.GRUCell(hidden_size=128)
+            decoder = layers.BasicDecoder(decoder_cell, helper, output_fn=output_layer)
+            outputs = layers.dynamic_decode(
+                decoder=decoder, inits=decoder_cell.get_initial_states(start_tokens))
+.. py:method:: initialize(initial_cell_states)
+初始化，包括helper的初始化和cell的初始化，cell初始化直接使用 :code:`initial_cell_states` 作为结果。
+参数：
+  - **initial_cell_states** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构。这是由调用者 :ref:`cn_api_fluid_layers_dynamic_decode` 提供的参数。
+返回：:code:`(initial_inputs, initial_states, finished)` 的三元组。 :code:`initial_inputs, initial_states` 均是单个tensor变量或tensor变量组成的嵌套结构， :code:`finished` 是bool类型的tensor。 :code:`initial_inputs, finished` 与 :code:`helper.initialize()` 返回的内容相同； :code:`initial_states` 与输入参数中的 :code:`initial_cell_states` 相同。
+返回类型：tuple
+.. py:class:: OutputWrapper(cell_outputs, sample_ids)
+ :code:`step()` 的返回值中 :code:`outputs` 使用的数据结构，是一个由 :code:`cell_outputs` 和 :code:`sample_ids` 这两个字段构成的命名元组。
+.. py:method:: step(time, inputs, states, **kwargs)
+按照以下步骤执行单步解码：
+1. 执行 :code:`cell_outputs, cell_states = cell.call(inputs, states)` 以获取输出和新的状态。
+2. 执行 :code:`sample_ids = helper.sample(time, cell_outputs, cell_states)` 以采样id并将其作为当前步的解码结果。
+3. 执行 :code:`finished, next_inputs, next_states = helper.next_inputs(time, cell_outputs, cell_states, sample_ids)` 以产生下一解码步的结束标识、输入和状态。
+参数：
+  - **time** (Variable) - 调用者提供的形状为[1]的tensor，表示当前解码的时间步长。其数据类型为int64。
+  - **inputs** (Variable) - tensor变量。在第一个解码时间步时与由 :code:`initialize()` 返回的 :code:`initial_inputs` 相同，其他时间步与由 :code:`step()` 返回的 :code:`next_inputs` 相同。
+  - **states** (Variable) - tensor变量的结构。在第一个解码时间步时与 :code:`initialize()` 返回的 :code:`initial_states` 相同，其他时间步与由 :code:`step()` 返回的 :code:`next_states` 相同。
+  - **kwargs** - 附加的关键字参数，由调用者 :ref:`cn_api_fluid_layers_dynamic_decode` 提供。
+返回： :code:`(outputs, next_states, next_inputs, finished)` 的四元组。 :code:`outputs` 是包含 :code:`cell_outputs` 和 :code:`sample_ids` 两个字段的命名元组，其中 :code:`cell_outputs` 是 :code:`cell.call()` 的结果， :code:`sample_ids` 是 :code:`helper.sample()` 的结果； :code:`next_states, next_inputs` 分别和输入参数中的 :code:`states, inputs` 有相同的结构、形状和数据类型； :code:`finished` 是一个bool类型的tensor，形状是 :math:`[batch\_size]` 。
+返回类型：tuple
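The three steps listed for :code:`step()` can be sketched in plain Python. This is only an illustration of the control flow, not the Paddle implementation: :code:`ToyCell` and :code:`ToyHelper` are hypothetical stand-ins, and tensors are replaced by Python lists.

```python
from collections import namedtuple

# Named tuple mirroring the two OutputWrapper fields described above.
OutputWrapper = namedtuple("OutputWrapper", ["cell_outputs", "sample_ids"])


class ToyCell:
    """Hypothetical cell: new state = state + input, one logit row per item."""

    def call(self, inputs, states):
        new_states = [s + x for s, x in zip(states, inputs)]
        outputs = [[float(s), float(-s)] for s in new_states]
        return outputs, new_states


class ToyHelper:
    """Hypothetical helper: argmax sampling, id 1 marks the end of a sequence."""

    def sample(self, time, outputs, states):
        return [row.index(max(row)) for row in outputs]

    def next_inputs(self, time, outputs, states, sample_ids):
        finished = [i == 1 for i in sample_ids]
        next_inputs = list(sample_ids)  # feed the sampled id back in
        return finished, next_inputs, states


def step(cell, helper, time, inputs, states):
    # 1. run the cell, 2. sample ids, 3. build the next inputs and flags
    cell_outputs, cell_states = cell.call(inputs, states)
    sample_ids = helper.sample(time, cell_outputs, cell_states)
    finished, next_inputs, next_states = helper.next_inputs(
        time, cell_outputs, cell_states, sample_ids)
    outputs = OutputWrapper(cell_outputs, sample_ids)
    return outputs, next_states, next_inputs, finished


outputs, next_states, next_inputs, finished = step(
    ToyCell(), ToyHelper(), time=0, inputs=[1, -2], states=[0, 0])
print(outputs.sample_ids, finished)
```

Note that the helper returns :code:`(finished, next_inputs, next_states)` while :code:`step()` returns :code:`(outputs, next_states, next_inputs, finished)`, so the sketch reorders the values explicitly.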
--- a/doc/fluid/api_cn/layers_cn/BeamSearchDecoder_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/BeamSearchDecoder_cn.rst
@@ -20,7 +20,7 @@ BeamSearchDecoder
  - **start_token** (int) - 起始标记id。
  - **end_token** (int) - 结束标记id。
  - **beam_size** (int) - 在beam search中使用的beam宽度。
-  - **embedding_fn** (可选) - 处理选中的候选id的接口。通常，它是一个将词id转换为词嵌入的嵌入层，函数的返回值作为 :code:`cell.call` 接口的 :code:`input` 参数。如果 :code:`embedding_fn` 未提供，则必须在 :code:`cell.call` 中实现词嵌入转换。默认值None。
+  - **embedding_fn** (可选) - 处理选中的候选id的接口。它通常是一个将词id转换为词嵌入的嵌入层，其返回值将作为 :code:`cell.call` 接口的 :code:`input` 参数。**注意** ，这里要使用 :ref:`cn_api_fluid_embedding` 而非 :ref:`cn_api_fluid_layers_embedding`，因为选中的id的形状是 :math:`[batch\_size, beam\_size]` ，如果使用后者则还需要在这里提供unsqueeze。如果 :code:`embedding_fn` 未提供，则必须在 :code:`cell.call` 中实现词嵌入转换。默认值None。
  - **output_fn** (可选) - 处理cell输出的接口，在计算得分和选择候选标记id之前使用。默认值None。
 **示例代码**
@@ -123,7 +123,7 @@ BeamSearchDecoder
 参数：
  - **initial_cell_states** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构。调用者提供的参数。
-返回：一个元组 :code:`(initial_inputs, initial_states, finished)`。:code:`initial_inputs` 是一个tensor，当 :code:`embedding_fn` 为None时，由 :code:`start_token` 填充，形状为 :math:`[batch\_size,beam\_size,1]` ；否则使用 :code:`embedding_fn(t)` 返回的值。:code:`initial_states` 是tensor变量的嵌套结构(命名元组，字段包括 :code:`cell_states，log_probs，finished，lengths`)，其中 :code:`log_probs，finished，lengths` 都含有一个tensor，形状为 :math:`[batch\_size, beam\_size]`，数据类型为float32，bool，int64。:code:`cell_states` 具有与输入参数 :code:`initial_cell_states` 相同结构的值，但形状扩展为 :math:`[batch\_size,beam\_size,...]`。 :code:`finished` 是一个布尔型tensor，由False填充，形状为 :math:`[batch\_size,beam\_size]`。
+返回：一个元组 :code:`(initial_inputs, initial_states, finished)`。:code:`initial_inputs` 是一个tensor，当 :code:`embedding_fn` 为None时，该tensor t的形状为 :math:`[batch\_size,beam\_size]` ，值为 :code:`start_token` ；否则使用 :code:`embedding_fn(t)` 返回的值。:code:`initial_states` 是tensor变量的嵌套结构(命名元组，字段包括 :code:`cell_states，log_probs，finished，lengths`)，其中 :code:`log_probs，finished，lengths` 都含有一个tensor，形状为 :math:`[batch\_size, beam\_size]`，数据类型为float32，bool，int64。:code:`cell_states` 具有与输入参数 :code:`initial_cell_states` 相同结构的值，但形状扩展为 :math:`[batch\_size,beam\_size,...]`。 :code:`finished` 是一个布尔型tensor，由False填充，形状为 :math:`[batch\_size,beam\_size]`。
 返回类型：tuple
@@ -135,7 +135,7 @@ BeamSearchDecoder
  - **time** (Variable) - 调用者提供的形状为[1]的tensor，表示当前解码的时间步长。其数据类型为int64。
  - **logits** (Variable) - 形状为 :math:`[batch\_size,beam\_size,vocab\_size]` 的tensor，表示当前时间步的logits。其数据类型为float32。
  - **next_cell_states** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构。它的结构，形状和数据类型与 :code:`initialize()` 的返回值 :code:`initial_states` 中的 :code:`cell_states` 相同。它代表该cell的下一个状态。
-  - **beam_state** (Variable) - tensor变量的结构。在第一个解码步骤与 :code:`initialize()` 返回的 :code:`initial_states` 同，其他步骤与 :code:`initialize()` 返回的 :code:`beam_search_state` 相同。
+  - **beam_state** (Variable) - tensor变量的结构。在第一个解码步骤与 :code:`initialize()` 返回的 :code:`initial_states` 同，其他步骤与 :code:`step()` 返回的 :code:`beam_search_state` 相同。
 Returns: a tuple :code:`(beam_search_output, beam_search_state)`. :code:`beam_search_output` is a named tuple of tensor variables with the fields :code:`scores, predicted_ids, parent_ids`; each field holds a tensor of shape :math:`[batch\_size,beam\_size]`, with data types float32, int64 and int64 respectively. :code:`beam_search_state` has the same structure, shape and data type as the input argument :code:`beam_state`.
@@ -146,9 +146,9 @@ BeamSearchDecoder
 执行beam search解码步骤，该步骤使用 :code:`cell` 来计算概率，然后执行beam search步骤以计算得分并选择候选标记ID。
 参数：
-  - **time** (Variable) - 调用者提供的形状为[1]的int64tensor，表示当前解码的时间步长。
+  - **time** (Variable) - A tensor of shape [1] provided by the caller, representing the current decoding time step. Its data type is int64.
  - **inputs** (Variable) - tensor变量。在第一个解码时间步时与由 :code:`initialize()` 返回的 :code:`initial_inputs` 相同，其他时间步与由 :code:`step()` 返回的 :code:`next_inputs` 相同。
-  - **States** (Variable) - tensor变量的结构。在第一个解码时间步时与 :code:`initialize()` 返回的 :code:`initial_states` 相同，其他时间步与由 :code:`step()` 返回的 :code:`beam_search_state` 相同。
+  - **states** (Variable) - tensor变量的结构。在第一个解码时间步时与 :code:`initialize()` 返回的 :code:`initial_states` 相同，其他时间步与由 :code:`step()` 返回的 :code:`beam_search_state` 相同。
  - **kwargs** - 附加的关键字参数，由调用者提供。
 Returns: a tuple :code:`(beam_search_output, beam_search_state, next_inputs, finished)`. :code:`beam_search_state` has the same structure, shape and data type as the argument :code:`states`. :code:`next_inputs` has the same structure, shape and data type as the input argument :code:`inputs`. :code:`beam_search_output` is a named tuple of tensor variables (fields :code:`scores, predicted_ids, parent_ids`); each field holds a tensor of shape :math:`[batch\_size,beam\_size]`, with data types float32, int64 and int64 respectively. :code:`finished` is a bool tensor of shape :math:`[batch\_size,beam\_size]`.
@@ -167,12 +167,3 @@ BeamSearchDecoder
 Returns: a tuple :code:`(predicted_ids, final_states)`. :code:`predicted_ids` is a tensor of shape :math:`[time\_step, batch\_size, beam\_size]` with data type int64. :code:`final_states` is the same as the input argument :code:`final_states`.
 返回类型：tuple
-.. py:method:: output_dtype()
-用于beam search输出的数据类型的嵌套结构。它是一个命名元组，字段包括 :code:`scores, predicted_ids, parent_ids`。
-参数：无。
-返回：用于beam search输出的数据类型的命名元组。
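To make the shapes in the BeamSearchDecoder return values concrete, here is a minimal pure-Python sketch of a single beam search step. It assumes the usual formulation (add the log-softmax of the logits to each beam's accumulated log probability, then keep the top :code:`beam_size` continuations per batch item) and ignores the handling of finished beams; the function names are hypothetical and this is not Paddle's code.

```python
import math


def log_softmax(row):
    m = max(row)
    log_z = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_z for v in row]


def beam_search_step(logits, log_probs, beam_size):
    """logits: [batch][beam][vocab]; log_probs: [batch][beam].

    Returns (scores, predicted_ids, parent_ids), each [batch][beam_size].
    """
    scores, predicted_ids, parent_ids = [], [], []
    for beams, beam_scores in zip(logits, log_probs):
        vocab_size = len(beams[0])
        # Score every (beam, token) continuation and flatten the index.
        flat = []
        for b, row in enumerate(beams):
            for t, lp in enumerate(log_softmax(row)):
                flat.append((beam_scores[b] + lp, b * vocab_size + t))
        top = sorted(flat, key=lambda p: p[0], reverse=True)[:beam_size]
        scores.append([s for s, _ in top])
        predicted_ids.append([i % vocab_size for _, i in top])
        parent_ids.append([i // vocab_size for _, i in top])
    return scores, predicted_ids, parent_ids


# One batch item, two beams, vocabulary of three tokens.
logits = [[[0.0, 2.0, 1.0], [5.0, 0.0, 0.0]]]
log_probs = [[0.0, -1.0]]
scores, predicted_ids, parent_ids = beam_search_step(logits, log_probs, 2)
print(predicted_ids, parent_ids)
```

:code:`parent_ids` records which beam each selected token extends, which is the information needed afterwards to trace back full sequences.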
--- a/doc/fluid/api_cn/layers_cn/DecodeHelper_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/DecodeHelper_cn.rst
+.. _cn_api_fluid_layers_DecodeHelper:
+DecodeHelper
+-------------------------------
+.. py:class:: paddle.fluid.layers.DecodeHelper()
+DecodeHelper是一个基类，其子类的实例将在 :ref:`cn_api_fluid_layers_BasicDecoder` 中使用。它提供了在动态解码时采样和产生下一解码步的输入的接口。
+.. py:method:: initialize()
+初始化以产生第一个解码步的输入和每个序列是否结束的初始标识。这是 :ref:`cn_api_fluid_layers_BasicDecoder` 初始化的一部分。
+返回：:code:`(initial_inputs, initial_finished)` 的二元组， :code:`initial_inputs` 是单个tensor变量或tensor变量组成的嵌套结构，tensor的形状是 :math:`[batch\_size, ...]` 。 :code:`initial_finished` 是一个bool类型且形状为 :math:`[batch\_size]` 的tensor。
+返回类型：tuple
+.. py:method:: sample(time, outputs, states)
+根据 :code:`outputs` 以特定的方式进行采样，该方法是 :code:`BasicDecoder.step` 中的一部分。
+参数：
+  - **time** (Variable) - 调用者提供的形状为[1]的tensor，表示当前解码的时间步长。其数据类型为int64。
+  - **outputs** (Variable) - tensor变量，通常其数据类型为float32或float64，形状为 :math:`[batch\_size, vocabulary\_size]` ，表示当前解码步预测产生的logit（未归一化的概率），和由 :code:`BasicDecoder.output_fn(BasicDecoder.cell.call())` 返回的 :code:`outputs` 是同一内容。
+  - **states** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构，和由 :code:`BasicDecoder.cell.call()` 返回的 :code:`new_states` 是同一内容。
+返回：数据类型为int64形状为 :math:`[batch\_size]` 的tensor，表示采样得到的id。
+返回类型：Variable        
+.. py:method:: next_inputs(time, outputs, states, sample_ids)
+产生下一解码步的输入、状态，以及每个序列是否结束的标识。该方法是 :code:`BasicDecoder.step` 中的一部分。
+参数：
+  - **time** (Variable) - 调用者提供的形状为[1]的tensor，表示当前解码的时间步长。其数据类型为int64。
+  - **outputs** (Variable) - tensor变量，通常其数据类型为float32或float64，形状为 :math:`[batch\_size, vocabulary\_size]` ，表示当前解码步预测产生的logit（未归一化的概率），和由 :code:`BasicDecoder.output_fn(BasicDecoder.cell.call())` 返回的 :code:`outputs` 是同一内容。
+  - **states** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构，和由 :code:`BasicDecoder.cell.call()` 返回的 :code:`new_states` 是同一内容。
+  - **sample_ids** (Variable) - 数据类型为int64形状为 :math:`[batch\_size]` 的tensor，和由 :code:`sample()` 返回的 :code:`sample_ids` 是同一内容。
+返回： :code:`(finished, next_inputs, next_states)` 的三元组。 :code:`next_inputs, next_states` 均是单个tensor变量或tensor变量组成的嵌套结构， :code:`next_states` 和输入参数中的 :code:`states` 具有相同的结构、形状和数据类型； :code:`finished` 是一个bool类型且形状为 :math:`[batch\_size]` 的tensor。
+返回类型：tuple
--- a/doc/fluid/api_cn/layers_cn/Decoder_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/Decoder_cn.rst
@@ -39,13 +39,28 @@ Decoder提供的主要抽象为：
 返回类型：tuple
-.. py:method:: step(time, inputs, states)
+.. py:method:: step(time, inputs, states, **kwargs)
 在解码的每个时间步中被调用的接口
 参数：  
-  - **outputs** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构。 结构和数据类型与 :code:`output_dtype` 相同。 tensor堆叠所有时间步长的输出从而具有shape :math:`[time\_step，batch\_size，...]` ，由调用者完成。 
+  - **time** (Variable) - A tensor of shape [1] provided by the caller, representing the current decoding time step. Its data type is int64.
-  - **final_states** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构。 它是 :code:`decoder.step` 在最后一个解码步返回的 :code:`next_states`， 因此具有与任何时间步长的状态相同的结构，形状和数据类型。
+  - **inputs** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构。在第一个解码时间步时与由 :code:`initialize()` 返回的 :code:`initial_inputs` 相同，其他时间步与由 :code:`step()` 返回的 :code:`next_inputs` 相同。
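+  - **states** (Variable) - A single tensor variable or a nested structure of tensor variables. At the first decoding step it is the same as the :code:`initial_states` returned by :code:`initialize()`; at later steps it is the same as the :code:`next_states` returned by :code:`step()`.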
+  - **kwargs** - 附加的关键字参数，由调用者提供。
+返回：一个元组 :code:`(outputs, next_states, next_inputs, finished)` 。:code:`next_states` 和 :code:`next_inputs` 都是单个tensor变量或tensor变量组成的嵌套结构，且结构、形状和数据类型均分别与输入参数中的 :code:`states` 和 :code:`inputs` 相同。 :code:`outputs` 是单个tensor变量或tensor变量组成的嵌套结构。 :code:`finished` 是一个bool类型的tensor变量。
+返回类型：tuple
+.. py:method:: finalize(outputs, final_states, sequence_lengths)
+如果提供了实现，将在整个解码迭代结束后被执行一次。
+参数：  
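+  - **outputs** (Variable) - A single tensor variable or a nested structure of tensor variables. Each tensor has shape :math:`[time\_step, batch\_size, ...]` and is the result of stacking the corresponding outputs from all decoding steps; the stacking is done by the caller.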
+  - **final_states** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构。 它是 :code:`decoder.step` 在最后一个解码步返回的 :code:`next_states`， 因此具有与任何时间步的状态相同的结构，形状和数据类型。
+  - **kwargs** - Named keyword arguments, provided by the caller.
 返回：一个元组 :code:`(final_outputs, final_states)` 。:code:`final_outputs` 和 :code:`final_states` 都是单个tensor变量或tensor变量组成的嵌套结构。

--- a/doc/fluid/api_cn/layers_cn/GreedyEmbeddingHelper_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/GreedyEmbeddingHelper_cn.rst
+.. _cn_api_fluid_layers_GreedyEmbeddingHelper:
+GreedyEmbeddingHelper
+-------------------------------
+.. py:class:: paddle.fluid.layers.GreedyEmbeddingHelper(embedding_fn, start_tokens, end_token)
+GreedyEmbeddingHelper是 :ref:`cn_api_fluid_layers_DecodeHelper` 的子类。作为解码helper，它使用 :code:`argmax` 进行采样，并将采样结果送入embedding层，以此作为下一解码步的输入。
+参数：
+  - **embedding_fn** (callable) - 作用于 :code:`argmax` 结果的函数，通常是一个将词id转换为词嵌入的embedding层，**注意** ，这里要使用 :ref:`cn_api_fluid_embedding` 而非 :ref:`cn_api_fluid_layers_embedding`，因为选中的id的形状是 :math:`[batch\_size]` ，如果使用后者则还需要在这里提供unsqueeze。
+  - **start_tokens** (Variable) - 形状为 :math:`[batch\_size]` 、数据类型为int64、 值为起始标记id的tensor。
+  - **end_token** (int) - 结束标记id。
+**示例代码**
+.. code-block:: python
+            import paddle.fluid as fluid
+            import paddle.fluid.layers as layers
+            start_tokens = fluid.data(name="start_tokens",
+                                 shape=[None],
+                                 dtype="int64")
+            trg_embeder = lambda x: fluid.embedding(
+                x, size=[10000, 128], param_attr=fluid.ParamAttr(name="trg_embedding"))
+            output_layer = lambda x: layers.fc(x,
+                                            size=10000,
+                                            num_flatten_dims=len(x.shape) - 1,
+                                            param_attr=fluid.ParamAttr(name=
+                                                                    "output_w"),
+                                            bias_attr=False)
+            helper = layers.GreedyEmbeddingHelper(trg_embeder, start_tokens=start_tokens, end_token=1)
+            decoder_cell = layers.GRUCell(hidden_size=128)
+            decoder = layers.BasicDecoder(decoder_cell, helper, output_fn=output_layer)
+            outputs = layers.dynamic_decode(
+                decoder=decoder, inits=decoder_cell.get_initial_states(start_tokens))
+.. py:method:: initialize()
+GreedyEmbeddingHelper初始化，其使用构造函数中的 :code:`start_tokens` 作为第一个解码步的输入，并给出每个序列是否结束的初始标识。这是 :ref:`cn_api_fluid_layers_BasicDecoder` 初始化的一部分。
+返回：:code:`(initial_inputs, initial_finished)` 的二元组， :code:`initial_inputs` 同构造函数中的 :code:`start_tokens` ； :code:`initial_finished` 是一个bool类型、值为False的tensor，其形状和 :code:`start_tokens` 相同。
+返回类型：tuple
+.. py:method:: sample(time, outputs, states)
+Samples from :code:`outputs` using :code:`argmax`.
+参数：
+  - **time** (Variable) - 调用者提供的形状为[1]的tensor，表示当前解码的时间步长。其数据类型为int64。
+  - **outputs** (Variable) - tensor变量，通常其数据类型为float32或float64，形状为 :math:`[batch\_size, vocabulary\_size]` ，表示当前解码步预测产生的logit（未归一化的概率），和由 :code:`BasicDecoder.output_fn(BasicDecoder.cell.call())` 返回的 :code:`outputs` 是同一内容。
+  - **states** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构，和由 :code:`BasicDecoder.cell.call()` 返回的 :code:`new_states` 是同一内容。
+返回：数据类型为int64形状为 :math:`[batch\_size]` 的tensor，表示采样得到的id。
+返回类型：Variable
+.. py:method:: next_inputs(time, outputs, states, sample_ids)
+对 :code:`sample_ids` 使用 :code:`embedding_fn` ，以此作为下一解码步的输入；同时直接使用输入参数中的 :code:`states` 作为下一解码步的状态；并通过判别 :code:`sample_ids` 是否得到 :code:`end_token`，依此产生每个序列是否结束的标识。
+参数：
+  - **time** (Variable) - 调用者提供的形状为[1]的tensor，表示当前解码的时间步长。其数据类型为int64。
+  - **outputs** (Variable) - tensor变量，通常其数据类型为float32或float64，形状为 :math:`[batch\_size, vocabulary\_size]` ，表示当前解码步预测产生的logit（未归一化的概率），和由 :code:`BasicDecoder.output_fn(BasicDecoder.cell.call())` 返回的 :code:`outputs` 是同一内容。
+  - **states** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构，和由 :code:`BasicDecoder.cell.call()` 返回的 :code:`new_states` 是同一内容。
+  - **sample_ids** (Variable) - 数据类型为int64形状为 :math:`[batch\_size]` 的tensor，和由 :code:`sample()` 返回的 :code:`sample_ids` 是同一内容。
+返回： :code:`(finished, next_inputs, next_states)` 的三元组。 :code:`next_inputs, next_states` 均是单个tensor变量或tensor变量组成的嵌套结构，tensor的形状是 :math:`[batch\_size, ...]` ， :code:`next_states` 和输入参数中的 :code:`states` 相同； :code:`finished` 是一个bool类型且形状为 :math:`[batch\_size]` 的tensor。
+返回类型：tuple
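The :code:`sample()` and :code:`next_inputs()` behaviour described for GreedyEmbeddingHelper can be sketched without Paddle. The function names and the tiny embedding table below are hypothetical, and Python lists stand in for tensors.

```python
def greedy_sample(outputs):
    """Pick the id with the largest logit in each row of [batch][vocab] logits."""
    return [max(range(len(row)), key=row.__getitem__) for row in outputs]


def greedy_next_inputs(sample_ids, states, embedding_fn, end_token):
    # A sequence is finished once it emits the end token.
    finished = [i == end_token for i in sample_ids]
    # The embedded ids become the inputs of the next decoding step.
    next_inputs = [embedding_fn(i) for i in sample_ids]
    # The states are passed through unchanged.
    return finished, next_inputs, states


# Hypothetical 3-word embedding table with 2-dimensional vectors.
table = [[0.0, 0.0], [1.0, 0.5], [0.2, 0.9]]
logits = [[0.1, 2.0, 0.3], [0.0, 0.1, 3.0]]
sample_ids = greedy_sample(logits)
finished, next_inputs, next_states = greedy_next_inputs(
    sample_ids, states=["s0", "s1"], embedding_fn=table.__getitem__, end_token=1)
print(sample_ids, finished, next_inputs)
```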
--- a/doc/fluid/api_cn/layers_cn/RNNCell_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/RNNCell_cn.rst
@@ -21,11 +21,11 @@ RNNCell是抽象的基类，代表将输入和状态映射到输出和新状态
  - **states** - 状态，单个tensor变量或tensor变量组成的嵌套结构。
  - **kwargs** - 附加的关键字参数，由调用者提供。
-返回：输出和新状态。输出和新状态都可以是嵌套的tensor变量。新状态必须具有与状态相同的结构。
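+Returns: a 2-tuple :code:`(outputs, new_states)` containing the outputs and the new states. Both may be nested structures of tensor variables. The new states must have the same structure as the input states.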
 返回类型：tuple
-.. py:method:: get_initial_states(batch_ref, shape=None, dtype=None, init_value=0)
+.. py:method:: get_initial_states(batch_ref, shape=None, dtype=None, init_value=0, batch_dim_idx=0)
 该接口根据提供的形状，数据类型和初始值来初始化状态。
@@ -34,6 +34,7 @@ RNNCell是抽象的基类，代表将输入和状态映射到输出和新状态
  - **shape** - 单个形状或形状组成的嵌套结构，单个形状是整数的列表或元组。 如果形状的第一维不是batch大小，则自动插入-1作为batch大小。 如果该项为None，将使用属性 :code:`state_shape`。默认值为None。 
  - **dtype** - 单个数据类型或由数据类型组成的嵌套结构。该结构必须与shape的结构相同，例外是当状态中的所有tensor都具有相同的数据类型，这时可以使用单个数据类型。 如果是None并且属性 :code:`cell.state_shape` 不可用，则float32将用作数据类型。 默认值为None。 
  - **init_value** - 用于初始化状态的浮点值。
+  - **batch_dim_idx** - 用于指示 :code:`batch_ref` 中batch所在维度的int值，默认值为0。
 返回：和shape具有相同结构的tensor变量，代表初始状态。
@@ -41,9 +42,9 @@ RNNCell是抽象的基类，代表将输入和状态映射到输出和新状态
 .. py:method:: state_shape()
-该接口用于初始化cell的状态。 单个形状或由形状组成的嵌套结构，单个形状可以是整数的列表或元组(如果形状的第一维不是batch大小，则自动插入-1作为batch大小)。 当没有使用 :code:`get_initial_states` 初始化状态或 :code:`get_initial_states` 没有提供 :code:`shape` 参数的时候，不用实现该方法。
+抽象方法（属性），该接口用于初始化cell的状态。 单个形状或由形状组成的嵌套结构，单个形状可以是整数的列表或元组(如果形状的第一维不是batch大小，则自动插入-1作为batch大小)。 当没有使用 :code:`get_initial_states` 初始化状态或 :code:`get_initial_states` 没有提供 :code:`shape` 参数的时候，不用实现该方法。
 .. py:method:: state_dtype()
-该接口用于初始化cell的状态。 单个数据类型或由数据类型组成的嵌套结构，该结构必须与 :code:`shape` 的结构相同，例外是当状态中的所有tensor都具有相同的数据类型，这时可以使用单个数据类型。 当没有使用 :code:`get_initial_states` 初始化状态或 :code:`get_initial_states` 没有提供 :code:`dtype` 参数的时候，不用实现该方法。
+抽象方法（属性），该接口用于初始化cell的状态。 单个数据类型或由数据类型组成的嵌套结构，该结构必须与 :code:`shape` 的结构相同，例外是当状态中的所有tensor都具有相同的数据类型，这时可以使用单个数据类型。 当没有使用 :code:`get_initial_states` 初始化状态或 :code:`get_initial_states` 没有提供 :code:`dtype` 参数的时候，不用实现该方法。
--- a/doc/fluid/api_cn/layers_cn/SampleEmbeddingHelper_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/SampleEmbeddingHelper_cn.rst
+.. _cn_api_fluid_layers_SampleEmbeddingHelper:
+SampleEmbeddingHelper
+-------------------------------
+.. py:class:: paddle.fluid.layers.SampleEmbeddingHelper(embedding_fn, start_tokens, end_token, softmax_temperature=None, seed=None)
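+SampleEmbeddingHelper is a subclass of :ref:`cn_api_fluid_layers_GreedyEmbeddingHelper`. As a decoding helper, it draws ids by random sampling instead of :code:`argmax` and feeds the sampled result through the embedding layer to form the input of the next decoding step.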
+参数：
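+  - **embedding_fn** (callable) - A function applied to the sampled ids, usually an embedding layer that converts word ids to word embeddings. **Note**: use :ref:`cn_api_fluid_embedding` rather than :ref:`cn_api_fluid_layers_embedding` here, because the selected ids have shape :math:`[batch\_size]`; with the latter an unsqueeze would also have to be provided here.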
+  - **start_tokens** (Variable) - 形状为 :math:`[batch\_size]` 、数据类型为int64、 值为起始标记id的tensor。
+  - **end_token** (int) - 结束标记id。
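+  - **softmax_temperature** (float, optional) - The logits are divided by this value before softmax is computed. A higher temperature (above 1.0) gives more randomness; a lower temperature moves the result toward argmax. The value must be greater than 0. The default None is equivalent to 1.0.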
+  - **seed** (int，可选) - 采样使用的随机种子。默认为None，表示不使用固定的随机种子。
+**示例代码**
+.. code-block:: python
+            import paddle.fluid as fluid
+            import paddle.fluid.layers as layers
+            start_tokens = fluid.data(name="start_tokens",
+                                 shape=[None],
+                                 dtype="int64")
+            trg_embeder = lambda x: fluid.embedding(
+                x, size=[10000, 128], param_attr=fluid.ParamAttr(name="trg_embedding"))
+            output_layer = lambda x: layers.fc(x,
+                                            size=10000,
+                                            num_flatten_dims=len(x.shape) - 1,
+                                            param_attr=fluid.ParamAttr(name=
+                                                                    "output_w"),
+                                            bias_attr=False)
+            helper = layers.SampleEmbeddingHelper(trg_embeder, start_tokens=start_tokens, end_token=1)
+            decoder_cell = layers.GRUCell(hidden_size=128)
+            decoder = layers.BasicDecoder(decoder_cell, helper, output_fn=output_layer)
+            outputs = layers.dynamic_decode(
+                decoder=decoder, inits=decoder_cell.get_initial_states(start_tokens))
+.. py:method:: sample(time, outputs, states)
+根据一个多项分布进行采样，此分布由 :code:`softmax(outputs/softmax_temperature)` 计算得到。
+参数：
+  - **time** (Variable) - 调用者提供的形状为[1]的tensor，表示当前解码的时间步长。其数据类型为int64。
+  - **outputs** (Variable) - tensor变量，通常其数据类型为float32或float64，形状为 :math:`[batch\_size, vocabulary\_size]` ，表示当前解码步预测产生的logit（未归一化的概率），和由 :code:`BasicDecoder.output_fn(BasicDecoder.cell.call())` 返回的 :code:`outputs` 是同一内容。
+  - **states** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构，和由 :code:`BasicDecoder.cell.call()` 返回的 :code:`new_states` 是同一内容。
+返回：数据类型为int64形状为 :math:`[batch\_size]` 的tensor，表示采样得到的id。
+返回类型：Variable
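The effect of :code:`softmax_temperature` on the distribution :code:`softmax(outputs/softmax_temperature)` is easy to see in a small standard-library sketch. The helper names are hypothetical, and :code:`random.choices` stands in for whatever multinomial sampling op the real helper uses.

```python
import math
import random


def softmax(row, temperature=1.0):
    scaled = [v / temperature for v in row]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_ids(outputs, softmax_temperature=None, seed=None):
    # None is treated as a temperature of 1.0, as the parameter doc states.
    temperature = 1.0 if softmax_temperature is None else softmax_temperature
    rng = random.Random(seed)
    ids = []
    for row in outputs:
        probs = softmax(row, temperature)
        ids.append(rng.choices(range(len(row)), weights=probs, k=1)[0])
    return ids


logits = [1.0, 2.0, 3.0]
cold = softmax(logits, temperature=0.01)   # almost all mass on the argmax
hot = softmax(logits, temperature=100.0)   # close to uniform
print(cold, hot)
```

With a very low temperature the sampler behaves like :code:`argmax`; with a high temperature every id becomes nearly equally likely.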
--- a/doc/fluid/api_cn/layers_cn/TrainingHelper_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/TrainingHelper_cn.rst
+.. _cn_api_fluid_layers_TrainingHelper:
+TrainingHelper
+-------------------------------
+.. py:class:: paddle.fluid.layers.TrainingHelper(inputs, sequence_length, time_major=False)
+TrainingHelper是 :ref:`cn_api_fluid_layers_DecodeHelper` 的子类。作为解码helper，它在每个解码时间步通过在完整序列输入 :code:`inputs` 的相应位置切片作为各步的输入，并且使用 :code:`argmax` 根据 :code:`cell.call()` 的输出进行采样。
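+Because it requires the full sequence input :code:`inputs`, TrainingHelper is mainly used for maximum-likelihood training with teacher forcing, and the sampled result is usually not used.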
+参数：
+  - **inputs** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构。当 :code:`time_major == False` 时，tensor的形状应为 :math:`[batch\_size, sequence\_length, ...]`；当 :code:`time_major == True` 时，tensor的形状应为 :math:`[sequence\_length, batch\_size, ...]`。在解码的每一步都要从中切片取出相应的数据。
+  - **sequence_length** (Variable) - 形状为 :math:`[batch\_size]` 的tensor。它存储了 :code:`inputs` 中每个样本的实际长度，可以据此来标识每个解码步中每个样本是否结束。
+  - **time_major** (bool，可选) - 指示输入tensor和输出tensor中包含的tensor的数据组织。如果为False，则数据组织为batch为主，形状为 :math:`[batch\_size，sequence\_length，...]`。如果为True，则数据组织为time为主，形状为 :math:`[sequence\_length，batch\_size，...]`。默认值：False。
+**示例代码**
+.. code-block:: python
+            import paddle.fluid as fluid
+            import paddle.fluid.layers as layers
+            trg_emb = fluid.data(name="trg_emb",
+                                 shape=[None, None, 128],
+                                 dtype="float32")
+            trg_seq_length = fluid.data(name="trg_seq_length",
+                                        shape=[None],
+                                        dtype="int64")
+            helper = layers.TrainingHelper(trg_emb, trg_seq_length)
+            decoder_cell = layers.GRUCell(hidden_size=128)
+            decoder = layers.BasicDecoder(decoder_cell, helper)
+            outputs = layers.dynamic_decode(
+                decoder,
+                inits=decoder_cell.get_initial_states(trg_emb),
+                is_test=False)
+.. py:method:: initialize()
+TrainingHelper初始化，其通过在完整序列输入 :code:`inputs` 中首个时间步的位置上切片，以此作为第一个解码步的输入，并给出每个序列是否结束的初始标识。这是 :ref:`cn_api_fluid_layers_BasicDecoder` 初始化的一部分。
+返回：:code:`(initial_inputs, initial_finished)` 的二元组， :code:`initial_inputs` 是单个tensor变量或tensor变量组成的嵌套结构，tensor的形状是 :math:`[batch\_size, ...]` 。 :code:`initial_finished` 是一个bool类型且形状为 :math:`[batch\_size]` 的tensor。
+返回类型：tuple
+.. py:method:: sample(time, outputs, states)
+Samples from :code:`outputs` using :code:`argmax`. Because slices of the full sequence are used as the next decoding step's input, the sampled result is usually not used.
+参数：
+  - **time** (Variable) - 调用者提供的形状为[1]的tensor，表示当前解码的时间步长。其数据类型为int64。
+  - **outputs** (Variable) - tensor变量，通常其数据类型为float32或float64，形状为 :math:`[batch\_size, vocabulary\_size]` ，表示当前解码步预测产生的logit（未归一化的概率），和由 :code:`BasicDecoder.output_fn(BasicDecoder.cell.call())` 返回的 :code:`outputs` 是同一内容。
+  - **states** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构，和由 :code:`BasicDecoder.cell.call()` 返回的 :code:`new_states` 是同一内容。
+返回：数据类型为int64形状为 :math:`[batch\_size]` 的tensor，表示采样得到的id。
+返回类型：Variable        
+.. py:method:: next_inputs(time, outputs, states, sample_ids)
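+Slices the full sequence input at the position of the current time step and uses the slice as the input of the next decoding step; the input argument :code:`states` is used directly as the state of the next decoding step; and the current time is compared with the length of each sequence to produce the flag that marks whether each sequence has finished.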
+参数：
+  - **time** (Variable) - 调用者提供的形状为[1]的tensor，表示当前解码的时间步长。其数据类型为int64。
+  - **outputs** (Variable) - tensor变量，通常其数据类型为float32或float64，形状为 :math:`[batch\_size, vocabulary\_size]` ，表示当前解码步预测产生的logit（未归一化的概率），和由 :code:`BasicDecoder.output_fn(BasicDecoder.cell.call())` 返回的 :code:`outputs` 是同一内容。
+  - **states** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构，和由 :code:`BasicDecoder.cell.call()` 返回的 :code:`new_states` 是同一内容。
+  - **sample_ids** (Variable) - 数据类型为int64形状为 :math:`[batch\_size]` 的tensor，和由 :code:`sample()` 返回的 :code:`sample_ids` 是同一内容。
+返回： :code:`(finished, next_inputs, next_states)` 的三元组。 :code:`next_inputs, next_states` 均是单个tensor变量或tensor变量组成的嵌套结构，tensor的形状是 :math:`[batch\_size, ...]` ， :code:`next_states` 和输入参数中的 :code:`states` 相同； :code:`finished` 是一个bool类型且形状为 :math:`[batch\_size]` 的tensor。
+返回类型：tuple
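A minimal sketch of the teacher-forcing logic in :code:`next_inputs()`, using nested lists in batch-major layout. Two details are assumptions rather than facts taken from the text: the slice index is taken to be :code:`time + 1`, and the index is clamped so that rows that have already finished stay in range.

```python
def training_next_inputs(inputs, sequence_length, time, states):
    """inputs: [batch][seq_len] (batch-major); sequence_length: [batch]."""
    next_time = time + 1  # assumption: the next step reads position time + 1
    finished = [next_time >= n for n in sequence_length]
    # Clamp the index so padded (already finished) rows stay in range.
    next_inputs = [row[min(next_time, len(row) - 1)] for row in inputs]
    # States are passed through unchanged.
    return finished, next_inputs, states


inputs = [["a0", "a1", "a2"], ["b0", "b1", "<pad>"]]
sequence_length = [3, 2]
step0 = training_next_inputs(inputs, sequence_length, time=0, states="s")
step1 = training_next_inputs(inputs, sequence_length, time=1, states="s")
step2 = training_next_inputs(inputs, sequence_length, time=2, states="s")
print(step0, step1, step2)
```

The ground-truth tokens drive decoding here, which is why the ids produced by :code:`sample()` play no role in choosing the next input.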
--- a/doc/fluid/api_cn/layers_cn/abs_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/abs_cn.rst
@@ -11,23 +11,29 @@ abs
-绝对值激活函数。
+绝对值函数。
 .. math::
    out = |x|
 参数:
-    - **x** (Variable)- 多维Tensor，数据类型为float32或float64。
+    - x (Tensor) - 输入的Tensor，数据类型为：float32、float64。
-    - **name** (str) – 该参数供开发人员打印调试信息时使用，具体用法请参见 :ref:`api_guide_Name` ，默认值为None。
+    - name (str, optional) - Name of the operation. Default: None. For more information, see :ref:`api_guide_Name`.
-返回：表示绝对值结果的Tensor，数据类型与x相同。
+返回：输出Tensor，与 ``x`` 维度相同、数据类型相同。
-返回类型：Variable
+返回类型：Tensor
 **代码示例**：
 .. code-block:: python
-        import paddle.fluid as fluid
+        import paddle
-        data = fluid.layers.data(name="input", shape=[32, 784])
+        import numpy as np
-        result = fluid.layers.abs(data)
+        paddle.disable_static()
+        x_data = np.array([-1, -2, -3, -4]).astype(np.float32)
+        x = paddle.to_variable(x_data)
+        res = paddle.abs(x)
+        print(res.numpy())
+        # [1, 2, 3, 4]
--- a/doc/fluid/api_cn/layers_cn/acos_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/acos_cn.rst
@@ -11,29 +11,30 @@ acos
-arccosine激活函数。
+arccosine函数。
 .. math::
    out = cos^{-1}(x)
 参数:
-    - **x(Variable)** - acos的输入Tensor，数据类型为 float32 或 float64
+    - x (Tensor) - 输入的Tensor，数据类型为：float32、float64。
-    - **name** (str|None) – 具体用法请参见 :ref:`cn_api_guide_Name` ，一般无需设置，默认值为None。
+    - name (str, optional) - Name of the operation. Default: None. For more information, see :ref:`api_guide_Name`.
-返回：  `acos` 的输出Tensor，数据类型与 `x` 相同。
-返回类型： Variable
+返回：输出Tensor，与 ``x`` 维度相同、数据类型相同。
+返回类型： Tensor
 **代码示例**：
 .. code-block:: python
-        import paddle.fluid as fluid
+        import paddle
-        data = fluid.layers.data(name="input", shape=[4])
+        import numpy as np
-        # if data is [-0.8183,  0.4912, -0.6444,  0.0371]
-        result = fluid.layers.acos(data)
-        # result is [2.5293, 1.0573, 2.2711, 1.5336]
+        paddle.disable_static()
+        x_data = np.array([-0.8183,  0.4912, -0.6444,  0.0371]).astype(np.float32)
+        x = paddle.to_variable(x_data)
+        res = paddle.acos(x)
+        print(res.numpy())
+        # [2.5293, 1.0573, 2.2711, 1.5336]
--- a/doc/fluid/api_cn/layers_cn/asin_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/asin_cn.rst
@@ -11,29 +11,29 @@ asin
-arcsine激活函数。
+arcsine函数。
 .. math::
    out = sin^{-1}(x)
 参数:
-    - **x(Variable)** - asin的输入Tensor，数据类型为 float32 或 float64
+    - x (Tensor) - 输入的Tensor，数据类型为：float32、float64。
-    - **name** (str|None) – 具体用法请参见 :ref:`cn_api_guide_Name` ，一般无需设置，默认值为None。
+    - name (str, optional) - Name of the operation. Default: None. For more information, see :ref:`api_guide_Name`.
-返回：  `asin` 的输出Tensor，数据类型与 `x` 相同。
+返回：输出Tensor，与 ``x`` 维度相同、数据类型相同。
-返回类型： Variable
+返回类型： Tensor
 **代码示例**：
 .. code-block:: python
-        import paddle.fluid as fluid
+        import paddle
-        data = fluid.layers.data(name="input", shape=[4])
+        import numpy as np
-        # if data is [-0.8183,  0.4912, -0.6444,  0.0371]
-        result = fluid.layers.asin(data)
-        # result is [-0.9585,  0.5135, -0.7003,  0.0372]
+        paddle.disable_static()
+        x_data = np.array([-0.8183,  0.4912, -0.6444,  0.0371]).astype(np.float32)
+        x = paddle.to_variable(x_data)
+        res = paddle.asin(x)
+        print(res.numpy())
+        # [-0.9585,  0.5135, -0.7003,  0.0372]
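The documented :code:`acos` and :code:`asin` outputs can be cross-checked with the standard library alone, using the identity :math:`asin(x) + acos(x) = \pi / 2`. The sketch below uses :code:`math` instead of Paddle, so it checks the numbers, not the Paddle API.

```python
import math

xs = [-0.8183, 0.4912, -0.6444, 0.0371]
asin_vals = [math.asin(x) for x in xs]
acos_vals = [math.acos(x) for x in xs]

for x, a, c in zip(xs, asin_vals, acos_vals):
    # Each pair sums to pi / 2.
    print(f"x={x:+.4f}  asin={a:+.4f}  acos={c:.4f}  sum={a + c:.4f}")
```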
--- a/doc/fluid/api_cn/layers_cn/atan_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/atan_cn.rst
@@ -11,30 +11,29 @@ atan
-arctanh激活函数。
+arctangent函数。
 .. math::
-    out = tanh^{-1}(x)
+    out = tan^{-1}(x)
 参数:
-    - **x(Variable)** - atan的输入Tensor，数据类型为 float32 或 float64
+    - x (Tensor) - 输入的Tensor，数据类型为：float32、float64。
-    - **name** (str|None) – 具体用法请参见 :ref:`cn_api_guide_Name` ，一般无需设置，默认值为None。
+    - name (str, optional) - Name of the operation. Default: None. For more information, see :ref:`api_guide_Name`.
-返回：  `atan` 的输出Tensor，数据类型与 `x` 相同。
+返回：输出Tensor，与 ``x`` 维度相同、数据类型相同。
-返回类型： Variable
+返回类型： Tensor
 **代码示例**：
 .. code-block:: python
-        import paddle.fluid as fluid
+        import paddle
-        data = fluid.layers.data(name="input", shape=[4])
+        import numpy as np
-        # if data is [-0.8183,  0.4912, -0.6444,  0.0371]
-        result = fluid.layers.atan(data)
-        # result is [-0.6858,  0.4566, -0.5724,  0.0371]
+        paddle.disable_static()
+        x_data = np.array([-0.8183,  0.4912, -0.6444,  0.0371]).astype(np.float32)
+        x = paddle.to_variable(x_data)
+        res = paddle.atan(x)
+        print(res.numpy())
+        # [-0.6858,  0.4566, -0.5724,  0.0371]
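The correction from :math:`tanh^{-1}` to :math:`tan^{-1}` matters numerically. A quick standard-library check (not Paddle code) shows that the example output matches :code:`atan` and not :code:`atanh`:

```python
import math

xs = [-0.8183, 0.4912, -0.6444, 0.0371]
atan_vals = [math.atan(x) for x in xs]    # inverse tangent
atanh_vals = [math.atanh(x) for x in xs]  # inverse hyperbolic tangent

for x, t, h in zip(xs, atan_vals, atanh_vals):
    print(f"x={x:+.4f}  atan={t:+.4f}  atanh={h:+.4f}")
```

The two functions agree only near zero, which is why the last element looks similar under both while the first differs widely.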
--- a/doc/fluid/api_cn/layers_cn/ceil_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/ceil_cn.rst
@@ -19,24 +19,24 @@ ceil
 参数:
-    - **x** (Variable) - 该OP的输入为多维Tensor。数据类型为float32或float64。
+    - x (Tensor) - 输入的Tensor，数据类型为：float32、float64。
-    - **name** (str, 可选) - 具体用法请参见 :ref:`api_guide_Name`，一般无需设置，默认值为None。
+    - name (str, optional) - Name of the operation. Default: None. For more information, see :ref:`api_guide_Name`.
-返回： 输出为Tensor，与 ``x`` 维度相同、数据类型相同。
+返回：输出Tensor，与 ``x`` 维度相同、数据类型相同。
-返回类型： Variable
+返回类型： Tensor
 **代码示例**：
 .. code-block:: python
-  import paddle.fluid as fluid
+        import paddle
-  import numpy as np
+        import numpy as np
-  input_ceil = np.array([[-1.5,6],[1,15.6]])
+        paddle.disable_static()
-  with fluid.dygraph.guard():
+        x_data = np.array([[-1.5,6],[1,15.6]]).astype(np.float32)
-      x = fluid.dygraph.to_variable(input_ceil)
+        x = paddle.to_variable(x_data)
-      y = fluid.layers.ceil(x)
+        res = paddle.ceil(x)
-      print(y.numpy())
+        print(res.numpy())
-      # [[-1.  6.]
+        # [[-1.  6.]
-      # [ 1. 16.]]
+        # [ 1. 16.]]
--- a/doc/fluid/api_cn/layers_cn/cos_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/cos_cn.rst
@@ -13,32 +13,31 @@ cos
 余弦函数。
+The input range is ``(-inf, inf)`` and the output range is ``[-1, 1]``. An input outside that range (a non-finite value) yields ``nan``.
 .. math::
    out = cos(x)
 参数:
-    - **x** (Variable) - 该OP的输入为多维Tensor，数据类型为float32，float64。
+    - x (Tensor) - 输入的Tensor，数据类型为：float32、float64。
-    - **name** (str, 可选) - 具体用法请参见 :ref:`api_guide_Name`，一般无需设置，默认值为None。
+    - name (str, optional) - Name of the operation. Default: None. For more information, see :ref:`api_guide_Name`.
-返回：输出为Tensor，与 ``x`` 维度相同、数据类型相同。
+返回：输出Tensor，与 ``x`` 维度相同、数据类型相同。
-返回类型：Variable
+返回类型：Tensor
 **代码示例**：
 .. code-block:: python
-  import paddle.fluid as fluid
+        import paddle
-  import numpy as np
+        import numpy as np
-  input_cos = np.array([[-1,np.pi],[1,15.6]])
+        paddle.disable_static()
-  with fluid.dygraph.guard():
+        x_data = np.array([[-1,np.pi],[1,15.6]]).astype(np.float32)
-      x = fluid.dygraph.to_variable(input_cos)
+        x = paddle.to_variable(x_data)
-      y = fluid.layers.cos(x)
+        res = paddle.cos(x)
-      print(y.numpy())
+        print(res.numpy())
-      # [[ 0.54030231 -1.        ]
+        # [[ 0.54030231 -1.        ]
-      # [ 0.54030231 -0.99417763]]
+        # [ 0.54030231 -0.99417763]]
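The example values and the stated output range :code:`[-1, 1]` can be reproduced with the standard library. This sketch uses :code:`math.cos` on nested lists rather than Paddle tensors.

```python
import math

x_data = [[-1.0, math.pi], [1.0, 15.6]]
res = [[math.cos(v) for v in row] for row in x_data]

for row in res:
    print(" ".join(f"{v:+.8f}" for v in row))
```

Cosine is even, so :code:`cos(-1)` and :code:`cos(1)` produce the same value in the first column.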
--- a/doc/fluid/api_cn/layers_cn/dynamic_decode_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/dynamic_decode_cn.rst
@@ -5,12 +5,12 @@ dynamic_decode
-.. py:method:: dynamic_decode(decoder, inits=None, max_step_num=None, output_time_major=False, **kwargs):
+.. py:method:: dynamic_decode(decoder, inits=None, max_step_num=None, output_time_major=False, impute_finished=False, is_test=False, return_length=False, **kwargs)
 :api_attr: 声明式编程模式（静态图)
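 This interface repeatedly runs :code:`decoder.step()` until every value in the returned completion-status Tensor is True or the number of decoding steps reaches :code:`max_step_num`.
 :code:`decoder.initialize()` is called once before the decoding loop. If :code:`decoder` implements a :code:`finalize` method, :code:`decoder.finalize()` is called once after the decoding loop.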
@@ -20,9 +20,12 @@ dynamic_decode
  - **inits** (object，可选) - 传递给 :code:`decoder.initialize` 的参数。默认为None。
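  - **max_step_num** (int, optional) - The maximum number of steps. If not provided, decoding continues until the decoding process completes (every value in the completion-status Tensor returned by :code:`decoder.step()` is True). Default: None.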
  - **output_time_major** (bool，可选) - 指明最终输出(此方法的第一个返回值)中包含的Tensor的数据布局。如果为False，其将使用batch优先的数据布局, 此时的形状为 :math:`[batch\_size，seq\_len，...]`。如果为True，其将使用time优先的数据布局，此时的形状为 :math:`[seq\_len，batch\_size，...]`。默认值为False。
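+  - **impute_finished** (bool, optional) - If True, for samples in the current batch whose status is finished, the state from the previous step is copied instead of using the :code:`next_states` returned by :code:`decoder.step()` as the new state, as is done for unfinished samples; this guarantees that the returned :code:`final_states` is correct. Otherwise no distinction is made between finished and unfinished samples and no copy takes place. Set this to True if :code:`final_states` will be used; it slows decoding somewhat. Default: False.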
+  - **is_test** (bool，可选) - 标识是否是预测模式，预测模式下内存占用会更少。默认为False。
+  - **return_length** (bool，可选) - 标识是否在返回的元组中额外包含一个存放了所有解码序列实际长度的Tensor。默认为False。
  - **kwargs** - 其他命名关键字参数。这些参数将传递给 :code:`decoder.step`。
-返回:一个二元组 :code:`(final_outputs，final_states)`, 其包含了最终的输出和状态，这两者都是Tensor或Tensor的嵌套结构。:code:`final_outputs` 具有与 :code:`decoder.output_dtype` 相同的结构和数据类型， 其中的每个tensor都是对所有解码时间步对应输出的堆叠。 这些tensor也可能会通过 :code:`decoder.finalize` 进行修改。:code:`final_states` 是最后时间步的状态，和 :code:`decoder.initialize` 返回的初始状态具有相同的结构，其中的tensor也具有相同的形状 和数据类型。
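+Returns: if :code:`return_length` is True, a 3-tuple :code:`(final_outputs, final_states, sequence_lengths)`; otherwise a 2-tuple :code:`(final_outputs, final_states)`. :code:`final_outputs` and :code:`final_states` hold the final outputs and states; both are a Tensor or a nested structure of Tensors. :code:`final_outputs` has the same structure and data type as the :code:`outputs` returned by :code:`decoder.step()`, and each tensor in it is the result of stacking the corresponding outputs from all decoding steps; if :code:`decoder` implements a :code:`finalize` method, these tensors may also be modified by :code:`decoder.finalize()`. :code:`final_states` is the state at the last time step and has the same structure, shape and data type as the initial state returned by :code:`decoder.initialize()`. :code:`sequence_lengths` is an int64 tensor with the same shape as the :code:`finished` returned by :code:`decoder.initialize()`; it stores the actual length of every decoded sequence.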
 返回类型：tuple
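The decoding loop and the :code:`impute_finished`, :code:`max_step_num` and :code:`return_length` options can be sketched in plain Python. This is a simplified illustration under stated assumptions, not Paddle's implementation: :code:`dynamic_decode_sketch` and :code:`CountdownDecoder` are hypothetical, states are flat lists, outputs are not stacked into tensors, :code:`finalize` is skipped, and the exact step at which :code:`max_step_num` stops the loop is an assumption.

```python
def dynamic_decode_sketch(decoder, inits=None, max_step_num=None,
                          impute_finished=False, return_length=False):
    inputs, states, finished = decoder.initialize(inits)
    lengths = [0] * len(finished)
    all_outputs = []
    time = 0
    while not all(finished):
        outputs, next_states, next_inputs, next_finished = decoder.step(
            time, inputs, states)
        # A sequence that was already finished keeps its finished flag.
        next_finished = [a or b for a, b in zip(finished, next_finished)]
        if impute_finished:
            # Finished samples copy their previous state.
            next_states = [old if done else new for old, new, done
                           in zip(states, next_states, finished)]
        # Only sequences still running at this step grow in length.
        lengths = [n if done else n + 1 for n, done in zip(lengths, finished)]
        all_outputs.append(outputs)
        inputs, states, finished = next_inputs, next_states, next_finished
        time += 1
        if max_step_num is not None and time >= max_step_num:
            break
    if return_length:
        return all_outputs, states, lengths
    return all_outputs, states


class CountdownDecoder:
    """Hypothetical decoder: each state counts down and finishes at zero."""

    def initialize(self, inits):
        return list(inits), list(inits), [n <= 0 for n in inits]

    def step(self, time, inputs, states):
        next_states = [s - 1 for s in states]
        finished = [s <= 0 for s in next_states]
        return list(next_states), next_states, next_states, finished


imputed = dynamic_decode_sketch(CountdownDecoder(), [1, 3],
                                impute_finished=True, return_length=True)
plain = dynamic_decode_sketch(CountdownDecoder(), [1, 3], return_length=True)
print(imputed[1], plain[1], imputed[2])
```

In the toy run the first sequence finishes after one step. With :code:`impute_finished=True` its final state stays at the value it had when it finished, while without it the state keeps being updated, which is the difference the parameter description refers to.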

--- a/doc/fluid/api_cn/nn_cn.rst
+++ b/doc/fluid/api_cn/nn_cn.rst
@@ -73,7 +73,7 @@ paddle.nn
    nn_cn/GradientClipByValue_cn.rst
    nn_cn/grid_sampler_cn.rst
    nn_cn/GroupNorm_cn.rst
-    nn_cn/hard_shrink_cn.rst
+    nn_cn/hardshrink_cn.rst
    nn_cn/hard_sigmoid_cn.rst
    nn_cn/hard_swish_cn.rst
    nn_cn/hash_cn.rst
@@ -94,6 +94,7 @@ paddle.nn
    nn_cn/linear_lr_warmup_cn.rst
    nn_cn/logsigmoid_cn.rst
    nn_cn/log_loss_cn.rst
+    nn_cn/log_softmax_cn.rst
    nn_cn/lrn_cn.rst
    nn_cn/margin_ranking_loss_cn.rst
    nn_cn/maxout_cn.rst

--- a/doc/fluid/api_cn/nn_cn/LogSoftmax_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/LogSoftmax_cn.rst
-.. _cn_api_nn_LogSoftmax:
-LogSoftmax
-------------------------------
-.. py:class:: paddle.nn.LogSoftmax(axis=None)
-:alias_main: paddle.nn.LogSoftmax
-:alias: paddle.nn.LogSoftmax,paddle.nn.layer.LogSoftmax,paddle.nn.layer.activation.LogSoftmax
-**LogSoftmax激活层：**
-.. math::
-        \\output = \frac{1}{1 + e^{-input}}\\
-参数:
-    - **axis** (int, 可选) - 指示进行LogSoftmax计算的维度索引，其范围应为 :math:`[-1，rank-1]` ，其中rank是输入变量的秩。默认值：None（与-1效果相同，表示对最后一维做LogSoftmax操作）。
-返回：无
-**代码示例**
-..  code-block:: python
-    import paddle.fluid as fluid
-    import paddle.nn as nn
-    import numpy as np
-    data = np.array([[[-2.0, 3.0, -4.0, 5.0],
-                      [3.0, -4.0, 5.0, -6.0],
-                      [-7.0, -8.0, 8.0, 9.0]],
-                     [[1.0, -2.0, -3.0, 4.0],
-                      [-5.0, 6.0, 7.0, -8.0],
-                      [6.0, 7.0, 8.0, 9.0]]]).astype('float32')
-    my_log_softnmax = nn.LogSoftmax()
-    with fluid.dygraph.guard():
-        data = fluid.dygraph.to_variable(data)
-        res = my_log_softnmax(data)
-        # [[[ -7.1278396   -2.1278396   -9.127839    -0.12783948]
-        #   [ -2.1270514   -9.127051    -0.12705144 -11.127051  ]
-        #   [-16.313261   -17.313261    -1.3132617   -0.31326184]]
-        #  [[ -3.0518122   -6.051812    -7.051812    -0.051812  ]
-        #   [-12.313267    -1.3132664   -0.3132665  -15.313267  ]
-        #   [ -3.4401896   -2.4401896   -1.4401896   -0.44018966]]]
--- a/doc/fluid/api_cn/nn_cn/ReLU_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/ReLU_cn.rst
-.. _cn_api_nn_ReLU:
-ReLU
-------------------------------
-.. py:class:: paddle.nn.ReLU(inplace=False)
-:alias_main: paddle.nn.ReLU
-:alias: paddle.nn.ReLU,paddle.nn.layer.ReLU,paddle.nn.layer.activation.ReLU
-:update_api: paddle.fluid.layers.relu
-**ReLU（Rectified Linear Unit）激活层：**
-.. math::
-        \\Out = max(X, 0)\\
-其中，:math:`X` 为输入的 Tensor
-参数:
-    - **inplace** （bool，可选）- 如果 ``inplace`` 为 ``True``，则 ``ReLU`` 的输入和输出是同一个变量，否则 ``ReLU`` 的输入和输出是不同的变量。默认值：``False``。请注意，如果 ``ReLU`` 的输入同时是其它OP的输入，则 ``inplace`` 必须为False。
-返回：无
-**代码示例**
-..  code-block:: python
-    import paddle.fluid as fluid
-    import paddle.nn as nn
-    import numpy as np
-    data = np.array([-2, 0, 1]).astype('float32')
-    my_relu = nn.ReLU()
-    with fluid.dygraph.guard():
-        data = fluid.dygraph.to_variable(data)
-        res = my_relu(data)  # [0, 0, 1]
--- a/doc/fluid/api_cn/nn_cn/activation_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/activation_cn.rst
@@ -8,5 +8,11 @@ activation
 ..  toctree::
    :maxdepth: 1
+    activation_cn/ELU_cn.rst
+    activation_cn/GELU_cn.rst
+    activation_cn/Hardshrink_cn.rst
+    activation_cn/ReLU_cn.rst
    activation_cn/LeakyReLU_cn.rst
+    activation_cn/LogSoftmax_cn.rst
    activation_cn/Sigmoid_cn.rst
+    activation_cn/LogSigmoid_cn.rst
--- a/doc/fluid/api_cn/nn_cn/activation_cn/ELU_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/activation_cn/ELU_cn.rst
+.. _cn_api_nn_ELU:
+ELU
+-------------------------------
+.. py:class:: paddle.nn.ELU(alpha=1.0, name=None)
+ELU激活层（ELU Activation Operator）
+Applies the following computation to each element of the input Tensor, following `Exponential Linear Units <https://arxiv.org/abs/1511.07289>`_.
+.. math::
+    ELU(x) = max(0, x) + min(0, \alpha * (e^{x} - 1))
+其中，:math:`x` 为输入的 Tensor
+参数
+::::::::::
+    - alpha (float, 可选) - ELU的alpha值，默认值为1.0。
+    - name (str, 可选) - 操作的名称(可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
+形状:
+::::::::::
+    - input: 任意形状的Tensor。
+    - output: 和input具有相同形状的Tensor。
+代码示例
+:::::::::
+.. code-block:: python
+    import paddle
+    import numpy as np
+    paddle.disable_static()
+    x = paddle.to_tensor(np.array([[-1, 6],[1, 15.6]]))
+    m = paddle.nn.ELU(0.2)
+    out = m(x)
+    # [[-0.12642411  6.        ]
+    #  [ 1.          15.6      ]]
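As a sanity check on the ELU formula above, here is a minimal pure-Python sketch that does not depend on Paddle. The helper name `elu_scalar` is hypothetical and used only for illustration; it evaluates the formula on plain floats for the same inputs as the example.

```python
import math


def elu_scalar(x, alpha=1.0):
    """Evaluate ELU(x) = max(0, x) + min(0, alpha * (e^x - 1)) for one float."""
    return max(0.0, x) + min(0.0, alpha * (math.exp(x) - 1.0))


# Same inputs and alpha as the documented example.
values = [[-1.0, 6.0], [1.0, 15.6]]
out = [[elu_scalar(v, alpha=0.2) for v in row] for row in values]
print(out)
```

Only the negative input is changed by the function; non-negative inputs pass through unchanged because the `min` term is zero for them.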
--- a/doc/fluid/api_cn/nn_cn/activation_cn/GELU_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/activation_cn/GELU_cn.rst
+.. _cn_api_nn_GELU:
+GELU
+-------------------------------
+.. py:class:: paddle.nn.GELU(approximate=False, name=None)
+GELU激活层（GELU Activation Operator）
+For more details, see `Gaussian Error Linear Units <https://arxiv.org/abs/1606.08415>`_.
+如果使用近似计算：
+.. math::
+    GELU(x) = 0.5 * x * (1 + tanh(\sqrt{\frac{2}{\pi}} * (x + 0.044715x^{3})))
+如果不使用近似计算：
+.. math::
+    GELU(x) = 0.5 * x * (1 + erf(\frac{x}{\sqrt{2}}))
+其中，:math:`x` 为输入的 Tensor
+参数
+::::::::::
+    - approximate (bool, 可选) - 是否使用近似计算，默认值为 False，即不使用近似计算。
+    - name (str, 可选) - 操作的名称(可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
+形状:
+::::::::::
+    - input: 任意形状的Tensor。
+    - output: 和input具有相同形状的Tensor。
+代码示例
+:::::::::
+.. code-block:: python
+    import paddle
+    import numpy as np
+    paddle.disable_static()
+    x = paddle.to_tensor(np.array([[-1, 0.5],[1, 1.5]]))
+    m = paddle.nn.GELU()
+    out = m(x)
+    # [[-0.158655  0.345731]
+    #  [ 0.841345  1.39979 ]]
+    m = paddle.nn.GELU(True)
+    out = m(x)
+    # [[-0.158808  0.345714]
+    #  [ 0.841192  1.39957 ]]
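The two GELU formulas above can be compared directly with the standard library. The following is a sketch under the assumption that scalar evaluation is enough for illustration; `gelu_scalar` is a hypothetical helper, not a Paddle API.

```python
import math


def gelu_scalar(x, approximate=False):
    """Evaluate GELU for one float, using either the erf or the tanh form."""
    if approximate:
        # Tanh approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 x^3)))
        inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
        return 0.5 * x * (1.0 + math.tanh(inner))
    # Exact form: 0.5 * x * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


for x in (-1.0, 0.5, 1.0, 1.5):
    print(x, gelu_scalar(x), gelu_scalar(x, approximate=True))
```

For these inputs the two variants agree to roughly three decimal places, which is why the approximate form is often acceptable when `erf` is expensive.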
--- a/doc/fluid/api_cn/nn_cn/activation_cn/Hardshrink_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/activation_cn/Hardshrink_cn.rst
+.. _cn_api_nn_Hardshrink:
+Hardshrink
+-------------------------------
+.. py:class:: paddle.nn.Hardshrink(threshold=0.5, name=None)
+Hardshrink激活层
+.. math::
+    Hardshrink(x)=
+        \left\{
+        \begin{aligned}
+        &x, & & if \ x > threshold \\
+        &x, & & if \ x < -threshold \\
+        &0, & & if \ others
+        \end{aligned}
+        \right.
+其中，:math:`x` 为输入的 Tensor
+参数
+::::::::::
+    - threshold (float, 可选) - Hardshrink激活计算公式中的threshold值。默认值为0.5。
+    - name (str, 可选) - 操作的名称(可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
+Shape:
+::::::::::
+    - input: 任意形状的Tensor。
+    - output: 和input具有相同形状的Tensor。
+代码示例
+:::::::::
+.. code-block:: python
+    import paddle
+    import numpy as np
+    paddle.disable_static()
+    x = paddle.to_variable(np.array([-1, 0.3, 2.5]))
+    m = paddle.nn.Hardshrink()
+    out = m(x) # [-1., 0., 2.5]
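The piecewise definition of Hardshrink above translates to a single conditional. This is a minimal sketch on plain floats; `hardshrink_scalar` is a hypothetical name chosen for illustration.

```python
def hardshrink_scalar(x, threshold=0.5):
    """Keep x when |x| exceeds the threshold, otherwise return 0."""
    if x > threshold or x < -threshold:
        return x
    return 0.0


# Same inputs as the documented example, with the default threshold of 0.5.
out = [hardshrink_scalar(v) for v in [-1.0, 0.3, 2.5]]
print(out)
```

Values exactly equal to `threshold` or `-threshold` fall into the "others" branch and map to zero, because both comparisons in the formula are strict.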
--- a/doc/fluid/api_cn/nn_cn/activation_cn/LogSigmoid_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/activation_cn/LogSigmoid_cn.rst
+.. _cn_api_nn_LogSigmoid:
+LogSigmoid
+-------------------------------
+.. py:class:: paddle.nn.LogSigmoid(name=None)
+Logsigmoid激活层。计算公式如下：
+.. math::
+    Logsigmoid(x) = \log \frac{1}{1 + e^{-x}}
+其中，:math:`x` 为输入的 Tensor
+参数
+::::::::::
+    - name (str, 可选) - 操作的名称(可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
+形状:
+::::::::::
+    - input: 任意形状的Tensor。
+    - output: 和input具有相同形状的Tensor。
+代码示例
+:::::::::
+.. code-block:: python
+    import paddle
+    import numpy as np
+    paddle.disable_static()
+    x = paddle.to_tensor(np.array([1.0, 2.0, 3.0, 4.0]))
+    m = paddle.nn.LogSigmoid()
+    out = m(x) # [-0.313262 -0.126928 -0.0485874 -0.0181499]
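Because the formula takes the logarithm of a sigmoid, every output is negative. The sketch below evaluates it in pure Python using a numerically stable rearrangement; the rearrangement and the helper name `logsigmoid_scalar` are choices made for this illustration, not a description of Paddle's kernel.

```python
import math


def logsigmoid_scalar(x):
    """Evaluate log(1 / (1 + e^-x)) without overflow for large |x|.

    Uses the identity log(sigmoid(x)) = min(x, 0) - log1p(exp(-|x|)).
    """
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


out = [logsigmoid_scalar(v) for v in [1.0, 2.0, 3.0, 4.0]]
print(out)
```

The naive form `math.log(1 / (1 + math.exp(-x)))` raises `OverflowError` for very negative `x`, while the rearranged form simply returns a value close to `x`.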
--- a/doc/fluid/api_cn/nn_cn/activation_cn/LogSoftmax_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/activation_cn/LogSoftmax_cn.rst
+.. _cn_api_nn_LogSoftmax:
+LogSoftmax
+-------------------------------
+.. py:class:: paddle.nn.LogSoftmax(axis=-1, name=None)
+LogSoftmax激活层，计算公式如下：
+.. math::
+    Out[i, j] = \log(softmax(x))
+              = \log\left(\frac{\exp(X[i, j])}{\sum_j \exp(X[i, j])}\right)
+参数
+::::::::::
+    - axis (int, 可选) - 指定对输入Tensor进行运算的轴。``axis`` 的有效范围是[-D, D)，D是输入Tensor的维度， ``axis`` 为负值时与 :math:`axis + D` 等价。默认值为-1。
+    - name (str, 可选) - 操作的名称(可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
+Shape:
+::::::::::
+    - input: 任意形状的Tensor。
+    - output: 和input具有相同形状的Tensor。
+代码示例
+:::::::::
+.. code-block:: python
+    import paddle
+    import numpy as np
+    paddle.disable_static()
+    x = np.array([[[-2.0, 3.0, -4.0, 5.0],
+                    [3.0, -4.0, 5.0, -6.0],
+                    [-7.0, -8.0, 8.0, 9.0]],
+                    [[1.0, -2.0, -3.0, 4.0],
+                    [-5.0, 6.0, 7.0, -8.0],
+                    [6.0, 7.0, 8.0, 9.0]]], 'float32')
+    m = paddle.nn.LogSoftmax()
+    x = paddle.to_tensor(x)
+    out = m(x)
+    # [[[ -7.1278396   -2.1278396   -9.127839    -0.12783948]
+    #   [ -2.1270514   -9.127051    -0.12705144 -11.127051  ]
+    #   [-16.313261   -17.313261    -1.3132617   -0.31326184]]
+    #  [[ -3.0518122   -6.051812    -7.051812    -0.051812  ]
+    #   [-12.313267    -1.3132664   -0.3132665  -15.313267  ]
+    #   [ -3.4401896   -2.4401896   -1.4401896   -0.44018966]]]
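To make the LogSoftmax formula concrete, the following pure-Python sketch applies it along the last axis of one row. It subtracts the row maximum before exponentiating, a common stabilization trick that is an assumption of this sketch rather than something stated in the document; `log_softmax_row` is a hypothetical helper name.

```python
import math


def log_softmax_row(row):
    """Compute log(exp(x_j) / sum_k exp(x_k)) for every element of a 1-D list."""
    m = max(row)
    # log-sum-exp, shifted by the maximum so that exp() cannot overflow
    log_sum = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_sum for v in row]


# First row of the documented input tensor.
out = log_softmax_row([-2.0, 3.0, -4.0, 5.0])
print(out)
```

Exponentiating the result gives back a probability distribution, so the exponentials of each output row sum to one.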
--- a/doc/fluid/api_cn/nn_cn/activation_cn/ReLU_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/activation_cn/ReLU_cn.rst
+.. _cn_api_nn_ReLU:
+ReLU
+-------------------------------
+.. py:class:: paddle.nn.ReLU(name=None)
+ReLU激活层（Rectified Linear Unit）。计算公式如下：
+.. math::
+    ReLU(x) = max(0, x)
+其中，:math:`x` 为输入的 Tensor
+参数
+::::::::::
+    - name (str, 可选) - 操作的名称(可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
+形状:
+::::::::::
+    - input: 任意形状的Tensor。
+    - output: 和input具有相同形状的Tensor。
+代码示例
+:::::::::
+.. code-block:: python
+    import paddle
+    import numpy as np
+    paddle.disable_static()
+    x = paddle.to_tensor(np.array([-2, 0, 1]).astype('float32'))
+    m = paddle.nn.ReLU()
+    out = m(x) # [0., 0., 1.]
--- a/doc/fluid/api_cn/nn_cn/elu_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/elu_cn.rst
@@ -2,6 +2,43 @@
 elu
 -------------------------------
-:doc_source: paddle.fluid.layers.elu
+.. py:function:: paddle.nn.functional.elu(x, alpha=1.0, name=None)
+elu激活层（ELU Activation Operator）
+Applies the following computation to each element of the input Tensor, following `Exponential Linear Units <https://arxiv.org/abs/1511.07289>`_.
+.. math::
+    elu(x) = max(0, x) + min(0, \alpha * (e^{x} - 1))
+其中，:math:`x` 为输入的 Tensor
+参数:
+::::::::::
+ - x (Tensor) - 输入的 ``Tensor`` ，数据类型为：float32、float64。
+ - alpha (float, 可选) - elu的alpha值，默认值为1.0。
+ - name (str, 可选) - 操作的名称(可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
+返回
+::::::::::
+    ``Tensor`` ，数据类型和形状同 ``x`` 一致。
+代码示例
+::::::::::
+.. code-block:: python
+    import paddle
+    import paddle.nn.functional as F
+    import numpy as np
+    paddle.disable_static()
+    x = paddle.to_tensor(np.array([[-1,6],[1,15.6]]))
+    out = F.elu(x, alpha=0.2) 
+    # [[-0.12642411  6.        ]
+    #  [ 1.          15.6      ]]
--- a/doc/fluid/api_cn/nn_cn/functional_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/functional_cn.rst
@@ -10,4 +10,7 @@ functional
    functional_cn/l1_loss_cn.rst
    functional_cn/nll_loss_cn.rst
+    functional_cn/normalize_cn.rst
    functional_cn/margin_ranking_loss_cn.rst
+    functional_cn/mse_loss_cn.rst
+    functional_cn/sigmoid_cn.rst
--- a/doc/fluid/api_cn/nn_cn/functional_cn/l1_loss_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/functional_cn/l1_loss_cn.rst
 l1_loss
 -------------------------------
-.. py:function:: paddle.nn.functional.l1_loss(x, label, reduction='mean', name=None)
+.. py:function:: paddle.nn.functional.l1_loss(input, label, reduction='mean', name=None)
-该接口计算输入 ``x`` 和标签 ``label`` 间的 `L1 loss` 损失。
+该接口计算输入 ``input`` 和标签 ``label`` 间的 `L1 loss` 损失。
 该损失函数的数学计算公式如下：
 当 `reduction` 设置为 ``'none'`` 时，
    .. math::
-        Out = \lvert x - label\rvert
+        Out = \lvert input - label\rvert
 当 `reduction` 设置为 ``'mean'`` 时，
    .. math::
-       Out = MEAN(\lvert x - label\rvert)
+       Out = MEAN(\lvert input - label\rvert)
 当 `reduction` 设置为 ``'sum'`` 时，
    .. math::
-       Out = SUM(\lvert x - label\rvert)
+       Out = SUM(\lvert input - label\rvert)
 参数
 :::::::::
-    - **x** (Tensor): - 输入的Tensor，维度是[N, *], 其中N是batch size， `*` 是任意数量的额外维度。数据类型为：float32、float64、int32、int64。
+    - **input** (Tensor): - 输入的Tensor，维度是[N, *], 其中N是batch size， `*` 是任意数量的额外维度。数据类型为：float32、float64、int32、int64。
-    - **label** (Tensor): - 标签，维度是[N, *], 与 ``x`` 相同。数据类型为：float32、float64、int32、int64。
+    - **label** (Tensor): - 标签，维度是[N, *], 与 ``input`` 相同。数据类型为：float32、float64、int32、int64。
    - **reduction** (str, 可选): - 指定应用于输出结果的计算方式，可选值有: ``'none'``, ``'mean'``, ``'sum'`` 。默认为 ``'mean'``，计算 `L1Loss` 的均值；设置为 ``'sum'`` 时，计算 `L1Loss` 的总和；设置为 ``'none'`` 时，则返回 `L1Loss`。
    - **name** (str，可选): - 操作的名称(可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
 返回
 :::::::::
-``Tensor``, 输入 ``x`` 和标签 ``label`` 间的 `L1 loss` 损失。如果 :attr:`reduction` 是 ``'none'``, 则输出Loss的维度为 [N, *], 与输入 ``x`` 相同。如果 :attr:`reduction` 是 ``'mean'`` 或 ``'sum'``, 则输出Loss的维度为 [1]。
+``Tensor``, 输入 ``input`` 和标签 ``label`` 间的 `L1 loss` 损失。如果 `reduction` 是 ``'none'``, 则输出Loss的维度为 [N, *], 与输入 ``input`` 相同。如果 `reduction` 是 ``'mean'`` 或 ``'sum'``, 则输出Loss的维度为 [1]。
 代码示例
@@ -40,24 +40,24 @@ l1_loss
 .. code-block:: python
-        import paddle
        import numpy as np
+        import paddle
        paddle.disable_static()
-        x_data = np.array([[1.5, 0.8], [0.2, 1.3]]).astype("float32")
+        input_data = np.array([[1.5, 0.8], [0.2, 1.3]]).astype("float32")
        label_data = np.array([[1.7, 1], [0.4, 0.5]]).astype("float32")
-        x = paddle.to_variable(x_data)
+        input = paddle.to_variable(input_data)
        label = paddle.to_variable(label_data)
-        l1_loss = paddle.nn.functional.l1_loss(x, label)
+        l1_loss = paddle.nn.functional.l1_loss(input, label)
        print(l1_loss.numpy())  
        # [0.35]
-        l1_loss = paddle.nn.functional.l1_loss(x, label, reduction='none')
+        l1_loss = paddle.nn.functional.l1_loss(input, label, reduction='none')
        print(l1_loss.numpy())  
        # [[0.20000005 0.19999999]
        # [0.2        0.79999995]]
-        l1_loss = paddle.nn.functional.l1_loss(x, label, reduction='sum')
+        l1_loss = paddle.nn.functional.l1_loss(input, label, reduction='sum')
        print(l1_loss.numpy())  
        # [1.4]
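The three `reduction` modes described above differ only in how the element-wise absolute differences are aggregated. The sketch below reproduces them on nested Python lists; `l1_loss_ref` is a hypothetical reference helper, not the Paddle implementation.

```python
def l1_loss_ref(input_data, label_data, reduction="mean"):
    """Reference L1 loss over two equally shaped 2-D lists."""
    diffs = [[abs(a - b) for a, b in zip(row_in, row_lb)]
             for row_in, row_lb in zip(input_data, label_data)]
    if reduction == "none":
        return diffs  # same shape as the input
    flat = [d for row in diffs for d in row]
    if reduction == "sum":
        return sum(flat)
    if reduction == "mean":
        return sum(flat) / len(flat)
    raise ValueError("reduction must be 'none', 'mean' or 'sum'")


input_data = [[1.5, 0.8], [0.2, 1.3]]
label_data = [[1.7, 1.0], [0.4, 0.5]]
print(l1_loss_ref(input_data, label_data))                    # mean
print(l1_loss_ref(input_data, label_data, reduction="sum"))   # sum
print(l1_loss_ref(input_data, label_data, reduction="none"))  # element-wise
```

With `'none'` the result keeps the input shape, while `'mean'` and `'sum'` collapse it to a single number.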
--- a/doc/fluid/api_cn/nn_cn/functional_cn/mse_loss_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/functional_cn/mse_loss_cn.rst
+mse_loss
+-------------------------------
+.. py:function:: paddle.nn.functional.mse_loss(input, label, reduction='mean', name=None)
+This OP computes the mean squared error between the predicted values and the target values.
+对于预测值input和目标值label，公式为：
+当 `reduction` 设置为 ``'none'`` 时，
+    .. math::
+        Out = (input - label)^2
+当 `reduction` 设置为 ``'mean'`` 时，
+    .. math::
+       Out = \operatorname{mean}((input - label)^2)
+当 `reduction` 设置为 ``'sum'`` 时，
+    .. math::
+       Out = \operatorname{sum}((input - label)^2)
+参数：
+:::::::::
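+    - **input** (Tensor) - The predicted values, a multi-dimensional Tensor of shape :math:`[N_1, N_2, ..., N_k]`. Data type: float32 or float64.
+    - **label** (Tensor) - The target values, a multi-dimensional Tensor of shape :math:`[N_1, N_2, ..., N_k]`. Data type: float32 or float64.
+    - **reduction** (str, optional) - How the output is reduced; one of ``'none'``, ``'mean'``, ``'sum'``. Default: ``'mean'``.
+    - **name** (str, optional) - Name of the operation (optional, default None). For more information, see :ref:`api_guide_Name`.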
+返回
+:::::::::
+``Tensor``, 输入 ``input`` 和标签 ``label`` 间的 `mse loss` 损失。
+**代码示例**：
+.. code-block:: python
+    import numpy as np
+    import paddle
+    # static graph mode
+    paddle.enable_static()
+    mse_loss = paddle.nn.functional.mse_loss
+    input = paddle.data(name="input", shape=[1])
+    label = paddle.data(name="label", shape=[1])
+    place = paddle.CPUPlace()
+    input_data = np.array([1.5]).astype("float32")
+    label_data = np.array([1.7]).astype("float32")
+    output = mse_loss(input,label)
+    exe = paddle.static.Executor(place)
+    exe.run(paddle.static.default_startup_program())
+    output_data = exe.run(
+        paddle.static.default_main_program(),
+        feed={"input":input_data, "label":label_data},
+        fetch_list=[output],
+        return_numpy=True)
+    print(output_data)
+    # [array([0.04000002], dtype=float32)]
+    # dynamic graph mode
+    paddle.disable_static()
+    input = paddle.to_variable(input_data)
+    label = paddle.to_variable(label_data)
+    output = mse_loss(input, label)
+    print(output.numpy())
+    # [0.04000002]
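The MSE formulas for the three `reduction` modes can be checked without Paddle. This is a minimal sketch on flat Python lists; `mse_loss_ref` is a hypothetical helper introduced only for illustration.

```python
def mse_loss_ref(input_data, label_data, reduction="mean"):
    """Reference mean squared error over two equally long 1-D lists."""
    squared = [(a - b) ** 2 for a, b in zip(input_data, label_data)]
    if reduction == "none":
        return squared
    if reduction == "sum":
        return sum(squared)
    if reduction == "mean":
        return sum(squared) / len(squared)
    raise ValueError("reduction must be 'none', 'mean' or 'sum'")


# Same single-element data as the documented example: (1.5 - 1.7)^2 is about 0.04.
print(mse_loss_ref([1.5], [1.7]))
print(mse_loss_ref([1.0, 2.0], [3.0, 5.0], reduction="sum"))
```

The float32 result printed in the example (`0.04000002`) differs from the double-precision value here only by rounding.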
--- a/doc/fluid/api_cn/nn_cn/functional_cn/normalize_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/functional_cn/normalize_cn.rst
+normalize
+-------------------------------
+.. py:function:: paddle.nn.functional.normalize(x, p=2, axis=1, epsilon=1e-12, name=None)
+该接口使用 :math:`L_p` 范数沿维度 ``axis`` 对 ``x`` 进行归一化。计算公式如下：
+.. math::
+    y = \frac{x}{ \max\left( \lvert \lvert x \rvert \rvert_p, epsilon\right) }
+.. math::
+    \lvert \lvert x \rvert \rvert_p = \left(\sum_i {\lvert x_i\rvert^p}  \right)^{1/p}
+其中 :math:`\sum_i{\lvert x_i\rvert^p}` 沿维度 ``axis`` 进行计算。
+参数
+:::::::::
+    - **x** (Tensor) - 输入可以是N-D Tensor。数据类型为：float32、float64。
+    - **p** (float|int, 可选) - 范数公式中的指数值。默认值:2
+    - **axis** (int, optional) - The axis along which to normalize. If ``x`` is a 1-D Tensor, the axis is fixed to 0. If `axis < 0`, the axis is `x.ndim + axis`; -1 means the last dimension. Default: 1.
+    - **epsilon** (float，可选) - 添加到分母上的值以防止分母除0。默认值为1e-12。
+    - **name** (str，可选) - 操作的名称(可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
+返回
+:::::::::
+``Tensor``, 输出的形状和数据类型和 ``x`` 相同。
+抛出异常：
+:::::::::
+    - ``TypeError`` - 当参数  ``p`` 或者 ``axis`` 的类型不符合要求时。或者当参数 ``x`` 的类型或数据类型不符合要求时。
+代码示例
+:::::::::
+.. code-block:: python
+        import numpy as np
+        import paddle
+        import paddle.nn.functional as F
+        paddle.disable_static()
+        x = np.arange(6, dtype=np.float32).reshape(2,3)
+        x = paddle.to_variable(x)
+        y = F.normalize(x)
+        print(y.numpy())
+        # [[0.         0.4472136  0.8944272 ]
+        # [0.42426404 0.5656854  0.7071067 ]]
+        y = F.normalize(x, p=1.5)
+        print(y.numpy())
+        # [[0.         0.40862012 0.81724024]
+        # [0.35684016 0.4757869  0.5947336 ]]
+        y = F.normalize(x, axis=0)
+        print(y.numpy())
+        # [[0.         0.24253564 0.37139067]
+        # [1.         0.97014254 0.9284767 ]]
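The normalization formula above can be reproduced with plain lists. The sketch below normalizes one vector by its :math:`L_p` norm and then applies it along each axis of the 2 x 3 example input; `lp_normalize` is a hypothetical helper, and the transposition with `zip` is just one way to emulate `axis=0` without a tensor library.

```python
def lp_normalize(vec, p=2, epsilon=1e-12):
    """Divide a 1-D list by max(||vec||_p, epsilon)."""
    norm = sum(abs(v) ** p for v in vec) ** (1.0 / p)
    denom = max(norm, epsilon)  # epsilon guards against division by zero
    return [v / denom for v in vec]


x = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]

# axis=1: normalize each row independently.
along_axis1 = [lp_normalize(row) for row in x]

# axis=0: normalize each column, then transpose back to the original layout.
columns = [lp_normalize(list(col)) for col in zip(*x)]
along_axis0 = [list(row) for row in zip(*columns)]

print(along_axis1)
print(along_axis0)
```

An all-zero vector stays all-zero instead of producing NaN, which is the purpose of the `epsilon` lower bound on the denominator.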
--- a/doc/fluid/api_cn/nn_cn/functional_cn/sigmoid_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/functional_cn/sigmoid_cn.rst
+.. _cn_api_nn_functional_sigmoid:
+sigmoid
+-------------------------------
+.. py:function:: paddle.nn.functional.sigmoid(x, name=None)
+sigmoid激活函数
+.. math::
+    out = \frac{1}{1 + e^{-x}}
+参数：
+    - **x** (Tensor) - 输入的 `Tensor` ，数据类型为：float32、float64。
+    - **name** (str，可选） - 操作的名称(可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name` 。
+返回：
+    - Tensor，对输入x进行sigmoid激活后的Tensor，形状、数据类型与输入x一致。
+**代码示例**：
+.. code-block:: python
+    import numpy as np
+    import paddle
+    import paddle.nn.functional as F
+    paddle.disable_static()
+    x_data = np.array([-0.4, -0.2, 0.1, 0.3])
+    x = paddle.to_tensor(x_data)
+    out = F.sigmoid(x)
+    print(out.numpy())
+    # [0.40131234 0.450166   0.52497919 0.57444252]
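The sigmoid values in the example can be verified with the standard library. The sketch below branches on the sign of the input so that `exp` never receives a large positive argument; that branching is a design choice of this illustration, and `sigmoid_scalar` is a hypothetical name.

```python
import math


def sigmoid_scalar(x):
    """Evaluate 1 / (1 + e^-x) for one float without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # x < 0, so e is in (0, 1)
    return e / (1.0 + e)


out = [sigmoid_scalar(v) for v in [-0.4, -0.2, 0.1, 0.3]]
print(out)
```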
--- a/doc/fluid/api_cn/nn_cn/gelu_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/gelu_cn.rst
@@ -2,6 +2,47 @@
 gelu
 -------------------------------
-:doc_source: paddle.fluid.layers.gelu
+.. py:function:: paddle.nn.functional.gelu(x, approximate=False, name=None)
+gelu激活层（GELU Activation Operator）
+逐元素计算 gelu激活函数。更多细节请参考 `Gaussian Error Linear Units <https://arxiv.org/abs/1606.08415>`_ 。
+如果使用近似计算：
+.. math::
+    gelu(x) = 0.5 * x * (1 + tanh(\sqrt{\frac{2}{\pi}} * (x + 0.044715x^{3})))
+如果不使用近似计算：
+.. math::
+    gelu(x) = 0.5 * x * (1 + erf(\frac{x}{\sqrt{2}}))
+其中，:math:`x` 为输入的 Tensor
+参数:
+::::::::::
+ - x (Tensor) - 输入的 ``Tensor`` ，数据类型为：float32、float64。
+ - approximate (bool, 可选) - 是否使用近似计算，默认值为 False，表示不使用近似计算。
+ - name (str, 可选) - 操作的名称(可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
+返回
+::::::::::
+    ``Tensor`` ，数据类型和形状同 ``x`` 一致。
+代码示例
+::::::::::
+.. code-block:: python
+    import paddle
+    import paddle.nn.functional as F
+    import numpy as np
+    paddle.disable_static()
+    x = paddle.to_tensor(np.array([[-1, 0.5],[1, 1.5]]))
+    out1 = F.gelu(x)
+    # [[-0.158655  0.345731]
+    #  [ 0.841345  1.39979 ]]
+    out2 = F.gelu(x, True)
+    # [[-0.158808  0.345714]
+    #  [ 0.841192  1.39957 ]]
--- a/doc/fluid/api_cn/nn_cn/hard_shrink_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/hard_shrink_cn.rst
-.. _cn_api_nn_cn_hard_shrink:
-hard_shrink
-------------------------------
-:doc_source: paddle.fluid.layers.hard_shrink
--- a/doc/fluid/api_cn/nn_cn/hardshrink_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/hardshrink_cn.rst
+.. _cn_api_nn_cn_hard_shrink:
+hardshrink
+-------------------------------
+.. py:function:: paddle.nn.functional.hardshrink(x, threshold=0.5, name=None)
+hardshrink激活层。计算公式如下：
+.. math::
+    hardshrink(x)=
+        \left\{
+        \begin{aligned}
+        &x, & & if \ x > threshold \\
+        &x, & & if \ x < -threshold \\
+        &0, & & if \ others
+        \end{aligned}
+        \right.
+其中，:math:`x` 为输入的 Tensor
+参数
+::::::::::
+    - x (Tensor) - 输入的 ``Tensor`` ，数据类型为：float32、float64。
+    - threshold (float, optional) - The threshold value in the hardshrink formula. Default: 0.5.
+    - name (str, 可选) - 操作的名称(可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
+返回
+::::::::::
+    ``Tensor`` ，数据类型和形状同 ``x`` 一致。
+代码示例
+::::::::::
+.. code-block:: python
+    import paddle
+    import paddle.nn.functional as F
+    import numpy as np
+    paddle.disable_static()
+    x = paddle.to_variable(np.array([-1, 0.3, 2.5]))
+    out = F.hardshrink(x) # [-1., 0., 2.5]
--- a/doc/fluid/api_cn/nn_cn/log_softmax_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/log_softmax_cn.rst
+.. _cn_api_nn_cn_log_softmax:
+log_softmax
+-------------------------------
+.. py:function:: paddle.nn.functional.log_softmax(x, axis=-1, dtype=None, name=None)
+该OP实现了log_softmax层。OP的计算公式如下：
+.. math::
+    Out[i, j] = \log(softmax(x)) = \log\left(\frac{\exp(X[i, j])}{\sum_j \exp(X[i, j])}\right)
+参数
+::::::::::
+    - x (Tensor) - 输入的 ``Tensor`` ，数据类型为：float32、float64。
+    - axis (int, 可选) - 指定对输入 ``x`` 进行运算的轴。``axis`` 的有效范围是[-D, D)，D是输入 ``x`` 的维度， ``axis`` 为负值时与 :math:`axis + D` 等价。默认值为-1。
+    - dtype (str|np.dtype|core.VarDesc.VarType, 可选) - 输入Tensor的数据类型。如果指定了 ``dtype`` ，则输入Tensor的数据类型会在计算前转换到 ``dtype`` 。``dtype``可以用来避免数据溢出。如果 ``dtype`` 为None，则输出Tensor的数据类型和 ``x`` 相同。默认值为None。
+    - name (str, 可选) - 操作的名称(可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
+返回
+::::::::::
+    ``Tensor`` ，形状和 ``x`` 相同，数据类型为 ``dtype`` 或者和 ``x`` 相同。
+代码示例
+::::::::::
+.. code-block:: python
+    import paddle
+    import paddle.nn.functional as F
+    import numpy as np
+    paddle.disable_static()
+    x = np.array([[[-2.0, 3.0, -4.0, 5.0],
+                    [3.0, -4.0, 5.0, -6.0],
+                    [-7.0, -8.0, 8.0, 9.0]],
+                    [[1.0, -2.0, -3.0, 4.0],
+                    [-5.0, 6.0, 7.0, -8.0],
+                    [6.0, 7.0, 8.0, 9.0]]]).astype('float32')
+    x = paddle.to_tensor(x)
+    out1 = F.log_softmax(x)
+    out2 = F.log_softmax(x, dtype='float64')
+    # out1's data type is float32; out2's data type is float64
+    # out1 and out2's value is as follows:
+    # [[[ -7.1278396   -2.1278396   -9.127839    -0.12783948]
+    #   [ -2.1270514   -9.127051    -0.12705144 -11.127051  ]
+    #   [-16.313261   -17.313261    -1.3132617   -0.31326184]]
+    #  [[ -3.0518122   -6.051812    -7.051812    -0.051812  ]
+    #   [-12.313267    -1.3132664   -0.3132665  -15.313267  ]
+    #   [ -3.4401896   -2.4401896   -1.4401896   -0.44018966]]]
--- a/doc/fluid/api_cn/nn_cn/logsigmoid_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/logsigmoid_cn.rst
@@ -2,6 +2,36 @@
 logsigmoid
 -------------------------------
-:doc_source: paddle.fluid.layers.logsigmoid
+.. py:function:: paddle.nn.functional.logsigmoid(x, name=None)
+logsigmoid激活层。计算公式如下：
+.. math::
+    logsigmoid(x) = \log \frac{1}{1 + e^{-x}}
+其中，:math:`x` 为输入的 Tensor
+参数
+::::::::::
+    - x (Tensor) - 输入的 ``Tensor`` ，数据类型为：float32、float64。
+    - name (str, 可选) - 操作的名称(可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
+返回
+::::::::::
+    ``Tensor`` ，数据类型和形状同 ``x`` 一致。
+代码示例
+::::::::::
+.. code-block:: python
+    import paddle
+    import paddle.nn.functional as F
+    import numpy as np
+    paddle.disable_static()
+    x = paddle.to_tensor(np.array([1.0, 2.0, 3.0, 4.0]))
+    out = F.logsigmoid(x) # [-0.313262 -0.126928 -0.0485874 -0.0181499]
--- a/doc/fluid/api_cn/nn_cn/loss_cn/L1Loss_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/loss_cn/L1Loss_cn.rst
@@ -3,24 +3,24 @@ L1Loss
 .. py:class:: paddle.nn.loss.L1Loss(reduction='mean', name=None)
-该接口用于创建一个L1Loss的可调用类，L1Loss计算输入x和标签label间的 `L1 loss` 损失。
+该接口用于创建一个L1Loss的可调用类，L1Loss计算输入input和标签label间的 `L1 loss` 损失。
 该损失函数的数学计算公式如下：
 当 `reduction` 设置为 ``'none'`` 时，
    .. math::
-        Out = \lvert x - label\rvert
+        Out = \lvert input - label\rvert
 当 `reduction` 设置为 ``'mean'`` 时，
    .. math::
-       Out = MEAN(\lvert x - label\rvert)
+       Out = MEAN(\lvert input - label\rvert)
 当 `reduction` 设置为 ``'sum'`` 时，
    .. math::
-       Out = SUM(\lvert x - label\rvert)
+       Out = SUM(\lvert input - label\rvert)
 参数
@@ -30,36 +30,36 @@ L1Loss
 形状
 :::::::::
-    - **x** (Tensor): - 输入的Tensor，维度是[N, *], 其中N是batch size， `*` 是任意数量的额外维度。数据类型为：float32、float64、int32、int64。
+    - **input** (Tensor): - 输入的Tensor，维度是[N, *], 其中N是batch size， `*` 是任意数量的额外维度。数据类型为：float32、float64、int32、int64。
-    - **label** (Tensor): - 标签，维度是[N, *], 与 ``x`` 相同。数据类型为：float32、float64、int32、int64。
+    - **label** (Tensor): - 标签，维度是[N, *], 与 ``input`` 相同。数据类型为：float32、float64、int32、int64。
-    - **output** (Tensor): - 输入 ``x`` 和标签 ``label`` 间的 `L1 loss` 损失。如果 :attr:`reduction` 是 ``'none'``, 则输出Loss的维度为 [N, *], 与输入 ``x`` 相同。如果 :attr:`reduction` 是 ``'mean'`` 或 ``'sum'``, 则输出Loss的维度为 [1]。
+    - **output** (Tensor): - 输入 ``input`` 和标签 ``label`` 间的 `L1 loss` 损失。如果 `reduction` 是 ``'none'``, 则输出Loss的维度为 [N, *], 与输入 ``input`` 相同。如果 `reduction` 是 ``'mean'`` 或 ``'sum'``, 则输出Loss的维度为 [1]。
 代码示例
 :::::::::
 .. code-block:: python
-        import paddle
        import numpy as np
+        import paddle
        paddle.disable_static()
-        x_data = np.array([[1.5, 0.8], [0.2, 1.3]]).astype("float32")
+        input_data = np.array([[1.5, 0.8], [0.2, 1.3]]).astype("float32")
        label_data = np.array([[1.7, 1], [0.4, 0.5]]).astype("float32")
-        x = paddle.to_variable(x_data)
+        input = paddle.to_variable(input_data)
        label = paddle.to_variable(label_data)
        l1_loss = paddle.nn.loss.L1Loss()
-        output = l1_loss(x, label)
+        output = l1_loss(input, label)
        print(output.numpy())  
        # [0.35]
        l1_loss = paddle.nn.loss.L1Loss(reduction='sum')
-        output = l1_loss(x, label)
+        output = l1_loss(input, label)
        print(output.numpy())  
        # [1.4]
        l1_loss = paddle.nn.loss.L1Loss(reduction='none')
-        output = l1_loss(x, label)
+        output = l1_loss(input, label)
        print(output.numpy())  
        # [[0.20000005 0.19999999]
        # [0.2        0.79999995]]

--- a/doc/fluid/api_cn/nn_cn/relu_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/relu_cn.rst
+.. _cn_api_nn_cn_relu:
+relu
+-------------------------------
+.. py:function:: paddle.nn.functional.relu(x, name=None)
+relu激活层（Rectified Linear Unit）。计算公式如下：
+.. math::
+    relu(x) = max(0, x)
+其中，:math:`x` 为输入的 Tensor
+参数
+::::::::::
+    - x (Tensor) - 输入的 ``Tensor`` ，数据类型为：float32、float64。
+    - name (str, 可选) - 操作的名称(可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
+返回
+::::::::::
+    ``Tensor`` ，数据类型和形状同 ``x`` 一致。
+代码示例
+::::::::::
+.. code-block:: python
+    import paddle
+    import paddle.nn.functional as F
+    import numpy as np
+    paddle.disable_static()
+    x = paddle.to_tensor(np.array([-2, 0, 1]).astype('float32'))
+    out = F.relu(x) # [0., 0., 1.]
--- a/doc/fluid/api_cn/nn_cn/softmax_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/softmax_cn.rst
@@ -2,7 +2,9 @@
 softmax
 -------------------------------
-.. py:class:: paddle.nn.functional.softmax(x, axis=-1, name=None)
+.. py:function:: paddle.nn.functional.softmax(x, axis=-1, name=None)
 该OP实现了softmax层。OP的计算过程如下：
@@ -27,9 +29,9 @@ softmax
 - 示例1（矩阵一共有三维。axis = -1，表示沿着最后一维（即第三维）做softmax操作）
-.. code-block:: python
+.. code-block:: text
-  输入
+  # input
    x.shape = [2, 3, 4] 
@@ -42,7 +44,7 @@ softmax
    axis = -1
-  输出
+  # output
    out.shape = [2, 3, 4]
@@ -55,9 +57,9 @@ softmax
 - 示例2（矩阵一共有三维。axis = 1，表示沿着第二维做softmax操作）
-.. code-block:: python
+.. code-block:: text
-  输入
+  # input
    x.shape = [2, 3, 4] 
@@ -70,7 +72,7 @@ softmax
    axis = 1
-  输出
+  # output
    out.shape = [2, 3, 4]
@@ -101,7 +103,7 @@ softmax
    import paddle.nn.functional as F
    import numpy as np
-    paddle.enable_imperative()
+    paddle.disable_static()
    x = np.array([[[2.0, 3.0, 4.0, 5.0],
                    [3.0, 4.0, 5.0, 6.0],
@@ -109,7 +111,7 @@ softmax
                    [[1.0, 2.0, 3.0, 4.0],
                    [5.0, 6.0, 7.0, 8.0],
                    [6.0, 7.0, 8.0, 9.0]]], 'float32')
-    x = paddle.imperative.to_variable(x)
+    x = paddle.to_variable(x)
    out = F.softmax(x)
    # [[[0.0320586 , 0.08714432, 0.23688282, 0.64391426],
    #   [0.0320586 , 0.08714432, 0.23688282, 0.64391426],

--- a/doc/fluid/api_cn/paddle_cn/add_cn.rst
+++ b/doc/fluid/api_cn/paddle_cn/add_cn.rst
@@ -2,6 +2,4 @@
 add
 -------------------------------
-:doc_source: paddle.fluid.layers.elementwise_add
+:doc_source: paddle.tensor.add
--- a/doc/fluid/api_cn/tensor_cn/add_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/add_cn.rst
@@ -3,177 +3,47 @@
 add
 -------------------------------
-.. py:function:: paddle.add(x, y, alpha=1, out=None, name=None)
+.. py:function:: paddle.add(x, y, name=None)
 :alias_main: paddle.add
 :alias: paddle.add,paddle.tensor.add,paddle.tensor.math.add
 :update_api: paddle.fluid.layers.elementwise_add
 该OP是逐元素相加算子，输入 ``x`` 与输入 ``y`` 逐元素相加，并将各个位置的输出元素保存到返回结果中。
+Inputs ``x`` and ``y`` must be broadcastable to a common shape. For the broadcasting rules, see :ref:`use_guide_broadcasting`.
 等式为：
 .. math::
        Out = X + Y
 - :math:`X` ：多维Tensor。
- :math:`Y` ：维度必须小于等于X维度的Tensor。
+- :math:`Y` ：多维Tensor。
-对于这个运算算子有2种情况：
-        1. :math:`Y` 的 ``shape`` 与 :math:`X` 相同。
-        2. :math:`Y` 的 ``shape`` 是 :math:`X` 的连续子序列。
-对于情况2:
-        1. 用 :math:`Y` 匹配 :math:`X` 的形状（shape），其中 ``axis`` 是 :math:`Y` 在 :math:`X` 上的起始维度的位置。
-        2. 如果 ``axis`` 为-1（默认值），则 :math:`axis= rank(X)-rank(Y)` 。
-        3. 考虑到子序列， :math:`Y` 的大小为1的尾部维度将被忽略，例如shape（Y）=（2,1）=>（2）。
-例如：
-..  code-block:: text
-        shape(X) = (2, 3, 4, 5), shape(Y) = (,)
-        shape(X) = (2, 3, 4, 5), shape(Y) = (5,)
-        shape(X) = (2, 3, 4, 5), shape(Y) = (4, 5), with axis=-1(default) or axis=2
-        shape(X) = (2, 3, 4, 5), shape(Y) = (3, 4), with axis=1
-        shape(X) = (2, 3, 4, 5), shape(Y) = (2), with axis=0
-        shape(X) = (2, 3, 4, 5), shape(Y) = (2, 1), with axis=0
 参数：
-        - **x** （Variable）- 多维 ``Tensor`` 或 ``LoDTensor`` 。数据类型为 ``float32`` 、 ``float64`` 、 ``int32`` 或  ``int64``。
+    - x (Tensor) - 输入的Tensor，数据类型为：float32、float64、int32、int64。
-        - **y** （Variable）- 多维 ``Tensor`` 或 ``LoDTensor`` 。数据类型为 ``float32`` 、 ``float64`` 、 ``int32`` 或  ``int64``。
+    - y (Tensor) - 输入的Tensor，数据类型为：float32、float64、int32、int64。
-        - **alpha** （int|float，可选）- 输入y的缩放因子。默认值为1. 如果alpha不为1，本api计算公式变为 :math:`Out = X + alpha * Y`
+    - name (str，可选） - 操作的名称(可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
-        - **out** （Variable，可选）-  指定存储运算结果的 ``Tensor`` 。如果设置为None或者不设置，将创建新的 ``Tensor`` 存储运算结果，默认值为None。
-        - **name** （str，可选）- 输出的名字。默认值为None。该参数供开发人员打印调试信息时使用，具体用法请参见 :ref:`api_guide_Name` 。
+返回：  多维Tensor, 数据类型与 ``x`` 相同, 维度为广播后的形状。
-返回：        多维 ``Tensor`` 或 ``LoDTensor`` ，维度和数据类型都与 ``x`` 相同。
+返回类型：        Tensor
-返回类型：        Variable
-**代码示例 1**
+**代码示例**
 ..  code-block:: python
    import paddle
-    import paddle.fluid as fluid
    import numpy as np
-    def gen_data():
+    paddle.disable_static()
-        return {
+    np_x = np.array([2, 3, 4]).astype('float64')
-            "x": np.array([2, 3, 4]).astype('float32'),
+    np_y = np.array([1, 5, 2]).astype('float64')
-            "y": np.array([1, 5, 2]).astype('float32')
+    x = paddle.to_variable(np_x)
-        }
+    y = paddle.to_variable(np_y)
-    x = fluid.data(name="x", shape=[3], dtype='float32')
-    y = fluid.data(name="y", shape=[3], dtype='float32')
-    z1 = paddle.add(x, y)
-    z2 = paddle.add(x, y, alpha=10)
-    # z = x + y
-    place = fluid.CPUPlace()
-    exe = fluid.Executor(place)
-    z_value = exe.run(feed=gen_data(),
-                        fetch_list=[z1.name, z2.name])
-    print(z_value[0]) # [3., 8., 6.]
-    print(z_value[1]) # [12. 53. 24.]
-**代码示例 2**
-..  code-block:: python
-    import paddle
-    import paddle.fluid as fluid
-    import numpy as np
-    def gen_data():
-        return {
-            "x": np.ones((2, 3, 4, 5)).astype('float32'),
-            "y": np.zeros((4, 5)).astype('float32')
-        }
-    x = fluid.data(name="x", shape=[2, 3, 4, 5], dtype='float32')
-    y = fluid.data(name="y", shape=[4, 5], dtype='float32')
-    z = paddle.add(x, y, name='z')
-    # z = x + y
-    place = fluid.CPUPlace()
-    exe = fluid.Executor(place)
-    z_value = exe.run(feed=gen_data(),
-                        fetch_list=[z.name])
-    print(z_value[0])
-    print(z_value[0].shape) # z.shape=[2,3,4,5]
-**代码示例 3**
-..  code-block:: python
-    import paddle
-    import paddle.fluid as fluid
-    import numpy as np
-    def gen_data():
-        return {
-            "x": np.random.randint(1, 5, size=[2, 3, 4, 5]).astype('float32'),
-            "y": np.random.randint(1, 5, size=[5]).astype('float32')
-        }
-    x = fluid.data(name="x", shape=[2,3,4,5], dtype='float32')
-    y = fluid.data(name="y", shape=[5], dtype='float32')
    z = paddle.add(x, y)
-    # z = x / y
+    np_z = z.numpy()
+    print(np_z)  # [3., 8., 6. ]
-    place = fluid.CPUPlace()
-    exe = fluid.Executor(place)
-    z_value = exe.run(feed=gen_data(),
-                        fetch_list=[z.name])
-    print(z_value[0])
-    print(z_value[0].shape) # z.shape=[2,3,4,5]
-**代码示例 4**
-..  code-block:: python
-    import paddle
-    import paddle.fluid as fluid
-    import numpy as np
-    x = fluid.data(name="x", shape=[3], dtype="float32")
-    y = fluid.data(name='y', shape=[3], dtype='float32')
-    output = fluid.data(name="output", shape=[3], dtype="float32")
-    z = paddle.add(x, y, out=output)
-    place = fluid.CPUPlace()
-    exe = fluid.Executor(place)
-    data1 = np.array([2, 3, 4], dtype='float32')
-    data2 = np.array([1, 5, 2], dtype='float32')
-    z_value = exe.run(feed={'x': data1,
-                            'y': data2},
-                            fetch_list=[z])
-    print(z_value[0]) # [3. 8. 6.]
-**代码示例 5（动态图）**
-..  code-block:: python
-    import paddle
-    import paddle.fluid as fluid
-    import numpy as np
-    with fluid.dygraph.guard():
-        np_x = np.array([2, 3, 4]).astype('float64')
-        np_y = np.array([1, 5, 2]).astype('float64')
-        x = fluid.dygraph.to_variable(np_x)
-        y = fluid.dygraph.to_variable(np_y)
-        z = paddle.add(x, y, alpha=-0.5)
-        np_z = z.numpy()
-        print(np_z)  # [1.5, 0.5, 3. ]
--- a/doc/fluid/api_cn/tensor_cn/allclose_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/allclose_cn.rst
@@ -3,23 +3,18 @@
 allclose
 -------------------------------
-.. py:function:: paddle.allclose(input, other, rtol=1e-05, atol=1e-08, equal_nan=False, name=None)
+.. py:function:: paddle.allclose(x, y, rtol=1e-05, atol=1e-08, equal_nan=False, name=None)
-:alias_main: paddle.allclose
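+Checks, element by element, whether all elements of x and y satisfy the following condition:
-:alias: paddle.allclose,paddle.tensor.allclose,paddle.tensor.logic.allclose
-Checks, element by element, whether all elements of input and other satisfy the following condition: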
 ..  math::
-    \left| input - other \right| \leq atol + rtol \times \left| other \right|
+    \left| x - y \right| \leq atol + rtol \times \left| y \right|
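 This API behaves like ``numpy.allclose``: it returns True when all elements of the two Tensors being compared are equal within the given tolerances.
 Parameters:
-    - **input** (Variable) - The first input Tensor to compare, input.
+    - **x** (Tensor) - The input ``Tensor``; data type: float32 or float64.
-    - **other** (Variable) - The second input Tensor to compare, other.
+    - **y** (Tensor) - The input ``Tensor``; data type: float32 or float64.
    - **rtol** (float, optional) - Relative tolerance. Default: 1e-5.
    - **atol** (float, optional) - Absolute tolerance. Default: 1e-8.
    - **equal_nan** (bool, optional) - If True, two NaN values are treated as equal. Default: False.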
@@ -27,43 +22,37 @@ allclose
 Returns: A single-element Tensor of boolean type holding the result.
-Return type: Variable
 **Code example**:
 .. code-block:: python
    import paddle
-    import paddle.fluid as fluid
    import numpy as np
-    use_cuda = fluid.core.is_compiled_with_cuda()
-    a = fluid.data(name="a", shape=[2], dtype='float32')
+    paddle.disable_static()
-    b = fluid.data(name="b", shape=[2], dtype='float32')
-    result = paddle.allclose(a, b, rtol=1e-05, atol=1e-08,
+    np_x = np.array([10000., 1e-07]).astype("float32")
+    np_y = np.array([10000.1, 1e-08]).astype("float32")
+    x = paddle.to_tensor(np_x)
+    y = paddle.to_tensor(np_y)
+    result1 = paddle.allclose(x, y, rtol=1e-05, atol=1e-08,
+                            equal_nan=False, name="ignore_nan")
+    np_result1 = result1.numpy()
+    # [False]
+    result2 = paddle.allclose(x, y, rtol=1e-05, atol=1e-08,
+                                equal_nan=True, name="equal_nan")
+    np_result2 = result2.numpy()
+    # [False]
+    np_x = np.array([1.0, float('nan')]).astype("float32")
+    np_y = np.array([1.0, float('nan')]).astype("float32")
+    x = paddle.to_tensor(np_x)
+    y = paddle.to_tensor(np_y)
+    result1 = paddle.allclose(x, y, rtol=1e-05, atol=1e-08,
                            equal_nan=False, name="ignore_nan")
-    result_nan = paddle.allclose(a, b, rtol=1e-05, atol=1e-08,
+    np_result1 = result1.numpy()
+    # [False]
+    result2 = paddle.allclose(x, y, rtol=1e-05, atol=1e-08,
                                equal_nan=True, name="equal_nan")
-    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
+    np_result2 = result2.numpy()
-    exe = fluid.Executor(place)
+    # [True]
-    exe.run(fluid.default_startup_program())
-    x = np.array([10000., 1e-07]).astype("float32")
-    y = np.array([10000.1, 1e-08]).astype("float32")
-    result_v, result_nan_v = exe.run(
-        feed={'a': x, 'b': y},
-        fetch_list=[result, result_nan])
-    print(result_v, result_nan_v)
-    # Output: (array([False]), array([False]))
-    x = np.array([10000., 1e-08]).astype("float32")
-    y = np.array([10000.1, 1e-09]).astype("float32")
-    result_v, result_nan_v = exe.run(
-        feed={'a': x, 'b': y},
-        fetch_list=[result, result_nan])
-    print(result_v, result_nan_v)
-    # Output: (array([ True]), array([ True]))
-    x = np.array([1.0, float('nan')]).astype("float32")
-    y = np.array([1.0, float('nan')]).astype("float32")
-    result_v, result_nan_v = exe.run(
-        feed={'a': x, 'b': y},
-        fetch_list=[result, result_nan])
-    print(result_v, result_nan_v)
-    # Output: (array([False]), array([ True]))
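The tolerance condition documented for `paddle.allclose` can be checked by hand without Paddle. The following is a minimal pure-Python sketch of the formula `|x - y| <= atol + rtol * |y|`; the helper name `allclose` and the list-based inputs are assumptions made for illustration, and it works in float64 rather than the float32 used in the documented example.

```python
import math


def allclose(xs, ys, rtol=1e-05, atol=1e-08, equal_nan=False):
    """Return True when every pair satisfies |x - y| <= atol + rtol * |y|.

    Hypothetical reference helper, not Paddle's implementation.
    Assumes xs and ys have the same length.
    """
    for x, y in zip(xs, ys):
        if math.isnan(x) or math.isnan(y):
            # A NaN pair only counts as equal when equal_nan is set and both are NaN.
            if equal_nan and math.isnan(x) and math.isnan(y):
                continue
            return False
        if abs(x - y) > atol + rtol * abs(y):
            return False
    return True


nan = float('nan')
print(allclose([10000., 1e-07], [10000.1, 1e-08]))        # False
print(allclose([10000., 1e-08], [10000.1, 1e-09]))        # True
print(allclose([1.0, nan], [1.0, nan]))                   # False
print(allclose([1.0, nan], [1.0, nan], equal_nan=True))   # True
```

In the first call the second pair fails: `|1e-07 - 1e-08| = 9e-08` exceeds `1e-08 + 1e-05 * 1e-08`. The condition is asymmetric in `x` and `y`, because only `|y|` scales the relative tolerance.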
--- a/doc/fluid/api_cn/tensor_cn/div_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/div_cn.rst
--- a/doc/fluid/api_cn/tensor_cn/erf_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/erf_cn.rst
+.. _cn_api_tensor_erf:
 erf
 -------------------------------
-**版本升级，文档正在开发中**
+.. py:function:: paddle.erf(x, name=None)
+Computes the Erf activation function element-wise. For more details, see `Error function <https://en.wikipedia.org/wiki/Error_function>`_ .
+.. math::
+    out = \frac{2}{\sqrt{\pi}} \int_{0}^{x}e^{- \eta^{2}}d\eta
+Parameters:
+    - **x** (Tensor) - The input ``Tensor``; data type: float16, float32, or float64.
+    - **name** (str, optional) - Name of the operation. Default: None. For more information, see :ref:`api_guide_Name` .
+Returns:
+    - Tensor. The result of applying erf to the input x, with the same shape and data type as x.
+**Code example**:
+.. code-block:: python
+    import numpy as np
+    import paddle
+    paddle.disable_static()
+    x_data = np.array([-0.4, -0.2, 0.1, 0.3])
+    x = paddle.to_tensor(x_data)
+    out = paddle.erf(x)
+    print(out.numpy())
+    # [-0.42839236 -0.22270259  0.11246292  0.32862676]
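The integral in the erf formula can be evaluated numerically to confirm the values in the example. This is a sketch under stated assumptions: `erf_by_integration` is a hypothetical helper using the midpoint rule, and it is compared against the standard library's `math.erf` rather than against Paddle itself.

```python
import math


def erf_by_integration(x, steps=10000):
    """Approximate (2 / sqrt(pi)) * integral_0^x exp(-eta^2) d(eta) with the midpoint rule."""
    h = x / steps  # signed step: a negative x yields a negative result
    total = 0.0
    for i in range(steps):
        eta = (i + 0.5) * h  # midpoint of the i-th sub-interval
        total += math.exp(-eta * eta)
    return 2.0 / math.sqrt(math.pi) * total * h


for value in [-0.4, -0.2, 0.1, 0.3]:
    print(value, erf_by_integration(value), math.erf(value))
```

Both columns agree with the documented output to the printed precision, and the sign behaviour shows that erf is an odd function.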
--- a/doc/fluid/api_cn/tensor_cn/index_select_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/index_select_cn.rst
@@ -7,7 +7,7 @@ index_select
-该OP沿着指定轴 ``axis`` 对输入 ``x`` 进行索引，取 ``index`` 中指定的相应项，创建并返回到一个新的Tensor。这里 ``index`` 是一个 ``1-D`` Tensor。除 ``axis`` 轴外，返回的Tensor其余维度大小和输入 ``x``相等 ， ``axis`` 维度的大小等于 ``index`` 的大小。
+该OP沿着指定轴 ``axis`` 对输入 ``x`` 进行索引，取 ``index`` 中指定的相应项，创建并返回到一个新的Tensor。这里 ``index`` 是一个 ``1-D`` Tensor。除 ``axis`` 轴外，返回的Tensor其余维度大小和输入 ``x`` 相等 ， ``axis`` 维度的大小等于 ``index`` 的大小。
 **参数**：
    - **x** （Tensor）– 输入Tensor。 ``x`` 的数据类型可以是float32，float64，int32，int64。
@@ -30,14 +30,14 @@ index_select
        import paddle
        import numpy as np
-        paddle.enable_imperative()  # Now we are in imperative mode
+        paddle.disable_static()  # Now we are in imperative mode
        data = np.array([[1.0, 2.0, 3.0, 4.0],
                         [5.0, 6.0, 7.0, 8.0],
                         [9.0, 10.0, 11.0, 12.0]])
-        data_index = np.array([-1, 1, 1]).astype('int32')
+        data_index = np.array([0, 1, 1]).astype('int32')
-        x = paddle.imperative.to_variable(data)
+        x = paddle.to_tensor(data)
-        index = paddle.imperative.to_variable(data_index)
+        index = paddle.to_tensor(data_index)
        out_z1 = paddle.index_select(x=x, index=index)
        #[[1. 2. 3. 4.]
        # [5. 6. 7. 8.]

--- a/doc/fluid/api_cn/tensor_cn/mean_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/mean_cn.rst
@@ -11,9 +11,9 @@ mean
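 Parameters
 ::::::::::
-    - x (Tensor) - The input Tensor; data type: float32, float64, int32.int64 .
+    - x (Tensor) - The input Tensor; data type: float32 or float64.
    - axis (int|list|tuple, optional) - The axis or axes of ``x`` along which to compute. ``axis`` may be an int, list(int), or tuple(int). If ``axis`` contains several dimensions, the mean is computed over all of them. ``axis``, or each of its elements, must lie in the range [-D, D), where D is the number of dimensions of ``x``. A negative value is equivalent to :math:`axis + D` . If ``axis`` is None, the mean is computed over all elements of ``x``. Default: None.
-    - keepdim (bool, optional) - Whether to keep the reduced dimensions in the output Tensor. If ``keep_dim`` is True, the output Tensor has the same number of dimensions as ``x`` (the reduced dimensions are kept with size 1). Otherwise, the shape of the output Tensor is squeezed along ``axsi``. Default: False.
+    - keepdim (bool, optional) - Whether to keep the reduced dimensions in the output Tensor. If ``keepdim`` is True, the output Tensor has the same number of dimensions as ``x`` (the reduced dimensions are kept with size 1). Otherwise, the shape of the output Tensor is squeezed along ``axis``. Default: False.
    - name (str, optional) - Name of the operation. Default: None. For more information, see :ref:`api_guide_Name`.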
 返回
@@ -31,12 +31,12 @@ mean
    paddle.disable_static()
    x = np.array([[[1, 2, 3, 4],
-                    [5, 6, 7, 8],
+                   [5, 6, 7, 8],
-                    [9, 10, 11, 12]],
+                   [9, 10, 11, 12]],
-                    [[13, 14, 15, 16],
+                  [[13, 14, 15, 16],
-                    [17, 18, 19, 20],
+                   [17, 18, 19, 20],
-                    [21, 22, 23, 24]]], 'float32')
+                   [21, 22, 23, 24]]], 'float32')
-    x = paddle.to_variable(x)
+    x = paddle.to_tensor(x)
    out1 = paddle.mean(x)
    # [12.5]
    out2 = paddle.mean(x, axis=-1)

--- a/doc/fluid/api_cn/tensor_cn/round_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/round_cn.rst
@@ -2,6 +2,30 @@
 round
 -------------------------------
-:doc_source: paddle.fluid.layers.round
+.. py:function:: paddle.round(x, name=None)
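+This OP rounds each value in the input to the nearest integer. As the example output shows, ties such as -0.5 and 1.5 round away from zero.
+Parameters:
+    - **x** (Tensor) - The input ``Tensor``; data type: float16, float32, or float64.
+    - **name** (str, optional) - Name of the operation. Default: None. For more information, see :ref:`api_guide_Name` .
+Returns:
+    - Tensor. The input x after rounding, with the same shape and data type as x.
+**Code example**: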
+.. code-block:: python
+    import numpy as np
+    import paddle
+    paddle.disable_static()
+    x_data = np.array([-0.5, -0.2, 0.6, 1.5])
+    x = paddle.to_tensor(x_data)
+    out = paddle.round(x)
+    print(out.numpy())
+    # [-1. -0.  1.  2.]
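The example output `[-1. -0.  1.  2.]` implies that ties round away from zero, which differs from Python's built-in `round` (ties go to the nearest even integer). The sketch below reproduces the documented output in pure Python; `round_half_away_from_zero` is a hypothetical helper, and the tie rule is inferred from the example above, not taken from Paddle's source.

```python
import math


def round_half_away_from_zero(value):
    """Round to the nearest integer; ties (x.5) move away from zero, keeping the input's sign."""
    return math.copysign(math.floor(abs(value) + 0.5), value)


data = [-0.5, -0.2, 0.6, 1.5]
print([round_half_away_from_zero(v) for v in data])  # [-1.0, -0.0, 1.0, 2.0]
print([round(v) for v in data])                      # [0, 0, 1, 2]
```

The two rules disagree only on exact ties: `-0.5` becomes `-1.0` with the helper but `0` with the built-in `round`.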
--- a/doc/fluid/api_cn/tensor_cn/rsqrt_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/rsqrt_cn.rst
@@ -2,6 +2,39 @@
 rsqrt
 -------------------------------
-:doc_source: paddle.fluid.layers.rsqrt
+.. py:function:: paddle.rsqrt(x, name=None)
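+This OP is the rsqrt activation function.
+Note: the input x must be non-zero; otherwise the program raises an exception and exits.
+It is computed as follows:
+.. math::
+    out = \frac{1}{\sqrt{x}}
+Parameters:
+    - **x** (Tensor) - The input ``Tensor``; data type: float32 or float64.
+    - **name** (str, optional) - Name of the operation. Default: None. For more information, see :ref:`api_guide_Name` .
+Returns:
+    - Tensor. The result of applying rsqrt to the input x, with the same shape and data type as x.
+**Code example**: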
+.. code-block:: python
+    import numpy as np
+    import paddle
+    paddle.disable_static()
+    x_data = np.array([0.1, 0.2, 0.3, 0.4])
+    x = paddle.to_tensor(x_data)
+    out = paddle.rsqrt(x)
+    print(out.numpy())
+    # [3.16227766 2.23606798 1.82574186 1.58113883]
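The rsqrt formula is simple enough to verify with the standard library. In this sketch the helper name `rsqrt` and the `ValueError` raised for non-positive input are assumptions made for illustration; they mirror the documented note about invalid input but are not Paddle's actual error handling.

```python
import math


def rsqrt(value):
    """Reciprocal square root, out = 1 / sqrt(x); this sketch accepts only x > 0."""
    if value <= 0:
        # Assumed behaviour for the sketch: reject inputs where 1 / sqrt(x) is undefined.
        raise ValueError("rsqrt expects a positive input, got {}".format(value))
    return 1.0 / math.sqrt(value)


print([round(rsqrt(v), 8) for v in [0.1, 0.2, 0.3, 0.4]])
```

Rounded to eight decimals, the results match the output listed in the example.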
--- a/doc/fluid/api_cn/tensor_cn/sin_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/sin_cn.rst
@@ -3,11 +3,7 @@
 sin
 -------------------------------
-.. py:function:: paddle.sin(x, name=None, out=None)
+.. py:function:: paddle.sin(x, name=None)
-:alias_main: paddle.sin
-:alias: paddle.sin,paddle.tensor.sin,paddle.tensor.math.sin
-:update_api: paddle.fluid.layers.sin
@@ -16,29 +12,23 @@ sin
 .. math::
        out = sin(x)
-参数:
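+Parameters:
-    - **x** (Variable) - A Tensor of any number of dimensions. Data type: float32, float64, or float16.
+    - **x** (Tensor) - The input ``Tensor``; data type: float16, float32, or float64.
-    - **name** (str, optional) - See :ref:`api_guide_Name` for usage; usually there is no need to set it. Default: None.
+    - **name** (str, optional) - Name of the operation. Default: None. For more information, see :ref:`api_guide_Name` .
-    - **out** (Variable, optional) - The Tensor that stores the result. If None or unset, a new Tensor is created to store the result. Default: None.
+Returns:
+    - Tensor. The sine of the input x, with the same shape and data type as x.
-Returns: A Variable (Tensor|LoDTensor) with the same data type as the input.
 **Code example**: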
 .. code-block:: python
-        import numpy as np
+    import numpy as np
-        import paddle
+    import paddle
-        import paddle.fluid as fluid
+    paddle.disable_static()
+    x_data = np.array([-0.4, -0.2, 0.1, 0.3])
-        inputs = fluid.layers.data(name="x", shape = [3], dtype='float32')
+    x = paddle.to_tensor(x_data)
-        output = paddle.sin(inputs)
+    out = paddle.sin(x)
+    print(out.numpy())
-        exe = fluid.Executor(fluid.CPUPlace())
+    # [-0.38941834 -0.19866933  0.09983342  0.29552021]
-        exe.run(fluid.default_startup_program())
-        img = np.array([0, 45, 90]).astype(np.float32)
-        res = exe.run(fluid.default_main_program(), feed={'x':img}, fetch_list=[output])
-        print(res)
-        # [array([0.        , 0.8509035 , 0.89399666], dtype=float32)]
--- a/doc/fluid/api_cn/tensor_cn/sqrt_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/sqrt_cn.rst
@@ -3,7 +3,7 @@
 sqrt
 -------------------------------
-.. py:function:: paddle.sqrt(x, name=None, out=None)
+.. py:function:: paddle.sqrt(x, name=None)
 :alias_main: paddle.sqrt
 :alias: paddle.sqrt,paddle.tensor.sqrt,paddle.tensor.math.sqrt
@@ -21,28 +21,20 @@ sqrt
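 Parameters:
-    - **x** (Variable) - A Tensor of any number of dimensions. Data type: float32, float64, or float16.
+    - **x** (Tensor) - A Tensor of any number of dimensions. Data type: float32, float64, or float16.
    - **name** (str, optional) - See :ref:`api_guide_Name` for usage; usually there is no need to set it. Default: None.
-    - **out** (Variable, optional) - The Tensor that stores the result. If None or unset, a new Tensor is created to store the result. Default: None.
-Returns: A Variable (Tensor|LoDTensor) with the same data type as the input.
+Returns: A Tensor with the same data type as the input.
 **Code example**: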
 .. code-block:: python
-        import numpy as np
+    import numpy as np
-        import paddle
+    import paddle
-        import paddle.fluid as fluid
+    paddle.disable_static()
+    x_data = np.array([0.1, 0.2, 0.3, 0.4])
-        inputs = fluid.layers.data(name="x", shape = [3], dtype='float32')
+    x = paddle.to_tensor(x_data)
-        output = paddle.sqrt(inputs)
+    out = paddle.sqrt(x)
+    print(out.numpy())
-        exe = fluid.Executor(fluid.CPUPlace())
+    # [0.31622777 0.4472136  0.54772256 0.63245553]
-        exe.run(fluid.default_startup_program())
-        img = np.array([0, 9, 36]).astype(np.float32)
-        res = exe.run(fluid.default_main_program(), feed={'x':img}, fetch_list=[output])
-        print(res)
-        # [array([0., 3., 6.], dtype=float32)]
--- a/doc/fluid/api_cn/tensor_cn/square_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/square_cn.rst
@@ -2,6 +2,34 @@
 square
 -------------------------------
-:doc_source: paddle.fluid.layers.square
+.. py:function:: paddle.square(x, name=None)
+This OP squares the input element-wise.
+.. math::
+    out = x^2
+Parameters:
+    - **x** (Tensor) - The input ``Tensor``; data type: float16, float32, float64, int32, or int64.
+    - **name** (str, optional) - Name of the operation. Default: None. For more information, see :ref:`api_guide_Name` .
+Returns:
+    - Tensor. The square of the input x, with the same shape and data type as x.
+**Code example**:
+.. code-block:: python
+    import numpy as np
+    import paddle
+    paddle.disable_static()
+    x_data = np.array([-0.4, -0.2, 0.1, 0.3])
+    x = paddle.to_tensor(x_data)
+    out = paddle.square(x)
+    print(out.numpy())
+    # [0.16 0.04 0.01 0.09]
--- a/doc/fluid/beginners_guide/dygraph/DyGraph.md
+++ b/doc/fluid/beginners_guide/dygraph/DyGraph.md
@@ -55,8 +55,7 @@ import paddle
 from paddle.imperative import to_variable
 data = np.ones([2, 2], np.float32)
-#x = paddle.data(name='x', shape=[2,2], dtype='float32')
+x = paddle.static.data(name='x', shape=[2,2], dtype='float32')
-x = paddle.nn.data(name='x', shape=[2,2], dtype='float32')
 x += 10
 exe = paddle.Executor()
 exe.run(paddle.default_startup_program())
@@ -67,7 +66,7 @@ print("result", out)  #[[11, 11], [11, 11]]
 paddle.enable_imperative()
 x = paddle.imperative.to_variable(data)
 x += 10
 print('result', x.numpy())  #[[11, 11], [11, 11]]
 ```
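 * In imperative programming, every operation has already completed by the time it runs, which is closer to everyday programming: the result of each operation can be inspected at any time.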
@@ -152,7 +151,7 @@ class SimpleImgConvPool(paddle.nn.Layer):
                 param_attr=None,
                 bias_attr=None):
        super(SimpleImgConvPool, self).__init__()
        self._conv2d = Conv2D(
            num_channels=num_channels,
            num_filters=num_filters,
@@ -165,7 +164,7 @@ class SimpleImgConvPool(paddle.nn.Layer):
            bias_attr=None,
            act=act,
            use_cudnn=use_cudnn)
        self._pool2d = Pool2D(
            pool_size=pool_size,
            pool_type=pool_type,
@@ -203,12 +202,12 @@ class MNIST(paddle.nn.Layer):
            1, 20, 5, 2, 2, act="relu")
        self._simple_img_conv_pool_2 = SimpleImgConvPool(
            20, 50, 5, 2, 2, act="relu")
        self.pool_2_shape = 50 * 4 * 4
        SIZE = 10
        self.output_weight = self.create_parameter(
            [self.pool_2_shape, 10])
    def forward(self, inputs, label=None):
        x = self._simple_img_conv_pool_1(inputs)
        x = self._simple_img_conv_pool_2(x)
@@ -275,25 +274,25 @@ adam = AdamOptimizer(learning_rate=0.001, parameter_list=mnist.parameters())
 epoch_num = 5
 for epoch in range(epoch_num):
-	for batch_id, data in enumerate(train_reader()):
+    for batch_id, data in enumerate(train_reader()):
-		dy_x_data = np.array([x[0].reshape(1, 28, 28) for x in data]).astype('float32')
+        dy_x_data = np.array([x[0].reshape(1, 28, 28) for x in data]).astype('float32')
-		y_data = np.array([x[1] for x in data]).astype('int64').reshape(-1, 1)
+        y_data = np.array([x[1] for x in data]).astype('int64').reshape(-1, 1)
-		img = to_variable(dy_x_data)
+        img = to_variable(dy_x_data)
-		label = to_variable(y_data)
+        label = to_variable(y_data)
-		cost, acc = mnist(img, label)
+        cost, acc = mnist(img, label)
-		loss = paddle.nn.functional.cross_entropy(cost, label)
+        loss = paddle.nn.functional.cross_entropy(cost, label)
-		avg_loss = paddle.mean(loss)
+        avg_loss = paddle.mean(loss)
-		avg_loss.backward()
+        avg_loss.backward()
-		adam.minimize(avg_loss)
+        adam.minimize(avg_loss)
-		mnist.clear_gradients()
+        mnist.clear_gradients()
-		if batch_id % 100 == 0:
+        if batch_id % 100 == 0:
-			print("Loss at epoch {} step {}: {:}".format(
+            print("Loss at epoch {} step {}: {:}".format(
-				epoch, batch_id, avg_loss.numpy()))
+                epoch, batch_id, avg_loss.numpy()))
 model_dict = mnist.state_dict()
 paddle.imperative.save(model_dict, "save_temp")
 ```
@@ -307,7 +306,7 @@ paddle.imperative.save(model_dict, "save_temp")
 model.eval()      # switch to evaluation mode
 model.train()     # switch to training mode
 ```
 Model evaluation is implemented as follows:
 * First create an object of the MNIST class, mnist_eval. Then load the saved model parameters through the [load_dygraph](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/dygraph_cn/load_dygraph_cn.html#load-dygraph) interface, import them into the model through the [set_dict](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/dygraph_cn/Layer_cn.html#set_dict) interface of [Layer](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/dygraph_cn/Layer_cn.html#layer), and switch to evaluation mode through the eval interface of [Layer](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/dygraph_cn/Layer_cn.html#layer).
@@ -316,7 +315,7 @@ model.train()     #切换到训练模式
 ```python
 paddle.enable_imperative()
-mnist_eval = MNIST() 
+mnist_eval = MNIST()
 model_dict, _ = paddle.imperative.load("save_temp")
 mnist_eval.set_dict(model_dict)
 print("checkpoint loaded")
@@ -326,21 +325,21 @@ mnist_eval.eval()
 acc_set = []
 avg_loss_set = []
 for batch_id, data in enumerate(test_reader()):
-	dy_x_data = np.array([x[0].reshape(1, 28, 28)
+    dy_x_data = np.array([x[0].reshape(1, 28, 28)
-						  for x in data]).astype('float32')
+                          for x in data]).astype('float32')
-	y_data = np.array(
+    y_data = np.array(
-		[x[1] for x in data]).astype('int64').reshape(-1, 1)
+        [x[1] for x in data]).astype('int64').reshape(-1, 1)
-	img = to_variable(dy_x_data)
+    img = to_variable(dy_x_data)
-	label = to_variable(y_data)
+    label = to_variable(y_data)
-	prediction, acc = mnist_eval(img, label)
+    prediction, acc = mnist_eval(img, label)
-	loss = paddle.nn.functional.cross_entropy(input=prediction, label=label)
+    loss = paddle.nn.functional.cross_entropy(input=prediction, label=label)
-	avg_loss = paddle.mean(loss)
+    avg_loss = paddle.mean(loss)
-	acc_set.append(float(acc.numpy()))
+    acc_set.append(float(acc.numpy()))
-	avg_loss_set.append(float(avg_loss.numpy()))
+    avg_loss_set.append(float(avg_loss.numpy()))
 acc_val_mean = np.array(acc_set).mean()
 avg_loss_val_mean = np.array(avg_loss_set).mean()
 print("Eval avg_loss is: {}, acc is: {}".format(avg_loss_val_mean, acc_val_mean))
@@ -351,9 +350,9 @@ print("Eval avg_loss is: {}, acc is: {}".format(avg_loss_val_mean, acc_val_mean)
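 In imperative programming, the model and the optimizer live in different modules and are stored in separate objects, so model parameters and optimizer information must be saved separately.
 Saving a model therefore requires calling the state_dict() interface of the model and of the optimizer separately, and loading likewise has to handle each one separately.
-Saving the model: 
+Saving the model:
 1. Save the model parameters: first obtain all parameters of the mnist network with mnist.state_dict, then use paddle.imperative.save to write them to files prefixed with save_path.
-1. Save the optimizer information: first obtain the information of the adam optimizer with adam.state_dict, then use  paddle.imperative.save to write it to files prefixed with save_path. 
+1. Save the optimizer information: first obtain the information of the adam optimizer with adam.state_dict, then use  paddle.imperative.save to write it to files prefixed with save_path.
   * The [state_dict](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/dygraph_cn/Layer_cn.html#state_dict) interface of [Layer](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/dygraph_cn/Layer_cn.html#layer): returns all parameters of the current layer and its sublayers, stored in a dict.
   * The state_dict interface of [Optimizer](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/optimizer_cn/AdamOptimizer_cn.html#adamoptimizer): returns the optimizer information, stored in a dict. It contains every variable the optimizer uses; for the Adam optimizer, for example, this includes beta1, beta2, and momentum. Note that if the optimizer's minimize function has never been called, the optimizer information is empty.
   * The [paddle.imperative.save](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/dygraph_cn/save_dygraph_cn.html#save-dygraph) interface: saves the given parameter dict or optimizer dict to disk.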
@@ -363,7 +362,7 @@ print("Eval avg_loss is: {}, acc is: {}".format(avg_loss_val_mean, acc_val_mean)
 # 保存优化器信息
 2. paddle.imperative.save(adam.state_dict(), "save_path")
 ```
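-Loading the model: 
+Loading the model:
 1. Obtain the model parameter information model_state and the optimizer information opt_state with paddle.imperative.load.
 1. Set the parameters of the mnist network from the loaded model parameter information with mnist.set_dict.
 1. Set the adam optimizer information from the loaded optimizer information with adam.set_dict.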
@@ -406,35 +405,35 @@ adam = AdamOptimizer(learning_rate=0.001, parameter_list=mnist.parameters())
 mnist = paddle.imperative.DataParallel(mnist, strategy)
 train_reader = paddle.batch(
-	paddle.dataset.mnist.train(), batch_size=BATCH_SIZE, drop_last=True)
+    paddle.dataset.mnist.train(), batch_size=BATCH_SIZE, drop_last=True)
 train_reader = paddle.incubate.reader.distributed_batch_reader(
-	train_reader)
+    train_reader)
 for epoch in range(epoch_num):
-	for batch_id, data in enumerate(train_reader()):
+    for batch_id, data in enumerate(train_reader()):
-		dy_x_data = np.array([x[0].reshape(1, 28, 28)
+        dy_x_data = np.array([x[0].reshape(1, 28, 28)
-							  for x in data]).astype('float32')
+                              for x in data]).astype('float32')
-		y_data = np.array(
+        y_data = np.array(
-			[x[1] for x in data]).astype('int64').reshape(-1, 1)
+            [x[1] for x in data]).astype('int64').reshape(-1, 1)
-		img = to_variable(dy_x_data)
+        img = to_variable(dy_x_data)
-		label = to_variable(y_data)
+        label = to_variable(y_data)
-		label.stop_gradient = True
+        label.stop_gradient = True
-		cost, acc = mnist(img, label)
+        cost, acc = mnist(img, label)
-		loss = paddle.nn.functional.cross_entropy(cost, label)
+        loss = paddle.nn.functional.cross_entropy(cost, label)
-		avg_loss = paddle.mean(loss)
+        avg_loss = paddle.mean(loss)
-		avg_loss = mnist.scale_loss(avg_loss)
+        avg_loss = mnist.scale_loss(avg_loss)
-		avg_loss.backward()
+        avg_loss.backward()
-		mnist.apply_collective_grads()
+        mnist.apply_collective_grads()
-		adam.minimize(avg_loss)
+        adam.minimize(avg_loss)
-		mnist.clear_gradients()
+        mnist.clear_gradients()
-		if batch_id % 100 == 0 and batch_id is not 0:
+        if batch_id % 100 == 0 and batch_id != 0:
-			print("epoch: {}, batch_id: {}, loss is: {}".format(epoch, batch_id, avg_loss.numpy()))
+            print("epoch: {}, batch_id: {}, loss is: {}".format(epoch, batch_id, avg_loss.numpy()))
 if paddle.imperative.ParallelEnv().local_rank == 0:
    paddle.imperative.save(mnist.state_dict(),  "work_0")
@@ -477,7 +476,7 @@ trainers_endpoints: 127.0.0.1:6170,127.0.0.1:6171 , node_id: 0 , current_node_ip
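 To summarize, multi-GPU training differs from single-GPU training in the following steps:
 1. Set the device the program runs on through the dev_id of ParallelEnv().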
 ```
-place = paddle.CUDAPlace(paddle.imperative.ParallelEnv().dev_id) 
+place = paddle.CUDAPlace(paddle.imperative.ParallelEnv().dev_id)
 paddle.enable_imperative(place)
 ```
 2. 准备多卡环境。
@@ -511,7 +510,7 @@ mnist.apply_collective_grads()
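 Unlike single-GPU training, multi-GPU training must perform the save one process at a time; if several processes save at the same time, the model file format becomes corrupted.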
 ```
 if paddle.imperative.ParallelEnv().local_rank == 0:
-	paddle.imperative.save(mnist.state_dict(), "worker_0")
+    paddle.imperative.save(mnist.state_dict(), "worker_0")
 ```
 7. 评估测试。
@@ -532,18 +531,18 @@ if paddle.imperative.ParallelEnv().local_rank == 0：
 ```python
 from paddle.imperative import TracedLayer
 paddle.enable_imperative()
 # Define an object of the MNIST class
 mnist = MNIST()
 in_np = np.random.random([10, 1, 28, 28]).astype('float32')
 # Convert the numpy ndarray into a Variable
 input_var = paddle.imperative.to_variable(in_np)
 # Convert the imperative model into a declarative model through the TracedLayer.trace interface
 out_dygraph, static_layer = TracedLayer.trace(mnist, inputs=[input_var])
 save_dirname = './saved_infer_model'
 # Save the converted model
 static_layer.save_inference_model(save_dirname, feed=[0], fetch=[0])
 ```
@@ -573,9 +572,9 @@ in_np = np.array([-2]).astype('int')
 input_var = paddle.imperative.to_variable(in_np)
 # The if condition depends on the shape of the input input_var
 if input_var.shape[0] > 1:
-	print("input_var's shape[0] > 1")
+    print("input_var's shape[0] > 1")
 else:
-	print("input_var's shape[1] < 1")
+    print("input_var's shape[0] <= 1")
 ```
 * For data-dependent control flow, the solution has two steps: 1. add the declarative decorator; 2. convert with ProgramTranslator
@@ -584,10 +583,10 @@ else:
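 First, add a declarative decorator to the forward function of the MNIST class to mark the code block to be converted (note: it must be added to the forward function of the outermost class).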
 ```python
 from paddle.imperative import declarative
 # Define the MNIST network; it must inherit from paddle.nn.Layer
 # The network consists of two SimpleImgConvPool sub-networks plus reshape, matmul, softmax, and accuracy layers
 class MNIST(paddle.nn.Layer):
    def __init__(self):
        super(MNIST, self).__init__()
@@ -595,13 +594,13 @@ class MNIST(paddle.nn.Layer):
            1, 20, 5, 2, 2, act="relu")
        self._simple_img_conv_pool_2 = SimpleImgConvPool(
            20, 50, 5, 2, 2, act="relu")
        self.pool_2_shape = 50 * 4 * 4
        SIZE = 10
        self.output_weight = self.create_parameter(
            [self.pool_2_shape, 10])
-	@declarative
+    @declarative
    def forward(self, inputs, label=None):
        x = self._simple_img_conv_pool_1(inputs)
        x = self._simple_img_conv_pool_2(x)
@@ -612,8 +611,8 @@ class MNIST(paddle.nn.Layer):
            acc = paddle.metric.accuracy(input=x, label=label)
            return x, acc
        else:
            return x
 ```
@@ -622,19 +621,19 @@ class MNIST(paddle.nn.Layer):
 ```python
 import paddle
 paddle.enable_imperative()
 prog_trans = paddle.imperative.ProgramTranslator()
 mnist = MNIST()
 in_np = np.random.random([10, 1, 28, 28]).astype('float32')
 label_np = np.random.randint(0, 10, size=(10,1)).astype( "int64")
 input_var = paddle.imperative.to_variable(in_np)
 label_var = paddle.imperative.to_variable(label_np)
 out = mnist( input_var, label_var)
 prog_trans.save_inference_model("./mnist_dy2stat", fetch=[0,1])
 ```
@@ -654,13 +653,13 @@ class MNIST(paddle.nn.Layer):
            1, 20, 5, 2, 2, act="relu")
        self._simple_img_conv_pool_2 = SimpleImgConvPool(
            20, 50, 5, 2, 2, act="relu")
        self.pool_2_shape = 50 * 4 * 4
        SIZE = 10
        self.output_weight = self.create_parameter(
            [self.pool_2_shape, 10])
-	@declarative
+    @declarative
    def forward(self, inputs, label=None):
        x = self._simple_img_conv_pool_1(inputs)
        x = self._simple_img_conv_pool_2(x)
@@ -672,7 +671,7 @@ class MNIST(paddle.nn.Layer):
            return x, acc
        else:
            return x
 ```
@@ -685,7 +684,7 @@ class MNIST(paddle.nn.Layer):
 ```
 x = y * 10
-print(x.numpy()) 
+print(x.numpy())
 ```
 to print the value of a variable directly.

--- a/doc/fluid/beginners_guide/index_cn.rst
+++ b/doc/fluid/beginners_guide/index_cn.rst
@@ -34,8 +34,8 @@
    import numpy
    import paddle
    # Define placeholders for the input data
-    a = paddle.nn.data(name="a", shape=[1], dtype='int64')
+    a = paddle.static.data(name="a", shape=[1], dtype='int64')
-    b = paddle.nn.data(name="b", shape=[1], dtype='int64')
+    b = paddle.static.data(name="b", shape=[1], dtype='int64')
    # Build the network (here it consists of a single operation, elementwise_add)
    result = paddle.elementwise_add(a, b)
    # Prepare to run the network

--- a/scripts/api_white_list.txt
+++ b/scripts/api_white_list.txt
@@ -7,3 +7,4 @@ transpiler_cn/release_memory_cn.rst
 transpiler_cn/RoundRobin_cn.rst
 optimizer_cn/Dpsgd_cn.rst
 io_cn/ComposeNotAligned_cn.rst
+dygraph_cn/DataParallel_cn.rst
\ No newline at end of file