diff --git a/doc/fluid/advanced_guide/addon_development/contribute_code/local_dev_guide.md b/doc/fluid/advanced_guide/addon_development/contribute_code/local_dev_guide.md
index a381ebe8ad3cc5cf1fc9a16628fb09926b8749cd..d9c1f4f5bd641fe1ca037ee499997cdedbcd408a 100644
--- a/doc/fluid/advanced_guide/addon_development/contribute_code/local_dev_guide.md
+++ b/doc/fluid/advanced_guide/addon_development/contribute_code/local_dev_guide.md
@@ -9,7 +9,24 @@
 - 通过所有单元测试。
 - 请遵守[提交代码的一些约定](#提交代码的一些约定)。
 
-以下教程将指导您提交代码。
+
+## 使用官方开发镜像（推荐）
+
+```
+# 第一次启动（CPU开发）
+docker run -it --cpu-shares=20000 --name=username --net=host --privileged --rm -v $(pwd):/Paddle hub.baidubce.com/paddlepaddle/paddle:latest-dev /bin/bash
+# 第一次启动（GPU开发）
+nvidia-docker run -it --cpu-shares=20000 --name=username --net=host --privileged --rm -v $(pwd):/Paddle hub.baidubce.com/paddlepaddle/paddle:latest-dev /bin/bash
+# 后面几次启动
+docker exec -it username bash
+```
+
+不同开发者启动docker的命令不一样，以上只是推荐命令。如果使用自己习惯的命令，一定要加参数`--privileged`（GPU的CUPTI库调用需要）。
+
+**推荐使用官方开发镜像 hub.baidubce.com/paddlepaddle/paddle:latest-dev 提交代码。**
+
+**以下教程将指导您提交代码。**
+
 ## [Fork](https://help.github.com/articles/fork-a-repo/)
 
 跳转到[PaddlePaddle](https://github.com/PaddlePaddle/Paddle) GitHub首页，然后单击 `Fork` 按钮，生成自己目录下的仓库，比如 <https://github.com/USERNAME/Paddle>。
@@ -42,7 +59,7 @@ Paddle 目前使用[Git流分支模型](http://nvie.com/posts/a-successful-git-b
 
 Paddle 开发人员使用 [pre-commit](http://pre-commit.com/) 工具来管理 Git 预提交钩子。 它可以帮助我们格式化源代码（C++，Python），在提交（commit）前自动检查一些基本事宜（如每个文件只有一个 EOL，Git 中不要添加大文件等）。
 
-`pre-commit`测试是 Travis-CI 中单元测试的一部分，不满足钩子的 PR 不能被提交到 Paddle，首先安装并在当前目录运行它：
+`pre-commit`测试是 CI 中单元测试的一部分，不满足钩子的 PR 不能被提交到 Paddle，首先安装并在当前目录运行它：
 
 ```bash
 ➜  pip install pre-commit
@@ -51,7 +68,7 @@ Paddle 开发人员使用 [pre-commit](http://pre-commit.com/) 工具来管理 G
 
 Paddle 使用 `clang-format` 来调整 C/C++ 源代码格式，请确保 `clang-format` 版本在 3.8 以上。
 
-注：通过`pip install pre-commit`和`conda install -c conda-forge pre-commit`安装的`yapf`稍有不同的，Paddle 开发人员使用的是`pip install pre-commit`。
+注：通过`pip install pre-commit`和`conda install -c conda-forge pre-commit`安装的`yapf`稍有不同，Paddle 开发人员使用的是`pip install pre-commit`。使用 Paddle docker 镜像时已自带`pre-commit`，不需要单独安装。
 
 ## 开始开发
 
@@ -66,19 +83,53 @@ Changes not staged for commit:
   (use "git add <file>..." to update what will be committed)
   (use "git checkout -- <file>..." to discard changes in working directory)
 
-	modified:   README.md
+    modified:   README.md
 
 Untracked files:
   (use "git add <file>..." to include in what will be committed)
 
-	test
+    test
 
 no changes added to commit (use "git add" and/or "git commit -a")
 ```
 
-## 编译和单元测试
+## 编译
+
+创建并进入/Paddle/build路径下：
+
+    mkdir -p /Paddle/build && cd /Paddle/build
+
+执行cmake：
+
+
+* 对于需要编译**CPU版本PaddlePaddle**的用户：
+
+        For Python2: cmake .. -DWITH_GPU=OFF -DWITH_TESTING=OFF -DCMAKE_BUILD_TYPE=Release
+        For Python3: cmake .. -DPY_VERSION=3.5 -DWITH_GPU=OFF -DWITH_TESTING=OFF -DCMAKE_BUILD_TYPE=Release
+
+* 对于需要编译**GPU版本PaddlePaddle**的用户：
+
+        For Python2: cmake .. -DWITH_GPU=ON -DWITH_TESTING=OFF -DCMAKE_BUILD_TYPE=Release
+        For Python3: cmake .. -DPY_VERSION=3.5 -DWITH_GPU=ON -DWITH_TESTING=OFF -DCMAKE_BUILD_TYPE=Release
+
+执行编译：
+
+    make -j$(nproc)
+
+    # 如：make -j16，即使用16个并行任务编译
+
+安装编译好的whl包：编译完成后，进入`/Paddle/build/python/dist`目录找到生成的`.whl`包，然后在当前机器或目标机器上安装：
+
+    For Python2: pip install -U <whl包的名字>
+    For Python3: pip3.5 install -U <whl包的名字>
 
 关于编译 PaddlePaddle 的源码，请参见[从源码编译](../../../install/compile/fromsource.html) 选择对应的操作系统。
+
+## 单元测试
+
+    # 运行单测（重复运行多次，以排查随机失败），如重复运行100次的命令如下：
+    ctest --repeat-until-fail 100 -R test_xx
+
 关于单元测试，可参考[Op单元测试](../new_op/new_op.html#id7) 的运行方法。
 
 ## 提交（commit）
@@ -92,7 +143,7 @@ On branch test
 Untracked files:
   (use "git add <file>..." to include in what will be committed)
 
-	test
+    test
 
 nothing added to commit but untracked files present (use "git add" to track)
 ➜  git add test
@@ -126,8 +177,8 @@ clang-formater.......................................(no files to check)Skipped
 ➜  git remote
 origin
 ➜  git remote -v
-origin	https://github.com/USERNAME/Paddle (fetch)
-origin	https://github.com/USERNAME/Paddle (push)
+origin    https://github.com/USERNAME/Paddle (fetch)
+origin    https://github.com/USERNAME/Paddle (push)
 ```
 
 这里 origin 是我们 clone 的远程仓库的名字，也就是自己用户名下的 Paddle，接下来我们创建一个原始 Paddle 仓库的远程主机，命名为 upstream。
diff --git a/doc/fluid/advanced_guide/addon_development/contribute_code/local_dev_guide_en.md b/doc/fluid/advanced_guide/addon_development/contribute_code/local_dev_guide_en.md
index 52c04f2341a5cbb0da9cd7e4510b80657a7fd0ab..3158b23326094b7a2da4f1f87445d6518ea5f57a 100644
--- a/doc/fluid/advanced_guide/addon_development/contribute_code/local_dev_guide_en.md
+++ b/doc/fluid/advanced_guide/addon_development/contribute_code/local_dev_guide_en.md
@@ -9,7 +9,22 @@ You will learn how to develop programs in local environment under the guidelines
 - Pass through all unit tests.
 - Please follow [regulations of submitting codes](#regulations of submitting codes).
 
-The following guidiance tells you how to submit code.
+## Use the official development image (recommended)
+
+```
+# First start (CPU development)
+docker run -it --cpu-shares=20000 --name=username --net=host --privileged --rm -v $(pwd):/Paddle hub.baidubce.com/paddlepaddle/paddle:latest-dev /bin/bash
+# First start (GPU development)
+nvidia-docker run -it --cpu-shares=20000 --name=username --net=host --privileged --rm -v $(pwd):/Paddle hub.baidubce.com/paddlepaddle/paddle:latest-dev /bin/bash
+# Subsequent starts: attach to the running container
+docker exec -it username bash
+```
+The exact command for starting Docker varies between developers; the commands above are only recommendations. If you use your own command, be sure to add the `--privileged` flag, which the GPU CUPTI library calls require.
+
+**We recommend using the official development image hub.baidubce.com/paddlepaddle/paddle:latest-dev when submitting code.**
+
+**The following guidance tells you how to submit code.**
+
 ## [Fork](https://help.github.com/articles/fork-a-repo/)
 
 Transfer to the home page of Github [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) ,and then click button `Fork`  to generate the git under your own file directory,such as <https://github.com/USERNAME/Paddle>。
@@ -44,7 +59,7 @@ It is worth noting that before the checkout, you need to keep the current branch
 
 Paddle developers use the [pre-commit](http://pre-commit.com/) tool to manage Git pre-commit hooks. It helps us format the source code (C++, Python) and automatically check some basic things before committing (such as having only one EOL per file, not adding large files in Git, etc.).
 
-The `pre-commit` test is part of the unit test in Travis-CI. A PR that does not satisfy the hook cannot be submitted to Paddle. Install `pre-commit` first and then run it in current directory：
+The `pre-commit` check is part of the unit tests in CI, and a PR that does not satisfy the hooks cannot be merged into Paddle. Install `pre-commit` first and then run it in the current directory:
 
 
 ```bash
@@ -54,7 +69,7 @@ The `pre-commit` test is part of the unit test in Travis-CI. A PR that does not
 
 Paddle modify the format of C/C++ source code with `clang-format` .Make sure the version of `clang-format` is above 3.8.
 
-Note：There are differences between the installation of `yapf` with `pip install pre-commit` and that with `conda install -c conda-forge pre-commit` . Paddle developers use `pip install pre-commit` 。
+Note: `yapf` installed via `pip install pre-commit` differs slightly from the one installed via `conda install -c conda-forge pre-commit`. Paddle developers use `pip install pre-commit`. The Paddle docker image already includes `pre-commit`, so no separate installation is needed.
 
 ## Start development
 
@@ -76,7 +91,45 @@ Untracked files:
 no changes added to commit (use "git add" and/or "git commit -a")
 ```
 
-## Build and test
+## Build
+
+Create and enter the `/Paddle/build` directory:
+
+    mkdir -p /Paddle/build && cd /Paddle/build
+
+Execute cmake:
+
+
+* To build the **CPU version of PaddlePaddle**:
+
+        For Python2: cmake .. -DWITH_GPU=OFF -DWITH_TESTING=OFF -DCMAKE_BUILD_TYPE=Release
+        For Python3: cmake .. -DPY_VERSION=3.5 -DWITH_GPU=OFF -DWITH_TESTING=OFF -DCMAKE_BUILD_TYPE=Release
+
+
+* To build the **GPU version of PaddlePaddle**:
+
+        For Python2: cmake .. -DWITH_GPU=ON -DWITH_TESTING=OFF -DCMAKE_BUILD_TYPE=Release
+        For Python3: cmake .. -DPY_VERSION=3.5 -DWITH_GPU=ON -DWITH_TESTING=OFF -DCMAKE_BUILD_TYPE=Release
+
+
+Execute compilation:
+
+    make -j$(nproc)
+
+    # For example, make -j16 builds with 16 parallel jobs
+
+After the build succeeds, go to the `/Paddle/build/python/dist` directory and find the generated `.whl` package. Install it on the current machine or on the target machine:
+
+    For Python2: pip install -U <whl package name>
+    For Python3: pip3.5 install -U <whl package name>
+
+Please refer to [Compile From Source Code](../../../install/compile/fromsource_en.html) about more information of building PaddlePaddle source codes.
+
+## Test
+
+    # Run a test repeatedly (up to 100 times, stopping at the first failure) to catch random failures
+    ctest --repeat-until-fail 100 -R test_xx
+
 
 Please refer to [Compile From Source Code](../../../install/compile/fromsource_en.html) about more information of building PaddlePaddle source codes.
 Please refer to [Op Unit Tests](../new_op/new_op_en.html#unit-tests) about more information of running unit tests.
diff --git a/doc/fluid/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.rst b/doc/fluid/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.rst
index c1bfba460db6c12651ac6a04f823812642490c9f..788341863e1fe669ab10bc634d948fa7c6ef481c 100644
--- a/doc/fluid/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.rst
+++ b/doc/fluid/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.rst
@@ -7,15 +7,15 @@
 -------------
 
 ..  csv-table:: 
-    :header: "版本说明", "预测库(1.8.3版本)", "预测库(develop版本)"
+    :header: "版本说明", "预测库(1.8.4版本)", "预测库(develop版本)"
     :widths: 3, 2, 2
 
-    "ubuntu14.04_cpu_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.3-cpu-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-avx-mkl/fluid_inference.tgz>`_"
-    "ubuntu14.04_cpu_avx_openblas", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.3-cpu-avx-openblas/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-avx-openblas/fluid_inference.tgz>`_"
-    "ubuntu14.04_cpu_noavx_openblas", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.3-cpu-noavx-openblas/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-noavx-openblas/fluid_inference.tgz>`_"
-    "ubuntu14.04_cuda9.0_cudnn7_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.3-gpu-cuda9-cudnn7-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-gpu-cuda9-cudnn7-avx-mkl/fluid_inference.tgz>`_"
-    "ubuntu14.04_cuda10.0_cudnn7_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.3-gpu-cuda10-cudnn7-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-gpu-cuda10-cudnn7-avx-mkl/fluid_inference.tgz>`_"
-    "ubuntu14.04_cuda10.1_cudnn7.6_avx_mkl_trt6", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.3-gpu-cuda10.1-cudnn7.6-avx-mkl-trt6%2Ffluid_inference.tgz>`_", 
+    "ubuntu14.04_cpu_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-cpu-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-avx-mkl/fluid_inference.tgz>`_"
+    "ubuntu14.04_cpu_avx_openblas", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-cpu-avx-openblas/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-avx-openblas/fluid_inference.tgz>`_"
+    "ubuntu14.04_cpu_noavx_openblas", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-cpu-noavx-openblas/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-noavx-openblas/fluid_inference.tgz>`_"
+    "ubuntu14.04_cuda9.0_cudnn7_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-gpu-cuda9-cudnn7-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-gpu-cuda9-cudnn7-avx-mkl/fluid_inference.tgz>`_"
+    "ubuntu14.04_cuda10.0_cudnn7_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-gpu-cuda10-cudnn7-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-gpu-cuda10-cudnn7-avx-mkl/fluid_inference.tgz>`_"
+    "ubuntu14.04_cuda10.1_cudnn7.6_avx_mkl_trt6", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-gpu-cuda10.1-cudnn7.6-avx-mkl-trt6%2Ffluid_inference.tgz>`_", 
     "nv-jetson-cuda10-cudnn7.5-trt5", "`fluid_inference.tar.gz <https://paddle-inference-lib.bj.bcebos.com/1.7.1-nv-jetson-cuda10-cudnn7.5-trt5/fluid_inference.tar.gz>`_", 
 
 
@@ -46,7 +46,7 @@ WITH_NV_JETSON                OFF            在NV Jetson硬件上编译时需
   git clone https://github.com/paddlepaddle/Paddle
   cd Paddle
   # 建议使用git checkout切换到Paddle稳定的版本，如：
-  git checkout v1.7.2
+  git checkout v1.8.4
 
 **note**: 如果您是多卡机器，建议安装NCCL；如果您是单卡机器则可以在编译时显示指定WITH_NCCL=OFF来跳过这一步。注意如果WITH_NCCL=ON，且没有安装NCCL，则编译会报错。
 
diff --git a/doc/fluid/advanced_guide/inference_deployment/inference/build_and_install_lib_en.rst b/doc/fluid/advanced_guide/inference_deployment/inference/build_and_install_lib_en.rst
index 545aba61360b0018e3d3a1c28f4e56f4f6005925..9ed8bc9c8da226bb20dd987fc64f7070a5ba89b7 100644
--- a/doc/fluid/advanced_guide/inference_deployment/inference/build_and_install_lib_en.rst
+++ b/doc/fluid/advanced_guide/inference_deployment/inference/build_and_install_lib_en.rst
@@ -7,15 +7,15 @@ Direct Download and Installation
 ---------------------------------
 
 ..  csv-table:: c++ inference library list
-    :header: "version description", "inference library(1.8.3 version)", "inference library(develop version)"
+    :header: "version description", "inference library(1.8.4 version)", "inference library(develop version)"
     :widths: 3, 2, 2
 
-    "ubuntu14.04_cpu_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.3-cpu-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-avx-mkl/fluid_inference.tgz>`_"
-    "ubuntu14.04_cpu_avx_openblas", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.3-cpu-avx-openblas/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-avx-openblas/fluid_inference.tgz>`_"
-    "ubuntu14.04_cpu_noavx_openblas", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.3-cpu-noavx-openblas/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-noavx-openblas/fluid_inference.tgz>`_"
-    "ubuntu14.04_cuda9.0_cudnn7_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.3-gpu-cuda9-cudnn7-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-gpu-cuda9-cudnn7-avx-mkl/fluid_inference.tgz>`_"
-    "ubuntu14.04_cuda10.0_cudnn7_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.3-gpu-cuda10-cudnn7-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-gpu-cuda10-cudnn7-avx-mkl/fluid_inference.tgz>`_"
-    "ubuntu14.04_cuda10.1_cudnn7.6_avx_mkl_trt6", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.3-gpu-cuda10.1-cudnn7.6-avx-mkl-trt6%2Ffluid_inference.tgz>`_", 
+    "ubuntu14.04_cpu_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-cpu-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-avx-mkl/fluid_inference.tgz>`_"
+    "ubuntu14.04_cpu_avx_openblas", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-cpu-avx-openblas/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-avx-openblas/fluid_inference.tgz>`_"
+    "ubuntu14.04_cpu_noavx_openblas", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-cpu-noavx-openblas/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-noavx-openblas/fluid_inference.tgz>`_"
+    "ubuntu14.04_cuda9.0_cudnn7_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-gpu-cuda9-cudnn7-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-gpu-cuda9-cudnn7-avx-mkl/fluid_inference.tgz>`_"
+    "ubuntu14.04_cuda10.0_cudnn7_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-gpu-cuda10-cudnn7-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-gpu-cuda10-cudnn7-avx-mkl/fluid_inference.tgz>`_"
+    "ubuntu14.04_cuda10.1_cudnn7.6_avx_mkl_trt6", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-gpu-cuda10.1-cudnn7.6-avx-mkl-trt6%2Ffluid_inference.tgz>`_", 
     "nv-jetson-cuda10-cudnn7.5-trt5", "`fluid_inference.tar.gz <https://paddle-inference-lib.bj.bcebos.com/1.7.1-nv-jetson-cuda10-cudnn7.5-trt5/fluid_inference.tar.gz>`_", 
 
 Build from Source Code
@@ -46,8 +46,8 @@ Firstly we pull the latest code from github.
 
   git clone https://github.com/paddlepaddle/Paddle
   cd Paddle
-  # Use git checkout to switch to stable versions such as v1.7.2
-  git checkout v1.7.2
+  # Use git checkout to switch to stable versions such as v1.8.4
+  git checkout v1.8.4
 
 
 **note**: If your environment is a multi-card machine, it is recommended to install nccl; otherwise, you can skip this step by specifying WITH_NCCL = OFF during compilation. Note that if WITH_NCCL = ON, and NCCL is not installed, the compiler will report an error.
diff --git a/doc/fluid/api/gen_doc.sh b/doc/fluid/api/gen_doc.sh
index f30d5560880385d42b6cd0b60d8b619a90ed771b..5284b277e24cf9ea8eeaf79c0aeb86c8fe5f6904 100644
--- a/doc/fluid/api/gen_doc.sh
+++ b/doc/fluid/api/gen_doc.sh
@@ -30,7 +30,7 @@ python gen_module_index.py framework paddle.framework
 
 
 # nn
-for module in loss
+for module in loss activation
 do
   python gen_doc.py --module_name ${module} --module_prefix ${module} --output ${module} --output_name nn --to_multiple_files True --output_dir nn
   python gen_module_index.py nn.${module} ${module}
diff --git a/doc/fluid/api/nn.rst b/doc/fluid/api/nn.rst
index 3d8ad814db7dfe9da66f6324117ad7c6c83c18fb..cfea0bdde22c80b86fadb444aaeee731f14d0eda 100644
--- a/doc/fluid/api/nn.rst
+++ b/doc/fluid/api/nn.rst
@@ -5,6 +5,7 @@ paddle.nn
 ..  toctree::
     :maxdepth: 1
 
+    nn/activation.rst
     nn/adaptive_pool2d.rst
     nn/adaptive_pool3d.rst
     nn/add_position_encoding.rst
@@ -60,7 +61,7 @@ paddle.nn
     nn/GradientClipByValue.rst
     nn/grid_sampler.rst
     nn/GroupNorm.rst
-    nn/hard_shrink.rst
+    nn/hardshrink.rst
     nn/hard_sigmoid.rst
     nn/hard_swish.rst
     nn/hash.rst
@@ -81,6 +82,7 @@ paddle.nn
     nn/Linear.rst
     nn/linear_lr_warmup.rst
     nn/log_loss.rst
+    nn/log_softmax.rst
     nn/logsigmoid.rst
     nn/loss.rst
     nn/lrn.rst
diff --git a/doc/fluid/api/nn/activation.rst b/doc/fluid/api/nn/activation.rst
new file mode 100644
index 0000000000000000000000000000000000000000..917e1abd4f51f37da88f86daaac323449a82efa9
--- /dev/null
+++ b/doc/fluid/api/nn/activation.rst
@@ -0,0 +1,12 @@
+==========
+activation
+==========
+
+..  toctree::
+    :maxdepth: 1
+
+    activation/ELU.rst
+    activation/GELU.rst
+    activation/Hardshrink.rst
+    activation/ReLU.rst
+    activation/LogSigmoid.rst
diff --git a/doc/fluid/api/nn/activation/Hardshrink.rst b/doc/fluid/api/nn/activation/Hardshrink.rst
new file mode 100644
index 0000000000000000000000000000000000000000..552e6a2a9883ed37f55544ed0f148920bd08f46a
--- /dev/null
+++ b/doc/fluid/api/nn/activation/Hardshrink.rst
@@ -0,0 +1,13 @@
+..  THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
+    !DO NOT EDIT THIS FILE MANUALLY!
+
+.. _api_nn_activation_Hardshrink:
+
+Hardshrink
+---------
+
+..  autoclass:: paddle.nn.activation.Hardshrink
+    :members:
+    :inherited-members:
+    :noindex:
+
diff --git a/doc/fluid/api/nn/functional.rst b/doc/fluid/api/nn/functional.rst
index 551924348e956066edf7affedb78a60e7adf2df4..598b76a479602a53a4d7073bc31c65ba3eddbe53 100644
--- a/doc/fluid/api/nn/functional.rst
+++ b/doc/fluid/api/nn/functional.rst
@@ -7,3 +7,4 @@ functional
 
     functional/l1_loss.rst
     functional/nll_loss.rst
+    functional/mse_loss.rst
diff --git a/doc/fluid/api/nn/functional/mse_loss.rst b/doc/fluid/api/nn/functional/mse_loss.rst
new file mode 100644
index 0000000000000000000000000000000000000000..b5ec8c58b5a10c206e85184f36e414396fc7d9b9
--- /dev/null
+++ b/doc/fluid/api/nn/functional/mse_loss.rst
@@ -0,0 +1,8 @@
+.. _api_nn_functional_mse_loss:
+
+mse_loss
+--------
+
+..  autofunction:: paddle.nn.functional.mse_loss
+    :noindex:
+
diff --git a/doc/fluid/api/nn/hard_shrink.rst b/doc/fluid/api/nn/hard_shrink.rst
deleted file mode 100644
index a6e2cef0cdb0f23db406efe149ae5afb9cbc571d..0000000000000000000000000000000000000000
--- a/doc/fluid/api/nn/hard_shrink.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-.. _api_nn_hard_shrink:
-
-hard_shrink
--------------------------------
-:doc_source: paddle.fluid.layers.hard_shrink
-
-
diff --git a/doc/fluid/api/nn/hardshrink.rst b/doc/fluid/api/nn/hardshrink.rst
new file mode 100644
index 0000000000000000000000000000000000000000..48b98f2a5366941aa80c5dcd6b64b5a089378860
--- /dev/null
+++ b/doc/fluid/api/nn/hardshrink.rst
@@ -0,0 +1,11 @@
+..  THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
+    !DO NOT EDIT THIS FILE MANUALLY!
+
+.. _api_nn_hardshrink:
+
+hardshrink
+----------
+
+..  autofunction:: paddle.nn.functional.hardshrink
+    :noindex:
+
diff --git a/doc/fluid/api/nn/log_softmax.rst b/doc/fluid/api/nn/log_softmax.rst
new file mode 100644
index 0000000000000000000000000000000000000000..88e8b52219798fb016f567414ac88157e4e107b6
--- /dev/null
+++ b/doc/fluid/api/nn/log_softmax.rst
@@ -0,0 +1,10 @@
+..  THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
+    !DO NOT EDIT THIS FILE MANUALLY!
+
+.. _api_nn_log_softmax:
+
+log_softmax
+-----------
+
+..  autofunction:: paddle.nn.functional.log_softmax
+    :noindex:
\ No newline at end of file
diff --git a/doc/fluid/api/tensor.rst b/doc/fluid/api/tensor.rst
index a8eb2516782826e475c067311c765b50fddf4aaa..48ed3d6b232bb167113f9e06ce1a9524daf7e39b 100644
--- a/doc/fluid/api/tensor.rst
+++ b/doc/fluid/api/tensor.rst
@@ -52,6 +52,7 @@ paddle.tensor
     tensor/isfinite.rst
     tensor/less_equal.rst
     tensor/less_than.rst
+    tensor/logic.rst
     tensor/linalg.rst
     tensor/linspace.rst
     tensor/load.rst
diff --git a/doc/fluid/api/tensor/logic.rst b/doc/fluid/api/tensor/logic.rst
new file mode 100644
index 0000000000000000000000000000000000000000..389c83b100894432c202533508bd2fa173c53246
--- /dev/null
+++ b/doc/fluid/api/tensor/logic.rst
@@ -0,0 +1,8 @@
+======
+logic
+======
+
+..  toctree::
+    :maxdepth: 1
+
+    logic/allclose.rst
\ No newline at end of file
diff --git a/doc/fluid/api/tensor/logic/allclose.rst b/doc/fluid/api/tensor/logic/allclose.rst
new file mode 100644
index 0000000000000000000000000000000000000000..72a8c73d61df39271a187aa9fa3e56eb90006844
--- /dev/null
+++ b/doc/fluid/api/tensor/logic/allclose.rst
@@ -0,0 +1,10 @@
+..  THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
+    !DO NOT EDIT THIS FILE MANUALLY!
+
+.. _api_tensor_logic_allclose:
+
+allclose
+--------
+
+..  autofunction:: paddle.tensor.logic.allclose
+    :noindex:
\ No newline at end of file
diff --git a/doc/fluid/api_cn/dygraph_cn.rst b/doc/fluid/api_cn/dygraph_cn.rst
index 7cf2de04add71995bdb359d5427f2e65f5190946..40246074545244441bc0ddde67eda93111273229 100644
--- a/doc/fluid/api_cn/dygraph_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn.rst
@@ -17,6 +17,7 @@ fluid.dygraph
     dygraph_cn/Conv3DTranspose_cn.rst
     dygraph_cn/CosineAnnealingDecay_cn.rst
     dygraph_cn/CosineDecay_cn.rst
+    dygraph_cn/DataParallel_cn.rst
     dygraph_cn/declarative_cn.rst
     dygraph_cn/Dropout_cn.rst
     dygraph_cn/Embedding_cn.rst
@@ -58,3 +59,4 @@ fluid.dygraph
     dygraph_cn/Tracer_cn.rst
     dygraph_cn/TranslatedLayer_cn.rst
     dygraph_cn/TreeConv_cn.rst
+    dygraph_cn/enabled_cn.rst
diff --git a/doc/fluid/api_cn/dygraph_cn/DataParallel_cn.rst b/doc/fluid/api_cn/dygraph_cn/DataParallel_cn.rst
new file mode 100644
index 0000000000000000000000000000000000000000..f9a92c72b2687141e9073b34b85afb86f2cd8ca6
--- /dev/null
+++ b/doc/fluid/api_cn/dygraph_cn/DataParallel_cn.rst
@@ -0,0 +1,155 @@
+.. _cn_api_fluid_dygraph_DataParallel:
+
+DataParallel
+------------
+
+.. py:class:: paddle.fluid.dygraph.DataParallel(layers, strategy)
+
+:api_attr: 命令式编程模式（动态图)
+
+通过数据并行模式执行动态图模型。
+
+目前，``DataParallel`` 仅支持以多进程的方式执行动态图模型。使用方式如下：
+
+``python -m paddle.distributed.launch --selected_gpus=0,1 dynamic_graph_test.py``
+
+其中 ``dynamic_graph_test.py`` 脚本的代码可以是下面的示例代码。
+
+参数：
+    - **layers** (Layer) - 需要通过数据并行方式执行的模型。
+    - **strategy** (ParallelStrategy) - 数据并行的策略，包括并行执行的环境配置。
+
+返回：支持数据并行的 ``Layer``
+
+返回类型：Layer实例
+
+**代码示例**：
+
+.. code-block:: python
+
+    import numpy as np
+    import paddle.fluid as fluid
+
+    place = fluid.CUDAPlace(fluid.dygraph.ParallelEnv().dev_id)
+    with fluid.dygraph.guard(place):
+
+        # prepare the data parallel context
+        strategy = fluid.dygraph.prepare_context()
+
+        linear = fluid.dygraph.Linear(1, 10, act="softmax")
+        adam = fluid.optimizer.AdamOptimizer(
+            learning_rate=0.001, parameter_list=linear.parameters())
+
+        # make the module become the data parallelism module
+        linear = fluid.dygraph.DataParallel(linear, strategy)
+
+        x_data = np.random.random(size=[10, 1]).astype(np.float32)
+        data = fluid.dygraph.to_variable(x_data)
+
+        hidden = linear(data)
+        avg_loss = fluid.layers.mean(hidden)
+
+        # scale the loss according to the number of trainers.
+        avg_loss = linear.scale_loss(avg_loss)
+
+        avg_loss.backward()
+
+        # collect the gradients of trainers.
+        linear.apply_collective_grads()
+
+        adam.minimize(avg_loss)
+        linear.clear_gradients()
+
+.. py:method:: scale_loss(loss)
+
+缩放模型损失值 ``loss`` 。在数据并行模式中，损失值 ``loss`` 需要根据并行训练进程的数目进行缩放。
+
+如果不在数据并行模式下，会直接返回原 ``loss`` 。
+
+参数：
+    - **loss** (Variable) - 当前模型的损失值。
+
+返回：缩放后的损失值 ``loss``
+
+返回类型：Variable
+
+**代码示例**
+
+.. code-block:: python
+
+    import numpy as np
+    import paddle.fluid as fluid
+
+    place = fluid.CUDAPlace(fluid.dygraph.ParallelEnv().dev_id)
+    with fluid.dygraph.guard(place):
+
+        # prepare the data parallel context
+        strategy = fluid.dygraph.prepare_context()
+
+        linear = fluid.dygraph.Linear(1, 10, act="softmax")
+        adam = fluid.optimizer.AdamOptimizer(
+            learning_rate=0.001, parameter_list=linear.parameters())
+
+        # make the module become the data parallelism module
+        linear = fluid.dygraph.DataParallel(linear, strategy)
+
+        x_data = np.random.random(size=[10, 1]).astype(np.float32)
+        data = fluid.dygraph.to_variable(x_data)
+
+        hidden = linear(data)
+        avg_loss = fluid.layers.mean(hidden)
+
+        # scale the loss according to the number of trainers.
+        avg_loss = linear.scale_loss(avg_loss)
+
+        avg_loss.backward()
+
+        # collect the gradients of trainers.
+        linear.apply_collective_grads()
+
+        adam.minimize(avg_loss)
+        linear.clear_gradients()
+
+
+.. py:method:: apply_collective_grads()
+
+AllReduce（规约）参数的梯度值。
+
+返回：无
+
+**代码示例**
+
+.. code-block:: python
+
+    import numpy as np
+    import paddle.fluid as fluid
+
+    place = fluid.CUDAPlace(fluid.dygraph.ParallelEnv().dev_id)
+    with fluid.dygraph.guard(place):
+
+        # prepare the data parallel context
+        strategy = fluid.dygraph.prepare_context()
+
+        linear = fluid.dygraph.Linear(1, 10, act="softmax")
+        adam = fluid.optimizer.AdamOptimizer(
+            learning_rate=0.001, parameter_list=linear.parameters())
+
+        # make the module become the data parallelism module
+        linear = fluid.dygraph.DataParallel(linear, strategy)
+
+        x_data = np.random.random(size=[10, 1]).astype(np.float32)
+        data = fluid.dygraph.to_variable(x_data)
+
+        hidden = linear(data)
+        avg_loss = fluid.layers.mean(hidden)
+
+        # scale the loss according to the number of trainers.
+        avg_loss = linear.scale_loss(avg_loss)
+
+        avg_loss.backward()
+
+        # collect the gradients of trainers.
+        linear.apply_collective_grads()
+
+        adam.minimize(avg_loss)
+        linear.clear_gradients()
diff --git a/doc/fluid/api_cn/dygraph_cn/enabled_cn.rst b/doc/fluid/api_cn/dygraph_cn/enabled_cn.rst
new file mode 100644
index 0000000000000000000000000000000000000000..e5716e76456a99ba5724369d4c2aaba7bfa129f8
--- /dev/null
+++ b/doc/fluid/api_cn/dygraph_cn/enabled_cn.rst
@@ -0,0 +1,25 @@
+.. _cn_api_fluid_dygraph_enabled:
+
+enabled
+-------------------------------
+
+.. py:function:: paddle.fluid.dygraph.enabled()
+
+这个函数用于检查程序是否运行在动态图模式。你可以使用 :ref:`cn_api_fluid_dygraph_guard` api进入动态图模式。或者使用 :ref:`cn_api_fluid_enable_dygraph` 和 :ref:`cn_api_fluid_disable_dygraph` api打开、关闭动态图模式。
+
+注意：   `fluid.dygraph.enabled` 实际上调用了 :ref:`cn_api_fluid_in_dygraph_mode` api，所以推荐使用 :ref:`cn_api_fluid_in_dygraph_mode` api。
+
+返回：   程序是否运行在动态图模式。
+
+返回类型：       bool
+
+**示例代码**
+
+.. code-block:: python
+
+            import paddle.fluid as fluid
+
+            fluid.enable_dygraph()  # Now we are in dygraph mode
+            print(fluid.dygraph.enabled())  # True
+            fluid.disable_dygraph()
+            print(fluid.dygraph.enabled())  # False
diff --git a/doc/fluid/api_cn/fluid_cn/data_cn.rst b/doc/fluid/api_cn/fluid_cn/data_cn.rst
index f250c9438581e2cddaebe4f72c8adb5c6821cdb9..14a6ab6ea1d94dcdc3586417ef9c85db98783c74 100644
--- a/doc/fluid/api_cn/fluid_cn/data_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/data_cn.rst
@@ -6,10 +6,6 @@ data
 
 .. py:function:: paddle.fluid.data(name, shape, dtype='float32', lod_level=0)
 
-:api_attr: 声明式编程模式（静态图)
-:alias_main: paddle.nn.data
-:alias: paddle.nn.data,paddle.nn.input.data
-:old_api: paddle.fluid.data
 
 
 
diff --git a/doc/fluid/api_cn/layers_cn.rst b/doc/fluid/api_cn/layers_cn.rst
index 513c6212058877c8f80afe079df9dae03d8199e4..8990362cf904ee1e252ed0a04a0dfaedd5707350 100644
--- a/doc/fluid/api_cn/layers_cn.rst
+++ b/doc/fluid/api_cn/layers_cn.rst
@@ -31,6 +31,7 @@ fluid.layers
     layers_cn/auc_cn.rst
     layers_cn/autoincreased_step_counter_cn.rst
     layers_cn/batch_norm_cn.rst
+    layers_cn/BasicDecoder_cn.rst
     layers_cn/beam_search_cn.rst
     layers_cn/beam_search_decode_cn.rst
     layers_cn/bilinear_tensor_product_cn.rst
@@ -87,6 +88,7 @@ fluid.layers
     layers_cn/dynamic_lstmp_cn.rst
     layers_cn/dynamic_decode_cn.rst
     layers_cn/Decoder_cn.rst
+    layers_cn/DecodeHelper_cn.rst
     layers_cn/DynamicRNN_cn.rst
     layers_cn/edit_distance_cn.rst
     layers_cn/elementwise_add_cn.rst
@@ -124,6 +126,7 @@ fluid.layers
     layers_cn/get_tensor_from_selected_rows_cn.rst
     layers_cn/greater_equal_cn.rst
     layers_cn/greater_than_cn.rst
+    layers_cn/GreedyEmbeddingHelper_cn.rst
     layers_cn/grid_sampler_cn.rst
     layers_cn/group_norm_cn.rst
     layers_cn/gru_unit_cn.rst
@@ -242,6 +245,7 @@ fluid.layers
     layers_cn/rsqrt_cn.rst
     layers_cn/RNNCell_cn.rst
     layers_cn/sampled_softmax_with_cross_entropy_cn.rst
+    layers_cn/SampleEmbeddingHelper_cn.rst
     layers_cn/sampling_id_cn.rst
     layers_cn/scale_cn.rst
     layers_cn/scatter_cn.rst
@@ -308,6 +312,7 @@ fluid.layers
     layers_cn/thresholded_relu_cn.rst
     layers_cn/topk_cn.rst
     layers_cn/transpose_cn.rst
+    layers_cn/TrainingHelper_cn.rst
     layers_cn/unfold_cn.rst
     layers_cn/Uniform_cn.rst
     layers_cn/uniform_random_cn.rst
diff --git a/doc/fluid/api_cn/layers_cn/BasicDecoder_cn.rst b/doc/fluid/api_cn/layers_cn/BasicDecoder_cn.rst
new file mode 100644
index 0000000000000000000000000000000000000000..dd3820d852961be4903b9e7b6f0f10ca1eac35b8
--- /dev/null
+++ b/doc/fluid/api_cn/layers_cn/BasicDecoder_cn.rst
@@ -0,0 +1,80 @@
+.. _cn_api_fluid_layers_BasicDecoder:
+
+BasicDecoder
+-------------------------------
+
+
+.. py:class:: paddle.fluid.layers.BasicDecoder(cell, helper, output_fn=None)
+
+BasicDecoder是 :ref:`cn_api_fluid_layers_Decoder` 的子类，它组装了 :ref:`cn_api_fluid_layers_RNNCell` 和 :ref:`cn_api_fluid_layers_DecodeHelper` 的实例作为成员，其中DecodeHelper用来实现不同的解码策略。它依次执行以下步骤来完成单步解码：
+
+1. 执行 :code:`cell_outputs, cell_states = cell.call(inputs, states)` 以获取输出和新的状态。
+
+2. 执行 :code:`sample_ids = helper.sample(time, cell_outputs, cell_states)` 以采样id并将其作为当前步的解码结果。
+
+3. 执行 :code:`finished, next_inputs, next_states = helper.next_inputs(time, cell_outputs, cell_states, sample_ids)` 以产生下一解码步的结束标识、输入和状态。
+
+参数：
+  - **cell** (RNNCell) - RNNCell的实例或者具有相同接口定义的对象。
+  - **helper** (DecodeHelper) - DecodeHelper的实例。
+  - **output_fn** (可选) - 处理cell输出的接口，在采样之前使用。默认值None。
+
+**示例代码**
+
+.. code-block:: python
+        
+            import paddle.fluid as fluid
+            import paddle.fluid.layers as layers
+
+            start_tokens = fluid.data(name="start_tokens",
+                                 shape=[None],
+                                 dtype="int64")
+            
+            trg_embeder = lambda x: fluid.embedding(
+                x, size=[10000, 128], param_attr=fluid.ParamAttr(name="trg_embedding"))
+            output_layer = lambda x: layers.fc(x,
+                                            size=10000,
+                                            num_flatten_dims=len(x.shape) - 1,
+                                            param_attr=fluid.ParamAttr(name=
+                                                                    "output_w"),
+                                            bias_attr=False)
+            helper = layers.SampleEmbeddingHelper(trg_embeder, start_tokens=start_tokens, end_token=1)
+            decoder_cell = layers.GRUCell(hidden_size=128)
+            decoder = layers.BasicDecoder(decoder_cell, helper, output_fn=output_layer)
+            outputs = layers.dynamic_decode(
+                decoder=decoder, inits=decoder_cell.get_initial_states(start_tokens))
+
+.. py:method:: initialize(initial_cell_states)
+
+初始化，包括helper的初始化和cell的初始化，cell初始化直接使用 :code:`initial_cell_states` 作为结果。
+
+参数：
+  - **initial_cell_states** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构。这是由调用者 :ref:`cn_api_fluid_layers_dynamic_decode` 提供的参数。
+
+返回：:code:`(initial_inputs, initial_states, finished)` 的三元组。 :code:`initial_inputs, initial_states` 均是单个tensor变量或tensor变量组成的嵌套结构， :code:`finished` 是bool类型的tensor。 :code:`initial_inputs, finished` 与 :code:`helper.initialize()` 返回的内容相同； :code:`initial_states` 与输入参数中的 :code:`initial_cell_states` 相同。
+
+返回类型：tuple
+    
+.. py:class:: OutputWrapper(cell_outputs, sample_ids)
+
+:code:`step()` 的返回值中 :code:`outputs` 使用的数据结构，是一个由 :code:`cell_outputs` 和 :code:`sample_ids` 这两个字段构成的命名元组。
+
+.. py:method:: step(time, inputs, states, **kwargs)
+
+按照以下步骤执行单步解码：
+
+1. 执行 :code:`cell_outputs, cell_states = cell.call(inputs, states)` 以获取输出和新的状态。
+
+2. 执行 :code:`sample_ids = helper.sample(time, cell_outputs, cell_states)` 以采样id并将其作为当前步的解码结果。
+
+3. 执行 :code:`finished, next_inputs, next_states = helper.next_inputs(time, cell_outputs, cell_states, sample_ids)` 以产生下一解码步的结束标识、输入和状态。
+
+参数：
+  - **time** (Variable) - 调用者提供的形状为[1]的tensor，表示当前解码的时间步长。其数据类型为int64。
+  - **inputs** (Variable) - tensor变量。在第一个解码时间步时与由 :code:`initialize()` 返回的 :code:`initial_inputs` 相同，其他时间步与由 :code:`step()` 返回的 :code:`next_inputs` 相同。
+  - **states** (Variable) - tensor变量的结构。在第一个解码时间步时与 :code:`initialize()` 返回的 :code:`initial_states` 相同，其他时间步与由 :code:`step()` 返回的 :code:`next_states` 相同。
+  - **kwargs** - 附加的关键字参数，由调用者 :ref:`cn_api_fluid_layers_dynamic_decode` 提供。
+
+返回： :code:`(outputs, next_states, next_inputs, finished)` 的四元组。 :code:`outputs` 是包含 :code:`cell_outputs` 和 :code:`sample_ids` 两个字段的命名元组，其中 :code:`cell_outputs` 是 :code:`cell.call()` 的结果， :code:`sample_ids` 是 :code:`helper.sample()` 的结果； :code:`next_states, next_inputs` 分别和输入参数中的 :code:`states, inputs` 有相同的结构、形状和数据类型； :code:`finished` 是一个bool类型的tensor，形状是 :math:`[batch\_size]` 。
+
+返回类型：tuple
diff --git a/doc/fluid/api_cn/layers_cn/BeamSearchDecoder_cn.rst b/doc/fluid/api_cn/layers_cn/BeamSearchDecoder_cn.rst
index 67e7312aaef57ff031e410031aecc73bc50c265f..d62d05ae86bda97df4fe06e328653df5251db4cd 100644
--- a/doc/fluid/api_cn/layers_cn/BeamSearchDecoder_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/BeamSearchDecoder_cn.rst
@@ -20,7 +20,7 @@ BeamSearchDecoder
   - **start_token** (int) - 起始标记id。
   - **end_token** (int) - 结束标记id。
   - **beam_size** (int) - 在beam search中使用的beam宽度。
-  - **embedding_fn** (可选) - 处理选中的候选id的接口。通常，它是一个将词id转换为词嵌入的嵌入层，函数的返回值作为 :code:`cell.call` 接口的 :code:`input` 参数。如果 :code:`embedding_fn` 未提供，则必须在 :code:`cell.call` 中实现词嵌入转换。默认值None。
+  - **embedding_fn** (可选) - 处理选中的候选id的接口。它通常是一个将词id转换为词嵌入的嵌入层，其返回值将作为 :code:`cell.call` 接口的 :code:`input` 参数。**注意** ，这里要使用 :ref:`cn_api_fluid_embedding` 而非 :ref:`cn_api_fluid_layers_embedding`，因为选中的id的形状是 :math:`[batch\_size, beam\_size]` ，如果使用后者则还需要在这里提供unsqueeze。如果 :code:`embedding_fn` 未提供，则必须在 :code:`cell.call` 中实现词嵌入转换。默认值None。
   - **output_fn** (可选) - 处理cell输出的接口，在计算得分和选择候选标记id之前使用。默认值None。
 
 **示例代码**
@@ -123,7 +123,7 @@ BeamSearchDecoder
 参数：
   - **initial_cell_states** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构。调用者提供的参数。
 
-返回：一个元组 :code:`(initial_inputs, initial_states, finished)`。:code:`initial_inputs` 是一个tensor，当 :code:`embedding_fn` 为None时，由 :code:`start_token` 填充，形状为 :math:`[batch\_size,beam\_size,1]` ；否则使用 :code:`embedding_fn(t)` 返回的值。:code:`initial_states` 是tensor变量的嵌套结构(命名元组，字段包括 :code:`cell_states，log_probs，finished，lengths`)，其中 :code:`log_probs，finished，lengths` 都含有一个tensor，形状为 :math:`[batch\_size, beam\_size]`，数据类型为float32，bool，int64。:code:`cell_states` 具有与输入参数 :code:`initial_cell_states` 相同结构的值，但形状扩展为 :math:`[batch\_size,beam\_size,...]`。 :code:`finished` 是一个布尔型tensor，由False填充，形状为 :math:`[batch\_size,beam\_size]`。
+返回：一个元组 :code:`(initial_inputs, initial_states, finished)`。:code:`initial_inputs` 是一个tensor，当 :code:`embedding_fn` 为None时，该tensor t的形状为 :math:`[batch\_size,beam\_size]` ，值为 :code:`start_token` ；否则使用 :code:`embedding_fn(t)` 返回的值。:code:`initial_states` 是tensor变量的嵌套结构(命名元组，字段包括 :code:`cell_states，log_probs，finished，lengths`)，其中 :code:`log_probs，finished，lengths` 都含有一个tensor，形状为 :math:`[batch\_size, beam\_size]`，数据类型为float32，bool，int64。:code:`cell_states` 具有与输入参数 :code:`initial_cell_states` 相同结构的值，但形状扩展为 :math:`[batch\_size,beam\_size,...]`。 :code:`finished` 是一个布尔型tensor，由False填充，形状为 :math:`[batch\_size,beam\_size]`。
 
 返回类型：tuple
 
@@ -135,7 +135,7 @@ BeamSearchDecoder
   - **time** (Variable) - 调用者提供的形状为[1]的tensor，表示当前解码的时间步长。其数据类型为int64。
   - **logits** (Variable) - 形状为 :math:`[batch\_size,beam\_size,vocab\_size]` 的tensor，表示当前时间步的logits。其数据类型为float32。
   - **next_cell_states** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构。它的结构，形状和数据类型与 :code:`initialize()` 的返回值 :code:`initial_states` 中的 :code:`cell_states` 相同。它代表该cell的下一个状态。
-  - **beam_state** (Variable) - tensor变量的结构。在第一个解码步骤与 :code:`initialize()` 返回的 :code:`initial_states` 同，其他步骤与 :code:`initialize()` 返回的 :code:`beam_search_state` 相同。
+  - **beam_state** (Variable) - tensor变量的结构。在第一个解码步骤与 :code:`initialize()` 返回的 :code:`initial_states` 相同，其他步骤与 :code:`step()` 返回的 :code:`beam_search_state` 相同。
   
 返回：一个元组 :code:`(beam_search_output, beam_search_state)`。:code:`beam_search_output` 是tensor变量的命名元组，字段为 :code:`scores，predicted_ids parent_ids`。其中 :code:`scores，predicted_ids，parent_ids` 都含有一个tensor，形状为 :math:`[batch\_size,beam\_size]`，数据类型为float32 ，int64，int64。:code:`beam_search_state` 具有与输入参数 :code:`beam_state` 相同的结构，形状和数据类型。
 
@@ -146,9 +146,9 @@ BeamSearchDecoder
 执行beam search解码步骤，该步骤使用 :code:`cell` 来计算概率，然后执行beam search步骤以计算得分并选择候选标记ID。
   
 参数：
-  - **time** (Variable) - 调用者提供的形状为[1]的int64tensor，表示当前解码的时间步长。
+  - **time** (Variable) - 调用者提供的形状为[1]的tensor，表示当前解码的时间步长。其数据类型为int64。
   - **inputs** (Variable) - tensor变量。在第一个解码时间步时与由 :code:`initialize()` 返回的 :code:`initial_inputs` 相同，其他时间步与由 :code:`step()` 返回的 :code:`next_inputs` 相同。
-  - **States** (Variable) - tensor变量的结构。在第一个解码时间步时与 :code:`initialize()` 返回的 :code:`initial_states` 相同，其他时间步与由 :code:`step()` 返回的 :code:`beam_search_state` 相同。
+  - **states** (Variable) - tensor变量的结构。在第一个解码时间步时与 :code:`initialize()` 返回的 :code:`initial_states` 相同，其他时间步与由 :code:`step()` 返回的 :code:`beam_search_state` 相同。
   - **kwargs** - 附加的关键字参数，由调用者提供。
   
 返回：一个元组 :code:`(beam_search_output，beam_search_state，next_inputs，finish)` 。:code:`beam_search_state` 和参数 :code:`states` 具有相同的结构，形状和数据类型。 :code:`next_inputs` 与输入参数 :code:`inputs` 具有相同的结构，形状和数据类型。 :code:`beam_search_output` 是tensor变量的命名元组(字段包括 :code:`scores，predicted_ids，parent_ids` )，其中 :code:`scores，predicted_ids，parent_ids` 都含有一个tensor，形状为 :math:`[batch\_size,beam\_size]`，数据类型为float32 ，int64，int64。:code:`finished` 是一个bool类型的tensor，形状为 :math:`[batch\_size,beam\_size]`。
@@ -167,12 +167,3 @@ BeamSearchDecoder
 返回：一个元组 :code:`(predicted_ids, final_states)`。:code:`predicted_ids` 是一个tensor，形状为 :math:`[time\_step，batch\_size,beam\_size]`，数据类型为int64。:code:`final_states` 与输入参数 :code:`final_states` 相同。
 
 返回类型：tuple
-
-.. py:method:: output_dtype()
-   
-用于beam search输出的数据类型的嵌套结构。它是一个命名元组，字段包括 :code:`scores, predicted_ids, parent_ids`。
-
-参数：无。
-
-返回：用于beam search输出的数据类型的命名元组。
-
diff --git a/doc/fluid/api_cn/layers_cn/DecodeHelper_cn.rst b/doc/fluid/api_cn/layers_cn/DecodeHelper_cn.rst
new file mode 100644
index 0000000000000000000000000000000000000000..14ad49c37982245c138bb04b7377d9b40edc6fa1
--- /dev/null
+++ b/doc/fluid/api_cn/layers_cn/DecodeHelper_cn.rst
@@ -0,0 +1,44 @@
+.. _cn_api_fluid_layers_DecodeHelper:
+
+DecodeHelper
+-------------------------------
+
+
+.. py:class:: paddle.fluid.layers.DecodeHelper()
+
+DecodeHelper是一个基类，其子类的实例将在 :ref:`cn_api_fluid_layers_BasicDecoder` 中使用。它提供了在动态解码时采样和产生下一解码步的输入的接口。
+
+.. py:method:: initialize()
+
+初始化以产生第一个解码步的输入和每个序列是否结束的初始标识。这是 :ref:`cn_api_fluid_layers_BasicDecoder` 初始化的一部分。
+
+返回：:code:`(initial_inputs, initial_finished)` 的二元组， :code:`initial_inputs` 是单个tensor变量或tensor变量组成的嵌套结构，tensor的形状是 :math:`[batch\_size, ...]` 。 :code:`initial_finished` 是一个bool类型且形状为 :math:`[batch\_size]` 的tensor。
+
+返回类型：tuple
+    
+.. py:method:: sample(time, outputs, states)
+
+根据 :code:`outputs` 以特定的方式进行采样，该方法是 :code:`BasicDecoder.step` 中的一部分。
+
+参数：
+  - **time** (Variable) - 调用者提供的形状为[1]的tensor，表示当前解码的时间步长。其数据类型为int64。
+  - **outputs** (Variable) - tensor变量，通常其数据类型为float32或float64，形状为 :math:`[batch\_size, vocabulary\_size]` ，表示当前解码步预测产生的logit（未归一化的概率），和由 :code:`BasicDecoder.output_fn(BasicDecoder.cell.call())` 返回的 :code:`outputs` 是同一内容。
+  - **states** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构，和由 :code:`BasicDecoder.cell.call()` 返回的 :code:`new_states` 是同一内容。
+
+返回：数据类型为int64形状为 :math:`[batch\_size]` 的tensor，表示采样得到的id。
+
+返回类型：Variable        
+
+.. py:method:: next_inputs(time, outputs, states, sample_ids)
+
+产生下一解码步的输入、状态，以及每个序列是否结束的标识。该方法是 :code:`BasicDecoder.step` 中的一部分。
+
+参数：
+  - **time** (Variable) - 调用者提供的形状为[1]的tensor，表示当前解码的时间步长。其数据类型为int64。
+  - **outputs** (Variable) - tensor变量，通常其数据类型为float32或float64，形状为 :math:`[batch\_size, vocabulary\_size]` ，表示当前解码步预测产生的logit（未归一化的概率），和由 :code:`BasicDecoder.output_fn(BasicDecoder.cell.call())` 返回的 :code:`outputs` 是同一内容。
+  - **states** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构，和由 :code:`BasicDecoder.cell.call()` 返回的 :code:`new_states` 是同一内容。
+  - **sample_ids** (Variable) - 数据类型为int64形状为 :math:`[batch\_size]` 的tensor，和由 :code:`sample()` 返回的 :code:`sample_ids` 是同一内容。
+
+返回： :code:`(finished, next_inputs, next_states)` 的三元组。 :code:`next_inputs, next_states` 均是单个tensor变量或tensor变量组成的嵌套结构， :code:`next_states` 和输入参数中的 :code:`states` 具有相同的结构、形状和数据类型； :code:`finished` 是一个bool类型且形状为 :math:`[batch\_size]` 的tensor。
+
+返回类型：tuple
diff --git a/doc/fluid/api_cn/layers_cn/Decoder_cn.rst b/doc/fluid/api_cn/layers_cn/Decoder_cn.rst
index 9adb65614226bafe68094528d902e88ec17d1d83..ffe67dc97342f0ef561d0350c38806ed8bd15ce5 100644
--- a/doc/fluid/api_cn/layers_cn/Decoder_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/Decoder_cn.rst
@@ -39,13 +39,28 @@ Decoder提供的主要抽象为：
 
 返回类型：tuple
 
-.. py:method:: step(time, inputs, states)
+.. py:method:: step(time, inputs, states, **kwargs)
 
 在解码的每个时间步中被调用的接口
 
 参数：  
-  - **outputs** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构。 结构和数据类型与 :code:`output_dtype` 相同。 tensor堆叠所有时间步长的输出从而具有shape :math:`[time\_step，batch\_size，...]` ，由调用者完成。 
-  - **final_states** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构。 它是 :code:`decoder.step` 在最后一个解码步返回的 :code:`next_states`， 因此具有与任何时间步长的状态相同的结构，形状和数据类型。
+  - **time** (Variable) - 调用者提供的形状为[1]的tensor，表示当前解码的时间步长。其数据类型为int64。
+  - **inputs** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构。在第一个解码时间步时与由 :code:`initialize()` 返回的 :code:`initial_inputs` 相同，其他时间步与由 :code:`step()` 返回的 :code:`next_inputs` 相同。
+  - **states** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构。在第一个解码时间步时与 :code:`initialize()` 返回的 :code:`initial_states` 相同，其他时间步与由 :code:`step()` 返回的 :code:`next_states` 相同。
+  - **kwargs** - 附加的关键字参数，由调用者提供。
+
+返回：一个元组 :code:`(outputs, next_states, next_inputs, finished)` 。:code:`next_states` 和 :code:`next_inputs` 都是单个tensor变量或tensor变量组成的嵌套结构，且结构、形状和数据类型均分别与输入参数中的 :code:`states` 和 :code:`inputs` 相同。 :code:`outputs` 是单个tensor变量或tensor变量组成的嵌套结构。 :code:`finished` 是一个bool类型的tensor变量。
+
+返回类型：tuple
+
+.. py:method:: finalize(outputs, final_states, sequence_lengths)
+
+如果提供了实现，将在整个解码迭代结束后被执行一次。
+
+参数：  
+  - **outputs** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构。 其中每个tensor的形状均为 :math:`[time\_step, batch\_size, ...]` ，是将所有解码步中与其对应的输出进行堆叠的结果，这个过程由其调用者完成。
+  - **final_states** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构。 它是 :code:`decoder.step` 在最后一个解码步返回的 :code:`next_states`， 因此具有与任何时间步的状态相同的结构，形状和数据类型。
+  - **kwargs** - 附加的关键字参数，由调用者提供。
 
 返回：一个元组 :code:`(final_outputs, final_states)` 。:code:`final_outputs` 和 :code:`final_states` 都是单个tensor变量或tensor变量组成的嵌套结构。
 
diff --git a/doc/fluid/api_cn/layers_cn/GreedyEmbeddingHelper_cn.rst b/doc/fluid/api_cn/layers_cn/GreedyEmbeddingHelper_cn.rst
new file mode 100644
index 0000000000000000000000000000000000000000..a01e1ab3e575b54855d774e29057ffc2b7d04a8f
--- /dev/null
+++ b/doc/fluid/api_cn/layers_cn/GreedyEmbeddingHelper_cn.rst
@@ -0,0 +1,74 @@
+.. _cn_api_fluid_layers_GreedyEmbeddingHelper:
+
+GreedyEmbeddingHelper
+-------------------------------
+
+
+.. py:class:: paddle.fluid.layers.GreedyEmbeddingHelper(embedding_fn, start_tokens, end_token)
+
+GreedyEmbeddingHelper是 :ref:`cn_api_fluid_layers_DecodeHelper` 的子类。作为解码helper，它使用 :code:`argmax` 进行采样，并将采样结果送入embedding层，以此作为下一解码步的输入。
+
+参数：
+  - **embedding_fn** (callable) - 作用于 :code:`argmax` 结果的函数，通常是一个将词id转换为词嵌入的embedding层，**注意** ，这里要使用 :ref:`cn_api_fluid_embedding` 而非 :ref:`cn_api_fluid_layers_embedding`，因为选中的id的形状是 :math:`[batch\_size]` ，如果使用后者则还需要在这里提供unsqueeze。
+  - **start_tokens** (Variable) - 形状为 :math:`[batch\_size]` 、数据类型为int64、 值为起始标记id的tensor。
+  - **end_token** (int) - 结束标记id。
+
+**示例代码**
+
+.. code-block:: python
+
+            import paddle.fluid as fluid
+            import paddle.fluid.layers as layers
+
+            start_tokens = fluid.data(name="start_tokens",
+                                 shape=[None],
+                                 dtype="int64")
+            
+            trg_embeder = lambda x: fluid.embedding(
+                x, size=[10000, 128], param_attr=fluid.ParamAttr(name="trg_embedding"))
+            output_layer = lambda x: layers.fc(x,
+                                            size=10000,
+                                            num_flatten_dims=len(x.shape) - 1,
+                                            param_attr=fluid.ParamAttr(name=
+                                                                    "output_w"),
+                                            bias_attr=False)
+            helper = layers.GreedyEmbeddingHelper(trg_embeder, start_tokens=start_tokens, end_token=1)
+            decoder_cell = layers.GRUCell(hidden_size=128)
+            decoder = layers.BasicDecoder(decoder_cell, helper, output_fn=output_layer)
+            outputs = layers.dynamic_decode(
+                decoder=decoder, inits=decoder_cell.get_initial_states(start_tokens))
+
+.. py:method:: initialize()
+
+GreedyEmbeddingHelper初始化，其使用构造函数中的 :code:`start_tokens` 作为第一个解码步的输入，并给出每个序列是否结束的初始标识。这是 :ref:`cn_api_fluid_layers_BasicDecoder` 初始化的一部分。
+
+返回：:code:`(initial_inputs, initial_finished)` 的二元组， :code:`initial_inputs` 同构造函数中的 :code:`start_tokens` ； :code:`initial_finished` 是一个bool类型、值为False的tensor，其形状和 :code:`start_tokens` 相同。
+
+返回类型：tuple
+    
+.. py:method:: sample(time, outputs, states)
+
+使用 :code:`argmax` 根据 :code:`outputs` 进行采样。
+
+参数：
+  - **time** (Variable) - 调用者提供的形状为[1]的tensor，表示当前解码的时间步长。其数据类型为int64。
+  - **outputs** (Variable) - tensor变量，通常其数据类型为float32或float64，形状为 :math:`[batch\_size, vocabulary\_size]` ，表示当前解码步预测产生的logit（未归一化的概率），和由 :code:`BasicDecoder.output_fn(BasicDecoder.cell.call())` 返回的 :code:`outputs` 是同一内容。
+  - **states** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构，和由 :code:`BasicDecoder.cell.call()` 返回的 :code:`new_states` 是同一内容。
+
+返回：数据类型为int64形状为 :math:`[batch\_size]` 的tensor，表示采样得到的id。
+
+返回类型：Variable
+
+.. py:method:: next_inputs(time, outputs, states, sample_ids)
+
+对 :code:`sample_ids` 使用 :code:`embedding_fn` ，以此作为下一解码步的输入；同时直接使用输入参数中的 :code:`states` 作为下一解码步的状态；并通过判别 :code:`sample_ids` 是否得到 :code:`end_token`，依此产生每个序列是否结束的标识。
+
+参数：
+  - **time** (Variable) - 调用者提供的形状为[1]的tensor，表示当前解码的时间步长。其数据类型为int64。
+  - **outputs** (Variable) - tensor变量，通常其数据类型为float32或float64，形状为 :math:`[batch\_size, vocabulary\_size]` ，表示当前解码步预测产生的logit（未归一化的概率），和由 :code:`BasicDecoder.output_fn(BasicDecoder.cell.call())` 返回的 :code:`outputs` 是同一内容。
+  - **states** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构，和由 :code:`BasicDecoder.cell.call()` 返回的 :code:`new_states` 是同一内容。
+  - **sample_ids** (Variable) - 数据类型为int64形状为 :math:`[batch\_size]` 的tensor，和由 :code:`sample()` 返回的 :code:`sample_ids` 是同一内容。
+
+返回： :code:`(finished, next_inputs, next_states)` 的三元组。 :code:`next_inputs, next_states` 均是单个tensor变量或tensor变量组成的嵌套结构，tensor的形状是 :math:`[batch\_size, ...]` ， :code:`next_states` 和输入参数中的 :code:`states` 相同； :code:`finished` 是一个bool类型且形状为 :math:`[batch\_size]` 的tensor。
+
+返回类型：tuple
diff --git a/doc/fluid/api_cn/layers_cn/RNNCell_cn.rst b/doc/fluid/api_cn/layers_cn/RNNCell_cn.rst
index edc8d0df0369029d9c9021621919b7a60b1b7523..1368e2ac33f57a483ced44c49ccf65aa83671f7a 100644
--- a/doc/fluid/api_cn/layers_cn/RNNCell_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/RNNCell_cn.rst
@@ -21,11 +21,11 @@ RNNCell是抽象的基类，代表将输入和状态映射到输出和新状态
   - **states** - 状态，单个tensor变量或tensor变量组成的嵌套结构。
   - **kwargs** - 附加的关键字参数，由调用者提供。
         
-返回：输出和新状态。输出和新状态都可以是嵌套的tensor变量。新状态必须具有与状态相同的结构。
+返回：包含输出和新状态的二元组 :code:`(outputs, new_states)` 。输出和新状态都可以是嵌套的tensor变量。新状态必须具有与状态相同的结构。
 
 返回类型：tuple
 
-.. py:method:: get_initial_states(batch_ref, shape=None, dtype=None, init_value=0)
+.. py:method:: get_initial_states(batch_ref, shape=None, dtype=None, init_value=0, batch_dim_idx=0)
 
 该接口根据提供的形状，数据类型和初始值来初始化状态。
 
@@ -34,6 +34,7 @@ RNNCell是抽象的基类，代表将输入和状态映射到输出和新状态
   - **shape** - 单个形状或形状组成的嵌套结构，单个形状是整数的列表或元组。 如果形状的第一维不是batch大小，则自动插入-1作为batch大小。 如果该项为None，将使用属性 :code:`state_shape`。默认值为None。 
   - **dtype** - 单个数据类型或由数据类型组成的嵌套结构。该结构必须与shape的结构相同，例外是当状态中的所有tensor都具有相同的数据类型，这时可以使用单个数据类型。 如果是None并且属性 :code:`cell.state_shape` 不可用，则float32将用作数据类型。 默认值为None。 
   - **init_value** - 用于初始化状态的浮点值。
+  - **batch_dim_idx** - 用于指示 :code:`batch_ref` 中batch所在维度的int值，默认值为0。
 
 返回：和shape具有相同结构的tensor变量，代表初始状态。
 
@@ -41,9 +42,9 @@ RNNCell是抽象的基类，代表将输入和状态映射到输出和新状态
 
 .. py:method:: state_shape()
 
-该接口用于初始化cell的状态。 单个形状或由形状组成的嵌套结构，单个形状可以是整数的列表或元组(如果形状的第一维不是batch大小，则自动插入-1作为batch大小)。 当没有使用 :code:`get_initial_states` 初始化状态或 :code:`get_initial_states` 没有提供 :code:`shape` 参数的时候，不用实现该方法。
+抽象方法（属性），该接口用于初始化cell的状态。 单个形状或由形状组成的嵌套结构，单个形状可以是整数的列表或元组(如果形状的第一维不是batch大小，则自动插入-1作为batch大小)。 当没有使用 :code:`get_initial_states` 初始化状态或 :code:`get_initial_states` 没有提供 :code:`shape` 参数的时候，不用实现该方法。
 
 
 .. py:method:: state_dtype()
 
-该接口用于初始化cell的状态。 单个数据类型或由数据类型组成的嵌套结构，该结构必须与 :code:`shape` 的结构相同，例外是当状态中的所有tensor都具有相同的数据类型，这时可以使用单个数据类型。 当没有使用 :code:`get_initial_states` 初始化状态或 :code:`get_initial_states` 没有提供 :code:`dtype` 参数的时候，不用实现该方法。
+抽象方法（属性），该接口用于初始化cell的状态。 单个数据类型或由数据类型组成的嵌套结构，该结构必须与 :code:`shape` 的结构相同，例外是当状态中的所有tensor都具有相同的数据类型，这时可以使用单个数据类型。 当没有使用 :code:`get_initial_states` 初始化状态或 :code:`get_initial_states` 没有提供 :code:`dtype` 参数的时候，不用实现该方法。
diff --git a/doc/fluid/api_cn/layers_cn/SampleEmbeddingHelper_cn.rst b/doc/fluid/api_cn/layers_cn/SampleEmbeddingHelper_cn.rst
new file mode 100644
index 0000000000000000000000000000000000000000..c38b80052fe9040d84d3ed3ba353e6e02cfe5a9c
--- /dev/null
+++ b/doc/fluid/api_cn/layers_cn/SampleEmbeddingHelper_cn.rst
@@ -0,0 +1,54 @@
+.. _cn_api_fluid_layers_SampleEmbeddingHelper:
+
+SampleEmbeddingHelper
+-------------------------------
+
+
+.. py:class:: paddle.fluid.layers.SampleEmbeddingHelper(embedding_fn, start_tokens, end_token, softmax_temperature=None, seed=None)
+
+SampleEmbeddingHelper是 :ref:`cn_api_fluid_layers_GreedyEmbeddingHelper` 的子类。作为解码helper，它通过采样（而非使用 :code:`argmax` ）得到id，并将采样结果送入embedding层，以此作为下一解码步的输入。
+
+参数：
+  - **embedding_fn** (callable) - 作用于采样结果的函数，通常是一个将词id转换为词嵌入的embedding层，**注意** ，这里要使用 :ref:`cn_api_fluid_embedding` 而非 :ref:`cn_api_fluid_layers_embedding`，因为选中的id的形状是 :math:`[batch\_size]` ，如果使用后者则还需要在这里提供unsqueeze。
+  - **start_tokens** (Variable) - 形状为 :math:`[batch\_size]` 、数据类型为int64、 值为起始标记id的tensor。
+  - **end_token** (int) - 结束标记id。
+  - **softmax_temperature** (float，可选) - 温度系数，在softmax计算前logits将除以该值。温度越高（大于1.0）随机性越大，温度越低则越趋向于argmax。该值必须大于0，默认值None等同于1.0。
+  - **seed** (int，可选) - 采样使用的随机种子。默认为None，表示不使用固定的随机种子。
+
+**示例代码**
+
+.. code-block:: python
+
+            import paddle.fluid as fluid
+            import paddle.fluid.layers as layers
+
+            start_tokens = fluid.data(name="start_tokens",
+                                 shape=[None],
+                                 dtype="int64")
+            
+            trg_embeder = lambda x: fluid.embedding(
+                x, size=[10000, 128], param_attr=fluid.ParamAttr(name="trg_embedding"))
+            output_layer = lambda x: layers.fc(x,
+                                            size=10000,
+                                            num_flatten_dims=len(x.shape) - 1,
+                                            param_attr=fluid.ParamAttr(name=
+                                                                    "output_w"),
+                                            bias_attr=False)
+            helper = layers.SampleEmbeddingHelper(trg_embeder, start_tokens=start_tokens, end_token=1)
+            decoder_cell = layers.GRUCell(hidden_size=128)
+            decoder = layers.BasicDecoder(decoder_cell, helper, output_fn=output_layer)
+            outputs = layers.dynamic_decode(
+                decoder=decoder, inits=decoder_cell.get_initial_states(start_tokens))
+    
+.. py:method:: sample(time, outputs, states)
+
+根据一个多项分布进行采样，此分布由 :code:`softmax(outputs/softmax_temperature)` 计算得到。
+
+参数：
+  - **time** (Variable) - 调用者提供的形状为[1]的tensor，表示当前解码的时间步长。其数据类型为int64。
+  - **outputs** (Variable) - tensor变量，通常其数据类型为float32或float64，形状为 :math:`[batch\_size, vocabulary\_size]` ，表示当前解码步预测产生的logit（未归一化的概率），和由 :code:`BasicDecoder.output_fn(BasicDecoder.cell.call())` 返回的 :code:`outputs` 是同一内容。
+  - **states** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构，和由 :code:`BasicDecoder.cell.call()` 返回的 :code:`new_states` 是同一内容。
+
+返回：数据类型为int64形状为 :math:`[batch\_size]` 的tensor，表示采样得到的id。
+
+返回类型：Variable
diff --git a/doc/fluid/api_cn/layers_cn/TrainingHelper_cn.rst b/doc/fluid/api_cn/layers_cn/TrainingHelper_cn.rst
new file mode 100644
index 0000000000000000000000000000000000000000..5d140dbf8ac61370b3a0c7a33a50f56c378e4929
--- /dev/null
+++ b/doc/fluid/api_cn/layers_cn/TrainingHelper_cn.rst
@@ -0,0 +1,70 @@
+.. _cn_api_fluid_layers_TrainingHelper:
+
+TrainingHelper
+-------------------------------
+
+
+.. py:class:: paddle.fluid.layers.TrainingHelper(inputs, sequence_length, time_major=False)
+
+TrainingHelper是 :ref:`cn_api_fluid_layers_DecodeHelper` 的子类。作为解码helper，它在每个解码时间步通过在完整序列输入 :code:`inputs` 的相应位置切片作为各步的输入，并且使用 :code:`argmax` 根据 :code:`cell.call()` 的输出进行采样。
+由于要求有完整的序列输入 :code:`inputs` ，TrainingHelper主要用于以teacher-forcing的方式进行最大似然训练，采样得到的内容通常不会使用。
+
+参数：
+  - **inputs** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构。当 :code:`time_major == False` 时，tensor的形状应为 :math:`[batch\_size, sequence\_length, ...]`；当 :code:`time_major == True` 时，tensor的形状应为 :math:`[sequence\_length, batch\_size, ...]`。在解码的每一步都要从中切片取出相应的数据。
+  - **sequence_length** (Variable) - 形状为 :math:`[batch\_size]` 的tensor。它存储了 :code:`inputs` 中每个样本的实际长度，可以据此来标识每个解码步中每个样本是否结束。
+  - **time_major** (bool，可选) - 指示输入tensor和输出tensor中包含的tensor的数据组织方式。如果为False，则数据以batch为主，形状为 :math:`[batch\_size, sequence\_length, ...]`；如果为True，则数据以time为主，形状为 :math:`[sequence\_length, batch\_size, ...]`。默认值：False。
+
+**示例代码**
+
+.. code-block:: python
+
+            import paddle.fluid as fluid
+            import paddle.fluid.layers as layers
+            trg_emb = fluid.data(name="trg_emb",
+                                 shape=[None, None, 128],
+                                 dtype="float32")
+            trg_seq_length = fluid.data(name="trg_seq_length",
+                                        shape=[None],
+                                        dtype="int64")
+            helper = layers.TrainingHelper(trg_emb, trg_seq_length)
+            decoder_cell = layers.GRUCell(hidden_size=128)
+            decoder = layers.BasicDecoder(decoder_cell, helper)
+            outputs = layers.dynamic_decode(
+                decoder,
+                inits=decoder_cell.get_initial_states(trg_emb),
+                is_test=False)
+
+.. py:method:: initialize()
+
+TrainingHelper初始化，其通过在完整序列输入 :code:`inputs` 中首个时间步的位置上切片，以此作为第一个解码步的输入，并给出每个序列是否结束的初始标识。这是 :ref:`cn_api_fluid_layers_BasicDecoder` 初始化的一部分。
+
+返回：:code:`(initial_inputs, initial_finished)` 的二元组， :code:`initial_inputs` 是单个tensor变量或tensor变量组成的嵌套结构，tensor的形状是 :math:`[batch\_size, ...]` 。 :code:`initial_finished` 是一个bool类型且形状为 :math:`[batch\_size]` 的tensor。
+
+返回类型：tuple
+    
+.. py:method:: sample(time, outputs, states)
+
+使用 :code:`argmax` 根据 :code:`outputs` 进行采样。由于使用完整序列中的切片作为下一解码步的输入，采样得到的内容通常不会使用。
+
+参数：
+  - **time** (Variable) - 调用者提供的形状为[1]的tensor，表示当前解码的时间步长。其数据类型为int64。
+  - **outputs** (Variable) - tensor变量，通常其数据类型为float32或float64，形状为 :math:`[batch\_size, vocabulary\_size]` ，表示当前解码步预测产生的logit（未归一化的概率），和由 :code:`BasicDecoder.output_fn(BasicDecoder.cell.call())` 返回的 :code:`outputs` 是同一内容。
+  - **states** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构，和由 :code:`BasicDecoder.cell.call()` 返回的 :code:`new_states` 是同一内容。
+
+返回：数据类型为int64形状为 :math:`[batch\_size]` 的tensor，表示采样得到的id。
+
+返回类型：Variable        
+
+.. py:method:: next_inputs(time, outputs, states, sample_ids)
+
+从完整序列输入中当前时间步的位置上切片，以此作为下一解码步的输入；同时直接使用输入参数中的 :code:`states` 作为下一解码步的状态；并比较当前时间步与每个序列的长度，依此产生每个序列是否结束的标识。
+
+参数：
+  - **time** (Variable) - 调用者提供的形状为[1]的tensor，表示当前解码的时间步长。其数据类型为int64。
+  - **outputs** (Variable) - tensor变量，通常其数据类型为float32或float64，形状为 :math:`[batch\_size, vocabulary\_size]` ，表示当前解码步预测产生的logit（未归一化的概率），和由 :code:`BasicDecoder.output_fn(BasicDecoder.cell.call())` 返回的 :code:`outputs` 是同一内容。
+  - **states** (Variable) - 单个tensor变量或tensor变量组成的嵌套结构，和由 :code:`BasicDecoder.cell.call()` 返回的 :code:`new_states` 是同一内容。
+  - **sample_ids** (Variable) - 数据类型为int64形状为 :math:`[batch\_size]` 的tensor，和由 :code:`sample()` 返回的 :code:`sample_ids` 是同一内容。
+
+返回： :code:`(finished, next_inputs, next_states)` 的三元组。 :code:`next_inputs, next_states` 均是单个tensor变量或tensor变量组成的嵌套结构，tensor的形状是 :math:`[batch\_size, ...]` ， :code:`next_states` 和输入参数中的 :code:`states` 相同； :code:`finished` 是一个bool类型且形状为 :math:`[batch\_size]` 的tensor。
+
+返回类型：tuple
diff --git a/doc/fluid/api_cn/layers_cn/abs_cn.rst b/doc/fluid/api_cn/layers_cn/abs_cn.rst
index cf726de9f97c0bc5c621654cf07ff5787f8c9260..3c0cdf4f06dd720c7c1281ede892b01e2089521c 100644
--- a/doc/fluid/api_cn/layers_cn/abs_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/abs_cn.rst
@@ -11,23 +11,29 @@ abs
 
 
 
-绝对值激活函数。
+绝对值函数。
 
 .. math::
     out = |x|
 
 参数:
-    - **x** (Variable)- 多维Tensor，数据类型为float32或float64。
-    - **name** (str) – 该参数供开发人员打印调试信息时使用，具体用法请参见 :ref:`api_guide_Name` ，默认值为None。
+    - x (Tensor) - 输入的Tensor，数据类型为：float32、float64。
+    - name (str，可选) - 操作的名称（可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
 
-返回：表示绝对值结果的Tensor，数据类型与x相同。
+返回：输出Tensor，与 ``x`` 维度相同、数据类型相同。
 
-返回类型：Variable
+返回类型：Tensor
 
 **代码示例**：
 
 .. code-block:: python
 
-        import paddle.fluid as fluid
-        data = fluid.layers.data(name="input", shape=[32, 784])
-        result = fluid.layers.abs(data)
+        import paddle
+        import numpy as np
+
+        paddle.disable_static()
+        x_data = np.array([-1, -2, -3, -4]).astype(np.float32)
+        x = paddle.to_variable(x_data)
+        res = paddle.abs(x)
+        print(res.numpy())
+        # [1. 2. 3. 4.]
diff --git a/doc/fluid/api_cn/layers_cn/acos_cn.rst b/doc/fluid/api_cn/layers_cn/acos_cn.rst
index 9185569aa0e9f5329c63bc734e3a96996042584e..dad19ff258cbf0b89b6d45fd86eb7cc69c730636 100644
--- a/doc/fluid/api_cn/layers_cn/acos_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/acos_cn.rst
@@ -11,29 +11,30 @@ acos
 
 
 
-arccosine激活函数。
+arccosine函数。
 
 .. math::
     out = cos^{-1}(x)
 
 参数:
-    - **x(Variable)** - acos的输入Tensor，数据类型为 float32 或 float64
-    - **name** (str|None) – 具体用法请参见 :ref:`cn_api_guide_Name` ，一般无需设置，默认值为None。
-返回：  `acos` 的输出Tensor，数据类型与 `x` 相同。
+    - x (Tensor) - 输入的Tensor，数据类型为：float32、float64。
+    - name (str，可选) - 操作的名称（可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
 
-返回类型： Variable
+返回：输出Tensor，与 ``x`` 维度相同、数据类型相同。
 
+返回类型： Tensor
 
 
 **代码示例**：
 
 .. code-block:: python
 
-        import paddle.fluid as fluid
-        data = fluid.layers.data(name="input", shape=[4])
-        # if data is [-0.8183,  0.4912, -0.6444,  0.0371]
-        result = fluid.layers.acos(data)
-        # result is [2.5293, 1.0573, 2.2711, 1.5336]
-
-
+        import paddle
+        import numpy as np
 
+        paddle.disable_static()
+        x_data = np.array([-0.8183,  0.4912, -0.6444,  0.0371]).astype(np.float32)
+        x = paddle.to_variable(x_data)
+        res = paddle.acos(x)
+        print(res.numpy())
+        # [2.5293, 1.0573, 2.2711, 1.5336]
diff --git a/doc/fluid/api_cn/layers_cn/asin_cn.rst b/doc/fluid/api_cn/layers_cn/asin_cn.rst
index 03109d28ec3125c9f1cc5a3e8bd97e63484bde07..3635c8a3f1212b1cc83c4728eef7cca6188d3ab9 100644
--- a/doc/fluid/api_cn/layers_cn/asin_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/asin_cn.rst
@@ -11,29 +11,29 @@ asin
 
 
 
-arcsine激活函数。
+arcsine函数。
 
 .. math::
     out = sin^{-1}(x)
 
-
 参数:
-    - **x(Variable)** - asin的输入Tensor，数据类型为 float32 或 float64
-    - **name** (str|None) – 具体用法请参见 :ref:`cn_api_guide_Name` ，一般无需设置，默认值为None。
+    - x (Tensor) - 输入的Tensor，数据类型为：float32、float64。
+    - name (str，可选) - 操作的名称（可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
 
-返回：  `asin` 的输出Tensor，数据类型与 `x` 相同。
+返回：输出Tensor，与 ``x`` 维度相同、数据类型相同。
 
-返回类型： Variable
+返回类型： Tensor
 
 **代码示例**：
 
 .. code-block:: python
 
-        import paddle.fluid as fluid
-        data = fluid.layers.data(name="input", shape=[4])
-        # if data is [-0.8183,  0.4912, -0.6444,  0.0371]
-        result = fluid.layers.asin(data)
-        # result is [-0.9585,  0.5135, -0.7003,  0.0372]
-
-
+        import paddle
+        import numpy as np
 
+        paddle.disable_static()
+        x_data = np.array([-0.8183,  0.4912, -0.6444,  0.0371]).astype(np.float32)
+        x = paddle.to_variable(x_data)
+        res = paddle.asin(x)
+        print(res.numpy())
+        # [-0.9585,  0.5135, -0.7003,  0.0372]
diff --git a/doc/fluid/api_cn/layers_cn/atan_cn.rst b/doc/fluid/api_cn/layers_cn/atan_cn.rst
index 1c36f104731560ef4918730b21682497cbd415e2..5cd60cd447b2f1322d29842b2a1c3743126849f5 100644
--- a/doc/fluid/api_cn/layers_cn/atan_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/atan_cn.rst
@@ -11,30 +11,29 @@ atan
 
 
 
-arctanh激活函数。
+arctangent函数。
 
 .. math::
-    out = tanh^{-1}(x)
+    out = tan^{-1}(x)
 
 参数:
-    - **x(Variable)** - atan的输入Tensor，数据类型为 float32 或 float64
-    - **name** (str|None) – 具体用法请参见 :ref:`cn_api_guide_Name` ，一般无需设置，默认值为None。
+    - x (Tensor) - 输入的Tensor，数据类型为：float32、float64。
+    - name (str，可选) - 操作的名称（可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
 
-返回：  `atan` 的输出Tensor，数据类型与 `x` 相同。
+返回：输出Tensor，与 ``x`` 维度相同、数据类型相同。
 
-返回类型： Variable
+返回类型： Tensor
 
 **代码示例**：
 
 .. code-block:: python
 
-        import paddle.fluid as fluid
-        data = fluid.layers.data(name="input", shape=[4])
-        # if data is [-0.8183,  0.4912, -0.6444,  0.0371]
-        result = fluid.layers.atan(data)
-        # result is [-0.6858,  0.4566, -0.5724,  0.0371]
-
-
-
-
+        import paddle
+        import numpy as np
 
+        paddle.disable_static()
+        x_data = np.array([-0.8183,  0.4912, -0.6444,  0.0371]).astype(np.float32)
+        x = paddle.to_variable(x_data)
+        res = paddle.atan(x)
+        print(res.numpy())
+        # [-0.6858,  0.4566, -0.5724,  0.0371]
diff --git a/doc/fluid/api_cn/layers_cn/ceil_cn.rst b/doc/fluid/api_cn/layers_cn/ceil_cn.rst
index 27ca3dd547fb43ecf26ce0d499ce39049e2ef1bb..81a8265afe10cfcfa529ee65eb30f15d195cc28d 100644
--- a/doc/fluid/api_cn/layers_cn/ceil_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/ceil_cn.rst
@@ -19,24 +19,24 @@ ceil
 
 
 参数:
-    - **x** (Variable) - 该OP的输入为多维Tensor。数据类型为float32或float64。
-    - **name** (str, 可选) - 具体用法请参见 :ref:`api_guide_Name`，一般无需设置，默认值为None。
+    - x (Tensor) - 输入的Tensor，数据类型为：float32、float64。
+    - name (str，可选) - 操作的名称（可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
 
-返回： 输出为Tensor，与 ``x`` 维度相同、数据类型相同。
+返回：输出Tensor，与 ``x`` 维度相同、数据类型相同。
 
-返回类型： Variable
+返回类型： Tensor
 
 **代码示例**：
 
 .. code-block:: python
 
-  import paddle.fluid as fluid
-  import numpy as np
+        import paddle
+        import numpy as np
 
-  input_ceil = np.array([[-1.5,6],[1,15.6]])
-  with fluid.dygraph.guard():
-      x = fluid.dygraph.to_variable(input_ceil)
-      y = fluid.layers.ceil(x)
-      print(y.numpy())
-      # [[-1.  6.]
-      # [ 1. 16.]]
+        paddle.disable_static()
+        x_data = np.array([[-1.5,6],[1,15.6]]).astype(np.float32)
+        x = paddle.to_variable(x_data)
+        res = paddle.ceil(x)
+        print(res.numpy())
+        # [[-1.  6.]
+        # [ 1. 16.]]
diff --git a/doc/fluid/api_cn/layers_cn/cos_cn.rst b/doc/fluid/api_cn/layers_cn/cos_cn.rst
index 4f31c473c95be1f3b4a46915c505fe29250d11e8..99e6b061f23e642e3ceb8227e77b2f25eeb57d71 100644
--- a/doc/fluid/api_cn/layers_cn/cos_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/cos_cn.rst
@@ -13,32 +13,31 @@ cos
 
 余弦函数。
 
+输入范围是 `(-inf, inf)` ，输出范围是 `[-1,1]`。若输入超出边界则结果为 `nan`。
+
 .. math::
 
     out = cos(x)
 
-
-
 参数:
-    - **x** (Variable) - 该OP的输入为多维Tensor，数据类型为float32，float64。
-    - **name** (str, 可选) - 具体用法请参见 :ref:`api_guide_Name`，一般无需设置，默认值为None。
-
+    - x (Tensor) - 输入的Tensor，数据类型为：float32、float64。
+    - name (str，可选) - 操作的名称（可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
 
-返回：输出为Tensor，与 ``x`` 维度相同、数据类型相同。
+返回：输出Tensor，与 ``x`` 维度相同、数据类型相同。
 
-返回类型：Variable
+返回类型：Tensor
 
 **代码示例**：
 
 .. code-block:: python
 
-  import paddle.fluid as fluid
-  import numpy as np
+        import paddle
+        import numpy as np
 
-  input_cos = np.array([[-1,np.pi],[1,15.6]])
-  with fluid.dygraph.guard():
-      x = fluid.dygraph.to_variable(input_cos)
-      y = fluid.layers.cos(x)
-      print(y.numpy())
-      # [[ 0.54030231 -1.        ]
-      # [ 0.54030231 -0.99417763]]
+        paddle.disable_static()
+        x_data = np.array([[-1,np.pi],[1,15.6]]).astype(np.float32)
+        x = paddle.to_variable(x_data)
+        res = paddle.cos(x)
+        print(res.numpy())
+        # [[ 0.54030231 -1.        ]
+        # [ 0.54030231 -0.99417763]]
diff --git a/doc/fluid/api_cn/layers_cn/dynamic_decode_cn.rst b/doc/fluid/api_cn/layers_cn/dynamic_decode_cn.rst
index 6c439a90a54b5df6be64e743aef1d311d3908f15..ea289057a13d6d8d572c15350a6d87d3d072f03d 100644
--- a/doc/fluid/api_cn/layers_cn/dynamic_decode_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/dynamic_decode_cn.rst
@@ -5,12 +5,12 @@ dynamic_decode
 
 
 
-.. py:method:: dynamic_decode(decoder, inits=None, max_step_num=None, output_time_major=False, **kwargs):
+.. py:method:: dynamic_decode(decoder, inits=None, max_step_num=None, output_time_major=False, impute_finished=False, is_test=False, return_length=False, **kwargs):
 
 :api_attr: 声明式编程模式（静态图)
 
 
-    
+
 该接口重复执行 :code:`decoder.step()` 直到 其返回的表示完成状态的Tensor中的值全部为True或解码步骤达到 :code:`max_step_num`。
 
 :code:`decode.initialize()` 会在解码循环之前被调用一次。如果 :code:`decoder` 实现了 :code:`finalize` 方法，则 :code:`decoder.finalize()` 在解码循环后将被调用一次。
@@ -20,9 +20,12 @@ dynamic_decode
   - **inits** (object，可选) - 传递给 :code:`decoder.initialize` 的参数。默认为None。
   - **max_step_num** (int，可选) - 最大步数。如果未提供，解码直到解码过程完成（ :code:`decode.step()` 返回的表示完成状态的Tensor中的值全部为True）。默认为None。
   - **output_time_major** (bool，可选) - 指明最终输出(此方法的第一个返回值)中包含的Tensor的数据布局。如果为False，其将使用batch优先的数据布局, 此时的形状为 :math:`[batch\_size，seq\_len，...]`。如果为True，其将使用time优先的数据布局，此时的形状为 :math:`[seq\_len，batch\_size，...]`。默认值为False。
+  - **impute_finished** (bool，可选) - 若为True，对于当前批次中完成状态为结束的样本，将会拷贝其上一步的状态，而非像未结束的实例那样使用 :code:`decoder.step()` 返回的 :code:`next_states` 作为新的状态，这保证了返回的最终状态 :code:`final_states` 是正确的；否则，不会区分是否结束，也没有这个拷贝操作。若 :code:`final_states` 会被使用，则这里应该设置为True，这会一定程度上影响速度。默认为False。
+  - **is_test** (bool，可选) - 标识是否是预测模式，预测模式下内存占用会更少。默认为False。
+  - **return_length** (bool，可选) - 标识是否在返回的元组中额外包含一个存放了所有解码序列实际长度的Tensor。默认为False。
   - **kwargs** - 其他命名关键字参数。这些参数将传递给 :code:`decoder.step`。
 
-返回:一个二元组 :code:`(final_outputs，final_states)`, 其包含了最终的输出和状态，这两者都是Tensor或Tensor的嵌套结构。:code:`final_outputs` 具有与 :code:`decoder.output_dtype` 相同的结构和数据类型， 其中的每个tensor都是对所有解码时间步对应输出的堆叠。 这些tensor也可能会通过 :code:`decoder.finalize` 进行修改。:code:`final_states` 是最后时间步的状态，和 :code:`decoder.initialize` 返回的初始状态具有相同的结构，其中的tensor也具有相同的形状 和数据类型。
+返回：若 :code:`return_length` 为True，则返回三元组 :code:`(final_outputs, final_states, sequence_lengths)` ，否则返回二元组 :code:`(final_outputs, final_states)` 。 :code:`final_outputs, final_states` 包含了最终的输出和状态，这两者都是Tensor或Tensor的嵌套结构。:code:`final_outputs` 具有与 :code:`decoder.step()` 返回的 :code:`outputs` 相同的结构和数据类型，且其中的每个tensor都是将所有解码步中与其对应的输出进行堆叠的结果；如果 :code:`decoder` 实现了 :code:`finalize` 方法，这些tensor也可能会通过 :code:`decoder.finalize()` 进行修改。:code:`final_states` 是最后时间步的状态，和 :code:`decoder.initialize()` 返回的初始状态具有相同的结构、形状和数据类型。:code:`sequence_lengths` 是int64类型的tensor，和 :code:`decoder.initialize()` 返回的 :code:`finished` 具有相同的形状，其保存了所有解码序列实际长度。
 
 返回类型：tuple
 
diff --git a/doc/fluid/api_cn/nn_cn.rst b/doc/fluid/api_cn/nn_cn.rst
index b42ec565b8cdb613db774a5630cfa6e7575d850e..f7e47d72f084cff5c7bf4879d6ab59019455935d 100644
--- a/doc/fluid/api_cn/nn_cn.rst
+++ b/doc/fluid/api_cn/nn_cn.rst
@@ -73,7 +73,7 @@ paddle.nn
     nn_cn/GradientClipByValue_cn.rst
     nn_cn/grid_sampler_cn.rst
     nn_cn/GroupNorm_cn.rst
-    nn_cn/hard_shrink_cn.rst
+    nn_cn/hardshrink_cn.rst
     nn_cn/hard_sigmoid_cn.rst
     nn_cn/hard_swish_cn.rst
     nn_cn/hash_cn.rst
@@ -94,6 +94,7 @@ paddle.nn
     nn_cn/linear_lr_warmup_cn.rst
     nn_cn/logsigmoid_cn.rst
     nn_cn/log_loss_cn.rst
+    nn_cn/log_softmax_cn.rst
     nn_cn/lrn_cn.rst
     nn_cn/margin_ranking_loss_cn.rst
     nn_cn/maxout_cn.rst
diff --git a/doc/fluid/api_cn/nn_cn/LogSoftmax_cn.rst b/doc/fluid/api_cn/nn_cn/LogSoftmax_cn.rst
deleted file mode 100644
index 72ed06ecc1caa1b4e7296274e50df6dc623da1e3..0000000000000000000000000000000000000000
--- a/doc/fluid/api_cn/nn_cn/LogSoftmax_cn.rst
+++ /dev/null
@@ -1,47 +0,0 @@
-.. _cn_api_nn_LogSoftmax:
-
-LogSoftmax
--------------------------------
-.. py:class:: paddle.nn.LogSoftmax(axis=None)
-
-:alias_main: paddle.nn.LogSoftmax
-:alias: paddle.nn.LogSoftmax,paddle.nn.layer.LogSoftmax,paddle.nn.layer.activation.LogSoftmax
-
-
-
-
-**LogSoftmax激活层：**
-
-.. math::
-
-        \\output = \frac{1}{1 + e^{-input}}\\
-
-参数:
-    - **axis** (int, 可选) - 指示进行LogSoftmax计算的维度索引，其范围应为 :math:`[-1，rank-1]` ，其中rank是输入变量的秩。默认值：None（与-1效果相同，表示对最后一维做LogSoftmax操作）。
-
-返回：无
-
-**代码示例**
-
-..  code-block:: python
-
-    import paddle.fluid as fluid
-    import paddle.nn as nn
-    import numpy as np
-
-    data = np.array([[[-2.0, 3.0, -4.0, 5.0],
-                      [3.0, -4.0, 5.0, -6.0],
-                      [-7.0, -8.0, 8.0, 9.0]],
-                     [[1.0, -2.0, -3.0, 4.0],
-                      [-5.0, 6.0, 7.0, -8.0],
-                      [6.0, 7.0, 8.0, 9.0]]]).astype('float32')
-    my_log_softnmax = nn.LogSoftmax()
-    with fluid.dygraph.guard():
-        data = fluid.dygraph.to_variable(data)
-        res = my_log_softnmax(data)
-        # [[[ -7.1278396   -2.1278396   -9.127839    -0.12783948]
-        #   [ -2.1270514   -9.127051    -0.12705144 -11.127051  ]
-        #   [-16.313261   -17.313261    -1.3132617   -0.31326184]]
-        #  [[ -3.0518122   -6.051812    -7.051812    -0.051812  ]
-        #   [-12.313267    -1.3132664   -0.3132665  -15.313267  ]
-        #   [ -3.4401896   -2.4401896   -1.4401896   -0.44018966]]]
diff --git a/doc/fluid/api_cn/nn_cn/ReLU_cn.rst b/doc/fluid/api_cn/nn_cn/ReLU_cn.rst
deleted file mode 100644
index 2e8640fa3796d6e7cfcfcb13600b38d08209cfcb..0000000000000000000000000000000000000000
--- a/doc/fluid/api_cn/nn_cn/ReLU_cn.rst
+++ /dev/null
@@ -1,39 +0,0 @@
-.. _cn_api_nn_ReLU:
-
-ReLU
--------------------------------
-.. py:class:: paddle.nn.ReLU(inplace=False)
-
-:alias_main: paddle.nn.ReLU
-:alias: paddle.nn.ReLU,paddle.nn.layer.ReLU,paddle.nn.layer.activation.ReLU
-:update_api: paddle.fluid.layers.relu
-
-
-
-
-**ReLU（Rectified Linear Unit）激活层：**
-
-.. math::
-
-        \\Out = max(X, 0)\\
-
-其中，:math:`X` 为输入的 Tensor
-
-参数:
-    - **inplace** （bool，可选）- 如果 ``inplace`` 为 ``True``，则 ``ReLU`` 的输入和输出是同一个变量，否则 ``ReLU`` 的输入和输出是不同的变量。默认值：``False``。请注意，如果 ``ReLU`` 的输入同时是其它OP的输入，则 ``inplace`` 必须为False。
-
-返回：无
-
-**代码示例**
-
-..  code-block:: python
-
-    import paddle.fluid as fluid
-    import paddle.nn as nn
-    import numpy as np
-
-    data = np.array([-2, 0, 1]).astype('float32')
-    my_relu = nn.ReLU()
-    with fluid.dygraph.guard():
-        data = fluid.dygraph.to_variable(data)
-        res = my_relu(data)  # [0, 0, 1]
diff --git a/doc/fluid/api_cn/nn_cn/activation_cn.rst b/doc/fluid/api_cn/nn_cn/activation_cn.rst
index 79d1258944cf3cc467ec059b87a5ffeaea6ba678..4ba1ca09390a8c4d03da573eca7050eed9c19a21 100644
--- a/doc/fluid/api_cn/nn_cn/activation_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/activation_cn.rst
@@ -8,5 +8,11 @@ activation
 ..  toctree::
     :maxdepth: 1
 
+    activation_cn/ELU_cn.rst
+    activation_cn/GELU_cn.rst
+    activation_cn/Hardshrink_cn.rst
+    activation_cn/ReLU_cn.rst
     activation_cn/LeakyReLU_cn.rst
+    activation_cn/LogSoftmax_cn.rst
     activation_cn/Sigmoid_cn.rst
+    activation_cn/LogSigmoid_cn.rst
diff --git a/doc/fluid/api_cn/nn_cn/activation_cn/ELU_cn.rst b/doc/fluid/api_cn/nn_cn/activation_cn/ELU_cn.rst
new file mode 100644
index 0000000000000000000000000000000000000000..8a2bb7bd2f239c4c6db57b7b9991c91d7ef12100
--- /dev/null
+++ b/doc/fluid/api_cn/nn_cn/activation_cn/ELU_cn.rst
@@ -0,0 +1,41 @@
+.. _cn_api_nn_ELU:
+
+ELU
+-------------------------------
+.. py:class:: paddle.nn.ELU(alpha=1.0, name=None)
+
+ELU激活层（ELU Activation Operator）
+
+根据 `Exponential Linear Units <https://arxiv.org/abs/1511.07289>`_ 对输入Tensor中每个元素应用以下计算。
+
+.. math::
+
+    ELU(x) = max(0, x) + min(0, \alpha * (e^{x} - 1))
+
+其中，:math:`x` 为输入的 Tensor
+
+参数
+::::::::::
+    - alpha (float, 可选) - ELU的alpha值，默认值为1.0。
+    - name (str, 可选) - 操作的名称（可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
+
+形状:
+::::::::::
+    - input: 任意形状的Tensor。
+    - output: 和input具有相同形状的Tensor。
+
+代码示例
+:::::::::
+
+.. code-block:: python
+
+    import paddle
+    import numpy as np
+  
+    paddle.disable_static()
+
+    x = paddle.to_tensor(np.array([[-1, 6],[1, 15.6]]))
+    m = paddle.nn.ELU(0.2)
+    out = m(x)
+    # [[-0.12642411  6.        ]
+    #  [ 1.          15.6      ]]
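Reviewer note: the ELU formula in this file can be checked without Paddle. The following is a minimal pure-Python sketch, assuming the formula exactly as documented; `elu_reference` is a hypothetical helper name used only for illustration and is not part of the Paddle API.

```python
import math


def elu_reference(x, alpha=1.0):
    """Scalar ELU as documented: max(0, x) + min(0, alpha * (e^x - 1))."""
    return max(0.0, x) + min(0.0, alpha * (math.exp(x) - 1.0))


# Same inputs as the documented example, evaluated with alpha=0.2.
values = [[-1.0, 6.0], [1.0, 15.6]]
result = [[elu_reference(v, alpha=0.2) for v in row] for row in values]
print(result)
```

Only the negative entry is changed by the function; non-negative inputs pass through unmodified, which agrees with the output listed in the example.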
diff --git a/doc/fluid/api_cn/nn_cn/activation_cn/GELU_cn.rst b/doc/fluid/api_cn/nn_cn/activation_cn/GELU_cn.rst
new file mode 100644
index 0000000000000000000000000000000000000000..bd04b40302626eae592a5126e990f53ca0fb1ecf
--- /dev/null
+++ b/doc/fluid/api_cn/nn_cn/activation_cn/GELU_cn.rst
@@ -0,0 +1,50 @@
+.. _cn_api_nn_GELU:
+
+GELU
+-------------------------------
+.. py:class:: paddle.nn.GELU(approximate=False, name=None)
+
+GELU激活层（GELU Activation Operator）
+
+更多细节请参考 `Gaussian Error Linear Units <https://arxiv.org/abs/1606.08415>`_ 。
+
+如果使用近似计算：
+
+.. math::
+    GELU(x) = 0.5 * x * (1 + tanh(\sqrt{\frac{2}{\pi}} * (x + 0.044715x^{3})))
+
+如果不使用近似计算：
+
+.. math::
+    GELU(x) = 0.5 * x * (1 + erf(\frac{x}{\sqrt{2}}))
+
+
+其中，:math:`x` 为输入的 Tensor
+
+参数
+::::::::::
+    - approximate (bool, 可选) - 是否使用近似计算，默认值为 False，即不使用近似计算。
+    - name (str, 可选) - 操作的名称（可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
+
+形状:
+::::::::::
+    - input: 任意形状的Tensor。
+    - output: 和input具有相同形状的Tensor。
+
+代码示例
+:::::::::
+
+.. code-block:: python
+
+    import paddle
+    import numpy as np
+
+    paddle.disable_static()
+
+    x = paddle.to_tensor(np.array([[-1, 0.5],[1, 1.5]]))
+    
+    m = paddle.nn.GELU()
+    out = m(x) # [-0.158655 0.345731 0.841345 1.39979]
+
+    m = paddle.nn.GELU(True)
+    out = m(x) # [-0.158808 0.345714 0.841192 1.39957]
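Reviewer note: the two GELU variants documented here differ only slightly in value. The sketch below evaluates both formulas in pure Python, assuming they are exactly as written in this file; `gelu_exact` and `gelu_tanh` are hypothetical helper names, not Paddle functions.

```python
import math


def gelu_exact(x):
    """GELU via the error function: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x):
    """Tanh approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 x^3)))."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# Same inputs as the documented example, flattened.
for v in [-1.0, 0.5, 1.0, 1.5]:
    print(v, gelu_exact(v), gelu_tanh(v))
```

For these inputs the two variants agree to about three decimal places, which matches the pair of outputs quoted in the example.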
diff --git a/doc/fluid/api_cn/nn_cn/activation_cn/Hardshrink_cn.rst b/doc/fluid/api_cn/nn_cn/activation_cn/Hardshrink_cn.rst
new file mode 100644
index 0000000000000000000000000000000000000000..61ba40a8f6e7b9b1a4f09fda94c26ea6b9c34830
--- /dev/null
+++ b/doc/fluid/api_cn/nn_cn/activation_cn/Hardshrink_cn.rst
@@ -0,0 +1,43 @@
+.. _cn_api_nn_Hardshrink:
+
+Hardshrink
+-------------------------------
+.. py:class:: paddle.nn.Hardshrink(threshold=0.5, name=None)
+
+Hardshrink激活层
+
+.. math::
+
+    Hardshrink(x)=
+        \left\{
+        \begin{aligned}
+        &x, & & if \ x > threshold \\
+        &x, & & if \ x < -threshold \\
+        &0, & & if \ others
+        \end{aligned}
+        \right.
+
+其中，:math:`x` 为输入的 Tensor
+
+参数
+::::::::::
+    - threshold (float, 可选) - Hardshrink激活计算公式中的threshold值。默认值为0.5。
+    - name (str, 可选) - 操作的名称（可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
+
+形状:
+    - input: 任意形状的Tensor。
+    - output: 和input具有相同形状的Tensor。
+
+代码示例
+:::::::::
+
+.. code-block:: python
+
+    import paddle
+    import numpy as np
+
+    paddle.disable_static()
+
+    x = paddle.to_variable(np.array([-1, 0.3, 2.5]))
+    m = paddle.nn.Hardshrink()
+    out = m(x) # [-1., 0., 2.5]
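Reviewer note: the piecewise Hardshrink definition can be sanity-checked with a few lines of plain Python. This is a sketch under the assumption that the formula is exactly as documented; `hardshrink_reference` is a hypothetical name, not a Paddle API.

```python
def hardshrink_reference(x, threshold=0.5):
    """Keep x when |x| is strictly greater than threshold, otherwise return 0."""
    if x > threshold or x < -threshold:
        return x
    return 0.0


# Same inputs as the documented example, with the default threshold of 0.5.
shrunk = [hardshrink_reference(v) for v in [-1.0, 0.3, 2.5]]
print(shrunk)
```

Both comparisons are strict, so an input exactly equal to `threshold` or `-threshold` falls into the "others" branch and maps to 0.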
diff --git a/doc/fluid/api_cn/nn_cn/activation_cn/LogSigmoid_cn.rst b/doc/fluid/api_cn/nn_cn/activation_cn/LogSigmoid_cn.rst
new file mode 100644
index 0000000000000000000000000000000000000000..321eb1d9ef0907d8ba1b651bf273b6c8d7dea47c
--- /dev/null
+++ b/doc/fluid/api_cn/nn_cn/activation_cn/LogSigmoid_cn.rst
@@ -0,0 +1,36 @@
+.. _cn_api_nn_LogSigmoid:
+
+LogSigmoid
+-------------------------------
+.. py:class:: paddle.nn.LogSigmoid(name=None)
+
+Logsigmoid激活层。计算公式如下：
+
+.. math::
+
+    Logsigmoid(x) = \log \frac{1}{1 + e^{-x}}
+
+其中，:math:`x` 为输入的 Tensor
+
+参数
+::::::::::
+    - name (str, 可选) - 操作的名称（可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
+
+形状:
+::::::::::
+    - input: 任意形状的Tensor。
+    - output: 和input具有相同形状的Tensor。
+
+代码示例
+:::::::::
+
+.. code-block:: python
+
+    import paddle
+    import numpy as np
+
+    paddle.disable_static()
+
+    x = paddle.to_tensor(np.array([1.0, 2.0, 3.0, 4.0]))
+    m = paddle.nn.LogSigmoid()
+    out = m(x) # [-0.313262, -0.126928, -0.0485874, -0.0181499]
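Reviewer note: because the sigmoid lies strictly between 0 and 1, its logarithm is always negative. The pure-Python sketch below evaluates the documented formula for the example inputs; `logsigmoid_reference` is a hypothetical helper name, and the rearranged form is an assumption chosen here for numerical stability, not necessarily how Paddle computes it.

```python
import math


def logsigmoid_reference(x):
    """log(1 / (1 + e^-x)), rewritten as min(x, 0) - log1p(e^-|x|) to avoid overflow."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


# Same inputs as the documented example.
outputs = [logsigmoid_reference(v) for v in [1.0, 2.0, 3.0, 4.0]]
print(outputs)
```

Exponentiating each result recovers the plain sigmoid values 0.7310586, 0.880797, 0.95257413 and 0.98201376.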
diff --git a/doc/fluid/api_cn/nn_cn/activation_cn/LogSoftmax_cn.rst b/doc/fluid/api_cn/nn_cn/activation_cn/LogSoftmax_cn.rst
new file mode 100644
index 0000000000000000000000000000000000000000..96bbc3a886f9535bf02cb1954645d5182e669120
--- /dev/null
+++ b/doc/fluid/api_cn/nn_cn/activation_cn/LogSoftmax_cn.rst
@@ -0,0 +1,47 @@
+.. _cn_api_nn_LogSoftmax:
+
+LogSoftmax
+-------------------------------
+.. py:class:: paddle.nn.LogSoftmax(axis=-1, name=None)
+
+LogSoftmax激活层，计算公式如下：
+
+.. math::
+
+    Out[i, j] = log(softmax(x))
+              = log(\frac{\exp(X[i, j])}{\sum_j \exp(X[i, j])})
+
+参数
+::::::::::
+    - axis (int, 可选) - 指定对输入Tensor进行运算的轴。``axis`` 的有效范围是[-D, D)，D是输入Tensor的维度， ``axis`` 为负值时与 :math:`axis + D` 等价。默认值为-1。
+    - name (str, 可选) - 操作的名称（可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
+
+形状:
+    - input: 任意形状的Tensor。
+    - output: 和input具有相同形状的Tensor。
+
+代码示例
+:::::::::
+
+.. code-block:: python
+
+    import paddle
+    import numpy as np
+
+    paddle.disable_static()
+
+    x = np.array([[[-2.0, 3.0, -4.0, 5.0],
+                    [3.0, -4.0, 5.0, -6.0],
+                    [-7.0, -8.0, 8.0, 9.0]],
+                    [[1.0, -2.0, -3.0, 4.0],
+                    [-5.0, 6.0, 7.0, -8.0],
+                    [6.0, 7.0, 8.0, 9.0]]], 'float32')
+    m = paddle.nn.LogSoftmax()
+    x = paddle.to_tensor(x)
+    out = m(x)
+    # [[[ -7.1278396   -2.1278396   -9.127839    -0.12783948]
+    #   [ -2.1270514   -9.127051    -0.12705144 -11.127051  ]
+    #   [-16.313261   -17.313261    -1.3132617   -0.31326184]]
+    #  [[ -3.0518122   -6.051812    -7.051812    -0.051812  ]
+    #   [-12.313267    -1.3132664   -0.3132665  -15.313267  ]
+    #   [ -3.4401896   -2.4401896   -1.4401896   -0.44018966]]]
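Reviewer note: the LogSoftmax formula can be verified for a single row along the last axis with the standard library alone. This is a minimal sketch that assumes the formula as documented and uses the usual max-subtraction trick for stability; `log_softmax_reference` is a hypothetical name, not a Paddle function.

```python
import math


def log_softmax_reference(row):
    """log(exp(x_j) / sum_k exp(x_k)) for one row, computed as x_j - logsumexp(row)."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_sum for v in row]


# First row of the documented example input.
first_row = log_softmax_reference([-2.0, 3.0, -4.0, 5.0])
print(first_row)
```

Exponentiating the entries of any output row and summing them gives 1, which is a quick way to check results along the chosen `axis`.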
diff --git a/doc/fluid/api_cn/nn_cn/activation_cn/ReLU_cn.rst b/doc/fluid/api_cn/nn_cn/activation_cn/ReLU_cn.rst
new file mode 100644
index 0000000000000000000000000000000000000000..ac39225bf61db1653fc7d62a7339a824f992aa0e
--- /dev/null
+++ b/doc/fluid/api_cn/nn_cn/activation_cn/ReLU_cn.rst
@@ -0,0 +1,36 @@
+.. _cn_api_nn_ReLU:
+
+ReLU
+-------------------------------
+.. py:class:: paddle.nn.ReLU(name=None)
+
+ReLU激活层（Rectified Linear Unit）。计算公式如下：
+
+.. math::
+
+    ReLU(x) = max(0, x)
+
+其中，:math:`x` 为输入的 Tensor
+
+参数
+::::::::::
+    - name (str, 可选) - 操作的名称（可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
+
+形状:
+::::::::::
+    - input: 任意形状的Tensor。
+    - output: 和input具有相同形状的Tensor。
+
+代码示例
+:::::::::
+
+.. code-block:: python
+
+    import paddle
+    import numpy as np
+  
+    paddle.disable_static()
+
+    x = paddle.to_tensor(np.array([-2, 0, 1]).astype('float32'))
+    m = paddle.nn.ReLU()
+    out = m(x) # [0., 0., 1.]
diff --git a/doc/fluid/api_cn/nn_cn/elu_cn.rst b/doc/fluid/api_cn/nn_cn/elu_cn.rst
index 59b3119aa3874a0280afd265f118733e9b33c19a..2006e70251db3b67675083623f114e7f273ef47b 100644
--- a/doc/fluid/api_cn/nn_cn/elu_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/elu_cn.rst
@@ -2,6 +2,43 @@
 
 elu
 -------------------------------
-:doc_source: paddle.fluid.layers.elu
+
+.. py:function:: paddle.nn.functional.elu(x, alpha=1.0, name=None)
+
+elu激活层（ELU Activation Operator）
+
+根据 `Exponential Linear Units <https://arxiv.org/abs/1511.07289>`_ 对输入Tensor中每个元素应用以下计算。
+
+.. math::
+
+    elu(x) = max(0, x) + min(0, \alpha * (e^{x} - 1))
+
+其中，:math:`x` 为输入的 Tensor
+
+参数:
+::::::::::
+ - x (Tensor) - 输入的 ``Tensor`` ，数据类型为：float32、float64。
+ - alpha (float, 可选) - elu的alpha值，默认值为1.0。
+ - name (str, 可选) - 操作的名称（可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
+
+返回
+::::::::::
+    ``Tensor`` ，数据类型和形状同 ``x`` 一致。
+
+代码示例
+::::::::::
+
+.. code-block:: python
+
+    import paddle
+    import paddle.nn.functional as F
+    import numpy as np
+
+    paddle.disable_static()
+
+    x = paddle.to_tensor(np.array([[-1,6],[1,15.6]]))
+    out = F.elu(x, alpha=0.2) 
+    # [[-0.12642411  6.        ]
+    #  [ 1.          15.6      ]]
 
 
diff --git a/doc/fluid/api_cn/nn_cn/functional_cn.rst b/doc/fluid/api_cn/nn_cn/functional_cn.rst
index b314bbe0ef25f09151745db0d91bdc8404eb540f..b6400159895b15562b72a9115443119230e02c42 100644
--- a/doc/fluid/api_cn/nn_cn/functional_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/functional_cn.rst
@@ -10,4 +10,7 @@ functional
 
     functional_cn/l1_loss_cn.rst
     functional_cn/nll_loss_cn.rst
+    functional_cn/normalize_cn.rst
     functional_cn/margin_ranking_loss_cn.rst
+    functional_cn/mse_loss_cn.rst
+    functional_cn/sigmoid_cn.rst
diff --git a/doc/fluid/api_cn/nn_cn/functional_cn/l1_loss_cn.rst b/doc/fluid/api_cn/nn_cn/functional_cn/l1_loss_cn.rst
index d7bf747f4d1720f65bfbab23738cc0ddc2389b3f..77c9a232a61d8f922a3cfd70c3e95508249f199b 100644
--- a/doc/fluid/api_cn/nn_cn/functional_cn/l1_loss_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/functional_cn/l1_loss_cn.rst
@@ -1,38 +1,38 @@
 l1_loss
 -------------------------------
 
-.. py:function:: paddle.nn.functional.l1_loss(x, label, reduction='mean', name=None)
+.. py:function:: paddle.nn.functional.l1_loss(input, label, reduction='mean', name=None)
 
-该接口计算输入 ``x`` 和标签 ``label`` 间的 `L1 loss` 损失。
+该接口计算输入 ``input`` 和标签 ``label`` 间的 `L1 loss` 损失。
 
 该损失函数的数学计算公式如下：
 
 当 `reduction` 设置为 ``'none'`` 时，
     
     .. math::
-        Out = \lvert x - label\rvert
+        Out = \lvert input - label\rvert
 
 当 `reduction` 设置为 ``'mean'`` 时，
 
     .. math::
-       Out = MEAN(\lvert x - label\rvert)
+       Out = MEAN(\lvert input - label\rvert)
 
 当 `reduction` 设置为 ``'sum'`` 时，
     
     .. math::
-       Out = SUM(\lvert x - label\rvert)
+       Out = SUM(\lvert input - label\rvert)
 
 
 参数
 :::::::::
-    - **x** (Tensor): - 输入的Tensor，维度是[N, *], 其中N是batch size， `*` 是任意数量的额外维度。数据类型为：float32、float64、int32、int64。
-    - **label** (Tensor): - 标签，维度是[N, *], 与 ``x`` 相同。数据类型为：float32、float64、int32、int64。
+    - **input** (Tensor): - 输入的Tensor，维度是[N, *], 其中N是batch size， `*` 是任意数量的额外维度。数据类型为：float32、float64、int32、int64。
+    - **label** (Tensor): - 标签，维度是[N, *], 与 ``input`` 相同。数据类型为：float32、float64、int32、int64。
     - **reduction** (str, 可选): - 指定应用于输出结果的计算方式，可选值有: ``'none'``, ``'mean'``, ``'sum'`` 。默认为 ``'mean'``，计算 `L1Loss` 的均值；设置为 ``'sum'`` 时，计算 `L1Loss` 的总和；设置为 ``'none'`` 时，则返回 `L1Loss`。
     - **name** (str，可选): - 操作的名称(可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
 
 返回
 :::::::::
-``Tensor``, 输入 ``x`` 和标签 ``label`` 间的 `L1 loss` 损失。如果 :attr:`reduction` 是 ``'none'``, 则输出Loss的维度为 [N, *], 与输入 ``x`` 相同。如果 :attr:`reduction` 是 ``'mean'`` 或 ``'sum'``, 则输出Loss的维度为 [1]。
+``Tensor``, 输入 ``input`` 和标签 ``label`` 间的 `L1 loss` 损失。如果 `reduction` 是 ``'none'``, 则输出Loss的维度为 [N, *], 与输入 ``input`` 相同。如果 `reduction` 是 ``'mean'`` 或 ``'sum'``, 则输出Loss的维度为 [1]。
 
 
 代码示例
@@ -40,24 +40,24 @@ l1_loss
 
 .. code-block:: python
 
-        import paddle
         import numpy as np
+        import paddle
         
         paddle.disable_static()
-        x_data = np.array([[1.5, 0.8], [0.2, 1.3]]).astype("float32")
+        input_data = np.array([[1.5, 0.8], [0.2, 1.3]]).astype("float32")
         label_data = np.array([[1.7, 1], [0.4, 0.5]]).astype("float32")
-        x = paddle.to_variable(x_data)
+        input = paddle.to_variable(input_data)
         label = paddle.to_variable(label_data)
 
-        l1_loss = paddle.nn.functional.l1_loss(x, label)
+        l1_loss = paddle.nn.functional.l1_loss(input, label)
         print(l1_loss.numpy())  
         # [0.35]
 
-        l1_loss = paddle.nn.functional.l1_loss(x, label, reduction='none')
+        l1_loss = paddle.nn.functional.l1_loss(input, label, reduction='none')
         print(l1_loss.numpy())  
         # [[0.20000005 0.19999999]
         # [0.2        0.79999995]]
 
-        l1_loss = paddle.nn.functional.l1_loss(x, label, reduction='sum')
+        l1_loss = paddle.nn.functional.l1_loss(input, label, reduction='sum')
         print(l1_loss.numpy())  
         # [1.4]
diff --git a/doc/fluid/api_cn/nn_cn/functional_cn/mse_loss_cn.rst b/doc/fluid/api_cn/nn_cn/functional_cn/mse_loss_cn.rst
new file mode 100644
index 0000000000000000000000000000000000000000..99041e8ac0bf5c0f5558c096d6e152b8b62b9094
--- /dev/null
+++ b/doc/fluid/api_cn/nn_cn/functional_cn/mse_loss_cn.rst
@@ -0,0 +1,66 @@
+mse_loss
+-------------------------------
+
+.. py:function:: paddle.nn.functional.mse_loss(input, label, reduction='mean', name=None)
+
+该OP用于计算预测值和目标值的均方差误差。
+
+对于预测值input和目标值label，公式为：
+
+当 `reduction` 设置为 ``'none'`` 时，
+    
+    .. math::
+        Out = (input - label)^2
+
+当 `reduction` 设置为 ``'mean'`` 时，
+
+    .. math::
+       Out = \operatorname{mean}((input - label)^2)
+
+当 `reduction` 设置为 ``'sum'`` 时，
+    
+    .. math::
+       Out = \operatorname{sum}((input - label)^2)
+
+
+参数：
+:::::::::
+    - **input** (Tensor) - 预测值，维度为 :math:`[N_1, N_2, ..., N_k]` 的多维Tensor。数据类型为float32或float64。
+    - **label** (Tensor) - 目标值，维度为 :math:`[N_1, N_2, ..., N_k]` 的多维Tensor。数据类型为float32或float64。
+
+返回
+:::::::::
+``Tensor``, 输入 ``input`` 和标签 ``label`` 间的 `mse loss` 损失。
+
+**代码示例**：
+
+.. code-block:: python
+
+    import numpy as np
+    import paddle
+    # static graph mode
+    paddle.enable_static()
+    mse_loss = paddle.nn.loss.MSELoss()
+    input = paddle.data(name="input", shape=[1])
+    label = paddle.data(name="label", shape=[1])
+    place = paddle.CPUPlace()
+    input_data = np.array([1.5]).astype("float32")
+    label_data = np.array([1.7]).astype("float32")
+    output = mse_loss(input,label)
+    exe = paddle.static.Executor(place)
+    exe.run(paddle.static.default_startup_program())
+    output_data = exe.run(
+        paddle.static.default_main_program(),
+        feed={"input":input_data, "label":label_data},
+        fetch_list=[output],
+        return_numpy=True)
+    print(output_data)
+    # [array([0.04000002], dtype=float32)]
+    # dynamic graph mode
+    paddle.disable_static()
+    input = paddle.to_variable(input_data)
+    label = paddle.to_variable(label_data)
+    output = mse_loss(input, label)
+    print(output.numpy())
+    # [0.04000002]
+
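Reviewer note: the three `reduction` modes documented for `mse_loss` can be illustrated in plain Python. The sketch assumes the formulas as written in this file and works on flat lists only; `mse_loss_reference` is a hypothetical helper, not the Paddle implementation.

```python
def mse_loss_reference(inputs, labels, reduction="mean"):
    """Squared error between two equal-length lists with 'none', 'mean' or 'sum' reduction."""
    squared = [(i - l) ** 2 for i, l in zip(inputs, labels)]
    if reduction == "none":
        return squared
    if reduction == "sum":
        return sum(squared)
    if reduction == "mean":
        return sum(squared) / len(squared)
    raise ValueError("reduction must be 'none', 'mean' or 'sum'")


# Same single-element data as the documented example.
loss = mse_loss_reference([1.5], [1.7])
print(loss)
```

With a single element all three modes yield the same number, so an example with more than one element is needed to tell `'mean'` and `'sum'` apart.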
diff --git a/doc/fluid/api_cn/nn_cn/functional_cn/normalize_cn.rst b/doc/fluid/api_cn/nn_cn/functional_cn/normalize_cn.rst
new file mode 100644
index 0000000000000000000000000000000000000000..04b5b185be37be17039652abf3e7c854b925d644
--- /dev/null
+++ b/doc/fluid/api_cn/nn_cn/functional_cn/normalize_cn.rst
@@ -0,0 +1,59 @@
+normalize
+-------------------------------
+
+.. py:function:: paddle.nn.functional.normalize(x, p=2, axis=1, epsilon=1e-12, name=None)
+
+该接口使用 :math:`L_p` 范数沿维度 ``axis`` 对 ``x`` 进行归一化。计算公式如下：
+
+.. math::
+
+    y = \frac{x}{ \max\left( \lvert \lvert x \rvert \rvert_p, epsilon\right) }
+
+.. math::
+    \lvert \lvert x \rvert \rvert_p = \left(\sum_i {\lvert x_i\rvert^p}  \right)^{1/p}
+
+其中 :math:`\sum_i{\lvert x_i\rvert^p}` 沿维度 ``axis`` 进行计算。
+
+
+参数
+:::::::::
+    - **x** (Tensor) - 输入可以是N-D Tensor。数据类型为：float32、float64。
+    - **p** (float|int, 可选) - 范数公式中的指数值。默认值:2
+    - **axis** (int, 可选) - 要进行归一化的轴。如果 ``x`` 是1-D Tensor，轴固定为0。如果 `axis < 0`，轴为 `x.ndim + axis`。-1表示最后一维。
+    - **epsilon** (float，可选) - 添加到分母上的值以防止分母除0。默认值为1e-12。
+    - **name** (str，可选) - 操作的名称（可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
+
+返回
+:::::::::
+``Tensor``, 输出的形状和数据类型和 ``x`` 相同。
+
+抛出异常：
+:::::::::
+    - ``TypeError`` - 当参数  ``p`` 或者 ``axis`` 的类型不符合要求时。或者当参数 ``x`` 的类型或数据类型不符合要求时。
+
+代码示例
+:::::::::
+
+.. code-block:: python
+
+        import numpy as np
+        import paddle
+        import paddle.nn.functional as F
+
+        paddle.disable_static()
+        x = np.arange(6, dtype=np.float32).reshape(2,3)
+        x = paddle.to_variable(x)
+        y = F.normalize(x)
+        print(y.numpy())
+        # [[0.         0.4472136  0.8944272 ]
+        # [0.42426404 0.5656854  0.7071067 ]]
+
+        y = F.normalize(x, p=1.5)
+        print(y.numpy())
+        # [[0.         0.40862012 0.81724024]
+        # [0.35684016 0.4757869  0.5947336 ]]
+
+        y = F.normalize(x, axis=0)
+        print(y.numpy())
+        # [[0.         0.24253564 0.37139067]
+        # [1.         0.97014254 0.9284767 ]]
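Reviewer note: the :math:`L_p` normalization formula can be reproduced for a single vector, that is, one slice along `axis`, in pure Python. This is a sketch assuming the formula as documented; `lp_normalize` is a hypothetical helper name, not a Paddle function.

```python
def lp_normalize(vec, p=2, epsilon=1e-12):
    """Divide vec by max(||vec||_p, epsilon), where ||vec||_p = (sum |v|^p)^(1/p)."""
    norm = sum(abs(v) ** p for v in vec) ** (1.0 / p)
    return [v / max(norm, epsilon) for v in vec]


# First row of np.arange(6).reshape(2, 3), normalized along axis=1.
row_l2 = lp_normalize([0.0, 1.0, 2.0])
row_p15 = lp_normalize([0.0, 1.0, 2.0], p=1.5)
print(row_l2, row_p15)
```

The `epsilon` floor only matters for vectors whose norm is close to zero; an all-zero vector is returned unchanged instead of triggering a division by zero.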
diff --git a/doc/fluid/api_cn/nn_cn/functional_cn/sigmoid_cn.rst b/doc/fluid/api_cn/nn_cn/functional_cn/sigmoid_cn.rst
new file mode 100644
index 0000000000000000000000000000000000000000..0de495bc57a2e2d834d6d9b8fbac79c7ce34c975
--- /dev/null
+++ b/doc/fluid/api_cn/nn_cn/functional_cn/sigmoid_cn.rst
@@ -0,0 +1,36 @@
+.. _cn_api_nn_functional_sigmoid:
+
+sigmoid
+-------------------------------
+
+.. py:function:: paddle.nn.functional.sigmoid(x, name=None)
+
+
+
+sigmoid激活函数
+
+.. math::
+    out = \frac{1}{1 + e^{-x}}
+
+
+参数：
+    - **x** (Tensor) - 输入的 `Tensor` ，数据类型为：float32、float64。
+    - **name** (str，可选) - 操作的名称（可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name` 。
+
+返回：
+    - Tensor，对输入x进行sigmoid激活后的Tensor，形状、数据类型与输入x一致。
+
+
+**代码示例**：
+
+.. code-block:: python
+
+    import numpy as np
+    import paddle
+    import paddle.nn.functional as F
+    paddle.disable_static()
+    x_data = np.array([-0.4, -0.2, 0.1, 0.3])
+    x = paddle.to_tensor(x_data)
+    out = F.sigmoid(x)
+    print(out.numpy())
+    # [0.40131234 0.450166   0.52497919 0.57444252]
diff --git a/doc/fluid/api_cn/nn_cn/gelu_cn.rst b/doc/fluid/api_cn/nn_cn/gelu_cn.rst
index b91b33eef3cfb2444803086f6e6a54ba79fb4e8d..586cd3677d7fddeeddccb47df01b46125dddba08 100644
--- a/doc/fluid/api_cn/nn_cn/gelu_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/gelu_cn.rst
@@ -2,6 +2,47 @@
 
 gelu
 -------------------------------
-:doc_source: paddle.fluid.layers.gelu
 
+.. py:function:: paddle.nn.functional.gelu(x, approximate=False, name=None)
+
+gelu激活层（GELU Activation Operator）
+
+逐元素计算 gelu激活函数。更多细节请参考 `Gaussian Error Linear Units <https://arxiv.org/abs/1606.08415>`_ 。
+
+如果使用近似计算：
+
+.. math::
+    gelu(x) = 0.5 * x * (1 + tanh(\sqrt{\frac{2}{\pi}} * (x + 0.044715x^{3})))
+
+如果不使用近似计算：
+
+.. math::
+    gelu(x) = 0.5 * x * (1 + erf(\frac{x}{\sqrt{2}}))
+
+其中，:math:`x` 为输入的 Tensor
+
+参数:
+::::::::::
+ - x (Tensor) - 输入的 ``Tensor`` ，数据类型为：float32、float64。
+ - approximate (bool, 可选) - 是否使用近似计算，默认值为 False，表示不使用近似计算。
+ - name (str, 可选) - 操作的名称（可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
+
+返回
+::::::::::
+    ``Tensor`` ，数据类型和形状同 ``x`` 一致。
+
+代码示例
+::::::::::
+
+.. code-block:: python
+
+    import paddle
+    import paddle.nn.functional as F
+    import numpy as np
+
+    paddle.disable_static()
+
+    x = paddle.to_tensor(np.array([[-1, 0.5],[1, 1.5]]))
+    out1 = F.gelu(x) # [-0.158655 0.345731 0.841345 1.39979]
+    out2 = F.gelu(x, True) # [-0.158808 0.345714 0.841192 1.39957]
 
diff --git a/doc/fluid/api_cn/nn_cn/hard_shrink_cn.rst b/doc/fluid/api_cn/nn_cn/hard_shrink_cn.rst
deleted file mode 100644
index 336fef4da117064d3ff1c8fd18d2c5f1fe9ae603..0000000000000000000000000000000000000000
--- a/doc/fluid/api_cn/nn_cn/hard_shrink_cn.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-.. _cn_api_nn_cn_hard_shrink:
-
-hard_shrink
--------------------------------
-:doc_source: paddle.fluid.layers.hard_shrink
-
-
diff --git a/doc/fluid/api_cn/nn_cn/hardshrink_cn.rst b/doc/fluid/api_cn/nn_cn/hardshrink_cn.rst
new file mode 100644
index 0000000000000000000000000000000000000000..cb837884b4213834e11c41390a5da07a6c47079b
--- /dev/null
+++ b/doc/fluid/api_cn/nn_cn/hardshrink_cn.rst
@@ -0,0 +1,42 @@
+.. _cn_api_nn_cn_hard_shrink:
+
+hardshrink
+-------------------------------
+.. py:function:: paddle.nn.functional.hardshrink(x, threshold=0.5, name=None)
+
+hardshrink激活层。计算公式如下：
+
+.. math::
+
+    hardshrink(x)=
+        \left\{
+        \begin{aligned}
+        &x, & & if \ x > threshold \\
+        &x, & & if \ x < -threshold \\
+        &0, & & otherwise
+        \end{aligned}
+        \right.
+
+其中，:math:`x` 为输入的 Tensor
+
+参数
+::::::::::
+    - x (Tensor) - 输入的 ``Tensor`` ，数据类型为：float32、float64。
+    - threshold (float, 可选) - hardshrink激活计算公式中的threshold值。默认值为0.5。
+    - name (str, 可选) - 操作的名称(可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
+
+返回
+::::::::::
+    ``Tensor`` ，数据类型和形状同 ``x`` 一致。
+
+代码示例
+::::::::::
+
+.. code-block:: python
+
+    import paddle
+    import paddle.nn.functional as F
+    import numpy as np
+
+    paddle.disable_static()
+
+    x = paddle.to_variable(np.array([-1, 0.3, 2.5]))
+    out = F.hardshrink(x) # [-1., 0., 2.5]
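The piecewise definition of hardshrink can be sketched in plain Python. The scalar helper `hardshrink` below is hypothetical and only mirrors the documented formula with the documented default `threshold=0.5`.

```python
def hardshrink(x, threshold=0.5):
    """Keep x when |x| exceeds the threshold, otherwise return 0."""
    if x > threshold or x < -threshold:
        return x
    return 0.0


result = [hardshrink(v) for v in [-1.0, 0.3, 2.5]]
print(result)
```

Values exactly equal to `threshold` or `-threshold` fall into the zero branch, because both comparisons in the formula are strict.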
diff --git a/doc/fluid/api_cn/nn_cn/log_softmax_cn.rst b/doc/fluid/api_cn/nn_cn/log_softmax_cn.rst
new file mode 100644
index 0000000000000000000000000000000000000000..5509a6a4f18b928ffa4426b7bedfda88926f5017
--- /dev/null
+++ b/doc/fluid/api_cn/nn_cn/log_softmax_cn.rst
@@ -0,0 +1,51 @@
+.. _cn_api_nn_cn_log_softmax:
+
+log_softmax
+-------------------------------
+.. py:function:: paddle.nn.functional.log_softmax(x, axis=-1, dtype=None, name=None)
+
+该OP实现了log_softmax层。OP的计算公式如下：
+
+.. math::
+
+    Out[i, j] = \log(softmax(x)) = \log\left(\frac{\exp(X[i, j])}{\sum_j \exp(X[i, j])}\right)
+
+参数
+::::::::::
+    - x (Tensor) - 输入的 ``Tensor`` ，数据类型为：float32、float64。
+    - axis (int, 可选) - 指定对输入 ``x`` 进行运算的轴。``axis`` 的有效范围是[-D, D)，D是输入 ``x`` 的维度， ``axis`` 为负值时与 :math:`axis + D` 等价。默认值为-1。
+    - dtype (str|np.dtype|core.VarDesc.VarType, 可选) - 输入Tensor的数据类型。如果指定了 ``dtype`` ，则输入Tensor的数据类型会在计算前转换到 ``dtype`` 。 ``dtype`` 可以用来避免数据溢出。如果 ``dtype`` 为None，则输出Tensor的数据类型和 ``x`` 相同。默认值为None。
+    - name (str, 可选) - 操作的名称(可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
+
+返回
+::::::::::
+    ``Tensor`` ，形状和 ``x`` 相同，数据类型为 ``dtype`` 或者和 ``x`` 相同。
+
+代码示例
+::::::::::
+
+.. code-block:: python
+
+    import paddle
+    import paddle.nn.functional as F
+    import numpy as np
+
+    paddle.disable_static()
+
+    x = np.array([[[-2.0, 3.0, -4.0, 5.0],
+                    [3.0, -4.0, 5.0, -6.0],
+                    [-7.0, -8.0, 8.0, 9.0]],
+                    [[1.0, -2.0, -3.0, 4.0],
+                    [-5.0, 6.0, 7.0, -8.0],
+                    [6.0, 7.0, 8.0, 9.0]]]).astype('float32')
+    x = paddle.to_tensor(x)
+    out1 = F.log_softmax(x)
+    out2 = F.log_softmax(x, dtype='float64')
+    # out1 has data type float32; out2 has data type float64
+    # out1 and out2 have the following values:
+    # [[[ -7.1278396   -2.1278396   -9.127839    -0.12783948]
+    #   [ -2.1270514   -9.127051    -0.12705144 -11.127051  ]
+    #   [-16.313261   -17.313261    -1.3132617   -0.31326184]]
+    #  [[ -3.0518122   -6.051812    -7.051812    -0.051812  ]
+    #   [-12.313267    -1.3132664   -0.3132665  -15.313267  ]
+    #   [ -3.4401896   -2.4401896   -1.4401896   -0.44018966]]]
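To see where the numbers in the example come from, the log_softmax formula can be evaluated for a single row in plain Python. This is a sketch under the assumption that the operator is applied along the last axis (the documented default `axis=-1`). The helper `log_softmax` and the max-subtraction trick are illustrative, not taken from Paddle's source.

```python
import math


def log_softmax(row):
    """log(softmax(row)) for one 1-D row, computed in a numerically stable way."""
    m = max(row)  # subtracting the max keeps exp() from overflowing
    log_sum = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_sum for v in row]


row = [-2.0, 3.0, -4.0, 5.0]  # first row of the example input
out = log_softmax(row)
print(out)
```

Exponentiating the result and summing gives 1, which is a quick sanity check for any log_softmax output.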
diff --git a/doc/fluid/api_cn/nn_cn/logsigmoid_cn.rst b/doc/fluid/api_cn/nn_cn/logsigmoid_cn.rst
index afe57610df60397347eb246c4d3b3e0098a307a1..0bbb5f3ca510f293705777512dbdd024dc629efa 100644
--- a/doc/fluid/api_cn/nn_cn/logsigmoid_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/logsigmoid_cn.rst
@@ -2,6 +2,36 @@
 
 logsigmoid
 -------------------------------
-:doc_source: paddle.fluid.layers.logsigmoid
 
+.. py:function:: paddle.nn.functional.logsigmoid(x, name=None)
 
+logsigmoid激活层。计算公式如下：
+
+.. math::
+
+    logsigmoid(x) = \log \frac{1}{1 + e^{-x}}
+
+其中，:math:`x` 为输入的 Tensor
+
+参数
+::::::::::
+    - x (Tensor) - 输入的 ``Tensor`` ，数据类型为：float32、float64。
+    - name (str, 可选) - 操作的名称(可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
+
+返回
+::::::::::
+    ``Tensor`` ，数据类型和形状同 ``x`` 一致。
+
+代码示例
+::::::::::
+
+.. code-block:: python
+
+    import paddle
+    import paddle.nn.functional as F
+    import numpy as np
+
+    paddle.disable_static()
+
+    x = paddle.to_tensor(np.array([1.0, 2.0, 3.0, 4.0]))
+    out = F.logsigmoid(x) # [-0.313262, -0.126928, -0.0485874, -0.0181499]
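Since sigmoid(x) lies in (0, 1), logsigmoid(x) is always negative. The sketch below evaluates the documented formula in plain Python. The helper name `logsigmoid` and the branch on the sign of `x`, which avoids overflow in `exp`, are assumptions made for illustration and say nothing about how Paddle computes it.

```python
import math


def logsigmoid(x):
    """log(1 / (1 + exp(-x))) evaluated without overflow for large |x|."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    # for negative x, rewrite as x - log(1 + exp(x))
    return x - math.log1p(math.exp(x))


out = [logsigmoid(v) for v in [1.0, 2.0, 3.0, 4.0]]
print(out)
```

For strongly negative inputs the result approaches `x` itself, and for strongly positive inputs it approaches 0 from below.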
diff --git a/doc/fluid/api_cn/nn_cn/loss_cn/L1Loss_cn.rst b/doc/fluid/api_cn/nn_cn/loss_cn/L1Loss_cn.rst
index 71f366e326e910ee35528ee4c299cc2175a8e329..f5a41b3a80888c195da84a470a45c87d70b08ed3 100644
--- a/doc/fluid/api_cn/nn_cn/loss_cn/L1Loss_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/loss_cn/L1Loss_cn.rst
@@ -3,24 +3,24 @@ L1Loss
 
 .. py:class:: paddle.nn.loss.L1Loss(reduction='mean', name=None)
 
-该接口用于创建一个L1Loss的可调用类，L1Loss计算输入x和标签label间的 `L1 loss` 损失。
+该接口用于创建一个L1Loss的可调用类，L1Loss计算输入input和标签label间的 `L1 loss` 损失。
 
 该损失函数的数学计算公式如下：
 
 当 `reduction` 设置为 ``'none'`` 时，
     
     .. math::
-        Out = \lvert x - label\rvert
+        Out = \lvert input - label\rvert
 
 当 `reduction` 设置为 ``'mean'`` 时，
 
     .. math::
-       Out = MEAN(\lvert x - label\rvert)
+       Out = MEAN(\lvert input - label\rvert)
 
 当 `reduction` 设置为 ``'sum'`` 时，
     
     .. math::
-       Out = SUM(\lvert x - label\rvert)
+       Out = SUM(\lvert input - label\rvert)
 
 
 参数
@@ -30,36 +30,36 @@ L1Loss
 
 形状
 :::::::::
-    - **x** (Tensor): - 输入的Tensor，维度是[N, *], 其中N是batch size， `*` 是任意数量的额外维度。数据类型为：float32、float64、int32、int64。
-    - **label** (Tensor): - 标签，维度是[N, *], 与 ``x`` 相同。数据类型为：float32、float64、int32、int64。
-    - **output** (Tensor): - 输入 ``x`` 和标签 ``label`` 间的 `L1 loss` 损失。如果 :attr:`reduction` 是 ``'none'``, 则输出Loss的维度为 [N, *], 与输入 ``x`` 相同。如果 :attr:`reduction` 是 ``'mean'`` 或 ``'sum'``, 则输出Loss的维度为 [1]。
+    - **input** (Tensor): - 输入的Tensor，维度是[N, *], 其中N是batch size， `*` 是任意数量的额外维度。数据类型为：float32、float64、int32、int64。
+    - **label** (Tensor): - 标签，维度是[N, *], 与 ``input`` 相同。数据类型为：float32、float64、int32、int64。
+    - **output** (Tensor): - 输入 ``input`` 和标签 ``label`` 间的 `L1 loss` 损失。如果 `reduction` 是 ``'none'``, 则输出Loss的维度为 [N, *], 与输入 ``input`` 相同。如果 `reduction` 是 ``'mean'`` 或 ``'sum'``, 则输出Loss的维度为 [1]。
 
 代码示例
 :::::::::
 
 .. code-block:: python
 
-        import paddle
         import numpy as np
+        import paddle
 
         paddle.disable_static()
-        x_data = np.array([[1.5, 0.8], [0.2, 1.3]]).astype("float32")
+        input_data = np.array([[1.5, 0.8], [0.2, 1.3]]).astype("float32")
         label_data = np.array([[1.7, 1], [0.4, 0.5]]).astype("float32")
-        x = paddle.to_variable(x_data)
+        input = paddle.to_variable(input_data)
         label = paddle.to_variable(label_data)
 
         l1_loss = paddle.nn.loss.L1Loss()
-        output = l1_loss(x, label)
+        output = l1_loss(input, label)
         print(output.numpy())  
         # [0.35]
 
         l1_loss = paddle.nn.loss.L1Loss(reduction='sum')
-        output = l1_loss(x, label)
+        output = l1_loss(input, label)
         print(output.numpy())  
         # [1.4]
 
         l1_loss = paddle.nn.loss.L1Loss(reduction='none')
-        output = l1_loss(x, label)
+        output = l1_loss(input, label)
         print(output.numpy())  
         # [[0.20000005 0.19999999]
         # [0.2        0.79999995]]
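The three `reduction` modes of L1Loss can be reproduced on the example data with a few lines of plain Python. The function `l1_loss` below is a hypothetical helper for nested lists, written only to mirror the formulas in this file.

```python
def l1_loss(inputs, labels, reduction="mean"):
    """Element-wise |input - label| on 2-D nested lists, with optional reduction."""
    diffs = [[abs(a - b) for a, b in zip(row_in, row_lb)]
             for row_in, row_lb in zip(inputs, labels)]
    if reduction == "none":
        return diffs
    flat = [d for row in diffs for d in row]
    if reduction == "sum":
        return sum(flat)
    if reduction == "mean":
        return sum(flat) / len(flat)
    raise ValueError("reduction must be 'none', 'mean' or 'sum'")


input_data = [[1.5, 0.8], [0.2, 1.3]]
label_data = [[1.7, 1.0], [0.4, 0.5]]
print(l1_loss(input_data, label_data))            # mean of the four differences
print(l1_loss(input_data, label_data, "sum"))     # sum of the four differences
print(l1_loss(input_data, label_data, "none"))    # unreduced 2x2 result
```

The small deviations from 0.2 and 0.8 in the documented `'none'` output are float32 rounding; Python floats are 64-bit, so the deviations here are smaller.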
diff --git a/doc/fluid/api_cn/nn_cn/relu_cn.rst b/doc/fluid/api_cn/nn_cn/relu_cn.rst
new file mode 100644
index 0000000000000000000000000000000000000000..447d0bb01514d42aa74f6265d52cd8ed42c40880
--- /dev/null
+++ b/doc/fluid/api_cn/nn_cn/relu_cn.rst
@@ -0,0 +1,38 @@
+.. _cn_api_nn_cn_relu:
+
+relu
+-------------------------------
+
+.. py:function:: paddle.nn.functional.relu(x, name=None)
+
+relu激活层（Rectified Linear Unit）。计算公式如下：
+
+.. math::
+
+    relu(x) = max(0, x)
+
+其中，:math:`x` 为输入的 Tensor
+
+
+参数
+::::::::::
+    - x (Tensor) - 输入的 ``Tensor`` ，数据类型为：float32、float64。
+    - name (str, 可选) - 操作的名称(可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
+
+返回
+::::::::::
+    ``Tensor`` ，数据类型和形状同 ``x`` 一致。
+
+代码示例
+::::::::::
+
+.. code-block:: python
+
+    import paddle
+    import paddle.nn.functional as F
+    import numpy as np
+
+    paddle.disable_static()
+
+    x = paddle.to_tensor(np.array([-2, 0, 1]).astype('float32'))
+    out = F.relu(x) # [0., 0., 1.]
diff --git a/doc/fluid/api_cn/nn_cn/softmax_cn.rst b/doc/fluid/api_cn/nn_cn/softmax_cn.rst
index 5879cfa4368af1de5b006a02e533a6b46627eb7f..5c2e0cc806c78a831b0a66e6fa89c4bc233a6ecb 100644
--- a/doc/fluid/api_cn/nn_cn/softmax_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/softmax_cn.rst
@@ -2,7 +2,9 @@
 
 softmax
 -------------------------------
-.. py:class:: paddle.nn.functional.softmax(x, axis=-1, name=None)
+
+.. py:function:: paddle.nn.functional.softmax(x, axis=-1, name=None)
+
 
 该OP实现了softmax层。OP的计算过程如下：
 
@@ -27,9 +29,9 @@ softmax
 
 - 示例1（矩阵一共有三维。axis = -1，表示沿着最后一维（即第三维）做softmax操作）
 
-.. code-block:: python
+.. code-block:: text
 
-  输入
+  # input
 
     x.shape = [2, 3, 4] 
 
@@ -42,7 +44,7 @@ softmax
 
     axis = -1
 
-  输出
+  # output
 
     out.shape = [2, 3, 4]
 
@@ -55,9 +57,9 @@ softmax
 
 - 示例2（矩阵一共有三维。axis = 1，表示沿着第二维做softmax操作）
 
-.. code-block:: python
+.. code-block:: text
 
-  输入
+  # input
 
     x.shape = [2, 3, 4] 
 
@@ -70,7 +72,7 @@ softmax
 
     axis = 1
 
-  输出
+  # output
 
     out.shape = [2, 3, 4]
 
@@ -101,7 +103,7 @@ softmax
     import paddle.nn.functional as F
     import numpy as np
 
-    paddle.enable_imperative()
+    paddle.disable_static()
 
     x = np.array([[[2.0, 3.0, 4.0, 5.0],
                     [3.0, 4.0, 5.0, 6.0],
@@ -109,7 +111,7 @@ softmax
                     [[1.0, 2.0, 3.0, 4.0],
                     [5.0, 6.0, 7.0, 8.0],
                     [6.0, 7.0, 8.0, 9.0]]], 'float32')
-    x = paddle.imperative.to_variable(x)
+    x = paddle.to_variable(x)
     out = F.softmax(x)
     # [[[0.0320586 , 0.08714432, 0.23688282, 0.64391426],
     #   [0.0320586 , 0.08714432, 0.23688282, 0.64391426],
diff --git a/doc/fluid/api_cn/paddle_cn/add_cn.rst b/doc/fluid/api_cn/paddle_cn/add_cn.rst
index 8cb469caf8ec6459b309256bdf95d08bf2fac7a4..8fe64a011995437277129bd9070b2a53b6a56543 100644
--- a/doc/fluid/api_cn/paddle_cn/add_cn.rst
+++ b/doc/fluid/api_cn/paddle_cn/add_cn.rst
@@ -2,6 +2,4 @@
 
 add
 -------------------------------
-:doc_source: paddle.fluid.layers.elementwise_add
-
-
+:doc_source: paddle.tensor.add
diff --git a/doc/fluid/api_cn/tensor_cn/add_cn.rst b/doc/fluid/api_cn/tensor_cn/add_cn.rst
index 5673e801092d6af999465df5073fa22efad24779..94162e5c8419121731a5dc89905c2e5bd9b1d898 100644
--- a/doc/fluid/api_cn/tensor_cn/add_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/add_cn.rst
@@ -3,177 +3,47 @@
 add
 -------------------------------
 
-.. py:function:: paddle.add(x, y, alpha=1, out=None, name=None)
+.. py:function:: paddle.add(x, y, name=None)
 
 :alias_main: paddle.add
 :alias: paddle.add,paddle.tensor.add,paddle.tensor.math.add
 :update_api: paddle.fluid.layers.elementwise_add
 
 
-
 该OP是逐元素相加算子，输入 ``x`` 与输入 ``y`` 逐元素相加，并将各个位置的输出元素保存到返回结果中。
 
+输入 ``x`` 与输入 ``y`` 必须可以广播为相同形状, 关于广播规则，请参考 :ref:`use_guide_broadcasting`
+
 等式为：
 
 .. math::
         Out = X + Y
 
 - :math:`X` ：多维Tensor。
-- :math:`Y` ：维度必须小于等于X维度的Tensor。
-
-对于这个运算算子有2种情况：
-        1. :math:`Y` 的 ``shape`` 与 :math:`X` 相同。
-        2. :math:`Y` 的 ``shape`` 是 :math:`X` 的连续子序列。
-
-对于情况2:
-        1. 用 :math:`Y` 匹配 :math:`X` 的形状（shape），其中 ``axis`` 是 :math:`Y` 在 :math:`X` 上的起始维度的位置。
-        2. 如果 ``axis`` 为-1（默认值），则 :math:`axis= rank(X)-rank(Y)` 。
-        3. 考虑到子序列， :math:`Y` 的大小为1的尾部维度将被忽略，例如shape（Y）=（2,1）=>（2）。
-
-例如：
-
-..  code-block:: text
-
-        shape(X) = (2, 3, 4, 5), shape(Y) = (,)
-        shape(X) = (2, 3, 4, 5), shape(Y) = (5,)
-        shape(X) = (2, 3, 4, 5), shape(Y) = (4, 5), with axis=-1(default) or axis=2
-        shape(X) = (2, 3, 4, 5), shape(Y) = (3, 4), with axis=1
-        shape(X) = (2, 3, 4, 5), shape(Y) = (2), with axis=0
-        shape(X) = (2, 3, 4, 5), shape(Y) = (2, 1), with axis=0
+- :math:`Y` ：多维Tensor。
 
 参数：
-        - **x** （Variable）- 多维 ``Tensor`` 或 ``LoDTensor`` 。数据类型为 ``float32`` 、 ``float64`` 、 ``int32`` 或  ``int64``。
-        - **y** （Variable）- 多维 ``Tensor`` 或 ``LoDTensor`` 。数据类型为 ``float32`` 、 ``float64`` 、 ``int32`` 或  ``int64``。
-        - **alpha** （int|float，可选）- 输入y的缩放因子。默认值为1. 如果alpha不为1，本api计算公式变为 :math:`Out = X + alpha * Y`
-        - **out** （Variable，可选）-  指定存储运算结果的 ``Tensor`` 。如果设置为None或者不设置，将创建新的 ``Tensor`` 存储运算结果，默认值为None。
-        - **name** （str，可选）- 输出的名字。默认值为None。该参数供开发人员打印调试信息时使用，具体用法请参见 :ref:`api_guide_Name` 。
+    - x (Tensor) - 输入的Tensor，数据类型为：float32、float64、int32、int64。
+    - y (Tensor) - 输入的Tensor，数据类型为：float32、float64、int32、int64。
+    - name (str，可选） - 操作的名称(可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
 
+返回：  多维Tensor, 数据类型与 ``x`` 相同, 维度为广播后的形状。
 
-返回：        多维 ``Tensor`` 或 ``LoDTensor`` ，维度和数据类型都与 ``x`` 相同。
+返回类型：        Tensor
 
-返回类型：        Variable
 
-**代码示例 1**
+**代码示例**
 
 ..  code-block:: python
 
     import paddle
-    import paddle.fluid as fluid
     import numpy as np
 
-    def gen_data():
-        return {
-            "x": np.array([2, 3, 4]).astype('float32'),
-            "y": np.array([1, 5, 2]).astype('float32')
-        }
-
-    x = fluid.data(name="x", shape=[3], dtype='float32')
-    y = fluid.data(name="y", shape=[3], dtype='float32')
-    z1 = paddle.add(x, y)
-    z2 = paddle.add(x, y, alpha=10)
-    # z = x + y
-
-    place = fluid.CPUPlace()
-    exe = fluid.Executor(place)
-    z_value = exe.run(feed=gen_data(),
-                        fetch_list=[z1.name, z2.name])
-
-    print(z_value[0]) # [3., 8., 6.]
-    print(z_value[1]) # [12. 53. 24.]
-
-**代码示例 2**
-
-..  code-block:: python
-
-    import paddle
-    import paddle.fluid as fluid
-    import numpy as np
-
-    def gen_data():
-        return {
-            "x": np.ones((2, 3, 4, 5)).astype('float32'),
-            "y": np.zeros((4, 5)).astype('float32')
-        }
-
-    x = fluid.data(name="x", shape=[2, 3, 4, 5], dtype='float32')
-    y = fluid.data(name="y", shape=[4, 5], dtype='float32')
-    z = paddle.add(x, y, name='z')
-    # z = x + y
-
-    place = fluid.CPUPlace()
-    exe = fluid.Executor(place)
-
-    z_value = exe.run(feed=gen_data(),
-                        fetch_list=[z.name])
-
-    print(z_value[0])
-    print(z_value[0].shape) # z.shape=[2,3,4,5]
-
-**代码示例 3**
-
-..  code-block:: python
-
-    import paddle
-    import paddle.fluid as fluid
-    import numpy as np
-
-    def gen_data():
-        return {
-            "x": np.random.randint(1, 5, size=[2, 3, 4, 5]).astype('float32'),
-            "y": np.random.randint(1, 5, size=[5]).astype('float32')
-        }
-
-    x = fluid.data(name="x", shape=[2,3,4,5], dtype='float32')
-    y = fluid.data(name="y", shape=[5], dtype='float32')
+    paddle.disable_static()
+    np_x = np.array([2, 3, 4]).astype('float64')
+    np_y = np.array([1, 5, 2]).astype('float64')
+    x = paddle.to_tensor(np_x)
+    y = paddle.to_tensor(np_y)
     z = paddle.add(x, y)
-    # z = x / y
-
-    place = fluid.CPUPlace()
-    exe = fluid.Executor(place)
-
-    z_value = exe.run(feed=gen_data(),
-                        fetch_list=[z.name])
-    print(z_value[0])
-    print(z_value[0].shape) # z.shape=[2,3,4,5]
-
-
-**代码示例 4**
-
-..  code-block:: python
-
-    import paddle
-    import paddle.fluid as fluid
-    import numpy as np
-
-    x = fluid.data(name="x", shape=[3], dtype="float32")
-    y = fluid.data(name='y', shape=[3], dtype='float32')
-
-    output = fluid.data(name="output", shape=[3], dtype="float32")
-    z = paddle.add(x, y, out=output)
-
-    place = fluid.CPUPlace()
-    exe = fluid.Executor(place)
-    data1 = np.array([2, 3, 4], dtype='float32')
-    data2 = np.array([1, 5, 2], dtype='float32')
-    z_value = exe.run(feed={'x': data1,
-                            'y': data2},
-                            fetch_list=[z])
-    print(z_value[0]) # [3. 8. 6.]
-
-
-**代码示例 5（动态图）**
-
-..  code-block:: python
-
-    import paddle
-    import paddle.fluid as fluid
-    import numpy as np
-
-    with fluid.dygraph.guard():
-        np_x = np.array([2, 3, 4]).astype('float64')
-        np_y = np.array([1, 5, 2]).astype('float64')
-        x = fluid.dygraph.to_variable(np_x)
-        y = fluid.dygraph.to_variable(np_y)
-        z = paddle.add(x, y, alpha=-0.5)
-        np_z = z.numpy()
-        print(np_z)  # [1.5, 0.5, 3. ]
+    np_z = z.numpy()
+    print(np_z)  # [3., 8., 6. ]
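The new text for `paddle.add` requires `x` and `y` to be broadcastable to a common shape and defers the rules to the broadcasting guide. As a rough illustration, the sketch below assumes the usual NumPy-style rule (align shapes from the trailing dimension; each pair of sizes must be equal or one of them must be 1). Both that assumption and the helper name `broadcast_shape` are mine; the referenced guide remains authoritative.

```python
def broadcast_shape(shape_x, shape_y):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    result = []
    # Walk both shapes from the last dimension; a missing dimension counts as 1.
    for i in range(1, max(len(shape_x), len(shape_y)) + 1):
        dx = shape_x[-i] if i <= len(shape_x) else 1
        dy = shape_y[-i] if i <= len(shape_y) else 1
        if dx != dy and dx != 1 and dy != 1:
            raise ValueError(
                "shapes %r and %r are not broadcastable" % (shape_x, shape_y))
        result.append(dy if dx == 1 else dx)
    return tuple(reversed(result))


print(broadcast_shape((2, 3, 4, 5), (4, 5)))
print(broadcast_shape((3, 1), (1, 4)))
```

Under this rule the output shape of `add` can differ from the shapes of both inputs, as in the second call.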
diff --git a/doc/fluid/api_cn/tensor_cn/allclose_cn.rst b/doc/fluid/api_cn/tensor_cn/allclose_cn.rst
index e580aa233a340115835a263bf893422f055dc6b7..c483e3a112f2513f8db0bb7095dc1f99e7a4abd3 100644
--- a/doc/fluid/api_cn/tensor_cn/allclose_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/allclose_cn.rst
@@ -3,23 +3,18 @@
 allclose
 -------------------------------
 
-.. py:function:: paddle.allclose(input, other, rtol=1e-05, atol=1e-08, equal_nan=False, name=None)
+.. py:function:: paddle.allclose(x, y, rtol=1e-05, atol=1e-08, equal_nan=False, name=None)
 
-:alias_main: paddle.allclose
-:alias: paddle.allclose,paddle.tensor.allclose,paddle.tensor.logic.allclose
-
-
-
-逐个检查input和other的所有元素是否均满足如下条件：
+逐个检查x和y的所有元素是否均满足如下条件：
 
 ..  math::
-    \left| input - other \right| \leq atol + rtol \times \left| other \right|
+    \left| x - y \right| \leq atol + rtol \times \left| y \right|
 
 该API的行为类似于 :math:`numpy.allclose` ，即当两个待比较Tensor的所有元素均在一定容忍误差范围内视为相等则该API返回True值。
 
 参数:
-    - **input** (Variable) - 第一个输入待比较Tensor input。
-    - **other** (Variable) - 第二个输入待比较Tensor other。
+    - **x** (Tensor) - 输入的 `Tensor` ，数据类型为：float32、float64。
+    - **y** (Tensor) - 输入的 `Tensor` ，数据类型为：float32、float64。
     - **rtol** (float，可选) - 相对容忍误差，默认值为1e-5。
     - **atol** (float，可选) - 绝对容忍误差，默认值为1e-8。
     - **equal_nan** (bool，可选) - 如果设置为True，则两个NaN数值将被视为相等，默认值为False。
@@ -27,43 +22,37 @@ allclose
 
 返回：计算得到的布尔类型单值Tensor。
 
-返回类型：变量（Variable）
-
 **代码示例**:
 
 .. code-block:: python
 
     import paddle
-    import paddle.fluid as fluid
     import numpy as np
-    use_cuda = fluid.core.is_compiled_with_cuda()
-    a = fluid.data(name="a", shape=[2], dtype='float32')
-    b = fluid.data(name="b", shape=[2], dtype='float32')
-    result = paddle.allclose(a, b, rtol=1e-05, atol=1e-08,
+
+    paddle.disable_static()
+
+    np_x = np.array([10000., 1e-07]).astype("float32")
+    np_y = np.array([10000.1, 1e-08]).astype("float32")
+    x = paddle.to_tensor(np_x)
+    y = paddle.to_tensor(np_y)
+    result1 = paddle.allclose(x, y, rtol=1e-05, atol=1e-08,
+                            equal_nan=False, name="ignore_nan")
+    np_result1 = result1.numpy()
+    # [False]
+    result2 = paddle.allclose(x, y, rtol=1e-05, atol=1e-08,
+                                equal_nan=True, name="equal_nan")
+    np_result2 = result2.numpy()
+    # [False]
+
+    np_x = np.array([1.0, float('nan')]).astype("float32")
+    np_y = np.array([1.0, float('nan')]).astype("float32")
+    x = paddle.to_tensor(np_x)
+    y = paddle.to_tensor(np_y)
+    result1 = paddle.allclose(x, y, rtol=1e-05, atol=1e-08,
                             equal_nan=False, name="ignore_nan")
-    result_nan = paddle.allclose(a, b, rtol=1e-05, atol=1e-08,
+    np_result1 = result1.numpy()
+    # [False]
+    result2 = paddle.allclose(x, y, rtol=1e-05, atol=1e-08,
                                 equal_nan=True, name="equal_nan")
-    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
-    exe = fluid.Executor(place)
-    exe.run(fluid.default_startup_program())
-    x = np.array([10000., 1e-07]).astype("float32")
-    y = np.array([10000.1, 1e-08]).astype("float32")
-    result_v, result_nan_v = exe.run(
-        feed={'a': x, 'b': y},
-        fetch_list=[result, result_nan])
-    print(result_v, result_nan_v)
-    # Output: (array([False]), array([False]))
-    x = np.array([10000., 1e-08]).astype("float32")
-    y = np.array([10000.1, 1e-09]).astype("float32")
-    result_v, result_nan_v = exe.run(
-        feed={'a': x, 'b': y},
-        fetch_list=[result, result_nan])
-    print(result_v, result_nan_v)
-    # Output: (array([ True]), array([ True]))
-    x = np.array([1.0, float('nan')]).astype("float32")
-    y = np.array([1.0, float('nan')]).astype("float32")
-    result_v, result_nan_v = exe.run(
-        feed={'a': x, 'b': y},
-        fetch_list=[result, result_nan])
-    print(result_v, result_nan_v)
-    # Output: (array([False]), array([ True]))
+    np_result2 = result2.numpy()
+    # [True]
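The tolerance condition in this file is asymmetric: `rtol` scales with `|y|`, not `|x|`. A plain-Python sketch of the check on flat lists is shown below. The helper `allclose`, its handling of infinities, and the use of 64-bit Python floats instead of float32 are assumptions for illustration only.

```python
import math


def allclose(xs, ys, rtol=1e-05, atol=1e-08, equal_nan=False):
    """True if every pair satisfies |x - y| <= atol + rtol * |y|."""
    for x, y in zip(xs, ys):
        if math.isnan(x) or math.isnan(y):
            # NaN only matches NaN, and only when equal_nan is set
            if not (equal_nan and math.isnan(x) and math.isnan(y)):
                return False
            continue
        if x == y:
            continue  # also covers equal infinities
        if math.isinf(x) or math.isinf(y):
            return False
        if abs(x - y) > atol + rtol * abs(y):
            return False
    return True


print(allclose([10000.0, 1e-07], [10000.1, 1e-08]))
print(allclose([1.0, float("nan")], [1.0, float("nan")]))
print(allclose([1.0, float("nan")], [1.0, float("nan")], equal_nan=True))
```

In the first call the pair `(1e-07, 1e-08)` fails the test: the difference is 9e-08 while the allowed tolerance is only about 1e-08, so the overall result is `False` even though the first pair passes.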
diff --git a/doc/fluid/api_cn/tensor_cn/div_cn.rst b/doc/fluid/api_cn/tensor_cn/div_cn.rst
old mode 100755
new mode 100644
diff --git a/doc/fluid/api_cn/tensor_cn/erf_cn.rst b/doc/fluid/api_cn/tensor_cn/erf_cn.rst
index ca07a4a13cd0a231390b25f47cbc4410df6e151e..b56832508a4ff7985dd237d60d7bc34d69486724 100644
--- a/doc/fluid/api_cn/tensor_cn/erf_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/erf_cn.rst
@@ -1,3 +1,35 @@
+.. _cn_api_tensor_erf:
+
 erf
 -------------------------------
-**版本升级，文档正在开发中**
+
+.. py:function:: paddle.erf(x, name=None)
+
+
+
+逐元素计算 Erf 激活函数。更多细节请参考 `Error function <https://en.wikipedia.org/wiki/Error_function>`_ 。
+
+
+.. math::
+    out = \frac{2}{\sqrt{\pi}} \int_{0}^{x}e^{- \eta^{2}}d\eta
+
+参数：
+    - **x** (Tensor) - 输入的 `Tensor` ，数据类型为： float16, float32, float64。
+    - **name** (str，可选） - 操作的名称(可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name` 。
+
+返回：
+    - Tensor，对输入x进行erf激活后的Tensor，形状、数据类型与输入 x 一致。
+
+
+**代码示例**：
+
+.. code-block:: python
+
+    import numpy as np
+    import paddle
+    paddle.disable_static()
+    x_data = np.array([-0.4, -0.2, 0.1, 0.3])
+    x = paddle.to_tensor(x_data)
+    out = paddle.erf(x)
+    print(out.numpy())
+    # [-0.42839236 -0.22270259  0.11246292  0.32862676]
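The integral definition of erf can be checked numerically. The sketch below approximates the integral with composite Simpson's rule and compares it with `math.erf` from the standard library. The helper `erf_by_integration` and the step count are illustrative choices, not part of Paddle.

```python
import math


def erf_by_integration(x, steps=1000):
    """Approximate 2/sqrt(pi) * integral_0^x exp(-t^2) dt with Simpson's rule.

    `steps` must be even. A negative x yields a negative step and a negative result.
    """
    h = x / steps
    total = 1.0 + math.exp(-x * x)  # integrand at the two endpoints
    for k in range(1, steps):
        t = k * h
        weight = 4 if k % 2 else 2
        total += weight * math.exp(-t * t)
    return 2.0 / math.sqrt(math.pi) * total * h / 3.0


xs = [-0.4, -0.2, 0.1, 0.3]
numeric = [erf_by_integration(v) for v in xs]
reference = [math.erf(v) for v in xs]
print(numeric)
print(reference)
```

Both lists match the values in the example comment to the printed precision, and the negative inputs illustrate that erf is an odd function.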
diff --git a/doc/fluid/api_cn/tensor_cn/index_select_cn.rst b/doc/fluid/api_cn/tensor_cn/index_select_cn.rst
index 077baf49bd0af13faa889992b2d41ce7723ac574..dfb235db5b4f72f04c85dd6b878b7b5568f4344e 100644
--- a/doc/fluid/api_cn/tensor_cn/index_select_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/index_select_cn.rst
@@ -7,7 +7,7 @@ index_select
 
 
 
-该OP沿着指定轴 ``axis`` 对输入 ``x`` 进行索引，取 ``index`` 中指定的相应项，创建并返回到一个新的Tensor。这里 ``index`` 是一个 ``1-D`` Tensor。除 ``axis`` 轴外，返回的Tensor其余维度大小和输入 ``x``相等 ， ``axis`` 维度的大小等于 ``index`` 的大小。
+该OP沿着指定轴 ``axis`` 对输入 ``x`` 进行索引，取 ``index`` 中指定的相应项，创建并返回到一个新的Tensor。这里 ``index`` 是一个 ``1-D`` Tensor。除 ``axis`` 轴外，返回的Tensor其余维度大小和输入 ``x`` 相等 ， ``axis`` 维度的大小等于 ``index`` 的大小。
         
 **参数**：
     - **x** （Tensor）– 输入Tensor。 ``x`` 的数据类型可以是float32，float64，int32，int64。
@@ -30,14 +30,14 @@ index_select
         import paddle
         import numpy as np
 
-        paddle.enable_imperative()  # Now we are in imperative mode
+        paddle.disable_static()  # Now we are in imperative mode
         data = np.array([[1.0, 2.0, 3.0, 4.0],
                          [5.0, 6.0, 7.0, 8.0],
                          [9.0, 10.0, 11.0, 12.0]])
-        data_index = np.array([-1, 1, 1]).astype('int32')
+        data_index = np.array([0, 1, 1]).astype('int32')
 
-        x = paddle.imperative.to_variable(data)
-        index = paddle.imperative.to_variable(data_index)
+        x = paddle.to_variable(data)
+        index = paddle.to_variable(data_index)
         out_z1 = paddle.index_select(x=x, index=index)
         #[[1. 2. 3. 4.]
         # [5. 6. 7. 8.]
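The selection semantics described for `index_select` can be mimicked on nested lists. The helper below is a hypothetical 2-D-only sketch. It assumes `axis=0` is the default, which the example output suggests.

```python
def index_select(matrix, index, axis=0):
    """Pick rows (axis=0) or columns (axis=1) of a 2-D nested list by index."""
    if axis == 0:
        return [list(matrix[i]) for i in index]
    if axis == 1:
        return [[row[i] for i in index] for row in matrix]
    raise ValueError("this sketch only supports axis 0 or 1")


data = [[1.0, 2.0, 3.0, 4.0],
        [5.0, 6.0, 7.0, 8.0],
        [9.0, 10.0, 11.0, 12.0]]
index = [0, 1, 1]

rows = index_select(data, index)           # shape (3, 4): rows 0, 1, 1
cols = index_select(data, index, axis=1)   # shape (3, 3): columns 0, 1, 1
print(rows)
print(cols)
```

Along the selected axis the output size equals `len(index)`; the other dimension is unchanged, and repeated indices simply repeat the selected slice.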
diff --git a/doc/fluid/api_cn/tensor_cn/mean_cn.rst b/doc/fluid/api_cn/tensor_cn/mean_cn.rst
index bc838016e17b8d4992aee8802128eb69c983cb71..8ac774a6b4471daca40ba4ab7ee8308fe3539b84 100644
--- a/doc/fluid/api_cn/tensor_cn/mean_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/mean_cn.rst
@@ -11,9 +11,9 @@ mean
 
 参数
 ::::::::::
-    - x (Tensor) - 输入的Tensor，数据类型为：float32、float64、int32.int64 。
+    - x (Tensor) - 输入的Tensor，数据类型为：float32、float64。
     - axis (int|list|tuple, 可选) - 指定对 ``x`` 进行计算的轴。``axis`` 可以是int、list(int)、tuple(int)。如果 ``axis`` 包含多个维度，则沿着 ``axis`` 中的所有轴进行计算。``axis`` 或者其中的元素值应该在范围[-D, D)内，D是 ``x`` 的维度。如果 ``axis`` 或者其中的元素值小于0，则等价于 :math:`axis + D` 。如果 ``axis`` 是None，则对 ``x`` 的全部元素计算平均值。默认值为None。
-    - keepdim (bool, 可选) - 是否在输出Tensor中保留减小的维度。如果 ``keep_dim`` 为True，则输出Tensor和 ``x`` 具有相同的维度(减少的维度除外，减少的维度的大小为1)。否则，输出Tensor的形状会在 ``axsi`` 上进行squeeze操作。默认值为False。
+    - keepdim (bool, 可选) - 是否在输出Tensor中保留减小的维度。如果 ``keepdim`` 为True，则输出Tensor和 ``x`` 具有相同的维度(减少的维度除外，减少的维度的大小为1)。否则，输出Tensor的形状会在 ``axis`` 上进行squeeze操作。默认值为False。
     - name (str, 可选) - 操作的名称(可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name`。
 
 返回
@@ -31,12 +31,12 @@ mean
     paddle.disable_static()
 
     x = np.array([[[1, 2, 3, 4],
-                    [5, 6, 7, 8],
-                    [9, 10, 11, 12]],
-                    [[13, 14, 15, 16],
-                    [17, 18, 19, 20],
-                    [21, 22, 23, 24]]], 'float32')
-    x = paddle.to_variable(x)
+                   [5, 6, 7, 8],
+                   [9, 10, 11, 12]],
+                  [[13, 14, 15, 16],
+                   [17, 18, 19, 20],
+                   [21, 22, 23, 24]]], 'float32')
+    x = paddle.to_tensor(x)
     out1 = paddle.mean(x)
     # [12.5]
     out2 = paddle.mean(x, axis=-1)
diff --git a/doc/fluid/api_cn/tensor_cn/round_cn.rst b/doc/fluid/api_cn/tensor_cn/round_cn.rst
index f6a66cc620a66363cecf4c60681723d274a75e5a..0e5eed214ae6a74521e030803a36525c21e2820b 100644
--- a/doc/fluid/api_cn/tensor_cn/round_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/round_cn.rst
@@ -2,6 +2,30 @@
 
 round
 -------------------------------
-:doc_source: paddle.fluid.layers.round
 
+.. py:function:: paddle.round(x, name=None)
 
+
+
+该OP将输入中的数值四舍五入到最接近的整数数值。
+
+参数:
+    - **x** (Tensor) - 输入的 `Tensor` ，数据类型为： float16, float32, float64。
+    - **name** (str，可选） - 操作的名称(可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name` 。
+
+返回：
+    - Tensor，对输入x四舍五入后的Tensor，形状、数据类型与输入x一致。
+
+
+**代码示例**：
+
+.. code-block:: python
+
+    import numpy as np
+    import paddle
+    paddle.disable_static()
+    x_data = np.array([-0.5, -0.2, 0.6, 1.5])
+    x = paddle.to_tensor(x_data)
+    out = paddle.round(x)
+    print(out.numpy())
+    # [-1. -0.  1.  2.]
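The example output for `paddle.round` (`-0.5` becomes `-1`, `1.5` becomes `2`) is consistent with rounding halves away from zero, which differs from Python's built-in `round`, where halves go to the nearest even integer. The sketch below shows the difference. Treating half-away-from-zero as the operator's rule is an inference from that one example, and the helper name is hypothetical.

```python
import math


def round_half_away_from_zero(x):
    """Round to the nearest integer, sending exact halves away from zero."""
    return math.copysign(math.floor(abs(x) + 0.5), x)


xs = [-0.5, -0.2, 0.6, 1.5]
away = [round_half_away_from_zero(v) for v in xs]
builtin = [round(v) for v in xs]
print(away)     # matches the documented example output
print(builtin)  # Python's round() sends -0.5 to 0 instead of -1
```

The two rules only disagree on exact halves such as `-0.5` or `2.5`; everywhere else they give the same integer.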
diff --git a/doc/fluid/api_cn/tensor_cn/rsqrt_cn.rst b/doc/fluid/api_cn/tensor_cn/rsqrt_cn.rst
index 58614d3310e7a14af95c7b2e1ea474f9893df8c5..98bb2483cb9055a8c1010eeec753286902ce4ab5 100644
--- a/doc/fluid/api_cn/tensor_cn/rsqrt_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/rsqrt_cn.rst
@@ -2,6 +2,39 @@
 
 rsqrt
 -------------------------------
-:doc_source: paddle.fluid.layers.rsqrt
 
+.. py:function:: paddle.rsqrt(x, name=None)
+
+
+
+
+该OP为rsqrt激活函数。
+
+注：输入x应确保为非 **0** 值，否则程序会抛异常退出。
+
+其运算公式如下：
+
+.. math::
+    out = \frac{1}{\sqrt{x}}
+
+
+参数:
+    - **x** (Tensor) - 输入的 `Tensor` ，数据类型为：float32、float64。
+    - **name** (str，可选） - 操作的名称(可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name` 。
+
+返回：
+    - Tensor，对输入x进行rsqrt激活后的Tensor，形状、数据类型与输入x一致。
+
+**代码示例**：
+
+.. code-block:: python
+
+    import numpy as np
+    import paddle
+    paddle.disable_static()
+    x_data = np.array([0.1, 0.2, 0.3, 0.4])
+    x = paddle.to_tensor(x_data)
+    out = paddle.rsqrt(x)
+    print(out.numpy())
+    # [3.16227766 2.23606798 1.82574186 1.58113883]
 
diff --git a/doc/fluid/api_cn/tensor_cn/sin_cn.rst b/doc/fluid/api_cn/tensor_cn/sin_cn.rst
index 97d3fec0fdae36058f1403c054a8cbfa21587faa..36f6255bb9a046f97b4463ab17ac18a0b439d4aa 100644
--- a/doc/fluid/api_cn/tensor_cn/sin_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/sin_cn.rst
@@ -3,11 +3,7 @@
 sin
 -------------------------------
 
-.. py:function:: paddle.sin(x, name=None, out=None)
-
-:alias_main: paddle.sin
-:alias: paddle.sin,paddle.tensor.sin,paddle.tensor.math.sin
-:update_api: paddle.fluid.layers.sin
+.. py:function:: paddle.sin(x, name=None)
 
 
 
@@ -16,29 +12,23 @@ sin
 .. math::
         out = sin(x)
 
-参数:
-    - **x** (Variable) - 支持任意维度的Tensor。数据类型为float32，float64或float16。
-    - **name** (str，可选) – 具体用法请参见 :ref:`api_guide_Name` ，一般无需设置，默认值为None。
-    - **out** (Variable, 可选) – 指定存储运算结果的Tensor。如果设置为None或者不设置，将创建新的Tensor存储运算结果，默认值为None。
+参数：
+    - **x** (Tensor) - 输入的 `Tensor` ，数据类型为： float16, float32, float64。
+    - **name** (str，可选） - 操作的名称(可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name` 。
+
+返回：
+    - Tensor，对输入x计算sin值后的Tensor，形状、数据类型同输入x一致。
 
-返回：返回类型为Variable(Tensor|LoDTensor)， 数据类型同输入一致。
 
 **代码示例**：
 
 .. code-block:: python
 
-        import numpy as np
-        import paddle
-        import paddle.fluid as fluid
-
-        inputs = fluid.layers.data(name="x", shape = [3], dtype='float32')
-        output = paddle.sin(inputs)
-
-        exe = fluid.Executor(fluid.CPUPlace())
-        exe.run(fluid.default_startup_program())
-
-        img = np.array([0, 45, 90]).astype(np.float32)
-
-        res = exe.run(fluid.default_main_program(), feed={'x':img}, fetch_list=[output])
-        print(res)
-        # [array([0.        , 0.8509035 , 0.89399666], dtype=float32)]
+    import numpy as np
+    import paddle
+    paddle.disable_static()
+    x_data = np.array([-0.4, -0.2, 0.1, 0.3])
+    x = paddle.to_tensor(x_data)
+    out = paddle.sin(x)
+    print(out.numpy())
+    # [-0.38941834 -0.19866933  0.09983342  0.29552021]
diff --git a/doc/fluid/api_cn/tensor_cn/sqrt_cn.rst b/doc/fluid/api_cn/tensor_cn/sqrt_cn.rst
index fd64d0cb368dfbbd30dc741f50dacab64428a484..ce74caa93efb368672577ff6665878b66183073c 100644
--- a/doc/fluid/api_cn/tensor_cn/sqrt_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/sqrt_cn.rst
@@ -3,7 +3,7 @@
 sqrt
 -------------------------------
 
-.. py:function:: paddle.sqrt(x, name=None, out=None)
+.. py:function:: paddle.sqrt(x, name=None)
 
 :alias_main: paddle.sqrt
 :alias: paddle.sqrt,paddle.tensor.sqrt,paddle.tensor.math.sqrt
@@ -21,28 +21,20 @@ sqrt
 
 参数:
 
-    - **x** (Variable) - 支持任意维度的Tensor。数据类型为float32，float64或float16。
+    - **x** (Tensor) - 支持任意维度的Tensor。数据类型为float32，float64或float16。
     - **name** (str，可选) – 具体用法请参见 :ref:`api_guide_Name` ，一般无需设置，默认值为None。
-    - **out** (Variable, 可选) – 指定存储运算结果的Tensor。如果设置为None或者不设置，将创建新的Tensor存储运算结果，默认值为None。
 
-返回：返回类型为Variable(Tensor|LoDTensor)， 数据类型同输入一致。
+返回：返回类型为Tensor， 数据类型同输入一致。
 
 **代码示例**：
 
 .. code-block:: python
 
-        import numpy as np
-        import paddle
-        import paddle.fluid as fluid
-
-        inputs = fluid.layers.data(name="x", shape = [3], dtype='float32')
-        output = paddle.sqrt(inputs)
-
-        exe = fluid.Executor(fluid.CPUPlace())
-        exe.run(fluid.default_startup_program())
-
-        img = np.array([0, 9, 36]).astype(np.float32)
-
-        res = exe.run(fluid.default_main_program(), feed={'x':img}, fetch_list=[output])
-        print(res)
-        # [array([0., 3., 6.], dtype=float32)]
+    import numpy as np
+    import paddle
+    paddle.disable_static()
+    x_data = np.array([0.1, 0.2, 0.3, 0.4])
+    x = paddle.to_variable(x_data)
+    out = paddle.sqrt(x)
+    print(out.numpy())
+    # [0.31622777 0.4472136  0.54772256 0.63245553]
diff --git a/doc/fluid/api_cn/tensor_cn/square_cn.rst b/doc/fluid/api_cn/tensor_cn/square_cn.rst
index 218ab86d35fab3f1adf099a4704a8d43b483553b..be30b04f93f3d5f874d7a79abb3d7182d18f8abe 100644
--- a/doc/fluid/api_cn/tensor_cn/square_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/square_cn.rst
@@ -2,6 +2,34 @@
 
 square
 -------------------------------
-:doc_source: paddle.fluid.layers.square
 
+.. py:function:: paddle.square(x, name=None)
 
+
+
+
+该OP执行逐元素取平方运算。
+
+.. math::
+    out = x^2
+
+参数:
+    - **x** (Tensor) - 输入的 `Tensor` ，数据类型为：float16、float32、float64、int32、int64。
+    - **name** (str，可选） - 操作的名称(可选，默认值为None）。更多信息请参见 :ref:`api_guide_Name` 。
+
+返回：
+    - Tensor，对输入x取平方后的Tensor，形状、数据类型与输入x一致。
+
+
+**代码示例**：
+
+.. code-block:: python
+
+    import numpy as np
+    import paddle
+    paddle.disable_static()
+    x_data = np.array([-0.4, -0.2, 0.1, 0.3])
+    x = paddle.to_tensor(x_data)
+    out = paddle.square(x)
+    print(out.numpy())
+    # [0.16 0.04 0.01 0.09]
diff --git a/doc/fluid/beginners_guide/dygraph/DyGraph.md b/doc/fluid/beginners_guide/dygraph/DyGraph.md
index 7115fc6f58febd55f4ae8a9edef3a67869f2cb5d..6d1089d50a84d0789a362ea5ae4068f8dcc2c414 100644
--- a/doc/fluid/beginners_guide/dygraph/DyGraph.md
+++ b/doc/fluid/beginners_guide/dygraph/DyGraph.md
@@ -55,8 +55,7 @@ import paddle
 from paddle.imperative import to_variable
 
 data = np.ones([2, 2], np.float32)
-#x = paddle.data(name='x', shape=[2,2], dtype='float32')
-x = paddle.nn.data(name='x', shape=[2,2], dtype='float32')
+x = paddle.static.data(name='x', shape=[2,2], dtype='float32')
 x += 10
 exe = paddle.Executor()
 exe.run(paddle.default_startup_program())
@@ -67,7 +66,7 @@ print("result", out)  #[[11, 11], [11, 11]]
 paddle.enable_imperative()
 x = paddle.imperative.to_variable(data)
 x += 10
-print('result', x.numpy())  #[[11, 11], [11, 11]]
+print('result', x.numpy())  #[[11, 11], [11, 11]]
 
 ```
 * 命令式编程下，所有操作在运行时就已经完成，更接近我们平时的编程方式，可以随时获取每一个操作的执行结果。
@@ -152,7 +151,7 @@ class SimpleImgConvPool(paddle.nn.Layer):
                  param_attr=None,
                  bias_attr=None):
         super(SimpleImgConvPool, self).__init__()
-        
+
         self._conv2d = Conv2D(
             num_channels=num_channels,
             num_filters=num_filters,
@@ -165,7 +164,7 @@ class SimpleImgConvPool(paddle.nn.Layer):
             bias_attr=None,
             act=act,
             use_cudnn=use_cudnn)
-        
+
         self._pool2d = Pool2D(
             pool_size=pool_size,
             pool_type=pool_type,
@@ -203,12 +202,12 @@ class MNIST(paddle.nn.Layer):
             1, 20, 5, 2, 2, act="relu")
         self._simple_img_conv_pool_2 = SimpleImgConvPool(
             20, 50, 5, 2, 2, act="relu")
-        
+
         self.pool_2_shape = 50 * 4 * 4
         SIZE = 10
         self.output_weight = self.create_parameter(
             [self.pool_2_shape, 10])
-    
+
     def forward(self, inputs, label=None):
         x = self._simple_img_conv_pool_1(inputs)
         x = self._simple_img_conv_pool_2(x)
@@ -275,25 +274,25 @@ adam = AdamOptimizer(learning_rate=0.001, parameter_list=mnist.parameters())
 epoch_num = 5
 
 for epoch in range(epoch_num):
-	for batch_id, data in enumerate(train_reader()):
-		dy_x_data = np.array([x[0].reshape(1, 28, 28) for x in data]).astype('float32')
-		y_data = np.array([x[1] for x in data]).astype('int64').reshape(-1, 1)
-	
-		img = to_variable(dy_x_data)
-		label = to_variable(y_data)
-	
-		cost, acc = mnist(img, label)
-		
-		loss = paddle.nn.functional.cross_entropy(cost, label)
-		avg_loss = paddle.mean(loss)
-		avg_loss.backward()
-		adam.minimize(avg_loss)
-		mnist.clear_gradients()
-		
-		if batch_id % 100 == 0:
-			print("Loss at epoch {} step {}: {:}".format(
-				epoch, batch_id, avg_loss.numpy()))
-	
+    for batch_id, data in enumerate(train_reader()):
+        dy_x_data = np.array([x[0].reshape(1, 28, 28) for x in data]).astype('float32')
+        y_data = np.array([x[1] for x in data]).astype('int64').reshape(-1, 1)
+
+        img = to_variable(dy_x_data)
+        label = to_variable(y_data)
+
+        cost, acc = mnist(img, label)
+
+        loss = paddle.nn.functional.cross_entropy(cost, label)
+        avg_loss = paddle.mean(loss)
+        avg_loss.backward()
+        adam.minimize(avg_loss)
+        mnist.clear_gradients()
+
+        if batch_id % 100 == 0:
+            print("Loss at epoch {} step {}: {:}".format(
+                epoch, batch_id, avg_loss.numpy()))
+
 model_dict = mnist.state_dict()
 paddle.imperative.save(model_dict, "save_temp")
 ```
@@ -307,7 +306,7 @@ paddle.imperative.save(model_dict, "save_temp")
 model.eval()      #切换到评估模式
 model.train()     #切换到训练模式
 ```
- 
+
 
 模型评估测试的实现如下：
 * 首先定义 MNIST 类的对象 mnist_eval，然后通过 [load_dygraph](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/dygraph_cn/load_dygraph_cn.html#load-dygraph) 接口加载保存好的模型参数，通过 [Layer](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/dygraph_cn/Layer_cn.html#layer) 的 [set_dict](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/dygraph_cn/Layer_cn.html#set_dict) 接口将参数导入到模型中，通过 [Layer](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/dygraph_cn/Layer_cn.html#layer) 的 eval 接口切换到预测评估模式。
@@ -316,7 +315,7 @@ model.train()     #切换到训练模式
 
 ```python
 paddle.enable_imperative()
-mnist_eval = MNIST() 
+mnist_eval = MNIST()
 model_dict, _ = paddle.imperative.load("save_temp")
 mnist_eval.set_dict(model_dict)
 print("checkpoint loaded")
@@ -326,21 +325,21 @@ mnist_eval.eval()
 acc_set = []
 avg_loss_set = []
 for batch_id, data in enumerate(test_reader()):
-	dy_x_data = np.array([x[0].reshape(1, 28, 28)
-						  for x in data]).astype('float32')
-	y_data = np.array(
-		[x[1] for x in data]).astype('int64').reshape(-1, 1)
-		
-	img = to_variable(dy_x_data)
-	label = to_variable(y_data)
-
-	prediction, acc = mnist_eval(img, label)
-	
-	loss = paddle.nn.functional.cross_entropy(input=prediction, label=label)
-	avg_loss = paddle.mean(loss)
-	acc_set.append(float(acc.numpy()))
-	avg_loss_set.append(float(avg_loss.numpy()))
-	
+    dy_x_data = np.array([x[0].reshape(1, 28, 28)
+                          for x in data]).astype('float32')
+    y_data = np.array(
+        [x[1] for x in data]).astype('int64').reshape(-1, 1)
+
+    img = to_variable(dy_x_data)
+    label = to_variable(y_data)
+
+    prediction, acc = mnist_eval(img, label)
+
+    loss = paddle.nn.functional.cross_entropy(input=prediction, label=label)
+    avg_loss = paddle.mean(loss)
+    acc_set.append(float(acc.numpy()))
+    avg_loss_set.append(float(avg_loss.numpy()))
+
 acc_val_mean = np.array(acc_set).mean()
 avg_loss_val_mean = np.array(avg_loss_set).mean()
 print("Eval avg_loss is: {}, acc is: {}".format(avg_loss_val_mean, acc_val_mean))
@@ -351,9 +350,9 @@ print("Eval avg_loss is: {}, acc is: {}".format(avg_loss_val_mean, acc_val_mean)
 在命令式编程下，模型和优化器在不同的模块中，所以模型和优化器分别在不同的对象中存储，使得模型参数和优化器信息需分别存储。
 因此模型的保存需要单独调用模型和优化器中的 state_dict() 接口，同样模型的加载也需要单独进行处理。
 
-保存模型 ： 
+保存模型 ：
 1. 保存模型参数：首先通过 minist.state_dict 函数获取 mnist 网络的所有参数，然后通过 paddle.imperative.save 函数将获得的参数保存至以 save_path 为前缀的文件中。
-1. 保存优化器信息：首先通过 adam.state_dict 函数获取 adam 优化器的信息，然后通过  paddle.imperative.save 函数将获得的参数保存至以 save_path 为前缀的文件中。 
+1. 保存优化器信息：首先通过 adam.state_dict 函数获取 adam 优化器的信息，然后通过  paddle.imperative.save 函数将获得的参数保存至以 save_path 为前缀的文件中。
    * [Layer](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/dygraph_cn/Layer_cn.html#layer) 的 [state_dict](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/dygraph_cn/Layer_cn.html#state_dict) 接口：该接口可以获取当前层及其子层的所有参数，并将参数存放在 dict 结构中。
    * [Optimizer](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/optimizer_cn/AdamOptimizer_cn.html#adamoptimizer) 的 state_dict 接口：该接口可以获取优化器的信息，并将信息存放在 dict 结构中。其中包含优化器使用的所有变量，例如对于 Adam 优化器，包括 beta1、beta2、momentum 等信息。注意如果该优化器的 minimize 函数没有被调用过，则优化器的信息为空。
    * [paddle.imperative.save](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/dygraph_cn/save_dygraph_cn.html#save-dygraph) 接口：该接口将传入的参数或优化器的 dict 保存到磁盘上。
@@ -363,7 +362,7 @@ print("Eval avg_loss is: {}, acc is: {}".format(avg_loss_val_mean, acc_val_mean)
 # 保存优化器信息
 2. paddle.imperative.save(adam.state_dict(), "save_path")
 ```
-加载模型： 
+加载模型：
 1. 通过 paddle.imperative.load 函数获取模型参数信息 model_state 和优化器信息 opt_state；
 1. 通过 mnist.set_dict 函数用获取的模型参数信息设置 mnist 网络的参数
 1. 通过 adam.set_dict 函数用获取的优化器信息设置 adam 优化器信息。
@@ -406,35 +405,35 @@ adam = AdamOptimizer(learning_rate=0.001, parameter_list=mnist.parameters())
 mnist = paddle.imperative.DataParallel(mnist, strategy)
 
 train_reader = paddle.batch(
-	paddle.dataset.mnist.train(), batch_size=BATCH_SIZE, drop_last=True)
+    paddle.dataset.mnist.train(), batch_size=BATCH_SIZE, drop_last=True)
 train_reader = paddle.incubate.reader.distributed_batch_reader(
-	train_reader)
+    train_reader)
 
 for epoch in range(epoch_num):
-	for batch_id, data in enumerate(train_reader()):
-		dy_x_data = np.array([x[0].reshape(1, 28, 28)
-							  for x in data]).astype('float32')
-		y_data = np.array(
-			[x[1] for x in data]).astype('int64').reshape(-1, 1)
-
-		img = to_variable(dy_x_data)
-		label = to_variable(y_data)
-		label.stop_gradient = True
-		
-		cost, acc = mnist(img, label)
-		
-		loss = paddle.nn.functional.cross_entropy(cost, label)
-		avg_loss = paddle.mean(loss)
-		
-		avg_loss = mnist.scale_loss(avg_loss)
-		avg_loss.backward()
-		mnist.apply_collective_grads()
-		
-		adam.minimize(avg_loss)
-		mnist.clear_gradients()
-		
-		if batch_id % 100 == 0 and batch_id is not 0:
-			print("epoch: {}, batch_id: {}, loss is: {}".format(epoch, batch_id, avg_loss.numpy()))
+    for batch_id, data in enumerate(train_reader()):
+        dy_x_data = np.array([x[0].reshape(1, 28, 28)
+                              for x in data]).astype('float32')
+        y_data = np.array(
+            [x[1] for x in data]).astype('int64').reshape(-1, 1)
+
+        img = to_variable(dy_x_data)
+        label = to_variable(y_data)
+        label.stop_gradient = True
+
+        cost, acc = mnist(img, label)
+
+        loss = paddle.nn.functional.cross_entropy(cost, label)
+        avg_loss = paddle.mean(loss)
+
+        avg_loss = mnist.scale_loss(avg_loss)
+        avg_loss.backward()
+        mnist.apply_collective_grads()
+
+        adam.minimize(avg_loss)
+        mnist.clear_gradients()
+
+        if batch_id % 100 == 0 and batch_id != 0:
+            print("epoch: {}, batch_id: {}, loss is: {}".format(epoch, batch_id, avg_loss.numpy()))
 
 if paddle.imperative.ParallelEnv().local_rank == 0:
     paddle.imperative.save(mnist.state_dict(),  "work_0")
@@ -477,7 +476,7 @@ trainers_endpoints: 127.0.0.1:6170,127.0.0.1:6171 , node_id: 0 , current_node_ip
 总结一下，多卡训练相比单卡训练，有如下步骤不同：
 1. 通过 ParallelEnv() 的 dev_id 设置程序运行的设备。
 ```
-place = paddle.CUDAPlace(paddle.imperative.ParallelEnv().dev_id) 
+place = paddle.CUDAPlace(paddle.imperative.ParallelEnv().dev_id)
 paddle.enable_imperative(place):
 ```
 2. 准备多卡环境。
@@ -511,7 +510,7 @@ mnist.apply_collective_grads()
 和单卡不同，多卡训练时需逐个进程执行保存操作，多个进程同时保存会使模型文件格式出错。
 ```
 if paddle.imperative.ParallelEnv().local_rank == 0：
-	paddle.imperative.save(mnist.state_dict(), "worker_0")
+    paddle.imperative.save(mnist.state_dict(), "worker_0")
 ```
 7. 评估测试。
 
@@ -532,18 +531,18 @@ if paddle.imperative.ParallelEnv().local_rank == 0：
 
 
 ```python
-from paddle.imperative import TracedLayer
-
-paddle.enable_imperative()
-# 定义MNIST类的对象
-mnist = MNIST()
-in_np = np.random.random([10, 1, 28, 28]).astype('float32')
-# 将numpy的ndarray类型的数据转换为Variable类型
-input_var = paddle.imperative.to_variable(in_np)
-# 通过 TracerLayer.trace 接口将命令式模型转换为声明式模型
-out_dygraph, static_layer = TracedLayer.trace(mnist, inputs=[input_var])
-save_dirname = './saved_infer_model'
-# 将转换后的模型保存
+from paddle.imperative import TracedLayer
+
+paddle.enable_imperative()
+# Create an instance of the MNIST class
+mnist = MNIST()
+in_np = np.random.random([10, 1, 28, 28]).astype('float32')
+# Convert the numpy ndarray into a Variable
+input_var = paddle.imperative.to_variable(in_np)
+# Convert the imperative model into a declarative model via TracedLayer.trace
+out_dygraph, static_layer = TracedLayer.trace(mnist, inputs=[input_var])
+save_dirname = './saved_infer_model'
+# Save the converted model
 static_layer.save_inference_model(save_dirname, feed=[0], fetch=[0])
 ```
 
@@ -573,9 +572,9 @@ in_np = np.array([-2]).astype('int')
 input_var = paddle.imperative.to_variable(in_np)
 # if判断与输入input_var的shape有关
 if input_var.shape[0] > 1:
-	print("input_var's shape[0] > 1")
+    print("input_var's shape[0] > 1")
 else:
-	print("input_var's shape[1] < 1")
+    print("input_var's shape[0] <= 1")
 ```
 
 * 针对依赖数据的控制流，解决流程如下 1. 添加declarative装饰器； 2. 利用ProgramTranslator进行转换
@@ -584,10 +583,10 @@ else:
 首先需要对给MNist类的forward函数添加一个declarative 装饰器，来标记需要转换的代码块，（注：需要在最外层的class的forward函数中添加）
 
 ```python
-from paddle.imperative import declarative
-
-# 定义MNIST网络，必须继承自paddle.nn.Layer
-# 该网络由两个SimpleImgConvPool子网络、reshape层、matmul层、softmax层、accuracy层组成
+from paddle.imperative import declarative
+
+# Define the MNIST network; it must inherit from paddle.nn.Layer
+# The network consists of two SimpleImgConvPool sub-networks plus reshape, matmul, softmax, and accuracy layers
 class MNIST(paddle.nn.Layer):
     def __init__(self):
         super(MNIST, self).__init__()
@@ -595,13 +594,13 @@ class MNIST(paddle.nn.Layer):
             1, 20, 5, 2, 2, act="relu")
         self._simple_img_conv_pool_2 = SimpleImgConvPool(
             20, 50, 5, 2, 2, act="relu")
-        
+
         self.pool_2_shape = 50 * 4 * 4
         SIZE = 10
         self.output_weight = self.create_parameter(
             [self.pool_2_shape, 10])
-    
-	@declarative
+
+    @declarative
     def forward(self, inputs, label=None):
         x = self._simple_img_conv_pool_1(inputs)
         x = self._simple_img_conv_pool_2(x)
@@ -612,8 +611,8 @@ class MNIST(paddle.nn.Layer):
             acc = paddle.metric.accuracy(input=x, label=label)
             return x, acc
         else:
-            return x
-			
+            return x
+
 ```
 
 
@@ -622,19 +621,19 @@ class MNIST(paddle.nn.Layer):
 
 
 ```python
-import paddle
-
-paddle.enable_imperative()
-prog_trans = paddle.imperative.ProgramTranslator()
-mnist = MNIST()
-
-in_np = np.random.random([10, 1, 28, 28]).astype('float32')
-label_np = np.random.randint(0, 10, size=(10,1)).astype( "int64")
-input_var = paddle.imperative.to_variable(in_np)
-label_var = paddle.imperative.to_variable(label_np)
-
-out = mnist( input_var, label_var)
-
+import paddle
+
+paddle.enable_imperative()
+prog_trans = paddle.imperative.ProgramTranslator()
+mnist = MNIST()
+
+in_np = np.random.random([10, 1, 28, 28]).astype('float32')
+label_np = np.random.randint(0, 10, size=(10,1)).astype( "int64")
+input_var = paddle.imperative.to_variable(in_np)
+label_var = paddle.imperative.to_variable(label_np)
+
+out = mnist( input_var, label_var)
+
 prog_trans.save_inference_model("./mnist_dy2stat", fetch=[0,1])
 ```
 
@@ -654,13 +653,13 @@ class MNIST(paddle.nn.Layer):
             1, 20, 5, 2, 2, act="relu")
         self._simple_img_conv_pool_2 = SimpleImgConvPool(
             20, 50, 5, 2, 2, act="relu")
-        
+
         self.pool_2_shape = 50 * 4 * 4
         SIZE = 10
         self.output_weight = self.create_parameter(
             [self.pool_2_shape, 10])
-    
-	@declarative
+
+    @declarative
     def forward(self, inputs, label=None):
         x = self._simple_img_conv_pool_1(inputs)
         x = self._simple_img_conv_pool_2(x)
@@ -672,7 +671,7 @@ class MNIST(paddle.nn.Layer):
             return x, acc
         else:
             return x
-			
+
 ```
 
 
@@ -685,7 +684,7 @@ class MNIST(paddle.nn.Layer):
 
 ```
 x = y * 10
-print(x.numpy()) 
+print(x.numpy())
 ```
 
 来直接打印变量的值
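The training and evaluation loops in `DyGraph.md` convert each reader batch into an image array and a label array before calling `to_variable`. The sketch below reproduces only that conversion with NumPy so the resulting shapes can be inspected. The synthetic `data` list is a hypothetical stand-in for one batch from `train_reader()`, assuming each sample is a flattened 784-pixel image paired with an integer label.

```python
import numpy as np

# Hypothetical stand-in for one batch from train_reader():
# a list of (flattened 28x28 image, integer label) pairs.
rng = np.random.default_rng(0)
data = [(rng.random(784).astype('float32'), int(rng.integers(0, 10)))
        for _ in range(4)]

# Same conversion as in the loops: images become NCHW float32,
# labels become an int64 column vector.
dy_x_data = np.array([x[0].reshape(1, 28, 28) for x in data]).astype('float32')
y_data = np.array([x[1] for x in data]).astype('int64').reshape(-1, 1)

print(dy_x_data.shape, dy_x_data.dtype)
print(y_data.shape, y_data.dtype)
```

With a batch of four samples, the image array has shape `(4, 1, 28, 28)` and the label array has shape `(4, 1)`. This matches the single input channel expected by the first `SimpleImgConvPool` layer.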
diff --git a/doc/fluid/beginners_guide/index_cn.rst b/doc/fluid/beginners_guide/index_cn.rst
index c8d6d26db13464b4f6f33511bbc1c720ec649a5e..7f751487e98a40297c272bf1ba13077a61161953 100644
--- a/doc/fluid/beginners_guide/index_cn.rst
+++ b/doc/fluid/beginners_guide/index_cn.rst
@@ -34,8 +34,8 @@
     import numpy
     import paddle
     # 定义输入数据占位符
-    a = paddle.nn.data(name="a", shape=[1], dtype='int64')
-    b = paddle.nn.data(name="b", shape=[1], dtype='int64')
+    a = paddle.static.data(name="a", shape=[1], dtype='int64')
+    b = paddle.static.data(name="b", shape=[1], dtype='int64')
     # 组建网络（此处网络仅由一个操作构成，即elementwise_add）
     result = paddle.elementwise_add(a, b)
     # 准备运行网络
diff --git a/scripts/api_white_list.txt b/scripts/api_white_list.txt
index 778fa2be2de1e3e526a6f284e3b8f7e9d0eb34fc..b5f42e56586ee5f4c1042d2dd4d710af31828e8b 100644
--- a/scripts/api_white_list.txt
+++ b/scripts/api_white_list.txt
@@ -7,3 +7,4 @@ transpiler_cn/release_memory_cn.rst
 transpiler_cn/RoundRobin_cn.rst
 optimizer_cn/Dpsgd_cn.rst
 io_cn/ComposeNotAligned_cn.rst
+dygraph_cn/DataParallel_cn.rst
\ No newline at end of file