Merge branch 'develop' of https://github.com/PaddlePaddle/FluidDoc into executor

aca8fcc0 · gfwm0502 · 22861a9e · 17824675 · aca8fcc0 · aca8fcc0
97 changed files
--- a/doc/fluid/advanced_guide/addon_development/new_op/custom_op.md
+++ b/doc/fluid/advanced_guide/addon_development/new_op/custom_op.md
@@ -77,14 +77,12 @@ class Relu2GradMaker : public framework::SingleGradOpMaker<T> {
 public:
  using framework::SingleGradOpMaker<T>::SingleGradOpMaker;
-  std::unique_ptr<T> Apply() const override {
+  void Apply(GradOpPtr<T> op) const override {
-    auto* op = new T();
    op->SetType("relu2_grad");
    op->SetInput("Y", this->Output("Y"));
    op->SetInput(framework::GradVarName("Y"), this->OutputGrad("Y"));
    op->SetAttrMap(this->Attrs());
    op->SetOutput(framework::GradVarName("X"), this->InputGrad("X"));
-    return std::unique_ptr<T>(op);
  }
 };
@@ -142,7 +140,7 @@ REGISTER_OP_CPU_KERNEL(relu2_grad,
-GPU implementation of the ReLU OP, in the file ``relu_op.cc``:
+GPU implementation of the ReLU OP, in the file ``relu_op.cu``:
 ```
 // relu_op.cu
@@ -272,8 +270,8 @@ g++ relu_op.cc relu_op.cu.o -o relu2_op.so -shared -fPIC -std=c++11 -O3 -DPADDLE
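 Notes:
-1. When compiling the .cu file of a GPU OP with NVCC, add `-DPADDLE_WITH_CUDA -DEIGEN_USE_GPU -DPADDLE_USE_DSO`.
+1. When compiling CUDA source files with NVCC, add the compile options `-DPADDLE_WITH_CUDA -DEIGEN_USE_GPU -DPADDLE_USE_DSO`. The framework source code uses these macro definitions for conditional compilation, so when a user-defined C++ OP is compiled, each option must be enabled or disabled consistently with how the core framework was compiled. For example, `EIGEN_USE_GPU` is the compile option required when using the GPU implementation of the Eigen math library.
-2. If the installed PaddlePaddle does not include MKLDNN, remove the compile option `-DPADDLE_WITH_MKLDNN`. The default installation package already includes MKLDNN.
+2. If the PaddlePaddle installation package does not include the MKLDNN library, remove the compile option `-DPADDLE_WITH_MKLDNN`. The core framework source code (for example, tensor.h) uses this macro definition for conditional compilation, so whether this option is enabled must likewise be consistent with how the core framework was compiled. The default PaddlePaddle installation package includes the MKLDNN library.
 3. Multiple OPs can be compiled into the same dynamic library.
 4. PaddlePaddle installed via pip is compiled with GCC 4.8. Because the **C++11 ABI is incompatible** between GCC 4.8 and GCC 5 or later, your custom OP must be compiled with GCC 4.8. To use a custom OP in an environment with GCC 5 or later, we recommend [installing PaddlePaddle with Docker](https://www.paddlepaddle.org.cn/install/doc/docker), so that Paddle and the custom OP are compiled with the same GCC version.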
@@ -333,6 +331,11 @@ np.allclose(out, np.maximum(x,0.))
 ## FAQ
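-1. Q: What if an error like the following occurs: cannot open shared object file: No such file or directory.
+1. Q: What if errors like the following occur: `relu2_op.so: cannot open shared object file: No such file or directory` and `libpaddle_framework.so: cannot open shared object file: No such file or directory`?
-   A: Add the path of the dynamic library to the environment variable LD_LIBRARY_PATH.
+   A: Add the directory containing `relu2_op.so` and the directory containing `libpaddle_framework.so` (that is, the path returned by `paddle.sysconfig.get_lib()`) to the environment variable LD_LIBRARY_PATH: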
+     ``` 
+      # Assuming relu2_op.so is located in `paddle/test`, on Linux set:
+      export LD_LIBRARY_PATH=paddle/test:$( python -c 'import paddle; print(paddle.sysconfig.get_lib())'):$LD_LIBRARY_PATH
+     ```
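The `export` line above can also be composed from Python before a training process is launched. The following is a minimal sketch, not part of the original document: `build_ld_library_path` is a hypothetical helper and the directories in the usage line are placeholders. In practice the second directory would be the value returned by `paddle.sysconfig.get_lib()`.

```python
import os


def build_ld_library_path(op_dir, paddle_lib_dir, current=None):
    """Return an LD_LIBRARY_PATH value with the two directories prepended.

    op_dir: directory containing relu2_op.so.
    paddle_lib_dir: directory containing libpaddle_framework.so.
    current: existing LD_LIBRARY_PATH value; defaults to the environment.
    """
    if current is None:
        current = os.environ.get("LD_LIBRARY_PATH", "")
    parts = [op_dir, paddle_lib_dir]
    # Keep existing entries, skipping empty ones and duplicates.
    for entry in current.split(os.pathsep):
        if entry and entry not in parts:
            parts.append(entry)
    return os.pathsep.join(parts)


# Placeholder directories, for illustration only.
example = build_ld_library_path("paddle/test", "/opt/paddle/libs", "/usr/local/lib")
print(example)
```

The dynamic loader typically reads LD_LIBRARY_PATH when a process starts, so the resulting value is meant to be passed in the environment of the process that loads the library rather than assigned after startup.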
--- a/doc/fluid/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.rst
+++ b/doc/fluid/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.rst
@@ -7,15 +7,15 @@
 -------------
 ..  csv-table:: 
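-    :header: "Version description", "Inference library (version 1.7.2)", "Inference library (develop version)"
+    :header: "Version description", "Inference library (version 1.8.0)", "Inference library (develop version)"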
    :widths: 3, 2, 2
-    "ubuntu14.04_cpu_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.7.2-cpu-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-avx-mkl/fluid_inference.tgz>`_"
+    "ubuntu14.04_cpu_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.0-cpu-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-avx-mkl/fluid_inference.tgz>`_"
-    "ubuntu14.04_cpu_avx_openblas", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.7.2-cpu-avx-openblas/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-avx-openblas/fluid_inference.tgz>`_"
+    "ubuntu14.04_cpu_avx_openblas", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.0-cpu-avx-openblas/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-avx-openblas/fluid_inference.tgz>`_"
-    "ubuntu14.04_cpu_noavx_openblas", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.7.2-cpu-noavx-openblas/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-noavx-openblas/fluid_inference.tgz>`_"
+    "ubuntu14.04_cpu_noavx_openblas", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.0-cpu-noavx-openblas/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-noavx-openblas/fluid_inference.tgz>`_"
-    "ubuntu14.04_cuda9.0_cudnn7_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.7.2-gpu-cuda9-cudnn7-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-gpu-cuda9-cudnn7-avx-mkl/fluid_inference.tgz>`_"
+    "ubuntu14.04_cuda9.0_cudnn7_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.0-gpu-cuda9-cudnn7-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-gpu-cuda9-cudnn7-avx-mkl/fluid_inference.tgz>`_"
-    "ubuntu14.04_cuda10.0_cudnn7_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.7.2-gpu-cuda10-cudnn7-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-gpu-cuda10-cudnn7-avx-mkl/fluid_inference.tgz>`_"
+    "ubuntu14.04_cuda10.0_cudnn7_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.0-gpu-cuda10-cudnn7-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-gpu-cuda10-cudnn7-avx-mkl/fluid_inference.tgz>`_"
-    "ubuntu14.04_cuda10.1_cudnn7.6_avx_mkl_trt6", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.7.2-gpu-cuda10.1-cudnn7.6-avx-mkl-trt6%2Ffluid_inference.tgz>`_", 
+    "ubuntu14.04_cuda10.1_cudnn7.6_avx_mkl_trt6", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.0-gpu-cuda10.1-cudnn7.6-avx-mkl-trt6%2Ffluid_inference.tgz>`_", 
    "nv-jetson-cuda10-cudnn7.5-trt5", "`fluid_inference.tar.gz <https://paddle-inference-lib.bj.bcebos.com/1.7.1-nv-jetson-cuda10-cudnn7.5-trt5/fluid_inference.tar.gz>`_", 
@@ -39,19 +39,14 @@ WITH_NV_JETSON                OFF            在NV Jetson硬件上编译时需
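 Use the recommended values to avoid linking unnecessary libraries. Set the other optional build options as needed.
-First, pull the latest code from GitHub and install nccl
+First, pull the latest code from GitHub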
 .. code-block:: bash
  git clone https://github.com/paddlepaddle/paddle
  # It is recommended to use git checkout to switch to a stable Paddle release, for example:
-  git checkout v1.6.2
+  git checkout v1.7.2
-  git clone https://github.com/NVIDIA/nccl.git
-  make -j4
-  make install
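-**note**: nccl is not used on single-GPU machines, but the dependency still exists; removing it will be considered later.
 **Building the server-side inference library from source**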
@@ -164,28 +159,21 @@ NVIDIA Jetson是NVIDIA推出的嵌入式AI平台，Paddle Inference支持在 NVI
     │       ├── libpaddle_fluid.a
     │       └── libpaddle_fluid.so
     ├── third_party
-     │   ├── boost
-     │   │   └── boost
-     │   ├── eigen3
-     │   │   ├── Eigen
-     │   │   └── unsupported
     │   └── install
     │       ├── gflags
     │       ├── glog
     │       ├── mkldnn
     │       ├── mklml
-     │       ├── protobuf
+     │       └── protobuf
-     │       ├── xxhash
-     │       └── zlib
     └── version.txt
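 version.txt records the version information of the inference library, including the Git commit ID, whether OpenBlas or MKL is used as the math library, and the CUDA/CUDNN version numbers, for example: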
  .. code-block:: text
-     GIT COMMIT ID: cc9028b90ef50a825a722c55e5fda4b7cd26b0d6
+     GIT COMMIT ID: 0231f58e592ad9f673ac1832d8c495c8ed65d24f
     WITH_MKL: ON
     WITH_MKLDNN: ON
     WITH_GPU: ON
-     CUDA version: 8.0
+     CUDA version: 10.1
     CUDNN version: v7
--- a/doc/fluid/advanced_guide/inference_deployment/inference/build_and_install_lib_en.rst
+++ b/doc/fluid/advanced_guide/inference_deployment/inference/build_and_install_lib_en.rst
@@ -7,15 +7,15 @@ Direct Download and Installation
 ---------------------------------
 ..  csv-table:: c++ inference library list
-    :header: "version description", "inference library(1.7.2 version)", "inference library(develop version)"
+    :header: "version description", "inference library(1.8.0 version)", "inference library(develop version)"
    :widths: 3, 2, 2
-    "ubuntu14.04_cpu_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.7.2-cpu-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-avx-mkl/fluid_inference.tgz>`_"
+    "ubuntu14.04_cpu_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.0-cpu-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-avx-mkl/fluid_inference.tgz>`_"
-    "ubuntu14.04_cpu_avx_openblas", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.7.2-cpu-avx-openblas/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-avx-openblas/fluid_inference.tgz>`_"
+    "ubuntu14.04_cpu_avx_openblas", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.0-cpu-avx-openblas/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-avx-openblas/fluid_inference.tgz>`_"
-    "ubuntu14.04_cpu_noavx_openblas", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.7.2-cpu-noavx-openblas/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-noavx-openblas/fluid_inference.tgz>`_"
+    "ubuntu14.04_cpu_noavx_openblas", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.0-cpu-noavx-openblas/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-noavx-openblas/fluid_inference.tgz>`_"
-    "ubuntu14.04_cuda9.0_cudnn7_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.7.2-gpu-cuda9-cudnn7-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-gpu-cuda9-cudnn7-avx-mkl/fluid_inference.tgz>`_"
+    "ubuntu14.04_cuda9.0_cudnn7_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.0-gpu-cuda9-cudnn7-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-gpu-cuda9-cudnn7-avx-mkl/fluid_inference.tgz>`_"
-    "ubuntu14.04_cuda10.0_cudnn7_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.7.2-gpu-cuda10-cudnn7-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-gpu-cuda10-cudnn7-avx-mkl/fluid_inference.tgz>`_"
+    "ubuntu14.04_cuda10.0_cudnn7_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.0-gpu-cuda10-cudnn7-avx-mkl/fluid_inference.tgz>`_", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-gpu-cuda10-cudnn7-avx-mkl/fluid_inference.tgz>`_"
-    "ubuntu14.04_cuda10.1_cudnn7.6_avx_mkl_trt6", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.7.2-gpu-cuda10.1-cudnn7.6-avx-mkl-trt6%2Ffluid_inference.tgz>`_", 
+    "ubuntu14.04_cuda10.1_cudnn7.6_avx_mkl_trt6", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.0-gpu-cuda10.1-cudnn7.6-avx-mkl-trt6%2Ffluid_inference.tgz>`_", 
    "nv-jetson-cuda10-cudnn7.5-trt5", "`fluid_inference.tar.gz <https://paddle-inference-lib.bj.bcebos.com/1.7.1-nv-jetson-cuda10-cudnn7.5-trt5/fluid_inference.tar.gz>`_", 
 Build from Source Code

--- a/doc/fluid/advanced_guide/inference_deployment/inference/windows_cpp_inference.md
+++ b/doc/fluid/advanced_guide/inference_deployment/inference/windows_cpp_inference.md
@@ -5,13 +5,13 @@
 Download the installation packages and the corresponding test environments
 -------------
-| Version      |     Inference library (v1.7.2)     |       Compiler        |    Build tool      |  cuDNN  |  CUDA  |
+| Version      |     Inference library (v1.8.0)     |       Compiler        |    Build tool      |  cuDNN  |  CUDA  |
 |:---------|:-------------------|:-------------------|:----------------|:--------|:-------|
-|    cpu_avx_mkl | [fluid_inference.zip](https://paddle-wheel.bj.bcebos.com/1.7.2/win-infer/mkl/cpu/fluid_inference_install_dir.zip) | MSVC 2015 update 3|  CMake v3.16.0  |
+|    cpu_avx_mkl | [fluid_inference.zip](https://paddle-wheel.bj.bcebos.com/1.8.0/win-infer/mkl/cpu/fluid_inference_install_dir.zip) | MSVC 2015 update 3|  CMake v3.16.0  |
-|    cpu_avx_openblas | [fluid_inference.zip](https://paddle-wheel.bj.bcebos.com/1.7.2/win-infer/open/cpu/fluid_inference_install_dir.zip) | MSVC 2015 update 3|  CMake v3.16.0  |
+|    cpu_avx_openblas | [fluid_inference.zip](https://paddle-wheel.bj.bcebos.com/1.8.0/win-infer/open/cpu/fluid_inference_install_dir.zip) | MSVC 2015 update 3|  CMake v3.16.0  |
-|    cuda9.0_cudnn7_avx_mkl | [fluid_inference.zip](https://paddle-wheel.bj.bcebos.com/1.7.2/win-infer/mkl/post97/fluid_inference_install_dir.zip) |  MSVC 2015 update 3 |  CMake v3.16.0  |  7.4.1  |   9.0    |
+|    cuda9.0_cudnn7_avx_mkl | [fluid_inference.zip](https://paddle-wheel.bj.bcebos.com/1.8.0/win-infer/mkl/post97/fluid_inference_install_dir.zip) |  MSVC 2015 update 3 |  CMake v3.16.0  |  7.4.1  |   9.0    |
-|    cuda9.0_cudnn7_avx_openblas | [fluid_inference.zip](https://paddle-wheel.bj.bcebos.com/1.7.2/win-infer/open/post97/fluid_inference_install_dir.zip) | MSVC 2015 update 3 |  CMake v3.16.0  |  7.4.1  |   9.0    |
+|    cuda9.0_cudnn7_avx_openblas | [fluid_inference.zip](https://paddle-wheel.bj.bcebos.com/1.8.0/win-infer/open/post97/fluid_inference_install_dir.zip) | MSVC 2015 update 3 |  CMake v3.16.0  |  7.4.1  |   9.0    |
-|    cuda10.0_cudnn7_avx_mkl | [fluid_inference.zip](https://paddle-wheel.bj.bcebos.com/1.7.2/win-infer/mkl/post107/fluid_inference_install_dir.zip) | MSVC 2015 update 3 |  CMake v3.16.0  |  7.5.0  |   10.0    |
+|    cuda10.0_cudnn7_avx_mkl | [fluid_inference.zip](https://paddle-wheel.bj.bcebos.com/1.8.0/win-infer/mkl/post107/fluid_inference_install_dir.zip) | MSVC 2015 update 3 |  CMake v3.16.0  |  7.5.0  |   10.0    |
 ### Hardware environment

--- a/doc/fluid/advanced_guide/inference_deployment/inference/windows_cpp_inference_en.md
+++ b/doc/fluid/advanced_guide/inference_deployment/inference/windows_cpp_inference_en.md
@@ -5,13 +5,13 @@ Install and Compile C++ Inference Library on Windows
 Direct Download and Install
 -------------
-| Version      |     Inference Libraries(v1.7.2)   | Compiler | Build tools | cuDNN | CUDA |
+| Version      |     Inference Libraries(v1.8.0)   | Compiler | Build tools | cuDNN | CUDA |
 |:---------|:-------------------|:-------------------|:----------------|:--------|:-------|
-|    cpu_avx_mkl | [fluid_inference.zip](https://paddle-wheel.bj.bcebos.com/1.7.2/win-infer/mkl/cpu/fluid_inference_install_dir.zip) | MSVC 2015 update 3|  CMake v3.16.0  |
+|    cpu_avx_mkl | [fluid_inference.zip](https://paddle-wheel.bj.bcebos.com/1.8.0/win-infer/mkl/cpu/fluid_inference_install_dir.zip) | MSVC 2015 update 3|  CMake v3.16.0  |
-|    cpu_avx_openblas | [fluid_inference.zip](https://paddle-wheel.bj.bcebos.com/1.7.2/win-infer/open/cpu/fluid_inference_install_dir.zip) | MSVC 2015 update 3|  CMake v3.16.0  |
+|    cpu_avx_openblas | [fluid_inference.zip](https://paddle-wheel.bj.bcebos.com/1.8.0/win-infer/open/cpu/fluid_inference_install_dir.zip) | MSVC 2015 update 3|  CMake v3.16.0  |
-|    cuda9.0_cudnn7_avx_mkl | [fluid_inference.zip](https://paddle-wheel.bj.bcebos.com/1.7.2/win-infer/mkl/post97/fluid_inference_install_dir.zip) |  MSVC 2015 update 3 |  CMake v3.16.0  |  7.4.1  |   9.0    |
+|    cuda9.0_cudnn7_avx_mkl | [fluid_inference.zip](https://paddle-wheel.bj.bcebos.com/1.8.0/win-infer/mkl/post97/fluid_inference_install_dir.zip) |  MSVC 2015 update 3 |  CMake v3.16.0  |  7.4.1  |   9.0    |
-|    cuda9.0_cudnn7_avx_openblas | [fluid_inference.zip](https://paddle-wheel.bj.bcebos.com/1.7.2/win-infer/open/post97/fluid_inference_install_dir.zip) | MSVC 2015 update 3 |  CMake v3.16.0  |  7.4.1  |   9.0    |
+|    cuda9.0_cudnn7_avx_openblas | [fluid_inference.zip](https://paddle-wheel.bj.bcebos.com/1.8.0/win-infer/open/post97/fluid_inference_install_dir.zip) | MSVC 2015 update 3 |  CMake v3.16.0  |  7.4.1  |   9.0    |
-|    cuda10.0_cudnn7_avx_mkl | [fluid_inference.zip](https://paddle-wheel.bj.bcebos.com/1.7.2/win-infer/mkl/post107/fluid_inference_install_dir.zip) | MSVC 2015 update 3 |  CMake v3.16.0  |  7.5.0  |   10.0    |
+|    cuda10.0_cudnn7_avx_mkl | [fluid_inference.zip](https://paddle-wheel.bj.bcebos.com/1.8.0/win-infer/mkl/post107/fluid_inference_install_dir.zip) | MSVC 2015 update 3 |  CMake v3.16.0  |  7.5.0  |   10.0    |
 ### Hardware Environment

--- a/doc/fluid/advanced_guide/performance_improving/device_switching/device_switching.md
+++ b/doc/fluid/advanced_guide/performance_improving/device_switching/device_switching.md
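+# Runtime device switching
+Paddle provides [fluid.CUDAPlace](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/fluid_cn/CUDAPlace_cn.html) and [fluid.CPUPlace](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/fluid_cn/CPUPlace_cn.html) for specifying the device used at run time. These two interfaces set the device globally. Starting from version 1.8, Paddle provides the [device_guard](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/fluid_cn/device_guard_cn.html) interface for specifying the device on which individual OPs run. This tutorial describes the scenarios in which device_guard is useful and how to use it to optimize a model.
+If `fluid.CUDAPlace` is used to set the global execution device, the framework places OPs on the GPU whenever possible, so GPU memory may run out. `device_guard` can set the execution device of an OP; running some layers on the CPU takes advantage of the larger host memory and avoids exceeding GPU memory.
+Sometimes, even though the global execution device is set to GPU, the framework may place some OPs on the CPU when it assigns execution devices automatically. In addition, a few OPs store their outputs on the CPU. In these scenarios data is frequently transferred between devices, which may hurt model performance. `device_guard` can be used to avoid unnecessary data transfers while the model runs. The following sections describe in detail how to analyze data transfer overhead with the [profile](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/profiler_cn.html) tool and how to use `device_guard` to avoid unnecessary data transfers and thereby improve model performance.
+## How to avoid running out of GPU memory
+In the sample code below, the `size` parameter of the `embedding` layer has two elements: the first is `vocab_size` (the vocabulary size) and the second is `emb_size` (the dimension of the `embedding` layer). In real scenarios the vocabulary can be very large; in the sample code it is set to 10000000. When run in GPU mode, the weight matrix created by this layer has shape (10000000, 150), so this layer alone needs 5.59G of GPU memory (10000000 × 150 float32 values at 4 bytes each). If the vocabulary grows further, GPU memory is very likely to run out.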
+```python
+import paddle.fluid as fluid
+data = fluid.layers.fill_constant(shape=[1], value=128, dtype='int64')
+label = fluid.layers.fill_constant(shape=[1, 150], value=0.5, dtype='float32')
+emb = fluid.embedding(input=data, size=(10000000, 150), dtype='float32')
+out = fluid.layers.l2_normalize(x=emb, axis=-1)
+cost = fluid.layers.square_error_cost(input=out, label=label)
+avg_cost = fluid.layers.mean(cost)
+sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.001)
+sgd_optimizer.minimize(avg_cost)
+place = fluid.CUDAPlace(0)
+exe = fluid.Executor(place)
+exe.run(fluid.default_startup_program())
+result = exe.run(fluid.default_main_program(), fetch_list=[avg_cost])
+```
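The 5.59G figure quoted for the `embedding` weight can be checked with a few lines of arithmetic. This is a minimal sketch that counts only the dense float32 weight matrix; `embedding_weight_bytes` is a hypothetical helper, and gradients or other buffers the framework may allocate are not included.

```python
def embedding_weight_bytes(vocab_size, emb_size, bytes_per_element=4):
    """Size in bytes of a dense (vocab_size, emb_size) weight matrix."""
    return vocab_size * emb_size * bytes_per_element


size_bytes = embedding_weight_bytes(10000000, 150)  # float32 -> 4 bytes per element
size_gib = size_bytes / float(1024 ** 3)
print("%.2f GiB" % size_gib)  # 5.59 GiB
```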
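+`embedding` looks up the corresponding `embedding` vectors in the `embedding` matrix according to the `id` information in `input`; its speed is acceptable even when it is computed on the CPU. Therefore, as in the code below, `device_guard` can be used to place the `embedding` layer on the CPU to make use of host memory. All layers other than the `embedding` layer then run on the GPU.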
+```python
+import paddle.fluid as fluid
+data = fluid.layers.fill_constant(shape=[1], value=128, dtype='int64')
+label = fluid.layers.fill_constant(shape=[1, 150], value=0.5, dtype='float32')
+with fluid.device_guard("cpu"):
+    emb = fluid.embedding(input=data, size=(10000000, 150), dtype='float32')
+out = fluid.layers.l2_normalize(x=emb, axis=-1)
+cost = fluid.layers.square_error_cost(input=out, label=label)
+avg_cost = fluid.layers.mean(cost)
+sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.001)
+sgd_optimizer.minimize(avg_cost)
+place = fluid.CUDAPlace(0)
+exe = fluid.Executor(place)
+exe.run(fluid.default_startup_program())
+result = exe.run(fluid.default_main_program(), fetch_list=[avg_cost])
+```
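+When GPU memory is sufficient, this setting is unnecessary.
+## How to reduce data transfers
+### Use the profile tool to confirm whether data transfers occur
+First analyze the model's performance data to find out why data transfers occur. As the following code shows, the [profile](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/profiler_cn.html) tool can be used for this analysis.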
+```python
+import paddle.fluid as fluid
+import paddle.fluid.profiler as profiler
+data1 = fluid.layers.fill_constant(shape=[1, 3, 8, 8], value=0.5, dtype='float32')
+data2 = fluid.layers.fill_constant(shape=[1, 3, 5, 5], value=0.5, dtype='float32')
+shape = fluid.layers.shape(data2) 
+shape = fluid.layers.slice(shape, axes=[0], starts=[0], ends=[4]) 
+out = fluid.layers.crop_tensor(data1, shape=shape) 
+place = fluid.CUDAPlace(0) 
+exe = fluid.Executor(place)
+exe.run(fluid.default_startup_program())
+with profiler.profiler('All', 'total') as prof:
+    for i in range(10):
+        result = exe.run(fetch_list=[out])
+```
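+After the program finishes, the profile report is printed automatically. In the profile report below, `GpuMemCpy Summary` lists the call time of 2 kinds of data transfer. During OP execution, `GpuMemcpySync` occurs when the device holding an input Tensor differs from the device on which the OP executes; this is usually the item that can be optimized directly. Looking further, `GpuMemcpySync` occurs during the execution of both `slice` and `crop_tensor`. Although the program is set to run in GPU mode, some OPs in the framework, such as shape, place their outputs on the CPU.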
+```text
+------------------------->     Profiling Report     <-------------------------
+Note! This Report merge all thread info into one.
+Place: All
+Time unit: ms
+Sorted by event first end time in descending order in the same thread
+Total time: 24.0922
+  Computation time       Total: 3.60143     Ratio: 14.9485%
+  Framework overhead     Total: 20.4908     Ratio: 85.0515%
+-------------------------     GpuMemCpy Summary     -------------------------
+GpuMemcpy                Calls: 30          Total: 1.44377     Ratio: 5.99267%
+  GpuMemcpyAsync         Calls: 10          Total: 0.459803    Ratio: 1.90851%
+  GpuMemcpySync          Calls: 20          Total: 0.983967    Ratio: 4.08416%
+-------------------------       Event Summary       -------------------------
+Event                                                       Calls       Total       CPU Time (Ratio)        GPU Time (Ratio)        Min.        Max.        Ave.        Ratio.
+fill_constant                                               20          2.03147     1.995597 (0.982342)     0.035872 (0.017658)     0.064199    0.379822    0.101573    0.0843204
+shape                                                       10          0.466503    0.466503 (1.000000)     0.000000 (0.000000)     0.021165    0.207393    0.0466503   0.0193632
+eager_deletion                                              30          0.28398     0.283980 (1.000000)     0.000000 (0.000000)     0.004668    0.028065    0.009466    0.0117872
+slice                                                       10          1.53533     1.505664 (0.980679)     0.029664 (0.019321)     0.1312      0.259446    0.153533    0.0637271
+  GpuMemcpySync:CPU->GPU                                    10          0.41714     0.408532 (0.979364)     0.008608 (0.020636)     0.038545    0.054022    0.041714    0.0173143
+crop_tensor                                                 10          1.49584     1.438558 (0.961707)     0.057280 (0.038293)     0.129106    0.246395    0.149584    0.0620879
+  GpuMemcpySync:GPU->CPU                                    10          0.566827    0.543787 (0.959353)     0.023040 (0.040647)     0.047598    0.097705    0.0566827   0.0235274
+Fetch                                                       10          0.921333    0.897141 (0.973742)     0.024192 (0.026258)     0.077059    0.177223    0.0921333   0.0382419
+  GpuMemcpyAsync:GPU->CPU                                   10          0.459803    0.435611 (0.947386)     0.024192 (0.052614)     0.039321    0.073849    0.0459803   0.0190851
+ParallelExecutor::Run                                       10          17.3578     17.345797 (0.999309)    0.012000 (0.000691)     0.705361    10.3389     1.73578     0.720472
+  InitLocalVars                                             1           0.084954    0.084954 (1.000000)     0.000000 (0.000000)     0.084954    0.084954    0.084954    0.0035262
+  ScopeBufferedMonitor::pre_local_exec_scopes_process       10          0.040771    0.040771 (1.000000)     0.000000 (0.000000)     0.003653    0.00543     0.0040771   0.00169229
+  FastThreadedSSAGraphExecutorPrepare                       10          8.64291     8.630914 (0.998612)     0.012000 (0.001388)     0.033383    8.29818     0.864291    0.358743
+  ScopeBufferedMonitor::post_local_exec_scopes_process      10          0.252618    0.252618 (1.000000)     0.000000 (0.000000)     0.022696    0.041439    0.0252618   0.0104854
+```
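For longer reports it can help to extract the `GpuMemCpy Summary` numbers programmatically. The sketch below is an illustration only: `parse_memcpy_summary` is a hypothetical helper, and the regular expression assumes the summary lines keep the `Calls: ... Total: ... Ratio: ...%` layout shown in the report above.

```python
import re

# Matches summary lines such as
# "  GpuMemcpySync          Calls: 20          Total: 0.983967    Ratio: 4.08416%"
SUMMARY_LINE = re.compile(
    r"^\s*(GpuMemcpy\w*)\s+Calls:\s*(\d+)\s+Total:\s*([\d.]+)\s+Ratio:\s*([\d.]+)%"
)


def parse_memcpy_summary(report_text):
    """Return {name: (calls, total_ms, ratio_percent)} for GpuMemcpy* summary lines."""
    summary = {}
    for line in report_text.splitlines():
        match = SUMMARY_LINE.match(line)
        if match:
            name, calls, total, ratio = match.groups()
            summary[name] = (int(calls), float(total), float(ratio))
    return summary


# Summary lines copied from the report above.
report = """
GpuMemcpy                Calls: 30          Total: 1.44377     Ratio: 5.99267%
  GpuMemcpyAsync         Calls: 10          Total: 0.459803    Ratio: 1.90851%
  GpuMemcpySync          Calls: 20          Total: 0.983967    Ratio: 4.08416%
"""
summary = parse_memcpy_summary(report)
print(summary.get("GpuMemcpySync"))
```

A missing `GpuMemcpySync` key in the returned dictionary indicates that the report lists no synchronous copies.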
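+### Use the log to locate where data transfers occur
+The sample program above is simple, so the profile report alone shows which operators trigger data transfers. For more complex models, more detailed debugging information may be needed: print the run-time log to determine exactly where the data transfers occur. Using the program above again, running `GLOG_vmodule=operator=3 python test_case.py` produces the following log, which shows 2 data transfers:
+- The output of `shape` is on the CPU; when `slice` runs, the output of `shape` is copied to the GPU.
+- The result of `slice` is on the GPU; when `crop_tensor` runs, it is copied to the CPU.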
+```text
+I0406 14:56:23.286592 17516 operator.cc:180] CUDAPlace(0) Op(shape), inputs:{Input[fill_constant_1.tmp_0:float[1, 3, 5, 5]({})]}, outputs:{Out[shape_0.tmp_0:int[4]({})]}.
+I0406 14:56:23.286628 17516 eager_deletion_op_handle.cc:107] Erase variable fill_constant_1.tmp_0 on CUDAPlace(0)
+I0406 14:56:23.286725 17516 operator.cc:1210] Transform Variable shape_0.tmp_0 from data_type[int]:data_layout[NCHW]:place[CPUPlace]:library_type[PLAIN] to data_type[int]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN]
+I0406 14:56:23.286763 17516 scope.cc:169] Create variable shape_0.tmp_0
+I0406 14:56:23.286784 17516 data_device_transform.cc:21] DeviceTransform in, src_place CPUPlace dst_place: CUDAPlace(0)
+I0406 14:56:23.286867 17516 tensor_util.cu:129] TensorCopySync 4 from CPUPlace to CUDAPlace(0)
+I0406 14:56:23.287099 17516 operator.cc:180] CUDAPlace(0) Op(slice), inputs:{EndsTensor[], EndsTensorList[], Input[shape_0.tmp_0:int[4]({})], StartsTensor[], StartsTensorList[]}, outputs:{Out[slice_0.tmp_0:int[4]({})]}.
+I0406 14:56:23.287140 17516 eager_deletion_op_handle.cc:107] Erase variable shape_0.tmp_0 on CUDAPlace(0)
+I0406 14:56:23.287220 17516 tensor_util.cu:129] TensorCopySync 4 from CUDAPlace(0) to CPUPlace
+I0406 14:56:23.287473 17516 operator.cc:180] CUDAPlace(0) Op(crop_tensor), inputs:{Offsets[], OffsetsTensor[], Shape[slice_0.tmp_0:int[4]({})], ShapeTensor[], X[fill_constant_0.tmp_0:float[1, 3, 8, 8]({})]}, outputs:{Out[crop_tensor_0.tmp_0:float[1, 3, 5, 5]({})]}.
+```
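When the log is long, the copies can be located with a small script instead of by eye. This is a rough sketch under one assumption taken from the log above: the `Op(...)` line of an operator is printed after the `TensorCopySync` lines for its inputs. `find_sync_copies` is a hypothetical helper, and the sample lines are abridged from the log above.

```python
import re

# Payload of lines such as "... TensorCopySync 4 from CPUPlace to CUDAPlace(0)"
COPY_LINE = re.compile(r"TensorCopySync\s+(.*?)\s+from\s+(\S+)\s+to\s+(\S+)")
OP_LINE = re.compile(r"Op\((\w+)\)")


def find_sync_copies(log_text):
    """Pair each TensorCopySync line with the next Op(...) line that follows it."""
    pending = []
    results = []
    for line in log_text.splitlines():
        copy = COPY_LINE.search(line)
        if copy:
            pending.append((copy.group(2), copy.group(3)))
            continue
        op = OP_LINE.search(line)
        if op:
            for src, dst in pending:
                results.append((op.group(1), src, dst))
            pending = []
    return results


# Lines abridged from the log above (timestamps and operator arguments dropped).
log = """
operator.cc:180] CUDAPlace(0) Op(shape), inputs:{...}
tensor_util.cu:129] TensorCopySync 4 from CPUPlace to CUDAPlace(0)
operator.cc:180] CUDAPlace(0) Op(slice), inputs:{...}
tensor_util.cu:129] TensorCopySync 4 from CUDAPlace(0) to CPUPlace
operator.cc:180] CUDAPlace(0) Op(crop_tensor), inputs:{...}
"""
for op_name, src, dst in find_sync_copies(log):
    print("%s: %s -> %s" % (op_name, src, dst))
```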
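+### Use device_guard to avoid unnecessary data transfers
+In the example above, the output of `shape` is a 1-D Tensor, so the amount of computation for `slice` is very small. In this case, placing `slice` on the CPU avoids both data transfers. The modified program is as follows: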
+```python
+import paddle.fluid as fluid
+import paddle.fluid.profiler as profiler
+data1 = fluid.layers.fill_constant(shape=[1, 3, 8, 8], value=0.5, dtype='float32')
+data2 = fluid.layers.fill_constant(shape=[1, 3, 5, 5], value=0.5, dtype='float32')
+shape = fluid.layers.shape(data2)
+with fluid.device_guard("cpu"):
+    shape = fluid.layers.slice(shape, axes=[0], starts=[0], ends=[4])
+out = fluid.layers.crop_tensor(data1, shape=shape)
+place = fluid.CUDAPlace(0) 
+exe = fluid.Executor(place)
+exe.run(fluid.default_startup_program())
+with profiler.profiler('All', 'total') as prof:
+    for i in range(10):
+        result = exe.run(fetch_list=[out])
+```
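+Looking at `GpuMemCpy Summary` in the profile report again, `GpuMemcpySync` has been eliminated. In a real model, if `GpuMemcpySync` calls account for a large share of the time and can be avoided by setting `device_guard`, doing so can bring some performance improvement.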
+```text
+------------------------->     Profiling Report     <-------------------------
+Note! This Report merge all thread info into one.
+Place: All
+Time unit: ms
+Sorted by total time in descending order in the same thread
+Total time: 14.5345
+  Computation time       Total: 4.47587     Ratio: 30.7948%
+  Framework overhead     Total: 10.0586     Ratio: 69.2052%
+-------------------------     GpuMemCpy Summary     -------------------------
+GpuMemcpy                Calls: 10          Total: 0.457033    Ratio: 3.14447%
+  GpuMemcpyAsync         Calls: 10          Total: 0.457033    Ratio: 3.14447%
+-------------------------       Event Summary       -------------------------
+Event                                                       Calls       Total       CPU Time (Ratio)        GPU Time (Ratio)        Min.        Max.        Ave.        Ratio.
+FastThreadedSSAGraphExecutorPrepare                         10          7.70113     7.689066 (0.998433)     0.012064 (0.001567)     0.032657    7.39363     0.770113    0.529852
+fill_constant                                               20          2.62299     2.587022 (0.986287)     0.035968 (0.013713)     0.071097    0.342082    0.13115     0.180466
+shape                                                       10          1.93504     1.935040 (1.000000)     0.000000 (0.000000)     0.026774    1.6016      0.193504    0.133134
+Fetch                                                       10          0.880496    0.858512 (0.975032)     0.021984 (0.024968)     0.07392     0.140896    0.0880496   0.0605797
+  GpuMemcpyAsync:GPU->CPU                                   10          0.457033    0.435049 (0.951898)     0.021984 (0.048102)     0.037836    0.071424    0.0457033   0.0314447
+crop_tensor                                                 10          0.705426    0.671506 (0.951916)     0.033920 (0.048084)     0.05841     0.123901    0.0705426   0.0485346
+slice                                                       10          0.324241    0.324241 (1.000000)     0.000000 (0.000000)     0.024299    0.07213     0.0324241   0.0223084
+eager_deletion                                              30          0.250524    0.250524 (1.000000)     0.000000 (0.000000)     0.004171    0.016235    0.0083508   0.0172365
+ScopeBufferedMonitor::post_local_exec_scopes_process        10          0.047794    0.047794 (1.000000)     0.000000 (0.000000)     0.003344    0.014131    0.0047794   0.00328831
+InitLocalVars                                               1           0.034629    0.034629 (1.000000)     0.000000 (0.000000)     0.034629    0.034629    0.034629    0.00238254
+ScopeBufferedMonitor::pre_local_exec_scopes_process         10          0.032231    0.032231 (1.000000)     0.000000 (0.000000)     0.002952    0.004076    0.0032231   0.00221755
+```
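+### Summary
+- Use the profile tool to analyze the model and check whether GpuMemcpySync calls take time. If they do, analyze further why the data transfers occur.
+- The profile report shows which OPs trigger GpuMemcpySync. If necessary, print the log to find exactly where GpuMemcpySync occurs.
+- Try using `device_guard` to set the execution device of some OPs in order to reduce GpuMemcpySync calls.
+- Finally, compare the profile reports before and after the change, or other performance metrics, to confirm whether the change improved performance.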
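The before/after comparison in the last step can be made concrete with the numbers from the two reports in this tutorial. This is a minimal sketch with a hypothetical helper, `relative_reduction`; the values come from a single run of a tiny program whose total time is dominated by framework overhead, so they illustrate the calculation rather than a benchmark.

```python
def relative_reduction(before, after):
    """Fraction by which `after` is smaller than `before`."""
    return (before - after) / before


# "Total time" values (ms) from the two profile reports.
total_before = 24.0922
total_after = 14.5345
print("total time reduced by %.1f%%" % (100 * relative_reduction(total_before, total_after)))

# GpuMemcpy totals (ms) from the two GpuMemCpy Summary sections.
memcpy_before = 1.44377
memcpy_after = 0.457033
print("GpuMemcpy time reduced by %.1f%%" % (100 * relative_reduction(memcpy_before, memcpy_after)))
```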
--- a/doc/fluid/advanced_guide/performance_improving/index_cn.rst
+++ b/doc/fluid/advanced_guide/performance_improving/index_cn.rst
@@ -7,6 +7,7 @@
    singlenode_training_improving/training_best_practice.rst
    singlenode_training_improving/memory_optimize.rst
+    device_switching/device_switching.md
    multinode_training_improving/cpu_train_best_practice.rst
    multinode_training_improving/dist_training_gpu.rst
    multinode_training_improving/gpu_training_with_recompute.rst

--- a/doc/fluid/api/dygraph.rst
+++ b/doc/fluid/api/dygraph.rst
@@ -14,6 +14,7 @@ fluid.dygraph
    dygraph/Conv3DTranspose.rst
    dygraph/CosineDecay.rst
    dygraph/DataParallel.rst
+    dygraph/declarative.rst
    dygraph/disable_dygraph.rst
    dygraph/Dropout.rst
    dygraph/dygraph_to_static_code.rst
@@ -28,6 +29,7 @@ fluid.dygraph
    dygraph/GroupNorm.rst
    dygraph/GRUUnit.rst
    dygraph/guard.rst
+    dygraph/InstanceNorm.rst
    dygraph/InverseTimeDecay.rst
    dygraph/Layer.rst
    dygraph/LayerList.rst
@@ -45,6 +47,7 @@ fluid.dygraph
    dygraph/Pool2D.rst
    dygraph/PRelu.rst
    dygraph/prepare_context.rst
+    dygraph/ProgramTranslator.rst
    dygraph/save_dygraph.rst
    dygraph/Sequential.rst
    dygraph/SpectralNorm.rst

--- a/doc/fluid/api/dygraph/GRUCell.rst
+++ b/doc/fluid/api/dygraph/GRUCell.rst
+..  THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
+    !DO NOT EDIT THIS FILE MANUALLY!
+.. _api_fluid_dygraph_GRUCell:
+GRUCell
+-------
+..  autoclass:: paddle.fluid.dygraph.GRUCell
+    :members:
+    :noindex:
--- a/doc/fluid/api/dygraph/InstanceNorm.rst
+++ b/doc/fluid/api/dygraph/InstanceNorm.rst
+..  THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
+    !DO NOT EDIT THIS FILE MANUALLY!
+.. _api_fluid_dygraph_InstanceNorm:
+InstanceNorm
+------------
+..  autoclass:: paddle.fluid.dygraph.InstanceNorm
+    :members:
+    :noindex:
--- a/doc/fluid/api/dygraph/LSTMCell.rst
+++ b/doc/fluid/api/dygraph/LSTMCell.rst
+..  THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
+    !DO NOT EDIT THIS FILE MANUALLY!
+.. _api_fluid_dygraph_LSTMCell:
+LSTMCell
+--------
+..  autoclass:: paddle.fluid.dygraph.LSTMCell
+    :members:
+    :noindex:
--- a/doc/fluid/api/fluid.rst
+++ b/doc/fluid/api/fluid.rst
@@ -7,6 +7,7 @@ fluid
    fluid/BuildStrategy.rst
    fluid/CompiledProgram.rst
+    fluid/ComplexVariable.rst
    fluid/cpu_places.rst
    fluid/CPUPlace.rst
    fluid/create_lod_tensor.rst

--- a/doc/fluid/api/gen_doc.py
+++ b/doc/fluid/api/gen_doc.py
@@ -21,6 +21,7 @@ import contextlib
 import paddle.fluid as fluid
 import paddle.tensor as tensor
 import paddle.nn as nn
+import paddle.complex as complex
 #import paddle.framework as framework
 def parse_arg():
@@ -82,7 +83,7 @@ class DocGenerator(object):
    def print_section(self, name):
        self._print_header_(name, dot='=', is_title=False)
-    def print_item(self, name):
+    def print_item(self, name, output_name):
        item = getattr(self.module, name, None)
        if isinstance(item, types.TypeType):
            self.print_class(name)
@@ -90,7 +91,7 @@ class DocGenerator(object):
            self.print_method(name)
        else:
            self.stream.close()
-            path = os.getcwd()+"/fluid/"+name+".rst"
+            path = os.getcwd()+"/"+output_name+"/"+name+".rst"
            if name != "PipeReader":
                os.remove(path)
@@ -207,7 +208,7 @@ def generate_doc(module_name, module_prefix, output, output_name, to_multiple_fi
            header_name = api
            with gen.guard(os.path.join(output, api + '.rst')):
                gen.print_header_reminder()
-                gen.print_item(api)
+                gen.print_item(api, output_name)
 def main():

--- a/doc/fluid/api/gen_doc.sh
+++ b/doc/fluid/api/gen_doc.sh
 #!/bin/bash
-for module in layers dataset clip metrics executor initializer io nets optimizer profiler regularizer transpiler backward profiler unique_name dygraph
+for module in layers dataset clip metrics executor initializer io nets optimizer profiler regularizer transpiler backward profiler unique_name dygraph framework
 do
  python gen_doc.py --module_name ${module} --module_prefix ${module} --output ${module} --output_name fluid --to_multiple_files True
  python gen_module_index.py ${module}  fluid.${module}
@@ -9,6 +9,7 @@ done
 python gen_doc.py --module_name "" --module_prefix "" --output fluid --output_name fluid --to_multiple_files True
 python gen_module_index.py fluid  fluid
+# tensor
 for module in math random stat
 do
  python gen_doc.py --module_name ${module} --module_prefix ${module} --output ${module} --output_name tensor --to_multiple_files True --output_dir tensor
@@ -17,13 +18,27 @@ done
 python gen_module_index.py tensor paddle.tensor
+for module in math manipulation
+do
+  python gen_doc.py --module_name tensor.${module} --module_prefix tensor.${module} --output tensor/${module} --output_name complex --to_multiple_files True --output_dir complex
+  python gen_module_index.py complex.tensor.${module} ${module}
+done
+python gen_module_index.py complex.tensor tensor
+python gen_module_index.py complex paddle.complex
+python gen_module_index.py framework paddle.framework
+# nn
 for module in loss
 do
  python gen_doc.py --module_name ${module} --module_prefix ${module} --output ${module} --output_name nn --to_multiple_files True --output_dir nn
  python gen_module_index.py nn.${module} ${module}
 done
+python gen_doc.py --module_name "" --module_prefix "" --output nn --output_name nn --to_multiple_files True
 python gen_module_index.py nn paddle.nn
+# index.rst
 python gen_index.py
--- a/doc/fluid/api/index_en.rst
+++ b/doc/fluid/api/index_en.rst
@@ -6,6 +6,7 @@ API Reference
    :maxdepth: 1
    ../api_guides/index_en.rst
+    complex.rst
    nn.rst
    tensor.rst
    fluid.rst
@@ -16,6 +17,7 @@ API Reference
    dataset.rst
    dygraph.rst
    executor.rst
+    index.rst
    initializer.rst
    io.rst
    layers.rst
@@ -24,6 +26,5 @@ API Reference
    optimizer.rst
    profiler.rst
    regularizer.rst
-    tensor.rst
    transpiler.rst
    unique_name.rst
--- a/doc/fluid/api/nn/loss.rst
+++ b/doc/fluid/api/nn/loss.rst
@@ -5,4 +5,8 @@ loss
 ..  toctree::
    :maxdepth: 1
+    loss/BCELoss.rst
+    loss/CrossEntropyLoss.rst
    loss/L1Loss.rst
+    loss/MSELoss.rst
+    loss/NLLLoss.rst
--- a/doc/fluid/api/tensor.rst
+++ b/doc/fluid/api/tensor.rst
@@ -8,3 +8,4 @@ paddle.tensor
    tensor/linalg.rst
    tensor/math.rst
    tensor/random.rst
+    tensor/stat.rst
--- a/doc/fluid/api/tensor/math.rst
+++ b/doc/fluid/api/tensor/math.rst
@@ -6,9 +6,16 @@ math
    :maxdepth: 1
    math/add.rst
+    math/addcmul.rst
+    math/addmm.rst
    math/atan.rst
+    math/clamp.rst
    math/div.rst
    math/elementwise_sum.rst
+    math/log1p.rst
+    math/logsumexp.rst
+    math/max.rst
+    math/min.rst
    math/mm.rst
    math/mul.rst
    math/pow.rst

--- a/doc/fluid/api/tensor/random/rand.rst
+++ b/doc/fluid/api/tensor/random/rand.rst
+..  THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
+    !DO NOT EDIT THIS FILE MANUALLY!
+.. _api_tensor_random_rand:
+rand
+----
+..  autofunction:: paddle.tensor.random.rand
+    :noindex:
--- a/doc/fluid/api_cn/dygraph_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn.rst
@@ -24,6 +24,7 @@ fluid.dygraph
    dygraph_cn/GroupNorm_cn.rst
    dygraph_cn/GRUUnit_cn.rst
    dygraph_cn/guard_cn.rst
+    dygraph_cn/InstanceNorm_cn.rst
    dygraph_cn/InverseTimeDecay_cn.rst
    dygraph_cn/Layer_cn.rst
    dygraph_cn/LayerList_cn.rst

--- a/doc/fluid/api_cn/dygraph_cn/Conv2DTranspose_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/Conv2DTranspose_cn.rst
@@ -71,7 +71,7 @@ Conv2DTranspose
    with fluid.dygraph.guard():
        data = np.random.random((3, 32, 32, 5)).astype('float32')
        conv2DTranspose = fluid.dygraph.nn.Conv2DTranspose(
-              'Conv2DTranspose', num_filters=2, filter_size=3)
+              num_channels=32, num_filters=2, filter_size=3)
        ret = conv2DTranspose(fluid.dygraph.base.to_variable(data))
 Attributes

--- a/doc/fluid/api_cn/dygraph_cn/InstanceNorm_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/InstanceNorm_cn.rst
+.. _cn_api_fluid_dygraph_InstanceNorm:
+InstanceNorm
+-------------------------------
+.. py:class:: paddle.fluid.dygraph.InstanceNorm(num_channels, epsilon=1e-05, param_attr=None, bias_attr=None, dtype='float32') 
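+This interface builds a callable object of the ``InstanceNorm`` class; see the code example below for usage.
+It can be used as an instance normalization function for convolution and fully connected operations, normalizing with the mean and variance of each channel of each sample. The layer requires the following data format:
+NCHW[batch,in_channels,in_height,in_width]
+For more details, see `Instance Normalization: The Missing Ingredient for Fast Stylization <https://arxiv.org/pdf/1607.08022.pdf>`_
+``input`` is the mini-batch input.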
+.. math::
+    \mu_{\beta}        &\gets \frac{1}{m} \sum_{i=1}^{m} x_i                                 \quad &// mean of each channel in each sample in a batch  \\
+    \sigma_{\beta}^{2} &\gets \frac{1}{m} \sum_{i=1}^{m}(x_i - \mu_{\beta})^2               \quad &// variance of each channel in each sample in a batch  \\
+    \hat{x_i}          &\gets \frac{x_i - \mu_\beta} {\sqrt{\sigma_{\beta}^{2} + \epsilon}}  \quad &// normalize \\
+    y_i &\gets \gamma \hat{x_i} + \beta                                                      \quad &// scale-and-shift
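+Parameters:
+    - **num_channels** (int) - The number of channels of the input ``Tensor``.
+    - **epsilon** (float, default 1e-05) - A small value added to the denominator so that normalizing the current input gives a numerically stable result. Default: 1e-5.
+    - **param_attr** (ParamAttr|None) - Attribute of the instance_norm weight parameter. It can be None or a ParamAttr instance (ParamAttr can specify various parameter properties). If None, the weight is initialized to 1.0 by default. If a ParamAttr is given, instance_norm creates the weight parameter with the specified properties. Default: None.
+    - **bias_attr** (ParamAttr|None) - Attribute of the instance_norm bias parameter. It can be None or a ParamAttr instance (ParamAttr can specify various parameter properties). If None, the bias is initialized to 0.0 by default. If a ParamAttr is given, instance_norm creates the bias parameter with the specified properties. Default: None.
+    - **dtype** (string, default float32) - The data type of the input ``Tensor``; either float32 or float64. Default: float32.
+Returns: None
+**Code example**: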
+.. code-block:: python
+    import paddle.fluid as fluid
+    from paddle.fluid.dygraph.base import to_variable
+    import numpy as np
+    import paddle
+    # x's shape is [1, 3, 1, 2] 
+    x = np.array([[[[1.0, 8.0]], [[10.0, 5.0]], [[4.0, 6.0]]]]).astype('float32')
+    with fluid.dygraph.guard():
+        x = to_variable(x)
+        instanceNorm = paddle.nn.InstanceNorm(3)
+        ret = instanceNorm(x)
+        # ret's shape is [1, 3, 1, 2]; value is [-1 1 0.999999 -0.999999 -0.999995 0.999995]
+        print(ret)
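The four formulas in the InstanceNorm documentation above can be checked by hand. The following is a minimal pure-Python sketch, not the Paddle implementation: the helper name `instance_norm` is hypothetical, and `gamma` and `beta` are simplified to scalars (the real layer keeps one learnable pair per channel).

```python
import math


def instance_norm(x, epsilon=1e-5, gamma=1.0, beta=0.0):
    """Normalize each channel of each sample over its spatial elements.

    x is a nested list shaped [N][C][H][W]. gamma and beta are scalars
    here for simplicity (an assumption of this sketch).
    """
    out = []
    for sample in x:
        normalized_sample = []
        for channel in sample:
            values = [v for row in channel for v in row]
            mean = sum(values) / len(values)
            var = sum((v - mean) ** 2 for v in values) / len(values)
            scale = gamma / math.sqrt(var + epsilon)
            normalized_sample.append(
                [[(v - mean) * scale + beta for v in row] for row in channel])
        out.append(normalized_sample)
    return out


# Same input as the documented example: shape [1, 3, 1, 2].
x = [[[[1.0, 8.0]], [[10.0, 5.0]], [[4.0, 6.0]]]]
ret = instance_norm(x)
flat = [v for channel in ret[0] for row in channel for v in row]
print(flat)
```

Each channel is normalized independently of the other channels and of other samples, which is why the three channels above end up with values close to -1 and 1 despite very different input ranges.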
--- a/doc/fluid/api_cn/dygraph_cn/Pool2D_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/Pool2D_cn.rst
@@ -74,14 +74,13 @@ Pool2D
    import paddle.fluid as fluid
    from paddle.fluid.dygraph.base import to_variable
-    import numpy as np
+    import numpy
    with fluid.dygraph.guard():
-        data = np.random.random((3, 32, 32, 5)).astype('float32')
+       data = numpy.random.random((3, 32, 32, 5)).astype('float32')
-        pool2d = fluid.dygraph.Pool2D(pool_size=2,
+       pool2d = fluid.dygraph.Pool2D(pool_size=2,
                      pool_type='max',
                      pool_stride=1,
                      global_pooling=False)
-        pool2d_res = pool2d(to_variable(data))
+       pool2d_res = pool2d(to_variable(data))
--- a/doc/fluid/api_cn/dygraph_cn/load_dygraph_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/load_dygraph_cn.rst
@@ -32,7 +32,8 @@ load_dygraph
        emb = fluid.dygraph.Embedding([10, 10])
        state_dict = emb.state_dict()
        fluid.save_dygraph( state_dict, "paddle_dy")
-        adam = fluid.optimizer.Adam( learning_rate = fluid.layers.noam_decay( 100, 10000) )
+        adam = fluid.optimizer.Adam( learning_rate = fluid.layers.noam_decay( 100, 10000) ,
+                                     parameter_list = emb.parameters() )
        state_dict = adam.state_dict()
        fluid.save_dygraph( state_dict, "paddle_dy")

--- a/doc/fluid/api_cn/dygraph_cn/save_dygraph_cn.rst
+++ b/doc/fluid/api_cn/dygraph_cn/save_dygraph_cn.rst
@@ -29,15 +29,13 @@ save_dygraph
    import paddle.fluid as fluid
    with fluid.dygraph.guard():
-        emb = fluid.dygraph.Embedding(
+        emb = fluid.dygraph.Embedding([10, 10])
-            size=[10, 32],
-            param_attr='emb.w',
-            is_sparse=False)
        state_dict = emb.state_dict()
-        fluid.save_dygraph(state_dict, "paddle_dy")  # saved as paddle_dy.pdparams
+        fluid.save_dygraph( state_dict, "paddle_dy") # saved as paddle_dy.pdparams
+        adam = fluid.optimizer.Adam( learning_rate = fluid.layers.noam_decay( 100, 10000),
+                                     parameter_list = emb.parameters() )
-        adam = fluid.optimizer.Adam(
-            learning_rate=fluid.layers.noam_decay(100, 10000),
-            parameter_list = emb.parameters())
        state_dict = adam.state_dict()
-        fluid.save_dygraph(state_dict, "paddle_dy")  # saved as paddle_dy.pdopt
+        fluid.save_dygraph( state_dict, "paddle_dy") # saved as paddle_dy.pdopt
\ No newline at end of file
--- a/doc/fluid/api_cn/fluid_cn/default_main_program_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/default_main_program_cn.rst
@@ -28,8 +28,8 @@ default_main_program
        import paddle.fluid as fluid
        # Example network:
-        data = fluid.layers.data(name='image', shape=[3, 224, 224], dtype='float32')
+        data = fluid.data(name='image', shape=[None, 3, 224, 224], dtype='float32')
-        label = fluid.layers.data(name='label', shape=[1], dtype='int64')
+        label = fluid.data(name='label', shape=[None, 1], dtype='int64')
        conv1 = fluid.layers.conv2d(data, 4, 5, 1, act=None)
        bn1 = fluid.layers.batch_norm(conv1, act='relu')

--- a/doc/fluid/api_cn/fluid_cn/in_dygraph_mode_cn.rst
+++ b/doc/fluid/api_cn/fluid_cn/in_dygraph_mode_cn.rst
@@ -16,11 +16,11 @@ in_dygraph_mode
 .. code-block:: python
-    from __future__ import print_function
    import paddle.fluid as fluid
-    if fluid.in_dygraph_mode():
-        print('running in dygraph mode')
+    fluid.enable_dygraph()          # now in dygraph mode
-    else:
+    print(fluid.in_dygraph_mode())  # True
-        print('not running in dygraph mode')
+    fluid.disable_dygraph()
+    print(fluid.in_dygraph_mode())  # False
--- a/doc/fluid/api_cn/framework_cn/manual_seed_cn.rst
+++ b/doc/fluid/api_cn/framework_cn/manual_seed_cn.rst
-manual
+.. _cn_api_paddle_framework_manual_seed:
+manual_seed
 -------------------------------
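-**Version upgrade in progress; documentation under development**
+.. py:function:: paddle.framework.manual_seed(seed)
+Sets and fixes the random seed. After manual_seed is called, the random_seed attribute of user-defined Programs is set to the same seed.
+Parameters:
+     - **seed** (int32|int64) - The seed used to generate random numbers.
+Returns: None
+**Code example**: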
+.. code-block:: python
+    import paddle
+    from paddle.framework import manual_seed
+    default_seed = paddle.fluid.default_startup_program().random_seed  # default_seed is 0
+    manual_seed(102)
+    prog = paddle.fluid.Program()
+    prog_seed = prog.random_seed  # prog_seed is 102
+    update_seed = paddle.fluid.default_startup_program().random_seed  # update_seed is 102
--- a/doc/fluid/api_cn/layers_cn/GRUCell_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/GRUCell_cn.rst
@@ -35,7 +35,7 @@ GRUCell
 ..  code-block:: python 
    import paddle.fluid.layers as layers
-    cell = layers.rnn.GRUCell(hidden_size=256)
+    cell = layers.GRUCell(hidden_size=256)
 .. py:method:: call(inputs, states)

--- a/doc/fluid/api_cn/layers_cn/lstm_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/lstm_cn.rst
@@ -10,7 +10,7 @@ lstm
 .. note::
    This OP only runs on GPU devices.
-This OP implements the LSTM (Long Short-Term Memory) operation - `Hochreiter, S., & Schmidhuber, J. (1997) <http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf>`_.
+This OP implements the LSTM (Long Short-Term Memory) operation - `Hochreiter, S., & Schmidhuber, J. (1997) <https://www.bioinf.jku.at/publications/older/2604.pdf>`_.
 The implementation of this OP does not include diagonal/peephole connections; see `Gers, F. A., & Schmidhuber, J. (2000) <ftp://ftp.idsia.ch/pub/juergen/TimeCount-IJCNN2000.pdf>`_.
 To use peephole connections, use :ref:`cn_api_fluid_layers_dynamic_lstm` instead.

--- a/doc/fluid/api_cn/layers_cn/lstm_unit_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/lstm_unit_cn.rst
@@ -48,26 +48,18 @@ Long-Short Term Memory（LSTM）循环神经网络计算单元。该OP用于完
 **Code example**:
 .. code-block:: python
    import paddle.fluid as fluid
    dict_dim, emb_dim, hidden_dim = 128, 64, 512
-    data = fluid.layers.data(name='step_data', shape=[1], dtype='int32')
+    data = fluid.data(name='step_data', shape=[None], dtype='int64')
-    x = fluid.layers.embedding(input=data, size=[dict_dim, emb_dim])
+    x = fluid.embedding(input=data, size=[dict_dim, emb_dim])
-    pre_hidden = fluid.layers.data(name='pre_hidden', shape=[hidden_dim], dtype='float32')
+    pre_hidden = fluid.data(
-    pre_cell = fluid.layers.data(name='pre_cell', shape=[hidden_dim], dtype='float32')
+        name='pre_hidden', shape=[None, hidden_dim], dtype='float32')
+    pre_cell = fluid.data(
+        name='pre_cell', shape=[None, hidden_dim], dtype='float32')
    hidden = fluid.layers.lstm_unit(
        x_t=x,
        hidden_t_prev=pre_hidden,
        cell_t_prev=pre_cell)
--- a/doc/fluid/api_cn/nn_cn.rst
+++ b/doc/fluid/api_cn/nn_cn.rst
@@ -13,7 +13,7 @@ paddle.nn
    nn_cn/diag_embed_cn.rst
    nn_cn/interpolate_cn.rst
    nn_cn/Linear_cn.rst
-    nn_cn/log_softmax_cn.rst
+    nn_cn/LogSoftmax_cn.rst
    nn_cn/ReLU_cn.rst
    nn_cn/Upsample_cn.rst
    nn_cn/activation_cn.rst

--- a/doc/fluid/api_cn/nn_cn/Linear_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/Linear_cn.rst
+.. _cn_api_fluid_dygraph_Linear:
 Linear
 -------------------------------
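-**Version upgrade in progress; documentation under development**
+.. py:class:: paddle.nn.Linear(input_dim, output_dim, param_attr=None, bias_attr=None, act=None, dtype='float32')
+**Linear transformation layer:**
+.. math::
+        \\Out = Act({XW + b})\\
+where :math:`X` is the input Tensor, and :math:`W` and :math:`b` are the weight and bias, respectively.
+The Linear layer accepts only one Tensor as input.
+The Linear layer multiplies the input Tensor by the weight matrix :math:`W` and produces an output tensor of shape :math:`[N, *, output_dim]` ,
+where :math:`N` is the batch size and :math:`*` denotes any number of additional dimensions.
+If bias_attr is not False, a bias variable is created and added to the output.
+Finally, if act is not None, the corresponding activation function is applied to the output.
+Parameters:
+  - **input_dim** (int) – The number of input units of the linear layer.
+  - **output_dim** (int) – The number of output units of the linear layer.
+  - **param_attr** (ParamAttr, optional) – The object specifying the weight parameter attributes. Default: None, meaning the default weight parameter attributes are used. See :ref:`cn_api_fluid_ParamAttr` for usage.
+  - **bias_attr** (ParamAttr, optional) – The object specifying the bias parameter attributes. If `bias_attr` is a bool, False means no bias is added to this layer and True means the default bias parameter attributes are used. Default: None, meaning the default bias parameter attributes are used, which initialize the bias to 0. See :ref:`cn_api_fluid_ParamAttr` for usage.
+  - **act** (str, optional) – The activation function applied to the output, such as tanh, softmax, sigmoid, or relu; see :ref:`api_guide_activations` for the supported list. Default: None.
+  - **dtype** (str, optional) – The data type of the weight; either float32 or float64. Default: float32.
+Returns: None
+**Code example**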
+..  code-block:: python
+    from paddle.fluid.dygraph.base import to_variable
+    import paddle
+    import paddle.fluid as fluid
+    import numpy as np
+    data = np.random.uniform( -1, 1, [30, 10, 32] ).astype('float32')
+    with fluid.dygraph.guard():
+        linear = paddle.nn.Linear(32, 64)
+        data = to_variable(data)
+        res = linear(data)  # [30, 10, 64]
+Attributes
+::::::::::::
+.. py:attribute:: weight
+The learnable weight of this layer, of type ``Parameter``
+.. py:attribute:: bias
+The learnable bias of this layer, of type ``Parameter``
--- a/doc/fluid/api_cn/nn_cn/LogSoftmax_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/LogSoftmax_cn.rst
+.. _cn_api_nn_LogSoftmax:
+LogSoftmax
+-------------------------------
+.. py:class:: paddle.nn.LogSoftmax(axis=None)
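+**LogSoftmax activation layer:**
+.. math::
+        \\output_i = \log\left(\frac{e^{input_i}}{\sum_{j} e^{input_j}}\right)\\
+Parameters:
+    - **axis** (int, optional) - The dimension index along which LogSoftmax is computed; it should be in the range :math:`[-1, rank-1]` , where rank is the rank of the input variable. Default: None (same as -1, i.e. LogSoftmax is applied along the last dimension).
+Returns: None
+**Code example**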
+..  code-block:: python
+    import paddle.fluid as fluid
+    import paddle.nn as nn
+    import numpy as np
+    data = np.array([[[-2.0, 3.0, -4.0, 5.0],
+                      [3.0, -4.0, 5.0, -6.0],
+                      [-7.0, -8.0, 8.0, 9.0]],
+                     [[1.0, -2.0, -3.0, 4.0],
+                      [-5.0, 6.0, 7.0, -8.0],
+                      [6.0, 7.0, 8.0, 9.0]]]).astype('float32')
+    my_log_softmax = nn.LogSoftmax()
+    with fluid.dygraph.guard():
+        data = fluid.dygraph.to_variable(data)
+        res = my_log_softmax(data)
+        # [[[ -7.1278396   -2.1278396   -9.127839    -0.12783948]
+        #   [ -2.1270514   -9.127051    -0.12705144 -11.127051  ]
+        #   [-16.313261   -17.313261    -1.3132617   -0.31326184]]
+        #  [[ -3.0518122   -6.051812    -7.051812    -0.051812  ]
+        #   [-12.313267    -1.3132664   -0.3132665  -15.313267  ]
+        #   [ -3.4401896   -2.4401896   -1.4401896   -0.44018966]]]
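The expected values in the LogSoftmax example can be reproduced without Paddle. The sketch below is a plain-Python illustration of log-softmax along the last axis; the helper name `log_softmax` is hypothetical, and subtracting the row maximum before exponentiating is a standard numerical-stability step, not something taken from the Paddle source.

```python
import math


def log_softmax(row):
    """Return log(softmax(row)) for one 1-D list of floats."""
    row_max = max(row)
    # log(sum(exp(v))) computed stably by factoring out exp(row_max).
    log_sum = row_max + math.log(sum(math.exp(v - row_max) for v in row))
    return [v - log_sum for v in row]


# First row of the documented input tensor.
row = [-2.0, 3.0, -4.0, 5.0]
res = log_softmax(row)
print(res)
```

Exponentiating the result gives probabilities that sum to 1, which is a quick sanity check that a formula really is log-softmax and not, for example, a sigmoid.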
--- a/doc/fluid/api_cn/nn_cn/ReLU_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/ReLU_cn.rst
+.. _cn_api_nn_ReLU:
 ReLU
 -------------------------------
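-**Version upgrade in progress; documentation under development**
+.. py:class:: paddle.nn.ReLU(inplace=False)
+**ReLU (Rectified Linear Unit) activation layer:**
+.. math::
+        \\Out = max(X, 0)\\
+where :math:`X` is the input Tensor.
+Parameters:
+    - **inplace** (bool, optional) - If ``inplace`` is ``True``, the input and output of ``ReLU`` are the same variable; otherwise they are different variables. Default: ``False``. Note that if the input of ``ReLU`` is also the input of another OP, ``inplace`` must be False.
+Returns: None
+**Code example**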
+..  code-block:: python
+    import paddle.fluid as fluid
+    import paddle.nn as nn
+    import numpy as np
+    data = np.array([-2, 0, 1]).astype('float32')
+    my_relu = nn.ReLU()
+    with fluid.dygraph.guard():
+        data = fluid.dygraph.to_variable(data)
+        res = my_relu(data)  # [0, 0, 1]
--- a/doc/fluid/api_cn/nn_cn/diag_embed_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/diag_embed_cn.rst
-diag
+.. _cn_api_functional_diag_embed:
+diag_embed
 -------------------------------
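-**Version upgrade in progress; documentation under development**
+.. py:function:: paddle.nn.functional.diag_embed(input, offset=0, dim1=-2, dim2=-1)
+    This OP creates a Tensor whose diagonals on the specified 2-D planes (given by ``dim1`` and ``dim2``) are filled with ``input``.
+    By default, the 2-D planes are formed by the last two dimensions of the returned Tensor.
+    The ``offset`` argument determines which diagonal of the 2-D plane is filled:
+    - If offset = 0, the main diagonal is filled.
+    - If offset > 0, a diagonal above (to the upper right of) the main diagonal is filled.
+    - If offset < 0, a diagonal below (to the lower left of) the main diagonal is filled.
+Parameters:
+    - **input** (Variable|numpy.ndarray) - The input variable, at least a 1-D array; supported data types are float32, float64, int32, and int64.
+    - **offset** (int, optional) - Which diagonal of the specified 2-D plane to fill. Default: 0, i.e. the main diagonal.
+    - **dim1** (int, optional) - The first dimension of the 2-D plane in which the diagonal is filled. Default: -2.
+    - **dim2** (int, optional) - The second dimension of the 2-D plane in which the diagonal is filled. Default: -1.
+Returns: A Tensor whose specified 2-D planes have the diagonal filled. Its data type matches that of the input.
+Return type: Variable
+**Code example**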
+..  code-block:: python
+    import paddle.nn.functional as F
+    import paddle.fluid.dygraph as dg
+    import numpy as np
+    diag_embed = np.random.randn(2, 3).astype('float32')
+    # [[ 0.7545889 , -0.25074545,  0.5929117 ],
+    #  [-0.6097662 , -0.01753256,  0.619769  ]]
+    with dg.guard():
+        data1 = F.diag_embed(diag_embed)
+        data1.numpy()
+        # [[[ 0.7545889 ,  0.        ,  0.        ],
+        #  [ 0.        , -0.25074545,  0.        ],
+        #   [ 0.        ,  0.        ,  0.5929117 ]],
+        # [[-0.6097662 ,  0.        ,  0.        ],
+        #  [ 0.        , -0.01753256,  0.        ],
+        #  [ 0.        ,  0.        ,  0.619769  ]]]
+        data2 = F.diag_embed(diag_embed, offset=-1, dim1=0, dim2=2)
+        data2.numpy()
+        # [[[ 0.        ,  0.        ,  0.        ,  0.        ],
+        #   [ 0.7545889 ,  0.        ,  0.        ,  0.        ],
+        #   [ 0.        , -0.25074545,  0.        ,  0.        ],
+        #   [ 0.        ,  0.        ,  0.5929117 ,  0.        ]],
+        #
+        #  [[ 0.        ,  0.        ,  0.        ,  0.        ],
+        #   [-0.6097662 ,  0.        ,  0.        ,  0.        ],
+        #   [ 0.        , -0.01753256,  0.        ,  0.        ],
+        #   [ 0.        ,  0.        ,  0.619769  ,  0.        ]]]
+        data3 = F.diag_embed(diag_embed, offset=1, dim1=0, dim2=2)
+        data3.numpy()
+        # [[[ 0.        ,  0.7545889 ,  0.        ,  0.        ],
+        #   [ 0.        , -0.6097662 ,  0.        ,  0.        ]],
+        #
+        #  [[ 0.        ,  0.        , -0.25074545,  0.        ],
+        #   [ 0.        ,  0.        , -0.01753256,  0.        ]],
+        #
+        #  [[ 0.        ,  0.        ,  0.        ,  0.5929117 ],
+        #   [ 0.        ,  0.        ,  0.        ,  0.619769  ]],
+        #
+        #  [[ 0.        ,  0.        ,  0.        ,  0.        ],
+        #   [ 0.        ,  0.        ,  0.        ,  0.        ]]]
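The effect of ``offset`` described above can be illustrated with a small pure-Python sketch. It covers only the default plane (``dim1=-2``, ``dim2=-1``) for a batch of vectors; the function below is a hypothetical stand-in written for this note, not the Paddle implementation.

```python
def diag_embed(batch, offset=0):
    """Place each vector in `batch` on a diagonal of its own square matrix.

    offset = 0 fills the main diagonal, offset > 0 a diagonal above it,
    and offset < 0 a diagonal below it. Each matrix has side
    len(vector) + abs(offset).
    """
    out = []
    for vec in batch:
        size = len(vec) + abs(offset)
        mat = [[0.0] * size for _ in range(size)]
        for i, value in enumerate(vec):
            row = i - min(offset, 0)  # shifted down when offset < 0
            col = i + max(offset, 0)  # shifted right when offset > 0
            mat[row][col] = value
        out.append(mat)
    return out


main = diag_embed([[1.0, 2.0]])
upper = diag_embed([[1.0, 2.0]], offset=1)
lower = diag_embed([[1.0, 2.0]], offset=-1)
print(main, upper, lower)
```

A nonzero offset grows the output matrix by ``abs(offset)`` in both dimensions, which matches the shapes seen in the documented ``data2`` and ``data3`` examples.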
--- a/doc/fluid/api_cn/nn_cn/interpolate_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/interpolate_cn.rst
-interpolate
+.. _cn_api_paddle_nn_functioanl_interpolate:
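+interpolate
 -------------------------------
-**Version upgrade in progress; documentation under development**
+.. py:function:: paddle.nn.functional.interpolate(input, out_shape=None, scale=None, name=None, resample='BILINEAR', actual_shape=None, align_corners=True, align_mode=1, data_format='NCHW')
+**Note:** The parameter ``actual_shape`` will be deprecated; use ``out_shape`` instead.
+This OP resizes a batch of images.
+A 4-D input Tensor has shape (num_batches, channels, in_h, in_w) or (num_batches, in_h, in_w, channels); a 5-D input Tensor has shape (num_batches, channels, in_d, in_h, in_w) or (num_batches, in_d, in_h, in_w, channels). Resizing applies only to the depth, height, and width dimensions.
+Supported interpolation methods:
+    NEAREST: nearest neighbor interpolation
+    BILINEAR: bilinear interpolation
+    TRILINEAR: trilinear interpolation
+    BICUBIC: bicubic interpolation
+Nearest neighbor interpolation performs nearest neighbor interpolation along the height and width of the input tensor.
+Bilinear interpolation extends linear interpolation to functions of two variables (here, the H and W directions) on a rectilinear 2-D grid. The key idea is to interpolate linearly in one direction first and then again in the other direction.
+Trilinear interpolation extends linear interpolation to three variables (here, the D, H, and W directions), interpolating linearly in all three directions.
+Bicubic interpolation extends cubic interpolation to data points on a 2-D grid; it produces smoother image edges than bilinear or nearest neighbor interpolation.
+align_corners and align_mode are optional parameters that select how the interpolation is computed.
+Example: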
+::
+      scale computation:
+        if align_corners = True && out_size > 1 :
+          scale_factor = (in_size-1.0)/(out_size-1.0)
+        else:
+          scale_factor = float(in_size/out_size)
+      Output dimension rules for each interpolation method:
+      Nearest neighbor interpolation:
+      if:
+          align_corners = False
+          input : (N,C,H_in,W_in)
+          output: (N,C,H_out,W_out) where:
+          H_out = floor(H_{in} * scale_{factor})
+          W_out = floor(W_{in} * scale_{factor})
+      else:
+          align_corners = True
+          input : (N,C,H_in,W_in)
+          output: (N,C,H_out,W_out) where:
+          H_out = round(H_{in} * scale_{factor})
+          W_out = round(W_{in} * scale_{factor})
+      Bilinear interpolation:
+      if:
+          align_corners = False , align_mode = 0
+          input : (N,C,H_in,W_in)
+          output: (N,C,H_out,W_out) where:
+          H_out = (H_{in}+0.5) * scale_{factor} - 0.5
+          W_out = (W_{in}+0.5) * scale_{factor} - 0.5
+      else:
+          input : (N,C,H_in,W_in)
+          output: (N,C,H_out,W_out) where:
+          H_out = H_{in} * scale_{factor}
+          W_out = W_{in} * scale_{factor}
+      Bicubic interpolation:
+      if:
+          align_corners = False
+          input : (N,C,H_in,W_in)
+          output: (N,C,H_out,W_out) where:
+          H_out = (H_{in}+0.5) * scale_{factor} - 0.5
+          W_out = (W_{in}+0.5) * scale_{factor} - 0.5
+      else:
+          input : (N,C,H_in,W_in)
+          output: (N,C,H_out,W_out) where:
+          H_out = H_{in} * scale_{factor}
+          W_out = W_{in} * scale_{factor}
+      Trilinear interpolation:
+      if:
+          align_corners = False , align_mode = 0
+          input : (N,C,D_in,H_in,W_in)
+          output: (N,C,D_out,H_out,W_out) where:
+          D_out = (D_{in}+0.5) * scale_{factor} - 0.5
+          H_out = (H_{in}+0.5) * scale_{factor} - 0.5
+          W_out = (W_{in}+0.5) * scale_{factor} - 0.5
+      else:
+          input : (N,C,D_in,H_in,W_in)
+          output: (N,C,D_out,H_out,W_out) where:
+          D_out = D_{in} * scale_{factor}
+          H_out = H_{in} * scale_{factor}
+          W_out = W_{in} * scale_{factor}
+For details on nearest neighbor interpolation, see Wikipedia:
+https://en.wikipedia.org/wiki/Nearest-neighbor_interpolation
+For details on bilinear interpolation, see Wikipedia:
+https://en.wikipedia.org/wiki/Bilinear_interpolation
+For details on trilinear interpolation, see Wikipedia:
+https://en.wikipedia.org/wiki/Trilinear_interpolation
+For details on bicubic interpolation, see Wikipedia:
+https://en.wikipedia.org/wiki/Bicubic_interpolation
+Parameters:
+    - **input** (Variable) - A 4-D or 5-D Tensor with data type float32, float64, or uint8; its data format is given by ``data_format``.
+    - **out_shape** (list|tuple|Variable|None) - The output shape: (out_h, out_w) when the input is a 4-D Tensor, or (out_d, out_h, out_w) when the input is a 5-D Tensor. If :code:`out_shape` is a list, each element can be an integer or a variable of shape [1]. If :code:`out_shape` is a variable, it must be 1-D. Default: None.
+    - **scale** (float|Variable|None) - The multiplier for the input height or width. At least one of out_shape and scale must be set; out_shape takes priority over scale. Default: None.
+    - **name** (str|None) - Used by developers to print debugging information; see :ref:`api_guide_Name` for usage. Default: None.
+    - **resample** (str) - The interpolation method. Supported values are "BILINEAR", "TRILINEAR", "NEAREST", and "BICUBIC". Default: "BILINEAR".
+    - **actual_shape** (Variable) - Optional input for specifying the output shape dynamically. If actual_shape is given, the image is resized to that shape rather than according to :code:`out_shape` and :code:`scale` ; that is, :code:`actual_shape` has the highest priority. To specify the output shape dynamically, :code:`out_shape` is recommended because :code:`actual_shape` will be deprecated. When actual_shape is used, one of out_shape and scale must still be set; otherwise an error occurs during graph construction. Default: None
+    - **align_corners** (bool) - An optional bool. If True, the centers of the 4 corner pixels of the input and output tensors are aligned, preserving the values at the corner pixels. Default: True
+    - **align_mode** (int) - An option for bilinear interpolation. '0' means src_idx = scale * (dst_index + 0.5) - 0.5; '1' means src_idx = scale * dst_index.
+    - **data_format** (str, optional) - The data format of the input; the output uses the same format. For a 4-D Tensor, NCHW (num_batches, channels, height, width) or NHWC (num_batches, height, width, channels) is supported; for a 5-D Tensor, NCDHW (num_batches, channels, depth, height, width) or NDHWC (num_batches, depth, height, width, channels) is supported. Default: 'NCHW'.
+Returns: A 4-D Tensor of shape (num_batches, channels, out_h, out_w) or (num_batches, out_h, out_w, channels); or a 5-D Tensor of shape (num_batches, channels, out_d, out_h, out_w) or (num_batches, out_d, out_h, out_w, channels).
+Return type: Variable
+Raises:
+    - :code:`TypeError` - out_shape should be a list, tuple, or Variable.
+    - :code:`TypeError` - actual_shape should be a Variable or None.
+    - :code:`ValueError` - The "resample" of image_resize must be "BILINEAR", "TRILINEAR", "NEAREST", or "BICUBIC".
+    - :code:`ValueError` - out_shape and scale cannot both be None.
+    - :code:`ValueError` - The length of out_shape must be 2 if the input is a 4-D tensor.
+    - :code:`ValueError` - The length of out_shape must be 3 if the input is a 5-D tensor.
+    - :code:`ValueError` - scale should be greater than 0.
+    - :code:`TypeError`  - align_corners should be a bool.
+    - :code:`ValueError` - align_mode can only be '0' or '1'.
+    - :code:`ValueError` - data_format can only be 'NCHW', 'NHWC', 'NCDHW', or 'NDHWC'.
+**Code example**
+..  code-block:: python
+    import paddle
+    import paddle.fluid as fluid
+    import numpy as np
+    input = fluid.data(name="input", shape=[None,3,6,10])
+    output = paddle.nn.functional.interpolate(input=input,out_shape=[12,12])
+    place = fluid.CPUPlace()
+    exe = fluid.Executor(place)
+    exe.run(fluid.default_startup_program())
+    input_data = np.random.rand(2,3,6,10).astype("float32")
+    output_data = exe.run(fluid.default_main_program(),
+            feed={"input":input_data},
+            fetch_list=[output],
+            return_numpy=True)
+    print(output_data[0].shape)
+    # (2, 3, 12, 12)
+    #imperative mode
+    import paddle.fluid.dygraph as dg
+    with dg.guard(place) as g:
+        input = dg.to_variable(input_data)
+        output = paddle.nn.functional.interpolate(input=input, out_shape=[12,12])
+        print(output.shape)
+    # [2L, 3L, 12L, 12L]
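The interaction between ``align_corners`` and ``align_mode`` documented above comes down to how a destination index maps back to a source coordinate. The sketch below restates the two documented formulas for one spatial axis in plain Python; the function names `scale_factor` and `source_index` are hypothetical and chosen for this note only.

```python
def scale_factor(in_size, out_size, align_corners):
    """Ratio between input and output grids along one axis."""
    if align_corners and out_size > 1:
        return (in_size - 1.0) / (out_size - 1.0)
    return float(in_size) / out_size


def source_index(dst_index, in_size, out_size, align_corners=True, align_mode=1):
    """Map an output index to a (fractional) input coordinate.

    align_mode 0 uses pixel centers: scale * (dst + 0.5) - 0.5, and only
    takes effect when align_corners is False.
    align_mode 1 uses scale * dst.
    """
    scale = scale_factor(in_size, out_size, align_corners)
    if not align_corners and align_mode == 0:
        return scale * (dst_index + 0.5) - 0.5
    return scale * dst_index


# Resizing an axis of length 6 to length 12, as in the example above.
last_aligned = source_index(11, 6, 12, align_corners=True)
first_centered = source_index(0, 6, 12, align_corners=False, align_mode=0)
mid_plain = source_index(3, 6, 12, align_corners=False, align_mode=1)
print(last_aligned, first_centered, mid_plain)
```

With ``align_corners=True`` the last output index lands on the last input index, so corner values are preserved; with ``align_mode=0`` the first output sample falls slightly outside the input grid and an implementation has to clamp it.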
--- a/doc/fluid/api_cn/nn_cn/log_softmax_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/log_softmax_cn.rst
-log
-------------------------------
-**Version upgrade in progress; documentation under development**
--- a/doc/fluid/api_cn/nn_cn/loss_cn/BCELoss_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/loss_cn/BCELoss_cn.rst
+.. _cn_api_paddle_nn_BCELoss:
 BCELoss
 -------------------------------
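-**Version upgrade in progress; documentation under development**
+.. py:function:: paddle.nn.BCELoss(weight=None, reduction='mean')
+This interface creates a callable BCELoss class that computes the binary cross entropy loss between the input and the label. The binary cross entropy loss is defined as follows:
+When `weight` is given, the formula is:
+.. math::
+  Out = -1 * weight * (label * log(input) + (1 - label) * log(1 - input))
+When `weight` is None, the formula is:
+.. math::
+  Out = -1 * (label * log(input) + (1 - label) * log(1 - input))
+When `reduction` is `none`, the final output is:
+.. math::
+  Out = Out
+When `reduction` is `mean`, the final output is:
+.. math::
+  Out = MEAN(Out)
+When `reduction` is `sum`, the final output is:
+.. math::
+  Out = SUM(Out)
+**Note:** The input is usually the output of `fluid.layers.sigmoid` . Because this is binary classification, each label value should be 0 or 1.
+The input and the label have shape [N, *], where N is batch_size and `*` denotes any additional dimensions.
+If :attr:`reduction` is ``'none'``, the output has shape [N, *], the same as the input.
+If :attr:`reduction` is ``'mean'`` or ``'sum'``, the output has shape [1].
+Parameters:
+  - **weight** (Variable, optional) - A manually specified weight for the binary cross entropy of each batch; if given, its shape must match that of one batch of data. Data type: float32 or float64. Default: None.
+  - **reduction** (str, optional) - The reduction applied to the output; one of ``'none'``, ``'mean'``, ``'sum'`` . Default: ``'mean'``, which returns the mean of `BCELoss`; ``'sum'`` returns the sum of `BCELoss`; ``'none'`` returns the unreduced BCELoss.
+Returns: A callable object that computes BCELoss.
+**Code example**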
+.. code-block:: python
+    # declarative mode
+    import paddle.fluid as fluid
+    import numpy as np
+    import paddle
+    input = fluid.data(name="input", shape=[3, 1], dtype='float32')
+    label = fluid.data(name="label", shape=[3, 1], dtype='float32')
+    bce_loss = paddle.nn.loss.BCELoss()
+    output = bce_loss(input, label)
+    place = fluid.CPUPlace()
+    exe = fluid.Executor(place)
+    exe.run(fluid.default_startup_program())
+    input_data = np.array([0.5, 0.6, 0.7]).astype("float32")
+    label_data = np.array([1.0, 0.0, 1.0]).astype("float32")
+    output_data = exe.run(fluid.default_main_program(),
+            feed={"input":input_data, "label":label_data},
+            fetch_list=[output],
+            return_numpy=True)
+    print(output_data)  # [array([0.65537095], dtype=float32)]
+    # imperative mode
+    import paddle.fluid.dygraph as dg
+    with dg.guard(place) as g:
+        input = dg.to_variable(input_data)
+        label = dg.to_variable(label_data)
+        output = bce_loss(input, label)
+        print(output.numpy())  # [0.65537095]
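The expected value 0.65537095 in the BCELoss example follows directly from the documented formula. The following pure-Python sketch reproduces it; the function `bce_loss` is a hypothetical helper written for this note and takes flat lists rather than Tensors.

```python
import math


def bce_loss(inputs, labels, weight=None, reduction="mean"):
    """Binary cross entropy over flat lists of probabilities and 0/1 labels."""
    losses = []
    for i, (p, y) in enumerate(zip(inputs, labels)):
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        if weight is not None:
            loss *= weight[i]
        losses.append(loss)
    if reduction == "mean":
        return sum(losses) / len(losses)
    if reduction == "sum":
        return sum(losses)
    if reduction == "none":
        return losses
    raise ValueError("reduction must be 'none', 'mean', or 'sum'")


# Same data as the documented example.
input_data = [0.5, 0.6, 0.7]
label_data = [1.0, 0.0, 1.0]
mean_loss = bce_loss(input_data, label_data)
sum_loss = bce_loss(input_data, label_data, reduction="sum")
print(mean_loss, sum_loss)
```

The inputs must lie strictly between 0 and 1 because the formula takes ``log(input)`` and ``log(1 - input)``, which is why the documentation recommends feeding the output of a sigmoid.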
--- a/doc/fluid/api_cn/nn_cn/loss_cn/L1Loss_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/loss_cn/L1Loss_cn.rst
 L1Loss
 -------------------------------
-**Version upgrade in progress; documentation under development**
+.. py:function:: paddle.nn.loss.L1Loss(reduction='mean')
+This interface creates a callable L1Loss class. L1Loss computes the `L1 loss` between the input and the label.
+The loss is computed as follows:
+When `reduction` is set to ``'none'``,
+    .. math::
+        Out = |input - label|
+When `reduction` is set to ``'mean'``,
+    .. math::
+       Out = MEAN(|input - label|)
+When `reduction` is set to ``'sum'``,
+    .. math::
+       Out = SUM(|input - label|)
+The input and the label have shape [N, *], where N is batch_size and `*` denotes any additional dimensions.
+If :attr:`reduction` is ``'none'``, the output loss has shape [N, *], the same as the input.
+If :attr:`reduction` is ``'mean'`` or ``'sum'``, the output loss has shape [1].
+Parameters:
+    - **reduction** (string, optional) - The reduction applied to the output; one of ``'none'``, ``'mean'``, ``'sum'`` . Default: ``'mean'``, which returns the mean of `L1Loss`; ``'sum'`` returns the sum of `L1Loss`; ``'none'`` returns the unreduced L1Loss.
+Returns: A callable object that computes L1Loss.
+**Code example**
+.. code-block:: python
+        # declarative mode
+        import paddle.fluid as fluid
+        import numpy as np
+        import paddle
+        input = fluid.data(name="input", shape=[1])
+        label = fluid.data(name="label", shape=[1])
+        l1_loss = paddle.nn.loss.L1Loss(reduction='mean')
+        output = l1_loss(input,label)
+        place = fluid.CPUPlace()
+        exe = fluid.Executor(place)
+        exe.run(fluid.default_startup_program())
+        input_data = np.array([1.5]).astype("float32")
+        label_data = np.array([1.7]).astype("float32")
+        output_data = exe.run(fluid.default_main_program(),
+                feed={"input":input_data, "label":label_data},
+                fetch_list=[output],
+                return_numpy=True)
+        print(output_data)  # [array([0.2], dtype=float32)]
+        # imperative mode
+        import paddle.fluid.dygraph as dg
+        with dg.guard(place) as g:
+            input = dg.to_variable(input_data)
+            label = dg.to_variable(label_data)
+            l1_loss = paddle.nn.loss.L1Loss(reduction='mean')
+            output = l1_loss(input,label)
+            print(output.numpy())  # [0.2]
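The three reduction modes described above can be cross-checked without Paddle. The following is a minimal NumPy sketch, not the Paddle implementation; the helper name `l1_loss` and the sample values are made up for illustration.

```python
import numpy as np

def l1_loss(input, label, reduction="mean"):
    """NumPy reference for the three reduction modes (illustrative helper)."""
    diff = np.abs(input - label)           # element-wise |input - label|
    if reduction == "none":
        return diff                        # same shape as input
    if reduction == "mean":
        return np.array([diff.mean()])     # shape [1]
    if reduction == "sum":
        return np.array([diff.sum()])      # shape [1]
    raise ValueError("reduction must be 'none', 'mean' or 'sum'")

x = np.array([[1.5, 2.0], [0.0, -1.0]], dtype="float32")
y = np.array([[1.0, 2.0], [1.0, 1.0]], dtype="float32")
print(l1_loss(x, y, "none"))   # element-wise differences, shape (2, 2)
print(l1_loss(x, y, "mean"))   # [0.875]
print(l1_loss(x, y, "sum"))    # [3.5]
```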
--- a/doc/fluid/api_cn/nn_cn/loss_cn/NLLLoss_cn.rst
+++ b/doc/fluid/api_cn/nn_cn/loss_cn/NLLLoss_cn.rst
 NLLLoss
 -------------------------------
-**Version upgrade in progress; documentation under development**
+.. py:function:: paddle.nn.loss.NLLLoss(weight=None, reduction='mean', ignore_index=-100)
+This OP computes the `negative log likelihood loss` between the input ``input`` and the label ``label``, and can be used to train a classifier with `n` classes.
+If the `weight` argument is provided, it must be a `1-D` tensor holding the weight of each class. This argument is
+especially useful when the training set is imbalanced.
+The loss is computed as follows:
+When `reduction` is set to `none`, the loss is:
+    .. math::
+        \ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad
+        l_n = - w_{y_n} x_{n,y_n}, \quad
+        w_{c} = \text{weight}[c] \cdot \mathbb{1}\{c \not= \text{ignore\_index}\},
+where `N` is the `batch_size`. If `reduction` is not `none` (the default is `mean`), the loss
+is:
+    .. math::
+        \ell(x, y) = \begin{cases}
+            \sum_{n=1}^N \frac{1}{\sum_{m=1}^N w_{y_m}} l_n, &
+            \text{if reduction} = \text{'mean';}\\
+            \sum_{n=1}^N l_n,  &
+            \text{if reduction} = \text{'sum'.}
+        \end{cases}
+Parameters:
+    - **input** (Variable) - The input `Tensor` of shape :math:`[N, C]`, where `C` is the number of classes. In the multi-dimensional case its shape is :math:`[N, C, d_1, d_2, ..., d_K]`. The data type is float32 or float64.
+    - **label** (Variable) - The labels corresponding to ``input``, of shape :math:`[N,]` or :math:`[N, d_1, d_2, ..., d_K]`. The data type is int64.
+    - **weight** (Variable, optional) - Manually specified weight for each class. Defaults to `None`. If provided, its length must be `num_classes`. The data type is float32 or float64.
+    - **reduction** (string, optional) - How to reduce the output. Valid values are `none`, `mean` and `sum`. The default `mean` returns the mean loss over the `mini-batch`; `sum` returns the summed loss over the `mini-batch`; `none` returns the unreduced loss Tensor.
+    - **ignore_index** (int64, optional) - A label value to ignore; samples with this label do not contribute to the loss. Defaults to -100.
+Returns: the `negative log likelihood loss` value.
+Return type: Variable
+**Code example**
+..  code-block:: python
+            # declarative mode
+            import paddle.fluid as fluid
+            import numpy as np
+            import paddle
+            input_np = np.random.random(size=(10, 10)).astype(np.float32)
+            label_np = np.random.randint(0, 10, size=(10,)).astype(np.int64)
+            prog = fluid.Program()
+            startup_prog = fluid.Program()
+            place = fluid.CPUPlace()
+            with fluid.program_guard(prog, startup_prog):
+                input = fluid.data(name='input', shape=[10, 10], dtype='float32')
+                label = fluid.data(name='label', shape=[10], dtype='int64')
+                nll_loss = paddle.nn.loss.NLLLoss()
+                res = nll_loss(input, label)
+                exe = fluid.Executor(place)
+                static_result = exe.run(
+                    prog,
+                    feed={"input": input_np,
+                          "label": label_np},
+                    fetch_list=[res])
+            print(static_result)
+            # imperative mode
+            import paddle.fluid.dygraph as dg
+            with dg.guard(place) as g:
+                input = dg.to_variable(input_np)
+                label = dg.to_variable(label_np)
+                output = nll_loss(input, label)
+                print(output.numpy())
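To make the weighting and `ignore_index` terms in the formulas above concrete, here is a hedged NumPy sketch of the :math:`[N, C]` case only. The helper `nll_loss` is hypothetical and written for illustration; it is not the Paddle kernel, and it assumes `input` already holds log-probabilities.

```python
import numpy as np

def nll_loss(x, y, weight=None, reduction="mean", ignore_index=-100):
    """NumPy reference for the [N, C] case (illustrative helper)."""
    n, c = x.shape
    w = np.ones(c, dtype=x.dtype) if weight is None else np.asarray(weight, dtype=x.dtype)
    keep = y != ignore_index                 # indicator 1{y_n != ignore_index}
    safe_y = np.where(keep, y, 0)            # placeholder class for ignored rows
    w_n = w[safe_y] * keep                   # w_{y_n}, zero for ignored labels
    l_n = -w_n * x[np.arange(n), safe_y]     # l_n = -w_{y_n} * x_{n, y_n}
    if reduction == "none":
        return l_n
    if reduction == "sum":
        return l_n.sum()
    return l_n.sum() / w_n.sum()             # 'mean': weighted average

log_probs = np.log(np.array([[0.7, 0.2, 0.1],
                             [0.1, 0.8, 0.1]]))
labels = np.array([0, 1])
print(nll_loss(log_probs, labels, reduction="none"))  # per-sample losses
print(nll_loss(log_probs, labels))                    # their weighted mean
```

Note that under `mean` the divisor is the sum of the selected weights, not `N`, so ignored samples do not dilute the average.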
--- a/doc/fluid/api_cn/profiler_cn/cuda_profiler_cn.rst
+++ b/doc/fluid/api_cn/profiler_cn/cuda_profiler_cn.rst
@@ -30,7 +30,7 @@ CUDA性能分析器。该分析器通过调用CUDA运行时编程接口，对CUD
    epoc = 8
    dshape = [4, 3, 28, 28]
-    data = fluid.layers.data(name='data', shape=[3, 28, 28], dtype='float32')
+    data = fluid.data(name='data', shape=[None, 3, 28, 28], dtype='float32')
    conv = fluid.layers.conv2d(data, 20, 3, stride=[1, 1], padding=[1, 1])
    place = fluid.CUDAPlace(0)

--- a/doc/fluid/api_cn/tensor_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn.rst
@@ -36,6 +36,7 @@ paddle.tensor
    tensor_cn/index_select_cn.rst
    tensor_cn/inverse_cn.rst
    tensor_cn/isnan_cn.rst
+    tensor_cn/kron_cn.rst
    tensor_cn/linspace_cn.rst
    tensor_cn/log1p_cn.rst
    tensor_cn/logsumexp_cn.rst
@@ -67,6 +68,7 @@ paddle.tensor
    tensor_cn/tanh_cn.rst
    tensor_cn/t_cn.rst
    tensor_cn/tensordot_cn.rst
+    tensor_cn/trace_cn.rst
    tensor_cn/transpose_cn.rst
    tensor_cn/tril_cn.rst
    tensor_cn/triu_cn.rst
@@ -75,4 +77,4 @@ paddle.tensor
    tensor_cn/var_cn.rst
    tensor_cn/where_cn.rst
    tensor_cn/zeros_cn.rst
    tensor_cn/zeros_like_cn.rst
\ No newline at end of file
--- a/doc/fluid/api_cn/tensor_cn/addcmul_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/addcmul_cn.rst
+.. _cn_api_tensor_addcmul:
 addcmul
 -------------------------------
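-**Version upgrade in progress; documentation under development**
+.. py:function:: paddle.addcmul(input, tensor1, tensor2, value=1.0, out=None, name=None)
+Computes the element-wise product of tensor1 and tensor2, multiplies the result by the scalar value, and adds it to input. The shapes of input, tensor1 and tensor2 must be broadcastable.
+The computation is:
+..  math::
+    out = input + value * tensor1 * tensor2
+Parameters:
+    - **input** (Variable) - The input Tensor input. Supported data types: float32, float64, int32, int64.
+    - **tensor1** (Variable) - The input Tensor tensor1. Supported data types: float32, float64, int32, int64.
+    - **tensor2** (Variable) - The input Tensor tensor2. Supported data types: float32, float64, int32, int64.
+    - **value** (int|float) - The scalar that multiplies tensor1*tensor2. If input is float32 or float64, value must be a float; if input is int32 or int64, value must be an int.
+    - **out** (Variable, optional) - The Tensor in which to store the result. If None or not set, a new Tensor is created to hold the result. Defaults to None.
+    - **name** (str, optional) - See :ref:`api_guide_Name` for usage. Usually there is no need to set it. Defaults to None.
+Returns: the computed Tensor, with the same data type as input.
+Return type: Variable
+**Code example**: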
+.. code-block:: python
+    import paddle
+    import paddle.fluid as fluid
+    input = fluid.data(name='input', dtype='float32', shape=[3, 4])
+    tensor1 = fluid.data(name='tensor1', dtype='float32', shape=[1, 4])
+    tensor2 = fluid.data(name='tensor2', dtype='float32', shape=[3, 4])
+    data = paddle.addcmul(input, tensor1, tensor2, value=1.0)
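The formula and the broadcasting requirement can be illustrated with plain NumPy. This is only a reference sketch using the same shapes as the example above; the values are arbitrary and the variable names are chosen for illustration.

```python
import numpy as np

inp = np.zeros((3, 4), dtype="float32")
t1 = np.arange(4, dtype="float32").reshape(1, 4)   # shape [1, 4], broadcast over rows
t2 = np.full((3, 4), 2.0, dtype="float32")
value = 0.5

# out = input + value * tensor1 * tensor2, with NumPy broadcasting
out = inp + value * t1 * t2
print(out.shape)   # (3, 4)
print(out[0])      # [0. 1. 2. 3.]
```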
--- a/doc/fluid/api_cn/tensor_cn/addmm_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/addmm_cn.rst
+.. _cn_api_tensor_addmm:
 addmm
 -------------------------------
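-**Version upgrade in progress; documentation under development**
+.. py:function:: paddle.addmm(input, x, y, alpha=1.0, beta=1.0, name=None)
+Computes the matrix product of x and y, scales it by the scalar alpha, and adds input scaled by beta. The shape of input must be broadcastable with the shape of the product of x and y.
+The computation is:
+..  math::
+    out = alpha * x * y + beta * input
+Parameters:
+    - **input** (Variable) - The input Tensor input. Supported data types: float32, float64.
+    - **x** (Variable) - The input Tensor x. Supported data types: float32, float64.
+    - **y** (Variable) - The input Tensor y. Supported data types: float32, float64.
+    - **alpha** (float, optional) - The scalar that multiplies x*y. Supported data types: float32, float64. Defaults to 1.0.
+    - **beta** (float, optional) - The scalar that multiplies input. Supported data types: float32, float64. Defaults to 1.0.
+    - **name** (str, optional) - See :ref:`api_guide_Name` for usage. Usually there is no need to set it. Defaults to None.
+Returns: the computed Tensor, with the same data type as input.
+Return type: Variable
+**Code example**: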
+.. code-block:: python
+    import numpy as np
+    import paddle
+    import paddle.fluid as fluid
+    input = fluid.data(name='input', shape=[2, 2], dtype='float32')
+    x = fluid.data(name='x', shape=[2, 2], dtype='float32')
+    y = fluid.data(name='y', shape=[2, 2], dtype='float32')
+    out = paddle.addmm( input=input, x=x, y=y, alpha=5.0, beta=0.5 )
+    data_x = np.ones((2, 2)).astype(np.float32)
+    data_y = np.ones((2, 2)).astype(np.float32)
+    data_input = np.ones((2, 2)).astype(np.float32)
+    place =  fluid.CUDAPlace(0) if fluid.core.is_compiled_with_cuda() else fluid.CPUPlace()
+    exe = fluid.Executor(place)
+    results = exe.run(fluid.default_main_program(), 
+                      fetch_list=[out], feed={"input": data_input, 'x': data_x, "y": data_y})
+    print(np.array(results[0]))
+    # [[10.5 10.5]
+    #  [10.5 10.5]]
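The expected output of 10.5 follows from reading `x * y` in the formula as a matrix product. A small NumPy check under that reading, with the same inputs as the example above (the variable names are illustrative):

```python
import numpy as np

x = np.ones((2, 2), dtype=np.float32)
y = np.ones((2, 2), dtype=np.float32)
inp = np.ones((2, 2), dtype=np.float32)
alpha, beta = 5.0, 0.5

# x @ y is [[2, 2], [2, 2]], so alpha * (x @ y) + beta * inp = 10 + 0.5
out = alpha * (x @ y) + beta * inp
print(out)
```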
--- a/doc/fluid/api_cn/tensor_cn/allclose_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/allclose_cn.rst
+.. _cn_api_tensor_allclose:
 allclose
 -------------------------------
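-**Version upgrade in progress; documentation under development**
+.. py:function:: paddle.allclose(input, other, rtol=1e-05, atol=1e-08, equal_nan=False, name=None)
+Checks whether every pair of elements of input and other satisfies the following condition:
+..  math::
+    \left| input - other \right| \leq atol + rtol \times \left| other \right|
+This API behaves like ``numpy.allclose``: it returns True when all elements of the two Tensors being compared are equal within the given tolerance.
+Parameters:
+    - **input** (Variable) - The first Tensor to compare.
+    - **other** (Variable) - The second Tensor to compare.
+    - **rtol** (float, optional) - Relative tolerance. Defaults to 1e-5.
+    - **atol** (float, optional) - Absolute tolerance. Defaults to 1e-8.
+    - **equal_nan** (bool, optional) - If True, two NaN values are treated as equal. Defaults to False.
+    - **name** (str, optional) - See :ref:`api_guide_Name` for usage. Usually there is no need to set it. Defaults to None.
+Returns: a Tensor holding a single boolean value.
+Return type: Variable
+**Code example**: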
+.. code-block:: python
+    import paddle
+    import paddle.fluid as fluid
+    import numpy as np
+    use_cuda = fluid.core.is_compiled_with_cuda()
+    a = fluid.data(name="a", shape=[2], dtype='float32')
+    b = fluid.data(name="b", shape=[2], dtype='float32')
+    result = paddle.allclose(a, b, rtol=1e-05, atol=1e-08,
+                            equal_nan=False, name="ignore_nan")
+    result_nan = paddle.allclose(a, b, rtol=1e-05, atol=1e-08,
+                                equal_nan=True, name="equal_nan")
+    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
+    exe = fluid.Executor(place)
+    exe.run(fluid.default_startup_program())
+    x = np.array([10000., 1e-07]).astype("float32")
+    y = np.array([10000.1, 1e-08]).astype("float32")
+    result_v, result_nan_v = exe.run(
+        feed={'a': x, 'b': y},
+        fetch_list=[result, result_nan])
+    print(result_v, result_nan_v)
+    # Output: (array([False]), array([False]))
+    x = np.array([10000., 1e-08]).astype("float32")
+    y = np.array([10000.1, 1e-09]).astype("float32")
+    result_v, result_nan_v = exe.run(
+        feed={'a': x, 'b': y},
+        fetch_list=[result, result_nan])
+    print(result_v, result_nan_v)
+    # Output: (array([ True]), array([ True]))
+    x = np.array([1.0, float('nan')]).astype("float32")
+    y = np.array([1.0, float('nan')]).astype("float32")
+    result_v, result_nan_v = exe.run(
+        feed={'a': x, 'b': y},
+        fetch_list=[result, result_nan])
+    print(result_v, result_nan_v)
+    # Output: (array([False]), array([ True]))
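The three outputs above can be reproduced by evaluating the tolerance condition directly. The sketch below is a NumPy reference written for illustration; `allclose_ref` is a hypothetical helper, and it works in float64 rather than float32.

```python
import numpy as np

def allclose_ref(a, b, rtol=1e-05, atol=1e-08, equal_nan=False):
    """Evaluate |a - b| <= atol + rtol * |b| element-wise, then reduce with all()."""
    a = np.asarray(a, dtype="float64")
    b = np.asarray(b, dtype="float64")
    close = np.abs(a - b) <= atol + rtol * np.abs(b)
    if equal_nan:
        close |= np.isnan(a) & np.isnan(b)   # NaN compares equal to NaN
    return bool(close.all())

print(allclose_ref([10000., 1e-07], [10000.1, 1e-08]))   # False: 9e-8 exceeds the tolerance
print(allclose_ref([10000., 1e-08], [10000.1, 1e-09]))   # True
nan = float("nan")
print(allclose_ref([1.0, nan], [1.0, nan]))                   # False
print(allclose_ref([1.0, nan], [1.0, nan], equal_nan=True))   # True
```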
--- a/doc/fluid/api_cn/tensor_cn/arange_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/arange_cn.rst
+.. _cn_api_paddle_tensor_arange:
 arange
 -------------------------------
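-**Version upgrade in progress; documentation under development**
+.. py:function:: paddle.tensor.arange(start, end, step=1, dtype=None, name=None)
+This API returns evenly spaced values, with spacing step, within the half-open interval [start, end).
+**Parameters**:
+        - **start** (float32 | float64 | int32 | int64 | Variable) - Start of the interval; the interval includes this value. When it is a Variable, it must be a 1-D Tensor of shape [1].
+        - **end** (float32 | float64 | int32 | int64 | Variable) - End of the interval. The interval normally excludes this value, except when step is not an integer and floating-point rounding affects the length of the output.
+        - **step** (float32 | float64 | int32 | int64 | Variable) - Spacing between consecutive values.
+        - **dtype** (str | core.VarDesc.VarType) - Data type of the output Tensor; one of 'float32', 'float64', 'int32', 'int64'.
+**Returns**: a 1-D Tensor of evenly spaced values over the given interval, with data type dtype.
+**Return type**: Variable
+**Code example**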
+.. code-block:: python
+    import paddle
+    import paddle.fluid as fluid
+    with fluid.dygraph.guard():
+        x = paddle.arange(0, 6, 2)
+        # x: [0, 2, 4]
+        # x dtype: float32
--- a/doc/fluid/api_cn/tensor_cn/bmm_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/bmm_cn.rst
+.. _cn_api_paddle_tensor_bmm:
 bmm
 -------------------------------
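-**Version upgrade in progress; documentation under development**
+.. py:function:: paddle.tensor.bmm(x, y, name=None)
+Performs batched matrix multiplication of the inputs x and y.
+Both inputs must be 3-D, and the first dimension (the batch size) of x and y must be equal.
+In addition, the third dimension of x must equal the second dimension of y.
+For example, if x and y have shapes (b, m, k) and (b, k, n), the output has shape (b, m, n).
+**Parameters**:
+    - **x** (Variable) - Input variable, a Tensor or LoDTensor.
+    - **y** (Variable) - Input variable, a Tensor or LoDTensor.
+    - **name** (str|None) - Name of this layer (optional). If not set, the layer is named automatically.
+**Returns**:
+    - Variable (Tensor / LoDTensor), the result of the matrix multiplication.
+**Return type**:
+    - Variable.
+**Example**: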
+.. code-block:: python
+    import paddle
+    import paddle.fluid as fluid
+    import numpy as np
+    # size input1: (2, 2, 3) and input2: (2, 3, 2)
+    input1 = np.array([[[1.0, 1.0, 1.0],[2.0, 2.0, 2.0]],[[3.0, 3.0, 3.0],[4.0, 4.0, 4.0]]])
+    input2 = np.array([[[1.0, 1.0],[2.0, 2.0],[3.0, 3.0]],[[4.0, 4.0],[5.0, 5.0],[6.0, 6.0]]])
+    with fluid.dygraph.guard():
+        x = fluid.dygraph.to_variable(input1)
+        y = fluid.dygraph.to_variable(input2)
+        out = paddle.bmm(x, y)
+        #output size: (2, 2, 2)
+        #output value:
+        #[[[6.0, 6.0],[12.0, 12.0]],[[45.0, 45.0],[60.0, 60.0]]]
+        out_np = out.numpy()
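The shape rule (b, m, k) x (b, k, n) -> (b, m, n) and the values in the comment above can be checked with NumPy, whose `np.matmul` also multiplies the trailing two dimensions batch by batch. This is a reference sketch, not Paddle code.

```python
import numpy as np

x = np.array([[[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]],
              [[3.0, 3.0, 3.0], [4.0, 4.0, 4.0]]])        # shape (2, 2, 3)
y = np.array([[[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],
              [[4.0, 4.0], [5.0, 5.0], [6.0, 6.0]]])      # shape (2, 3, 2)

out = np.matmul(x, y)                                      # shape (2, 2, 2)
# Equivalent explicit loop over the batch dimension
looped = np.stack([x[b] @ y[b] for b in range(x.shape[0])])
print(out.shape)
print(out)
```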
--- a/doc/fluid/api_cn/tensor_cn/clamp_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/clamp_cn.rst
+.. _cn_api_tensor_clamp:
 clamp
 -------------------------------
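-**Version upgrade in progress; documentation under development**
+.. py:function:: paddle.clamp(input, min=None, max=None, output=None, name=None)
+This OP clips every element of the input so that the output elements lie within [min, max]. The formula is:
+.. math::
+        Out = MIN(MAX(input, min), max)
+Parameters:
+    - **input** (Variable) - The input, a multi-dimensional Tensor. The data type can be float32 or float64.
+    - **min** (float32|Variable, optional) - Lower bound of the clipping range. Input elements smaller than this value are replaced by it. If not set, no lower bound is applied. It can be a float32 scalar or a Tensor of shape [1] whose data type is int32, float32 or float64. Defaults to None.
+    - **max** (float32|Variable, optional) - Upper bound of the clipping range. Input elements larger than this value are replaced by it. If not set, no upper bound is applied. It can be a float32 scalar or a Tensor of shape [1] whose data type is int32, float32 or float64. Defaults to None.
+    - **output** (Variable, optional) - The output Tensor or LoDTensor. If None, a new Tensor is created as the output. Defaults to None.
+    - **name** (str, optional) - See :ref:`api_guide_Name` for usage. Usually there is no need to set it. Defaults to None.
+Returns: a Tensor with the same shape as the input.
+Return type: Variable
+**Code example**: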
+.. code-block:: python
+    import paddle
+    import paddle.fluid as fluid
+    import numpy as np
+    in1 = np.array([[1.2,3.5],
+                    [4.5,6.4]]).astype('float32')
+    with fluid.dygraph.guard():
+        x1 = fluid.dygraph.to_variable(in1)
+        out1 = paddle.tensor.clamp(x1, min=3.5, max=5.0)
+        out2 = paddle.tensor.clamp(x1, min=2.5)
+        print(out1.numpy())
+        # [[3.5, 3.5]
+        #  [4.5, 5.0]]
+        print(out2.numpy())
+        # [[2.5, 3.5]
+        #  [4.5, 6.4]]
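For comparison, the same clipping can be written with NumPy's `np.clip`, where passing `None` for one bound leaves that side unconstrained. This is a reference sketch on the same input, not Paddle code.

```python
import numpy as np

in1 = np.array([[1.2, 3.5],
                [4.5, 6.4]]).astype("float32")

out1 = np.clip(in1, 3.5, 5.0)    # both bounds: MIN(MAX(input, 3.5), 5.0)
out2 = np.clip(in1, 2.5, None)   # lower bound only
print(out1)
print(out2)
```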
--- a/doc/fluid/api_cn/tensor_cn/cross_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/cross_cn.rst
+.. _cn_api_tensor_linalg_cross:
 cross
 -------------------------------
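-**Version upgrade in progress; documentation under development**
+.. py:function:: paddle.cross(input, other, dim=None)
+This OP returns the cross product (vector product) of the tensors ``input`` and ``other`` along dimension ``dim``. ``input`` and ``other`` must have the same shape,
+and the ``size`` of dimension ``dim`` must be 3. If ``dim`` is not specified, the first dimension whose ``size`` is 3 is used.
+**Parameters**:
+    - **input** (Variable) - The first input tensor.
+    - **other** (Variable) - The second input tensor.
+    - **dim**    (int, optional) - The dimension along which to take the cross product. If not specified, the first dimension whose ``size`` is 3 is used.
+**Returns**:
+    - **Variable** , with the same data type as the input.
+**Code example**: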
+.. code-block:: python
+        import paddle
+        import paddle.fluid as fluid
+        import numpy as np
+        data_x = np.array([[1.0, 1.0, 1.0],
+                           [2.0, 2.0, 2.0],
+                           [3.0, 3.0, 3.0]])
+        data_y = np.array([[1.0, 1.0, 1.0],
+                           [1.0, 1.0, 1.0],
+                           [1.0, 1.0, 1.0]])
+        with fluid.dygraph.guard():
+            x = fluid.dygraph.to_variable(data_x)
+            y = fluid.dygraph.to_variable(data_y)
+            out_z1 = paddle.cross(x, y)
+            print(out_z1.numpy())
+            #[[-1. -1. -1.]
+            # [ 2.  2.  2.]
+            # [-1. -1. -1.]]
+            out_z2 = paddle.cross(x, y, dim=1)
+            print(out_z2.numpy())
+            #[[0. 0. 0.]
+            # [0. 0. 0.]
+            # [0. 0. 0.]]
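The two results above differ only in which axis holds the 3-vectors. A NumPy sketch with `np.cross` and its `axis` argument reproduces both; here `axis=0` plays the role of the default `dim` (the first dimension of size 3) in this example.

```python
import numpy as np

data_x = np.array([[1.0, 1.0, 1.0],
                   [2.0, 2.0, 2.0],
                   [3.0, 3.0, 3.0]])
data_y = np.ones((3, 3))

z1 = np.cross(data_x, data_y, axis=0)   # vectors are the columns: [1, 2, 3] x [1, 1, 1]
z2 = np.cross(data_x, data_y, axis=1)   # vectors are the rows: parallel, so zero
print(z1)
print(z2)
```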
--- a/doc/fluid/api_cn/tensor_cn/equal_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/equal_cn.rst
+.. _cn_api_tensor_equal:
 equal
 -------------------------------
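-**Version upgrade in progress; documentation under development**
+.. py:function:: paddle.equal(x, y, axis=-1, name=None)
+This OP compares x and y element-wise (:math:`x==y`) and returns True if all elements are equal, and False otherwise.
+Parameters:
+    - **x** (Variable) - Input Tensor. Supported data types: float32, float64, int32, int64.
+    - **y** (Variable) - Input Tensor. Supported data types: float32, float64, int32, int64.
+    - **axis** (int, optional) - If the two input Tensors have different numbers of dimensions and the shape of y matches a part of the shape of x, the op is computed by broadcasting. axis is the dimension at which broadcasting starts; see elementwise_add for the broadcasting rules.
+    - **name** (str, optional) - See :ref:`api_guide_Name` for usage. Usually there is no need to set it. Defaults to None.
+Returns: the output Tensor, which holds a single element that is either True or False. Its data type is bool.
+Return type: Variable
+**Code example**: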
+.. code-block:: python
+    import paddle.fluid as fluid
+    import paddle
+    import numpy as np
+    label = fluid.layers.assign(np.array([3, 4], dtype="int32"))
+    label_1 = fluid.layers.assign(np.array([1, 2], dtype="int32"))
+    limit = fluid.layers.assign(np.array([3, 4], dtype="int32"))
+    out1 = paddle.equal(x=label, y=limit) #out1=[True]
+    out2 = paddle.equal(x=label_1, y=limit) #out2=[False]
+.. code-block:: python
+    import paddle.fluid as fluid
+    import paddle
+    import numpy as np
+    def gen_data():
+        return {
+              "x": np.ones((2, 3, 4, 5)).astype('float32'),
+              "y": np.zeros((3, 4)).astype('float32')
+          }
+    x = fluid.data(name="x", shape=[2,3,4,5], dtype='float32')
+    y = fluid.data(name="y", shape=[3,4], dtype='float32')
+    out = paddle.equal(x, y, axis=1)
+    place = fluid.CPUPlace()
+    exe = fluid.Executor(place)
+    res = exe.run(feed=gen_data(),
+                      fetch_list=[out])
+    print(res[0]) #[False]
--- a/doc/fluid/api_cn/tensor_cn/full_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/full_cn.rst
+.. _cn_api_tensor_full:
 full
 -------------------------------
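-**Version upgrade in progress; documentation under development**
+.. py:function:: paddle.full(shape, fill_value, out=None, dtype=None, device=None, stop_gradient=True, name=None)
+This OP creates a Tensor with the shape given by ``shape`` and the data type given by ``dtype``, in which every element is set to fill_value.
+Parameters:
+    - **shape** (list|tuple|Variable) - The shape of the Tensor to create.
+    - **fill_value** (bool|float16|float32|int32|int64|Variable) - The constant value used to initialize the output Tensor. Note: this value must not exceed the representable range of the output data type.
+    - **out** (Variable, optional) - The output Tensor. If None, a new Tensor is created as the output. Defaults to None.
+    - **dtype** (np.dtype|core.VarDesc.VarType|str, optional) - Data type of the output variable. Defaults to None.
+    - **device** (str, optional) - The device on which to run this operation. Valid values are None, 'cpu' and 'gpu'. If ``device`` is None, the device on which the Paddle program runs is used. Defaults to None.
+    - **stop_gradient** (bool, optional) - Whether to stop gradient computation for everything upstream of this Variable. Defaults to True.
+    - **name** (str, optional) - See :ref:`api_guide_Name` for usage. Usually there is no need to set it. Defaults to None.
+Returns: a Tensor holding the result.
+Return type: Variable
+Raises:
+    - ``TypeError`` - If ``dtype`` is not one of bool, float16, float32, float64, int32, int64.
+    - ``TypeError`` - If ``out`` is not a Variable.
+    - ``TypeError`` - If ``shape`` is not a list, tuple or Variable.
+**Code example**: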
+.. code-block:: python
+    import paddle
+    data1 = paddle.full(shape=[2,1], fill_value=0, dtype='int64') # data1=[[0],[0]]
+    data2 = paddle.full(shape=[2,1], fill_value=5, dtype='int64', device='gpu') # data2=[[5],[5]]
+    # attr shape is a list which contains Variable Tensor.
+    positive_2 = paddle.fill_constant([1], "int32", 2)
+    data3 = paddle.full(shape=[1, positive_2], dtype='float32', fill_value=1.5) # data3=[[1.5, 1.5]]
+    # attr shape is a Variable Tensor.
+    shape = paddle.fill_constant([1,2], "int32", 2) # shape holds the values [2,2]
+    data4 = paddle.full(shape=shape, dtype='bool', fill_value=True) # data4=[[True,True],[True,True]]
+    # attr value is a Variable Tensor.
+    val = paddle.fill_constant([1], "float32", 2.0) # val=[2.0]
+    data5 = paddle.full(shape=[2,1], fill_value=val, dtype='float32') #data5=[[2.0],[2.0]]
--- a/doc/fluid/api_cn/tensor_cn/full_like_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/full_like_cn.rst
-full
+.. _cn_api_tensor_full_like:
+full_like
 -------------------------------
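-**Version upgrade in progress; documentation under development**
+.. py:function:: paddle.full_like(input, fill_value, out=None, dtype=None, device=None, stop_gradient=True, name=None)
+This OP creates a Tensor with the same shape and data type as input, in which every element is set to fill_value.
+Parameters:
+    - **input** (Variable) - The input, a multi-dimensional Tensor. The data type can be bool, float16, float32, float64, int32 or int64.
+    - **fill_value** (bool|float|int) - The constant value used to initialize the output Tensor. Note: this value must not exceed the representable range of the output data type.
+    - **out** (Variable, optional) - The output Tensor. If None, a new Tensor is created as the output. Defaults to None.
+    - **dtype** (np.dtype|core.VarDesc.VarType|str, optional) - Data type of the output variable. If not set, the output has the same data type as the input variable. Defaults to None.
+    - **device** (str, optional) - The device on which to run this operation. Valid values are None, 'cpu' and 'gpu'. If ``device`` is None, the device on which the Paddle program runs is used. Defaults to None.
+    - **stop_gradient** (bool, optional) - Whether to stop gradient computation for everything upstream of this Variable. Defaults to True.
+    - **name** (str, optional) - See :ref:`api_guide_Name` for usage. Usually there is no need to set it. Defaults to None.
+Returns: a Tensor holding the result.
+Return type: Variable
+**Code example**: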
+.. code-block:: python
+    import paddle
+    import paddle.fluid as fluid
+    import numpy as np
+    input = fluid.data(name='input', dtype='float32', shape=[2, 3])
+    output = paddle.full_like(input, 2.0)
+    exe = fluid.Executor(fluid.CPUPlace())
+    exe.run(fluid.default_startup_program())
+    img=np.array([[1, 2, 3], [4, 5, 6]]).astype(np.float32)
+    res = exe.run(fluid.default_main_program(), feed={'input':img}, fetch_list=[output])
+    print(res) # [array([[2., 2., 2.], [2., 2., 2.]], dtype=float32)]
--- a/doc/fluid/api_cn/tensor_cn/gather_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/gather_cn.rst
+.. _cn_api_paddle_tensor_gather:
 gather
 -------------------------------
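-**Version upgrade in progress; documentation under development**
+.. py:function:: paddle.tensor.gather(input, index, overwrite=True)
+Gathers entries of input along its outermost dimension according to index and concatenates them.
+.. math::
+        Out=X[Index]
+**Parameters**:
+        - **input** (Variable) - The input, with ``rank >= 1``. Supported data types: int32, int64, float32, float64 and uint8 (CPU), float16 (GPU).
+        - **index** (Variable) - The indices, with ``rank = 1``. The data type is int32 or int64.
+        - **overwrite** (bool) - How gradients are updated in the backward pass when index contains duplicates. If ``True``, the gradients for a repeated index are overwritten; if ``False``, they are accumulated. Defaults to ``True``.
+**Returns**: an output tensor with the same rank as the input.
+**Return type**: Variable
+**Code example**: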
+..  code-block:: python
+    import paddle
+    import paddle.fluid as fluid
+    import numpy as np
+    with fluid.dygraph.guard():
+        input_1 = np.array([[1,2,3],[4,5,6],[7,8,9]])
+        index_1 = np.array([0,1])
+        input = fluid.dygraph.to_variable(input_1)
+        index = fluid.dygraph.to_variable(index_1)
+        output = paddle.fluid.layers.gather(input, index)
+        # expected output: [[1, 2, 3],[4, 5, 6]]
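The formula `Out = X[Index]` corresponds to integer-array indexing along the first axis in NumPy. A minimal reference sketch on the same data (forward pass only; the `overwrite` gradient behavior is not modeled here):

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
index = np.array([0, 1])

out = x[index]    # picks rows 0 and 1 along the outermost dimension
print(out)        # [[1 2 3]
                  #  [4 5 6]]
```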
--- a/doc/fluid/api_cn/tensor_cn/index_select_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/index_select_cn.rst
-index
+.. _cn_api_tensor_search_index_select:
+index_select
 -------------------------------
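-**Version upgrade in progress; documentation under development**
+.. py:function:: paddle.index_select(input, index, dim=0)
+This OP indexes the input ``input`` along dimension ``dim``, takes the entries specified by ``index``, and returns them in a new tensor. Here ``index`` is a ``1-D`` tensor. Except for dimension ``dim``, the returned tensor has the same sizes as ``input``; the size of dimension ``dim`` equals the size of ``index``.
+**Parameters**:
+    - **input** (Variable) - The input tensor.
+    - **index** (Variable) - A 1-D tensor containing the indices.
+    - **dim**    (int, optional) - The axis to index along. If not specified, the first dimension is used.
+**Returns**:
+    - **Variable** , with the same data type as the input.
+**Code example**: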
+.. code-block:: python
+        import paddle
+        import paddle.fluid as fluid
+        import numpy as np
+        data = np.array([[1.0, 2.0, 3.0, 4.0],
+                            [5.0, 6.0, 7.0, 8.0],
+                            [9.0, 10.0, 11.0, 12.0]])
+        data_index = np.array([0, 1, 1]).astype('int32')
+        with fluid.dygraph.guard():
+            x = fluid.dygraph.to_variable(data)
+            index = fluid.dygraph.to_variable(data_index)
+            out_z1 = paddle.index_select(x, index)
+            print(out_z1.numpy())
+            #[[1. 2. 3. 4.]
+            # [5. 6. 7. 8.]
+            # [5. 6. 7. 8.]]
+            out_z2 = paddle.index_select(x, index, dim=1)
+            print(out_z2.numpy())
+            #[[ 1.  2.  2.]
+            # [ 5.  6.  6.]
+            # [ 9. 10. 10.]]
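The shape rule above (size of `dim` becomes the size of `index`, other dimensions unchanged) matches NumPy's `np.take` with an `axis` argument. A reference sketch on the same data:

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0, 4.0],
                 [5.0, 6.0, 7.0, 8.0],
                 [9.0, 10.0, 11.0, 12.0]])
idx = np.array([0, 1, 1], dtype="int32")

rows = np.take(data, idx, axis=0)   # shape (3, 4): rows 0, 1, 1
cols = np.take(data, idx, axis=1)   # shape (3, 3): columns 0, 1, 1
print(rows)
print(cols)
```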
--- a/doc/fluid/api_cn/tensor_cn/inverse_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/inverse_cn.rst
+.. _cn_api_tensor_inverse:
 inverse
 -------------------------------
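-**Version upgrade in progress; documentation under development**
+.. py:function:: paddle.inverse(input, out=None, name=None)
+Computes the inverse of a square matrix, that is, a matrix with equal numbers of rows and columns. The input can be a single square matrix (a 2-D tensor) or a batch of square matrices (when it has more than 2 dimensions).
+**Parameters**:
+  - **input** (Variable) - The input tensor; its last two dimensions must have the same size. If the input has more than 2 dimensions, the leading dimensions index a batch of 2-D matrices. Supported data types: float32, float64.
+  - **out** (Variable, optional) - The Tensor in which to store the result; it can be any Variable already created in the program. Defaults to None, in which case a new Variable is created to hold the output.
+  - **name** (str, optional) - Used by developers when printing debugging information. See :ref:`api_guide_Name` for usage. Defaults to None.
+**Returns**: a tensor with the same data type as the input.
+Return type: Variable
+Raises:
+    - :code:`TypeError` , if input is not a Variable, or its data type is not float32 or float64.
+    - :code:`ValueError` , if input has fewer than 2 dimensions.
+    - :code:`TypeError` , if out is not a Variable, or its data type differs from that of input.
+**Code example**: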
+.. code-block:: python
+    import numpy as np
+    import paddle
+    mat_np = np.array([[2, 0], [0, 2]]).astype("float32")
+    with paddle.imperative.guard():
+        mat = paddle.imperative.to_variable(mat_np)
+        inv = paddle.inverse(mat)
+        print(inv.numpy()) # [[0.5, 0], [0, 0.5]]
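The batched case described above, where leading dimensions index independent 2-D matrices, can be illustrated with `np.linalg.inv`, which treats stacked matrices the same way. This is a NumPy reference sketch with made-up values, not Paddle code.

```python
import numpy as np

mats = np.stack([
    np.array([[2.0, 0.0], [0.0, 2.0]]),
    np.array([[1.0, 2.0], [3.0, 4.0]]),
])                                   # shape (2, 2, 2): a batch of two square matrices

inv = np.linalg.inv(mats)            # each 2x2 matrix is inverted independently
identity = np.matmul(mats, inv)      # should be the identity for every batch entry
print(inv[0])                        # [[0.5 0. ]
                                     #  [0.  0.5]]
```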
--- a/doc/fluid/api_cn/tensor_cn/kron_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/kron_cn.rst
+.. _cn_api_paddle_tensor_kron:
+kron
+-------------------------------
+.. py:function:: paddle.tensor.kron(x, y, out=None, name=None)
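+Kronecker product operator.
+This OP computes the Kronecker product of two tensors. The result is a composite tensor made up of blocks of the second tensor,
+each scaled by an element of the first tensor.
+This OP assumes that the two tensors :math:`X` and :math:`Y` have the same rank; if necessary, the shape of the lower-rank tensor is padded
+with leading 1s. Let the shape of :math:`X` be :math:`[r_0, r_1, ..., r_N]` and the shape of :math:`Y` be
+:math:`[s_0, s_1, ..., s_N]`. The output tensor then has shape
+:math:`[r_{0}s_{0}, r_{1}s_{1}, ..., r_{N}s_{N}]`, and its elements are products of elements of :math:`X` and
+:math:`Y`.
+The formula is
+.. math::
+          output[k_{0}, k_{1}, ..., k_{N}] = X[i_{0}, i_{1}, ..., i_{N}] *
+          Y[j_{0}, j_{1}, ..., j_{N}]
+where
+.. math::
+          k_{t} = i_{t} * s_{t} + j_{t}, t = 0, 1, ..., N
+Parameters:
+  - **x** (Variable) - The first input of the Kron OP. A multi-dimensional Tensor whose data type is float16, float32, float64, int32 or int64.
+  - **y** (Variable) - The second input of the Kron OP. A multi-dimensional Tensor whose data type is float16, float32, float64, int32 or int64, the same as that of x.
+  - **out**  (Variable, optional) - The Tensor in which to store the result; it can be any Variable already created in the program. Defaults to None, in which case a new Variable is created to hold the output.
+  - **name** (str, optional) - Used by developers when printing debugging information. See :ref:`api_guide_Name` for usage. Defaults to None.
+Returns:
+  - The output of the Kron OP. A multi-dimensional Tensor whose data type is float16, float32, float64, int32 or int64, the same as that of x.
+Return type: Variable
+**Code example**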
+..  code-block:: python
+  import paddle
+  from paddle import fluid
+  import paddle.fluid.dygraph as dg
+  import numpy as np
+  a = np.arange(1, 5).reshape(2, 2).astype(np.float32)
+  b = np.arange(1, 10).reshape(3, 3).astype(np.float32)
+  place = fluid.CPUPlace()
+  with dg.guard(place):
+      a_var = dg.to_variable(a)
+      b_var = dg.to_variable(b)
+      c_var = paddle.kron(a_var, b_var)
+      c_np = c_var.numpy()
+  print(c_np)
+  #[[ 1.  2.  3.  2.  4.  6.]
+  # [ 4.  5.  6.  8. 10. 12.]
+  # [ 7.  8.  9. 14. 16. 18.]
+  # [ 3.  6.  9.  4.  8. 12.]
+  # [12. 15. 18. 16. 20. 24.]
+  # [21. 24. 27. 28. 32. 36.]]
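The index formula :math:`k_t = i_t * s_t + j_t` can be verified by filling the output element by element and comparing against `np.kron`. The sketch below covers the 2-D case only and uses the same inputs as the example above; the loop variables are named after the symbols in the formula.

```python
import numpy as np

x = np.arange(1, 5).reshape(2, 2)     # shape [r0, r1] = [2, 2]
y = np.arange(1, 10).reshape(3, 3)    # shape [s0, s1] = [3, 3]
(r0, r1), (s0, s1) = x.shape, y.shape

out = np.zeros((r0 * s0, r1 * s1), dtype=x.dtype)
for i0 in range(r0):
    for i1 in range(r1):
        for j0 in range(s0):
            for j1 in range(s1):
                # k_t = i_t * s_t + j_t for t = 0, 1
                out[i0 * s0 + j0, i1 * s1 + j1] = x[i0, i1] * y[j0, j1]

print(np.array_equal(out, np.kron(x, y)))   # True
```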
--- a/doc/fluid/api_cn/tensor_cn/linspace_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/linspace_cn.rst
+.. _cn_api_tensor_linspace:
 linspace
 -------------------------------
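-**Version upgrade in progress; documentation under development**
+.. py:function:: paddle.linspace(start, stop, num, dtype, out=None, device=None, name=None)
+This OP returns a fixed number of evenly spaced values over a given interval.
+**Note: this OP does not compute gradients.**
+Parameters:
+    - **start** (float|Variable) - The start of the interval. It can be a float scalar or a Tensor of shape [1] whose data type is float32 or float64.
+    - **stop** (float|Variable) - The end of the interval. It can be a float scalar or a Tensor of shape [1] whose data type is float32 or float64.
+    - **num** (int|Variable) - The number of values to generate over the interval. It can be an integer scalar or a Tensor of shape [1] whose data type is int32.
+    - **dtype** (string) - Data type of the output Tensor, either 'float32' or 'float64'.
+    - **out** (Variable, optional) - The Tensor in which to store the result. If None or not set, a new Tensor is created to hold the result. Defaults to None.
+    - **device** (str, optional) - The device on which to run this operation. Valid values are None, 'cpu' and 'gpu'. If ``device`` is None, the device on which the Paddle program runs is used. Defaults to None.
+    - **name** (str, optional) - See :ref:`api_guide_Name` for usage. Usually there is no need to set it. Defaults to None.
+Returns: a 1-D Tensor of data type float32 or float64 holding the evenly spaced values. Its shape is :math:`[num]`. When num is 1, only a Tensor containing the value start is returned.
+Return type: Variable
+**Code example**: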
+.. code-block:: python
+      import paddle
+      data = paddle.linspace(0, 10, 5, dtype='float32') # [0.0,  2.5,  5.0,  7.5, 10.0]
+      data = paddle.linspace(0, 10, 1, dtype='float32') # [0.0]
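Both results in the example above, including the `num = 1` case, agree with NumPy's `np.linspace`, which can serve as a quick reference. This is a NumPy sketch for comparison only.

```python
import numpy as np

five = np.linspace(0, 10, 5, dtype="float32")   # endpoints included, spacing 10 / (5 - 1)
one = np.linspace(0, 10, 1, dtype="float32")    # only the start value
print(five)   # [ 0.   2.5  5.   7.5 10. ]
print(one)    # [0.]
```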
--- a/doc/fluid/api_cn/tensor_cn/matmul_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/matmul_cn.rst
+.. _cn_api_tensor_matmul:
 matmul
 -------------------------------
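-**Version upgrade in progress; documentation under development**
+.. py:function:: paddle.matmul(x, y, transpose_x=False, transpose_y=False, alpha=1.0, name=None)
+Matrix multiplication of the inputs ``x`` and ``y``.
+The inputs may have any number of dimensions, but if either input has more than 3 dimensions, the two inputs must have the same number of dimensions.
+The actual behavior depends on the shapes of ``x`` and ``y`` and on the boolean flags ``transpose_x`` and ``transpose_y``, as follows:
+- If a ``transpose`` flag is true, the last two dimensions of the corresponding Tensor are transposed. If ``x`` is a 1-D Tensor with shape=[D], it is treated as [1, D] when not transposed and as [D, 1] when transposed. After transposition, the input shapes must satisfy the requirement of matrix multiplication, that is, `x_width` equals `y_height`.
+- After transposition, the two input Tensors are 2-D or n-D and are multiplied according to the following rules:
+    - If both are 2-D, they are multiplied like ordinary matrices.
+    - If either is n-D, the operation is treated as a batched 2-D matrix multiplication.
+- If the original Tensor x or y has rank 1 and is not transposed, the prepended or appended dimension of size 1 is removed after the multiplication.
+Parameters:
+    - **x** (Variable) - Input variable, a Tensor or LoDTensor. The data type is float32 or float64; float16 is supported on GPU.
+    - **y** (Variable) - Input variable, a Tensor or LoDTensor. The data type is float32 or float64; float16 is supported on GPU.
+    - **transpose_x** (bool, optional) - Whether to transpose x before multiplication. Defaults to False.
+    - **transpose_y** (bool, optional) - Whether to transpose y before multiplication. Defaults to False.
+    - **alpha** (float, optional) - Scaling factor applied to the output. Defaults to 1.0.
+    - **name** (str, optional) - See :ref:`api_guide_Name` for usage. Usually there is no need to set it. Defaults to None.
+Returns:
+    - Variable (Tensor / LoDTensor), the result of the matrix multiplication, with the same data type as the inputs.
+Return type:
+    - Variable.
+::
+    * Example 1:
+    x: [B, ..., M, K], y: [B, ..., K, N]
+    # paddle.matmul(x, y)  # out: [B, ..., M, N]
+    * Example 2:
+    x: [B, M, K], y: [B, K, N]
+    # paddle.matmul(x, y)  # out: [B, M, N]
+    * Example 3:
+    x: [B, M, K], y: [K, N]
+    # paddle.matmul(x, y)  # out: [B, M, N]
+    * Example 4:
+    x: [M, K], y: [K, N]
+    # paddle.matmul(x, y)  # out: [M, N]
+    * Example 5:
+    x: [B, M, K], y: [K]
+    # paddle.matmul(x, y)  # out: [B, M]
+    * Example 6:
+    x: [K], y: [K]
+    # paddle.matmul(x, y)  # out: [1]
+    * Example 7:
+    x: [M], y: [N]
+    # paddle.matmul(x, y, True, True)  # out: [M, N]
+**Code example**: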
+.. code-block:: python
+    import paddle
+    import paddle.fluid as fluid
+    x = fluid.data(name='x', shape=[2, 3], dtype='float32')
+    y = fluid.data(name='y', shape=[3, 2], dtype='float32')
+    out = paddle.matmul(x, y, True, True)
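The output shapes listed in Examples 2 to 5 above follow the same batching rules as NumPy's `np.matmul`, so they can be sanity-checked there. This is a NumPy sketch only: NumPy has no `transpose_x`, `transpose_y` or `alpha` arguments, and for two 1-D inputs (Example 6) it returns a scalar rather than a shape-[1] tensor. The concrete sizes are arbitrary.

```python
import numpy as np

B, M, K, N = 2, 3, 4, 5   # arbitrary sizes chosen for illustration

shapes = {
    "example 2": np.matmul(np.ones((B, M, K)), np.ones((B, K, N))).shape,  # [B, M, N]
    "example 3": np.matmul(np.ones((B, M, K)), np.ones((K, N))).shape,     # [B, M, N]
    "example 4": np.matmul(np.ones((M, K)), np.ones((K, N))).shape,        # [M, N]
    "example 5": np.matmul(np.ones((B, M, K)), np.ones(K)).shape,          # [B, M]
}
for name, shape in shapes.items():
    print(name, shape)
```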
--- a/doc/fluid/api_cn/tensor_cn/meshgrid_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/meshgrid_cn.rst
+.. _cn_api_paddle_tensor_meshgrid:
 meshgrid
 -------------------------------
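-**Version upgrade in progress; documentation under development**
+.. py:function:: paddle.tensor.meshgrid(input, name=None)
+This OP takes a list of k 1-D Tensors, expands each of them, and outputs k Tensors that each have k dimensions.
+Parameters:
+         - **input** (Variable) - The input, k 1-D Tensors with shapes (N1,), (N2,), ..., (Nk, ). Supported data types: float32, float64, int32, int64.
+         - **name** (str, optional) - See :ref:`api_guide_Name` for usage. Usually there is no need to set it. Defaults to None.
+Returns:
+k Tensors with k dimensions each; every Tensor has shape (N1, N2, ..., Nk).
+Return type: Variable
+**Code example**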
+..  code-block:: python
+    # static graph example
+    import paddle
+    import paddle.fluid as fluid
+    import numpy as np
+    x = fluid.data(name='x', shape=[100], dtype='int32')
+    y = fluid.data(name='y', shape=[200], dtype='int32')
+    input_1 = np.random.randint(0, 100, [100, ]).astype('int32')
+    input_2 = np.random.randint(0, 100, [200, ]).astype('int32')
+    exe = fluid.Executor(place=fluid.CPUPlace())
+    grid_x, grid_y = paddle.tensor.meshgrid([x, y])
+    res_1, res_2 = exe.run(fluid.default_main_program(),
+                            feed={'x': input_1,
+                                  'y': input_2},
+                            fetch_list=[grid_x, grid_y])
+    #the shape of res_1 is (100, 200)
+    #the shape of res_2 is (100, 200)
+..  code-block:: python
+    # Dynamic graph example
+    import paddle
+    import paddle.fluid as fluid
+    import numpy as np
+    input_3 = np.random.randint(0, 100, [100, ]).astype('int32')
+    input_4 = np.random.randint(0, 100, [200, ]).astype('int32')
+    with fluid.dygraph.guard():
+        tensor_3 = fluid.dygraph.to_variable(input_3)
+        tensor_4 = fluid.dygraph.to_variable(input_4)
+        grid_x, grid_y = paddle.tensor.meshgrid([tensor_3, tensor_4])
+    #the shape of grid_x is (100, 200)
+    #the shape of grid_y is (100, 200)    
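To make the expansion concrete, here is a minimal NumPy sketch. It assumes the documented op matches `np.meshgrid` with `indexing="ij"`, which is what the `(N1, N2, ..., Nk)` output shape suggests; the small inputs `a` and `b` are illustrative only.

```python
import numpy as np

a = np.array([10, 20, 30])       # shape (3,)
b = np.array([1, 2, 3, 4])       # shape (4,)

# "ij" indexing gives every output the shape (len(a), len(b)).
grid_a, grid_b = np.meshgrid(a, b, indexing="ij")

# grid_a repeats a along the second axis; grid_b repeats b along the first.
print(grid_a.shape, grid_b.shape)
print(grid_a[:, 0], grid_b[0, :])
```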
--- a/doc/fluid/api_cn/tensor_cn/nonzero_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/nonzero_cn.rst
+.. _cn_api_tensor_search_nonzero:
 nonzero
 -------------------------------
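-**Version upgrade; documentation under development**
+.. py:function:: paddle.nonzero(input, as_tuple=False)
+This OP returns the coordinates of the non-zero elements of ``input``. If ``input`` has ``n`` dimensions and contains ``z`` non-zero elements, then when ``as_tuple = False``
+the result is a ``Tensor`` whose ``shape`` is ``[z x n]``, where row ``i`` holds the coordinates of the ``i``-th non-zero element of the input; when ``as_tuple = True``
+the result is a tuple of ``n`` tensors, each a ``1-D Tensor`` of size ``z``, where the ``i``-th ``1-D Tensor`` records the coordinates of the non-zero elements along dimension ``i``.
+**Parameters**:
+    - **input** (Variable) - The input tensor.
+    - **as_tuple** (bool, optional) - Return format: whether to return the result as a tuple of ``1-D Tensor`` values.
+**Returns**:
+    - **Variable** (Tensor or tuple(1-D Tensor)) with data type **INT64**.
+**Code example**: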
+.. code-block:: python
+        import paddle
+        import paddle.fluid as fluid
+        import numpy as np
+        data1 = np.array([[1.0, 0.0, 0.0],
+                            [0.0, 2.0, 0.0],
+                            [0.0, 0.0, 3.0]])
+        data2 = np.array([0.0, 1.0, 0.0, 3.0])
+        data3 = np.array([0.0, 0.0, 0.0])
+        with fluid.dygraph.guard():
+            x1 = fluid.dygraph.to_variable(data1)
+            x2 = fluid.dygraph.to_variable(data2)
+            x3 = fluid.dygraph.to_variable(data3)
+            out_z1 = paddle.nonzero(x1)
+            print(out_z1.numpy())
+            #[[0 0]
+            # [1 1]
+            # [2 2]]
+            out_z1_tuple = paddle.nonzero(x1, as_tuple=True)
+            for out in out_z1_tuple:
+                print(out.numpy())
+            #[[0]
+            # [1]
+            # [2]]
+            #[[0]
+            # [1]
+            # [2]]
+            out_z2 = paddle.nonzero(x2)
+            print(out_z2.numpy())
+            #[[1]
+            # [3]]
+            out_z2_tuple = paddle.nonzero(x2, as_tuple=True)
+            for out in out_z2_tuple:
+                print(out.numpy())
+            #[[1]
+            # [3]]
+            out_z3 = paddle.nonzero(x3)
+            print(out_z3.numpy())
+            #[]
+            out_z3_tuple = paddle.nonzero(x3, as_tuple=True)
+            for out in out_z3_tuple:
+                print(out.numpy())
+            #[]         
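The two return formats can be compared in a NumPy sketch. The pairing is an assumption for illustration: `np.argwhere` stands in for `as_tuple=False` (one row per non-zero element) and `np.nonzero` for `as_tuple=True` (one index array per dimension).

```python
import numpy as np

data = np.array([[0, 5],
                 [7, 0],
                 [0, 9]])

coords = np.argwhere(data)    # shape (z, n): one row per non-zero element
per_dim = np.nonzero(data)    # tuple of n arrays, each of length z

# Stacking the per-dimension arrays column-wise recovers the coordinate rows.
rebuilt = np.stack(per_dim, axis=1)
print(coords.tolist(), [d.tolist() for d in per_dim])
```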
--- a/doc/fluid/api_cn/tensor_cn/randint_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/randint_cn.rst
+.. _cn_api_tensor_randint:
 randint
 -------------------------------
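-**Version upgrade; documentation under development**
+.. py:function:: paddle.randint(low, high=None, shape=None, out=None, dtype=None, device=None, stop_gradient=False, seed=0, name=None)
+This OP initializes a Tensor with random integers sampled uniformly from the interval [low, high). When high is None (the default), the sampling interval is [0, low).
+Parameters:
+    - **low** (int) - Lower bound of the range of random values to generate; low is included in the range. When high is None, the sampling interval is [0, low).
+    - **high** (int, optional) - Upper bound of the range of random values to generate; high is not included in the range. Default: None.
+    - **shape** (list|tuple|Variable, optional) - Shape of the output Tensor; list, tuple, and Variable are supported. If shape is a list or tuple, its elements can be integers or Tensors of shape [1], where integers are of type int and Tensors have data type int32 or int64. If shape is a Variable, it is a 1-D Tensor with data type int32 or int64. If shape is None, it is set to [1]. Default: None.
+    - **out** (Variable, optional) - Stores the created Tensor; it can be any Variable already created in the program. Default: None, in which case a new Variable is created to hold the output.
+    - **dtype** (np.dtype|core.VarDesc.VarType|str, optional) - Data type of the output Tensor; int32 and int64 are supported. If dtype is None, it is set to int64. Default: None.
+    - **device** (str, optional) - Whether to create the Tensor on GPU or CPU. If device is None, the device running the Paddle program is used. Default: None.
+    - **stop_gradient** (bool, optional) - Whether to stop gradient computation. Default: False.
+    - **seed** (int, optional) - Random seed used to generate samples. 0 means a system-generated seed is used. Note that if the seed is not 0, this operator generates the same random numbers every time. Default: 0.
+    - **name** (str, optional) - For details, see :ref:`api_guide_Name`. Usually there is no need to set it. Default: None.
+Returns: A randomly initialized Tensor whose data type is determined by dtype and whose shape is determined by shape.
+Return type: Variable
+Raises:
+    - :code:`TypeError`: shape must be a list, tuple, or Variable.
+    - :code:`TypeError`: dtype must be int32 or int64.
+    - :code:`ValueError`: high must be greater than low (when high is None, high is first set to low and low to 0 before low and high are compared).
+**Code example**: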
+.. code-block:: python
+    import paddle.fluid as fluid
+    import paddle
+    # example 1:
+    # attr shape is a list which doesn't contain tensor Variable.
+    result_1 = paddle.randint(low=-5, high=5, shape=[3, 4], dtype="int64")
+    # example 2:
+    # attr shape is a list which contains tensor Variable.
+    dim_1 = fluid.layers.fill_constant([1],"int64",3)
+    dim_2 = fluid.layers.fill_constant([1],"int32",5)
+    result_2 = paddle.randint(low=-5, high=5, shape=[dim_1, dim_2], dtype="int32")
+    # example 3:
+    # attr shape is a Variable, the data type must be int64 or int32.
+    var_shape = fluid.data(name='var_shape', shape=[2], dtype="int64")
+    result_3 = paddle.randint(low=-5, high=5, shape=var_shape, dtype="int32")
+    var_shape_int32 = fluid.data(name='var_shape_int32', shape=[2], dtype="int32")
+    result_4 = paddle.randint(low=-5, high=5, shape=var_shape_int32, dtype="int64")
+    # example 4:
+    # Input only one parameter
+    # low=0, high=10, shape=[1], dtype='int64'
+    result_5 = paddle.randint(10)
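The argument handling described above (a missing `high`, the `[1]` default shape, the `high > low` check, and the meaning of `seed=0`) can be sketched in NumPy. `sample_ints` is a hypothetical helper written for illustration, not part of Paddle, and it assumes NumPy's `RandomState` as the random source.

```python
import numpy as np

def sample_ints(low, high=None, shape=None, seed=0):
    """Hypothetical helper mirroring the documented argument handling."""
    if high is None:
        # With a single bound, sample from [0, low).
        low, high = 0, low
    if high <= low:
        raise ValueError("high must be greater than low")
    if shape is None:
        shape = [1]
    # seed == 0 means "use a system-generated seed".
    rng = np.random.RandomState(seed if seed != 0 else None)
    return rng.randint(low, high, size=tuple(shape), dtype=np.int64)

grid = sample_ints(-5, 5, shape=[3, 4], seed=1)   # values in [-5, 5)
single = sample_ints(10, seed=7)                  # one value in [0, 10)
print(grid.shape, single.shape)
```

With a non-zero seed, repeated calls return identical samples, which matches the note on the `seed` parameter.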
--- a/doc/fluid/api_cn/tensor_cn/randperm_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/randperm_cn.rst
+.. _cn_api_tensor_random_randperm:
 randperm
 -------------------------------
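-**Version upgrade; documentation under development**
+.. py:function:: paddle.tensor.random.randperm(n, out=None, dtype="int64", device=None, stop_gradient=True, seed=0)
+This OP returns a random permutation of the integers from 0 to n-1.
+Parameters:
+  - **n** (int): Upper bound of the permutation (exclusive); must be greater than 0.
+  - **out** (Variable, optional): Optional output variable. If it is not `None`, the returned permutation is stored in it. Default: `None`.
+  - **dtype** (np.dtype|core.VarDesc.VarType|str, optional): Data type of the permutation; `int64` and `int32` are supported. Default: `int64`.
+  - **device** (str, optional): Device memory in which the permutation is stored. `cpu` stores it in CPU memory, `gpu` stores it in GPU memory, and `None` stores it in the memory of the device the program runs on. Default: `None`.
+  - **stop_gradient** (bool, optional): Whether to stop gradient computation for the returned permutation. Default: `True`.
+  - **seed** (int, optional): Random seed. When `seed` is 0, a different permutation is returned on each call; when `seed` is non-zero, the same `seed` returns the same permutation.
+Returns: A random permutation of the integers from 0 to n-1.
+Return type: Variable
+**Code example**: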
+..  code-block:: python
+    import paddle
+    import paddle.fluid as fluid
+    import numpy as np
+    # Note that the random permutation returned by randperm depends on
+    # the random seed, so the outputs shown in the examples below
+    # will vary between runs.
+    with fluid.dygraph.guard():
+        out_1 = paddle.randperm(6)
+        print(out_1.numpy())  # Random permutation, for example [2 4 5 0 3 1]
+        out_2 = fluid.dygraph.to_variable(
+            np.array([0, 1, 2, 3]).astype(np.int64))
+        paddle.randperm(6, out_2)
+        print(out_2.numpy())  # Random permutation, for example [5 0 2 4 1 3]
+        out_3 = paddle.randperm(6, dtype="int32", device="cpu")
+        print(out_3.numpy())  # Random permutation, for example [3 1 4 2 5 0]
+        out_4 = paddle.randperm(6, device="cpu", stop_gradient=True)
+        print(out_4.numpy())  # Random permutation, for example [3 1 5 2 0 4]     
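The two properties stated for `randperm` (the result is a permutation of 0 to n-1, and a fixed non-zero seed reproduces it) can be checked with a NumPy sketch. `RandomState.permutation` is used here only as a stand-in for the op.

```python
import numpy as np

n = 6
# Two generators with the same non-zero seed yield the same permutation.
perm_a = np.random.RandomState(42).permutation(n)
perm_b = np.random.RandomState(42).permutation(n)

# Every integer in [0, n) appears exactly once.
is_permutation = sorted(perm_a.tolist()) == list(range(n))
print(perm_a, is_permutation)
```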
--- a/doc/fluid/api_cn/tensor_cn/roll_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/roll_cn.rst
+.. _cn_api_tensor_manipulation_roll:
 roll
 -------------------------------
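-**Version upgrade; documentation under development**
+.. py:function:: paddle.roll(input, shifts, dims=None)
+This OP rolls the input ``input`` cyclically along the given dimensions; elements shifted past the last position are re-inserted at the first position. If ``dims`` is ``None``, the input is flattened to a ``1-D Tensor`` before rolling and restored to its original shape afterwards.
+**Parameters**:
+    - **input** (Variable) - The input tensor.
+    - **shifts** (int|list|tuple) - Roll offset. If ``shifts`` is a tuple or list, ``dims`` must be a tuple or list of the same size, and the input is rolled along each dimension in turn by the corresponding amount.
+    - **dims** (int|list|tuple, optional) - Axes to roll along.
+**Returns**:
+    - **Variable** with the same data type as the input.
+**Code example**: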
+.. code-block:: python
+        import numpy as np
+        import paddle
+        import paddle.fluid as fluid
+        data = np.array([[1.0, 2.0, 3.0],
+                            [4.0, 5.0, 6.0],
+                            [7.0, 8.0, 9.0]])
+        with fluid.dygraph.guard():
+            x = fluid.dygraph.to_variable(data)
+            out_z1 = paddle.roll(x, shifts=1)
+            print(out_z1.numpy())
+            #[[9. 1. 2.]
+            # [3. 4. 5.]
+            # [6. 7. 8.]]
+            out_z2 = paddle.roll(x, shifts=1, dims=0)
+            print(out_z2.numpy())
+            #[[7. 8. 9.]
+            # [1. 2. 3.]
+            # [4. 5. 6.]]
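The list form of `shifts` and `dims` is not covered by the example above. A NumPy sketch, assuming `np.roll` (with `shift`/`axis`) behaves like the documented op, shows all three cases on the same 3x3 data.

```python
import numpy as np

data = np.arange(1.0, 10.0).reshape(3, 3)   # [[1,2,3],[4,5,6],[7,8,9]]

flat_roll = np.roll(data, 1)                # axis=None: flatten, roll, restore shape
row_roll = np.roll(data, 1, axis=0)         # rows move down by one
# One shift per axis: roll rows by 1, then columns by 1.
both_roll = np.roll(data, shift=(1, 1), axis=(0, 1))

print(flat_roll, row_roll, both_roll, sep="\n")
```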
--- a/doc/fluid/api_cn/tensor_cn/split_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/split_cn.rst
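+.. _cn_api_paddle_tensor_split:
 split
 -------------------------------
-**Version upgrade; documentation under development**
+.. py:function:: paddle.tensor.split(input, num_or_sections, dim=-1, name=None)
+This OP splits the input Tensor into multiple sub-Tensors.
+**Parameters**:
+       - **input** (Variable) - Input variable: a multi-dimensional Tensor or LoDTensor with data type float32, float64, int32, or int64.
+       - **num_or_sections** (int|list|tuple) - If num_or_sections is an integer, it is the number of equally sized sub-Tensors the Tensor is divided into. If num_or_sections is a list or tuple, its length is the number of sub-Tensors, and its elements, which can be integers or Tensors of shape [1], give in order the size of each sub-Tensor along the split dimension. The length of the list or tuple must not exceed the size of the input Tensor's dimension being split. At most one element of the list or tuple may be -1, meaning that its value is inferred from the dimension of input and the other elements of num_or_sections. For example, when splitting the third dimension of a Tensor of shape [4,6,6] with num_or_sections=[2,-1,1], the three output Tensors have shapes [4,6,2], [4,6,3], and [4,6,1].
+       - **dim** (int|Variable, optional) - An integer or a Tensor of shape [1] with data type int32 or int64, specifying the dimension to split. If dim < 0, the dimension split is rank(input) + dim. Default: -1.
+       - **name** (str, optional) - Usually there is no need to set it. Default: None.
+**Returns**: A list of the Tensors after splitting.
+**Return type**: list(Variable(Tensor|LoDTensor)) with data type int32, int64, float32, or float64.
+**Code example**: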
+.. code-block:: python
+    import paddle
+    import paddle.fluid as fluid
+    import numpy as np
+    with fluid.dygraph.guard():
+        input_1 = np.random.random([4, 6, 6]).astype("int32")
+        # input is a variable which shape is [4, 6, 6]
+        input = fluid.dygraph.to_variable(input_1)
+        x0, x1, x2 = paddle.split(input, num_or_sections= 3, dim=1)
+        # x0.shape [4, 2, 6]
+        # x1.shape [4, 2, 6]
+        # x2.shape [4, 2, 6]
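The `-1` inference for `num_or_sections` can be sketched in NumPy. `split_by_sections` is a hypothetical helper for illustration only; note that `np.split` takes cut positions rather than section sizes, so the sizes are converted with a cumulative sum.

```python
import numpy as np

def split_by_sections(arr, sections, dim=-1):
    """Hypothetical helper: split `arr` along `dim` into pieces of the given sizes."""
    sizes = list(sections)
    if sizes.count(-1) > 1:
        raise ValueError("at most one section may be -1")
    if -1 in sizes:
        # Infer the missing size from the dimension and the known sizes.
        known = sum(s for s in sizes if s != -1)
        sizes[sizes.index(-1)] = arr.shape[dim] - known
    # np.split expects cut positions, i.e. the running totals except the last.
    cut_points = np.cumsum(sizes)[:-1].tolist()
    return np.split(arr, cut_points, axis=dim)

parts = split_by_sections(np.zeros((4, 6, 6)), [2, -1, 1], dim=2)
part_shapes = [p.shape for p in parts]
print(part_shapes)
```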
--- a/doc/fluid/api_cn/tensor_cn/squeeze_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/squeeze_cn.rst
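+.. _cn_api_paddle_tensor_squeeze:
 squeeze
 -------------------------------
-**Version upgrade; documentation under development**
+.. py:function:: paddle.tensor.squeeze(input, axes, name=None)
+This OP squeezes dimensions of the input Tensor according to axes. If axes is specified, the dimensions listed in axes are removed; each of them must have size 1. If axes is not specified, all dimensions of size 1 are removed.
+**Parameters**:
+        - **input** (Variable) - Input Tensor of any rank. Supported data types: float32, float64, int8, int32, int64.
+        - **axes** (list) - One integer or a list of integers giving the axes to squeeze. Valid range of axes: [-rank(input), rank(input)). If an axis is negative, axis = axis + rank(input).
+        - **name** (str, optional) - Usually there is no need to set it. Default: None.
+**Returns**: The Tensor after its dimensions have been squeezed, with the same data type as the input Tensor.
+**Return type**: Variable
+**Code example**: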
+.. code-block:: python
+    import paddle
+    import paddle.fluid as fluid
+    import numpy as np
+    with fluid.dygraph.guard():
+        input_1 = np.random.random([5, 1, 10]).astype("int32")
+        # input is a variable which shape is [5, 1, 10]
+        input = fluid.dygraph.to_variable(input_1)
+        output = paddle.tensor.squeeze(input, axes=[1])
+        # output.shape [5, 10]
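The rules for `axes` (explicit axis, negative axis, no axis, and the size-1 requirement) can be illustrated with NumPy, assuming `np.squeeze` is a fair stand-in for the documented op.

```python
import numpy as np

x = np.zeros((5, 1, 10, 1))

one_axis = np.squeeze(x, axis=1).shape        # only axis 1 is removed
negative_axis = np.squeeze(x, axis=-1).shape  # -1 refers to the last axis
all_axes = np.squeeze(x).shape                # every size-1 axis is removed

# Squeezing an axis whose size is not 1 is rejected.
try:
    np.squeeze(x, axis=0)
    rejected = False
except ValueError:
    rejected = True

print(one_axis, negative_axis, all_axes, rejected)
```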
--- a/doc/fluid/api_cn/tensor_cn/stack_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/stack_cn.rst
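+.. _cn_api_paddle_tensor_stack:
 stack
 -------------------------------
-**Version upgrade; documentation under development**
+.. py:function:: paddle.tensor.stack(x, axis=0)
+This OP stacks the input x along the axis ``axis``.
+**Parameters**:
+        - **x** (Variable|list(Variable)) - The input x can be a single Tensor or a list of Tensors. If x is a list, all of its Tensors must have the same shape. If the inputs are N-D Tensors of shape [d_0, d_1, ..., d_{n-1}], the output is an (N+1)-D Tensor of shape [d_0, d_1, ..., d_{axis-1}, len(x), d_{axis}, ..., d_{n-1}]. Supported data types: float32, float64, int32, int64.
+        - **axis** (int, optional) - Axis along which the input Tensors are stacked. Valid range: [-(R+1), R+1), where R is the rank of the first input Tensor. If axis < 0, axis = axis + rank(x[0]) + 1. Default: 0.
+**Returns**: The stacked Tensor, with the same data type as the input Tensors. Its rank is rank(x[0])+1.
+**Return type**: Variable
+**Code example**: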
+.. code-block:: python
+    import paddle
+    import paddle.fluid as fluid
+    import numpy as np
+    data1 = np.array([[1.0, 2.0,3.0]])
+    data2 = np.array([[3.0, 4.0, 5.0]])
+    data3 = np.array([[5.0, 6.0,7.0]])
+    with fluid.dygraph.guard():
+        x1 = fluid.dygraph.to_variable(data1)
+        x2 = fluid.dygraph.to_variable(data2)
+        x3 = fluid.dygraph.to_variable(data3)
+        result = paddle.stack([x1, x2, x3], axis=2)
+        # result shape: [1, 3, 3]
+        # result value: [[[1.0, 3.0, 5.0],
+        #                 [2.0, 4.0, 6.0],
+        #                 [3.0, 5.0, 7.0]]]
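The valid axis range and the output-shape rule can be enumerated with a NumPy sketch, assuming `np.stack` follows the same convention. Three inputs of shape `[1, 2]` are chosen purely for illustration, so `R = 2` and `axis` runs over `[-3, 3)`.

```python
import numpy as np

xs = [np.full((1, 2), float(i)) for i in range(3)]   # three inputs of shape (1, 2)
R = xs[0].ndim

# Output shape for every valid axis in [-(R+1), R+1).
shapes_by_axis = {
    axis: np.stack(xs, axis=axis).shape for axis in range(-(R + 1), R + 1)
}
print(shapes_by_axis)
```

Negative axes map to `axis + R + 1`, so `-1` and `2` give the same result.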
--- a/doc/fluid/api_cn/tensor_cn/trace_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/trace_cn.rst
+.. _cn_api_tensor_trace:
+trace
+-------------------------------
+.. py:function:: paddle.trace(input, offset=0, dim1=0, dim2=1)
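+This OP computes the sum of the diagonal elements of the input Tensor on the specified plane and outputs the result.
+If the input is a 2D Tensor, the sum of its diagonal elements is returned.
+If the input has more than two dimensions, an array of diagonal sums is returned, where the diagonals are taken from the 2D planes specified by dim1 and dim2. By default, the 2D planes are formed by the first two dimensions of the input.
+The parameter ``offset`` determines which diagonal of the specified 2D plane is taken:
+    - If offset = 0, the main diagonal is taken.
+    - If offset > 0, a diagonal above and to the right of the main diagonal is taken.
+    - If offset < 0, a diagonal below and to the left of the main diagonal is taken.
+Parameters:
+    - **input** (Variable) - Input variable, at least a 2D array. Supported data types: float32, float64, int32, int64.
+    - **offset** (int, optional) - Position of the diagonal within the specified 2D plane. Default: 0, i.e. the main diagonal.
+    - **dim1** (int, optional) - First dimension of the 2D plane from which the diagonal is taken. Default: 0.
+    - **dim2** (int, optional) - Second dimension of the 2D plane from which the diagonal is taken. Default: 1.
+Returns: The sum of the diagonal elements of the specified 2D planes, with the same data type as the input.
+Return type: Variable
+**Code example**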
+..  code-block:: python
+    import paddle.tensor as tensor
+    import paddle.fluid.dygraph as dg
+    import numpy as np
+    case1 = np.random.randn(2, 3).astype('float32')
+    case2 = np.random.randn(3, 10, 10).astype('float32')
+    case3 = np.random.randn(3, 10, 5, 10).astype('float32')
+    with dg.guard():
+        case1 = dg.to_variable(case1)
+        case2 = dg.to_variable(case2)
+        case3 = dg.to_variable(case3)
+        data1 = tensor.trace(case1) # data1.shape = [1]
+        data2 = tensor.trace(case2, offset=1, dim1=1, dim2=2) # data2.shape = [3]
+        data3 = tensor.trace(case3, offset=-3, dim1=1, dim2=-1) # data3.shape = [3, 5]
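The effect of `offset` is easier to see with concrete values. The sketch below uses `np.trace`, assuming its `offset`/`axis1`/`axis2` arguments correspond to `offset`/`dim1`/`dim2` here; `axis2=3` is the non-negative equivalent of `dim2=-1` for a 4-D input.

```python
import numpy as np

m = np.arange(9).reshape(3, 3)     # [[0,1,2],[3,4,5],[6,7,8]]

main = np.trace(m)                 # 0 + 4 + 8
upper = np.trace(m, offset=1)      # 1 + 5
lower = np.trace(m, offset=-1)     # 3 + 7

# For higher-rank input, the two chosen axes are reduced and the rest remain.
batch = np.trace(np.ones((3, 10, 5, 10)), offset=-3, axis1=1, axis2=3)
print(main, upper, lower, batch.shape)
```

Each entry of `batch` equals 7 because the diagonal at offset -3 of a 10x10 plane has seven elements.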
--- a/doc/fluid/api_cn/tensor_cn/unbind_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/unbind_cn.rst
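+.. _cn_api_paddle_tensor_unbind:
 unbind
 -------------------------------
-**Version upgrade; documentation under development**
+.. py:function:: paddle.tensor.unbind(input, axis=0)
+This OP splits the input Tensor into multiple sub-Tensors along the specified dimension.
+**Parameters**:
+       - **input** (Variable) - Input variable: a multi-dimensional Tensor with data type float32, float64, int32, or int64.
+       - **axis** (int32|int64, optional) - The dimension to split along, of type int32 or int64. If axis < 0, the dimension split is rank(input) + axis. Default: 0.
+**Returns**: A list of the Tensors after splitting.
+**Return type**: list(Variable) with data type int32, int64, float32, or float64.
+**Code example**: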
+.. code-block:: python
+    import paddle
+    # input is a variable which shape is [3, 4, 5]
+    input = paddle.fluid.data(
+        name="input", shape=[3, 4, 5], dtype="float32")
+    [x0, x1, x2] = paddle.tensor.unbind(input, axis=0)
+    # x0.shape [4, 5]
+    # x1.shape [4, 5]
+    # x2.shape [4, 5]
+    [x0, x1, x2, x3] = paddle.tensor.unbind(input, axis=1)
+    # x0.shape [3, 5]
+    # x1.shape [3, 5]
+    # x2.shape [3, 5]
+    # x3.shape [3, 5]
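Unlike `split`, `unbind` removes the split dimension from each piece. A NumPy sketch makes this visible; the `unbind` function below is a hypothetical stand-in written for illustration, not the Paddle implementation.

```python
import numpy as np

def unbind(arr, axis=0):
    """Hypothetical stand-in: one slice per index along `axis`, with that axis removed."""
    return [np.take(arr, i, axis=axis) for i in range(arr.shape[axis])]

x = np.arange(60).reshape(3, 4, 5)

along_0 = unbind(x, axis=0)     # 3 pieces of shape (4, 5)
along_1 = unbind(x, axis=1)     # 4 pieces of shape (3, 5)
along_last = unbind(x, axis=-1) # negative axis counts from the end
print(len(along_0), along_0[0].shape, len(along_1), along_1[0].shape)
```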
--- a/doc/fluid/api_cn/tensor_cn/unsqueeze_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/unsqueeze_cn.rst
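+.. _cn_api_paddle_tensor_unsqueeze:
 unsqueeze
 -------------------------------
-**Version upgrade; documentation under development**
+.. py:function:: paddle.tensor.unsqueeze(input, axes, name=None)
+This OP inserts dimensions at one or more positions (axes) in the shape of the input.
+**Parameters**:
+        - **input** (Variable) - Multi-dimensional Tensor with data type float32, float64, int8, int32, or int64.
+        - **axes** (int|list|tuple|Variable) - Positions at which to insert dimensions. The data type is int32. If axes is a list or tuple, its elements can be integers or Tensors of shape [1]. If axes is a Variable, it is a 1-D Tensor.
+        - **name** (str, optional) - Usually there is no need to set it. Default: None.
+**Returns**: The multi-dimensional Tensor with the inserted dimensions.
+**Return type**: Variable
+**Code example**: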
+.. code-block:: python
+    import paddle
+    import paddle.fluid as fluid
+    import numpy as np
+    with fluid.dygraph.guard():
+        input_1 = np.random.random([5, 10]).astype("int32")
+        # input is a variable whose shape is [5, 10]
+        input = fluid.dygraph.to_variable(input_1)
+        output = paddle.unsqueeze(input, axes=[1])
+        # output.shape [5, 1, 10]
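Inserting a dimension and squeezing it again are inverse operations. The NumPy sketch below assumes `np.expand_dims` is an adequate stand-in for the documented op and inserts axes one at a time.

```python
import numpy as np

x = np.zeros((5, 10))

middle = np.expand_dims(x, axis=1)        # new axis at position 1
front = np.expand_dims(middle, axis=0)    # then another at position 0

# Squeezing the inserted axis restores the original shape.
restored = np.squeeze(middle, axis=1)
print(middle.shape, front.shape, restored.shape)
```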
--- a/doc/fluid/api_cn/tensor_cn/where_cn.rst
+++ b/doc/fluid/api_cn/tensor_cn/where_cn.rst
+.. _cn_api_tensor_where:
 where
 -------------------------------
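-**Version upgrade; documentation under development**
+.. py:function:: paddle.where(condition, x, y, name=None)
+This OP returns a multi-dimensional ``Tensor`` whose elements are selected from ``x`` or ``y`` according to the input ``condition``: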
+.. math::
+      Out_i =
+      \left\{
+      \begin{aligned}
+      &X_i, & & if \ cond_i \ is \ True \\
+      &Y_i, & & if \ cond_i \ is \ False \\
+      \end{aligned}
+      \right.
+Parameters:
+    - **condition** (Variable) - Condition for selecting elements from ``x`` or ``y``.
+    - **x** (Variable) - Multi-dimensional ``Tensor`` with data type ``float32``, ``float64``, ``int32``, or ``int64``.
+    - **y** (Variable) - Multi-dimensional ``Tensor`` with data type ``float32``, ``float64``, ``int32``, or ``int64``.
+    - **name** (str, optional) - For details, see :ref:`api_guide_Name`. Usually there is no need to set it. Default: None.
+Returns: A ``Tensor`` with the same data type as ``x``.
+Return type: Variable.
+**Code example:**
+.. code-block:: python
+          import paddle
+          import numpy as np
+          import paddle.fluid as fluid
+          x_i = np.array([0.9383, 0.1983, 3.2, 1.2]).astype("float32")
+          y_i = np.array([1.0, 1.0, 1.0, 1.0]).astype("float32")
+          with fluid.dygraph.guard():
+              x = fluid.dygraph.to_variable(x_i)
+              y = fluid.dygraph.to_variable(y_i)
+              out = paddle.where(x>1, x, y)
+          print(out.numpy())
+          #out: [1.0, 1.0, 3.2, 1.2]
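The element-wise selection formula can be reproduced in NumPy on the same data, assuming `np.where` implements the same rule: take `x[i]` where the condition holds, otherwise `y[i]`.

```python
import numpy as np

x = np.array([0.9383, 0.1983, 3.2, 1.2], dtype="float32")
y = np.ones(4, dtype="float32")

cond = x > 1                   # [False, False, True, True]
out = np.where(cond, x, y)     # x where cond is True, else y
print(cond, out)
```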
--- a/doc/fluid/design/mkldnn/inplace/images/inplace.svg
+++ b/doc/fluid/design/mkldnn/inplace/images/inplace.svg
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
+ "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
+<!-- Generated by graphviz version 2.40.1 (0)
+ -->
+<!-- Title: G Pages: 1 -->
+<svg width="305pt" height="479pt"
+ viewBox="0.00 0.00 304.64 479.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
+<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 475)">
+<title>G</title>
+<polygon fill="#ffffff" stroke="transparent" points="-4,4 -4,-475 300.6419,-475 300.6419,4 -4,4"/>
+<g id="clust1" class="cluster">
+<title>cluster_0</title>
+<polygon fill="none" stroke="#000000" points="90.6419,-136 90.6419,-355 288.6419,-355 288.6419,-136 90.6419,-136"/>
+<text text-anchor="middle" x="189.6419" y="-339.8" font-family="Times,serif" font-size="14.00" fill="#000000">in&#45;placed</text>
+</g>
+<!-- e1 -->
+<g id="node1" class="node">
+<title>e1</title>
+<ellipse fill="none" stroke="#000000" cx="225.6419" cy="-381" rx="29.4969" ry="18"/>
+<text text-anchor="middle" x="225.6419" y="-377.3" font-family="Times,serif" font-size="14.00" fill="#000000">relu</text>
+</g>
+<!-- b -->
+<g id="node5" class="node">
+<title>b</title>
+<ellipse fill="none" stroke="#000000" cx="225.6419" cy="-306" rx="27" ry="18"/>
+<text text-anchor="middle" x="225.6419" y="-302.3" font-family="Times,serif" font-size="14.00" fill="#000000">b</text>
+</g>
+<!-- e1&#45;&gt;b -->
+<g id="edge2" class="edge">
+<title>e1&#45;&gt;b</title>
+<path fill="none" stroke="#000000" d="M225.6419,-362.8446C225.6419,-354.3401 225.6419,-344.0076 225.6419,-334.4964"/>
+<polygon fill="#000000" stroke="#000000" points="229.142,-334.2481 225.6419,-324.2482 222.142,-334.2482 229.142,-334.2481"/>
+</g>
+<!-- e2 -->
+<g id="node2" class="node">
+<title>e2</title>
+<ellipse fill="none" stroke="#000000" cx="189.6419" cy="-234" rx="90.9839" ry="18"/>
+<text text-anchor="middle" x="189.6419" y="-230.3" font-family="Times,serif" font-size="14.00" fill="#000000">elementwise_add</text>
+</g>
+<!-- e -->
+<g id="node6" class="node">
+<title>e</title>
+<ellipse fill="none" stroke="#000000" cx="158.6419" cy="-162" rx="27" ry="18"/>
+<text text-anchor="middle" x="158.6419" y="-158.3" font-family="Times,serif" font-size="14.00" fill="#000000">b</text>
+</g>
+<!-- e2&#45;&gt;e -->
+<g id="edge5" class="edge">
+<title>e2&#45;&gt;e</title>
+<path fill="none" stroke="#000000" d="M181.8193,-215.8314C178.2891,-207.6323 174.0482,-197.7824 170.1646,-188.7624"/>
+<polygon fill="#000000" stroke="#000000" points="173.3105,-187.2184 166.1412,-179.4177 166.8811,-189.9867 173.3105,-187.2184"/>
+</g>
+<!-- e3 -->
+<g id="node3" class="node">
+<title>e3</title>
+<ellipse fill="none" stroke="#000000" cx="91.6419" cy="-90" rx="91.784" ry="18"/>
+<text text-anchor="middle" x="91.6419" y="-86.3" font-family="Times,serif" font-size="14.00" fill="#000000">elementwise_mul</text>
+</g>
+<!-- g -->
+<g id="node9" class="node">
+<title>g</title>
+<ellipse fill="none" stroke="#000000" cx="91.6419" cy="-18" rx="27" ry="18"/>
+<text text-anchor="middle" x="91.6419" y="-14.3" font-family="Times,serif" font-size="14.00" fill="#000000">g</text>
+</g>
+<!-- e3&#45;&gt;g -->
+<g id="edge8" class="edge">
+<title>e3&#45;&gt;g</title>
+<path fill="none" stroke="#000000" d="M91.6419,-71.8314C91.6419,-64.131 91.6419,-54.9743 91.6419,-46.4166"/>
+<polygon fill="#000000" stroke="#000000" points="95.142,-46.4132 91.6419,-36.4133 88.142,-46.4133 95.142,-46.4132"/>
+</g>
+<!-- a -->
+<g id="node4" class="node">
+<title>a</title>
+<ellipse fill="none" stroke="#000000" cx="225.6419" cy="-453" rx="27" ry="18"/>
+<text text-anchor="middle" x="225.6419" y="-449.3" font-family="Times,serif" font-size="14.00" fill="#000000">a</text>
+</g>
+<!-- a&#45;&gt;e1 -->
+<g id="edge1" class="edge">
+<title>a&#45;&gt;e1</title>
+<path fill="none" stroke="#000000" d="M225.6419,-434.8314C225.6419,-427.131 225.6419,-417.9743 225.6419,-409.4166"/>
+<polygon fill="#000000" stroke="#000000" points="229.142,-409.4132 225.6419,-399.4133 222.142,-409.4133 229.142,-409.4132"/>
+</g>
+<!-- b&#45;&gt;e2 -->
+<g id="edge3" class="edge">
+<title>b&#45;&gt;e2</title>
+<path fill="none" stroke="#000000" d="M216.9273,-288.5708C212.8191,-280.3544 207.8223,-270.3608 203.2329,-261.1821"/>
+<polygon fill="#000000" stroke="#000000" points="206.2549,-259.3996 198.6522,-252.0206 199.9939,-262.5301 206.2549,-259.3996"/>
+</g>
+<!-- e&#45;&gt;e3 -->
+<g id="edge6" class="edge">
+<title>e&#45;&gt;e3</title>
+<path fill="none" stroke="#000000" d="M144.1039,-146.3771C135.7005,-137.3466 124.9236,-125.7654 115.3258,-115.4514"/>
+<polygon fill="#000000" stroke="#000000" points="117.6495,-112.8107 108.2749,-107.8744 112.525,-117.5794 117.6495,-112.8107"/>
+</g>
+<!-- d -->
+<g id="node7" class="node">
+<title>d</title>
+<ellipse fill="none" stroke="#000000" cx="153.6419" cy="-306" rx="27" ry="18"/>
+<text text-anchor="middle" x="153.6419" y="-302.3" font-family="Times,serif" font-size="14.00" fill="#000000">d</text>
+</g>
+<!-- d&#45;&gt;e2 -->
+<g id="edge4" class="edge">
+<title>d&#45;&gt;e2</title>
+<path fill="none" stroke="#000000" d="M162.3565,-288.5708C166.4647,-280.3544 171.4615,-270.3608 176.0508,-261.1821"/>
+<polygon fill="#000000" stroke="#000000" points="179.2899,-262.5301 180.6316,-252.0206 173.0289,-259.3996 179.2899,-262.5301"/>
+</g>
+<!-- f -->
+<g id="node8" class="node">
+<title>f</title>
+<ellipse fill="none" stroke="#000000" cx="55.6419" cy="-162" rx="27" ry="18"/>
+<text text-anchor="middle" x="55.6419" y="-158.3" font-family="Times,serif" font-size="14.00" fill="#000000">f</text>
+</g>
+<!-- f&#45;&gt;e3 -->
+<g id="edge7" class="edge">
+<title>f&#45;&gt;e3</title>
+<path fill="none" stroke="#000000" d="M64.3565,-144.5708C68.4647,-136.3544 73.4615,-126.3608 78.0508,-117.1821"/>
+<polygon fill="#000000" stroke="#000000" points="81.2899,-118.5301 82.6316,-108.0206 75.0289,-115.3996 81.2899,-118.5301"/>
+</g>
+</g>
+</svg>
--- a/doc/fluid/design/mkldnn/inplace/images/multi-output-inplace.svg
+++ b/doc/fluid/design/mkldnn/inplace/images/multi-output-inplace.svg
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
+ "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
+<!-- Generated by graphviz version 2.40.1 (0)
+ -->
+<!-- Title: G Pages: 1 -->
+<svg width="792pt" height="436pt"
+ viewBox="0.00 0.00 792.00 435.74" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
+<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 431.7401)">
+<title>G</title>
+<polygon fill="#ffffff" stroke="transparent" points="-4,4 -4,-431.7401 788,-431.7401 788,4 -4,4"/>
+<g id="clust1" class="cluster">
+<title>cluster_before</title>
+<polygon fill="none" stroke="#000000" stroke-dasharray="1,5" points="8,-8 8,-419.7401 398,-419.7401 398,-8 8,-8"/>
+<text text-anchor="middle" x="203" y="-404.5401" font-family="Times,serif" font-size="14.00" fill="#000000">before</text>
+</g>
+<g id="clust2" class="cluster">
+<title>cluster_0</title>
+<polygon fill="none" stroke="#000000" points="104,-169.7401 104,-388.7401 302,-388.7401 302,-169.7401 104,-169.7401"/>
+<text text-anchor="middle" x="203" y="-373.5401" font-family="Times,serif" font-size="14.00" fill="#000000">to be in&#45;placed</text>
+</g>
+<g id="clust3" class="cluster">
+<title>cluster_after</title>
+<polygon fill="none" stroke="#000000" stroke-dasharray="1,5" points="406,-8 406,-419.7401 776,-419.7401 776,-8 406,-8"/>
+<text text-anchor="middle" x="591" y="-404.5401" font-family="Times,serif" font-size="14.00" fill="#000000">after</text>
+</g>
+<g id="clust4" class="cluster">
+<title>cluster_0b</title>
+<polygon fill="none" stroke="#000000" points="492,-169.7401 492,-388.7401 690,-388.7401 690,-169.7401 492,-169.7401"/>
+<text text-anchor="middle" x="591" y="-373.5401" font-family="Times,serif" font-size="14.00" fill="#000000">applied in&#45;placed</text>
+</g>
+<!-- op1 -->
+<g id="node1" class="node">
+<title>op1</title>
+<ellipse fill="none" stroke="#000000" cx="203" cy="-267.7401" rx="90.9839" ry="18"/>
+<text text-anchor="middle" x="203" y="-264.0401" font-family="Times,serif" font-size="14.00" fill="#000000">elementwise_add</text>
+</g>
+<!-- c -->
+<g id="node4" class="node">
+<title>c</title>
+<ellipse fill="none" stroke="#000000" cx="203" cy="-195.7401" rx="27" ry="18"/>
+<text text-anchor="middle" x="203" y="-192.0401" font-family="Times,serif" font-size="14.00" fill="#000000">c</text>
+</g>
+<!-- op1&#45;&gt;c -->
+<g id="edge3" class="edge">
+<title>op1&#45;&gt;c</title>
+<path fill="none" stroke="#000000" d="M203,-249.5715C203,-241.8711 203,-232.7145 203,-224.1567"/>
+<polygon fill="#000000" stroke="#000000" points="206.5001,-224.1533 203,-214.1534 199.5001,-224.1534 206.5001,-224.1533"/>
+</g>
+<!-- op2 -->
+<g id="node2" class="node">
+<title>op2</title>
+<ellipse fill="none" stroke="#000000" cx="105" cy="-114.8701" rx="89.191" ry="26.7407"/>
+<text text-anchor="middle" x="105" y="-118.6701" font-family="Times,serif" font-size="14.00" fill="#000000">top_k</text>
+<text text-anchor="middle" x="105" y="-103.6701" font-family="Times,serif" font-size="14.00" fill="#000000">inputs_vars{c}</text>
+</g>
+<!-- d -->
+<g id="node7" class="node">
+<title>d</title>
+<ellipse fill="none" stroke="#000000" cx="84" cy="-34" rx="27" ry="18"/>
+<text text-anchor="middle" x="84" y="-30.3" font-family="Times,serif" font-size="14.00" fill="#000000">d</text>
+</g>
+<!-- op2&#45;&gt;d -->
+<g id="edge6" class="edge">
+<title>op2&#45;&gt;d</title>
+<path fill="none" stroke="#000000" d="M98.0073,-87.9415C95.8358,-79.579 93.4381,-70.3458 91.2495,-61.9176"/>
+<polygon fill="#000000" stroke="#000000" points="94.6139,-60.9481 88.7128,-52.1488 87.8386,-62.7075 94.6139,-60.9481"/>
+</g>
+<!-- e -->
+<g id="node8" class="node">
+<title>e</title>
+<ellipse fill="none" stroke="#000000" cx="156" cy="-34" rx="27" ry="18"/>
+<text text-anchor="middle" x="156" y="-30.3" font-family="Times,serif" font-size="14.00" fill="#000000">e</text>
+</g>
+<!-- op2&#45;&gt;e -->
+<g id="edge7" class="edge">
+<title>op2&#45;&gt;e</title>
+<path fill="none" stroke="#000000" d="M121.6993,-88.3902C127.5384,-79.1312 134.0793,-68.7595 139.8709,-59.5756"/>
+<polygon fill="#000000" stroke="#000000" points="142.9853,-61.1985 145.3592,-50.873 137.0644,-57.4645 142.9853,-61.1985"/>
+</g>
+<!-- op3 -->
+<g id="node3" class="node">
+<title>op3</title>
+<ellipse fill="none" stroke="#000000" cx="301" cy="-114.8701" rx="89.191" ry="26.7407"/>
+<text text-anchor="middle" x="301" y="-118.6701" font-family="Times,serif" font-size="14.00" fill="#000000">top_k</text>
+<text text-anchor="middle" x="301" y="-103.6701" font-family="Times,serif" font-size="14.00" fill="#000000">inputs_vars{c}</text>
+</g>
+<!-- g -->
+<g id="node9" class="node">
+<title>g</title>
+<ellipse fill="none" stroke="#000000" cx="249" cy="-34" rx="27" ry="18"/>
+<text text-anchor="middle" x="249" y="-30.3" font-family="Times,serif" font-size="14.00" fill="#000000">g</text>
+</g>
+<!-- op3&#45;&gt;g -->
+<g id="edge8" class="edge">
+<title>op3&#45;&gt;g</title>
+<path fill="none" stroke="#000000" d="M283.9733,-88.3902C277.9576,-79.0347 271.2115,-68.5433 265.261,-59.289"/>
+<polygon fill="#000000" stroke="#000000" points="268.2019,-57.3913 259.8495,-50.873 262.314,-61.1772 268.2019,-57.3913"/>
+</g>
+<!-- h -->
+<g id="node10" class="node">
+<title>h</title>
+<ellipse fill="none" stroke="#000000" cx="321" cy="-34" rx="27" ry="18"/>
+<text text-anchor="middle" x="321" y="-30.3" font-family="Times,serif" font-size="14.00" fill="#000000">h</text>
+</g>
+<!-- op3&#45;&gt;h -->
+<g id="edge9" class="edge">
+<title>op3&#45;&gt;h</title>
+<path fill="none" stroke="#000000" d="M307.6597,-87.9415C309.7278,-79.579 312.0113,-70.3458 314.0957,-61.9176"/>
+<polygon fill="#000000" stroke="#000000" points="317.5084,-62.6967 316.5116,-52.1488 310.7131,-61.0161 317.5084,-62.6967"/>
+</g>
+<!-- c&#45;&gt;op2 -->
+<g id="edge4" class="edge">
+<title>c&#45;&gt;op2</title>
+<path fill="none" stroke="#000000" d="M185.9297,-181.6536C174.2695,-172.0316 158.3368,-158.8838 143.691,-146.798"/>
+<polygon fill="#000000" stroke="#000000" points="145.6628,-143.8874 135.7221,-140.2221 141.2074,-149.2865 145.6628,-143.8874"/>
+</g>
+<!-- c&#45;&gt;op3 -->
+<g id="edge5" class="edge">
+<title>c&#45;&gt;op3</title>
+<path fill="none" stroke="#000000" d="M220.0703,-181.6536C231.7305,-172.0316 247.6632,-158.8838 262.309,-146.798"/>
+<polygon fill="#000000" stroke="#000000" points="264.7926,-149.2865 270.2779,-140.2221 260.3372,-143.8874 264.7926,-149.2865"/>
+</g>
+<!-- a -->
+<g id="node5" class="node">
+<title>a</title>
+<ellipse fill="none" stroke="#000000" cx="239" cy="-339.7401" rx="27" ry="18"/>
+<text text-anchor="middle" x="239" y="-336.0401" font-family="Times,serif" font-size="14.00" fill="#000000">a</text>
+</g>
+<!-- a&#45;&gt;op1 -->
+<g id="edge1" class="edge">
+<title>a&#45;&gt;op1</title>
+<path fill="none" stroke="#000000" d="M230.2854,-322.3109C226.1772,-314.0945 221.1804,-304.1009 216.5911,-294.9222"/>
+<polygon fill="#000000" stroke="#000000" points="219.613,-293.1397 212.0103,-285.7607 213.352,-296.2703 219.613,-293.1397"/>
+</g>
+<!-- b -->
+<g id="node6" class="node">
+<title>b</title>
+<ellipse fill="none" stroke="#000000" cx="167" cy="-339.7401" rx="27" ry="18"/>
+<text text-anchor="middle" x="167" y="-336.0401" font-family="Times,serif" font-size="14.00" fill="#000000">b</text>
+</g>
+<!-- b&#45;&gt;op1 -->
+<g id="edge2" class="edge">
+<title>b&#45;&gt;op1</title>
+<path fill="none" stroke="#000000" d="M175.7146,-322.3109C179.8228,-314.0945 184.8196,-304.1009 189.4089,-294.9222"/>
+<polygon fill="#000000" stroke="#000000" points="192.648,-296.2703 193.9897,-285.7607 186.387,-293.1397 192.648,-296.2703"/>
+</g>
+<!-- op1b -->
+<g id="node11" class="node">
+<title>op1b</title>
+<ellipse fill="none" stroke="#000000" cx="591" cy="-267.7401" rx="90.9839" ry="18"/>
+<text text-anchor="middle" x="591" y="-264.0401" font-family="Times,serif" font-size="14.00" fill="#000000">elementwise_add</text>
+</g>
+<!-- cb -->
+<g id="node14" class="node">
+<title>cb</title>
+<ellipse fill="none" stroke="#000000" cx="591" cy="-195.7401" rx="27" ry="18"/>
+<text text-anchor="middle" x="591" y="-192.0401" font-family="Times,serif" font-size="14.00" fill="#000000">a</text>
+</g>
+<!-- op1b&#45;&gt;cb -->
+<g id="edge12" class="edge">
+<title>op1b&#45;&gt;cb</title>
+<path fill="none" stroke="#000000" d="M591,-249.5715C591,-241.8711 591,-232.7145 591,-224.1567"/>
+<polygon fill="#000000" stroke="#000000" points="594.5001,-224.1533 591,-214.1534 587.5001,-224.1534 594.5001,-224.1533"/>
+</g>
+<!-- op2b -->
+<g id="node12" class="node">
+<title>op2b</title>
+<ellipse fill="none" stroke="#000000" cx="498" cy="-114.8701" rx="84.2917" ry="26.7407"/>
+<text text-anchor="middle" x="498" y="-118.6701" font-family="Times,serif" font-size="14.00" fill="#000000">top_k</text>
+<text text-anchor="middle" x="498" y="-103.6701" font-family="Times,serif" font-size="14.00" fill="#000000">input_vars{a}</text>
+</g>
+<!-- db -->
+<g id="node17" class="node">
+<title>db</title>
+<ellipse fill="none" stroke="#000000" cx="476" cy="-34" rx="27" ry="18"/>
+<text text-anchor="middle" x="476" y="-30.3" font-family="Times,serif" font-size="14.00" fill="#000000">d</text>
+</g>
+<!-- op2b&#45;&gt;db -->
+<g id="edge15" class="edge">
+<title>op2b&#45;&gt;db</title>
+<path fill="none" stroke="#000000" d="M490.6743,-87.9415C488.3994,-79.579 485.8876,-70.3458 483.5947,-61.9176"/>
+<polygon fill="#000000" stroke="#000000" points="486.9396,-60.8794 480.9372,-52.1488 480.1851,-62.7169 486.9396,-60.8794"/>
+</g>
+<!-- eb -->
+<g id="node18" class="node">
+<title>eb</title>
+<ellipse fill="none" stroke="#000000" cx="548" cy="-34" rx="27" ry="18"/>
+<text text-anchor="middle" x="548" y="-30.3" font-family="Times,serif" font-size="14.00" fill="#000000">e</text>
+</g>
+<!-- op2b&#45;&gt;eb -->
+<g id="edge16" class="edge">
+<title>op2b&#45;&gt;eb</title>
+<path fill="none" stroke="#000000" d="M514.3719,-88.3902C520.0965,-79.1312 526.5091,-68.7595 532.1872,-59.5756"/>
+<polygon fill="#000000" stroke="#000000" points="535.2859,-61.2192 537.5678,-50.873 529.332,-57.538 535.2859,-61.2192"/>
+</g>
+<!-- op3b -->
+<g id="node13" class="node">
+<title>op3b</title>
+<ellipse fill="none" stroke="#000000" cx="684" cy="-114.8701" rx="84.2917" ry="26.7407"/>
+<text text-anchor="middle" x="684" y="-118.6701" font-family="Times,serif" font-size="14.00" fill="#000000">top_k</text>
+<text text-anchor="middle" x="684" y="-103.6701" font-family="Times,serif" font-size="14.00" fill="#000000">input_vars{a}</text>
+</g>
+<!-- gb -->
+<g id="node19" class="node">
+<title>gb</title>
+<ellipse fill="none" stroke="#000000" cx="633" cy="-34" rx="27" ry="18"/>
+<text text-anchor="middle" x="633" y="-30.3" font-family="Times,serif" font-size="14.00" fill="#000000">g</text>
+</g>
+<!-- op3b&#45;&gt;gb -->
+<g id="edge17" class="edge">
+<title>op3b&#45;&gt;gb</title>
+<path fill="none" stroke="#000000" d="M667.3007,-88.3902C661.4616,-79.1312 654.9207,-68.7595 649.1291,-59.5756"/>
+<polygon fill="#000000" stroke="#000000" points="651.9356,-57.4645 643.6408,-50.873 646.0147,-61.1985 651.9356,-57.4645"/>
+</g>
+<!-- hb -->
+<g id="node20" class="node">
+<title>hb</title>
+<ellipse fill="none" stroke="#000000" cx="705" cy="-34" rx="27" ry="18"/>
+<text text-anchor="middle" x="705" y="-30.3" font-family="Times,serif" font-size="14.00" fill="#000000">h</text>
+</g>
+<!-- op3b&#45;&gt;hb -->
+<g id="edge18" class="edge">
+<title>op3b&#45;&gt;hb</title>
+<path fill="none" stroke="#000000" d="M690.9927,-87.9415C693.1642,-79.579 695.5619,-70.3458 697.7505,-61.9176"/>
+<polygon fill="#000000" stroke="#000000" points="701.1614,-62.7075 700.2872,-52.1488 694.3861,-60.9481 701.1614,-62.7075"/>
+</g>
+<!-- cb&#45;&gt;op2b -->
+<g id="edge13" class="edge">
+<title>cb&#45;&gt;op2b</title>
+<path fill="none" stroke="#000000" d="M574.3735,-181.2822C563.4548,-171.7876 548.7254,-158.9793 535.1151,-147.1443"/>
+<polygon fill="#000000" stroke="#000000" points="537.1109,-144.2416 527.2682,-140.3209 532.5176,-149.5238 537.1109,-144.2416"/>
+</g>
+<!-- cb&#45;&gt;op3b -->
+<g id="edge14" class="edge">
+<title>cb&#45;&gt;op3b</title>
+<path fill="none" stroke="#000000" d="M607.6265,-181.2822C618.5452,-171.7876 633.2746,-158.9793 646.8849,-147.1443"/>
+<polygon fill="#000000" stroke="#000000" points="649.4824,-149.5238 654.7318,-140.3209 644.8891,-144.2416 649.4824,-149.5238"/>
+</g>
+<!-- ab -->
+<g id="node15" class="node">
+<title>ab</title>
+<ellipse fill="none" stroke="#000000" cx="627" cy="-339.7401" rx="27" ry="18"/>
+<text text-anchor="middle" x="627" y="-336.0401" font-family="Times,serif" font-size="14.00" fill="#000000">a</text>
+</g>
+<!-- ab&#45;&gt;op1b -->
+<g id="edge10" class="edge">
+<title>ab&#45;&gt;op1b</title>
+<path fill="none" stroke="#000000" d="M618.2854,-322.3109C614.1772,-314.0945 609.1804,-304.1009 604.5911,-294.9222"/>
+<polygon fill="#000000" stroke="#000000" points="607.613,-293.1397 600.0103,-285.7607 601.352,-296.2703 607.613,-293.1397"/>
+</g>
+<!-- bb -->
+<g id="node16" class="node">
+<title>bb</title>
+<ellipse fill="none" stroke="#000000" cx="555" cy="-339.7401" rx="27" ry="18"/>
+<text text-anchor="middle" x="555" y="-336.0401" font-family="Times,serif" font-size="14.00" fill="#000000">b</text>
+</g>
+<!-- bb&#45;&gt;op1b -->
+<g id="edge11" class="edge">
+<title>bb&#45;&gt;op1b</title>
+<path fill="none" stroke="#000000" d="M563.7146,-322.3109C567.8228,-314.0945 572.8196,-304.1009 577.4089,-294.9222"/>
+<polygon fill="#000000" stroke="#000000" points="580.648,-296.2703 581.9897,-285.7607 574.387,-293.1397 580.648,-296.2703"/>
+</g>
+</g>
+</svg>
--- a/doc/fluid/design/mkldnn/inplace/images/unwanted-inplace.svg
+++ b/doc/fluid/design/mkldnn/inplace/images/unwanted-inplace.svg
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
+ "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
+<!-- Generated by graphviz version 2.40.1 (0)
+ -->
+<!-- Title: G Pages: 1 -->
+<svg width="186pt" height="406pt"
+ viewBox="0.00 0.00 186.19 406.48" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
+<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 402.4802)">
+<title>G</title>
+<polygon fill="#ffffff" stroke="transparent" points="-4,4 -4,-402.4802 182.1909,-402.4802 182.1909,4 -4,4"/>
+<g id="clust1" class="cluster">
+<title>cluster_0</title>
+<polygon fill="none" stroke="#000000" points="8.0955,-153.7401 8.0955,-390.4802 170.0955,-390.4802 170.0955,-153.7401 8.0955,-153.7401"/>
+<text text-anchor="middle" x="89.0955" y="-375.2802" font-family="Times,serif" font-size="14.00" fill="#000000">in&#45;placed</text>
+</g>
+<!-- e1 -->
+<g id="node1" class="node">
+<title>e1</title>
+<ellipse fill="none" stroke="#000000" cx="89.0955" cy="-260.6102" rx="72.6644" ry="26.7407"/>
+<text text-anchor="middle" x="89.0955" y="-264.4102" font-family="Times,serif" font-size="14.00" fill="#000000">softmax</text>
+<text text-anchor="middle" x="89.0955" y="-249.4102" font-family="Times,serif" font-size="14.00" fill="#000000">&lt;oneDNN&gt;</text>
+</g>
+<!-- c -->
+<g id="node3" class="node">
+<title>c</title>
+<ellipse fill="none" stroke="#000000" cx="89.0955" cy="-179.7401" rx="27" ry="18"/>
+<text text-anchor="middle" x="89.0955" y="-176.0401" font-family="Times,serif" font-size="14.00" fill="#000000">b</text>
+</g>
+<!-- e1&#45;&gt;c -->
+<g id="edge2" class="edge">
+<title>e1&#45;&gt;c</title>
+<path fill="none" stroke="#000000" d="M89.0955,-233.6816C89.0955,-225.4111 89.0955,-216.2888 89.0955,-207.9358"/>
+<polygon fill="#000000" stroke="#000000" points="92.5956,-207.8889 89.0955,-197.8889 85.5956,-207.889 92.5956,-207.8889"/>
+</g>
+<!-- e2 -->
+<g id="node2" class="node">
+<title>e2</title>
+<ellipse fill="none" stroke="#000000" cx="89.0955" cy="-98.8701" rx="89.191" ry="26.7407"/>
+<text text-anchor="middle" x="89.0955" y="-102.6701" font-family="Times,serif" font-size="14.00" fill="#000000">layer_norm</text>
+<text text-anchor="middle" x="89.0955" y="-87.6701" font-family="Times,serif" font-size="14.00" fill="#000000">&lt;Paddle CPU&gt;</text>
+</g>
+<!-- e -->
+<g id="node4" class="node">
+<title>e</title>
+<ellipse fill="none" stroke="#000000" cx="89.0955" cy="-18" rx="27" ry="18"/>
+<text text-anchor="middle" x="89.0955" y="-14.3" font-family="Times,serif" font-size="14.00" fill="#000000">a</text>
+</g>
+<!-- e2&#45;&gt;e -->
+<g id="edge4" class="edge">
+<title>e2&#45;&gt;e</title>
+<path fill="none" stroke="#000000" d="M89.0955,-71.9415C89.0955,-63.6709 89.0955,-54.5487 89.0955,-46.1957"/>
+<polygon fill="#000000" stroke="#000000" points="92.5956,-46.1488 89.0955,-36.1488 85.5956,-46.1489 92.5956,-46.1488"/>
+</g>
+<!-- c&#45;&gt;e2 -->
+<g id="edge3" class="edge">
+<title>c&#45;&gt;e2</title>
+<path fill="none" stroke="#000000" d="M89.0955,-161.3894C89.0955,-153.8196 89.0955,-144.7601 89.0955,-135.9182"/>
+<polygon fill="#000000" stroke="#000000" points="92.5956,-135.7406 89.0955,-125.7407 85.5956,-135.7407 92.5956,-135.7406"/>
+</g>
+<!-- a -->
+<g id="node5" class="node">
+<title>a</title>
+<ellipse fill="none" stroke="#000000" cx="89.0955" cy="-341.4802" rx="27" ry="18"/>
+<text text-anchor="middle" x="89.0955" y="-337.7802" font-family="Times,serif" font-size="14.00" fill="#000000">a</text>
+</g>
+<!-- a&#45;&gt;e1 -->
+<g id="edge1" class="edge">
+<title>a&#45;&gt;e1</title>
+<path fill="none" stroke="#000000" d="M89.0955,-323.1296C89.0955,-315.5597 89.0955,-306.5002 89.0955,-297.6583"/>
+<polygon fill="#000000" stroke="#000000" points="92.5956,-297.4808 89.0955,-287.4808 85.5956,-297.4808 92.5956,-297.4808"/>
+</g>
+</g>
+</svg>
--- a/doc/fluid/design/mkldnn/inplace/index_en.rst
+++ b/doc/fluid/design/mkldnn/inplace/index_en.rst
+MKL-DNN IN-PLACE execution support
+--------------------------------------
+.. toctree::
+  :maxdepth: 1
+  inplace.md
--- a/doc/fluid/design/mkldnn/inplace/inplace.md
+++ b/doc/fluid/design/mkldnn/inplace/inplace.md
+## Introduction
+PaddlePaddle implements the concept of in-place execution for some of its operators.
+The idea of in-place execution is presented in the following picture:
+![](images/inplace.svg)   
+The example graph presents three operators, one of which (of type elementwise_add) performs in-place computation. In-place computation means that the same variable (Tensor) is used for both input and output, so one of the inputs is overwritten with the computational results. In the picture, the in-place operator (elementwise_add)
+has two input nodes, *b* and *d*, and output *b*. So *b* is used for both input and output, and underneath it is represented by a single, shared Tensor. This means that variable *b* initially holds some input data, and after the operator's computation that input data is lost and replaced by the computation's result.
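+The overwrite described above can be illustrated with a minimal plain-Python sketch. It is not PaddlePaddle code: the `elementwise_add` helper and the list-based "tensors" are assumptions made for illustration only.

```python
def elementwise_add(x, y, out):
    """Write x[i] + y[i] into out; out may be the same list object as x."""
    for i in range(len(x)):
        out[i] = x[i] + y[i]
    return out


b = [1.0, 2.0, 3.0]
d = [10.0, 20.0, 30.0]

# Out-of-place: a separate buffer receives the result, b is untouched.
result = elementwise_add(b, d, [0.0] * len(b))
b_after_out_of_place = list(b)

# In-place: b is both input and output, so its original data is lost.
shared = elementwise_add(b, d, b)
```

+After the in-place call, `shared` and `b` are the same object and the original contents of `b` are no longer available, which is why the pass has to make sure that no other operator still needs them.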
+The current assumption is that if an operator supports in-place processing, then all of its kernels (including oneDNN ones) should work properly in in-place mode. To match this functionality, the oneDNN integration was extended to support in-place execution for some of its operators:
+- activations
+- softmax
+- elementwise_add
+- gelu*
+- sum**
+Advantages of in-place computation are:
+* lower memory usage
+* improved performance of operators
+To enable in-place computation, we need to analyze the graph to find where in-place execution could happen
+and then make some of the variables shared between the input and output of an in-place capable operator.
+Hence there are two parts to in-place support:
+- in-place execution support within an operator
+- oneDNN inplace C-API pass
+#### in-place execution support within an operator
+For in-place execution, a oneDNN primitive needs to have the same oneDNN memory object passed as input (src) and output (dst). More precisely, we check whether the pointers to the allocated buffers are the same for input and output,
+and this determines whether we use one oneDNN memory object or two. For example:
+`auto src_memory_p = handler.AcquireSrcMemory(x);`
+`auto dst_memory_p = x->IsSharedBufferWith(*y) ? 
+           src_memory_p : handler.AcquireDstMemory(y);`
+#### oneDNN in-place pass
+As mentioned earlier, the idea of the in-place pass is to locate operators with oneDNN kernels that can perform in-place execution and then modify the output node's variable to match the input node's variable of the operator.
+##### Identifying operators with oneDNN kernels capable of in-place execution
+This identification is a result of two checks:
+- Whether the operator has an *InplaceInferer* structure
+- Whether the operator is on the list of oneDNN's in-place supported operators
+*InplaceInferer* is a struct that declares a mapping (one of the inputs to one of the outputs) indicating that
+the considered operator can perform in-place execution and that both vars (the input and output named in *InplaceInferer*) will
+share a tensor. This is not enough for oneDNN in-place C-API execution, as the oneDNN library may not provide in-place
+computation for all PaddlePaddle operators that declare in-place support, and some operators would have to
+simulate in-place computation through an external buffer, which would not bring any benefit. There is therefore no point in enabling those in-place computations for C-API inference.
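+The two checks can be sketched as a single predicate. This is an illustrative Python sketch, not the pass's actual code: the function name, its parameters and the contents of `ONEDNN_INPLACE_OPS` are assumptions (the set uses `relu` as a stand-in for the activations listed in the introduction).

```python
# Hypothetical allow-list; the document names activations, softmax and
# elementwise_add as supported, while sum is kept out of the pass.
ONEDNN_INPLACE_OPS = {"relu", "softmax", "elementwise_add"}


def can_run_onednn_inplace(op_type, has_inplace_inferer):
    """Both checks must pass: the op declares an InplaceInferer mapping
    and it is on the oneDNN in-place supported list."""
    return has_inplace_inferer and op_type in ONEDNN_INPLACE_OPS


softmax_ok = can_run_onednn_inplace("softmax", True)
sum_ok = can_run_onednn_inplace("sum", True)            # not on the list
no_inferer_ok = can_run_onednn_inplace("softmax", False)  # no InplaceInferer
```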
+##### Restrictions
+The oneDNN in-place pass takes advantage of the graph pattern detector. The pattern consists of:
+Node (Var 1) -> Node (oneDNN Op to be inplaced) -> Node (Var2) -> Node (next op - any type, oneDNN/native CPU - after in-placed one) -> Node (Var3)
+The pattern is restricted so that the op to be in-placed is of oneDNN type. Because some operators have
+more than one input and their output may be consumed by more than one operator, it is expected that the pattern
+may be detected multiple times for the same operator, e.g. once for the first input, then for the second input, etc.
+Just having a oneDNN operator capable of in-place execution is not enough to enable it, hence the following rules
+are checked by the oneDNN in-place pass:
+1. If the input node of the in-place operator is also an input to a different operator, then in-place computation cannot be performed, as there is a risk that the other operator consuming that input will be executed after the in-placed operator and therefore get invalid input data (overwritten by the in-place computation).
+2. If after the in-placed operator there is another operator that reuses the in-place op's input var, then in-place cannot happen unless the next op can itself perform in-place computation. The next picture presents the idea.
+![](images/unwanted-inplace.svg)   
+In the picture, the in-place pass is considering enabling in-place execution for the softmax oneDNN kernel. All is fine, except that the next operator after softmax is layer norm (non-oneDNN). Layer norm is already reusing the input of softmax because an earlier memory optimization pass was applied. If we make the softmax op perform in-place computation, then
+it will also make layer norm work in-place (b -> a). The problem is that layer norm cannot work in-place (InplaceInferer is not present), so if we force it to do so, layer norm will produce an invalid result.
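+Rule 1 above can be sketched as a simple scan over the operators of a graph. This is a hedged Python illustration only: the dict-based op representation, the function name and the extra `scale` operator are hypothetical and do not come from the PaddlePaddle sources.

```python
def input_has_other_consumer(ops, inplace_op, var_name):
    """Return True if any op other than inplace_op also reads var_name."""
    return any(
        op is not inplace_op and var_name in op["inputs"] for op in ops
    )


softmax = {"type": "softmax", "inputs": ["a"], "outputs": ["b"]}
layer_norm = {"type": "layer_norm", "inputs": ["b"], "outputs": ["c"]}
# Hypothetical extra operator that also reads "a".
scale = {"type": "scale", "inputs": ["a"], "outputs": ["s"]}

blocked_without_scale = input_has_other_consumer(
    [softmax, layer_norm], softmax, "a")
blocked_with_scale = input_has_other_consumer(
    [softmax, layer_norm, scale], softmax, "a")
```

+When the check returns `True`, overwriting the input could corrupt the data seen by the other consumer, so the pass has to leave the operator out-of-place.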
+##### In-place pass modification to graph when applied
+When a sub-graph satisfies the restrictions, in-place computation can be enabled. This is done by:
+1. Changing the name of the output node of the in-place op to match the input node of the in-place op.
+2. Renaming the output var in the output lists of the node representing the operator.
+3. Changing the name of the input var in the next op's inputs list.
+4. If the next op is performing in-place computation, then we need to update the next op's output as well so as not to break its
+   in-place computation.
+5. If there are multiple operators after our in-place operator, then we need to update all of them (their input vars). The idea is presented in the following picture:
+![](images/multi-output-inplace.svg)   
+We can see that there are two *top_k* operators after the *elementwise_add* operator that is set to work in-place. Each *top_k* has its own list of input vars, so we need to rename the relevant input var to the new name. The in-place pattern
+consists of: input node -> in-place op -> output node -> next op -> next op's output. For the presented graph, 8 patterns will be detected:
+- b -> elementwise_add -> c -> top_k (left one) -> d
+- b -> elementwise_add -> c -> top_k (left one) -> e
+- b -> elementwise_add -> c -> top_k (right one) -> g
+- b -> elementwise_add -> c -> top_k (right one) -> h
+- a -> elementwise_add -> c -> top_k (left one) -> d
+- a -> elementwise_add -> c -> top_k (left one) -> e
+- a -> elementwise_add -> c -> top_k (right one) -> g
+- a -> elementwise_add -> c -> top_k (right one) -> h
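+The count of 8 follows from 2 inputs times 2 next ops times 2 outputs per next op. A small Python sketch, with the graph written out by hand from the picture (the variable names are illustrative assumptions), enumerates the same list:

```python
# Inputs of elementwise_add and the outputs of each following top_k,
# copied from the "before" part of the picture.
inputs = ["b", "a"]
next_ops = {
    "top_k (left one)": ["d", "e"],
    "top_k (right one)": ["g", "h"],
}

# One pattern per (input, next op, next op's output) combination.
patterns = [
    f"{inp} -> elementwise_add -> c -> {op} -> {out}"
    for inp in inputs
    for op, outs in next_ops.items()
    for out in outs
]

for p in patterns:
    print(p)
```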
+It is important to remember the original name of the output before it is renamed, so that later we can
+replace this original name in all of the next op instances.
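+The renaming steps, including remembering the original output name, can be sketched in Python as follows. This is an assumption-laden illustration: the dict-based ops and the `apply_inplace` helper are hypothetical and only mirror the before/after picture, where output *c* is renamed to *a*.

```python
def apply_inplace(inplace_op, next_ops, input_name):
    """Rename the in-place op's output to input_name and update every
    following op that consumed the original output name."""
    original_out = inplace_op["outputs"][0]  # remember before renaming
    inplace_op["outputs"] = [
        input_name if v == original_out else v for v in inplace_op["outputs"]
    ]
    for op in next_ops:
        op["inputs"] = [
            input_name if v == original_out else v for v in op["inputs"]
        ]
    return original_out


add = {"type": "elementwise_add", "inputs": ["a", "b"], "outputs": ["c"]}
top_k_left = {"type": "top_k", "inputs": ["c"], "outputs": ["d", "e"]}
top_k_right = {"type": "top_k", "inputs": ["c"], "outputs": ["g", "h"]}

old_name = apply_inplace(add, [top_k_left, top_k_right], "a")
```

+Because the original name is captured before any renaming, every following operator can still be matched against it, even though the pattern is detected several times for the same in-place operator.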
+\* The oneDNN gelu kernel is able to perform in-place execution, but currently the gelu op does not support in-place execution.
+\*\* The sum kernel uses the oneDNN sum primitive, which does not provide in-place execution, so in-place computation is emulated through an external buffer. For that reason it was not added to the oneDNN in-place pass.
--- a/doc/fluid/design/mkldnn/inplace/scripts/inplace.dot
+++ b/doc/fluid/design/mkldnn/inplace/scripts/inplace.dot
+digraph G {
+  overlap=false
+  e1[label="relu"]
+  e2[label="elementwise_add"]
+  e3[label="elementwise_mul"]
+  a -> e1
+  e1 -> b
+  b[label="b"]
+  e[label="b"]
+  subgraph cluster_0 {
+      label="in-placed"
+  b -> e2
+  d -> e2
+  e2 -> e
+  }
+  e -> e3
+  f -> e3 -> g
+}
--- a/doc/fluid/design/mkldnn/inplace/scripts/multi-output-inplace.dot
+++ b/doc/fluid/design/mkldnn/inplace/scripts/multi-output-inplace.dot
+digraph G {
+subgraph cluster_before {
+  label="before"
+  style=dotted
+  op1[label="elementwise_add"]
+  op2[label="top_k\ninputs_vars{c}"]
+  op3[label="top_k\ninputs_vars{c}"]
+  c[label="c"]
+  subgraph cluster_0 {
+  style=solid
+  label="to be in-placed"
+  a -> op1
+  b-> op1
+  op1 -> c
+  }
+  c -> op2
+  c -> op3 
+  op2 -> d
+  op2 -> e
+  op3 -> g
+  op3 -> h
+}
+subgraph cluster_after {
+  label="after"
+  style=dotted
+  op1b[label="elementwise_add"]
+  op2b[label="top_k\ninput_vars{a}"]
+  op3b[label="top_k\ninput_vars{a}"]
+  cb[label="a"]
+  ab[label="a"]
+  bb[label="b"]
+  db[label="d"]
+  eb[label="e"]
+  gb[label="g"]
+  hb[label="h"]
+  subgraph cluster_0b {
+  style=solid
+  label="applied in-placed"
+  ab -> op1b
+  bb-> op1b
+  op1b -> cb
+  }
+  cb -> op2b
+  cb -> op3b 
+  op2b -> db
+  op2b -> eb
+  op3b -> gb
+  op3b -> hb
+}
+}
--- a/doc/fluid/design/mkldnn/inplace/scripts/unwanted-inplace.dot
+++ b/doc/fluid/design/mkldnn/inplace/scripts/unwanted-inplace.dot
+digraph G {
+  e1[label="softmax\n<oneDNN>"]
+  e2[label="layer_norm\n<Paddle CPU>"]
+  c[label="b"]
+  e[label="a"]
+  subgraph cluster_0 {  
+  label="in-placed"
+  a -> e1
+  e1 -> c
+  }
+  c -> e2
+  e2 -> e
+}
--- a/doc/fluid/install/Tables.md
+++ b/doc/fluid/install/Tables.md
--- a/doc/fluid/install/Tables_en.md
+++ b/doc/fluid/install/Tables_en.md
--- a/doc/fluid/install/index_cn.rst
+++ b/doc/fluid/install/index_cn.rst
@@ -214,20 +214,20 @@
        如果您是使用 Python 2，CUDA 9，cuDNN 7.3+，安装GPU版本的命令为：
        ::
-            python -m pip install paddlepaddle-gpu==1.7.2.post97 -i https://mirror.baidu.com/pypi/simple
+            python -m pip install paddlepaddle-gpu==1.8.0.post97 -i https://mirror.baidu.com/pypi/simple
            或
-            python -m pip install paddlepaddle-gpu==1.7.2.post97 -i https://pypi.tuna.tsinghua.edu.cn/simple
+            python -m pip install paddlepaddle-gpu==1.8.0.post97 -i https://pypi.tuna.tsinghua.edu.cn/simple
        如果您是使用 Python 2，CUDA 10.0，cuDNN 7.3+，安装GPU版本的命令为：
        ::
-            python -m pip install paddlepaddle-gpu==1.7.2.post107 -i https://mirror.baidu.com/pypi/simple
+            python -m pip install paddlepaddle-gpu==1.8.0.post107 -i https://mirror.baidu.com/pypi/simple
            或
-            python -m pip install paddlepaddle-gpu==1.7.2.post107 -i https://pypi.tuna.tsinghua.edu.cn/simple
+            python -m pip install paddlepaddle-gpu==1.8.0.post107 -i https://pypi.tuna.tsinghua.edu.cn/simple
        如果您是使用 Python 3，请将上述命令中的 **python** 更换为 **python3** 进行安装。
@@ -430,12 +430,12 @@
    (2). 拉取预安装 PaddlePaddle 的镜像：
    ::
-        docker pull hub.baidubce.com/paddlepaddle/paddle:1.7.2
+        docker pull hub.baidubce.com/paddlepaddle/paddle:1.8.0
    (3). 用镜像构建并进入Docker容器：
    ::
-        docker run --name paddle -it -v dir1:dir2 hub.baidubce.com/paddlepaddle/paddle:1.7.2 /bin/bash
+        docker run --name paddle -it -v dir1:dir2 hub.baidubce.com/paddlepaddle/paddle:1.8.0 /bin/bash
        > --name [Name of container] 设定Docker的名称；
@@ -443,7 +443,7 @@
        > -v 参数用于宿主机与容器里文件共享；其中dir1为宿主机目录，dir2为挂载到容器内部的目录，用户可以通过设定dir1和dir2自定义自己的挂载目录；例如：$PWD:/paddle 指定将宿主机的当前路径（Linux中PWD变量会展开为当前路径的绝对路径）挂载到容器内部的 /paddle 目录；
-        > hub.baidubce.com/paddlepaddle/paddle:1.7.2 是需要使用的image名称；/bin/bash是在Docker中要执行的命令
+        > hub.baidubce.com/paddlepaddle/paddle:1.8.0 是需要使用的image名称；/bin/bash是在Docker中要执行的命令
 2. **GPU 版本**
@@ -471,12 +471,12 @@
    (2). 拉取支持 CUDA 10.0 , cuDNN 7.3+ 预安装 PaddlePaddle 的镜像：
    ::
-        nvidia-docker pull hub.baidubce.com/paddlepaddle/paddle:1.7.2-gpu-cuda10.0-cudnn7
+        nvidia-docker pull hub.baidubce.com/paddlepaddle/paddle:1.8.0-gpu-cuda10.0-cudnn7
    (3). 用镜像构建并进入Docker容器：
    ::
-        nvidia-docker run --name paddle -it -v dir1:dir2 hub.baidubce.com/paddlepaddle/paddle:1.7.2-gpu-cuda10.0-cudnn7 /bin/bash
+        nvidia-docker run --name paddle -it -v dir1:dir2 hub.baidubce.com/paddlepaddle/paddle:1.8.0-gpu-cuda10.0-cudnn7 /bin/bash
        > --name [Name of container] 设定Docker的名称；
@@ -484,7 +484,7 @@
        > -v 参数用于宿主机与容器里文件共享；其中dir1为宿主机目录，dir2为挂载到容器内部的目录，用户可以通过设定dir1和dir2自定义自己的挂载目录；例如：$PWD:/paddle 指定将宿主机的当前路径（Linux中PWD变量会展开为当前路径的绝对路径）挂载到容器内部的 /paddle 目录；
-        > hub.baidubce.com/paddlepaddle/paddle:1.7.2-gpu-cuda10.0-cudnn7 是需要使用的image名称；/bin/bash是在Docker中要执行的命令  
+        > hub.baidubce.com/paddlepaddle/paddle:1.8.0-gpu-cuda10.0-cudnn7 是需要使用的image名称；/bin/bash是在Docker中要执行的命令  
    或如果您需要支持 **CUDA 9** 的版本，将上述命令的 **cuda10.0** 替换成 **cuda9.0** 即可
@@ -492,7 +492,7 @@
    ::
-        docker run --name paddle -it -v dir1:dir2 paddlepaddle/paddle:1.7.2 /bin/bash
+        docker run --name paddle -it -v dir1:dir2 paddlepaddle/paddle:1.8.0 /bin/bash
        > --name [Name of container] 设定Docker的名称；
@@ -500,7 +500,7 @@
        > -v 参数用于宿主机与容器里文件共享；其中dir1为宿主机目录，dir2为挂载到容器内部的目录，用户可以通过设定dir1和dir2自定义自己的挂载目录；例如：$PWD:/paddle 指定将宿主机的当前路径（Linux中PWD变量会展开为当前路径的绝对路径）挂载到容器内部的 /paddle 目录；
-        > paddlepaddle/paddle:1.7.2 是需要使用的image名称；/bin/bash是在Docker中要执行的命令
+        > paddlepaddle/paddle:1.8.0 是需要使用的image名称；/bin/bash是在Docker中要执行的命令
 4. 验证安装

--- a/doc/fluid/install/index_en.rst
+++ b/doc/fluid/install/index_en.rst
@@ -216,20 +216,20 @@ This section describes how to use pip to install.
        If you are using Python 2, CUDA 9, cuDNN 7.3+, command to install GPU version:
        ::
-            python -m pip install paddlepaddle-gpu==1.7.2.post97 -i https://mirror.baidu.com/pypi/simple
+            python -m pip install paddlepaddle-gpu==1.8.0.post97 -i https://mirror.baidu.com/pypi/simple
            or
-            python -m pip install paddlepaddle-gpu==1.7.2.post97 -i https://pypi.tuna.tsinghua.edu.cn/simple
+            python -m pip install paddlepaddle-gpu==1.8.0.post97 -i https://pypi.tuna.tsinghua.edu.cn/simple
        If you are using Python 2, CUDA 10.0, cuDNN 7.3+, command to install GPU version:
        ::
-            python -m pip install paddlepaddle-gpu==1.7.2.post107 -i https://mirror.baidu.com/pypi/simple
+            python -m pip install paddlepaddle-gpu==1.8.0.post107 -i https://mirror.baidu.com/pypi/simple
            or
-            python -m pip install paddlepaddle-gpu==1.7.2.post107 -i https://pypi.tuna.tsinghua.edu.cn/simple
+            python -m pip install paddlepaddle-gpu==1.8.0.post107 -i https://pypi.tuna.tsinghua.edu.cn/simple
        If you are using Python 3, please change **python** in the above command to **python3** and install.
@@ -437,12 +437,12 @@ If you want to use `docker <https://www.docker.com>`_ to install PaddlePaddle, y
    (2). Pull the image of the preinstalled PaddlePaddle:
    ::
-        docker pull hub.baidubce.com/paddlepaddle/paddle:1.7.2
+        docker pull hub.baidubce.com/paddlepaddle/paddle:1.8.0
    (3). Use the image to build and enter the Docker container:
    ::
-        docker run --name paddle -it -v dir1:dir2 hub.baidubce.com/paddlepaddle/paddle:1.7.2 /bin/bash
+        docker run --name paddle -it -v dir1:dir2 hub.baidubce.com/paddlepaddle/paddle:1.8.0 /bin/bash
        > --name [Name of container] set the name of Docker;
@@ -450,7 +450,7 @@ If you want to use `docker <https://www.docker.com>`_ to install PaddlePaddle, y
        > -v Parameter is used to share files between the host and the container. dir1 is the host directory and dir2 is the directory mounted inside the container. Users can customize their own mounting directory by setting dir1 and dir2.For example, $PWD:/paddle specifies to mount the current path of the host (PWD variable in Linux will expand to the absolute path of the current path) to the /paddle directory inside the container; 
-        > hub.baidubce.com/paddlepaddle/paddle:1.7.2 is the image name you need to use；/bin/bash is the command to be executed in Docker
+        > hub.baidubce.com/paddlepaddle/paddle:1.8.0 is the image name you need to use；/bin/bash is the command to be executed in Docker
 2. **GPU version**
@@ -478,12 +478,12 @@ If you want to use `docker <https://www.docker.com>`_ to install PaddlePaddle, y
    (2). Pull the image that supports CUDA 10.0, cuDNN 7.3 + pre installed PaddlePaddle:
    ::
-        nvidia-docker pull hub.baidubce.com/paddlepaddle/paddle:1.7.2-gpu-cuda10.0-cudnn7
+        nvidia-docker pull hub.baidubce.com/paddlepaddle/paddle:1.8.0-gpu-cuda10.0-cudnn7
    (3). Use the image to build and enter the docker container:
    ::
-        nvidia-docker run --name paddle -it -v dir1:dir2 hub.baidubce.com/paddlepaddle/paddle:1.7.2-gpu-cuda10.0-cudnn7 /bin/bash
+        nvidia-docker run --name paddle -it -v dir1:dir2 hub.baidubce.com/paddlepaddle/paddle:1.8.0-gpu-cuda10.0-cudnn7 /bin/bash
        > --name [Name of container] set name of Docker;
@@ -491,7 +491,7 @@ If you want to use `docker <https://www.docker.com>`_ to install PaddlePaddle, y
        > -v Parameter is used to share files between the host and the container. dir1 is the host directory and dir2 is the directory mounted inside the container. Users can customize their own mounting directory by setting dir1 and dir2.For example, $PWD:/paddle specifies to mount the current path of the host (PWD variable in Linux will expand to the absolute path of the current path) to the /paddle directory inside the container;
-        > hub.baidubce.com/paddlepaddle/paddle:1.7.2 is the image name you need to use；/bin/bash is the command to be executed in Docker
+        > hub.baidubce.com/paddlepaddle/paddle:1.8.0 is the image name you need to use；/bin/bash is the command to be executed in Docker
    Or if you need the version supporting **CUDA 9**, replace **cuda10.0** of the above command with **cuda9.0** 
@@ -499,7 +499,7 @@ If you want to use `docker <https://www.docker.com>`_ to install PaddlePaddle, y
    ::
-        docker run --name paddle -it -v dir1:dir2 paddlepaddle/paddle:1.7.2 /bin/bash
+        docker run --name paddle -it -v dir1:dir2 paddlepaddle/paddle:1.8.0 /bin/bash
        > --name [Name of container] set name of Docker;
@@ -507,7 +507,7 @@ If you want to use `docker <https://www.docker.com>`_ to install PaddlePaddle, y
        > -v Parameter is used to share files between the host and the container. dir1 is the host directory and dir2 is the directory mounted inside the container. Users can customize their own mounting directory by setting dir1 and dir2.For example, $PWD:/paddle specifies to mount the current path of the host (PWD variable in Linux will expand to the absolute path of the current path) to the /paddle directory inside the container; 
-        > paddlepaddle/paddle:1.7.2 is the image name you need to use；/bin/bash is the command to be executed in docker
+        > paddlepaddle/paddle:1.8.0 is the image name you need to use；/bin/bash is the command to be executed in docker
 4. Verify installation

--- a/doc/fluid/user_guides/nlp_case/machine_translation/README.md
+++ b/doc/fluid/user_guides/nlp_case/machine_translation/README.md
@@ -155,15 +155,6 @@ import paddle.fluid.layers as pd
 from paddle.fluid.executor import Executor
 from functools import partial
 import os
-try:
-    from paddle.fluid.contrib.trainer import *
-    from paddle.fluid.contrib.inferencer import *
-except ImportError:
-    print(
-        "In the fluid 1.0, the trainer and inferencer are moving to paddle.fluid.contrib",
-        file=sys.stderr)
-    from paddle.fluid.trainer import *
-    from paddle.fluid.inferencer import *
 dict_size = 30000 # dictionary dimension
 source_dict_dim = target_dict_dim = dict_size # source/target language dictionary dimension

--- a/doc/fluid/user_guides/nlp_case/machine_translation/index.html
+++ b/doc/fluid/user_guides/nlp_case/machine_translation/index.html
@@ -197,15 +197,6 @@ import paddle.fluid.layers as pd
 from paddle.fluid.executor import Executor
 from functools import partial
 import os
-try:
-    from paddle.fluid.contrib.trainer import *
-    from paddle.fluid.contrib.inferencer import *
-except ImportError:
-    print(
-        "In the fluid 1.0, the trainer and inferencer are moving to paddle.fluid.contrib",
-        file=sys.stderr)
-    from paddle.fluid.trainer import *
-    from paddle.fluid.inferencer import *
 dict_size = 30000 # dictionary dimension
 source_dict_dim = target_dict_dim = dict_size # source/target language dictionary dimension

--- a/doc/fluid/user_guides/nlp_case/understand_sentiment/README.cn.md
+++ b/doc/fluid/user_guides/nlp_case/understand_sentiment/README.cn.md
@@ -268,12 +268,12 @@ print("Loading IMDB word dict....")
 word_dict = paddle.dataset.imdb.word_dict()
 print ("Reading training data....")
-train_reader = paddle.batch(
+train_reader = fluid.io.batch(
-    paddle.reader.shuffle(
+    fluid.io.shuffle(
        paddle.dataset.imdb.train(word_dict), buf_size=25000),
    batch_size=BATCH_SIZE)
 print("Reading testing data....")
-test_reader = paddle.batch(
+test_reader = fluid.io.batch(
    paddle.dataset.imdb.test(word_dict), batch_size=BATCH_SIZE)
 ```
 word_dict是一个字典序列，是词和label的对应关系，运行下一行可以看到具体内容：

--- a/doc/fluid/user_guides/nlp_case/understand_sentiment/README.md
+++ b/doc/fluid/user_guides/nlp_case/understand_sentiment/README.md
@@ -257,12 +257,12 @@ print("Loading IMDB word dict....")
 word_dict = paddle.dataset.imdb.word_dict()
 print ("Reading training data....")
-train_reader = paddle.batch(
+train_reader = fluid.io.batch(
-    paddle.reader.shuffle(
+    fluid.io.shuffle(
        paddle.dataset.imdb.train(word_dict), buf_size=25000),
    batch_size=BATCH_SIZE)
 print("Reading testing data....")
-test_reader = paddle.batch(
+test_reader = fluid.io.batch(
    paddle.dataset.imdb.test(word_dict), batch_size=BATCH_SIZE)
 ```
 Word_dict is a dictionary sequence, which is the correspondence between words and labels. You can see it specifically by running the next code:

--- a/doc/fluid/user_guides/nlp_case/understand_sentiment/index.cn.html
+++ b/doc/fluid/user_guides/nlp_case/understand_sentiment/index.cn.html
@@ -310,12 +310,12 @@ print("Loading IMDB word dict....")
 word_dict = paddle.dataset.imdb.word_dict()
 print ("Reading training data....")
-train_reader = paddle.batch(
+train_reader = fluid.io.batch(
-    paddle.reader.shuffle(
+    fluid.io.shuffle(
        paddle.dataset.imdb.train(word_dict), buf_size=25000),
    batch_size=BATCH_SIZE)
 print("Reading testing data....")
-test_reader = paddle.batch(
+test_reader = fluid.io.batch(
    paddle.dataset.imdb.test(word_dict), batch_size=BATCH_SIZE)
 ```
 word_dict is a dictionary that maps words to labels. Run the next line to see its contents:

--- a/doc/fluid/user_guides/nlp_case/understand_sentiment/index.html
+++ b/doc/fluid/user_guides/nlp_case/understand_sentiment/index.html
@@ -299,12 +299,12 @@ print("Loading IMDB word dict....")
 word_dict = paddle.dataset.imdb.word_dict()
 print ("Reading training data....")
-train_reader = paddle.batch(
+train_reader = fluid.io.batch(
-    paddle.reader.shuffle(
+    fluid.io.shuffle(
        paddle.dataset.imdb.train(word_dict), buf_size=25000),
    batch_size=BATCH_SIZE)
 print("Reading testing data....")
-test_reader = paddle.batch(
+test_reader = fluid.io.batch(
    paddle.dataset.imdb.test(word_dict), batch_size=BATCH_SIZE)
 ```
 word_dict is a dictionary that maps words to labels. Run the next line to see its contents:

--- a/doc/fluid/user_guides/nlp_case/understand_sentiment/train_conv.py
+++ b/doc/fluid/user_guides/nlp_case/understand_sentiment/train_conv.py
@@ -88,16 +88,16 @@ def train(use_cuda, params_dirname):
    print("Reading training data....")
    if args.enable_ce:
-        train_reader = paddle.batch(
+        train_reader = fluid.io.batch(
            paddle.dataset.imdb.train(word_dict), batch_size=BATCH_SIZE)
    else:
-        train_reader = paddle.batch(
+        train_reader = fluid.io.batch(
-            paddle.reader.shuffle(
+            fluid.io.shuffle(
                paddle.dataset.imdb.train(word_dict), buf_size=25000),
            batch_size=BATCH_SIZE)
    print("Reading testing data....")
-    test_reader = paddle.batch(
+    test_reader = fluid.io.batch(
        paddle.dataset.imdb.test(word_dict), batch_size=BATCH_SIZE)
    feed_order = ['words', 'label']

--- a/doc/fluid/user_guides/nlp_case/understand_sentiment/train_dyn_rnn.py
+++ b/doc/fluid/user_guides/nlp_case/understand_sentiment/train_dyn_rnn.py
@@ -79,16 +79,16 @@ def train(use_cuda, params_dirname):
    print("Reading training data....")
    if args.enable_ce:
-        train_reader = paddle.batch(
+        train_reader = fluid.io.batch(
            paddle.dataset.imdb.train(word_dict), batch_size=BATCH_SIZE)
    else:
-        train_reader = paddle.batch(
+        train_reader = fluid.io.batch(
-            paddle.reader.shuffle(
+            fluid.io.shuffle(
                paddle.dataset.imdb.train(word_dict), buf_size=25000),
            batch_size=BATCH_SIZE)
    print("Reading testing data....")
-    test_reader = paddle.batch(
+    test_reader = fluid.io.batch(
        paddle.dataset.imdb.test(word_dict), batch_size=BATCH_SIZE)
    feed_order = ['words', 'label']

--- a/doc/fluid/user_guides/nlp_case/understand_sentiment/train_stacked_lstm.py
+++ b/doc/fluid/user_guides/nlp_case/understand_sentiment/train_stacked_lstm.py
@@ -99,11 +99,11 @@ def train(use_cuda, params_dirname):
    print("Reading training data....")
    if args.enable_ce:
-        train_reader = paddle.batch(
+        train_reader = fluid.io.batch(
            paddle.dataset.imdb.train(word_dict), batch_size=BATCH_SIZE)
    else:
-        train_reader = paddle.batch(
+        train_reader = fluid.io.batch(
-            paddle.reader.shuffle(
+            fluid.io.shuffle(
                paddle.dataset.imdb.train(word_dict), buf_size=25000),
            batch_size=BATCH_SIZE)

--- a/doc/fluid/user_guides/tools/index_cn.rst
+++ b/doc/fluid/user_guides/tools/index_cn.rst
@@ -4,10 +4,11 @@
 ..  todo::
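-PaddlePaddle provides one case study here: one-click deployment of a distributed CTR prediction training task and serving workflow on Baidu Cloud.
+PaddlePaddle provides two case studies here: one-click deployment of a distributed CTR prediction training task and serving workflow on Baidu Cloud, and a guide to using the PaddlePaddle large-scale classification library.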
 ..  toctree::
    :titlesonly:
    elastic_ctr/deploy_ctr_on_baidu_cloud_cn.rst
+    plsc/plsc_guider_cn.rst
--- a/doc/fluid/user_guides/tools/plsc/plsc_guider_cn.rst
+++ b/doc/fluid/user_guides/tools/plsc/plsc_guider_cn.rst
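+Introduction to the PaddlePaddle Large Scale Classification Library (PLSC)
+============================================================================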
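+Image classification is increasingly mature: ResNet has exceeded 96% top-5 accuracy on the ImageNet dataset. Efficiently handling classification tasks with millions of classes or more, however, remains highly challenging.
+From an implementation standpoint, the last layer of a multi-class neural network is usually a fully connected layer combined with Softmax. The number of output nodes of the fully connected layer equals the number of classes, so its parameter count grows linearly with the number of classes. When the number of classes is very large, training therefore consumes a great deal of GPU memory and may even exceed the memory capacity of a single GPU, making the model impossible to train.
+Take a news recommendation system as an example. Classifying news items into one million fine-grained classes requires about 2 GB of GPU memory just to store the fully connected layer parameters (assuming the last hidden layer outputs 512-dimensional features and values are stored as 32-bit floats). Adding the large number of intermediate variables produced during training, the total memory required often exceeds the capacity of a single GPU.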
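+The 2 GB estimate above can be checked with a short calculation. This is an illustrative sketch, not part of PLSC: the helper name ``fc_param_bytes`` is hypothetical, and the bias term is ignored.

```python
def fc_param_bytes(num_classes, feature_dim, bytes_per_value=4):
    """Bytes needed for a fully connected weight matrix of shape
    [num_classes, feature_dim]; 4 bytes per value means 32-bit floats."""
    return num_classes * feature_dim * bytes_per_value


# One million classes, 512-dimensional features, 32-bit floats.
one_million = fc_param_bytes(1_000_000, 512)
print(one_million / 1e9)   # size in GB (decimal)

# The cost grows linearly: ten times the classes, ten times the memory.
ten_million = fc_param_bytes(10_000_000, 512)
print(ten_million / 1e9)
```

+The first value is 2.048, which matches the "about 2 GB" figure; the second shows the linear growth described above.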
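+How can this be solved? A common approach is to split the layer. Because the fully connected layer is linear and can be partitioned, its parameters can be sharded across multiple GPUs in a model-parallel scheme, reducing the number of parameters stored on each GPU.
+In the figure below, the fully connected layer parameters are split by rows across the GPUs. In each training iteration, every GPU computes the output features of the hidden layer from its own training data, and the collective communication operation AllGather produces the gathered features. Each GPU then uses the gathered features and its shard of the fully connected layer parameters to compute partial logits, from which the loss of the network is computed.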
+.. image:: ./plsc_overview.png
+   :target: ./plsc_overview.png
+   :alt: plsc_overview
+   :width: 400px
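+The row-wise split and the AllGather step can be simulated on one machine in plain Python. This is a minimal sketch under stated assumptions, not PLSC code: the helper names are hypothetical, "GPUs" are just list entries, and the class count is assumed to divide evenly by the number of GPUs.

```python
def matmul_t(x, w):
    """Compute x @ w^T, where x is [batch, dim] and w is [classes, dim]."""
    return [[sum(a * b for a, b in zip(xr, wr)) for wr in w] for xr in x]


def split_rows(weight, num_gpus):
    """Split the rows of weight (one row per class) into equal shards."""
    per_gpu = len(weight) // num_gpus
    return [weight[i * per_gpu:(i + 1) * per_gpu] for i in range(num_gpus)]


def all_gather(local_features):
    """Simulated AllGather: every GPU receives the features of all GPUs."""
    return [row for feats in local_features for row in feats]


def partial_logits(local_features, weight_shards):
    """Each GPU multiplies the gathered features by its own weight rows."""
    gathered = all_gather(local_features)
    return [matmul_t(gathered, shard) for shard in weight_shards]


# Toy setup: 2 GPUs, 4 classes, 3-dimensional features, 1 sample per GPU.
weight = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
shards = split_rows(weight, 2)
local_features = [[[1, 2, 3]], [[4, 5, 6]]]

parts = partial_logits(local_features, shards)
# Concatenating the partial logits along the class axis recovers the
# logits a single unsharded layer would have produced.
logits = [sum(rows, []) for rows in zip(*parts)]
full = matmul_t(all_gather(local_features), weight)
print(logits == full)
```

+Each simulated GPU holds only half of the weight rows, yet the concatenated partial logits equal the unsharded result.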
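+This scheme effectively addresses the GPU memory shortage caused by the fully connected layer's parameter count growing linearly with the number of classes. To implement it, however, developers have to design and implement every operation described above on an existing deep learning platform, including parameter sharding and collective communication. This often takes hundreds of lines of code and adds considerably to the developer's burden.
+PaddlePaddle recently open-sourced PLSC (PaddlePaddle Large Scale Classification), a large-scale classification library built on the core framework that provides an end-to-end solution from training to deployment. A few lines of code are enough to build a neural network that classifies tens of millions of classes, and the serving feature of PLSC lets users deploy models quickly as a one-stop service.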
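+Easy to use: a ten-million-class network in five lines of code
+----------------------------------------------------------------
+PLSC wraps the implementation of large-scale classification networks behind a concise high-level API; five lines of code are enough to build a network that classifies tens of millions of classes.
+Install PaddlePaddle
+^^^^^^^^^^^^^^^^^^^^
+Download and install PaddlePaddle by following the `PaddlePaddle installation guide <https://www.paddlepaddle.org.cn/install/quick>`_.
+Install PLSC
+^^^^^^^^^^^^
+Run the following command to install PLSC.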
+.. code-block:: shell
+   pip install plsc
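+Prepare the training configuration code and save it as train.py
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Building a classification network with PLSC takes three main steps:
+#. 
+   Import the Entry class from the plsc package. Entry is the interface class that wraps all PLSC APIs;
+#. 
+   Instantiate an Entry object;
+#. 
+   Call the train method of the Entry object to start training.
+By default, this training script computes the loss with 'dist_arcface', the model-parallel scheme that shards the fully connected layer parameters across multiple GPUs, so it requires two or more GPUs.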
+.. code-block:: python
+   from plsc import Entry
+   if __name__ == "__main__":
+       ins = Entry()
+       ins.set_class_num(1000000)  # set the number of classes
+       ins.train()
+Launch the training job
+^^^^^^^^^^^^^^^^^^^^^^^^
+Launch the training job with the command below. The selected_gpus argument specifies the GPUs used for training.
+.. code-block:: shell
+   python -m paddle.distributed.launch \
+               --selected_gpus=0,1,2,3,4,5,6,7 \
+               train.py
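+PLSC reaches SOTA accuracy
+--------------------------
+PLSC achieves SOTA training accuracy on several datasets. The table below lists the accuracy obtained on different validation datasets when training on the MS1M-ArcFace and CASIA datasets.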
+.. list-table::
+   :header-rows: 1
+   * - Model
+     - Training set
+     - lfw
+     - agedb_30
+     - cfp_ff
+     - cfp_fp
+     - MegaFace (Id/Ver)
+   * - ResNet50
+     - MS1M-ArcFace
+     - 0.99817
+     - 0.99827
+     - 0.99857
+     - 0.96314
+     - 0.980/0.993
+   * - ResNet50
+     - CASIA
+     - 0.98950
+     - 0.90950
+     - 0.99057
+     - 0.91500
+     - N/A
+Note: the models above were trained with loss_type 'dist_arcface'. For more on ArcFace, see
+**ArcFace:** Additive Angular Margin Loss for Deep Face Recognition
+https://arxiv.org/abs/1801.07698
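+PLSC supports multi-machine distributed training and ten-million-class classification
+----------------------------------------------------------------------------------------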
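+PLSC supports multi-machine distributed training. On the one hand, multi-machine training shards the fully connected layer parameters across more GPUs and thus supports tens of millions of classes; in theory, the number of classes PLSC supports grows with the number of GPUs used. For example, on a single machine with 8 V100 GPUs, the maximum number of supported classes is 2.52 times larger than without PLSC.
+On the other hand, multi-machine distributed training effectively speeds up training.
+The commands below launch multi-machine distributed training. The cluster_node_ips argument lists the IP addresses of all training nodes, and the node_ip argument gives the IP address of the current training node.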
+.. code-block:: shell
+   python -m paddle.distributed.launch \
+           --cluster_node_ips="127.0.0.1,127.0.0.2" \
+           --node_ip="127.0.0.1" \
+           --selected_gpus=0,1,2,3,4,5,6,7 \
+           train.py
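+The figure below shows training speed (throughput) with different numbers of nodes. The experiment uses MS1M-ArcFace as the training dataset, with 85742 classes; each node has 8 NVIDIA V100 GPUs, and the backbone model is ResNet50. As shown, PLSC achieves a near-linear speedup.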
+.. image:: ./plsc_performance.png
+   :target: ./plsc_performance.png
+   :alt: performance
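+PLSC provides an end-to-end solution from training to deployment
+------------------------------------------------------------------
+After training a classification network, users usually need to deploy a prediction service based on the trained model. The serving feature of PLSC makes this deployment fast.
+PLSC provides a serving side and a client side for deploying prediction services. The serving side is built on Paddle Serving, PaddlePaddle's server-side deployment library, and can quickly deploy a prediction service from a trained model. The client side handles interaction with the serving side: users submit queries through it and receive prediction results. Deployment takes only three steps.
+Install the serving side and the client side
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^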
+.. code-block:: shell
+   pip install plsc-serving ujson
+Deploy the serving side with the script below
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+.. code-block:: python
+   from plsc_serving.run import PLSCServer
+   fs = PLSCServer()
+   # set the path of the model to serve
+   fs.with_model(model_path = '/XXX/XXX')
+   # gpu_index selects the GPU; port selects the port
+   fs.run(gpu_index = 0, port = 8010)
+Use the client side with the script below
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+.. code-block:: python
+   from face_service import FaceService
+   with open('./data/00000000.jpg', 'rb') as f:
+       image = f.read()
+   fc = FaceService()
+   # add a connection to the server
+   fc.connect('127.0.0.1:8010')
+   # run prediction on the server
+   result = fc.encode([image])
+   print(result[0])
+   fc.close()
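+PLSC supports mixed-precision training
+----------------------------------------
+On a single machine with 8 NVIDIA Tesla V100 GPUs, mixed-precision training is 42% faster than regular single-precision training.
+Mixed-precision training speeds up training while reducing GPU memory usage. Enable it as follows: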
+.. code-block:: python
+   from plsc import Entry
+   def main():
+       ins = Entry()
+       ins.set_mixed_precision(True)
+       ins.train()
+   if __name__ == "__main__":
+       main()
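+On a single machine with 8 NVIDIA Tesla V100 GPUs, comparing single-precision and mixed-precision training of the ResNet50 model shows that mixed precision speeds up training by 42%: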
+.. list-table::
+   :header-rows: 1
+   * - Model
+     - Single-precision training
+     - Mixed-precision training
+     - Speedup
+   * - ResNet50
+     - 2567 images/s
+     - 3643 images/s
+     - 1.42
+For more on mixed-precision training, see:
+https://arxiv.org/abs/1710.03740
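+PLSC supports preprocessing of Base64-encoded image data
+----------------------------------------------------------
+In production, a common storage format encodes images in base64: each line of a training data file holds one base64-encoded image and its label, usually separated by a tab character ('\t').
+Training usually requires a global shuffle of the training data. The data also has to be split so that every GPU uses the same number of training samples. Globally shuffling Base64 data is expensive, and doing it during training severely slows training down.
+PLSC ships with a Base64 preprocessing tool that globally shuffles the training data and splits it evenly into multiple data files, so that the number of files equals the number of GPUs used for training and every file contains the same number of samples. This significantly improves training efficiency.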
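+The idea behind such a preprocessing step can be sketched in a few lines of Python. This is not the PLSC tool itself: the function name ``shuffle_and_shard`` is hypothetical, the data is kept in memory, and dropping leftover samples so that all shards have equal size is an assumption made for this sketch.

```python
import base64
import random


def shuffle_and_shard(lines, num_shards, seed=0):
    """Globally shuffle the samples once, then cut them into equal shards.

    Assumption of this sketch: leftover samples that do not fill a whole
    shard are dropped so that every shard has exactly the same size.
    """
    samples = list(lines)
    random.Random(seed).shuffle(samples)
    per_shard = len(samples) // num_shards
    return [samples[i * per_shard:(i + 1) * per_shard]
            for i in range(num_shards)]


# Build ten fake samples in the "<base64 image>\t<label>" format.
lines = [
    base64.b64encode(bytes([i]) * 4).decode("ascii") + "\t" + str(i % 3)
    for i in range(10)
]

# One shard per GPU; here four GPUs are assumed.
shards = shuffle_and_shard(lines, num_shards=4)
print([len(shard) for shard in shards])

# Each line still splits into an image payload and a label.
image_b64, label = shards[0][0].split("\t")
image_bytes = base64.b64decode(image_b64)
```

+In practice each shard would be written to its own file ahead of time, so that training does not have to shuffle the Base64 data again.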
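+PLSC supports dynamically adjusting the number of GPUs for fine-tuning
+------------------------------------------------------------------------
+We sometimes need to fine-tune a pretrained model. In this scenario, the number of GPUs used for fine-tuning may differ from the number used for pretraining, especially when the two stages are run by different organizations. Because the fully connected layer parameters are sharded according to the number of GPUs, users would have to restructure the model parameters before loading them whenever the two GPU counts differ. To simplify this, PLSC restructures the parameters automatically: when the fine-tuning stage uses a different number of GPUs than pretraining, PLSC reshards the pretrained parameters to match the fine-tuning GPU count as it loads them.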
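+Conceptually, restructuring row-sharded parameters means merging the shards and splitting them again. The sketch below illustrates that idea only; it is not how PLSC implements it, the function name ``reshard`` is hypothetical, and the class count is assumed to divide evenly by the new GPU count.

```python
def reshard(weight_shards, new_num_gpus):
    """Merge row-wise shards into one matrix, then split it for a new GPU count."""
    full = [row for shard in weight_shards for row in shard]
    if len(full) % new_num_gpus != 0:
        raise ValueError("this sketch requires classes to divide evenly across GPUs")
    per_gpu = len(full) // new_num_gpus
    return [full[i * per_gpu:(i + 1) * per_gpu] for i in range(new_num_gpus)]


# Pretraining used 4 GPUs with 2 classes (rows) per GPU.
pretrained = [
    [[0, 0], [1, 1]],
    [[2, 2], [3, 3]],
    [[4, 4], [5, 5]],
    [[6, 6], [7, 7]],
]

# Fine-tuning uses 2 GPUs, so each GPU now holds 4 classes.
finetune = reshard(pretrained, 2)
print([len(shard) for shard in finetune])
```

+The order of the class rows is preserved, so each class keeps the same weights after resharding.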
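+PLSC helps Baidu's AI mask detection solution launch quickly
+--------------------------------------------------------------
+In response to the epidemic, Baidu recently solved the problem of recognizing faces wearing masks and quickly launched an AI mask detection solution, now deployed in subways, campuses, factories, and similar sites to support epidemic prevention efficiently.
+Baidu's AI mask detection solution uses Baidu's latest PyramidBox-lite detection algorithm, trained with more than 100,000 additional images of masked faces. PLSC was used to train quickly on data with millions of IDs. With accuracy unchanged, recall improved by 30%, and detection accuracy for faces wearing masks exceeds 99%.
+For more on using PLSC, visit the PLSC project page: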
+https://github.com/PaddlePaddle/PLSC
--- a/doc/fluid/user_guides/tools/plsc/plsc_overview.png
+++ b/doc/fluid/user_guides/tools/plsc/plsc_overview.png
--- a/doc/fluid/user_guides/tools/plsc/plsc_performance.png
+++ b/doc/fluid/user_guides/tools/plsc/plsc_performance.png
--- a/scripts/checkapproval.sh
+++ b/scripts/checkapproval.sh
@@ -6,12 +6,12 @@ for API_FILE in ${API_FILES[*]}; do
  if [ "${API_CHANGE}" ];then
    approval_line=`curl -H "Authorization: token ${GITHUB_API_TOKEN}" https://api.github.com/repos/PaddlePaddle/FluidDoc/pulls/${GIT_PR_ID}/reviews?per_page=10000`
    if [ "${API_FILE}" == "doc/fluid" ];then
-      APPROVALS=`echo ${approval_line}|python ./scripts/check_pr_approval.py 1 31623103 2870059 27208573` 
+      APPROVALS=`echo ${approval_line}|python ./scripts/check_pr_approval.py 1 31623103 2870059 27208573 28379894` 
    fi
  fi
  if [ "${APPROVALS}" == "FALSE" ]; then
    if [ "${API_FILE}" == "doc/fluid" ];then
-      echo "You must have one TPM (saxon-zh or Boyan-Liu or swtkiwi) approval for the api change! ${API_FILE} for the management reason of API interface and API document."
+      echo "You must have one TPM (saxon-zh or Boyan-Liu or swtkiwi or Heeenrrry) approval for the api change! ${API_FILE} for the management reason of API interface and API document."
    fi
    exit 1
  fi