Commit dac93eb3 authored by Haihao Shen

Merge branch 'prv-calibration'

Too many changes to show.

To preserve performance, only 1000 of 1000+ files are displayed.
---
name: Feature request
about: You could use this template for submitting a suggestion.
---
We welcome your suggestions for PaddlePaddle, and thank you for contributing!
When leaving a suggestion, please also provide the following information:
- Version and environment information
1) PaddlePaddle version: please provide your PaddlePaddle version number, e.g. 1.1
2) CPU/GPU: whether you train on GPU; if so, please provide your CUDA and cuDNN version numbers
3) System environment: please describe the OS type and version, e.g. Mac OS 10.14
- Reproduction information: for an error, please provide the environment and steps to reproduce it
- Suggestion description: please describe in detail the feature you think needs improvement
Thank you for contributing to PaddlePaddle.
Before submitting the issue, you could search GitHub in case a similar issue was submitted or resolved before.
Please make sure that this is a feature request.
**System information**
- PaddlePaddle version (e.g. 1.1) or CommitID
- CPU: including MKL/OpenBLAS/MKL-DNN version
- GPU: including CUDA/cuDNN version
- OS platform (e.g. Mac OS 10.14)
**To Reproduce**
Steps to reproduce the behavior
**Describe the feature and the current behavior/state.**
**Any other info.**
---
name: Inference Issue
about: You could use this template for reporting an inference issue, such as errors or usage questions during inference.
---
To get your issue resolved quickly, before opening an issue, please search for similar issues via: [searching issue keywords] [filtering by labels] [the official documentation].
If you do not find a similar issue, please provide the following details when opening one:
- Title: describe your problem concisely and precisely, e.g. "Where is the API documentation for the latest inference library?"
- Version and environment information:
   1) PaddlePaddle version: please provide your PaddlePaddle version number (e.g. 1.1) or CommitID
   2) CPU: if inference runs on CPU, please provide the CPU model and whether math libraries such as MKL/OpenBLAS/MKL-DNN are used
   3) GPU: if inference runs on GPU, please provide the GPU model and the CUDA and cuDNN version numbers
   4) System environment: please describe the OS type and version (e.g. Mac OS 10.14) and the Python version
- Inference information
   1) C++ inference: please provide the version of the inference library package, including its version.txt file
   2) The complete CMake command, including the include paths
   3) API information (please provide it if you call the API)
   4) Source of the inference library: downloaded from the official site / special environment (e.g. built with BCLOUD)
- Reproduction information: for an error, please provide the environment and steps to reproduce it
- Problem description: please describe your problem in detail and paste the error message and key log/code snippets
Thank you for contributing to PaddlePaddle.
Before submitting the issue, you could search GitHub in case a similar issue was submitted or resolved before.
If there is no solution, please make sure that this is an inference issue and include the following details:
**System information**
- PaddlePaddle version (e.g. 1.1) or CommitID
- CPU: including MKL/OpenBLAS/MKL-DNN version
- GPU: including CUDA/cuDNN version
- OS platform (e.g. Mac OS 10.14)
- Python version
- CMake commands
- C++ version.txt
- API information
**To Reproduce**
Steps to reproduce the behavior
**Describe your current behavior**
**Code to reproduce the issue**
**Other info / logs**
---
name: Installation Issue
about: You could use this template for reporting an installation or build issue.
---
To get your issue resolved quickly, before opening an issue, please search for similar issues via: [searching issue keywords] [filtering by labels] [the official documentation].
When opening an issue, please provide the following information so it can be resolved quickly:
- Title: include the keyword "installation error" / "build error", e.g. "Build error on Mac"
- Version and environment information:
   1) PaddlePaddle version: please provide your PaddlePaddle version number (e.g. 1.1) or CommitID
   2) CPU: please provide the CPU model and whether math libraries such as MKL/OpenBLAS/MKL-DNN are used
   3) GPU: please provide the GPU model and the CUDA and cuDNN version numbers
   4) System environment: please state the OS type and version (e.g. Mac OS 10.14) and the Python version
- Installation method:
1) pip install / Docker install
2) Local build: please provide the cmake command and the build command
3) Docker build: please provide the Docker image and the build command
Please note any special environment, e.g. offline installation
- Reproduction information: for an error, please provide the environment and steps to reproduce it
- Problem description: please describe your problem in detail and paste the error message and key log/code snippets
Thank you for contributing to PaddlePaddle.
Before submitting the issue, you could search GitHub in case a similar issue was submitted or resolved before.
If there is no solution, please make sure that this is an installation issue and include the following details:
**System information**
- PaddlePaddle version (e.g. 1.1) or CommitID
- CPU: including MKL/OpenBLAS/MKL-DNN version
- GPU: including CUDA/cuDNN version
- OS platform (e.g. Mac OS 10.14)
- Python version
- Install method: pip install / install with Docker / build from source (without Docker) / build within Docker
- Other special cases that you think may be related to this problem, e.g. offline install, special internet conditions
**To Reproduce**
Steps to reproduce the behavior
**Describe your current behavior**
**Code to reproduce the issue**
**Other info / logs**
---
name: Model Issue
about: You could use this template for reporting a model/algorithm/dataset issue, such as usage errors.
---
To get your issue resolved quickly, before opening an issue, please search for similar issues via: [searching issue keywords] [filtering by labels] [the official documentation].
When opening an issue, please provide the following information so it can be resolved quickly:
- Title: describe your problem concisely and precisely, e.g. "Error when prepending an LSTM to the SSD model"
- Version and environment information:
   1) PaddlePaddle version: please provide the PaddlePaddle version number, e.g. 1.1, or the CommitID
   2) CPU: please provide the CPU model and whether math libraries such as MKL/OpenBLAS/MKL-DNN are used
   3) GPU: please provide the GPU model and the CUDA and cuDNN version numbers
   4) System environment: please state the OS type and version (e.g. Mac OS 10.14) and the Python version
- Model information
   1) Model name 2) Dataset name 3) Algorithm name 4) Model link
- Reproduction information: for an error, please provide the environment and steps to reproduce it
- Problem description: please describe your problem in detail and paste the error message and key log/code snippets
Thank you for contributing to PaddlePaddle.
Before submitting the issue, you could search GitHub in case a similar issue was submitted or resolved before.
If there is no solution, please make sure that this is a model issue and include the following details:
**System information**
- PaddlePaddle version (e.g. 1.1) or CommitID
- CPU: including MKL/OpenBLAS/MKL-DNN version
- GPU: including CUDA/cuDNN version
- OS platform (e.g. Mac OS 10.14)
- Python version
- Name of models & dataset / details of operator
**To Reproduce**
Steps to reproduce the behavior
**Describe your current behavior**
**Code to reproduce the issue**
**Other info / logs**
---
name: Others
about: You could use this template for reporting issues not covered by the categories above.
---
To get your issue resolved quickly, before opening an issue, please search for similar issues via: [searching issue keywords] [filtering by labels] [the official documentation].
If you do not find a similar issue, please provide the following details when opening one:
- Title: summarize your problem concisely and precisely
- Version and environment information:
   1) PaddlePaddle version: please provide your PaddlePaddle version number, e.g. 1.1, or the CommitID
   2) CPU/GPU: if you train on GPU, please provide the GPU driver version and the CUDA and cuDNN version numbers
   3) System environment: please describe the OS type and version, e.g. Mac OS 10.14
   4) Python version
   5) GPU memory information
- Reproduction information: for an error, please provide the environment and steps to reproduce it
- Problem description: please describe your problem in detail and paste the error message and key log/code snippets
Thank you for contributing to PaddlePaddle.
Before submitting the issue, you could search GitHub in case a similar issue was submitted or resolved before.
If there is no solution, please provide us with the following details:
**System information**
- PaddlePaddle version (e.g. 1.1) or CommitID
- CPU: including MKL/OpenBLAS/MKL-DNN version
- GPU: including CUDA/cuDNN version
- OS platform and distribution (e.g. Mac OS 10.14)
- Python version
**To Reproduce**
Steps to reproduce the behavior
**Describe your current behavior**
**Code to reproduce the issue**
**Other info / logs**
---
name: Training Issue
about: You could use this template for reporting a training issue, such as errors, usage questions, or core dumps during training.
---
To get your issue resolved quickly, before opening an issue, please search for similar issues via: [searching issue keywords] [filtering by labels] [the official documentation].
If you do not find a similar issue, please provide the following details when opening one:
- Title: summarize your problem concisely and precisely, e.g. "Insufficient Memory xxx"
- Version and environment information:
   1) PaddlePaddle version: please provide your PaddlePaddle version number, e.g. 1.1, or the CommitID
   2) CPU: if training runs on CPU, please provide the CPU model and whether math libraries such as MKL/OpenBLAS/MKL-DNN are used
   3) GPU: if training runs on GPU, please provide the GPU model and the CUDA and cuDNN version numbers
   4) System environment: please describe the OS type and version (e.g. Mac OS 10.14) and the Python version
- Training information
   1) Single node / multiple nodes, single GPU / multiple GPUs
   2) GPU memory information
   3) Operator information
- Reproduction information: for an error, please provide the environment and steps to reproduce it
- Problem description: please describe your problem in detail and paste the error message, logs, and a reproducible code snippet
Thank you for contributing to PaddlePaddle.
Before submitting the issue, you could search GitHub in case a similar issue was submitted or resolved before.
If there is no solution, please make sure that this is a training issue and include the following details:
**System information**
- PaddlePaddle version (e.g. 1.1) or CommitID
- CPU: including MKL/OpenBLAS/MKL-DNN version
- GPU: including CUDA/cuDNN version
- OS platform (e.g. Mac OS 10.14)
- Other information: distributed training / operator information / GPU memory
**To Reproduce**
Steps to reproduce the behavior
**Describe your current behavior**
**Code to reproduce the issue**
**Other info / logs**
+python/paddle/fluid/tests/unittests/reader_reset_test.recordio
 paddle/operators/check_t.save
 paddle/operators/check_tensor.ls
 paddle/operators/tensor.save
 python/paddle/v2/fluid/tests/book/image_classification_resnet.inference.model/
 python/paddle/v2/fluid/tests/book/image_classification_vgg.inference.model/
 python/paddle/v2/fluid/tests/book/label_semantic_roles.inference.model/
+paddle/fluid/operators/distributed/send_recv.proto
 *.DS_Store
 *.vs
 build/
@@ -25,5 +27,8 @@ third_party/
 bazel-*
 third_party/
+build_*
 # clion workspace.
 cmake-build-*
+paddle/fluid/operators/distributed/send_recv.proto
+model_test
@@ -12,7 +12,6 @@ services:
 os:
 - linux
 env:
-- JOB=doc
 - JOB=check_style
 - JOB=build_android
 addons:
......
@@ -25,6 +25,7 @@
 | kexinzhao | Ke-Xin Zhao |
 | kuke | Yi-Bing Liu |
 | lcy-seso | Ying Cao |
+| cjld | Dun Liang |
 | lipeng-unisound | Peng Li |
 | liuyuan | Yuan Liu |
 | livc | Zhao Li |
@@ -42,7 +43,9 @@
 | QiJune | Jun Qi |
 | qingqing01 | Qing-Qing Dang |
 | reyoung | Yang Yu |
+| Sand3r- | Michal Gallus |
 | Superjom | Chun-Wei Yan |
+| tensor-tang | Jian Tang |
 | tianbingsz | Tian-Bing Xu |
 | tpatejko | Tomasz Patejko |
 | typhoonzero | Yi Wu |
@@ -26,6 +26,11 @@ message(STATUS "C compiler: ${CMAKE_C_COMPILER}, version: "
 "${CMAKE_C_COMPILER_ID} ${CMAKE_C_COMPILER_VERSION}")
 if(WIN32)
     set(CMAKE_STATIC_LIBRARY_PREFIX lib)
+    add_definitions("/DGOOGLE_GLOG_DLL_DECL=")
+    set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /bigobj /MTd")
+    set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} /bigobj /MT")
+    set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} /bigobj /MTd")
+    set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /bigobj /MT")
 endif(WIN32)
 if(NOT CMAKE_CROSSCOMPILING)
@@ -41,6 +46,7 @@ option(WITH_GPU "Compile PaddlePaddle with NVIDIA GPU" ${CUDA_F
 option(WITH_AMD_GPU "Compile PaddlePaddle with AMD GPU" OFF)
 option(WITH_AVX "Compile PaddlePaddle with AVX intrinsics" ${AVX_FOUND})
 option(WITH_MKL "Compile PaddlePaddle with MKL support." ${AVX_FOUND})
+option(WITH_NGRAPH "Compile PaddlePaddle with nGraph support." OFF)
 option(WITH_DSO "Compile PaddlePaddle with dynamic linked CUDA" ON)
 option(WITH_TESTING "Compile PaddlePaddle with unit testing" OFF)
 option(WITH_SWIG_PY "Compile PaddlePaddle with inference api" ON)
@@ -62,15 +68,18 @@ option(WITH_DISTRIBUTE "Compile with distributed support" OFF)
 option(USE_EIGEN_FOR_BLAS "Use matrix multiplication in Eigen" OFF)
 option(EIGEN_USE_THREADS "Compile with multi-threaded Eigen" OFF)
 option(WITH_ARM_FP16 "Use half precision support on armv8.2-a cpu" OFF)
+option(WITH_FAST_BUNDLE_TEST "Bundle tests that can be run in a single process together to reduce launch overhead" OFF)
 option(WITH_CONTRIB "Compile the third-party contributation" OFF)
 option(REPLACE_ENFORCE_GLOG "Replace PADDLE_ENFORCE with glog/CHECK for better debug." OFF)
 option(WITH_ANAKIN "Compile with Anakin library" OFF)
+option(ANAKIN_BUILD_FAT_BIN "Build anakin cuda fat-bin lib for all device plantform, ignored when WITH_ANAKIN=OFF" OFF)
+option(ANAKIN_BUILD_CROSS_PLANTFORM "Build anakin lib for any nvidia device plantform. ignored when WITH_ANAKIN=OFF" ON)
 option(WITH_GRPC "Use grpc as the default rpc framework" ${WITH_DISTRIBUTE})
 option(WITH_BRPC_RDMA "Use brpc rdma as the rpc protocal" OFF)
-option(WITH_INFERENCE "Compile fluid inference library" ON)
+option(ON_INFER "Turn on inference optimization." OFF)
+option(WITH_INFERENCE_API_TEST "Test fluid inference high-level api interface" OFF)
 option(WITH_SYSTEM_BLAS "Use system blas library" OFF)
 option(PY_VERSION "Compile PaddlePaddle with python3 support" ${PY_VERSION})
+option(WITH_FAST_MATH "Make use of fast math library, might affect the precision to some extent" ON)
 # PY_VERSION
 if(NOT PY_VERSION)
@@ -102,6 +111,8 @@ if(ANDROID OR IOS)
     "Disable RDMA when cross-compiling for Android and iOS" FORCE)
 set(WITH_MKL OFF CACHE STRING
     "Disable MKL when cross-compiling for Android and iOS" FORCE)
+set(WITH_NGRAPH OFF CACHE STRING
+    "Disable nGraph when cross-compiling for Android and iOS" FORCE)
 set(WITH_GOLANG OFF CACHE STRING
     "Disable golang when cross-compiling for Android and iOS" FORCE)
@@ -119,12 +130,30 @@ if (APPLE OR WIN32)
     "Disable MKL for building on mac and windows" FORCE)
 endif()
+if (WIN32)
+    set(WITH_AVX OFF CACHE STRING
+        "Disable AVX when compiling for Windows" FORCE)
+    set(WITH_DSO OFF CACHE STRING
+        "Disable DSO when compiling for Windows" FORCE)
+    set(WITH_MKL OFF CACHE STRING
+        "Disable MKL when compiling for Windows" FORCE)
+    set(WITH_DISTRIBUTE OFF CACHE STRING
+        "Disable DISTRIBUTE when compiling for Windows" FORCE)
+    set(WITH_C_API OFF CACHE STRING
+        "Disable C_API when compiling for Windows" FORCE)
+    set(WITH_FLUID_ONLY ON CACHE STRING
+        "Enable FLUID_ONLY when compiling for Windows" FORCE)
+endif()
 set(THIRD_PARTY_PATH "${CMAKE_BINARY_DIR}/third_party" CACHE STRING
     "A path setting third party libraries download & build directories.")
 set(FLUID_INSTALL_DIR "${CMAKE_BINARY_DIR}/fluid_install_dir" CACHE STRING
     "A path setting fluid shared and static libraries")
+set(FLUID_INFERENCE_INSTALL_DIR "${CMAKE_BINARY_DIR}/fluid_inference_install_dir" CACHE STRING
+    "A path setting fluid inference shared and static libraries")
 if (WITH_C_API AND WITH_PYTHON)
     message(WARNING "It is suggest not embedded a python interpreter in Paddle "
     "when using C-API. It will give an unpredictable behavior when using a "
@@ -167,6 +196,7 @@ include(external/protobuf) # download, build, install protobuf
 include(external/python)   # download, build, install python
 include(external/openblas) # download, build, install openblas
 include(external/mkldnn)   # download, build, install mkldnn
+include(external/ngraph)   # download, build, install nGraph
 include(external/swig)     # download, build, install swig
 include(external/boost)    # download boost
 include(external/any)      # download libn::any
@@ -174,11 +204,14 @@ include(external/eigen)    # download eigen3
 include(external/pybind11) # download pybind11
 include(external/cares)
 include(external/cub)
+include(external/rocprim)
-if (NOT WIN32)
-# there is no official support of snappystream, warpctc, nccl, cupti in windows
+include(external/xxhash)   # download xxhash
+include(external/dlpack)
 include(external/snappy)   # download snappy
 include(external/snappystream) # download snappystream
+if (NOT WIN32)
+# there is no official support of warpctc, nccl, cupti in windows
 include(external/warpctc)  # download, build, install warpctc
 include(cupti)
 endif (NOT WIN32)
@@ -286,6 +319,14 @@ set(PADDLE_PYTHON_BUILD_DIR "${CMAKE_CURRENT_BINARY_DIR}/python/build")
 set(CMAKE_CXX_FLAGS_RELWITHDEBINFO "-O3 -g -DNDEBUG")
 set(CMAKE_C_FLAGS_RELWITHDEBINFO "-O3 -g -DNDEBUG")
+if (ON_INFER)
+    message(STATUS "On inference mode, will take place some specific optimization.")
+    add_definitions(-DPADDLE_ON_INFERENCE)
+else()
+    #TODO(luotao), combine this warning with `make inference_lib_dist` command.
+    message(WARNING "On inference mode, will take place some specific optimization. Turn on the ON_INFER flag when building inference_lib only.")
+endif()
 add_subdirectory(paddle)
 if(WITH_PYTHON)
     add_subdirectory(python)
......
@@ -22,8 +22,30 @@ ENV HOME /root
 # Add bash enhancements
 COPY ./paddle/scripts/docker/root/ /root/
+# Prepare packages for Python
+RUN apt-get update && \
+    apt-get install -y make build-essential libssl-dev zlib1g-dev libbz2-dev \
+    libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev \
+    xz-utils tk-dev libffi-dev liblzma-dev
+# Install Python3.6
+RUN mkdir -p /root/python_build/ && wget -q https://www.sqlite.org/2018/sqlite-autoconf-3250300.tar.gz && \
+    tar -zxf sqlite-autoconf-3250300.tar.gz && cd sqlite-autoconf-3250300 && \
+    ./configure -prefix=/usr/local && make -j8 && make install && cd ../ && rm sqlite-autoconf-3250300.tar.gz && \
+    wget -q https://www.python.org/ftp/python/3.6.0/Python-3.6.0.tgz && \
+    tar -xzf Python-3.6.0.tgz && cd Python-3.6.0 && \
+    CFLAGS="-Wformat" ./configure --prefix=/usr/local/ --enable-shared > /dev/null && \
+    make -j8 > /dev/null && make altinstall > /dev/null
+# Install Python3.7
+RUN wget -q https://www.python.org/ftp/python/3.7.0/Python-3.7.0.tgz && \
+    tar -xzf Python-3.7.0.tgz && cd Python-3.7.0 && \
+    CFLAGS="-Wformat" ./configure --prefix=/usr/local/ --enable-shared > /dev/null && \
+    make -j8 > /dev/null && make altinstall > /dev/null
 RUN apt-get update && \
     apt-get install -y --allow-downgrades patchelf \
+    python3 python3-dev python3-pip \
     git python-pip python-dev python-opencv openssh-server bison \
     libnccl2=2.1.2-1+cuda8.0 libnccl-dev=2.1.2-1+cuda8.0 \
     wget unzip unrar tar xz-utils bzip2 gzip coreutils ntp \
@@ -70,24 +92,51 @@ RUN localedef -i en_US -f UTF-8 en_US.UTF-8
 # specify sphinx version as 1.5.6 and remove -U option for [pip install -U
 # sphinx-rtd-theme] since -U option will cause sphinx being updated to newest
 # version(1.7.1 for now), which causes building documentation failed.
-RUN easy_install -U pip && \
-    pip install -U wheel && \
+RUN pip3 install -U wheel && \
+    pip3 install -U docopt PyYAML sphinx==1.5.6 && \
+    pip3 install sphinx-rtd-theme==0.1.9 recommonmark && \
+    pip3.6 install -U wheel && \
+    pip3.6 install -U docopt PyYAML sphinx==1.5.6 && \
+    pip3.6 install sphinx-rtd-theme==0.1.9 recommonmark && \
+    pip3.7 install -U wheel && \
+    pip3.7 install -U docopt PyYAML sphinx==1.5.6 && \
+    pip3.7 install sphinx-rtd-theme==0.1.9 recommonmark && \
+    easy_install -U pip && \
+    pip install -U pip setuptools wheel && \
     pip install -U docopt PyYAML sphinx==1.5.6 && \
     pip install sphinx-rtd-theme==0.1.9 recommonmark
-RUN pip install pre-commit 'ipython==5.3.0' && \
+RUN pip3 install 'pre-commit==1.10.4' 'ipython==5.3.0' && \
+    pip3 install 'ipykernel==4.6.0' 'jupyter==1.0.0' && \
+    pip3 install opencv-python && \
+    pip3.6 install 'pre-commit==1.10.4' 'ipython==5.3.0' && \
+    pip3.6 install 'ipykernel==4.6.0' 'jupyter==1.0.0' && \
+    pip3.6 install opencv-python && \
+    pip3.7 install 'pre-commit==1.10.4' 'ipython==5.3.0' && \
+    pip3.7 install 'ipykernel==4.6.0' 'jupyter==1.0.0' && \
+    pip3.7 install opencv-python && \
+    pip install 'pre-commit==1.10.4' 'ipython==5.3.0' && \
     pip install 'ipykernel==4.6.0' 'jupyter==1.0.0' && \
     pip install opencv-python
 #For docstring checker
+RUN pip3 install pylint pytest astroid isort
+RUN pip3.6 install pylint pytest astroid isort
+RUN pip3.7 install pylint pytest astroid isort
 RUN pip install pylint pytest astroid isort LinkChecker
 COPY ./python/requirements.txt /root/
+RUN pip3 install -r /root/requirements.txt
+RUN pip3.6 install -r /root/requirements.txt
+RUN pip3.7 install -r /root/requirements.txt
 RUN pip install -r /root/requirements.txt
 # To fix https://github.com/PaddlePaddle/Paddle/issues/1954, we use
 # the solution in https://urllib3.readthedocs.io/en/latest/user-guide.html#ssl-py2
 RUN apt-get install -y libssl-dev libffi-dev
+RUN pip3 install certifi urllib3[secure]
+RUN pip3.6 install certifi urllib3[secure]
+RUN pip3.7 install certifi urllib3[secure]
 RUN pip install certifi urllib3[secure]
......
@@ -2,8 +2,8 @@
 [![Build Status](https://travis-ci.org/PaddlePaddle/Paddle.svg?branch=develop)](https://travis-ci.org/PaddlePaddle/Paddle)
-[![Documentation Status](https://img.shields.io/badge/docs-latest-brightgreen.svg?style=flat)](http://www.paddlepaddle.org/docs/develop/documentation/en/getstarted/index_en.html)
+[![Documentation Status](https://img.shields.io/badge/docs-latest-brightgreen.svg?style=flat)](http://paddlepaddle.org/documentation/docs/en/1.1/getstarted/index_en.html)
-[![Documentation Status](https://img.shields.io/badge/中文文档-最新-brightgreen.svg)](http://www.paddlepaddle.org/docs/develop/documentation/zh/getstarted/index_cn.html)
+[![Documentation Status](https://img.shields.io/badge/中文文档-最新-brightgreen.svg)](http://paddlepaddle.org/documentation/docs/zh/1.1/beginners_guide/index.html)
 [![Release](https://img.shields.io/github/release/PaddlePaddle/Paddle.svg)](https://github.com/PaddlePaddle/Paddle/releases)
 [![License](https://img.shields.io/badge/license-Apache%202-blue.svg)](LICENSE)
@@ -19,7 +19,7 @@ Our vision is to enable deep learning for everyone via PaddlePaddle.
 Please refer to our [release announcement](https://github.com/PaddlePaddle/Paddle/releases) to track the latest feature of PaddlePaddle.
-### Latest PaddlePaddle Release: [Fluid 0.14.0](https://github.com/PaddlePaddle/Paddle/tree/v0.14.0)
+### Latest PaddlePaddle Release: [Fluid 1.1.0](https://github.com/PaddlePaddle/Paddle/tree/release/1.1)
 ### Install Latest Stable Release:
 ```
 # Linux CPU
@@ -27,9 +27,9 @@ pip install paddlepaddle
 # Linux GPU cuda9cudnn7
 pip install paddlepaddle-gpu
 # Linux GPU cuda8cudnn7
-pip install paddlepaddle-gpu==0.14.0.post87
+pip install paddlepaddle-gpu==1.1.0.post87
 # Linux GPU cuda8cudnn5
-pip install paddlepaddle-gpu==0.14.0.post85
+pip install paddlepaddle-gpu==1.1.0.post85
 # For installation on other platform, refer to http://paddlepaddle.org/
 ```
@@ -76,26 +76,26 @@ pip install paddlepaddle-gpu==0.14.0.post85
 ## Installation
-It is recommended to read [this doc](http://paddlepaddle.org/documentation/docs/zh/0.14.0/new_docs/beginners_guide/install/install_doc.html) on our website.
+It is recommended to read [this doc](http://paddlepaddle.org/documentation/docs/zh/1.1/beginners_guide/index.html) on our website.
 ## Documentation
-We provide [English](http://paddlepaddle.org/documentation/docs/en/0.14.0/getstarted/index_en.html) and
+We provide [English](http://paddlepaddle.org/documentation/docs/en/1.1/getstarted/index_en.html) and
-[Chinese](http://paddlepaddle.org/documentation/docs/zh/0.14.0/new_docs/beginners_guide/index.html) documentation.
+[Chinese](http://paddlepaddle.org/documentation/docs/zh/1.1/beginners_guide/index.html) documentation.
 - [Deep Learning 101](https://github.com/PaddlePaddle/book)
 You might want to start from this online interactive book that can run in a Jupyter Notebook.
-- [Distributed Training](http://paddlepaddle.org/documentation/docs/zh/0.14.0/new_docs/user_guides/howto/training/cluster_howto.html)
+- [Distributed Training](http://paddlepaddle.org/documentation/docs/zh/1.1/user_guides/howto/training/cluster_howto.html)
 You can run distributed training jobs on MPI clusters.
-- [Python API](http://paddlepaddle.org/documentation/api/zh/0.14.0/fluid.html)
+- [Python API](http://paddlepaddle.org/documentation/api/zh/1.1/fluid.html)
 Our new API enables much shorter programs.
-- [How to Contribute](http://paddlepaddle.org/documentation/docs/zh/0.14.0/new_docs/advanced_usage/development/contribute_to_paddle.html)
+- [How to Contribute](http://paddlepaddle.org/documentation/docs/zh/1.1/advanced_usage/development/contribute_to_paddle.html)
 We appreciate your contributions!
......
@@ -137,8 +137,15 @@ def parse_args():
         action='store_true',
         help='If set, keep the random seed and do not shuffle the data.')
     parser.add_argument(
-        '--use_lars',
+        '--reduce_strategy',
+        type=str,
+        choices=['reduce', 'all_reduce'],
+        default='all_reduce',
+        help='Specify the reduce strategy, can be reduce, all_reduce')
+    parser.add_argument(
+        '--fuse_broadcast_op',
         action='store_true',
-        help='If set, use lars for optimizers, ONLY support resnet module.')
+        help='If set, would fuse multiple broadcast operators into one fused_broadcast operator.'
+    )
     args = parser.parse_args()
     return args
@@ -170,6 +170,15 @@ def train_parallel(train_args, test_args, args, train_prog, test_prog,
     strategy = fluid.ExecutionStrategy()
     strategy.num_threads = args.cpus
     strategy.allow_op_delay = False
+    build_strategy = fluid.BuildStrategy()
+    if args.reduce_strategy == "reduce":
+        build_strategy.reduce_strategy = fluid.BuildStrategy(
+        ).ReduceStrategy.Reduce
+    else:
+        build_strategy.reduce_strategy = fluid.BuildStrategy(
+        ).ReduceStrategy.AllReduce
+    build_strategy.fuse_broadcast_op = args.fuse_broadcast_op
     avg_loss = train_args[0]
     if args.update_method == "pserver":
@@ -184,6 +193,7 @@ def train_parallel(train_args, test_args, args, train_prog, test_prog,
         avg_loss.name,
         main_program=train_prog,
         exec_strategy=strategy,
+        build_strategy=build_strategy,
         num_trainers=num_trainers,
         trainer_id=trainer_id)
@@ -231,7 +241,6 @@ def train_parallel(train_args, test_args, args, train_prog, test_prog,
     if args.use_fake_data or args.use_reader_op:
         try:
             fetch_ret = exe.run(fetch_list)
-
         except fluid.core.EOFException as eof:
             break
......
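The `--reduce_strategy` and `--fuse_broadcast_op` flags above map onto `fluid.BuildStrategy`. A minimal sketch of that wiring, assuming the fluid 1.x API of this commit's era; the variable values are illustrative stand-ins for the parsed arguments:

```python
import paddle.fluid as fluid

# Illustrative stand-ins for args.reduce_strategy / args.fuse_broadcast_op.
reduce_strategy = "reduce"
fuse_broadcast_op = True

build_strategy = fluid.BuildStrategy()
if reduce_strategy == "reduce":
    # Shard gradient aggregation: each device reduces a subset of the
    # gradients and broadcasts the updated parameters back.
    build_strategy.reduce_strategy = fluid.BuildStrategy.ReduceStrategy.Reduce
else:
    # Replicated aggregation: every device all-reduces the full gradients.
    build_strategy.reduce_strategy = fluid.BuildStrategy.ReduceStrategy.AllReduce
build_strategy.fuse_broadcast_op = fuse_broadcast_op
# The strategy is then passed to fluid.ParallelExecutor(...,
# build_strategy=build_strategy), as in the hunk above.
```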
@@ -67,11 +67,14 @@ def cnn_model(data):
 def get_model(args, is_train, main_prog, startup_prog):
     # NOTE: mnist is small, we don't implement data sharding yet.
-    filelist = [
-        os.path.join(args.data_path, f) for f in os.listdir(args.data_path)
-    ]
+    opt = None
+    data_file_handle = None
     with fluid.program_guard(main_prog, startup_prog):
         if args.use_reader_op:
+            filelist = [
+                os.path.join(args.data_path, f)
+                for f in os.listdir(args.data_path)
+            ]
             data_file_handle = fluid.layers.open_files(
                 filenames=filelist,
                 shapes=[[-1, 1, 28, 28], (-1, 1)],
@@ -100,7 +103,7 @@ def get_model(args, is_train, main_prog, startup_prog):
         if is_train:
             opt = fluid.optimizer.AdamOptimizer(
                 learning_rate=0.001, beta1=0.9, beta2=0.999)
-            opt.minimize()
+            opt.minimize(avg_cost)
     if args.memory_optimize:
         fluid.memory_optimize(main_prog)
......
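The `opt.minimize(avg_cost)` change above is a bug fix: `minimize` needs the loss variable in order to append the backward pass and the optimizer ops. A standalone sketch of the corrected pattern, using a hypothetical one-layer network on the fluid 1.x API:

```python
import paddle.fluid as fluid

# Hypothetical minimal network, just to have a loss to minimize.
img = fluid.layers.data(name='img', shape=[1, 28, 28], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
predict = fluid.layers.fc(input=img, size=10, act='softmax')
cost = fluid.layers.cross_entropy(input=predict, label=label)
avg_cost = fluid.layers.mean(x=cost)

opt = fluid.optimizer.AdamOptimizer(
    learning_rate=0.001, beta1=0.9, beta2=0.999)
# The loss must be passed in; the pre-fix call opt.minimize() has no
# loss to differentiate and fails.
opt.minimize(avg_cost)
```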
@@ -20,6 +20,7 @@ import functools
 import numpy as np
 import time
 import os
+import math
 import cProfile, pstats, StringIO
@@ -27,128 +28,120 @@ import paddle
 import paddle.fluid as fluid
 import paddle.fluid.core as core
 import paddle.fluid.profiler as profiler
-# from recordio_converter import imagenet_train, imagenet_test
 from imagenet_reader import train, val
+train_parameters = {
+    "input_size": [3, 224, 224],
+    "input_mean": [0.485, 0.456, 0.406],
+    "input_std": [0.229, 0.224, 0.225],
+    "learning_strategy": {
+        "name": "piecewise_decay",
+        "batch_size": 256,
+        "epochs": [30, 60, 90],
+        "steps": [0.1, 0.01, 0.001, 0.0001]
+    }
+}
+class ResNet():
+    def __init__(self, layers=50, is_train=True):
+        self.params = train_parameters
+        self.layers = layers
+        self.is_train = is_train
+    def net(self, input, class_dim=1000):
+        layers = self.layers
+        supported_layers = [50, 101, 152]
+        assert layers in supported_layers, \
+            "supported layers are {} but input layer is {}".format(supported_layers, layers)
+        if layers == 50:
+            depth = [3, 4, 6, 3]
+        elif layers == 101:
+            depth = [3, 4, 23, 3]
+        elif layers == 152:
+            depth = [3, 8, 36, 3]
+        num_filters = [64, 128, 256, 512]
+        conv = self.conv_bn_layer(
+            input=input, num_filters=64, filter_size=7, stride=2, act='relu')
+        conv = fluid.layers.pool2d(
+            input=conv,
+            pool_size=3,
+            pool_stride=2,
+            pool_padding=1,
+            pool_type='max')
+        for block in range(len(depth)):
+            for i in range(depth[block]):
+                conv = self.bottleneck_block(
+                    input=conv,
+                    num_filters=num_filters[block],
+                    stride=2 if i == 0 and block != 0 else 1)
+        pool = fluid.layers.pool2d(
+            input=conv, pool_size=7, pool_type='avg', global_pooling=True)
+        stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
+        out = fluid.layers.fc(input=pool,
+                              size=class_dim,
+                              act='softmax',
+                              param_attr=fluid.param_attr.ParamAttr(
+                                  initializer=fluid.initializer.Uniform(-stdv,
+                                                                        stdv)))
+        return out
-def conv_bn_layer(input,
-                  ch_out,
-                  filter_size,
-                  stride,
-                  padding,
-                  act='relu',
-                  is_train=True):
-    conv1 = fluid.layers.conv2d(
-        input=input,
-        filter_size=filter_size,
-        num_filters=ch_out,
-        stride=stride,
-        padding=padding,
-        act=None,
-        bias_attr=False)
-    return fluid.layers.batch_norm(input=conv1, act=act, is_test=not is_train)
+    def conv_bn_layer(self,
+                      input,
+                      num_filters,
+                      filter_size,
+                      stride=1,
+                      groups=1,
+                      act=None):
+        conv = fluid.layers.conv2d(
+            input=input,
+            num_filters=num_filters,
+            filter_size=filter_size,
+            stride=stride,
+            padding=(filter_size - 1) // 2,
+            groups=groups,
+            act=None,
+            bias_attr=False)
+        return fluid.layers.batch_norm(
+            input=conv, act=act, is_test=not self.is_train)
-def shortcut(input, ch_out, stride, is_train=True):
-    ch_in = input.shape[1]  # if args.data_format == 'NCHW' else input.shape[-1]
-    if ch_in != ch_out:
-        return conv_bn_layer(
-            input, ch_out, 1, stride, 0, None, is_train=is_train)
-    else:
-        return input
+    def shortcut(self, input, ch_out, stride):
+        ch_in = input.shape[1]
+        if ch_in != ch_out or stride != 1:
+            return self.conv_bn_layer(input, ch_out, 1, stride)
+        else:
+            return input
+    def bottleneck_block(self, input, num_filters, stride):
+        conv0 = self.conv_bn_layer(
+            input=input, num_filters=num_filters, filter_size=1, act='relu')
+        conv1 = self.conv_bn_layer(
+            input=conv0,
+            num_filters=num_filters,
+            filter_size=3,
+            stride=stride,
+            act='relu')
+        conv2 = self.conv_bn_layer(
+            input=conv1, num_filters=num_filters * 4, filter_size=1, act=None)
+        short = self.shortcut(input, num_filters * 4, stride)
+        return fluid.layers.elementwise_add(x=short, y=conv2, act='relu')
-def basicblock(input, ch_out, stride, is_train=True):
-    short = shortcut(input, ch_out, stride, is_train=is_train)
-    conv1 = conv_bn_layer(input, ch_out, 3, stride, 1, is_train=is_train)
-    conv2 = conv_bn_layer(conv1, ch_out, 3, 1, 1, act=None, is_train=is_train)
-    return fluid.layers.elementwise_add(x=short, y=conv2, act='relu')
-def bottleneck(input, ch_out, stride, is_train=True):
-    short = shortcut(input, ch_out * 4, stride, is_train=is_train)
-    conv1 = conv_bn_layer(input, ch_out, 1, stride, 0, is_train=is_train)
-    conv2 = conv_bn_layer(conv1, ch_out, 3, 1, 1, is_train=is_train)
-    conv3 = conv_bn_layer(
-        conv2, ch_out * 4, 1, 1, 0, act=None, is_train=is_train)
-    return fluid.layers.elementwise_add(x=short, y=conv3, act='relu')
-def layer_warp(block_func, input, ch_out, count, stride):
-    res_out = block_func(input, ch_out, stride)
-    for i in range(1, count):
-        res_out = block_func(res_out, ch_out, 1)
-    return res_out
-def resnet_imagenet(input,
-                    class_dim,
-                    depth=50,
-                    data_format='NCHW',
-                    is_train=True):
-    cfg = {
-        18: ([2, 2, 2, 1], basicblock),
-        34: ([3, 4, 6, 3], basicblock),
-        50: ([3, 4, 6, 3], bottleneck),
-        101: ([3, 4, 23, 3], bottleneck),
-        152: ([3, 8, 36, 3], bottleneck)
-    }
-    stages, block_func = cfg[depth]
-    conv1 = conv_bn_layer(input, ch_out=64, filter_size=7, stride=2, padding=3)
-    pool1 = fluid.layers.pool2d(
-        input=conv1, pool_type='avg', pool_size=3, pool_stride=2)
-    res1 = layer_warp(block_func, pool1, 64, stages[0], 1)
-    res2 = layer_warp(block_func, res1, 128, stages[1], 2)
-    res3 = layer_warp(block_func, res2, 256, stages[2], 2)
-    res4 = layer_warp(block_func, res3, 512, stages[3], 2)
-    pool2 = fluid.layers.pool2d(
-        input=res4,
-        pool_size=7,
-        pool_type='avg',
-        pool_stride=1,
-        global_pooling=True)
-    out = fluid.layers.fc(input=pool2, size=class_dim, act='softmax')
-    return out
-def resnet_cifar10(input, class_dim, depth=32, data_format='NCHW'):
-    assert (depth - 2) % 6 == 0
-    n = (depth - 2) // 6
-    conv1 = conv_bn_layer(
-        input=input, ch_out=16, filter_size=3, stride=1, padding=1)
-    res1 = layer_warp(basicblock, conv1, 16, n, 1)
-    res2 = layer_warp(basicblock, res1, 32, n, 2)
-    res3 = layer_warp(basicblock, res2, 64, n, 2)
-    pool = fluid.layers.pool2d(
-        input=res3, pool_size=8, pool_type='avg', pool_stride=1)
-    out = fluid.layers.fc(input=pool, size=class_dim, act='softmax')
-    return out
 def _model_reader_dshape_classdim(args, is_train):
-    model = resnet_cifar10
+    model = None
     reader = None
-    if args.data_set == "cifar10":
-        class_dim = 10
-        if args.data_format == 'NCHW':
-            dshape = [3, 32, 32]
-        else:
-            dshape = [32, 32, 3]
-        model = resnet_cifar10
-        if is_train:
-            reader = paddle.dataset.cifar.train10()
-        else:
-            reader = paddle.dataset.cifar.test10()
-    elif args.data_set == "flowers":
+    if args.data_set == "flowers":
         class_dim = 102
         if args.data_format == 'NCHW':
             dshape = [3, 224, 224]
         else:
             dshape = [224, 224, 3]
-        model = resnet_imagenet
         if is_train:
             reader = paddle.dataset.flowers.train()
         else:
@@ -159,7 +152,6 @@ def _model_reader_dshape_classdim(args, is_train):
             dshape = [3, 224, 224]
         else:
            dshape = [224, 224, 3]
-        model = resnet_imagenet
        if not args.data_path:
            raise Exception(
                "Must specify --data_path when training with imagenet")
@@ -173,12 +165,11 @@ def _model_reader_dshape_classdim(args, is_train):
        reader = train(xmap=False)
    else:
        reader = val(xmap=False)
-    return model, reader, dshape, class_dim
+    return reader, dshape, class_dim
 def get_model(args, is_train, main_prog, startup_prog):
-    model, reader, dshape, class_dim = _model_reader_dshape_classdim(args,
-                                                                     is_train)
+    reader, dshape, class_dim = _model_reader_dshape_classdim(args, is_train)
     pyreader = None
     trainer_count = int(os.getenv("PADDLE_TRAINERS"))
@@ -198,7 +189,8 @@ def get_model(args, is_train, main_prog, startup_prog):
        label = fluid.layers.data(
            name='label', shape=[1], dtype='int64')
-        predict = model(input, class_dim, is_train=is_train)
+        model = ResNet(is_train=is_train)
+        predict = model.net(input, class_dim=class_dim)
        cost = fluid.layers.cross_entropy(input=predict, label=label)
        avg_cost = fluid.layers.mean(x=cost)
@@ -208,23 +200,17 @@ def get_model(args, is_train, main_prog, startup_prog):
    # configure optimize
    optimizer = None
    if is_train:
-        if args.use_lars:
-            lars_decay = 1.0
-        else:
-            lars_decay = 0.0
        total_images = 1281167 / trainer_count
-        step = int(total_images / args.batch_size + 1)
+        step = int(total_images / (args.batch_size * args.gpus) + 1)
-        epochs = [30, 60, 80, 90]
+        epochs = [30, 60, 90]
        bd = [step * e for e in epochs]
        base_lr = args.learning_rate
        lr = []
        lr = [base_lr * (0.1**i) for i in range(len(bd) + 1)]
        optimizer = fluid.optimizer.Momentum(
-            learning_rate=base_lr,
-            #learning_rate=fluid.layers.piecewise_decay(
-            #    boundaries=bd, values=lr),
+            learning_rate=fluid.layers.piecewise_decay(
+                boundaries=bd, values=lr),
            momentum=0.9,
            regularization=fluid.regularizer.L2Decay(1e-4))
        optimizer.minimize(avg_cost)
......
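The optimizer change above replaces a fixed learning rate with a piecewise-decay schedule whose boundaries are derived from epoch counts. A short sketch of that computation, with illustrative values standing in for the benchmark's arguments:

```python
import paddle.fluid as fluid

total_images = 1281167            # ImageNet training-set size used above
batch_size, num_gpus = 256, 1     # illustrative stand-ins for args.*
step = int(total_images / (batch_size * num_gpus) + 1)  # iterations per epoch

epochs = [30, 60, 90]
bd = [step * e for e in epochs]   # decay boundaries, in iterations
base_lr = 0.1                     # stand-in for args.learning_rate
lr = [base_lr * (0.1 ** i) for i in range(len(bd) + 1)]  # 0.1, 0.01, ...

# The schedule drops the learning rate by 10x at epochs 30, 60, and 90.
optimizer = fluid.optimizer.Momentum(
    learning_rate=fluid.layers.piecewise_decay(boundaries=bd, values=lr),
    momentum=0.9,
    regularization=fluid.regularizer.L2Decay(1e-4))
```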
@@ -224,11 +224,6 @@ def get_model(args, is_train, main_prog, startup_prog):
     # configure optimize
     optimizer = None
     if is_train:
-        if args.use_lars:
-            lars_decay = 1.0
-        else:
-            lars_decay = 0.0
         total_images = 1281167 / trainer_count
         step = int(total_images / args.batch_size + 1)
......
@@ -244,11 +244,6 @@ def get_model(args, is_train, main_prog, startup_prog):
     optimizer = None
     if is_train:
-        if args.use_lars:
-            lars_decay = 1.0
-        else:
-            lars_decay = 0.0
         total_images = 1281167 / trainer_count
         step = int(total_images / args.batch_size + 1)
@@ -262,8 +257,7 @@ def get_model(args, is_train, main_prog, startup_prog):
         learning_rate=fluid.layers.piecewise_decay(
             boundaries=bd, values=lr),
         momentum=0.9,
-        regularization=fluid.regularizer.L2Decay(1e-4),
-        LARS_weight_decay=lars_decay)
+        regularization=fluid.regularizer.L2Decay(1e-4))
     optimizer.minimize(avg_cost)
     if args.memory_optimize:
......
File mode changed from 100644 to 100755
@@ -40,7 +40,7 @@ set(OPENBLAS_LIB_SEARCH_PATHS
     /usr/local/opt/openblas/lib)
 find_path(OPENBLAS_INC_DIR NAMES cblas.h
-    PATHS ${OPENBLAS_INCLUDE_SEARCH_PATHS})
+    PATHS ${OPENBLAS_INCLUDE_SEARCH_PATHS} NO_DEFAULT_PATH)
 find_path(OPENBLAS_LAPACKE_INC_DIR NAMES lapacke.h
     PATHS ${OPENBLAS_INCLUDE_SEARCH_PATHS})
 find_library(OPENBLAS_LIB NAMES openblas
......
@@ -50,11 +50,7 @@ if(NOT WITH_PROFILER)
 endif(NOT WITH_PROFILER)
 if(NOT CMAKE_CROSSCOMPILING)
-    if(WITH_AVX AND AVX512F_FOUND)
-        set(SIMD_FLAG ${AVX512F_FLAG})
-    elseif(WITH_AVX AND AVX2_FOUND)
-        set(SIMD_FLAG ${AVX2_FLAG})
-    elseif(WITH_AVX AND AVX_FOUND)
+    if(WITH_AVX AND AVX_FOUND)
         set(SIMD_FLAG ${AVX_FLAG})
     elseif(SSE3_FOUND)
         set(SIMD_FLAG ${SSE3_FLAG})
@@ -62,8 +58,26 @@ if(NOT CMAKE_CROSSCOMPILING)
 endif()
 if(WIN32)
-    # windows stupid compile option for all targets.
+    # windows header option for all targets.
     add_definitions(-D_XKEYCHECK_H)
+    # Use symbols instead of absolute path, reduce the cmake link command length.
+    SET(CMAKE_C_USE_RESPONSE_FILE_FOR_LIBRARIES 1)
+    SET(CMAKE_CXX_USE_RESPONSE_FILE_FOR_LIBRARIES 1)
+    SET(CMAKE_C_USE_RESPONSE_FILE_FOR_OBJECTS 1)
+    SET(CMAKE_CXX_USE_RESPONSE_FILE_FOR_OBJECTS 1)
+    SET(CMAKE_C_USE_RESPONSE_FILE_FOR_INCLUDES 1)
+    SET(CMAKE_CXX_USE_RESPONSE_FILE_FOR_INCLUDES 1)
+    SET(CMAKE_C_RESPONSE_FILE_LINK_FLAG "@")
+    SET(CMAKE_CXX_RESPONSE_FILE_LINK_FLAG "@")
+    # Specify the program to use when building static libraries
+    SET(CMAKE_C_CREATE_STATIC_LIBRARY "<CMAKE_AR> lib <TARGET> <LINK_FLAGS> <OBJECTS>")
+    SET(CMAKE_CXX_CREATE_STATIC_LIBRARY "<CMAKE_AR> lib <TARGET> <LINK_FLAGS> <OBJECTS>")
+    # set defination for the dll export
+    if (NOT MSVC)
+        message(FATAL "Windows build only support msvc. Which was binded by the nvcc compiler of NVIDIA.")
+    endif(NOT MSVC)
 endif(WIN32)
 if(NOT WITH_GOLANG)
@@ -204,3 +218,7 @@ endif(WITH_GRPC)
 if(WITH_BRPC_RDMA)
     add_definitions(-DPADDLE_WITH_BRPC_RDMA)
 endif(WITH_BRPC_RDMA)
+if(ON_INFER)
+    add_definitions(-DPADDLE_ON_INFERENCE)
+endif(ON_INFER)
@@ -157,6 +157,9 @@ list(APPEND EXTERNAL_LIBS ${CUDA_LIBRARIES} ${CUDA_rt_LIBRARY})
 if(NOT WITH_DSO)
     # TODO(panyx0718): CUPTI only allows DSO?
     list(APPEND EXTERNAL_LIBS ${CUDNN_LIBRARY} ${CUPTI_LIBRARY} ${CUDA_CUBLAS_LIBRARIES} ${CUDA_curand_LIBRARY} ${NCCL_LIBRARY})
+    if(WIN32)
+        set_property(GLOBAL PROPERTY CUDA_MODULES ${CUDNN_LIBRARY} ${CUDA_CUBLAS_LIBRARIES} ${CUDA_curand_LIBRARY})
+    endif(WIN32)
 endif(NOT WITH_DSO)
 # setting nvcc arch flags
@@ -175,7 +178,10 @@ list(APPEND CUDA_NVCC_FLAGS "-std=c++11")
     list(APPEND CUDA_NVCC_FLAGS "-Xcompiler -fPIC")
 endif(NOT WIN32)
-list(APPEND CUDA_NVCC_FLAGS "--use_fast_math")
+if(WITH_FAST_MATH)
+    # Make use of fast math library. https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html
+    list(APPEND CUDA_NVCC_FLAGS "--use_fast_math")
+endif()
 # in cuda9, suppress cuda warning on eigen
 list(APPEND CUDA_NVCC_FLAGS "-w")
 # Set :expt-relaxed-constexpr to suppress Eigen warnings
@@ -193,10 +199,12 @@ elseif(CMAKE_BUILD_TYPE STREQUAL "MinSizeRel")
     list(APPEND CUDA_NVCC_FLAGS ${CMAKE_CXX_FLAGS_RELEASE})
 endif()
 else(NOT WIN32)
-if(CMAKE_BUILD_TYPE STREQUAL "Release")
+if(CMAKE_BUILD_TYPE STREQUAL "Debug")
+    list(APPEND CUDA_NVCC_FLAGS "-g -G")
+elseif(CMAKE_BUILD_TYPE STREQUAL "Release")
     list(APPEND CUDA_NVCC_FLAGS "-O3 -DNDEBUG")
 else()
-    message(FATAL "Windows only support Release build now. Please set visual studio build type to Release, x64 build.")
+    message(FATAL "Windows only support Release or Debug build now. Please set visual studio build type to Release/Debug, x64 build.")
 endif()
 endif(NOT WIN32)
......
@@ -2,7 +2,12 @@ if(NOT WITH_GPU)
     return()
 endif()
-set(CUDNN_ROOT "/usr" CACHE PATH "CUDNN ROOT")
+if(WIN32)
+    set(CUDNN_ROOT ${CUDA_TOOLKIT_ROOT_DIR})
+else(WIN32)
+    set(CUDNN_ROOT "/usr" CACHE PATH "CUDNN ROOT")
+endif(WIN32)
 find_path(CUDNN_INCLUDE_DIR cudnn.h
     PATHS ${CUDNN_ROOT} ${CUDNN_ROOT}/include
     $ENV{CUDNN_ROOT} $ENV{CUDNN_ROOT}/include ${CUDA_TOOLKIT_INCLUDE}
......
@@ -52,24 +52,27 @@ ExternalProject_Add(
     PREFIX ${ANAKIN_SOURCE_DIR}
     UPDATE_COMMAND ""
     CMAKE_ARGS ${CMAKE_ARGS_PREFIX}
+               -DUSE_LOGGER=YES
               -DUSE_X86_PLACE=YES
               -DBUILD_WITH_UNIT_TEST=NO
               -DPROTOBUF_ROOT=${THIRD_PARTY_PATH}/install/protobuf
               -DMKLML_ROOT=${THIRD_PARTY_PATH}/install/mklml
               -DENABLE_OP_TIMER=${ANAKIN_ENABLE_OP_TIMER}
+               -DBUILD_FAT_BIN=${ANAKIN_BUILD_FAT_BIN}
+               -DBUILD_CROSS_PLANTFORM=${ANAKIN_BUILD_CROSS_PLANTFORM}
               ${EXTERNAL_OPTIONAL_ARGS}
     CMAKE_CACHE_ARGS -DCMAKE_INSTALL_PREFIX:PATH=${ANAKIN_INSTALL_DIR}
 )
 message(STATUS "Anakin for inference is enabled")
 message(STATUS "Anakin is set INCLUDE:${ANAKIN_INCLUDE} LIBRARY:${ANAKIN_LIBRARY}")
+add_dependencies(extern_anakin protobuf mklml)
 add_library(anakin_shared SHARED IMPORTED GLOBAL)
 set_property(TARGET anakin_shared PROPERTY IMPORTED_LOCATION ${ANAKIN_SHARED_LIB})
-add_dependencies(anakin_shared extern_anakin protobuf mklml)
+add_dependencies(anakin_shared extern_anakin)
 add_library(anakin_saber SHARED IMPORTED GLOBAL)
 set_property(TARGET anakin_saber PROPERTY IMPORTED_LOCATION ${ANAKIN_SABER_LIB})
-add_dependencies(anakin_saber extern_anakin protobuf mklml)
+add_dependencies(anakin_saber extern_anakin)
 list(APPEND external_project_dependencies anakin_shared anakin_saber)
@@ -28,34 +28,28 @@ if((NOT DEFINED BOOST_TAR) OR (NOT DEFINED BOOST_URL))
     set(BOOST_TAR "boost_1_41_0" CACHE STRING "" FORCE)
     set(BOOST_URL "http://paddlepaddledeps.cdn.bcebos.com/${BOOST_TAR}.tar.gz" CACHE STRING "" FORCE)
 endif()
-IF (WIN32)
-    MESSAGE(WARNING, "In windows, boost can not be downloaded automaticlly, please build it manually and put it at " ${THIRD_PARTY_PATH}install/boost)
-else()
-    MESSAGE(STATUS "BOOST_TAR: ${BOOST_TAR}, BOOST_URL: ${BOOST_URL}")
-ENDIF(WIN32)
+MESSAGE(STATUS "BOOST_TAR: ${BOOST_TAR}, BOOST_URL: ${BOOST_URL}")
 set(BOOST_SOURCES_DIR ${THIRD_PARTY_PATH}/boost)
 set(BOOST_DOWNLOAD_DIR "${BOOST_SOURCES_DIR}/src/${BOOST_PROJECT}")
-set(BOOST_INCLUDE_DIR "${BOOST_DOWNLOAD_DIR}/${BOOST_TAR}" CACHE PATH "boost include directory." FORCE)
+set(BOOST_INCLUDE_DIR "${BOOST_DOWNLOAD_DIR}" CACHE PATH "boost include directory." FORCE)
 set_directory_properties(PROPERTIES CLEAN_NO_CUSTOM 1)
 include_directories(${BOOST_INCLUDE_DIR})
-if (NOT WIN32)
 ExternalProject_Add(
     ${BOOST_PROJECT}
     ${EXTERNAL_PROJECT_LOG_ARGS}
     DOWNLOAD_DIR ${BOOST_DOWNLOAD_DIR}
-    DOWNLOAD_COMMAND wget --no-check-certificate ${BOOST_URL} -c -q -O ${BOOST_TAR}.tar.gz
-                     && tar zxf ${BOOST_TAR}.tar.gz
+    URL ${BOOST_URL}
     DOWNLOAD_NO_PROGRESS 1
     PREFIX ${BOOST_SOURCES_DIR}
     CONFIGURE_COMMAND ""
     BUILD_COMMAND ""
     INSTALL_COMMAND ""
     UPDATE_COMMAND ""
 )
-endif(NOT WIN32)
 if (${CMAKE_VERSION} VERSION_LESS "3.3.0" OR NOT WIN32)
     set(dummyfile ${CMAKE_CURRENT_BINARY_DIR}/boost_dummy.c)
......
include(ExternalProject)
set(DLPACK_SOURCE_DIR ${THIRD_PARTY_PATH}/dlpack)
set(DLPACK_INCLUDE_DIR ${DLPACK_SOURCE_DIR}/src/extern_dlpack/include)
include_directories(${DLPACK_INCLUDE_DIR})
ExternalProject_Add(
extern_dlpack
${EXTERNAL_PROJECT_LOG_ARGS}
GIT_REPOSITORY "https://github.com/dmlc/dlpack.git"
GIT_TAG "v0.2"
PREFIX ${DLPACK_SOURCE_DIR}
UPDATE_COMMAND ""
CONFIGURE_COMMAND ""
BUILD_COMMAND ""
INSTALL_COMMAND ""
TEST_COMMAND ""
)
if(${CMAKE_VERSION} VERSION_LESS "3.3.0")
set(dummyfile ${CMAKE_CURRENT_BINARY_DIR}/dlpack_dummy.c)
file(WRITE ${dummyfile} "const char *dummy = \"${dummyfile}\";")
add_library(dlpack STATIC ${dummyfile})
else()
add_library(dlpack INTERFACE)
endif()
add_dependencies(dlpack extern_dlpack)
LIST(APPEND externl_project_dependencies dlpack)
@@ -3,13 +3,21 @@ INCLUDE(ExternalProject)
 SET(EIGEN_SOURCE_DIR ${THIRD_PARTY_PATH}/eigen3)
 SET(EIGEN_INCLUDE_DIR ${EIGEN_SOURCE_DIR}/src/extern_eigen3)
 INCLUDE_DIRECTORIES(${EIGEN_INCLUDE_DIR})
+if(NOT WITH_FAST_MATH)
+    # EIGEN_FAST_MATH: https://eigen.tuxfamily.org/dox/TopicPreprocessorDirectives.html
+    # enables some optimizations which might affect the accuracy of the result.
+    # This currently enables the SSE vectorization of sin() and cos(),
+    # and speedups sqrt() for single precision.
+    # Defined to 1 by default. Define it to 0 to disable.
+    add_definitions(-DEIGEN_FAST_MATH=0)
+endif()
 if(WITH_AMD_GPU)
     ExternalProject_Add(
         extern_eigen3
         ${EXTERNAL_PROJECT_LOG_ARGS}
         GIT_REPOSITORY "https://github.com/sabreshao/hipeigen.git"
-        GIT_TAG 0cba03ff9f8f9f70bbd92ac5857b031aa8fed6f9
+        GIT_TAG 7cb2b6e5a4b4a1efe658abb215cd866c6fb2275e
         PREFIX ${EIGEN_SOURCE_DIR}
         UPDATE_COMMAND ""
         CONFIGURE_COMMAND ""
......
@@ -35,7 +35,12 @@ ExternalProject_Add(
     CMAKE_ARGS -DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER}
                -DCMAKE_C_COMPILER=${CMAKE_C_COMPILER}
                -DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS}
+               -DCMAKE_CXX_FLAGS_RELEASE=${CMAKE_CXX_FLAGS_RELEASE}
+               -DCMAKE_CXX_FLAGS_DEBUG=${CMAKE_CXX_FLAGS_DEBUG}
               -DCMAKE_C_FLAGS=${CMAKE_C_FLAGS}
+               -DCMAKE_C_FLAGS_DEBUG=${CMAKE_C_FLAGS_DEBUG}
+               -DCMAKE_C_FLAGS_RELEASE=${CMAKE_C_FLAGS_RELEASE}
+               -DBUILD_STATIC_LIBS=ON
               -DCMAKE_INSTALL_PREFIX=${GFLAGS_INSTALL_DIR}
               -DCMAKE_POSITION_INDEPENDENT_CODE=ON
               -DBUILD_TESTING=OFF
@@ -48,7 +53,7 @@ ExternalProject_Add(
 IF(WIN32)
     IF(NOT EXISTS "${GFLAGS_INSTALL_DIR}/lib/libgflags.lib")
         add_custom_command(TARGET extern_gflags POST_BUILD
-            COMMAND cmake -E rename ${GFLAGS_INSTALL_DIR}/lib/gflags_static.lib ${GFLAGS_INSTALL_DIR}/lib/libgflags.lib
+            COMMAND cmake -E copy ${GFLAGS_INSTALL_DIR}/lib/gflags_static.lib ${GFLAGS_INSTALL_DIR}/lib/libgflags.lib
        )
    ENDIF()
 ENDIF(WIN32)
......
@@ -46,7 +46,11 @@ ExternalProject_Add(
    CMAKE_ARGS  -DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER}
                -DCMAKE_C_COMPILER=${CMAKE_C_COMPILER}
                -DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS}
                -DCMAKE_CXX_FLAGS_RELEASE=${CMAKE_CXX_FLAGS_RELEASE}
                -DCMAKE_CXX_FLAGS_DEBUG=${CMAKE_CXX_FLAGS_DEBUG}
                -DCMAKE_C_FLAGS=${CMAKE_C_FLAGS}
                -DCMAKE_C_FLAGS_DEBUG=${CMAKE_C_FLAGS_DEBUG}
                -DCMAKE_C_FLAGS_RELEASE=${CMAKE_C_FLAGS_RELEASE}
                -DCMAKE_INSTALL_PREFIX=${GLOG_INSTALL_DIR}
                -DCMAKE_INSTALL_LIBDIR=${GLOG_INSTALL_DIR}/lib
                -DCMAKE_POSITION_INDEPENDENT_CODE=ON
@@ -63,7 +67,7 @@ ExternalProject_Add(
IF(WIN32)
    IF(NOT EXISTS "${GLOG_INSTALL_DIR}/lib/libglog.lib")
        add_custom_command(TARGET extern_glog POST_BUILD
-           COMMAND cmake -E rename ${GLOG_INSTALL_DIR}/lib/glog.lib ${GLOG_INSTALL_DIR}/lib/libglog.lib
            COMMAND cmake -E copy ${GLOG_INSTALL_DIR}/lib/glog.lib ${GLOG_INSTALL_DIR}/lib/libglog.lib
        )
    ENDIF()
ENDIF(WIN32)
......
@@ -50,7 +50,11 @@ IF(WITH_TESTING)
    CMAKE_ARGS  -DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER}
                -DCMAKE_C_COMPILER=${CMAKE_C_COMPILER}
                -DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS}
                -DCMAKE_CXX_FLAGS_RELEASE=${CMAKE_CXX_FLAGS_RELEASE}
                -DCMAKE_CXX_FLAGS_DEBUG=${CMAKE_CXX_FLAGS_DEBUG}
                -DCMAKE_C_FLAGS=${CMAKE_C_FLAGS}
                -DCMAKE_C_FLAGS_DEBUG=${CMAKE_C_FLAGS_DEBUG}
                -DCMAKE_C_FLAGS_RELEASE=${CMAKE_C_FLAGS_RELEASE}
                -DCMAKE_INSTALL_PREFIX=${GTEST_INSTALL_DIR}
                -DCMAKE_POSITION_INDEPENDENT_CODE=ON
                -DBUILD_GMOCK=ON
......
@@ -37,7 +37,6 @@ SET(CMAKE_INSTALL_RPATH_USE_LINK_PATH TRUE)
SET(CMAKE_INSTALL_RPATH "${CMAKE_INSTALL_RPATH}" "${MKLDNN_INSTALL_DIR}/lib")
INCLUDE_DIRECTORIES(${MKLDNN_INC_DIR}) # For MKLDNN code to include internal headers.
- INCLUDE_DIRECTORIES(${THIRD_PARTY_PATH}/install) # For Paddle code to include mkldnn.h
IF(${CBLAS_PROVIDER} STREQUAL "MKLML")
    SET(MKLDNN_DEPENDS ${MKLML_PROJECT})
@@ -45,7 +44,7 @@ IF(${CBLAS_PROVIDER} STREQUAL "MKLML")
ELSE()
    MESSAGE(FATAL_ERROR "Should enable MKLML when build MKLDNN")
ENDIF()
- SET(MKLDNN_FLAG "-Wno-error=strict-overflow -Wno-error=unused-result")
SET(MKLDNN_FLAG "-Wno-error=strict-overflow -Wno-error=unused-result -Wno-error=array-bounds")
SET(MKLDNN_FLAG "${MKLDNN_FLAG} -Wno-unused-result -Wno-unused-value")
SET(MKLDNN_CFLAG "${CMAKE_C_FLAGS} ${MKLDNN_FLAG}")
SET(MKLDNN_CXXFLAG "${CMAKE_CXX_FLAGS} ${MKLDNN_FLAG}")
@@ -54,7 +53,7 @@ ExternalProject_Add(
    ${EXTERNAL_PROJECT_LOG_ARGS}
    DEPENDS ${MKLDNN_DEPENDS}
    GIT_REPOSITORY "https://github.com/01org/mkl-dnn.git"
-   GIT_TAG "64e03a1939e0d526aa8e9f2e3f7dc0ad8d372944"
    GIT_TAG "830a10059a018cd2634d94195140cf2d8790a75a"
    PREFIX ${MKLDNN_SOURCES_DIR}
    UPDATE_COMMAND ""
    CMAKE_ARGS -DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER}
......
@@ -29,7 +29,7 @@ INCLUDE(ExternalProject)
SET(MKLML_PROJECT "extern_mklml")
IF((NOT DEFINED MKLML_VER) OR (NOT DEFINED MKLML_URL))
    MESSAGE(STATUS "use pre defined download url")
-   SET(MKLML_VER "mklml_lnx_2018.0.3.20180406" CACHE STRING "" FORCE)
    SET(MKLML_VER "mklml_lnx_2019.0.20180710" CACHE STRING "" FORCE)
    SET(MKLML_URL "http://paddlepaddledeps.cdn.bcebos.com/${MKLML_VER}.tgz" CACHE STRING "" FORCE)
ENDIF()
MESSAGE(STATUS "MKLML_VER: ${MKLML_VER}, MKLML_URL: ${MKLML_URL}")
......
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
add_library(ngraph INTERFACE)
IF(WIN32 OR APPLE)
MESSAGE(WARNING
"Windows or Mac is not supported with nGraph in Paddle yet."
"Force WITH_NGRAPH=OFF")
SET(WITH_NGRAPH OFF CACHE STRING "Disable nGraph in Windows and MacOS" FORCE)
ENDIF()
IF(${WITH_NGRAPH} AND NOT ${WITH_MKLDNN})
MESSAGE(WARNING
"nGraph needs mkl-dnn to be enabled."
"Force WITH_NGRAPH=OFF")
SET(WITH_NGRAPH OFF CACHE STRING "Disable nGraph if mkl-dnn is disabled" FORCE)
ENDIF()
IF(NOT ${WITH_NGRAPH})
return()
ENDIF()
INCLUDE(ExternalProject)
SET(NGRAPH_PROJECT "extern_ngraph")
SET(NGRAPH_VERSION "0.9")
SET(NGRAPH_GIT_TAG "f9fd9d4cc318dc59dd4b68448e7fbb5f67a28bd0")
SET(NGRAPH_SOURCES_DIR ${THIRD_PARTY_PATH}/ngraph)
SET(NGRAPH_INSTALL_DIR ${THIRD_PARTY_PATH}/install/ngraph)
SET(NGRAPH_INC_DIR ${NGRAPH_INSTALL_DIR}/include)
SET(NGRAPH_SHARED_LIB_NAME libngraph.so.${NGRAPH_VERSION})
SET(NGRAPH_CPU_LIB_NAME libcpu_backend.so)
SET(NGRAPH_TBB_LIB_NAME libtbb.so.2)
SET(NGRAPH_GIT_REPO "https://github.com/NervanaSystems/ngraph.git")
ExternalProject_Add(
${NGRAPH_PROJECT}
${EXTERNAL_PROJECT_LOG_ARGS}
DEPENDS ${MKLDNN_PROJECT} ${MKLML_PROJECT}
GIT_REPOSITORY ${NGRAPH_GIT_REPO}
GIT_TAG ${NGRAPH_GIT_TAG}
PREFIX ${NGRAPH_SOURCES_DIR}
UPDATE_COMMAND ""
CMAKE_ARGS -DCMAKE_INSTALL_PREFIX=${NGRAPH_INSTALL_DIR}
CMAKE_ARGS -DNGRAPH_UNIT_TEST_ENABLE=FALSE
CMAKE_ARGS -DNGRAPH_TOOLS_ENABLE=FALSE
CMAKE_ARGS -DNGRAPH_INTERPRETER_ENABLE=FALSE
CMAKE_ARGS -DNGRAPH_DEX_ONLY=TRUE
CMAKE_ARGS -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE}
CMAKE_ARGS -DMKLDNN_INCLUDE_DIR=${MKLDNN_INC_DIR}
CMAKE_ARGS -DMKLDNN_LIB_DIR=${MKLDNN_INSTALL_DIR}/lib
)
if(UNIX AND NOT APPLE)
include(GNUInstallDirs)
SET(NGRAPH_LIB_DIR ${NGRAPH_INSTALL_DIR}/${CMAKE_INSTALL_LIBDIR})
else()
SET(NGRAPH_LIB_DIR ${NGRAPH_INSTALL_DIR}/lib)
endif()
MESSAGE(STATUS "nGraph lib will be installed at: ${NGRAPH_LIB_DIR}")
SET(NGRAPH_SHARED_LIB ${NGRAPH_LIB_DIR}/${NGRAPH_SHARED_LIB_NAME})
SET(NGRAPH_CPU_LIB ${NGRAPH_LIB_DIR}/${NGRAPH_CPU_LIB_NAME})
SET(NGRAPH_TBB_LIB ${NGRAPH_LIB_DIR}/${NGRAPH_TBB_LIB_NAME})
# Workaround for nGraph expecting mklml to be in mkldnn install directory.
ExternalProject_Add_Step(
${NGRAPH_PROJECT}
PrepareMKL
COMMAND ${CMAKE_COMMAND} -E create_symlink ${MKLML_LIB} ${MKLDNN_INSTALL_DIR}/lib/libmklml_intel.so
COMMAND ${CMAKE_COMMAND} -E create_symlink ${MKLML_IOMP_LIB} ${MKLDNN_INSTALL_DIR}/lib/libiomp5.so
DEPENDEES download
DEPENDERS configure
)
add_dependencies(ngraph ${NGRAPH_PROJECT})
target_compile_definitions(ngraph INTERFACE -DPADDLE_WITH_NGRAPH)
target_include_directories(ngraph INTERFACE ${NGRAPH_INC_DIR})
target_link_libraries(ngraph INTERFACE ${NGRAPH_SHARED_LIB})
LIST(APPEND external_project_dependencies ngraph)
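Because ngraph is declared as an INTERFACE target, the include path, the PADDLE_WITH_NGRAPH definition, and libngraph.so all propagate transitively, so a consumer needs only a single link line. A hypothetical usage sketch:

# Hypothetical consumer target; bridge.cc is a placeholder source file.
add_library(ngraph_bridge STATIC bridge.cc)
target_link_libraries(ngraph_bridge PRIVATE ngraph)      # pulls headers, define, and shared lib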
@@ -17,23 +17,20 @@ IF(USE_EIGEN_FOR_BLAS)
ENDIF(USE_EIGEN_FOR_BLAS)
INCLUDE(cblas)
- # IF(WIN32 AND NOT ${CBLAS_FOUND})
IF(NOT ${CBLAS_FOUND})
    INCLUDE(ExternalProject)
    SET(CBLAS_SOURCES_DIR ${THIRD_PARTY_PATH}/openblas)
    SET(CBLAS_INSTALL_DIR ${THIRD_PARTY_PATH}/install/openblas)
-   SET(CBLAS_INCLUDE_DIR "${CBLAS_INSTALL_DIR}/include" CACHE PATH "openblas include directory." FORCE)
    SET(CBLAS_INC_DIR "${CBLAS_INSTALL_DIR}/include" CACHE PATH "openblas include directory." FORCE)
    SET(CBLAS_LIBRARIES
        "${CBLAS_INSTALL_DIR}/lib/${CMAKE_STATIC_LIBRARY_PREFIX}openblas${CMAKE_STATIC_LIBRARY_SUFFIX}"
        CACHE FILEPATH "openblas library." FORCE)
    ADD_DEFINITIONS(-DPADDLE_USE_OPENBLAS)
    IF (WIN32)
        SET(CBLAS_FOUND true)
        MESSAGE(WARNING, "In windows, openblas only support msvc build, please build it manually and put it at " ${CBLAS_INSTALL_DIR})
@@ -96,7 +93,7 @@ IF(NOT ${CBLAS_FOUND})
    ENDIF(NOT WIN32)
    SET(CBLAS_PROVIDER openblas)
    IF(WITH_C_API)
-       INSTALL(DIRECTORY ${CBLAS_INCLUDE_DIR} DESTINATION third_party/openblas)
        INSTALL(DIRECTORY ${CBLAS_INC_DIR} DESTINATION third_party/openblas)
        # Because libopenblas.a is a symbolic link of another library, thus need to
        # install the whole directory.
        IF(ANDROID)
@@ -117,8 +114,8 @@ IF(NOT ${CBLAS_FOUND})
ENDIF(NOT ${CBLAS_FOUND})
MESSAGE(STATUS "BLAS library: ${CBLAS_LIBRARIES}")
- MESSAGE(STATUS "BLAS Include: ${CBLAS_INCLUDE_DIR}")
MESSAGE(STATUS "BLAS Include: ${CBLAS_INC_DIR}")
- INCLUDE_DIRECTORIES(${CBLAS_INCLUDE_DIR})
INCLUDE_DIRECTORIES(${CBLAS_INC_DIR})
# FIXME(gangliao): generate cblas target to track all high performance
# linear algebra libraries for cc_library(xxx SRCS xxx.c DEPS cblas)
......
@@ -30,9 +30,7 @@ UNSET_VAR(PROTOBUF_LITE_LIBRARY)
UNSET_VAR(PROTOBUF_LIBRARY)
UNSET_VAR(PROTOBUF_INCLUDE_DIR)
UNSET_VAR(Protobuf_PROTOC_EXECUTABLE)
- if(NOT COMMAND protobuf_generate_python)  # before cmake 3.4, protobuf_generate_python is not defined.
-     function(protobuf_generate_python SRCS)
function(protobuf_generate_python SRCS)
    # shameless copy from https://github.com/Kitware/CMake/blob/master/Modules/FindProtobuf.cmake
    if(NOT ARGN)
        message(SEND_ERROR "Error: PROTOBUF_GENERATE_PYTHON() called without any proto files")
@@ -52,7 +50,6 @@ if(NOT COMMAND protobuf_generate_python)
    else()
        set(_protobuf_include_path -I ${CMAKE_CURRENT_SOURCE_DIR})
    endif()
    if(DEFINED PROTOBUF_IMPORT_DIRS AND NOT DEFINED Protobuf_IMPORT_DIRS)
        set(Protobuf_IMPORT_DIRS "${PROTOBUF_IMPORT_DIRS}")
    endif()
@@ -77,19 +74,17 @@ if(NOT COMMAND protobuf_generate_python)
            set(FIL_WE "${FIL_DIR}/${FIL_WE}")
        endif()
    endif()
    list(APPEND ${SRCS} "${CMAKE_CURRENT_BINARY_DIR}/${FIL_WE}_pb2.py")
    add_custom_command(
        OUTPUT "${CMAKE_CURRENT_BINARY_DIR}/${FIL_WE}_pb2.py"
-       COMMAND ${Protobuf_PROTOC_EXECUTABLE} --python_out ${CMAKE_CURRENT_BINARY_DIR} ${_protobuf_include_path} ${ABS_FIL}
        COMMAND ${PROTOBUF_PROTOC_EXECUTABLE} --python_out ${CMAKE_CURRENT_BINARY_DIR} ${_protobuf_include_path} ${ABS_FIL}
-       DEPENDS ${ABS_FIL} ${Protobuf_PROTOC_EXECUTABLE}
        DEPENDS ${ABS_FIL} ${PROTOBUF_PROTOC_EXECUTABLE}
        COMMENT "Running Python protocol buffer compiler on ${FIL}"
        VERBATIM )
    endforeach()
    set(${SRCS} ${${SRCS}} PARENT_SCOPE)
endfunction()
- endif()
# Print and set the protobuf library information,
# finish this cmake process and exit from this file.
@@ -126,6 +121,7 @@ macro(PROMPT_PROTOBUF_LIB)
    # FIND_Protobuf.cmake uses `Protobuf_PROTOC_EXECUTABLE`.
    # make `protobuf_generate_cpp` happy.
    SET(Protobuf_PROTOC_EXECUTABLE ${PROTOBUF_PROTOC_EXECUTABLE})
    FOREACH(dep ${protobuf_DEPS})
        ADD_DEPENDENCIES(protobuf ${dep})
        ADD_DEPENDENCIES(protobuf_lite ${dep})
@@ -144,7 +140,6 @@ endmacro()
set(PROTOBUF_ROOT "" CACHE PATH "Folder contains protobuf")
IF (WIN32)
    SET(PROTOBUF_ROOT ${THIRD_PARTY_PATH}/install/protobuf)
-   MESSAGE(WARNING, "In windows, protobuf only support msvc build, please build it manually and put it at " ${PROTOBUF_ROOT})
ENDIF(WIN32)
if (NOT "${PROTOBUF_ROOT}" STREQUAL "")
@@ -192,13 +187,20 @@ FUNCTION(build_protobuf TARGET_NAME BUILD_FOR_HOST)
    SET(OPTIONAL_ARGS
        "-DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER}"
        "-DCMAKE_C_COMPILER=${CMAKE_C_COMPILER}"
-       "-DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS}"
        "-DCMAKE_C_FLAGS=${CMAKE_C_FLAGS}"
        "-DCMAKE_C_FLAGS_DEBUG=${CMAKE_C_FLAGS_DEBUG}"
        "-DCMAKE_C_FLAGS_RELEASE=${CMAKE_C_FLAGS_RELEASE}"
        "-DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS}"
        "-DCMAKE_CXX_FLAGS_RELEASE=${CMAKE_CXX_FLAGS_RELEASE}"
        "-DCMAKE_CXX_FLAGS_DEBUG=${CMAKE_CXX_FLAGS_DEBUG}"
        "-Dprotobuf_WITH_ZLIB=ON"
        "-DZLIB_ROOT:FILEPATH=${ZLIB_ROOT}"
        ${EXTERNAL_OPTIONAL_ARGS})
    SET(OPTIONAL_CACHE_ARGS "-DZLIB_ROOT:STRING=${ZLIB_ROOT}")
ENDIF()
IF(WIN32)
    SET(OPTIONAL_ARGS ${OPTIONAL_ARGS} "-DCMAKE_GENERATOR_PLATFORM=x64")
ENDIF()
SET(PROTOBUF_REPO "https://github.com/google/protobuf.git")
SET(PROTOBUF_TAG "9f75c5aa851cd877fb0d93ccc31b8567a6706546")
......
@@ -21,6 +21,48 @@ INCLUDE(python_module)
FIND_PACKAGE(PythonInterp ${PY_VERSION})
FIND_PACKAGE(PythonLibs ${PY_VERSION})
if(WIN32)
execute_process(COMMAND "${PYTHON_EXECUTABLE}" "-c"
"from distutils import sysconfig as s;import sys;import struct;
print(sys.prefix);
print(s.get_config_var('LDVERSION') or s.get_config_var('VERSION'));
"
RESULT_VARIABLE _PYTHON_SUCCESS
OUTPUT_VARIABLE _PYTHON_VALUES
ERROR_VARIABLE _PYTHON_ERROR_VALUE)
if(NOT _PYTHON_SUCCESS MATCHES 0)
set(PYTHONLIBS_FOUND FALSE)
return()
endif()
# Convert the process output into a list
string(REGEX REPLACE ";" "\\\\;" _PYTHON_VALUES ${_PYTHON_VALUES})
string(REGEX REPLACE "\n" ";" _PYTHON_VALUES ${_PYTHON_VALUES})
list(GET _PYTHON_VALUES 0 PYTHON_PREFIX)
list(GET _PYTHON_VALUES 1 PYTHON_LIBRARY_SUFFIX)
# Make sure all directory separators are '/'
string(REGEX REPLACE "\\\\" "/" PYTHON_PREFIX ${PYTHON_PREFIX})
set(PYTHON_LIBRARY
"${PYTHON_PREFIX}/libs/Python${PYTHON_LIBRARY_SUFFIX}.lib")
# when run in a venv, PYTHON_PREFIX points to it. But the libraries remain in the
# original python installation. They may be found relative to PYTHON_INCLUDE_DIR.
if(NOT EXISTS "${PYTHON_LIBRARY}")
get_filename_component(_PYTHON_ROOT ${PYTHON_INCLUDE_DIR} DIRECTORY)
set(PYTHON_LIBRARY
"${_PYTHON_ROOT}/libs/Python${PYTHON_LIBRARY_SUFFIX}.lib")
endif()
# raise an error if the python libs are still not found.
if(NOT EXISTS "${PYTHON_LIBRARY}")
message(FATAL_ERROR "Python libraries not found")
endif()
SET(PYTHON_LIBRARIES "${PYTHON_LIBRARY}")
endif(WIN32)
# Fixme: Maybe find a static library. Get SHARED/STATIC by FIND_PACKAGE.
ADD_LIBRARY(python SHARED IMPORTED GLOBAL)
SET_PROPERTY(TARGET python PROPERTY IMPORTED_LOCATION ${PYTHON_LIBRARIES})
......
if (NOT WITH_AMD_GPU)
return()
endif()
# rocprim is "ROCm Parallel Primitives" for short.
# It is a header-only library providing HIP and HC parallel primitives
# for developing performant GPU-accelerated code on AMD ROCm platform.
if("x${HCC_HOME}" STREQUAL "x")
set(HCC_HOME "/opt/rocm/hcc")
endif()
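The "x${HCC_HOME}" STREQUAL "x" comparison above is the traditional guard against an empty or undefined variable. On the CMake versions targeted here the same default could be written as the sketch below, which is equivalent as long as HCC_HOME never holds a false-like value such as OFF:

if(NOT HCC_HOME)
    set(HCC_HOME "/opt/rocm/hcc")
endif()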
INCLUDE(ExternalProject)
SET(ROCPRIM_SOURCE_DIR ${THIRD_PARTY_PATH}/rocprim)
SET(ROCPRIM_INSTALL_DIR ${THIRD_PARTY_PATH}/install/rocprim)
SET(ROCPRIM_INCLUDE_DIR ${ROCPRIM_INSTALL_DIR}/include)
ExternalProject_Add(
extern_rocprim
GIT_REPOSITORY "https://github.com/ROCmSoftwarePlatform/rocPRIM.git"
GIT_TAG 5bd41b96ab8d8343330fb2c3e1b96775bde3b3fc
PREFIX ${ROCPRIM_SOURCE_DIR}
UPDATE_COMMAND ""
CMAKE_ARGS -DCMAKE_CXX_COMPILER=${HCC_HOME}/bin/hcc
CMAKE_ARGS -DONLY_INSTALL=ON
CMAKE_ARGS -DBUILD_TEST=OFF
CMAKE_ARGS -DCMAKE_INSTALL_PREFIX=${ROCPRIM_INSTALL_DIR}
INSTALL_DIR ${ROCPRIM_INSTALL_DIR}
${EXTERNAL_PROJECT_LOG_ARGS}
)
INCLUDE_DIRECTORIES(${ROCPRIM_INCLUDE_DIR})
if (${CMAKE_VERSION} VERSION_LESS "3.3.0")
set(dummyfile ${CMAKE_CURRENT_BINARY_DIR}/rocprim_dummy.c)
file(WRITE ${dummyfile} "const char *dummy_rocprim = \"${dummyfile}\";")
add_library(rocprim STATIC ${dummyfile})
else()
add_library(rocprim INTERFACE)
endif()
add_dependencies(rocprim extern_rocprim)
@@ -24,7 +24,11 @@ set(SNAPPY_SOURCES_DIR ${THIRD_PARTY_PATH}/snappy)
set(SNAPPY_INSTALL_DIR ${THIRD_PARTY_PATH}/install/snappy)
set(SNAPPY_INCLUDE_DIR "${SNAPPY_INSTALL_DIR}/include" CACHE PATH "snappy include directory." FORCE)
- set(SNAPPY_LIBRARIES "${SNAPPY_INSTALL_DIR}/lib/libsnappy.a")
if (WIN32)
    set(SNAPPY_LIBRARIES "${SNAPPY_INSTALL_DIR}/lib/snappy.lib")
else(WIN32)
    set(SNAPPY_LIBRARIES "${SNAPPY_INSTALL_DIR}/lib/libsnappy.a")
endif (WIN32)
ExternalProject_Add(
    extern_snappy
@@ -34,8 +38,12 @@ ExternalProject_Add(
    UPDATE_COMMAND ""
    CMAKE_ARGS  -DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER}
                -DCMAKE_C_COMPILER=${CMAKE_C_COMPILER}
-               -DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS}
                -DCMAKE_C_FLAGS=${CMAKE_C_FLAGS}
                -DCMAKE_C_FLAGS_DEBUG=${CMAKE_C_FLAGS_DEBUG}
                -DCMAKE_C_FLAGS_RELEASE=${CMAKE_C_FLAGS_RELEASE}
                -DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS}
                -DCMAKE_CXX_FLAGS_RELEASE=${CMAKE_CXX_FLAGS_RELEASE}
                -DCMAKE_CXX_FLAGS_DEBUG=${CMAKE_CXX_FLAGS_DEBUG}
                -DCMAKE_INSTALL_PREFIX=${SNAPPY_INSTALL_DIR}
                -DCMAKE_INSTALL_LIBDIR=${SNAPPY_INSTALL_DIR}/lib
                -DCMAKE_POSITION_INDEPENDENT_CODE=ON
......
@@ -18,15 +18,19 @@ ENDIF()
include (ExternalProject)
# NOTE: snappy is needed when linking with recordio
set(SNAPPYSTREAM_SOURCES_DIR ${THIRD_PARTY_PATH}/snappy_stream)
set(SNAPPYSTREAM_INSTALL_DIR ${THIRD_PARTY_PATH}/install/snappy_stream)
set(SNAPPYSTREAM_INCLUDE_DIR "${SNAPPYSTREAM_INSTALL_DIR}/include" CACHE PATH "snappy stream include directory." FORCE)
- set(SNAPPYSTREAM_LIBRARIES "${SNAPPYSTREAM_INSTALL_DIR}/lib/libsnappystream.a")
if(WIN32)
    # Fix me: VS2015 comes without VLA support
    set(SNAPPYSTREAM_LIBRARIES "${SNAPPYSTREAM_INSTALL_DIR}/lib/snappystream.lib")
    MESSAGE(WARNING, "In windows, snappystream has no compile support for windows,
     please build it manually and put it at " ${SNAPPYSTREAM_INSTALL_DIR})
else(WIN32)
    set(SNAPPYSTREAM_LIBRARIES "${SNAPPYSTREAM_INSTALL_DIR}/lib/libsnappystream.a")
    ExternalProject_Add(
        extern_snappystream
        GIT_REPOSITORY "https://github.com/hoxnox/snappystream.git"
        GIT_TAG "0.2.8"
@@ -34,8 +38,12 @@ ExternalProject_Add(
        UPDATE_COMMAND ""
        CMAKE_ARGS  -DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER}
                    -DCMAKE_C_COMPILER=${CMAKE_C_COMPILER}
-                   -DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS}
                    -DCMAKE_C_FLAGS=${CMAKE_C_FLAGS}
                    -DCMAKE_C_FLAGS_DEBUG=${CMAKE_C_FLAGS_DEBUG}
                    -DCMAKE_C_FLAGS_RELEASE=${CMAKE_C_FLAGS_RELEASE}
                    -DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS}
                    -DCMAKE_CXX_FLAGS_RELEASE=${CMAKE_CXX_FLAGS_RELEASE}
                    -DCMAKE_CXX_FLAGS_DEBUG=${CMAKE_CXX_FLAGS_DEBUG}
                    -DCMAKE_INSTALL_PREFIX=${SNAPPY_INSTALL_DIR}
                    -DCMAKE_INSTALL_LIBDIR=${SNAPPY_INSTALL_DIR}/lib
                    -DCMAKE_POSITION_INDEPENDENT_CODE=ON
@@ -47,7 +55,8 @@ ExternalProject_Add(
                    -DCMAKE_INSTALL_LIBDIR:PATH=${SNAPPYSTREAM_INSTALL_DIR}/lib
                    -DCMAKE_BUILD_TYPE:STRING=${THIRD_PARTY_BUILD_TYPE}
        DEPENDS snappy
    )
endif(WIN32)
add_library(snappystream STATIC IMPORTED GLOBAL)
set_property(TARGET snappystream PROPERTY IMPORTED_LOCATION ${SNAPPYSTREAM_LIBRARIES})
......
INCLUDE(ExternalProject)
set(XXHASH_SOURCE_DIR ${THIRD_PARTY_PATH}/xxhash)
set(XXHASH_INSTALL_DIR ${THIRD_PARTY_PATH}/install/xxhash)
set(XXHASH_INCLUDE_DIR "${XXHASH_INSTALL_DIR}/include")
IF(WITH_STATIC_LIB)
SET(BUILD_CMD make lib)
ELSE()
IF(APPLE)
SET(BUILD_CMD sed -i \"\" "s/-Wstrict-prototypes -Wundef/-Wstrict-prototypes -Wundef -fPIC/g" ${XXHASH_SOURCE_DIR}/src/extern_xxhash/Makefile && make lib)
ELSE(APPLE)
SET(BUILD_CMD sed -i "s/-Wstrict-prototypes -Wundef/-Wstrict-prototypes -Wundef -fPIC/g" ${XXHASH_SOURCE_DIR}/src/extern_xxhash/Makefile && make lib)
ENDIF(APPLE)
ENDIF()
if(WIN32)
ExternalProject_Add(
extern_xxhash
${EXTERNAL_PROJECT_LOG_ARGS}
GIT_REPOSITORY "https://github.com/Cyan4973/xxHash"
GIT_TAG "v0.6.5"
PREFIX ${XXHASH_SOURCE_DIR}
DOWNLOAD_NAME "xxhash"
UPDATE_COMMAND ""
BUILD_IN_SOURCE 1
PATCH_COMMAND
CONFIGURE_COMMAND
${CMAKE_COMMAND} ${XXHASH_SOURCE_DIR}/src/extern_xxhash/cmake_unofficial
-DCMAKE_INSTALL_PREFIX:PATH=${XXHASH_INSTALL_DIR}
-DCMAKE_BUILD_TYPE:STRING=${CMAKE_BUILD_TYPE}
-DCMAKE_VERBOSE_MAKEFILE:BOOL=OFF
-DCMAKE_POSITION_INDEPENDENT_CODE:BOOL=ON
-DBUILD_XXHSUM=OFF
-DCMAKE_GENERATOR_PLATFORM=x64
-DBUILD_SHARED_LIBS=OFF
${OPTIONAL_CACHE_ARGS}
TEST_COMMAND ""
)
else()
ExternalProject_Add(
extern_xxhash
${EXTERNAL_PROJECT_LOG_ARGS}
GIT_REPOSITORY "https://github.com/Cyan4973/xxHash"
GIT_TAG "v0.6.5"
PREFIX ${XXHASH_SOURCE_DIR}
DOWNLOAD_NAME "xxhash"
UPDATE_COMMAND ""
CONFIGURE_COMMAND ""
BUILD_IN_SOURCE 1
PATCH_COMMAND
BUILD_COMMAND ${BUILD_CMD}
INSTALL_COMMAND export PREFIX=${XXHASH_INSTALL_DIR}/ && make install
TEST_COMMAND ""
)
endif()
if (WIN32)
set(XXHASH_LIBRARIES "${XXHASH_INSTALL_DIR}/lib/xxhash.lib")
else()
set(XXHASH_LIBRARIES "${XXHASH_INSTALL_DIR}/lib/libxxhash.a")
endif ()
INCLUDE_DIRECTORIES(${XXHASH_INCLUDE_DIR})
add_library(xxhash STATIC IMPORTED GLOBAL)
set_property(TARGET xxhash PROPERTY IMPORTED_LOCATION ${XXHASH_LIBRARIES})
add_dependencies(xxhash extern_xxhash)
LIST(APPEND external_project_dependencies xxhash)
IF(WITH_C_API)
INSTALL(DIRECTORY ${XXHASH_INCLUDE_DIR} DESTINATION third_party/xxhash)
IF(ANDROID)
INSTALL(FILES ${XXHASH_LIBRARIES} DESTINATION third_party/xxhash/lib/${ANDROID_ABI})
ELSE()
INSTALL(FILES ${XXHASH_LIBRARIES} DESTINATION third_party/xxhash/lib)
ENDIF()
ENDIF()
@@ -27,7 +27,6 @@ endfunction()
CheckCompilerCXX11Flag()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")
# safe_set_flag
#
# Set a compile flag only if the compiler supports it
@@ -71,6 +70,20 @@ macro(safe_set_nvflag flag_name)
    endif()
endmacro()
macro(safe_set_static_flag) # set c_flags and cxx_flags to static or shared
if (BUILD_SHARED_LIBS)
return() # if build shared libs, the flags keep same with '/MD'
endif(BUILD_SHARED_LIBS)
foreach(flag_var
CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE
CMAKE_CXX_FLAGS_MINSIZEREL CMAKE_CXX_FLAGS_RELWITHDEBINFO
CMAKE_C_FLAGS CMAKE_C_FLAGS_DEBUG CMAKE_C_FLAGS_RELEASE
CMAKE_C_FLAGS_MINSIZEREL CMAKE_C_FLAGS_RELWITHDEBINFO)
if(${flag_var} MATCHES "/MD")
string(REGEX REPLACE "/MD" "/MT" ${flag_var} "${${flag_var}}")
endif(${flag_var} MATCHES "/MD")
endforeach(flag_var)
endmacro()
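An illustrative trace of the macro (the flag value is made up, and BUILD_SHARED_LIBS is assumed OFF): every /MD occurrence in the per-configuration flag variables is rewritten to /MT.

set(CMAKE_CXX_FLAGS_RELEASE "/MD /O2 /DNDEBUG")          # hypothetical starting value
safe_set_static_flag()
message(STATUS "${CMAKE_CXX_FLAGS_RELEASE}")             # now prints "/MT /O2 /DNDEBUG"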
CHECK_CXX_SYMBOL_EXISTS(UINT64_MAX "stdint.h" UINT64_MAX_EXISTS)
if(NOT UINT64_MAX_EXISTS)
@@ -97,9 +110,13 @@ SET(CMAKE_EXTRA_INCLUDE_FILES "")
# Common flags: the compiler flags used for C/C++ sources in both release and debug builds.
# Do not care whether a given flag is supported by gcc; the safe_set_* helpers check that.
# https://github.com/PaddlePaddle/Paddle/issues/12773
if (NOT WIN32)
set(COMMON_FLAGS
    -fPIC
    -fno-omit-frame-pointer
    -Werror
    -Wall
    -Wextra
    -Wnon-virtual-dtor
@@ -112,13 +129,11 @@ set(COMMON_FLAGS
    -Wno-error=parentheses-equality # Warnings in pybind11
    -Wno-error=ignored-attributes # Warnings in Eigen, gcc 6.3
    -Wno-error=terminate # Warning in PADDLE_ENFORCE
    -Wno-error=int-in-bool-context # Warning in Eigen gcc 7.2
    -Wimplicit-fallthrough=0 # Warning in tinyformat.h
    -Wno-error=maybe-uninitialized # Warning in boost gcc 7.2
)
- # https://github.com/PaddlePaddle/Paddle/issues/12773
- if (NOT WIN32)
-     list(APPEND COMMON_FLAGS -Werror)
- endif()
set(GPU_COMMON_FLAGS
    -fPIC
    -fno-omit-frame-pointer
@@ -133,30 +148,55 @@ set(GPU_COMMON_FLAGS
    -Wno-error=array-bounds # Warnings in Eigen::array
)
else(NOT WIN32)
set(COMMON_FLAGS
    "/w")   # disable all warnings.
set(GPU_COMMON_FLAGS
    "/w")   # disable all warnings
endif(NOT WIN32)
if (APPLE)
    if(NOT CMAKE_CROSSCOMPILING)
        # On Mac OS X build fat binaries with x86_64 architectures by default.
        set (CMAKE_OSX_ARCHITECTURES "x86_64" CACHE STRING "Build architectures for OSX" FORCE)
    endif()
-   else()
    # On Mac OS X register class specifier is deprecated and will cause warning error on latest clang 10.0
    set (COMMON_FLAGS -Wno-deprecated-register)
endif(APPLE)
if(LINUX)
    set(GPU_COMMON_FLAGS
        -Wall
        -Wextra
        -Werror
        ${GPU_COMMON_FLAGS})
-   endif()
endif(LINUX)
if(UNIX AND NOT APPLE)
    # except apple from nix*Os family
    set(LINUX TRUE)
endif(UNIX AND NOT APPLE)
foreach(flag ${COMMON_FLAGS})
    safe_set_cflag(CMAKE_C_FLAGS ${flag})
    safe_set_cxxflag(CMAKE_CXX_FLAGS ${flag})
endforeach()
foreach(flag ${GPU_COMMON_FLAGS})
    safe_set_nvflag(${flag})
endforeach()
if(WIN32)
# windows build turn off warnings.
safe_set_static_flag()
foreach(flag_var
CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE
CMAKE_CXX_FLAGS_MINSIZEREL CMAKE_CXX_FLAGS_RELWITHDEBINFO
CMAKE_C_FLAGS CMAKE_C_FLAGS_DEBUG CMAKE_C_FLAGS_RELEASE
CMAKE_C_FLAGS_MINSIZEREL CMAKE_C_FLAGS_RELWITHDEBINFO)
if(${flag_var} MATCHES "/W3")
string(REGEX REPLACE "/W3" "/w" ${flag_var} "${${flag_var}}")
endif(${flag_var} MATCHES "/W3")
endforeach(flag_var)
endif(WIN32)
@@ -110,6 +110,20 @@ function(find_fluid_modules TARGET_NAME)
    endif()
endfunction(find_fluid_modules)
# Find all third_party modules used by the paddle static library,
# to reduce the dependencies when building the inference libs.
set_property(GLOBAL PROPERTY FLUID_THIRD_PARTY)
function(find_fluid_thirdparties TARGET_NAME)
get_filename_component(__target_path ${TARGET_NAME} ABSOLUTE)
string(REGEX REPLACE "^${PADDLE_SOURCE_DIR}/" "" __target_path ${__target_path})
string(FIND "${__target_path}" "third_party" pos)
if(pos GREATER 1)
        get_property(fluid_third_partys GLOBAL PROPERTY FLUID_THIRD_PARTY)
set(fluid_third_partys ${fluid_third_partys} ${TARGET_NAME})
set_property(GLOBAL PROPERTY FLUID_THIRD_PARTY "${fluid_third_partys}")
endif()
endfunction(find_fluid_thirdparties)
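A hypothetical registration call and read-back, mirroring how find_fluid_modules() is used elsewhere in this file:

find_fluid_thirdparties(extern_snappy)                   # recorded only if its path contains third_party
get_property(fluid_tp GLOBAL PROPERTY FLUID_THIRD_PARTY)
message(STATUS "fluid third_party targets: ${fluid_tp}")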
function(merge_static_libs TARGET_NAME)
    set(libs ${ARGN})
    list(REMOVE_DUPLICATES libs)
@@ -204,18 +218,13 @@ function(merge_static_libs TARGET_NAME)
    foreach(lib ${libs})
        # Get the file names of the libraries to be merged
-       #if(NOT $<TARGET_FILE:${lib}> MATCHES "lib.*\\.lib")
-       #    message("library" ${lib})
-       #    set(libfiles ${libfiles} lib$<TARGET_FILE:${lib}>)
-       #else()
        set(libfiles ${libfiles} $<TARGET_FILE:${lib}>)
-       #endif()
    endforeach()
    # msvc will put the library in the "/Release/xxxlib" directory by default
-   # windows cmd return error in clean env.
    # COMMAND cmake -E remove "${CMAKE_CURRENT_BINARY_DIR}/${CMAKE_BUILD_TYPE}/${TARGET_NAME}.lib"
-   # COMMAND del "${CMAKE_CURRENT_BINARY_DIR}/${CMAKE_BUILD_TYPE}/${TARGET_NAME}.lib"
    add_custom_command(TARGET ${TARGET_NAME} POST_BUILD
-       COMMAND lib /OUT:${CMAKE_CURRENT_BINARY_DIR}/lib${TARGET_NAME}.lib ${libfiles}
        COMMAND cmake -E make_directory "${CMAKE_CURRENT_BINARY_DIR}/${CMAKE_BUILD_TYPE}"
        COMMAND lib /OUT:${CMAKE_CURRENT_BINARY_DIR}/${CMAKE_BUILD_TYPE}/lib${TARGET_NAME}.lib ${libfiles}
    )
endif(WIN32)
endfunction(merge_static_libs)
@@ -252,6 +261,17 @@ function(cc_library TARGET_NAME)
        add_dependencies(${TARGET_NAME} mklml)
        target_link_libraries(${TARGET_NAME} "-L${MKLML_LIB_DIR} -liomp5 -Wl,--as-needed")
    endif()
# remove link to python, see notes at:
# https://github.com/pybind/pybind11/blob/master/docs/compiling.rst#building-manually
if("${cc_library_DEPS};" MATCHES "python;")
list(REMOVE_ITEM cc_library_DEPS python)
add_dependencies(${TARGET_NAME} python)
if(WIN32)
target_link_libraries(${TARGET_NAME} ${PYTHON_LIBRARIES})
else()
target_link_libraries(${TARGET_NAME} "-Wl,-undefined,dynamic_lookup")
endif(WIN32)
endif()
    target_link_libraries(${TARGET_NAME} ${cc_library_DEPS})
    add_dependencies(${TARGET_NAME} ${cc_library_DEPS})
endif()
@@ -272,6 +292,45 @@ function(cc_library TARGET_NAME)
    endif(cc_library_SRCS)
endfunction(cc_library)
# The link operation under Windows may exceed the maximum command-line character limit;
# simply breaking the link command into multiple link operations fixes that, e.g.
# original:
#     lib /out:target.lib a.lib b.lib c.lib d.lib
# after:
#     1. lib /out:dummy_lib_1.lib a.lib b.lib
#     2. lib /out:dummy_lib_2.lib c.lib d.lib
#     3. lib /out:target.lib dummy_lib_1.lib dummy_lib_2.lib
function(sep_library TARGET_NAME)
set(options STATIC static SHARED shared)
set(oneValueArgs "")
set(multiValueArgs SRCS DEPS)
cmake_parse_arguments(sep_library "${options}" "${oneValueArgs}" "${multiValueArgs}" ${ARGN})
set(dummy_index 1)
set(dummy_offset 1)
    # each dummy target is made up of at most dummy_limit libraries
set(dummy_limit 50)
list(LENGTH sep_library_DEPS sep_all_len)
foreach(v ${sep_library_DEPS})
list(APPEND dummy_list ${v})
list(LENGTH dummy_list listlen )
if ((${listlen} GREATER ${dummy_limit}) OR (${dummy_offset} EQUAL ${sep_all_len}))
message("create dummy library ${TARGET_NAME}_dummy_lib_${dummy_index} for ${TARGET_NAME}")
cc_library(${TARGET_NAME}_dummy_lib_${dummy_index} STATIC DEPS ${dummy_list})
foreach(i ${dummy_list})
list(REMOVE_AT dummy_list 0)
endforeach()
list(APPEND ${TARGET_NAME}_dummy_list ${TARGET_NAME}_dummy_lib_${dummy_index})
MATH(EXPR dummy_index "${dummy_index}+1")
endif()
MATH(EXPR dummy_offset "${dummy_offset}+1")
endforeach()
if(${sep_library_SHARED})
cc_library(${TARGET_NAME} SHARED SRCS ${sep_library_SRCS} DEPS ${${TARGET_NAME}_dummy_list})
else(${sep_library_SHARED})
cc_library(${TARGET_NAME} STATIC SRCS ${sep_library_SRCS} DEPS ${${TARGET_NAME}_dummy_list})
endif(${sep_library_SHARED})
endfunction(sep_library)
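A hypothetical call: with, say, sixty DEPS entries, sep_library() would emit two intermediate archives of roughly dummy_limit libraries each and then link the final target against those dummies only.

# SIXTY_DEP_TARGETS and infer_lib.cc are placeholders for this sketch.
sep_library(infer_lib STATIC SRCS infer_lib.cc DEPS ${SIXTY_DEP_TARGETS})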
function(cc_binary TARGET_NAME)
    set(options "")
    set(oneValueArgs "")
@@ -292,17 +351,21 @@ function(cc_test TARGET_NAME)
        cmake_parse_arguments(cc_test "${options}" "${oneValueArgs}" "${multiValueArgs}" ${ARGN})
        add_executable(${TARGET_NAME} ${cc_test_SRCS})
        target_link_libraries(${TARGET_NAME} ${cc_test_DEPS} paddle_gtest_main lod_tensor memory gtest gflags glog)
        if(WIN32)
            target_link_libraries(${TARGET_NAME} shlwapi)
        endif(WIN32)
        add_dependencies(${TARGET_NAME} ${cc_test_DEPS} paddle_gtest_main lod_tensor memory gtest gflags glog)
        add_test(NAME ${TARGET_NAME}
                 COMMAND ${TARGET_NAME} ${cc_test_ARGS}
                 WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})
        if (${cc_test_SERIAL})
            set_property(TEST ${TARGET_NAME} PROPERTY RUN_SERIAL 1)
        endif()
        set_property(TEST ${TARGET_NAME} PROPERTY ENVIRONMENT FLAGS_cpu_deterministic=true)
        set_property(TEST ${TARGET_NAME} PROPERTY ENVIRONMENT FLAGS_init_allocated_mem=true)
        set_property(TEST ${TARGET_NAME} PROPERTY ENVIRONMENT FLAGS_cudnn_deterministic=true)
-       endif()
        # No unit test should exceed 10 minutes.
        set_tests_properties(${TARGET_NAME} PROPERTIES TIMEOUT 600)
    endif()
endfunction(cc_test)
@@ -366,12 +429,11 @@ function(nv_test TARGET_NAME)
        add_test(${TARGET_NAME} ${TARGET_NAME})
        if (nv_test_SERIAL)
            set_property(TEST ${TARGET_NAME} PROPERTY RUN_SERIAL 1)
        endif()
        set_property(TEST ${TARGET_NAME} PROPERTY ENVIRONMENT FLAGS_cpu_deterministic=true)
        set_property(TEST ${TARGET_NAME} PROPERTY ENVIRONMENT FLAGS_init_allocated_mem=true)
        set_property(TEST ${TARGET_NAME} PROPERTY ENVIRONMENT FLAGS_cudnn_deterministic=true)
-       endif()
    endif()
endfunction(nv_test)
function(hip_library TARGET_NAME)
@@ -392,11 +454,15 @@ function(hip_library TARGET_NAME)
        else()
            add_library(${TARGET_NAME} STATIC ${_cmake_options} ${_generated_files} ${_sources})
            set_target_properties(${TARGET_NAME} PROPERTIES LINKER_LANGUAGE CXX)
-           target_link_libraries(${TARGET_NAME} /opt/rocm/hip/lib/libhip_hcc.so /opt/rocm/hip/lib/libhip_device.a)
            target_link_libraries(${TARGET_NAME} /opt/rocm/hip/lib/libhip_hcc.so /opt/rocm/hip/lib/libhip_device.a /opt/rocm/rccl/lib/librccl.so /opt/rocm/hiprand/lib/libhiprand.so)
            find_fluid_modules(${TARGET_NAME})
        endif()
-       if (hip_library_DEPS)
-           add_dependencies(${TARGET_NAME} ${hip_library_DEPS})
        if("${hip_library_DEPS}" MATCHES "ARCHIVE_START")
            # Support linking flags: --whole-archive (Linux) / -force_load (MacOS).
            # WARNING: Please don't use ARCHIVE_START&ARCHIVE_END if TARGET_NAME will be linked by other libraries.
            target_circle_link_libraries(${TARGET_NAME} ${hip_library_DEPS})
            list(REMOVE_ITEM hip_library_DEPS ARCHIVE_START ARCHIVE_END)
        else()
            target_link_libraries(${TARGET_NAME} ${hip_library_DEPS})
        endif()
        # cpplint code style
@@ -622,6 +688,8 @@ function(py_test TARGET_NAME)
            PYTHONPATH=${PADDLE_BINARY_DIR}/python ${py_test_ENVS}
            ${PYTHON_EXECUTABLE} -u ${py_test_SRCS} ${py_test_ARGS}
            WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})
        # No unit test should exceed 10 minutes.
        set_tests_properties(${TARGET_NAME} PROPERTIES TIMEOUT 600)
    endif()
endfunction()
......
@@ -3,6 +3,8 @@ if(NOT WITH_AMD_GPU)
endif()
include_directories("/opt/rocm/include")
include_directories("/opt/rocm/hip/include")
include_directories("/opt/rocm/miopen/include")
include_directories("/opt/rocm/hipblas/include")
include_directories("/opt/rocm/hiprand/include")
include_directories("/opt/rocm/rocrand/include")
@@ -11,20 +13,40 @@ include_directories("/opt/rocm/thrust")
list(APPEND EXTERNAL_LIBS "-L/opt/rocm/lib/ -lhip_hcc")
- set(HIP_HCC_FLAGS "${HIP_HCC_FLAGS} -fPIC -DPADDLE_WITH_HIP -std=c++14" )
set(HIP_HCC_FLAGS "${HIP_HCC_FLAGS} -fPIC -DPADDLE_WITH_HIP -std=c++11" )
if(WITH_DSO)
    set(HIP_HCC_FLAGS "${HIP_HCC_FLAGS} -DPADDLE_USE_DSO")
endif(WITH_DSO)
if(WITH_DOUBLE)
set(HIP_HCC_FLAGS "${HIP_HCC_FLAGS} -DPADDLE_TYPE_DOUBLE")
endif(WITH_DOUBLE)
if(WITH_TESTING) if(WITH_TESTING)
set(HIP_HCC_FLAGS "${HIP_HCC_FLAGS} -DPADDLE_WITH_TESTING") set(HIP_HCC_FLAGS "${HIP_HCC_FLAGS} -DPADDLE_WITH_TESTING")
endif(WITH_TESTING) endif(WITH_TESTING)
if(WITH_DISTRIBUTE)
set(HIP_HCC_FLAGS "${HIP_HCC_FLAGS} -DPADDLE_WITH_DISTRIBUTE")
endif(WITH_DISTRIBUTE)
if(WITH_GRPC)
set(HIP_HCC_FLAGS "${HIP_HCC_FLAGS} -DPADDLE_WITH_GRPC")
endif(WITH_GRPC)
if(NOT WITH_GOLANG)
set(HIP_HCC_FLAGS "${HIP_HCC_FLAGS} -DPADDLE_WITHOUT_GOLANG")
endif(NOT WITH_GOLANG)
if(WITH_MKLDNN)
set(HIP_HCC_FLAGS "${HIP_HCC_FLAGS} -DPADDLE_WITH_MKLDNN")
endif(WITH_MKLDNN)
set(HIP_HCC_FLAGS "${HIP_HCC_FLAGS} -DANY_IMPL_ANY_CAST_MOVEABLE")
if(NOT WITH_RDMA)
set(HIP_HCC_FLAGS "${HIP_HCC_FLAGS} -DPADDLE_DISABLE_RDMA")
endif(NOT WITH_RDMA)
if(CMAKE_BUILD_TYPE STREQUAL "Debug")
    list(APPEND HIP_HCC_FLAGS ${CMAKE_CXX_FLAGS_DEBUG})
elseif(CMAKE_BUILD_TYPE STREQUAL "RelWithDebInfo")
......
@@ -18,24 +18,45 @@ function(copy TARGET)
    set(oneValueArgs "")
    set(multiValueArgs SRCS DSTS DEPS)
    cmake_parse_arguments(copy_lib "${options}" "${oneValueArgs}" "${multiValueArgs}" ${ARGN})
-   set(inference_lib_dist_dep ${TARGET} ${inference_lib_dist_dep} PARENT_SCOPE)
    set(fluid_lib_dist_dep ${TARGET} ${fluid_lib_dist_dep} PARENT_SCOPE)
    list(LENGTH copy_lib_SRCS copy_lib_SRCS_len)
    list(LENGTH copy_lib_DSTS copy_lib_DSTS_len)
    if (NOT ${copy_lib_SRCS_len} EQUAL ${copy_lib_DSTS_len})
        message(FATAL_ERROR "${TARGET} source numbers are not equal to destination numbers")
    endif ()
    math(EXPR len "${copy_lib_SRCS_len} - 1")
    add_custom_target(${TARGET} DEPENDS ${copy_lib_DEPS})
    foreach (index RANGE ${len})
        list(GET copy_lib_SRCS ${index} src)
        list(GET copy_lib_DSTS ${index} dst)
        if (WIN32)
            # windows cmd shell will not expand wildcard automatically.
            # below expand the files, libs and copy them by rules.
            file(GLOB header_files ${src} "*.h")
            file(GLOB static_lib_files ${src} "*.lib")
            file(GLOB dll_lib_files ${src} "*.dll")
            set(src_files ${header_files} ${static_lib_files} ${dll_lib_files})
            if (NOT "${src_files}" STREQUAL "")
                list(REMOVE_DUPLICATES src_files)
            endif ()
            add_custom_command(TARGET ${TARGET} PRE_BUILD
                    COMMAND ${CMAKE_COMMAND} -E make_directory "${dst}"
                    )
            foreach (src_file ${src_files})
                add_custom_command(TARGET ${TARGET} PRE_BUILD
                        COMMAND ${CMAKE_COMMAND} -E copy "${src_file}" "${dst}"
                        COMMENT "copying ${src_file} -> ${dst}")
            endforeach ()
        else (WIN32) # not windows
            add_custom_command(TARGET ${TARGET} PRE_BUILD
                    COMMAND mkdir -p "${dst}"
                    COMMAND cp -r "${src}" "${dst}"
                    COMMENT "copying ${src} -> ${dst}")
        endif (WIN32) # not windows
    endforeach ()
endfunction()
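A hypothetical copy() invocation; SRCS and DSTS must have the same length, pairing each source with its destination:

set(dst_dir "${FLUID_INSTALL_DIR}/third_party/install/mydep")   # mydep is a placeholder name
copy(mydep_lib
        SRCS ${MYDEP_INCLUDE_DIR} ${MYDEP_LIBRARIES}
        DSTS ${dst_dir} ${dst_dir}/lib
        DEPS mydep)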
# third party
@@ -44,39 +65,46 @@ copy(eigen3_lib
        SRCS ${EIGEN_INCLUDE_DIR}/Eigen/Core ${EIGEN_INCLUDE_DIR}/Eigen/src ${EIGEN_INCLUDE_DIR}/unsupported/Eigen
        DSTS ${dst_dir}/Eigen ${dst_dir}/Eigen ${dst_dir}/unsupported
        DEPS eigen3
)

set(dst_dir "${FLUID_INSTALL_DIR}/third_party/install/gflags")
copy(gflags_lib
        SRCS ${GFLAGS_INCLUDE_DIR} ${GFLAGS_LIBRARIES}
        DSTS ${dst_dir} ${dst_dir}/lib
        DEPS gflags
)

set(dst_dir "${FLUID_INSTALL_DIR}/third_party/install/glog")
copy(glog_lib
        SRCS ${GLOG_INCLUDE_DIR} ${GLOG_LIBRARIES}
        DSTS ${dst_dir} ${dst_dir}/lib
        DEPS glog
)

set(dst_dir "${FLUID_INSTALL_DIR}/third_party/boost/")
copy(boost_lib
        SRCS ${BOOST_INCLUDE_DIR}/boost
        DSTS ${dst_dir}
        DEPS boost
)

set(dst_dir "${FLUID_INSTALL_DIR}/third_party/install/xxhash")
copy(xxhash_lib
        SRCS ${XXHASH_INCLUDE_DIR} ${XXHASH_LIBRARIES}
        DSTS ${dst_dir} ${dst_dir}/lib
        DEPS xxhash
)

if (NOT PROTOBUF_FOUND)
    set(dst_dir "${FLUID_INSTALL_DIR}/third_party/install/protobuf")
    copy(protobuf_lib
            SRCS ${PROTOBUF_INCLUDE_DIR} ${PROTOBUF_LIBRARY}
            DSTS ${dst_dir} ${dst_dir}/lib
            DEPS extern_protobuf
    )
endif ()

if (NOT CBLAS_FOUND)
    set(dst_dir "${FLUID_INSTALL_DIR}/third_party/install/openblas")
    copy(openblas_lib
            SRCS ${CBLAS_INSTALL_DIR}/lib ${CBLAS_INSTALL_DIR}/include
@@ -90,19 +118,19 @@ elseif (WITH_MKLML)
            DSTS ${dst_dir}/lib ${dst_dir}/lib ${dst_dir}
            DEPS mklml
    )
endif ()

if (WITH_MKLDNN)
    set(dst_dir "${FLUID_INSTALL_DIR}/third_party/install/mkldnn")
    copy(mkldnn_lib
            SRCS ${MKLDNN_INC_DIR} ${MKLDNN_SHARED_LIB}
            DSTS ${dst_dir} ${dst_dir}/lib
            DEPS mkldnn
    )
endif ()

if (NOT WIN32)
    if (NOT MOBILE_INFERENCE AND NOT RPI)
        set(dst_dir "${FLUID_INSTALL_DIR}/third_party/install/snappy")
        copy(snappy_lib
                SRCS ${SNAPPY_INCLUDE_DIR} ${SNAPPY_LIBRARIES}
@@ -120,27 +148,27 @@ if(NOT MOBILE_INFERENCE AND NOT RPI)
                SRCS ${ZLIB_INCLUDE_DIR} ${ZLIB_LIBRARIES}
                DSTS ${dst_dir} ${dst_dir}/lib
                DEPS zlib)
    endif ()
endif (NOT WIN32)

# paddle fluid module
set(src_dir "${PADDLE_SOURCE_DIR}/paddle/fluid")
set(dst_dir "${FLUID_INSTALL_DIR}/paddle/fluid")
set(module "framework")

if (NOT WIN32)
    set(framework_lib_deps framework_py_proto)
endif (NOT WIN32)

copy(framework_lib DEPS ${framework_lib_deps}
        SRCS ${src_dir}/${module}/*.h ${src_dir}/${module}/details/*.h ${PADDLE_BINARY_DIR}/paddle/fluid/framework/framework.pb.h
        ${src_dir}/${module}/ir/*.h
        DSTS ${dst_dir}/${module} ${dst_dir}/${module}/details ${dst_dir}/${module} ${dst_dir}/${module}/ir
)

set(module "memory")
copy(memory_lib
-       SRCS ${src_dir}/${module}/*.h ${src_dir}/${module}/detail/*.h
        SRCS ${src_dir}/${module}/*.h ${src_dir}/${module}/detail/*.h ${src_dir}/${module}/allocation/*.h
-       DSTS ${dst_dir}/${module} ${dst_dir}/${module}/detail
        DSTS ${dst_dir}/${module} ${dst_dir}/${module}/detail ${dst_dir}/${module}/allocation
)

set(inference_deps paddle_fluid_shared paddle_fluid)
@@ -150,55 +178,76 @@ if (WITH_ANAKIN AND WITH_MKL)
        SRCS
        ${PADDLE_BINARY_DIR}/paddle/fluid/inference/api/libinference_anakin_api* # compiled anakin api
        ${ANAKIN_INSTALL_DIR} # anakin release
-       DSTS ${dst_dir}/inference/anakin ${FLUID_INSTALL_DIR}/third_party/install/anakin
        DSTS ${FLUID_INSTALL_DIR}/third_party/install/anakin ${FLUID_INSTALL_DIR}/third_party/install/anakin)
    list(APPEND inference_deps anakin_inference_lib)
endif ()

set(module "inference")
copy(inference_lib DEPS ${inference_deps}
        SRCS ${src_dir}/${module}/*.h ${PADDLE_BINARY_DIR}/paddle/fluid/inference/libpaddle_fluid.*
-       ${src_dir}/${module}/api/paddle_inference_api.h ${src_dir}/${module}/api/demo_ci
        ${src_dir}/${module}/api/paddle_*.h
        ${PADDLE_BINARY_DIR}/paddle/fluid/inference/api/paddle_inference_pass.h
-       DSTS ${dst_dir}/${module} ${dst_dir}/${module} ${dst_dir}/${module} ${dst_dir}/${module} ${dst_dir}/${module}
        DSTS ${dst_dir}/${module} ${dst_dir}/${module} ${dst_dir}/${module} ${dst_dir}/${module}
)

set(module "platform")
copy(platform_lib DEPS profiler_py_proto
        SRCS ${src_dir}/${module}/*.h ${src_dir}/${module}/dynload/*.h ${src_dir}/${module}/details/*.h
        DSTS ${dst_dir}/${module} ${dst_dir}/${module}/dynload ${dst_dir}/${module}/details
)

set(module "string")
copy(string_lib
        SRCS ${src_dir}/${module}/*.h ${src_dir}/${module}/tinyformat/*.h
        DSTS ${dst_dir}/${module} ${dst_dir}/${module}/tinyformat
)

set(module "pybind")
copy(pybind_lib
        SRCS ${CMAKE_CURRENT_BINARY_DIR}/paddle/fluid/${module}/pybind.h
        DSTS ${dst_dir}/${module}
)

# CMakeCache Info
copy(cmake_cache
        SRCS ${CMAKE_CURRENT_BINARY_DIR}/CMakeCache.txt
        DSTS ${FLUID_INSTALL_DIR})

- add_custom_target(inference_lib_dist DEPENDS ${inference_lib_dist_dep})
# This command generates a complete fluid library for both train and inference
add_custom_target(fluid_lib_dist DEPENDS ${fluid_lib_dist_dep})

# The following commands generate an inference-only fluid library.
# third_party, version.txt and CMakeCache.txt are at the same position as in ${FLUID_INSTALL_DIR}
copy(third_party DEPS fluid_lib_dist
        SRCS ${FLUID_INSTALL_DIR}/third_party ${FLUID_INSTALL_DIR}/CMakeCache.txt
        DSTS ${FLUID_INFERENCE_INSTALL_DIR} ${FLUID_INFERENCE_INSTALL_DIR}
)

# only need libpaddle_fluid.so/a and paddle_*.h for the inference-only library
copy(inference_api_lib DEPS fluid_lib_dist
        SRCS ${FLUID_INSTALL_DIR}/paddle/fluid/inference/libpaddle_fluid.*
        ${FLUID_INSTALL_DIR}/paddle/fluid/inference/paddle_*.h
        DSTS ${FLUID_INFERENCE_INSTALL_DIR}/paddle/lib ${FLUID_INFERENCE_INSTALL_DIR}/paddle/include
)

add_custom_target(inference_lib_dist DEPENDS third_party inference_api_lib)

# paddle fluid version
- execute_process(
function(version version_file)
    execute_process(
        COMMAND ${GIT_EXECUTABLE} log --pretty=format:%H -1
        WORKING_DIRECTORY ${PADDLE_SOURCE_DIR}
        OUTPUT_VARIABLE PADDLE_GIT_COMMIT)
-   set(version_file ${FLUID_INSTALL_DIR}/version.txt)
    file(WRITE ${version_file}
        "GIT COMMIT ID: ${PADDLE_GIT_COMMIT}\n"
        "WITH_MKL: ${WITH_MKL}\n"
        "WITH_MKLDNN: ${WITH_MKLDNN}\n"
        "WITH_GPU: ${WITH_GPU}\n")
    if (WITH_GPU)
        file(APPEND ${version_file}
            "CUDA version: ${CUDA_VERSION}\n"
            "CUDNN version: v${CUDNN_MAJOR_VERSION}\n")
    endif ()
endfunction()
endfunction()
version(${FLUID_INSTALL_DIR}/version.txt)
version(${FLUID_INFERENCE_INSTALL_DIR}/version.txt)
set(PART_CUDA_KERNEL_FILES)
function(op_library TARGET)
# op_library is a function to create an op library. The interface is the same
# as cc_library, but it handles splitting GPU/CPU code and links some common
# libraries for ops.
set(cc_srcs)
set(cu_srcs)
set(hip_cu_srcs)
set(miopen_hip_cc_srcs)
set(cu_cc_srcs)
set(cudnn_cu_cc_srcs)
set(CUDNN_FILE)
set(mkldnn_cc_srcs)
set(MKLDNN_FILE)
set(op_common_deps operator op_registry math_function)
set(options "")
set(oneValueArgs "")
set(multiValueArgs SRCS DEPS)
set(pybind_flag 0)
cmake_parse_arguments(op_library "${options}" "${oneValueArgs}"
"${multiValueArgs}" ${ARGN})
list(LENGTH op_library_SRCS op_library_SRCS_len)
if (${op_library_SRCS_len} EQUAL 0)
if (EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/${TARGET}.cc)
list(APPEND cc_srcs ${TARGET}.cc)
endif()
if (EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/${TARGET}.cu.cc)
list(APPEND cu_cc_srcs ${TARGET}.cu.cc)
endif()
if (EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/${TARGET}.cu)
list(APPEND cu_srcs ${TARGET}.cu)
endif()
if (EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/${TARGET}.part.cu)
set(PART_CUDA_KERNEL_FILES ${CMAKE_CURRENT_SOURCE_DIR}/${TARGET}.part.cu
${PART_CUDA_KERNEL_FILES} PARENT_SCOPE)
list(APPEND cu_srcs ${CMAKE_CURRENT_SOURCE_DIR}/${TARGET}.part.cu)
endif()
if (EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/${TARGET}.hip.cu)
list(APPEND hip_cu_srcs ${TARGET}.hip.cu)
endif()
string(REPLACE "_op" "_cudnn_op" CUDNN_FILE "${TARGET}")
if (EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/${CUDNN_FILE}.cu.cc)
list(APPEND cudnn_cu_cc_srcs ${CUDNN_FILE}.cu.cc)
endif()
if(WITH_AMD_GPU)
string(REPLACE "_op" "_miopen_op" MIOPEN_FILE "${TARGET}")
if (EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/${MIOPEN_FILE}.hip.cc)
list(APPEND miopen_hip_cc_srcs ${MIOPEN_FILE}.hip.cc)
endif()
endif()
if(WITH_MKLDNN)
string(REPLACE "_op" "_mkldnn_op" MKLDNN_FILE "${TARGET}")
if (EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/${MKLDNN_FILE}.cc)
list(APPEND mkldnn_cc_srcs ${MKLDNN_FILE}.cc)
endif()
endif()
else()
foreach(src ${op_library_SRCS})
if (${src} MATCHES ".*\\.hip.cu$")
list(APPEND hip_cu_srcs ${src})
elseif (${src} MATCHES ".*\\.cu$")
list(APPEND cu_srcs ${src})
elseif(${src} MATCHES ".*_cudnn_op.cu.cc$")
list(APPEND cudnn_cu_cc_srcs ${src})
elseif(WITH_AMD_GPU AND ${src} MATCHES ".*_miopen_op.hip.cc$")
list(APPEND miopen_hip_cc_srcs ${src})
elseif(WITH_MKLDNN AND ${src} MATCHES ".*_mkldnn_op.cc$")
list(APPEND mkldnn_cc_srcs ${src})
elseif(${src} MATCHES ".*\\.cu.cc$")
list(APPEND cu_cc_srcs ${src})
elseif(${src} MATCHES ".*\\.cc$")
list(APPEND cc_srcs ${src})
else()
message(FATAL_ERROR "${TARGET} Source file ${src} should only be .cc or .cu")
endif()
endforeach()
endif()
list(LENGTH cc_srcs cc_srcs_len)
if (${cc_srcs_len} EQUAL 0)
message(FATAL_ERROR "The op library ${TARGET} should contains at least one .cc file")
endif()
if (WIN32)
# Remove ops unsupported on Windows, since Windows has no NCCL and no warp-ctc.
foreach(windows_unsupport_op "nccl_op" "gen_nccl_id_op" "warpctc_op")
if ("${TARGET}" STREQUAL "${windows_unsupport_op}")
return()
endif()
endforeach()
endif(WIN32)
set(OP_LIBRARY ${TARGET} ${OP_LIBRARY} CACHE INTERNAL "op libs")
list(LENGTH op_library_DEPS op_library_DEPS_len)
if (${op_library_DEPS_len} GREATER 0)
set(DEPS_OPS ${TARGET} ${DEPS_OPS} PARENT_SCOPE)
endif()
if (WITH_GPU)
nv_library(${TARGET} SRCS ${cc_srcs} ${cu_cc_srcs} ${cudnn_cu_cc_srcs} ${mkldnn_cc_srcs} ${cu_srcs} DEPS ${op_library_DEPS}
${op_common_deps})
elseif (WITH_AMD_GPU)
hip_library(${TARGET} SRCS ${cc_srcs} ${hip_cu_srcs} ${miopen_hip_cc_srcs} ${mkldnn_cc_srcs} DEPS ${op_library_DEPS}
${op_common_deps})
else()
cc_library(${TARGET} SRCS ${cc_srcs} ${mkldnn_cc_srcs} DEPS ${op_library_DEPS}
${op_common_deps})
endif()
# Define operators that don't need pybind here.
foreach(manual_pybind_op "compare_op" "logical_op" "nccl_op"
"tensor_array_read_write_op" "tensorrt_engine_op" "conv_fusion_op")
if ("${TARGET}" STREQUAL "${manual_pybind_op}")
set(pybind_flag 1)
endif()
endforeach()
# For the registration of USE_OP, please refer to paddle/fluid/framework/op_registry.h.
# Note that it's enough to add just one operator to pybind in a *_op.cc file.
# For detailed pybind information, please see the generated paddle/pybind/pybind.h.
file(READ ${TARGET}.cc TARGET_CONTENT)
string(REGEX MATCH "REGISTER_OPERATOR\\(.*REGISTER_OPERATOR\\(" multi_register "${TARGET_CONTENT}")
string(REGEX MATCH "REGISTER_OPERATOR\\([a-z0-9_]*," one_register "${multi_register}")
if (one_register STREQUAL "")
string(REPLACE "_op" "" TARGET "${TARGET}")
else ()
string(REPLACE "REGISTER_OPERATOR(" "" TARGET "${one_register}")
string(REPLACE "," "" TARGET "${TARGET}")
endif()
# pybind USE_NO_KERNEL_OP
# HACK: if REGISTER_OP_CPU_KERNEL is present, the operator must have a kernel
string(REGEX MATCH "REGISTER_OP_CPU_KERNEL" regex_result "${TARGET_CONTENT}")
string(REPLACE "_op" "" TARGET "${TARGET}")
if (${pybind_flag} EQUAL 0 AND regex_result STREQUAL "")
file(APPEND ${pybind_file} "USE_NO_KERNEL_OP(${TARGET});\n")
set(pybind_flag 1)
endif()
# pybind USE_CPU_ONLY_OP
list(LENGTH cu_srcs cu_srcs_len)
list(LENGTH cu_cc_srcs cu_cc_srcs_len)
list(LENGTH mkldnn_cc_srcs mkldnn_cc_srcs_len)
list(LENGTH hip_cu_srcs hip_cu_srcs_len)
list(LENGTH miopen_hip_cc_srcs miopen_hip_cc_srcs_len)
if (${pybind_flag} EQUAL 0 AND ${mkldnn_cc_srcs_len} EQUAL 0 AND ${cu_srcs_len} EQUAL 0 AND ${cu_cc_srcs_len} EQUAL 0 AND
${hip_cu_srcs_len} EQUAL 0 AND ${miopen_hip_cc_srcs_len} EQUAL 0)
file(APPEND ${pybind_file} "USE_CPU_ONLY_OP(${TARGET});\n")
set(pybind_flag 1)
endif()
# pybind USE_OP_DEVICE_KERNEL for CUDNN
list(LENGTH cudnn_cu_cc_srcs cudnn_cu_cc_srcs_len)
if (WITH_GPU AND ${cudnn_cu_cc_srcs_len} GREATER 0)
file(APPEND ${pybind_file} "USE_OP_DEVICE_KERNEL(${TARGET}, CUDNN);\n")
endif()
# pybind USE_OP_DEVICE_KERNEL for MIOPEN
if (WITH_AMD_GPU AND ${miopen_hip_cc_srcs_len} GREATER 0)
file(APPEND ${pybind_file} "USE_OP_DEVICE_KERNEL(${TARGET}, MIOPEN);\n")
endif()
# pybind USE_OP_DEVICE_KERNEL for MKLDNN
if (WITH_MKLDNN AND ${mkldnn_cc_srcs_len} GREATER 0)
# For the activation op, append only the first implemented MKLDNN kernel (relu)
if (${MKLDNN_FILE} STREQUAL "activation_mkldnn_op")
file(APPEND ${pybind_file} "USE_OP_DEVICE_KERNEL(relu, MKLDNN);\n")
else()
file(APPEND ${pybind_file} "USE_OP_DEVICE_KERNEL(${TARGET}, MKLDNN);\n")
endif()
endif()
# pybind USE_OP
if (${pybind_flag} EQUAL 0)
# NOTE(*): activation ops use macros to register their kernels, so set use_op manually.
if(${TARGET} STREQUAL "activation")
file(APPEND ${pybind_file} "USE_OP(relu);\n")
elseif(${TARGET} STREQUAL "fake_dequantize")
file(APPEND ${pybind_file} "USE_OP(fake_dequantize_max_abs);\n")
elseif(${TARGET} STREQUAL "fake_quantize")
file(APPEND ${pybind_file} "USE_OP(fake_quantize_abs_max);\n")
elseif(${TARGET} STREQUAL "tensorrt_engine_op")
message(STATUS "Pybind skips [tensorrt_engine_op], for this OP is only used in inference")
elseif(${TARGET} STREQUAL "fc")
# HACK: fc has only MKLDNN and CPU kernels, which would not match the CPU-only condition
file(APPEND ${pybind_file} "USE_CPU_ONLY_OP(${TARGET});\n")
else()
file(APPEND ${pybind_file} "USE_OP(${TARGET});\n")
endif()
endif()
endfunction()
function(register_operators)
set(options "")
set(oneValueArgs "")
set(multiValueArgs EXCLUDES DEPS)
cmake_parse_arguments(register_operators "${options}" "${oneValueArgs}"
"${multiValueArgs}" ${ARGN})
file(GLOB OPS RELATIVE "${CMAKE_CURRENT_SOURCE_DIR}" "*_op.cc")
string(REPLACE "_mkldnn" "" OPS "${OPS}")
string(REPLACE ".cc" "" OPS "${OPS}")
list(REMOVE_DUPLICATES OPS)
list(LENGTH register_operators_DEPS register_operators_DEPS_len)
foreach(src ${OPS})
list(FIND register_operators_EXCLUDES ${src} _index)
if (${_index} EQUAL -1)
if (${register_operators_DEPS_len} GREATER 0)
op_library(${src} DEPS ${register_operators_DEPS})
else()
op_library(${src})
endif()
endif()
endforeach()
endfunction()
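To make the two helpers above concrete, here is a minimal usage sketch for an operator directory's `CMakeLists.txt`. The operator names and the `extra_math_deps` target are hypothetical, chosen only for illustration; they do not come from the real source tree.

```cmake
# Hypothetical CMakeLists.txt of an operator directory (illustrative only).
# Register every *_op.cc in this directory except two ops that need special
# handling, and attach an extra dependency to all registered ops.
register_operators(EXCLUDES my_special_op my_fused_op DEPS extra_math_deps)

# Register the excluded ops manually, with their own sources and dependencies.
op_library(my_special_op DEPS extra_math_deps some_helper_lib)
op_library(my_fused_op SRCS my_fused_op.cc my_fused_op.cu DEPS extra_math_deps)
```

Because `register_operators` globs the directory itself, adding a new `*_op.cc` file normally requires no CMake change at all; `EXCLUDES` exists precisely for the few ops that need a manual `op_library` call.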
...
    return 0;
}" SSE3_FOUND)

# disable AVX by default on windows
if(NOT WIN32)
    # Check AVX
    set(CMAKE_REQUIRED_FLAGS ${AVX_FLAG})
    set(AVX_FOUND_EXITCODE 1 CACHE STRING "Result from TRY_RUN" FORCE)
    CHECK_CXX_SOURCE_RUNS("
    #include <immintrin.h>
    int main()
    {
        __m256 a = _mm256_set_ps (-1.0f, 2.0f, -3.0f, 4.0f, -1.0f, 2.0f, -3.0f, 4.0f);
        __m256 b = _mm256_set_ps (1.0f, 2.0f, 3.0f, 4.0f, 1.0f, 2.0f, 3.0f, 4.0f);
        __m256 result = _mm256_add_ps (a, b);
        return 0;
    }" AVX_FOUND)

    # Check AVX 2
    set(CMAKE_REQUIRED_FLAGS ${AVX2_FLAG})
    set(AVX2_FOUND_EXITCODE 1 CACHE STRING "Result from TRY_RUN" FORCE)
    CHECK_CXX_SOURCE_RUNS("
    #include <immintrin.h>
    int main()
    {
        __m256i a = _mm256_set_epi32 (-1, 2, -3, 4, -1, 2, -3, 4);
        __m256i result = _mm256_abs_epi32 (a);
        return 0;
    }" AVX2_FOUND)

    # Check AVX512F
    set(CMAKE_REQUIRED_FLAGS ${AVX512F_FLAG})
    set(AVX512F_FOUND_EXITCODE 1 CACHE STRING "Result from TRY_RUN" FORCE)
    CHECK_CXX_SOURCE_RUNS("
    #include <immintrin.h>
    int main()
    {
        __m512i a = _mm512_set_epi32 (-1, 2, -3, 4, -1, 2, -3, 4,
                                      13, -5, 6, -7, 9, 2, -6, 3);
        __m512i result = _mm512_abs_epi32 (a);
        return 0;
    }" AVX512F_FOUND)
endif(NOT WIN32)

set(CMAKE_REQUIRED_FLAGS ${CMAKE_REQUIRED_FLAGS_RETAINED})
mark_as_advanced(MMX_FOUND SSE2_FOUND SSE3_FOUND AVX_FOUND AVX2_FOUND AVX512F_FOUND)

find_library(TENSORRT_LIBRARY NAMES libnvinfer.so libnvinfer.a
    DOC "Path to TensorRT library.")
if(TENSORRT_INCLUDE_DIR AND TENSORRT_LIBRARY)
    if(WITH_DSO)
        set(TENSORRT_FOUND ON)
    endif(WITH_DSO)
else()
    set(TENSORRT_FOUND OFF)
endif()

if(TENSORRT_FOUND)
    ...
    "Current TensorRT version is v${TENSORRT_MAJOR_VERSION}. ")
    include_directories(${TENSORRT_INCLUDE_DIR})
    list(APPEND EXTERNAL_LIBS ${TENSORRT_LIBRARY})
    add_definitions(-DPADDLE_WITH_TENSORRT)
endif()

add_custom_target(paddle_apis ALL
    DEPENDS paddle_v2_apis)
add_custom_target(paddle_docs ALL
    DEPENDS paddle_v2_docs paddle_v2_docs_cn
            paddle_mobile_docs paddle_mobile_docs_cn)
add_subdirectory(v2)
add_subdirectory(mobile)
# For Readers and Developers
Thanks for reading the PaddlePaddle documentation.
Since **September 17th, 2018**, the **0.15.0 and develop** documentation source has been moved to [FluidDoc Repo](https://github.com/PaddlePaddle/FluidDoc) and updated there.
Please turn to FluidDoc Repo for the latest documentation.
if(NOT DEFINED SPHINX_THEME)
set(SPHINX_THEME default)
endif()
if(NOT DEFINED SPHINX_THEME_DIR)
set(SPHINX_THEME_DIR)
endif()
# configured documentation tools and intermediate build results
set(BINARY_BUILD_DIR_EN "${CMAKE_CURRENT_BINARY_DIR}/en/_build")
# Sphinx cache with pickled ReST documents
set(SPHINX_CACHE_DIR_EN "${CMAKE_CURRENT_BINARY_DIR}/en/_doctrees")
# HTML output directory
set(SPHINX_HTML_DIR_EN "${CMAKE_CURRENT_BINARY_DIR}/en/html")
set(IMPORT_PADDLE_STRING "")
set(IMPORT_PADDLEV2_STRING "")
configure_file(
"${CMAKE_CURRENT_SOURCE_DIR}/../templates/conf.py.en.in"
"${BINARY_BUILD_DIR_EN}/conf.py"
@ONLY)
sphinx_add_target(paddle_fluid_docs
html
${BINARY_BUILD_DIR_EN}
${SPHINX_CACHE_DIR_EN}
${CMAKE_CURRENT_SOURCE_DIR}
${SPHINX_HTML_DIR_EN})
# configured documentation tools and intermediate build results
set(BINARY_BUILD_DIR_CN "${CMAKE_CURRENT_BINARY_DIR}/cn/_build")
# Sphinx cache with pickled ReST documents
set(SPHINX_CACHE_DIR_CN "${CMAKE_CURRENT_BINARY_DIR}/cn/_doctrees")
# HTML output directory
set(SPHINX_HTML_DIR_CN "${CMAKE_CURRENT_BINARY_DIR}/cn/html")
configure_file(
"${CMAKE_CURRENT_SOURCE_DIR}/../templates/conf.py.cn.in"
"${BINARY_BUILD_DIR_CN}/conf.py"
@ONLY)
sphinx_add_target(paddle_fluid_docs_cn
html
${BINARY_BUILD_DIR_CN}
${SPHINX_CACHE_DIR_CN}
${CMAKE_CURRENT_SOURCE_DIR}
${SPHINX_HTML_DIR_CN})
add_subdirectory(api)
# configured documentation tools and intermediate build results
set(BINARY_BUILD_DIR_EN "${CMAKE_CURRENT_BINARY_DIR}/en/_build")
# Sphinx cache with pickled ReST documents
set(SPHINX_CACHE_DIR_EN "${CMAKE_CURRENT_BINARY_DIR}/en/_doctrees")
# HTML output directory
set(SPHINX_HTML_DIR_EN "${CMAKE_CURRENT_BINARY_DIR}/en/html")
set(IMPORT_PADDLE_STRING "import paddle")
set(IMPORT_PADDLEV2_STRING "import paddle.v2")
configure_file(
"${CMAKE_CURRENT_SOURCE_DIR}/../../templates/conf.py.en.in"
"${BINARY_BUILD_DIR_EN}/conf.py"
@ONLY)
sphinx_add_target(paddle_fluid_apis
html
${BINARY_BUILD_DIR_EN}
${SPHINX_CACHE_DIR_EN}
${CMAKE_CURRENT_SOURCE_DIR}
${SPHINX_HTML_DIR_EN})
add_dependencies(paddle_fluid_apis gen_proto_py framework_py_proto copy_paddle_pybind paddle_python)
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
=============
fluid.average
=============
.. _api_fluid_average_WeightedAverage:
WeightedAverage
---------------
.. autoclass:: paddle.fluid.average.WeightedAverage
:members:
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
==============
fluid.backward
==============
.. _api_fluid_backward_append_backward:
append_backward
---------------
.. autofunction:: paddle.fluid.backward.append_backward
:noindex:
.. _api_fluid_backward_calc_gradient:
calc_gradient
-------------
.. autofunction:: paddle.fluid.backward.calc_gradient
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
==========
fluid.clip
==========
.. _api_fluid_clip_ErrorClipByValue:
ErrorClipByValue
----------------
.. autoclass:: paddle.fluid.clip.ErrorClipByValue
:members:
:noindex:
.. _api_fluid_clip_GradientClipByValue:
GradientClipByValue
-------------------
.. autoclass:: paddle.fluid.clip.GradientClipByValue
:members:
:noindex:
.. _api_fluid_clip_GradientClipByNorm:
GradientClipByNorm
------------------
.. autoclass:: paddle.fluid.clip.GradientClipByNorm
:members:
:noindex:
.. _api_fluid_clip_GradientClipByGlobalNorm:
GradientClipByGlobalNorm
------------------------
.. autoclass:: paddle.fluid.clip.GradientClipByGlobalNorm
:members:
:noindex:
=====================
Data Reader Interface
=====================
DataTypes
=========
.. autofunction:: paddle.v2.data_type.dense_array
:noindex:
.. autofunction:: paddle.v2.data_type.integer_value
:noindex:
.. autofunction:: paddle.v2.data_type.integer_value_sequence
:noindex:
.. autofunction:: paddle.v2.data_type.integer_value_sub_sequence
:noindex:
.. autofunction:: paddle.v2.data_type.sparse_binary_vector
:noindex:
.. autofunction:: paddle.v2.data_type.sparse_binary_vector_sequence
:noindex:
.. autofunction:: paddle.v2.data_type.sparse_binary_vector_sub_sequence
:noindex:
.. autofunction:: paddle.v2.data_type.sparse_float_vector
:noindex:
.. autofunction:: paddle.v2.data_type.sparse_float_vector_sequence
:noindex:
.. autofunction:: paddle.v2.data_type.sparse_float_vector_sub_sequence
:noindex:
.. autofunction:: paddle.v2.data_type.sparse_non_value_slot
:noindex:
.. autofunction:: paddle.v2.data_type.sparse_value_slot
:noindex:
.. autoclass:: paddle.v2.data_type.InputType
:members:
:noindex:
DataFeeder
==========
.. automodule:: paddle.v2.data_feeder
:members:
:noindex:
Reader
======
.. automodule:: paddle.reader
:members:
:noindex:
.. automodule:: paddle.reader.creator
:members:
:noindex:
minibatch
=========
.. automodule:: paddle.v2.minibatch
:members:
:noindex:
Dataset
=======
.. automodule:: paddle.dataset
:members:
:noindex:
mnist
+++++
.. automodule:: paddle.dataset.mnist
:members:
:noindex:
cifar
+++++
.. automodule:: paddle.dataset.cifar
:members:
:noindex:
conll05
+++++++
.. automodule:: paddle.dataset.conll05
:members: get_dict,get_embedding,test
:noindex:
imdb
++++
.. automodule:: paddle.dataset.imdb
:members:
:noindex:
imikolov
++++++++
.. automodule:: paddle.dataset.imikolov
:members:
:noindex:
movielens
+++++++++
.. automodule:: paddle.dataset.movielens
:members:
:noindex:
.. autoclass:: paddle.dataset.movielens.MovieInfo
:noindex:
.. autoclass:: paddle.dataset.movielens.UserInfo
:noindex:
sentiment
+++++++++
.. automodule:: paddle.dataset.sentiment
:members:
:noindex:
uci_housing
+++++++++++
.. automodule:: paddle.dataset.uci_housing
:members:
:noindex:
wmt14
+++++
.. automodule:: paddle.dataset.wmt14
:members:
:noindex:
wmt16
+++++
.. automodule:: paddle.dataset.wmt16
:members:
:noindex:
Image Interface
===============
.. automodule:: paddle.v2.image
:members:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
=================
fluid.data_feeder
=================
.. _api_fluid_data_feeder_DataFeeder:
DataFeeder
----------
.. autoclass:: paddle.fluid.data_feeder.DataFeeder
:members:
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
==============
fluid.executor
==============
.. _api_fluid_executor_Executor:
Executor
--------
.. autoclass:: paddle.fluid.executor.Executor
:members:
:noindex:
.. _api_fluid_executor_global_scope:
global_scope
------------
.. autofunction:: paddle.fluid.executor.global_scope
:noindex:
.. _api_fluid_executor_scope_guard:
scope_guard
-----------
.. autofunction:: paddle.fluid.executor.scope_guard
:noindex:
.. _api_fluid_executor__switch_scope:
_switch_scope
-------------
.. autofunction:: paddle.fluid.executor._switch_scope
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
=====
fluid
=====
.. _api_fluid_Block:
Block
-----
.. autoclass:: paddle.fluid.Block
:members:
:noindex:
.. _api_fluid_Variable:
Variable
--------
.. autoclass:: paddle.fluid.Variable
:members:
:noindex:
.. _api_fluid_Program:
Program
-------
.. autoclass:: paddle.fluid.Program
:members:
:noindex:
.. _api_fluid_Operator:
Operator
--------
.. autoclass:: paddle.fluid.Operator
:members:
:noindex:
.. _api_fluid_default_startup_program:
default_startup_program
-----------------------
.. autofunction:: paddle.fluid.default_startup_program
:noindex:
.. _api_fluid_default_main_program:
default_main_program
--------------------
.. autofunction:: paddle.fluid.default_main_program
:noindex:
.. _api_fluid_program_guard:
program_guard
-------------
.. autofunction:: paddle.fluid.program_guard
:noindex:
.. _api_fluid_get_var:
get_var
-------
.. autofunction:: paddle.fluid.get_var
:noindex:
.. _api_fluid_Executor:
Executor
--------
.. autoclass:: paddle.fluid.Executor
:members:
:noindex:
.. _api_fluid_global_scope:
global_scope
------------
.. autofunction:: paddle.fluid.global_scope
:noindex:
.. _api_fluid_scope_guard:
scope_guard
-----------
.. autofunction:: paddle.fluid.scope_guard
:noindex:
.. _api_fluid__switch_scope:
_switch_scope
-------------
.. autofunction:: paddle.fluid._switch_scope
:noindex:
.. _api_fluid_make_channel:
make_channel
------------
.. autofunction:: paddle.fluid.make_channel
:noindex:
.. _api_fluid_channel_send:
channel_send
------------
.. autofunction:: paddle.fluid.channel_send
:noindex:
.. _api_fluid_channel_recv:
channel_recv
------------
.. autofunction:: paddle.fluid.channel_recv
:noindex:
.. _api_fluid_channel_close:
channel_close
-------------
.. autofunction:: paddle.fluid.channel_close
:noindex:
.. _api_fluid_Select:
Select
------
.. autoclass:: paddle.fluid.Select
:members:
:noindex:
.. _api_fluid_Trainer:
Trainer
-------
.. autoclass:: paddle.fluid.Trainer
:members:
:noindex:
.. _api_fluid_BeginEpochEvent:
BeginEpochEvent
---------------
.. autoclass:: paddle.fluid.BeginEpochEvent
:members:
:noindex:
.. _api_fluid_EndEpochEvent:
EndEpochEvent
-------------
.. autoclass:: paddle.fluid.EndEpochEvent
:members:
:noindex:
.. _api_fluid_BeginStepEvent:
BeginStepEvent
--------------
.. autoclass:: paddle.fluid.BeginStepEvent
:members:
:noindex:
.. _api_fluid_EndStepEvent:
EndStepEvent
------------
.. autoclass:: paddle.fluid.EndStepEvent
:members:
:noindex:
.. _api_fluid_CheckpointConfig:
CheckpointConfig
----------------
.. autoclass:: paddle.fluid.CheckpointConfig
:members:
:noindex:
.. _api_fluid_Inferencer:
Inferencer
----------
.. autoclass:: paddle.fluid.Inferencer
:members:
:noindex:
.. _api_fluid_DistributeTranspiler:
DistributeTranspiler
--------------------
.. autoclass:: paddle.fluid.DistributeTranspiler
:members:
:noindex:
.. _api_fluid_memory_optimize:
memory_optimize
---------------
.. autofunction:: paddle.fluid.memory_optimize
:noindex:
.. _api_fluid_release_memory:
release_memory
--------------
.. autofunction:: paddle.fluid.release_memory
:noindex:
.. _api_fluid_ParallelExecutor:
ParallelExecutor
----------------
.. autoclass:: paddle.fluid.ParallelExecutor
:members:
:noindex:
.. _api_fluid_ExecutionStrategy:
ExecutionStrategy
-----------------
.. autoclass:: paddle.fluid.ExecutionStrategy
:members:
:noindex:
.. _api_fluid_BuildStrategy:
BuildStrategy
-------------
.. autoclass:: paddle.fluid.BuildStrategy
:members:
:noindex:
.. _api_fluid_create_lod_tensor:
create_lod_tensor
-----------------
.. autofunction:: paddle.fluid.create_lod_tensor
:noindex:
.. _api_fluid_create_random_int_lodtensor:
create_random_int_lodtensor
---------------------------
.. autofunction:: paddle.fluid.create_random_int_lodtensor
:noindex:
.. _api_fluid_LoDTensor:
LoDTensor
---------
.. autoclass:: paddle.fluid.LoDTensor
:members:
:noindex:
.. _api_fluid_CPUPlace:
CPUPlace
--------
.. autoclass:: paddle.fluid.CPUPlace
:members:
:noindex:
.. _api_fluid_CUDAPlace:
CUDAPlace
---------
.. autoclass:: paddle.fluid.CUDAPlace
:members:
:noindex:
.. _api_fluid_CUDAPinnedPlace:
CUDAPinnedPlace
---------------
.. autoclass:: paddle.fluid.CUDAPinnedPlace
:members:
:noindex:
.. _api_fluid_Tensor:
Tensor
------
.. autoclass:: paddle.fluid.Tensor
:members:
:noindex:
.. _api_fluid_ParamAttr:
ParamAttr
---------
.. autoclass:: paddle.fluid.ParamAttr
:members:
:noindex:
.. _api_fluid_WeightNormParamAttr:
WeightNormParamAttr
-------------------
.. autoclass:: paddle.fluid.WeightNormParamAttr
:members:
:noindex:
.. _api_fluid_DataFeeder:
DataFeeder
----------
.. autoclass:: paddle.fluid.DataFeeder
:members:
:noindex:
.. _api_fluid_Scope:
Scope
-----
.. autoclass:: paddle.fluid.Scope
:members:
:noindex:
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import argparse
import sys
import types
import paddle.fluid as fluid
def parse_arg():
parser = argparse.ArgumentParser()
parser.add_argument('--submodules', nargs="*")
parser.add_argument(
'module', type=str, help='Generate the documentation of which module')
return parser.parse_args()
class DocGenerator(object):
def __init__(self, module_name=None, stream=sys.stdout):
if module_name == "":
module_name = None
self.stream = stream
if module_name is None:
self.module_name = "fluid"
else:
self.module_name = "fluid." + module_name
if module_name is None:
self.module = fluid
else:
if not hasattr(fluid, module_name):
raise ValueError("Cannot find fluid.{0}".format(module_name))
else:
self.module = getattr(fluid, module_name)
self.stream.write('''.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
''')
self._print_header_(self.module_name, dot='=', is_title=True)
def print_submodule(self, submodule_name):
submodule = getattr(self.module, submodule_name)
if submodule is None:
raise ValueError("Cannot find submodule {0}".format(submodule_name))
self.print_section(submodule_name)
for item in submodule.__all__:
self.print_item(item)
def print_current_module(self):
for item in self.module.__all__:
self.print_item(item)
def print_section(self, name):
self._print_header_(name, dot='=', is_title=False)
def print_item(self, name):
item = getattr(self.module, name, None)
if item is None:
return
if isinstance(item, types.TypeType):
self.print_class(name)
elif isinstance(item, types.FunctionType):
self.print_method(name)
else:
pass
def print_class(self, name):
self._print_ref_(name)
self._print_header_(name, dot='-', is_title=False)
self.stream.write('''.. autoclass:: paddle.{0}.{1}
:members:
:noindex:
'''.format(self.module_name, name))
def print_method(self, name):
self._print_ref_(name)
self._print_header_(name, dot='-', is_title=False)
self.stream.write('''.. autofunction:: paddle.{0}.{1}
:noindex:
'''.format(self.module_name, name))
def _print_header_(self, name, dot, is_title):
dot_line = dot * len(name)
if is_title:
self.stream.write(dot_line)
self.stream.write('\n')
self.stream.write(name)
self.stream.write('\n')
self.stream.write(dot_line)
self.stream.write('\n')
self.stream.write('\n')
def _print_ref_(self, name):
self.stream.write(".. _api_{0}_{1}:\n\n".format("_".join(
self.module_name.split(".")), name))
def main():
args = parse_arg()
gen = DocGenerator(args.module)
if args.submodules is None:
gen.print_current_module()
else:
for submodule_name in args.submodules:
gen.print_submodule(submodule_name)
if __name__ == '__main__':
main()
#!/bin/bash
python gen_doc.py layers --submodules control_flow device io nn ops tensor learning_rate_scheduler detection metric_op tensor > layers.rst
for module in data_feeder clip metrics executor initializer io nets optimizer param_attr profiler regularizer transpiler recordio_writer backward average profiler
do
python gen_doc.py ${module} > ${module}.rst
done
python gen_doc.py "" > fluid.rst
=============
API Reference
=============
.. toctree::
:maxdepth: 1
fluid.rst
layers.rst
data_feeder.rst
executor.rst
initializer.rst
metrics.rst
nets.rst
clip.rst
optimizer.rst
param_attr.rst
profiler.rst
regularizer.rst
io.rst
data.rst
transpiler.rst
recordio_writer.rst
backward.rst
average.rst
profiler.rst
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
=================
fluid.initializer
=================
.. _api_fluid_initializer_Constant:
Constant
--------
.. autoclass:: paddle.fluid.initializer.Constant
:members:
:noindex:
.. _api_fluid_initializer_Uniform:
Uniform
-------
.. autoclass:: paddle.fluid.initializer.Uniform
:members:
:noindex:
.. _api_fluid_initializer_Normal:
Normal
------
.. autoclass:: paddle.fluid.initializer.Normal
:members:
:noindex:
.. _api_fluid_initializer_Xavier:
Xavier
------
.. autoclass:: paddle.fluid.initializer.Xavier
:members:
:noindex:
.. _api_fluid_initializer_Bilinear:
Bilinear
--------
.. autoclass:: paddle.fluid.initializer.Bilinear
:members:
:noindex:
.. _api_fluid_initializer_MSRA:
MSRA
----
.. autoclass:: paddle.fluid.initializer.MSRA
:members:
:noindex:
.. _api_fluid_initializer_force_init_on_cpu:
force_init_on_cpu
-----------------
.. autofunction:: paddle.fluid.initializer.force_init_on_cpu
:noindex:
.. _api_fluid_initializer_init_on_cpu:
init_on_cpu
-----------
.. autofunction:: paddle.fluid.initializer.init_on_cpu
:noindex:
.. _api_fluid_initializer_ConstantInitializer:
ConstantInitializer
-------------------
.. autoclass:: paddle.fluid.initializer.ConstantInitializer
:members:
:noindex:
.. _api_fluid_initializer_UniformInitializer:
UniformInitializer
------------------
.. autoclass:: paddle.fluid.initializer.UniformInitializer
:members:
:noindex:
.. _api_fluid_initializer_NormalInitializer:
NormalInitializer
-----------------
.. autoclass:: paddle.fluid.initializer.NormalInitializer
:members:
:noindex:
.. _api_fluid_initializer_XavierInitializer:
XavierInitializer
-----------------
.. autoclass:: paddle.fluid.initializer.XavierInitializer
:members:
:noindex:
.. _api_fluid_initializer_BilinearInitializer:
BilinearInitializer
-------------------
.. autoclass:: paddle.fluid.initializer.BilinearInitializer
:members:
:noindex:
.. _api_fluid_initializer_MSRAInitializer:
MSRAInitializer
---------------
.. autoclass:: paddle.fluid.initializer.MSRAInitializer
:members:
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
========
fluid.io
========
.. _api_fluid_io_save_vars:
save_vars
---------
.. autofunction:: paddle.fluid.io.save_vars
:noindex:
.. _api_fluid_io_save_params:
save_params
-----------
.. autofunction:: paddle.fluid.io.save_params
:noindex:
.. _api_fluid_io_save_persistables:
save_persistables
-----------------
.. autofunction:: paddle.fluid.io.save_persistables
:noindex:
.. _api_fluid_io_load_vars:
load_vars
---------
.. autofunction:: paddle.fluid.io.load_vars
:noindex:
.. _api_fluid_io_load_params:
load_params
-----------
.. autofunction:: paddle.fluid.io.load_params
:noindex:
.. _api_fluid_io_load_persistables:
load_persistables
-----------------
.. autofunction:: paddle.fluid.io.load_persistables
:noindex:
.. _api_fluid_io_save_inference_model:
save_inference_model
--------------------
.. autofunction:: paddle.fluid.io.save_inference_model
:noindex:
.. _api_fluid_io_load_inference_model:
load_inference_model
--------------------
.. autofunction:: paddle.fluid.io.load_inference_model
:noindex:
.. _api_fluid_io_get_inference_program:
get_inference_program
---------------------
.. autofunction:: paddle.fluid.io.get_inference_program
:noindex:
.. _api_fluid_io_save_checkpoint:
save_checkpoint
---------------
.. autofunction:: paddle.fluid.io.save_checkpoint
:noindex:
.. _api_fluid_io_load_checkpoint:
load_checkpoint
---------------
.. autofunction:: paddle.fluid.io.load_checkpoint
:noindex:
.. _api_fluid_io_clean_checkpoint:
clean_checkpoint
----------------
.. autofunction:: paddle.fluid.io.clean_checkpoint
:noindex:
.. _api_fluid_io_load_persist_vars_without_grad:
load_persist_vars_without_grad
------------------------------
.. autofunction:: paddle.fluid.io.load_persist_vars_without_grad
:noindex:
.. _api_fluid_io_save_persist_vars_without_grad:
save_persist_vars_without_grad
------------------------------
.. autofunction:: paddle.fluid.io.save_persist_vars_without_grad
:noindex:
.. _api_fluid_io_get_latest_checkpoint_serial:
get_latest_checkpoint_serial
----------------------------
.. autofunction:: paddle.fluid.io.get_latest_checkpoint_serial
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
=============
fluid.metrics
=============
.. _api_fluid_metrics_MetricBase:
MetricBase
----------
.. autoclass:: paddle.fluid.metrics.MetricBase
:members:
:noindex:
.. _api_fluid_metrics_CompositeMetric:
CompositeMetric
---------------
.. autoclass:: paddle.fluid.metrics.CompositeMetric
:members:
:noindex:
.. _api_fluid_metrics_Precision:
Precision
---------
.. autoclass:: paddle.fluid.metrics.Precision
:members:
:noindex:
.. _api_fluid_metrics_Recall:
Recall
------
.. autoclass:: paddle.fluid.metrics.Recall
:members:
:noindex:
.. _api_fluid_metrics_Accuracy:
Accuracy
--------
.. autoclass:: paddle.fluid.metrics.Accuracy
:members:
:noindex:
.. _api_fluid_metrics_ChunkEvaluator:
ChunkEvaluator
--------------
.. autoclass:: paddle.fluid.metrics.ChunkEvaluator
:members:
:noindex:
.. _api_fluid_metrics_EditDistance:
EditDistance
------------
.. autoclass:: paddle.fluid.metrics.EditDistance
:members:
:noindex:
.. _api_fluid_metrics_DetectionMAP:
DetectionMAP
------------
.. autoclass:: paddle.fluid.metrics.DetectionMAP
:members:
:noindex:
.. _api_fluid_metrics_Auc:
Auc
---
.. autoclass:: paddle.fluid.metrics.Auc
:members:
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
==========
fluid.nets
==========
.. _api_fluid_nets_simple_img_conv_pool:
simple_img_conv_pool
--------------------
.. autofunction:: paddle.fluid.nets.simple_img_conv_pool
:noindex:
.. _api_fluid_nets_sequence_conv_pool:
sequence_conv_pool
------------------
.. autofunction:: paddle.fluid.nets.sequence_conv_pool
:noindex:
.. _api_fluid_nets_glu:
glu
---
.. autofunction:: paddle.fluid.nets.glu
:noindex:
.. _api_fluid_nets_scaled_dot_product_attention:
scaled_dot_product_attention
----------------------------
.. autofunction:: paddle.fluid.nets.scaled_dot_product_attention
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
===============
fluid.optimizer
===============
.. _api_fluid_optimizer_SGD:
SGD
---
.. autoclass:: paddle.fluid.optimizer.SGD
:members:
:noindex:
.. _api_fluid_optimizer_Momentum:
Momentum
--------
.. autoclass:: paddle.fluid.optimizer.Momentum
:members:
:noindex:
.. _api_fluid_optimizer_Adagrad:
Adagrad
-------
.. autoclass:: paddle.fluid.optimizer.Adagrad
:members:
:noindex:
.. _api_fluid_optimizer_Adam:
Adam
----
.. autoclass:: paddle.fluid.optimizer.Adam
:members:
:noindex:
.. _api_fluid_optimizer_Adamax:
Adamax
------
.. autoclass:: paddle.fluid.optimizer.Adamax
:members:
:noindex:
.. _api_fluid_optimizer_DecayedAdagrad:
DecayedAdagrad
--------------
.. autoclass:: paddle.fluid.optimizer.DecayedAdagrad
:members:
:noindex:
.. _api_fluid_optimizer_Ftrl:
Ftrl
----
.. autoclass:: paddle.fluid.optimizer.Ftrl
:members:
:noindex:
.. _api_fluid_optimizer_SGDOptimizer:
SGDOptimizer
------------
.. autoclass:: paddle.fluid.optimizer.SGDOptimizer
:members:
:noindex:
.. _api_fluid_optimizer_MomentumOptimizer:
MomentumOptimizer
-----------------
.. autoclass:: paddle.fluid.optimizer.MomentumOptimizer
:members:
:noindex:
.. _api_fluid_optimizer_AdagradOptimizer:
AdagradOptimizer
----------------
.. autoclass:: paddle.fluid.optimizer.AdagradOptimizer
:members:
:noindex:
.. _api_fluid_optimizer_AdamOptimizer:
AdamOptimizer
-------------
.. autoclass:: paddle.fluid.optimizer.AdamOptimizer
:members:
:noindex:
.. _api_fluid_optimizer_AdamaxOptimizer:
AdamaxOptimizer
---------------
.. autoclass:: paddle.fluid.optimizer.AdamaxOptimizer
:members:
:noindex:
.. _api_fluid_optimizer_DecayedAdagradOptimizer:
DecayedAdagradOptimizer
-----------------------
.. autoclass:: paddle.fluid.optimizer.DecayedAdagradOptimizer
:members:
:noindex:
.. _api_fluid_optimizer_RMSPropOptimizer:
RMSPropOptimizer
----------------
.. autoclass:: paddle.fluid.optimizer.RMSPropOptimizer
:members:
:noindex:
.. _api_fluid_optimizer_FtrlOptimizer:
FtrlOptimizer
-------------
.. autoclass:: paddle.fluid.optimizer.FtrlOptimizer
:members:
:noindex:
.. _api_fluid_optimizer_Adadelta:
Adadelta
--------
.. autoclass:: paddle.fluid.optimizer.Adadelta
:members:
:noindex:
.. _api_fluid_optimizer_ModelAverage:
ModelAverage
------------
.. autoclass:: paddle.fluid.optimizer.ModelAverage
:members:
:noindex:
.. _api_fluid_optimizer_Optimizer:
Optimizer
---------
.. autoclass:: paddle.fluid.optimizer.Optimizer
:members:
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
================
fluid.param_attr
================
.. _api_fluid_param_attr_ParamAttr:
ParamAttr
---------
.. autoclass:: paddle.fluid.param_attr.ParamAttr
:members:
:noindex:
.. _api_fluid_param_attr_WeightNormParamAttr:
WeightNormParamAttr
-------------------
.. autoclass:: paddle.fluid.param_attr.WeightNormParamAttr
:members:
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
==============
fluid.profiler
==============
.. _api_fluid_profiler_cuda_profiler:
cuda_profiler
-------------
.. autofunction:: paddle.fluid.profiler.cuda_profiler
:noindex:
.. _api_fluid_profiler_reset_profiler:
reset_profiler
--------------
.. autofunction:: paddle.fluid.profiler.reset_profiler
:noindex:
.. _api_fluid_profiler_profiler:
profiler
--------
.. autofunction:: paddle.fluid.profiler.profiler
:noindex:
.. _api_fluid_profiler_start_profiler:
start_profiler
--------------
.. autofunction:: paddle.fluid.profiler.start_profiler
:noindex:
.. _api_fluid_profiler_stop_profiler:
stop_profiler
-------------
.. autofunction:: paddle.fluid.profiler.stop_profiler
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
=====================
fluid.recordio_writer
=====================
.. _api_fluid_recordio_writer_convert_reader_to_recordio_file:
convert_reader_to_recordio_file
-------------------------------
.. autofunction:: paddle.fluid.recordio_writer.convert_reader_to_recordio_file
:noindex:
.. _api_fluid_recordio_writer_convert_reader_to_recordio_files:
convert_reader_to_recordio_files
--------------------------------
.. autofunction:: paddle.fluid.recordio_writer.convert_reader_to_recordio_files
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
=================
fluid.regularizer
=================
.. _api_fluid_regularizer_append_regularization_ops:
append_regularization_ops
-------------------------
.. autofunction:: paddle.fluid.regularizer.append_regularization_ops
:noindex:
.. _api_fluid_regularizer_L1Decay:
L1Decay
-------
.. autoclass:: paddle.fluid.regularizer.L1Decay
:members:
:noindex:
.. _api_fluid_regularizer_L2Decay:
L2Decay
-------
.. autoclass:: paddle.fluid.regularizer.L2Decay
:members:
:noindex:
.. _api_fluid_regularizer_L1DecayRegularizer:
L1DecayRegularizer
------------------
.. autoclass:: paddle.fluid.regularizer.L1DecayRegularizer
:members:
:noindex:
.. _api_fluid_regularizer_L2DecayRegularizer:
L2DecayRegularizer
------------------
.. autoclass:: paddle.fluid.regularizer.L2DecayRegularizer
:members:
:noindex:
.. THIS FILE IS GENERATED BY `gen_doc.{py|sh}`
!DO NOT EDIT THIS FILE MANUALLY!
================
fluid.transpiler
================
.. _api_fluid_transpiler_DistributeTranspiler:
DistributeTranspiler
--------------------
.. autoclass:: paddle.fluid.transpiler.DistributeTranspiler
:members:
:noindex:
.. _api_fluid_transpiler_InferenceTranspiler:
InferenceTranspiler
-------------------
.. autoclass:: paddle.fluid.transpiler.InferenceTranspiler
:members:
:noindex:
.. _api_fluid_transpiler_memory_optimize:
memory_optimize
---------------
.. autofunction:: paddle.fluid.transpiler.memory_optimize
:noindex:
.. _api_fluid_transpiler_release_memory:
release_memory
--------------
.. autofunction:: paddle.fluid.transpiler.release_memory
:noindex:
.. _api_fluid_transpiler_HashName:
HashName
--------
.. autoclass:: paddle.fluid.transpiler.HashName
:members:
:noindex:
.. _api_fluid_transpiler_RoundRobin:
RoundRobin
----------
.. autoclass:: paddle.fluid.transpiler.RoundRobin
:members:
:noindex:
../../v2/build_and_install/build_from_source_cn.rst
../../v2/build_and_install/build_from_source_en.rst
../../v2/build_and_install/docker_install_cn.rst
../../v2/build_and_install/docker_install_en.rst
../../v2/build_and_install/index_cn.rst
../../v2/build_and_install/index_en.rst
../../v2/build_and_install/paddleci.png
../../v2/build_and_install/pip_install_cn.rst
../../v2/build_and_install/pip_install_en.rst
Gradient Update Algorithm
-------------------------
.. toctree::
:maxdepth: 1
parameter_average.md
Gradient Update Algorithm
--------------------------------------
.. toctree::
:maxdepth: 1
parameter_average.md
# Averaging Parameter in PaddlePaddle
## Why Averaging
In a large scale machine learning setup where the size of the training data is huge, it could take us a large number of iterations over the training data before we can achieve the optimal values of parameters of our model. Looking at the problem setup, it is desirable to obtain the optimal values of parameters by going through the data in as few passes as possible.
Polyak and Juditsky (1992) showed that the test performance of the simple average of parameters obtained by Stochastic Gradient Descent (SGD) is as good as that of parameter values obtained by training the model over and over again on the training dataset.
Hence, to accelerate Stochastic Gradient Descent, Averaged Stochastic Gradient Descent (ASGD) was proposed in Polyak and Juditsky (1992). For ASGD, the running average of parameters obtained by SGD is used as the estimator for <img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/theta_star.gif"/><br/> . The averaging is done as follows:
<p align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/asgd.gif"><br />
</p>
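In case the embedded images do not render, the running average they depict can be written as follows. This is the standard form of Polyak-Juditsky averaging; the notation (with $\theta_i$ the SGD iterate after batch $i$) is ours, not taken from the original images:

```latex
\bar{\theta}_t = \frac{1}{t} \sum_{i=1}^{t} \theta_i
```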
We propose averaging for any optimizer similar to how ASGD performs it, as mentioned above.
### How to perform Parameter Averaging in PaddlePaddle
Parameter Averaging in PaddlePaddle works in the following way during training:
1. It will take in an instance of an optimizer as an input, e.g. RMSPropOptimizer
2. The optimizer itself is responsible for updating the parameters.
3. The ParameterAverageOptimizer maintains a separate copy of the parameters for itself:
1. In theory, the values of this copy are the average of the values of the parameters in the most recent N batches.
2. However, saving all N instances of the parameters in memory is not feasible.
3. Therefore, an approximation algorithm is used.
Hence, overall we have two copies of the parameters: one for the optimizer itself, and one for the ParameterAverageOptimizer. The former should be used in back propagation, while the latter should be used during testing and should be saved.
During the testing/model-saving phase, we perform the following steps:
1. Perform the delayed operations.
2. Save current values of the parameters to a temporary variable.
3. Replace the values of the parameters with the averaged values.
4. Perform testing and/or save the parameters.
5. Restore the values of the parameters once done.
### How to implement Averaging of Parameter in PaddlePaddle
We can add the ParameterAverageOptimizer op to the graph through the Python API. Using this approach, we manually add this op to the graph and direct the output of the optimizer op to this op during training.
**Advantages**:
- Allows for greater flexibility to the users of PaddlePaddle. Using this approach, the users can plug different optimizers into ParameterAverageOptimizer by passing in the optimizer to the op.
- Makes it easy for the users to customize and extend the framework.
**Disadvantages**:
- Implementation requires re-writing the averaging methodology in Python.
### Low-Level implementation
In the new design, we propose to create a new operation for averaging parameter updates (ParameterAverageOptimizer). For now, we can add an op that takes in the following as input:
- the optimizer
- the window_size to keep the updates
The ParameterAverageOptimizer op can be like any other operator with its own CPU/GPU implementation either using Eigen or separate CPU and GPU kernels. As the initial implementation, we can implement the kernel using Eigen following the abstraction pattern implemented for [Operators](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/operators/rmsprop_op.h). We also want to support the case when the Trainer/Optimizer runs on the GPU while ParameterAverageOptimizer runs on a CPU.
The idea of building an op for averaging is in sync with the refactored PaddlePaddle philosophy of using operators to represent any computation unit. The way the op will be added to the computation graph will be decided by the [layer functions](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/fluid/design/modules/python_api.md#layer-function) in Python API.
### Python API implementation for ParameterAverageOptimizer
Based on Polyak and Juditsky (1992), we can generalize the averaging of updates to any optimizer. The input to the op would be the following:
- Any optimizer (RMSProp, AdaGrad, etc.)
- A window size. The op keeps accumulating updated parameter values over a window of N batches and takes their average, as formalized below. The averaged value is moved to a buffer when the window is full, to avoid loss of precision.
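A minimal formalization of this windowed average, under the assumption that $\theta_i$ denotes the parameter values after batch $i$ and $N$ is the window size:

```latex
\bar{\theta}_t = \frac{1}{N} \sum_{i=t-N+1}^{t} \theta_i
```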
Using the ParameterAverageOptimizer op, any user can add the operation to their computation graphs. However, this will require a lot of lines of code and we should design Python APIs that support averaging. As per the PaddlePaddle [Python API design](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/fluid/design/modules/python_api.md), the layer functions are responsible for creating operators, operator parameters and variables. Since ParameterAverageOptimizer will be an operator, it makes sense to create it in the layer functions.
We will have a wrapper written in Python that supports the functionality and implements the actual core computation in the C++ core, as we have done for other [Optimizers](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/operators/rmsprop_op.cc).
#### Creation of the ParameterAverageOptimizer operator
There are two ways for creating the ParameterAverageOptimizer op:
1. We create the op immediately while building the computation graph.
2. We add the op in a lazy manner, just before the backward pass, similar to the way the optimization ops are added.
The proposal is to add the op immediately while building the computation graph.
#### High-level API
In PaddlePaddle Python API, users will primarily rely on [layer functions](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/fluid/design/modules/python_api.md#layer-function) to create neural network layers. Hence, we also need to provide parameter average functionality in layer functions.
A few months ago when we were trying to replace CMake with Bazel, @emailweixu suggested that we rewrite those handy Bazel functions using CMake. Now it seems that it's the right time to get this done, as we are facing problems from the porting of Majel and the development of the new parameter server using Go and C++.
Here are some initial thoughts. Your comments are welcome!
# Required CMake Function
I think we need only the following few CMake functions to make a project description lean and clean:
<table>
<thead>
<tr>
<th>C++</th>
<th>CUDA C++</th>
<th>Go</th>
</tr>
</thead>
<tbody>
<tr>
<td>cc_library </td>
<td>nv_library </td>
<td>go_library </td>
</tr>
<tr>
<td>cc_binary </td>
<td>nv_binary </td>
<td>go_binary </td>
</tr>
<tr>
<td> cc_test </td>
<td> nv_test </td>
<td> go_test </td>
</tr>
</tbody>
</table>
- The `_library` functions generate .a files from source code.
- The `_binary` functions generate executable binary files.
- The `_test` functions generate executable unit test files. They work like `_binary` but link `-lgtest` and `-lgtest_main`.
The difference between `nv_` functions and `cc_` functions is that the former use `nvcc` instead of the system-default C++ compiler.
Both `nv_` and `cc_` functions enable C++11 (`-std=c++11`).
Also,
- to describe external dependencies, we need `external_library`.
- to build shared libraries, we need `shared_library`.
## An Example Project
Suppose that we have the aforementioned functions defined in our `/cmake` directory. The following example `CMakeLists.txt` describes a project including the following source files:
- tensor.h
- tensor.cc
- tensor_test.cc
- ops.h
- ops.cu
- ops_test.cu
- api.go
- api_test.go
Suppose that ops.cu depends on CUDNN.
```cmake
# cc_library parses tensor.cc and figures out that the target also depends
# on tensor.h.
cc_library(tensor
SRCS
tensor.cc)
# The dependency on target tensor implies that if any of
# tensor{.h,.cc,_test.cc} is changed, tensor_test needs to be re-built.
cc_test(tensor_test
SRCS
tensor_test.cc
DEPS
tensor)
# I don't have a clear idea of what parameters external_library needs to
# have. @gangliao, as a CMake expert, would have better ideas.
external_library(cudnn
....)
# Suppose that ops.cu depends on the external target CUDNN. Also, ops.cu
# includes global functions that take Tensor as their parameters, so
# ops depends on tensor. This implies that if any of tensor.{h,cc} or
# ops.{h,cu} is changed, ops needs to be re-built.
nv_library(ops
SRCS
ops.cu
DEPS
tensor
cudnn) # cudnn is defined above.
nv_test(ops_test
SRCS
ops_test.cu
DEPS
ops)
# Because api.go defines a Go wrapper around ops and tensor, it depends on
# both. This implies that if any of tensor.{h,cc}, ops.{h,cu}, or
# api.go is changed, api needs to be re-built.
go_library(api
SRCS
api.go
DEPS
tensor # Because ops depend on tensor, this line is optional.
ops)
go_test(api_test
SRCS
api_test.go
DEPS
api)
# This builds libapi.so. shared_library might use the CMake target
# api_shared so as to distinguish it from the above target api.
shared_library(api
DEPS
api)
```
## Implementation
As the above example CMakeLists.txt executes, each function invocation adds "nodes" to a dependency graph. It also uses this graph to generate CMake commands including `add_executable`, `add_dependencies`, `target_link_libraries`, and `add_test`.
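As a rough sketch of what one of these function bodies could look like, here is a hypothetical, simplified `cc_library`; it covers only the static-library case and ignores details such as merged archives or generated sources:

```cmake
# A minimal sketch of cc_library (illustrative, not a production version).
function(cc_library TARGET_NAME)
  set(options "")
  set(oneValueArgs "")
  set(multiValueArgs SRCS DEPS)
  cmake_parse_arguments(cc_library "${options}" "${oneValueArgs}"
                        "${multiValueArgs}" ${ARGN})
  # The _library functions generate .a files, i.e. STATIC libraries.
  add_library(${TARGET_NAME} STATIC ${cc_library_SRCS})
  if(cc_library_DEPS)
    # Record dependency edges so that changed deps trigger a re-build/re-link.
    add_dependencies(${TARGET_NAME} ${cc_library_DEPS})
    target_link_libraries(${TARGET_NAME} ${cc_library_DEPS})
  endif()
endfunction()
```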
## Using Package Manager For Go
Building Go binaries and libraries needs to satisfy their dependencies; generally
we can do `go get ./...` to download and compile all external dependencies. The
problems are:
1. `go get` will always get the latest code from the default branch of the
   remote repo, so changes of dependencies might break the build. This is very
   different from what we already have in `cmake/external`, which downloads a
   specific version or commit id of the dependency.
1. Some locations cannot access external dependencies through the internet, as mentioned
   in https://github.com/PaddlePaddle/Paddle/issues/2605. Package management
   tools can package the dependencies as a "vendor" package, which can be mirrored
   on many cloud file hosting services, so users who want to compile Paddle by themselves can
   download this "vendor" package from a mirror site.
### Choose A Suitable Tool
As mentioned by @wangkuiyi, [here](https://github.com/golang/go/wiki/PackageManagementTools)
is a list of dozens of Go package managers. We choose the tool using the following principles:
- Most "active" projects, with more stars and more pull requests or commits
- Widely used projects
After comparing all these projects, we shall choose between the two most popular
tools: Godep and Glide.
Here's a brief comparison between Godep and Glide:
https://github.com/Masterminds/glide/wiki/Go-Package-Manager-Comparison. There are
also many complaints about using `Godep`. A new "official" package
management tool has been started at https://github.com/golang/dep to resolve
such problems, but it's currently at the Alpha stage. So the best choice now is
clearly Glide.
### Manage Go Packages
- Dependencies: `go/glide.yaml` will store the dependencies and their versions that
  are directly imported by Paddle. `go/glide.lock` will store all dependencies recursively,
  with their commit ids. Builds will "lock" to these packages unless we `glide up`
  them.
- Vendor package: the `go/vendor` directory will be generated when running the `cmake` command. `cmake`
  will download the code corresponding to `go/glide.lock`. If we put a vendor folder
  under `go/`, cmake will just check the commit ids of the packages under the folder;
  if the commit ids match, there will be no download at all.
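For concreteness, a hypothetical `go/glide.yaml` could look like the following; the package paths and versions are invented for illustration:

```yaml
package: github.com/PaddlePaddle/Paddle/go
import:
- package: github.com/golang/protobuf
  version: v1.1.0
- package: github.com/stretchr/testify
  version: ^1.2.0
  subpackages:
  - assert
```

`glide up` would then resolve these constraints and pin exact commit ids in `go/glide.lock`.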
# Design Doc: Block and Scope
## The Representation of Computation
Both deep learning systems and programming languages help users describe computation procedures. These systems use various representations of computation:
- Caffe, Torch, and Paddle: sequences of layers.
- TensorFlow, Caffe2, Mxnet: graph of operators.
- PaddlePaddle: nested blocks, like C++ and Java programs.
## Block in Programming Languages and Deep Learning
In programming languages, a block is a pair of curly braces that includes local variable definitions and a sequence of instructions or operators.
Blocks work with control flow structures like `if`, `else`, and `for`, which have equivalents in deep learning:
<table>
<thead>
<tr>
<th>programming languages</th>
<th>PaddlePaddle</th>
</tr>
</thead>
<tbody>
<tr>
<td>for, while loop </td>
<td>RNN, WhileOp </td>
</tr>
<tr>
<td>if, if-else, switch </td>
<td>IfElseOp, SwitchOp </td>
</tr>
<tr>
<td>sequential execution </td>
<td>a sequence of layers </td>
</tr>
</tbody>
</table>
A key difference is that a C++ program describes a one-pass computation, whereas a deep learning program describes both the forward and backward passes.
## Stack Frames and the Scope Hierarchy
The existence of the backward pass makes the execution of a block of PaddlePaddle different from traditional programs:
<table>
<thead>
<tr>
<th>programming languages</th>
<th>PaddlePaddle</th>
</tr>
</thead>
<tbody>
<tr>
<td>stack </td>
<td>scope hierarchy </td>
</tr>
<tr>
<td>stack frame </td>
<td>scope </td>
</tr>
<tr>
<td>push at entering block </td>
<td>push at entering block </td>
</tr>
<tr>
<td>pop at leaving block </td>
<td>destroy when minibatch completes </td>
</tr>
</tbody>
</table>
1. In traditional programs:
- When the execution enters the left curly brace of a block, the runtime pushes a frame into the stack, where it realizes local variables.
- After the execution leaves the right curly brace, the runtime pops the frame.
- The maximum number of frames in the stack is the maximum depth of nested blocks.
1. In PaddlePaddle
- When the execution enters a block, PaddlePaddle adds a new scope, where it realizes variables.
- PaddlePaddle doesn't pop a scope after the execution of the block because variables therein are used by the backward pass. So it has a stack forest known as a *scope hierarchy*.
- The height of the highest tree is the maximum depth of nested blocks.
- After the processing of a minibatch, PaddlePaddle destroys the scope hierarchy.
## Use Blocks in C++ and PaddlePaddle Programs
Let us consolidate the discussion by presenting some examples.
### Blocks with `if-else` and `IfElseOp`
The following C++ program shows how blocks are used with the `if-else` structure:
```c++
namespace pd = paddle;
int x = 10;
int y = 1;
int z = 10;
bool cond = false;
int o1, o2;
if (cond) {
int z = x + y;
o1 = z;
o2 = pd::layer::softmax(z);
} else {
int d = pd::layer::fc(z);
o1 = d;
o2 = d+1;
}
```
An equivalent PaddlePaddle program from the design doc of the [IfElseOp operator](../execution/if_else_op.md) is as follows:
```python
import paddle as pd
x = minibatch([10, 20, 30]) # shape=[None, 1]
y = var(1) # shape=[1], value=1
z = minibatch([10, 20, 30]) # shape=[None, 1]
cond = larger_than(x, 15) # [false, true, true]
ie = pd.ifelse()
with ie.true_block():
d = pd.layer.add_scalar(x, y)
ie.output(d, pd.layer.softmax(d))
with ie.false_block():
d = pd.layer.fc(z)
ie.output(d, d+1)
o1, o2 = ie(cond)
```
In both examples, the true branch computes `x+y` and `softmax(x+y)`, while the false branch computes `fc(z)` and `fc(z)+1`.
The difference is that variables in the C++ program contain scalar values, whereas those in the PaddlePaddle programs are mini-batches of instances.
### Blocks with `for` and `RNNOp`
The following RNN model in PaddlePaddle, from the [RNN design doc](../dynamic_rnn/rnn.md):
```python
x = sequence([10, 20, 30]) # shape=[None, 1]
m = var(0) # shape=[1]
W = var(0.314, param=true) # shape=[1]
U = var(0.375, param=true) # shape=[1]
rnn = pd.rnn()
with rnn.step():
h = rnn.memory(init = m)
h_prev = rnn.previous_memory(h)
a = layer.fc(W, x)
b = layer.fc(U, h_prev)
s = pd.add(a, b)
act = pd.sigmoid(s)
rnn.update_memory(h, act)
rnn.output(a, b)
o1, o2 = rnn()
```
has its equivalent C++ program as follows
```c++
float x[] = {10, 20, 30};
float m = 0;
float W = 0.314;
float U = 0.375;

const int n = sizeof(x) / sizeof(x[0]);
float mem[n + 1];
float o1[n + 1];
float o2[n + 1];

mem[0] = m;  // initialize the memory with m
for (int i = 1; i <= n; ++i) {
  float a = W * x[i - 1];    // fc(W, x)
  float b = U * mem[i - 1];  // fc(U, h_prev)
  float s = a + b;           // add(a, b)
  float act = sigmoid(s);    // sigmoid(s)
  mem[i] = act;              // update_memory(h, act)
  o1[i] = a;                 // rnn.output(a, b)
  o2[i] = b;
}
```
## Compilation and Execution
Like TensorFlow, a PaddlePaddle program is written in Python. The first part of the program describes a neural network as a protobuf message, and the rest executes the message for training or inference.
The generation of this protobuf message is similar to how a compiler generates a binary executable file. The execution of the message is similar to how the OS executes the binary file.
## The "Binary Executable File Format"
The definition of the protobuf message is as follows:
```protobuf
message BlockDesc {
  repeated VarDesc vars = 1;
  repeated OpDesc ops = 2;
}
```
The step net in the above RNN example would look like:
```
BlockDesc {
  vars = {
    VarDesc {...} // x
    VarDesc {...} // h
    VarDesc {...} // fc_out
    VarDesc {...} // hidden_out
    VarDesc {...} // sum
    VarDesc {...} // act
  }
  ops = {
    OpDesc {...} // matmul
    OpDesc {...} // add_two
    OpDesc {...} // sigmoid
  }
};
```
Also, the RNN operator in the above example is serialized into a protobuf message of type `OpDesc` and would look like:
```
OpDesc {
  inputs = {0}      // the index of x in vars of BlockDesc above
  outputs = {5, 3}  // indices of act and hidden_out in vars of BlockDesc above
  attrs {
    "states" : {1}  // the index of h
    "step_net" : <above step net>
  }
};
```
This `OpDesc` value is in the `ops` field of the `BlockDesc` value representing the global block.
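To make the nesting concrete, here is a hedged C++ sketch of how a block-owning operator's description could recursively contain a sub-block. The struct layout is a stand-in for illustration, not the real `framework.proto` schema:
```c++
#include <map>
#include <string>
#include <variant>
#include <vector>

// Stand-in types for illustration; not the real framework.proto messages.
struct OpDesc;

struct BlockDesc {
  std::vector<std::string> vars;  // stand-in for repeated VarDesc
  std::vector<OpDesc> ops;        // stand-in for repeated OpDesc
};

// An attribute may hold a nested BlockDesc (e.g. the RNN's "step_net"), so
// serializing the global block recursively serializes all sub-blocks.
using Attribute = std::variant<int, std::vector<int>, BlockDesc>;

struct OpDesc {
  std::vector<int> inputs;   // indices into the enclosing block's vars
  std::vector<int> outputs;
  std::map<std::string, Attribute> attrs;
};
```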
## The Compilation of Blocks
During the generation of the Protobuf message, the Block should store VarDesc (the Protobuf message that describes a Variable) and OpDesc (the Protobuf message that describes an Operator).
The VarDescs in a block should live in the block's own name scope, so that local variables do not affect the parent block's name scope.
A child block's name scope should inherit the parent's, so that an OpDesc in the child block can reference a VarDesc stored in the parent block. For example:
```python
a = pd.Variable(shape=[20, 20])
b = pd.fc(a, params=["fc.w", "fc.b"])

rnn = pd.create_rnn()
with rnn.stepnet():
    x = a.as_step_input()
    # reuse fc's parameter
    fc_without_b = pd.get_variable("fc.w")
    rnn.output(fc_without_b)

out = rnn()
```
The method `pd.get_variable` retrieves a Variable by name. The Variable may be stored in a parent block but retrieved in a child block, so a block should have a variable scope that supports inheritance.
In compiler design, the symbol table is a data structure created and maintained by compilers to store information about the occurrence of various entities such as variable names, function names, classes, etc.
To store the definition of variables and operators, we define a C++ class `SymbolTable`, like the one used in compilers.
`SymbolTable` can do the following:
- store the definitions (some names and attributes) of variables and operators,
- verify if a variable was declared,
- make it possible to implement type checking (offer Protobuf message pointers to `InferShape` handlers).
```c++
// Information in SymbolTable is enough to trace the dependency graph, so maybe
// it is enough for the Eval() interface to take a SymbolTable.
class SymbolTable {
 public:
  SymbolTable(SymbolTable* parent) : parent_(parent) {}

  OpDesc* NewOp(const string& name = "");

  // TODO: determine whether the name is generated by Python or C++.
  // Currently we assume that a unique name is generated by C++ if the
  // argument name is left default.
  VarDesc* Var(const string& name = "");

  // Find a VarDesc by name; if recursive is true, search the parent's
  // SymbolTable recursively.
  // This interface is introduced to support InferShape: find the protobuf
  // messages of variables and operators, and pass pointers into InferShape.
  //
  // NOTE: maybe some C++ classes such as VarDescBuilder and OpDescBuilder
  // should be proposed and embedded into pybind to enable Python operation
  // on C++ pointers.
  VarDesc* FindVar(const string& name, bool recursive = true);

  OpDesc* FindOp(const string& name);

  BlockDesc Compile() const;

 private:
  SymbolTable* parent_;

  map<string, OpDesc> ops_;
  map<string, VarDesc> vars_;
};
```
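A hedged usage sketch, assuming the interface above behaves as described (this driver code is illustrative, not part of the design):
```c++
// Illustrative only: build a parent table and a child table for a sub-block.
SymbolTable global(nullptr);
VarDesc* x = global.Var("x");       // declare "x" in the global block

SymbolTable step_block(&global);    // the child inherits the parent's names
VarDesc* found = step_block.FindVar("x");  // resolved via the parent table
OpDesc* op = step_block.NewOp();    // a unique name is generated by C++

BlockDesc step_desc = step_block.Compile();  // serialize the child block
```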
After all the descriptions of variables and operators have been added into the SymbolTable, the block has enough information to run.
The `Block` class takes a `BlockDesc` as input, and provides `Run` and `InferShape` functions.
```c++
namespace {

class Block : public OperatorBase {
 public:
  Block(const BlockDesc& desc) : desc_(desc) {}

  void InferShape(const framework::Scope& scope) const override {
    if (!symbols_ready_) {
      CreateVariables(scope);
      CreateOperators();
    }
    // should run InferShape first.
    for (auto& op : runtime_table_.ops()) {
      op->InferShape(scope);
    }
  }

  void Run(const framework::Scope& scope,
           const platform::Place& place) const override {
    PADDLE_ENFORCE(symbols_ready_,
                   "operators and variables should be created first.");
    for (auto& op : runtime_table_.ops()) {
      op->Run(scope, place);
    }
  }

  // Definitions elided; they are expected to set symbols_ready_ once the
  // runtime table is filled.
  void CreateVariables(const framework::Scope& scope);
  void CreateOperators();

  // some other necessary interfaces of NetOp are listed below
  // ...

 private:
  BlockDesc desc_;
  bool symbols_ready_{false};
  // runtime_table_, which holds the instantiated operators, is omitted here.
};

}  // namespace
```
## The Execution of Blocks
Block inherits from OperatorBase, which has a Run method.
Block's Run method will run its operators sequentially.
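A minimal sketch of the intended calling order, assuming the `Block` interface above (construction of `scope` and `place` is elided):
```c++
// Illustrative driver code, not an actual PaddlePaddle entry point.
Block block(block_desc);   // wrap the compiled BlockDesc
block.InferShape(scope);   // the first call also creates variables/operators
block.Run(scope, place);   // then execute the operators sequentially
```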
There is another important interface called `Eval`. It takes a set of arguments called targets, generates a minimal dependency graph that treats the targets as end points, and creates a new Block from it. After `Run`, `Eval` fetches the latest values of the targets and returns them.
The definition of Eval is as follows:
```c++
// Prune a block description by targets using the corresponding dependency
// graph, and return a new BlockDesc with a minimal number of operators.
// NOTE: The return type is not a Block but the block's description, so that
// it can be distributed to a cluster.
BlockDesc Prune(const BlockDesc& desc, vector<string> targets);

void Block::Eval(const vector<string>& targets,
                 const framework::Scope& scope,
                 const platform::DeviceContext& dev_ctx) {
  BlockDesc min_desc = Prune(desc_, targets);
  Block min_block(min_desc);
  min_block.Run(scope, dev_ctx);
}
```
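The design leaves `Prune` abstract. One natural realization is a backward walk over the dependency graph starting from the targets; the sketch below is an assumption about how such pruning could work, using plain-struct stand-ins for the protobuf messages rather than the real `OpDesc`/`BlockDesc` API:
```c++
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Plain-struct stand-ins for the protobuf messages, for illustration only.
struct OpDesc { std::vector<std::string> inputs, outputs; };
struct BlockDesc { std::vector<OpDesc> ops; };

// Hypothetical minimal pruning: keep an op iff it transitively produces a
// target. Walking the ops in reverse visits consumers before producers.
BlockDesc Prune(const BlockDesc& desc, std::vector<std::string> targets) {
  std::set<std::string> needed(targets.begin(), targets.end());
  std::vector<OpDesc> kept;
  for (auto it = desc.ops.rbegin(); it != desc.ops.rend(); ++it) {
    bool produces_needed = false;
    for (const auto& out : it->outputs)
      if (needed.count(out)) produces_needed = true;
    if (!produces_needed) continue;     // not on any path to a target
    for (const auto& in : it->inputs)   // its inputs are now needed too
      needed.insert(in);
    kept.push_back(*it);
  }
  std::reverse(kept.begin(), kept.end());  // restore execution order
  return BlockDesc{kept};
}
```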
# Executor Design Doc
## Motivation
In [fluid](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/fluid/design/motivation/fluid.md), we encourage the user to use deep learning programming paradigms to describe the training process. When the user-written Python program is executed, it will first create a protobuf message
[`ProgramDesc`](https://github.com/PaddlePaddle/Paddle/blob/a91efdde6910ce92a78e3aa7157412c4c88d9ee8/paddle/framework/framework.proto#L145) that describes the process and is conceptually like an [abstract syntax tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree).
The executor runs the `ProgramDesc` like an interpreter: the `ProgramDesc` contains the intrinsics (operators in this case) and the variables that will be used, and the executor explicitly executes the stored, precompiled code.
## Overview
An executor takes a `ProgramDesc`, a `block_id` and a `Scope`. The `ProgramDesc` is a list of blocks and each block contains the protobuf definition of all the parameters and operators in the block. The `block_id` specifies the entrance block. And the `Scope` is the container of all the variable instances, which is persistent throughout different runs.
## Executor
The `Executor` explicitly executes all the intrinsics (operators here) in the `block_id`th block of a `ProgramDesc`. Essentially, it instantiates Variables and Operators, then runs all the operators in sequence one-by-one.
This is very similar to pushing a stack frame when entering a block, after which the executor cleans up all the temporary variables once a mini-batch is finished. It does not, however, have a stack-frame pop process.
### The interface
```c++
Executor(places);
```
An executor does not own any computing resources; a user can only construct an executor with the specified places.
### Running an Executor
```
void Run(ProgramDesc, Scope, block_id, create_local_scope);
```
An `Executor` only provides a unified way to execute a `ProgramDesc`. The `ProgramDesc` is the target to be executed, the `Scope` specifies the variable container, the `block_id` indicates the entrance block, and `create_local_scope` is a boolean stating whether the temporary variables are destroyed after the execution finishes.
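A hedged sketch of what `Run` does internally, based on the description above. Every helper name here (`NewChildScope`, `CreateVariables`, `CreateOperator`, `DeleteScope`, `place_`, `program.blocks`) is invented for illustration and does not match the real executor source:
```c++
// Illustrative pseudo-implementation; an assumption, not the actual code.
void Executor::Run(const ProgramDesc& program, Scope* scope, int block_id,
                   bool create_local_scope) {
  // Optionally run inside a temporary child scope, like pushing a frame.
  Scope* exec_scope = create_local_scope ? scope->NewChildScope() : scope;
  const BlockDesc& block = program.blocks[block_id];  // the entrance block
  CreateVariables(block, exec_scope);  // instantiate the block's variables
  for (const auto& op_desc : block.ops) {
    auto op = CreateOperator(op_desc);  // instantiate from the protobuf
    op->Run(*exec_scope, place_);       // execute sequentially, one by one
  }
  if (create_local_scope) DeleteScope(exec_scope);  // drop temporaries
}
```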
```dot
digraph G {
  subgraph cluster_init {
    label="Initialization"
    startup_program [label="startup", shape=box]
    node_w_g0 [label="W\nGPU0"]
    startup_program -> node_w_g0 [label="Initialize"]
    node_w_g1 [label="W\nGPU1"]
    node_w_g0 -> node_w_g1 [label="broadcast"]
  }

  subgraph cluster_train {
    label="forward_backward"

    subgraph cluster_gpu0 {
      label="GPU0"
      fc_0 [label="fc\nGPU0", shape=box]
      hidden_0 [label="hidden\nGPU0"]
      node_w_g0 -> fc_0
      fc_0 -> hidden_0
      loss0 [label="loss\nGPU0"]
      hidden_0 -> loss0 [label="many ops omitted"]
      scale_loss_0 [label="scale_loss_gradient\nGPU0", shape=box]
      loss_g0 [label="loss_grad\nGPU0"]
      scale_loss_0 -> loss_g0
      fc_g_0 [label="w_grad\nGPU0", shape=box]
      loss0 -> fc_g_0
      loss_g0 -> fc_g_0
      hidden_0 -> fc_g_0
    }

    subgraph cluster_gpu1 {
      label="GPU1"
      fc_1 [label="fc\nGPU1", shape=box]
      hidden_1 [label="hidden\nGPU1"]
      node_w_g1 -> fc_1
      fc_1 -> hidden_1
      loss1 [label="loss\nGPU1"]
      hidden_1 -> loss1 [label="many ops omitted"]
      scale_loss_1 [label="scale_loss_gradient\nGPU1", shape=box]
      loss_g1 [label="loss_grad\nGPU1"]
      scale_loss_1 -> loss_g1
      fc_g_1 [label="w_grad\nGPU1", shape=box]
      loss1 -> fc_g_1
      loss_g1 -> fc_g_1
      hidden_1 -> fc_g_1
    }
  }

  all_reduce_w [label="Merge Gradients(AllReduce)", shape=box]
  fc_g_0 -> all_reduce_w
  fc_g_1 -> all_reduce_w
  fc_g_0_merged [label="w_grad\nMerged\nGPU0"]
  fc_g_1_merged [label="w_grad\nMerged\nGPU1"]
  all_reduce_w -> fc_g_0_merged
  all_reduce_w -> fc_g_1_merged

  subgraph cluster_optimization {
    label="Optimization"
    subgraph cluster_opt_gpu0 {
      label="GPU0"
      sgd_0 [label="SGD Op\nGPU0", shape=box]
      fc_g_0_merged -> sgd_0
      node_w_g0 -> sgd_0
      optimized_w_0 [label="Optimized W\nGPU0"]
      sgd_0 -> optimized_w_0
    }
    subgraph cluster_opt_gpu1 {
      label="GPU1"
      sgd_1 [label="SGD Op\nGPU1", shape=box]
      fc_g_1_merged -> sgd_1
      node_w_g1 -> sgd_1
      optimized_w_1 [label="Optimized W\nGPU1"]
      sgd_1 -> optimized_w_1
    }
  }
}
```